Performance assessment of using single or combined pattern matching


 
# 8  
Old 07-10-2017
Quote:
Originally Posted by ananan
or read the pattern one by one and search the whole file each time for each pattern.
Like
Code:
 
While read line
Do
... (same nawk with single pattern in the or portion and after &&  patterns will be same and fixed) 
Done<file

It is long-established knowledge that such an approach (even once the syntax errors are corrected: it is NOT Do...Done but do...done, because the language is case-sensitive) will always be far slower than running awk (or sed or any other text filter) once over the whole file.

The reason is: whenever you call an external program (external to the shell, that is) from the shell you start a new (sub-)process. Starting a process is a resource-consuming activity for the system: it has to load an executable into memory, allocate the resources (memory, etc.) necessary to run it and finally start it. This:

Code:
command

is exactly one such process, while this:

Code:
while read line ; do
     command
done < /some/file

will create such a new process for every line in the input file. Since you say the file is 70MB in size, I suppose that is a lot of lines.

Of course, starting a single process is no big deal. It adds up, though: "no big deal" repeated many thousands of times eventually becomes a big deal.
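To make the difference concrete, here is a minimal sketch of both variants run against the same input. The sample file, the ERROR pattern and the variable names are made up for illustration and are not taken from the original question. Both variants give the same count, but the second one starts a new awk process for each of the 200 lines.

```shell
#!/bin/sh
# Hypothetical sample data: 200 lines, every 10th line contains "ERROR".
sample=$(mktemp)
i=1
while [ "$i" -le 200 ]; do
    if [ $((i % 10)) -eq 0 ]; then
        echo "line $i ERROR"
    else
        echo "line $i ok"
    fi
    i=$((i + 1))
done > "$sample"

# Variant 1: a single awk process reads the whole file.
single=$(awk '/ERROR/ { n++ } END { print n + 0 }' "$sample")

# Variant 2: one awk process is started for every input line.
looped=0
while IFS= read -r line; do
    hit=$(printf '%s\n' "$line" | awk '/ERROR/ { n++ } END { print n + 0 }')
    looped=$((looped + hit))
done < "$sample"

echo "single awk:   $single matches"
echo "awk per line: $looped matches"
rm -f "$sample"
```

Wrapping each variant in time (or saving them as two scripts and timing those) should show the gap, and the gap grows with the number of lines in the file.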

I hope this helps.

bakunin