Performance assessment of using single or combined pattern matching
Post 303000334 by bakunin on Monday, 10 July 2017, 01:55:06 PM
Quote:
Originally Posted by ananan
or read the pattern one by one and search the whole file each time for each pattern.
Like
Code:
 
While read line
Do
... (same nawk with single pattern in the or portion and after &&  patterns will be same and fixed) 
Done<file

It is long-standing knowledge that such an approach (even once the syntax errors are corrected: it is NOT While...Do...Done but while...do...done, because the shell is case-sensitive) will always be much slower than running awk (or sed, or any other text filter) once over the whole file.
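For illustration, here is a minimal sketch of both approaches side by side. It assumes a pattern file with one search pattern per line and treats every pattern as a fixed string (via awk's index()) rather than a regular expression. The function names match_per_pattern and match_single_pass are hypothetical, and the fixed && conditions from the question are left out.

```shell
#!/bin/sh
# Sketch only: $1 = pattern file (one fixed string per line), $2 = data file.
# Function names are hypothetical, not from the original thread.

# Slow: starts a new awk process AND re-reads the whole data file once per
# pattern. awk gets the data file as an argument, so it does not swallow
# the loop's stdin (the pattern list).
# Caveat: awk -v processes backslash escapes in the value of $pat.
match_per_pattern() {
    while IFS= read -r pat; do
        awk -v p="$pat" 'index($0, p) > 0' "$2"
    done < "$1"
}

# Fast: one awk process, one pass over the data file.
# While reading the first file (NR == FNR), store each pattern as an
# array key; for every data line, print it as soon as any pattern is found.
match_single_pass() {
    awk 'NR == FNR { pats[$0]; next }
         { for (p in pats) if (index($0, p) > 0) { print; next } }' "$1" "$2"
}
```

The two functions do not produce identical output: the loop prints a line once for every pattern it contains, grouped by pattern, while the single pass prints each matching line once, in file order. For regular expressions instead of fixed strings, replacing index($0, p) > 0 with $0 ~ p should work, at the cost of regex metacharacters in the pattern file becoming special. On Solaris, where the question's nawk comes from, you may need to call nawk instead of awk.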

The reason is this: whenever you call an external program (external to the shell, that is) from the shell, you start a new (sub-)process. Starting a process is a resource-consuming activity for the system: it has to load an executable into memory, allocate the resources (memory, etc.) necessary to run it, and finally start it. This:

Code:
command

is exactly one such process, while this:

Code:
while read line ; do
     command
done < /some/file

will create such a new process for every line in the input file. Since you say the file is 70 MB big, I suppose that is a lot of lines.
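A minimal sketch to make the per-line cost concrete, assuming the task is simply to count the lines that contain a given string (the function names count_per_line and count_single_pass are hypothetical): the first function starts a grep process for every input line, the second starts exactly one.

```shell
#!/bin/sh
# Sketch only: $1 = pattern, $2 = input file. Names are hypothetical.

# Slow: one external grep process is started for EVERY line of the file.
count_per_line() {
    pattern=$1 file=$2
    count=0
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -q -- "$pattern"; then
            count=$((count + 1))
        fi
    done < "$file"
    echo "$count"
}

# Fast: exactly one grep process reads the whole file.
count_single_pass() {
    grep -c -- "$1" "$2"
}
```

Both report the same count. Timing each call on a large file with time should show the difference, although the exact numbers depend on the system and the shell.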

Of course, starting a single process is no big deal. It adds up, though: "no big deal" repeated several thousand times eventually becomes a big deal.

I hope this helps.

bakunin
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

comment/delete a particular pattern starting from second line of the matching pattern

Hi, I have file 1.txt with the following entries as shown: 0152364|134444|10.20.30.40|015236433 0233654|122555|10.20.30.50|023365433 ** ** ** In file 2.txt I have the following entries as shown: 0152364|134444|10.20.30.40|015236433 0233654|122555|10.20.30.50|023365433... (4 Replies)
Discussion started by: imas

2. Shell Programming and Scripting

counting the lines matching a pattern, in between two pattern, and generate a tab

Hi all, I'm looking for some help. I have a file (very long) that is organized like below: >Cluster 0 0 283nt, >01_FRYJ6ZM12HMXZS... at +/99% 1 279nt, >01_FRYJ6ZM12HN12A... at +/99% 2 281nt, >01_FRYJ6ZM12HM4TS... at +/99% 3 283nt, >01_FRYJ6ZM12HM946... at +/99% 4 279nt,... (4 Replies)
Discussion started by: d.chauliac

3. Shell Programming and Scripting

AWK - Pattern Matching & Replacing - Performance

Experts, I am a beginner to Unix shell scripting. We have a flat file as the source, which uses the CTRL+F character as the delimiter. We need to count the number of records in the file (CTRL+F) to perform file validation. The following command is being used: awk '{cnt+=gsub(//,"&")}END {print cnt}'... (4 Replies)
Discussion started by: srivijay81

4. Shell Programming and Scripting

Split single file into multiple files using pattern matching

I have one single file, shown below, and I need to break each ST|850 ... SE block into a separate file using a Unix script. The example below should create 3 files. We can use ST & SE to filter, as these field names will remain the same. Please advise with the Unix code. ST|850 BEG|PO|1234 LIN|1|23 SE|4 ST|850... (3 Replies)
Discussion started by: prasadm

5. Shell Programming and Scripting

Creating single pattern for matching multiple files.

Hi friends, I have some files in a directory, for example 856-abc 856-def 851-abc 945-def 956-abc 852-abc. I want to display only those files whose names start with 856*, 945* and 851* using a single pattern, i.e. 856-abc 856-def 851-abc 945-def; the rest of the two files... (2 Replies)
Discussion started by: Little

6. UNIX for Dummies Questions & Answers

Extracting combined differences based on a single column

Dear All, I have two sets of files. File 1 can be any number between 1 and 20, followed by the frequency of that number in a given document... the lines in the file depend on the analysed document, e.g. file1: 1,5 4,1. Then I have file two, which is basically the same numbers but with... (2 Replies)
Discussion started by: A-V

7. Shell Programming and Scripting

Sed: printing lines AFTER pattern matching EXCLUDING the line containing the pattern

Hi, I'm using the following code to extract the lines (and redirect them to a txt file) after the pattern match. But the output includes the line with the pattern match. Which option is to be used to exclude the line containing the pattern? sed -n '/Conn.*User/,$p' > consumers.txt (11 Replies)
Discussion started by: essem

8. Shell Programming and Scripting

sed - filter blocks between single delimiters matching a pattern

Hi! I have a file with the following format:CDR ... MSISDN=111 ... CDR ... MSISDN=xxx ... CDR ... MSISDN=xxx ... CDR ... MSISDN=111 (2 Replies)
Discussion started by: Flavius

9. UNIX for Dummies Questions & Answers

Grep -v lines starting with pattern 1 and not matching pattern 2

Hi all! Thanks for taking the time to view this! I want to grep out all lines of a file that starts with pattern 1 but also does not match with the second pattern. Example: Drink a soda Eat a banana Eat multiple bananas Drink an apple juice Eat an apple Eat multiple apples I... (8 Replies)
Discussion started by: demmel

10. Shell Programming and Scripting

Group Multiple Lines on SINGLE line matching pattern

Hi guys, I am trying to format my CSV file. When I spool the file using sqlplus, the single-row output is wrapped onto three lines. Somehow I managed to format that file, and finally I am trying to join the multiple lines into a single line. The below command is working fine but I need to pass the... (3 Replies)
Discussion started by: RJSKR28
switch(n)						       Tcl Built-In Commands							 switch(n)


NAME
       switch - Evaluate one of several scripts, depending on a given value

SYNOPSIS
       switch ?options? string pattern body ?pattern body ...?

       switch ?options? string {pattern body ?pattern body ...?}

DESCRIPTION
The switch command matches its string argument against each of the pattern arguments in order. As soon as it finds a pattern that matches string, it evaluates the following body argument by passing it recursively to the Tcl interpreter and returns the result of that evaluation. If the last pattern argument is default, then it matches anything. If no pattern argument matches string and no default is given, then the switch command returns an empty string.

If the initial arguments to switch start with -, then they are treated as options unless there are exactly two arguments to switch (in which case the first must be the string and the second must be the pattern/body list). The following options are currently supported:

-exact
       Use exact matching when comparing string to a pattern. This is the default.

-glob
       When matching string to the patterns, use glob-style matching (i.e. the same as implemented by the string match command).

-regexp
       When matching string to the patterns, use regular expression matching (as described in the re_syntax reference page).

-nocase
       Causes comparisons to be handled in a case-insensitive manner.

-matchvar varName
       This option (only legal when -regexp is also specified) specifies the name of a variable into which the list of matches found by the regular expression engine will be written. The first element of the list written will be the overall substring of the input string (i.e. the string argument to switch) matched, the second element of the list will be the substring matched by the first capturing parenthesis in the regular expression that matched, and so on. When a default branch is taken, the variable will have the empty list written to it. This option may be specified at the same time as the -indexvar option.

-indexvar varName
       This option (only legal when -regexp is also specified) specifies the name of a variable into which the list of indices referring to matching substrings found by the regular expression engine will be written. The first element of the list written will be a two-element list specifying the index of the start and index of the first character after the end of the overall substring of the input string (i.e. the string argument to switch) matched, in a similar way to what the -indices option of the regexp command can obtain. Similarly, the second element of the list refers to the first capturing parenthesis in the regular expression that matched, and so on. When a default branch is taken, the variable will have the empty list written to it. This option may be specified at the same time as the -matchvar option.

--
       Marks the end of options. The argument following this one will be treated as string even if it starts with a -. This is not required when the matching patterns and bodies are grouped together in a single argument.

Two syntaxes are provided for the pattern and body arguments. The first uses a separate argument for each of the patterns and commands; this form is convenient if substitutions are desired on some of the patterns or commands. The second form places all of the patterns and commands together into a single argument; the argument must have proper list structure, with the elements of the list being the patterns and commands. The second form makes it easy to construct multi-line switch commands, since the braces around the whole list make it unnecessary to include a backslash at the end of each line. Since the pattern arguments are in braces in the second form, no command or variable substitutions are performed on them; this makes the behavior of the second form different from the first form in some cases.

If a body is specified as "-", it means that the body for the next pattern should also be used as the body for this pattern (if the next pattern also has a body of "-", then the body after that is used, and so on). This feature makes it possible to share a single body among several patterns.

Beware of how you place comments in switch commands. Comments should only be placed inside the execution body of one of the patterns, and not intermingled with the patterns.

EXAMPLES
The switch command can match against variables and not just literals, as shown here (the result is 2):

       set foo "abc"
       switch abc a - b {expr {1}} $foo {expr {2}} default {expr {3}}

Using glob matching and the fall-through body is an alternative to writing regular expressions with alternations, as can be seen here (this returns 1):

       switch -glob aaab {
          a*b     -
          b       {expr {1}}
          a*      {expr {2}}
          default {expr {3}}
       }

Whenever nothing matches, the default clause (which must be last) is taken. This example has a result of 3:

       switch xyz {
          a  -
          b  {
             # Correct Comment Placement
             expr {1}
          }
          c  {
             expr {2}
          }
          default {
             expr {3}
          }
       }

When matching against regular expressions, information about what exactly matched is easily obtained using the -matchvar option:

       switch -regexp -matchvar foo -- $bar {
          a(b*)c {
             puts "Found [string length [lindex $foo 1]] 'b's"
          }
          d(e*)f(g*)h {
             puts "Found [string length [lindex $foo 1]] 'e's and
                   [string length [lindex $foo 2]] 'g's"
          }
       }

SEE ALSO
       for(n), if(n), regexp(n)

KEYWORDS
       switch, match, regular expression

Tcl 8.5                                                        switch(n)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.