Making a faster alternative to a slow awk command Post: 302666577

Post 302666577 by alister on Wednesday 4th of July 2012 07:16:08 PM

Quote:

Originally Posted by Scrutinizer

Another factor that might prove important is which awk or which grep is used.

Absolutely. GNU tools in particular tend to be slower than their counterparts.

Quote:

Originally Posted by Scrutinizer

@alister, results of tests 1, 3 and the bash loop may be flawed because the regex or pattern match does not match the lines of the OP's input spec.

Whoops. My test data was delimited by a single space, so the output of the commands would be correct, but the time was slightly underestimated due to the simpler regular expression.

Using ed, I replaced the single space in each line with a <space><tab><space> sequence. I re-ran the tests, replacing the <space> in the regular expression with [<space><tab>]+, and the time for each test increased by 1 to 3 seconds with the rankings unchanged.
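For anyone wanting to reproduce that change, here is a minimal sketch of the idea. The sample lines and file names are hypothetical (the real test data is not shown in this thread), and sed is swapped in for ed purely to keep the example short. It widens the delimiter to <space><tab><space> and then checks that the widened pattern still selects the same lines.

```shell
#!/bin/sh
# Hypothetical sample data: a key and a value separated by one space.
tab=$(printf '\t')
sample=$(mktemp)
converted=$(mktemp)
printf '%s\n' '83 alpha' '84 beta' '83 gamma' > "$sample"

# Replace the single space on each line with <space><tab><space>.
# (The post did this with ed; sed is used here as a stand-in.)
sed "s/ / ${tab} /" "$sample" > "$converted"

# Original pattern: exactly one literal space after the key.
before=$(grep -c -E '^83 ' "$sample")

# Widened pattern: one or more spaces or tabs after the key.
after=$(grep -c -E "^83[ ${tab}]+" "$converted")

echo "single-space pattern matched: $before"
echo "widened pattern matched:      $after"

rm -f "$sample" "$converted"
```

Both counts should agree; if they do not, the widened pattern is not equivalent on the converted data.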

Interesting observation: character classes really slowed down GNU grep.

egrep '^83[[:blank:]]+... takes twice as long as egrep '^83[ <tab>]+..., 30s versus 15s. With perl, the difference was approximately 0.6s.
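A rough way to set up that comparison is sketched below. The generated data is hypothetical, and because the rest of the pattern is elided above, the sketch stops at the part that is shown. It only confirms that the two spellings select the same lines, so that any timing difference is down to how the expression is written and not to what it matches.

```shell
#!/bin/sh
tab=$(printf '\t')
data=$(mktemp)

# Hypothetical input: each key is followed by <space><tab><space>.
i=0
while [ "$i" -lt 1000 ]; do
    printf '83 \t payload %d\n' "$i"
    printf '84 \t payload %d\n' "$i"
    i=$((i + 1))
done > "$data"

# Spelling 1: POSIX character class.
count_class=$(grep -c -E '^83[[:blank:]]+' "$data")

# Spelling 2: bracket expression holding a literal space and a literal tab.
count_literal=$(grep -c -E "^83[ ${tab}]+" "$data")

echo "[[:blank:]] matched:   $count_class"
echo "[ <tab>] matched:      $count_literal"

rm -f "$data"
```

To compare speed, run each grep on a much larger file under your shell's time facility, several times each. The numbers will depend on the grep implementation and version, so treat the 30s versus 15s above as specific to my setup.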

As for the bash trinket, I won't bother fixing that. I'm not _that_ bored.

Thanks for living up to your nick.

Regards,
Alister

LEARN ABOUT ULTRIX

re_exec

regex(3)						     Library Functions Manual							  regex(3)

Name
       re_comp, re_exec - regular expression handler

Syntax
       char *re_comp(s)
       char *s;

       re_exec(s)
       char *s;

Description
       The re_comp subroutine compiles a string into an internal form suitable for pattern matching.  The re_exec subroutine checks the
       argument string against the last string passed to re_comp.

       The re_comp subroutine returns 0 if the string s was compiled successfully; otherwise a string containing an error message is returned.
       If re_comp is passed 0 or a null string, it returns without changing the currently compiled regular expression.

       The re_exec subroutine returns 1 if the string s matches the last compiled regular expression, 0 if the string s failed to match the last
       compiled regular expression, and -1 if the compiled regular expression was invalid (indicating an internal error).

       The strings passed to both re_comp and re_exec may have trailing or embedded newline characters; they are terminated by nulls.  The
       regular expressions recognized are described in the manual entry for ed(1), given the above difference.

Diagnostics
       The re_exec subroutine returns -1 for an internal error.

       The re_comp subroutine returns one of the following strings if an error occurs:

       No previous regular expression
       Regular expression too long
       unmatched (
       missing ]
       too many () pairs
       unmatched )

See Also
       ed(1), ex(1), egrep(1), fgrep(1), grep(1)

																	  regex(3)