Making a faster alternative to a slow awk command
Post 302666489 by alister, Wednesday, 4 July 2012, 01:23 PM
Quote:
Originally Posted by Corona688
I don't think using an awk regex instead of a simple = is going to make it faster, either...
That's a reasonable assumption, but it turns out to be incorrect (at least with the implementation I tested).

In my testing, the following code is over three times faster than the original solution, presumably because a pattern-only program lets the implementation skip field splitting and the numeric comparisons entirely:
Code:
awk '/^83  *(1[0-9][0-9][0-9]|2000)$/' data
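As a sanity check (not part of the original timing runs), the two commands can be compared directly; on this data set they should select exactly the same lines. This assumes a shell with process substitution, such as bash:
Code:
# Verify the regex version and the comparison version agree on this
# data. diff prints nothing and the message appears if they match.
diff <(awk '/^83  *(1[0-9][0-9][0-9]|2000)$/' data) \
     <(awk '$1==83 && $2>=1000 && $2<=2000' data) \
    && echo 'outputs are identical'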

I'm curious to know whether this solution is also faster on other implementations (nawk and gawk, specifically), but I won't be able to test on them today.
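If anyone wants to try, a minimal harness along these lines should work (an untested sketch; it assumes mawk, nawk, and gawk are all installed and uses the same data file):
Code:
# Time the same pattern-only program under several awk
# implementations; time(1) output goes to stderr.
for impl in mawk nawk gawk; do
    echo "== $impl =="
    time "$impl" '/^83  *(1[0-9][0-9][0-9]|2000)$/' data > /dev/null
done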

I used an obsolete Linux system for all of my testing.

Hardware: Pentium II @ 350 MHz (can you feel the power?)
Software: awk is mawk 1.3.3, perl 5.8.8, GNU (e)grep 2.5.1, GNU sed 4.1.5, GNU coreutils 5.97 (cat, wc)
Data: 14 megabytes. 6 line repeating pattern. 1,783,782 lines. 297,297 matches.
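The data file itself isn't included in the post. A hypothetical generator for a file of the same shape (a six-line repeating pattern with one match per group, 1,783,782 lines, roughly 14 megabytes) could look like this; the actual line contents are an assumption:
Code:
# Hypothetical test data: 297,297 groups of six lines, exactly one
# of which matches "83" followed by a number in 1000-2000.
awk 'BEGIN {
    for (i = 0; i < 297297; i++) {
        print "83 1500"    # the one matching line per group
        print "83 2001"
        print "84 1500"
        print "10 20"
        print "99 999"
        print "83 500"
    }
}' > data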

Slowest to fastest:
Code:
$ time egrep '^83 (1...|2000)$' data > /dev/null

real    0m15.170s
user    0m15.089s
sys     0m0.080s

$ time awk '$1==83 && $2>=1000 && $2<=2000' data > /dev/null

real    0m11.325s
user    0m11.213s
sys     0m0.112s

$ time perl -ne 'print if /^83 (1[0-9][0-9][0-9]|2000)$/' data > /dev/null

real    0m9.728s
user    0m9.629s
sys     0m0.100s

$ time sed d data

real    0m8.357s
user    0m8.277s
sys     0m0.080s

$ time awk '/^83  *[12][0-9][0-9][0-9]$/ {if ($2>=1000 && $2<=2000) print}' data > /dev/null

real    0m6.809s
user    0m6.692s
sys     0m0.116s

$ time awk '/^83  *(1[0-9][0-9][0-9]|2000)$/' data > /dev/null

real    0m3.555s
user    0m3.404s
sys     0m0.152s

$ time awk 0 data

real    0m1.898s
user    0m1.832s
sys     0m0.068s

$ time wc -l data > /dev/null

real    0m0.721s
user    0m0.316s
sys     0m0.128s

$ time cat data > /dev/null

real    0m0.084s
user    0m0.012s
sys     0m0.072s

Most surprising to me is how long it takes GNU sed to do nothing.


For everyone's amusement (GNU bash 3.1.17):
Code:
$ cat match.sh
while read -r line; do
    case $line in
        83\ 1???|83\ 2000) echo "$line";;
    esac
done

$ time sh match.sh < data > /dev/null

real    6m53.128s
user    6m28.776s
sys     0m24.150s

Regards,
Alister

---------- Post updated at 01:23 PM ---------- Previous update was at 01:17 PM ----------

Quote:
Originally Posted by Klashxx
If the first value is fixed try:
Code:
awk '/^83 *[12][0-9][0-9][0-9]/{if($2>=1000 && $2<=2000){print}}' infile

That regular expression makes the space optional, which is probably not a good idea. The way it's written, the line 832999 2000 would match the regex, pass the $2 test, and be printed.
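Requiring at least one space and anchoring the end of the line avoids that; this is the version timed above, with the $2 test trimming 2001-2999:
Code:
awk '/^83  *[12][0-9][0-9][0-9]$/ {if ($2>=1000 && $2<=2000) print}' infile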


Quote:
Originally Posted by jayan_jay
Code:
$ egrep "83 (1...$|2000)" infile
83 1453
$

That needs an anchor at the beginning, ^, if the first column can hold numbers longer than two digits (183 1453 would match, for example). Also, the $ anchor should be moved so that it's just after the parenthesized group; as written it applies only to the 1... alternative, so a line like 83 20001 would also match.
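With both fixes applied, the command becomes the anchored version benchmarked above:
Code:
egrep '^83 (1...|2000)$' infile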

Regards,
Alister
