Making a faster alternative to a slow awk command


 
# 8  
Old 07-05-2012
Thanks everybody for the interesting comments. They helped me try different options and work around the problem.

The values that I use for filtering are not fixed, so I had to be cautious about using regular expressions. Also, I have a number of other informative columns that I did not include in the original post, for simplicity.

More importantly, as Corona688 suggested, I have realized that my main problem is the huge amount of data.
Quote:
Originally Posted by Corona688:
The problem, really, is that you have a huge amount of data, not a slow program. How big are your records, really?
Therefore, I have made two changes to speed things up (not perfect but to an acceptable level):
(1) I sorted the files on the relevant columns, and then used a modified awk one-liner that does not need to scan the full files but exits as soon as it passes the relevant range:
Code:
# stop reading as soon as the sort key passes the upper limit
awk -v upper="$UPPERLIMIT" '$4 > upper { exit } { print }'

(2) I split the original (but sorted) file (>50,000,000 lines) into smaller fragments according to the ranges of numbers in the columns that are a priori known to be relevant for the current filtering requirements, as sketched below.
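For the record, here is a minimal sketch of that splitting step, assuming the sort key is in column 4, a bucket width of 1000 per fragment, and an invented input name sorted_data.txt; the real column, width, and names depend on your data:
Code:
# write each line of the sorted input into a per-range fragment file
awk '{
    out = "fragment." int($4 / 1000)   # one fragment per 1000-wide key range
    if (out != prev) {                 # sorted input: once the key moves on,
        if (prev != "") close(prev)    # the previous fragment is finished
        prev = out
    }
    print > out
}' sorted_data.txt

A later filter then only has to read the fragment(s) whose range overlaps the query instead of the full 50,000,000-line file.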
# 9  
Old 07-05-2012
Use a relational database - they are specifically designed to do the type of queries you are talking about, and people spend their whole careers optimising them.
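For example, a rough sketch with the sqlite3 command-line shell; the table layout, column names, and file names below are invented for illustration, and it assumes tab-separated input, so adapt them to the real data:
Code:
# one-off load plus an indexed range query; all names are hypothetical
sqlite3 data.db <<'EOF'
CREATE TABLE readings (c1, c2, c3, c4 INTEGER);
.separator "\t"
.import sorted_data.txt readings
CREATE INDEX idx_readings_c4 ON readings (c4);
SELECT * FROM readings WHERE c4 BETWEEN 1000 AND 2000;
EOF

Once the index exists, the range query becomes an index lookup instead of a scan of every row.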


Other thoughts - if you have the files on a nice quick SAN or something, you might benefit from doing a multi-threaded lookup. Get all your 16 cores working on the problem. You may need to split the file(s) up so you can run multiple processes against their own files.
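Here is a minimal sketch of that, reusing the per-range fragments from post #8; the filter condition and file names are placeholders:
Code:
# one awk worker per fragment, all running concurrently
for f in fragment.*; do
    awk '$2 >= 1000 && $2 <= 2000' "$f" > "out.$f" &
done
wait                               # block until every worker finishes
cat out.fragment.* > filtered.txt

If there are many more fragments than cores, something like xargs -P can cap the number of concurrent workers.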
# 10  
Old 07-12-2012
Quote:
Originally Posted by Klashxx
If the first value is fixed try:
Code:
awk '/^83 *[12][0-9][0-9][0-9]/{if($2>=1000 && $2<=2000){print}}' infile

Hi there,

Newbie here. Can you help to explain the code above, particularly this part:

Code:
awk '/^83 *[12][0-9][0-9][0-9]/

Thanks in advance!
# 11  
Old 07-12-2012
Hi, it should be
Code:
awk '/^83  *[12][0-9][0-9][0-9]/

(two spaces)

It means: match any line that starts (^) with 83, followed by one or more spaces (a literal space plus ` *`, i.e. zero or more further spaces), and then a number that starts with a 1 or a 2 ([12]) followed by three more digits ([0-9][0-9][0-9]).
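To see it in action, here is a quick test with made-up sample lines, using the full one-liner so the extra numeric test is visible too:
Code:
printf '%s\n' '83 1500 a' '83 2500 b' '84 1200 c' '83  1999 d' |
awk '/^83  *[12][0-9][0-9][0-9]/{if($2>=1000 && $2<=2000){print}}'

This prints the first and last lines only: 84 1200 fails the ^83 anchor, and 83 2500 matches the pattern (a 2 followed by three digits) but is dropped by the $2<=2000 test, which is why both checks are needed.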