Thanks, everybody, for the interesting comments. They helped me try different options and work around the problem.
The values that I use for filtering are not fixed, so I had to be cautious about relying on regular expressions. Also, I have a number of other informative columns that I did not include in the original post, for simplicity.
Instead, as Corona688 suggested, I have realized that my main problem is the huge amount of data.
Quote:
Originally Posted by Corona688:
The problem, really, is that you have a huge amount of data, not a slow program. How big are your records, really?
Therefore, I have made two changes to speed things up (not perfect but to an acceptable level):
(1) I sorted the files according to the relevant columns and then used a modified awk line that does not have to scan the full files but can exit as soon as it passes the relevant range (see the sketch after these two points).
(2) I split the original (sorted) file (>50,000,000 lines) into smaller fragments according to the ranges of numbers in the columns that are a priori known to be relevant for the current filtering requirements.
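A minimal sketch of both steps, assuming the file is sorted numerically on column 2, that the filter is the 83 / 1000-2999 range discussed earlier, and that GNU split is available (the column numbers, bounds, and file names are all assumptions):

# (1) Early exit: because the file is sorted on column 2, awk can stop
#     reading as soon as it passes the upper end of the range.
awk '$1 == 83 && $2 >= 1000 && $2 <= 2999 { print }
     $2 > 2999 { exit }' sorted_file

# (2) Split the sorted file into fixed-size fragments so each filtering job
#     only has to read the fragment that covers its range.
split -l 5000000 -d sorted_file fragment_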
Use a relational database - they are specifically designed to do the type of queries you are talking about, and people spend their whole careers optimising them.
Other thoughts - If you have the files on a nice quick SAN or something, you might benefit from doing a multi-threaded lookup. Get all your 16 cores working on the problem. You may need to split the file(s) up so you can run multiple processes against their own file.
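A rough sketch of that idea, assuming the file has already been split into fragments named fragment_00, fragment_01, ... and that the same 83 / 1000-2999 filter applies (the names and bounds are assumptions):

# Run one awk per fragment in the background, then merge the results
# once every job has finished.
for f in fragment_*; do
    awk '$1 == 83 && $2 >= 1000 && $2 <= 2999' "$f" > "$f.out" &
done
wait
cat fragment_*.out > filtered.out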
It means: match any line that starts (^) with 83, followed by one or more spaces, and then a number that starts with a 1 or a 2 ([12]) followed by three digits ([0-9][0-9][0-9]).
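For example, used as a pattern on its own (the exact field layout is an assumption):

# Print only the lines that begin with 83, one or more spaces, and a
# four-digit number starting with 1 or 2.
awk '/^83 +[12][0-9][0-9][0-9]/' datafile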
I have nginx web server logs with all requests that were made and I'm filtering them by date and time.
Each line has the following structure:
127.0.0.1 - xyz.com GET 123.ts HTTP/1.1 (200) 0.000 s 3182 CoreMedia/1.0.0.15F79 (iPhone; U; CPU OS 11_4 like Mac OS X; pt_br)
These text files are... (21 Replies)
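A hedged sketch of that kind of filtering, assuming each line also carries a standard bracketed nginx timestamp such as [23/May/2018:10:15:00 +0000] (not visible in the truncated sample above):

# Keep only the requests logged between 10:00 and 10:59 on 23/May/2018.
grep '\[23/May/2018:10:' access.log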
I have the command below, which reads a large file and takes 3 hours to run. Can anything be done to make it faster?
awk -F ',' '{OFS=","}{ if ($13 == "9999") print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out|sort -T ${NLAP_TEMP} |uniq>... (13 Replies)
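One possible speed-up, sketched under the assumption that the truncated redirection writes to a file such as hist1.filtered: set OFS once in a BEGIN block, use a pattern instead of an if, and let sort deduplicate with -u so the extra uniq process is not needed.

awk -F ',' 'BEGIN { OFS = "," } $13 == "9999" { print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' \
    "${NLAP_TEMP}/hist1.out" | sort -T "${NLAP_TEMP}" -u > "${NLAP_TEMP}/hist1.filtered"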
awk "/May 23, 2012 /,0" /var/tmp/datafile
The above command pulls information out of the datafile, from the specified date to the end of the file.
Now, how can I make this faster if the datafile is huge? Even if it wasn't huge, I feel there's a better/faster way to... (8 Replies)
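One common trick, assuming the file is in chronological order and GNU grep is available: find the first matching line number once, then let tail stream the rest, so no per-line pattern test is run against the bulk of the file.

# Locate the first line for that date, then print from there to the end.
start=$(grep -n -m1 'May 23, 2012 ' /var/tmp/datafile | cut -d: -f1)
[ -n "$start" ] && tail -n +"$start" /var/tmp/datafile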
Can someone help me edit the below script to make it run faster?
Shell: bash
OS: Linux Red Hat
The point of the script is to grab entire chunks of information that concern the service "MEMORY_CHECK".
For each chunk, the beginning starts with "service {", and ends with "}".
I should... (15 Replies)
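A minimal sketch of that extraction, assuming the blocks are not nested and that config_file is the (hypothetical) input name: collect everything between "service {" and the closing "}", and print the block only if it mentions MEMORY_CHECK.

awk 'index($0, "service {") { inblock = 1; block = "" }
     inblock                { block = block $0 ORS }
     inblock && /}/         { if (index(block, "MEMORY_CHECK")) printf "%s", block
                              inblock = 0 }' config_file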
Hi,
I have a script below for extracting xml from a file.
for i in *.txt
do
    echo "$i"
    awk '/<.*/ , /.*<\/.*>/' "$i" | tr -d '\n'
    echo -ne '\n'
done
I read about using multi-threading to speed up the script.
I do not know much about it but read it on this forum.
Is it a... (21 Replies)
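One simple way to get that effect without real threads, assuming GNU xargs and that the work on each file is independent: move the loop body into a small helper script and let xargs run several copies at once. Both the name extract_xml.sh and the process count are assumptions.

#!/bin/sh
# extract_xml.sh -- hypothetical helper: the body of the original loop,
# applied to the single file named in $1.
echo "$1"
awk '/<.*/ , /.*<\/.*>/' "$1" | tr -d '\n'
echo

# Then process up to 4 files at a time:
printf '%s\0' *.txt | xargs -0 -n 1 -P 4 ./extract_xml.sh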
Hi -- I have the following SQL query in my UNIX shell script -- but the subquery in the second section is very slow. I know there must be a way to do this with a union or something which would be better. Can anyone offer an alternative to this query? Thanks.
select
count(*)
from
... (2 Replies)
I am processing some terabytes of information on a computer with 8 processors (each with 4 cores), 16 GB of RAM, and a 5 TB hard drive implemented as a RAID. The processing doesn't seem to be blazingly fast, perhaps because of an I/O limitation.
I am basically running a perl script to read some... (13 Replies)
If I just wanted to get andred08 from the following LDAP DN, would I be better off using awk or cut?
uid=andred08,ou=People,o=example,dc=com
It doesn't make a difference if it's just one LDAP search I am getting it from, but when there are a couple of hundred people in the group that returns all... (10 Replies)
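Either tool can do it; both of these print andred08 for the DN above (dns.txt is a hypothetical file holding one DN per line):

# cut: take the first comma-separated field, then the part after the "=".
cut -d, -f1 dns.txt | cut -d= -f2
# awk: split on both "=" and "," and print the second field.
awk -F'[=,]' '{ print $2 }' dns.txt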