Faster way to use this awk command

05-25-2012


As you wish to go from a line that contains a date string to the end of the file, it's probably safe to assume that the lines in your file are in date order.

Knowing this, it should be possible to write a program that uses a binary chop (binary search) to seek to the starting line and then processes the file from there. If the file is large, finding the start this way will be orders of magnitude faster than a sequential search.

This perl example seems to be pretty close to what I mean. The downside is that seeking into files in this manner is pretty low level, and I can't really think of any elegant solution using unix scripting, so it will most likely require a proper programming language like perl, python or C. You also have the added complexity of needing to compare date strings instead of straight text.
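To make the idea concrete, here is a rough Python sketch of the binary chop. It is only an illustration under some loud assumptions: every line starts with a date laid out like "May 23, 2012" (the pattern used in the awk examples in this thread), the lines are in date order, and the names `find_start`, `line_date` and `DATE_FMT` are made up for this example.

```python
import os
from datetime import datetime

DATE_FMT = "%b %d, %Y"  # ASSUMPTION: lines start like "May 23, 2012 ..."


def line_date(line: bytes) -> datetime:
    """Parse the date from the first three whitespace-separated fields."""
    fields = line.decode("ascii", "replace").split()[:3]
    return datetime.strptime(" ".join(fields), DATE_FMT)


def find_start(path: str, target: datetime) -> int:
    """Return the byte offset of the first line dated on or after target.

    Assumes the file is sorted by date. Returns the file size if every
    line is older than target.
    """
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Invariant: lines starting before lo are older than target;
        # lines starting at or after hi are on or after target.
        while lo < hi:
            mid = (lo + hi) // 2
            if mid > 0:
                # Step back one byte and finish that line, so we land on
                # the first line that starts at or after mid.
                f.seek(mid - 1)
                f.readline()
            else:
                f.seek(0)
            pos = f.tell()
            if pos >= hi:
                hi = mid  # no line starts between mid and hi
                continue
            line = f.readline()
            if line_date(line) < target:
                lo = pos + len(line)  # start of the following line
            else:
                hi = pos
        return lo
```

To use it, open the file in binary mode, `seek()` to `find_start(path, datetime(2012, 5, 23))` and read from there to the end. Each probe reads at most a couple of lines, so the number of reads grows with the logarithm of the file size rather than with the file size itself.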

---------- Post updated at 01:47 PM ---------- Previous update was at 01:24 PM ----------

Another thought: if the file only ever has new data appended to the end, keep a second text file that lists each date and the line number it starts on:

Code:

Jan-01-2001 1
Jan-02-2001 7311
Jan-03-2001 15779
...
May-25-2012 574983989

You can then read the file in and start processing from the line number you require:

Code:

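# Look up the starting line number for the wanted date in the index
LINE=$(awk '$1 == "May-16-2012" {print $2; exit}' indexfile.txt)
# Print from that line to the end of the file
sed -n "${LINE},\$p" bigfile.log | # <your processing here>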
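For completeness, one possible way to create the index file in the first place is sketched below in Python. The log layout is an assumption (each line starting with a date such as "May 23, 2012", which is turned into a key like "May-23-2012" to match the index sample above), and `build_index` and `date_key` are hypothetical names chosen for this example.

```python
def date_key(fields):
    """Turn ['May', '23,', '2012', ...] into 'May-23-2012'."""
    month, day, year = fields[:3]
    return "%s-%s-%s" % (month, day.rstrip(","), year)


def build_index(log_path, index_path):
    """Write one 'date line_number' row for the first line of each date.

    ASSUMPTION: every log line starts with a date like 'May 23, 2012'
    and the lines are already in date order.
    """
    last_key = None
    with open(log_path, encoding="utf-8", errors="replace") as log, \
            open(index_path, "w") as index:
        for line_no, line in enumerate(log, start=1):
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            key = date_key(fields)
            if key != last_key:
                index.write("%s %d\n" % (key, line_no))
                last_key = key
```

This version rescans the whole log each time it runs. Since the log is only appended to, a production version would more likely remember where it stopped and add rows for new dates only.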

Last edited by Chubler_XL; 05-25-2012 at 12:52 AM..


05-25-2012


Another alternative:

Code:

awk '!p{if(/May 23, 2012 /)p=1}p' infile

Further to Corona688's approach, this should work cross-platform:

Code:

{ sed '/May 23, 2012 /!d;q' ; cat ;} < infile

On Solaris you would probably need to use /usr/xpg4/bin/sed, so set your PATH variable accordingly in your script.

Yet another way to speed things up might be to use mawk, which is a faster awk in most cases.

Last edited by Scrutinizer; 05-25-2012 at 07:08 AM..

