Which is faster, AWK or CUT?


 
# 8  
Old 08-25-2008
Quote:
Originally Posted by otheus
I really was hoping you were going to say "there are situations where cut just won't cut it." :)
Won't let it slip next time :)
# 9  
Old 08-25-2008
Quote:
Originally Posted by redoubtable
testfile has various repetitions of "AAA:BBB\n".
Like other posters said, if you can use cut for your problem you should choose it instead of awk, but there are situations where cut just isn't enough.
Code:
 # du -h file1
207M    file1
# time cut -d":" -f1 file1 > /dev/null

real    0m46.075s
user    0m43.075s
sys     0m0.396s
# time awk -F":" '{print $1}' file1  > /dev/null

real    0m41.344s
user    0m38.422s
sys     0m0.324s
# time cut -d":" -f1 file1 > /dev/null

real    0m45.266s
user    0m43.055s
sys     0m0.328s
# time awk -F":" '{print $1}' file1  > /dev/null

real    0m41.220s
user    0m38.358s
sys     0m0.452s

(g)awk is faster on my machine (gawk version 3.1.5).
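
One concrete case where cut just isn't enough (a minimal illustration of my own, not part of the benchmark above): cut treats every single occurrence of its delimiter as a separator, so it can't collapse runs of whitespace the way awk's default FS does.
Code:
# "a   b" contains a run of three spaces: cut sees four fields
# (a, "", "", b), while awk's default FS treats the run as one separator.
$ printf 'a   b\n' | cut -d' ' -f2
                          # <- prints an empty field
$ printf 'a   b\n' | awk '{print $2}'
b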
# 10  
Old 08-26-2008
:)
# 11  
Old 08-26-2008

Hrm, cut might be slower in some situations...

Code:
[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |wc -l
4806462

# Run cut twice
[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.36user 1.91system 0:20.07elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.41user 1.81system 0:19.29elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

# average cut time: 10.39s

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.58user 2.11system 0:18.16elapsed 42%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+235minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.48user 2.21system 0:17.15elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+227minor)pagefaults 0swaps

# average awk time: 5.53s

But why?? The results were similar regardless of whether the first or third field was printed and regardless of which delimiter was chosen, although awk did slow down with larger fields: about 50% longer when '-' was used as the delimiter, which made the fields longer.
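
For anyone who wants to repeat the delimiter test, here is a sketch of the harness (the exact commands weren't posted; this assumes the same compressed logs as above):
Code:
# Sketch only -- the original harness wasn't posted.
# Tries both delimiters and both fields with each tool; 'time' here
# resolves to /usr/bin/time because it sits mid-pipeline, as above.
for d in ' ' '-'; do
    for f in 1 3; do
        zcat access_log-front9-20080825* | time cut -d"$d" -f"$f" > /dev/null
        zcat access_log-front9-20080825* | time awk -F"$d" -v f="$f" '{print $f}' > /dev/null
    done
done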

It could be that GNU coreutils' cut is not very optimized. (GNU awk was used here.)

So when does cut come out ahead? Perhaps it's the parsing routines that make awk slower on small inputs. To test this, I took 10 lines of my HTTP access file and timed two runs each (one extracting field 1, one field 3) of processing that same file 8000 times inside a bash while loop (a sketch of the loop follows the results below):
  • cat to /dev/null
  • cut to /dev/null
  • awk to /dev/null

For cut and awk, the cat was part of the pipeline. Thus we should be able to subtract the first time from the other two. Here's what I got:
  • cat: 16.1s (real) 1.9s (user)
  • cut: 29.3s (real) 6.5s (user)
  • awk: 28.9s (real) 8.0s (user)

The idea was to see if cut was better on smaller files. It does relatively better, but even for short files GNU awk finishes in less wall-clock time than GNU cut! Cut does use fewer user-mode clockticks, though, if that's any concern to anyone for accounting reasons.
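
The loop itself wasn't posted; a minimal reconstruction under those assumptions (a 10-line sample, 8000 iterations, cat feeding each pipeline) might look like this:
Code:
# Hypothetical reconstruction -- the original script wasn't posted.
# "tenlines" stands in for the 10-line sample of the access log.
head -10 access_log > tenlines

i=0
time while [ $i -lt 8000 ]; do
    cat tenlines | cut -d' ' -f3 > /dev/null   # awk run: awk '{print $3}'
    i=$((i+1))                                 # cat baseline: drop the filter
done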

To sum up, cut isn't as sharp as its awkward cousin.