Which is faster, AWK or CUT?

Post 302229008 by otheus on Tuesday 26th of August 2008 03:42:53 AM

Hrm, cut might be slower in some situations...

Code:

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |wc -l
4806462

# Run cut twice
[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.36user 1.91system 0:20.07elapsed 61%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time cut -d" " -f3 >/dev/null
10.41user 1.81system 0:19.29elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+157minor)pagefaults 0swaps

# average cut user time: 10.39s

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.58user 2.11system 0:18.16elapsed 42%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (2major+235minor)pagefaults 0swaps

[nfs5:otheus] ~tmp/ $ zcat access_log-front9-20080825* |time awk '{ print $3 }' >/dev/null
5.48user 2.21system 0:17.15elapsed 44%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+227minor)pagefaults 0swaps

# average awk user time: 5.53s

But why? The results were similar regardless of whether the first or third field was printed, and regardless of whether another delimiter was chosen, although awk did slow down with longer fields: it took 50% longer when '-' was used as the delimiter, which makes the fields longer.

It could be that GNU coreutils' cut is not well optimized. (GNU awk was used here.)
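For anyone who wants to repeat the field and delimiter variations mentioned above, the commands change roughly as follows. This is only a sketch on a made-up log line (the `line` value is an assumption, not the data from the test). Note that awk's default splitting treats a run of blanks as one separator, while cut -d" " treats every single space as a delimiter, so the two agree only when fields are separated by exactly one space.

```shell
# Made-up access-log line (an assumption, not the author's data).
line='10.0.0.1 - frank [25/Aug/2008:03:42:53 +0200] "GET / HTTP/1.1" 200 512'

# Field 3 with a single space as the delimiter.
cut_f3=$(printf '%s\n' "$line" | cut -d" " -f3)
awk_f3=$(printf '%s\n' "$line" | awk '{ print $3 }')

# Field 1 with '-' as the delimiter: fewer, longer fields per line.
cut_dash=$(printf '%s\n' "$line" | cut -d- -f1)
awk_dash=$(printf '%s\n' "$line" | awk -F- '{ print $1 }')

# Brackets make leading/trailing blanks in a field visible.
printf 'cut: [%s] [%s]\n' "$cut_f3" "$cut_dash"
printf 'awk: [%s] [%s]\n' "$awk_f3" "$awk_dash"
```

Both tools should print the same pair of fields for this line; only the time they take differs.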

So when is cut faster? Perhaps it's the parsing routines that make awk slower in some cases. To test this, I took 10 lines of my HTTP access file and timed two runs each of processing this same file 8000 times inside a bash while loop. One run used field 1, the second run used field 3. Three things were timed:

cat to /dev/null
cut to /dev/null
awk to /dev/null

For cut and awk, the cat was part of the pipeline. Thus we should be able to subtract the first time from the other two. Here's what I got:

cat: 16.1s (real) 1.9s (user)
cut: 29.3s (real) 6.5s (user)
awk: 28.9s (real) 8.0s (user)
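The loop itself isn't shown here, so the following is only a guess at how such a test could be set up, not the script that produced the numbers above. The sample lines, the `loop` and `baseline` helpers, and the default run count of 200 are all placeholders; pass 8000 as the first argument to match the run count described.

```shell
#!/usr/bin/env bash
# Sketch of the small-file test. The sample lines and the default run count
# are placeholders; the post used 10 real access-log lines and 8000 iterations.
set -eu

runs=${1:-200}            # pass 8000 to match the post
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT

# Ten space-delimited lines standing in for the access-log excerpt.
i=1
while [ "$i" -le 10 ]; do
    printf 'host%d - user%d GET /page%d 200\n' "$i" "$i" "$i"
    i=$((i + 1))
done > "$sample"

# Run "cat sample | <command>" $runs times inside a while loop.
loop() {
    local n=0
    while [ "$n" -lt "$runs" ]; do
        cat "$sample" | "$@" > /dev/null
        n=$((n + 1))
    done
}

# Baseline: cat alone, with no downstream command.
baseline() {
    local n=0
    while [ "$n" -lt "$runs" ]; do
        cat "$sample" > /dev/null
        n=$((n + 1))
    done
}

# bash's `time` keyword reports real/user/sys for each whole loop on stderr.
echo "cat:"; time baseline
echo "cut:"; time loop cut -d" " -f3
echo "awk:"; time loop awk '{ print $3 }'
```

With so little data per run, most of the time goes to starting processes, which is why the cat baseline is worth measuring separately.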

The idea was to see if cut was better on smaller files. It is relatively better, but even for short files, GNU awk finished slightly sooner than GNU cut in elapsed time (28.9s vs. 29.3s)! However, cut used less user CPU time (6.5s vs. 8.0s), if that's any concern to anyone for accounting reasons.
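Carrying out the subtraction suggested above, with the cat run as the baseline, gives a rough net cost for each tool. A small sketch that uses only the figures from the results; the `net_times` variable name is just for illustration:

```shell
# Columns: name, real seconds, user seconds (figures from the results above).
net_times=$(awk '
    NR == 1 { base_real = $2; base_user = $3; next }   # first row is the cat baseline
    { printf "%s: %.1fs real, %.1fs user\n", $1, $2 - base_real, $3 - base_user }
' <<'EOF'
cat 16.1 1.9
cut 29.3 6.5
awk 28.9 8.0
EOF
)
printf '%s\n' "$net_times"
# prints:
#   cut: 13.2s real, 4.6s user
#   awk: 12.8s real, 6.1s user
```

On these numbers awk finishes about 0.4s sooner in elapsed time, while cut uses about 1.5s less user CPU.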

To sum up, cut isn't as sharp as its awkward cousin.

otheus
