How to make awk command faster?


 
# 1  
Old 09-05-2017
How to make awk command faster?

I have the command below, which reads a large file and takes about three hours to run. Can anything be done to make it faster?
Code:
awk -F ',' '{OFS=","}{ if ($13 == "9999") print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out|sort -T ${NLAP_TEMP} |uniq> ${NLAP_TEMP}/hist2.final

The hist1.out file looks like this:
Code:
rp01_2017,1002302_43,1,103,0074,0,0,0,0,0,0,18,9994
rp01_2017,1002302_43,1,103,0077,0,0,0,0,0,0,18,9999
rp01_2018,1002302_43,1,103,0074,0,0,0,0,0,0,9,9994
rp01_2018,1002302_43,1,103,0077,0,0,0,0,0,0,9,9999
rp10_2017,1002302_43,1,103,0074,0,0,0,0,0,0,16,9994
rp10_2017,1002302_43,1,103,0077,0,0,0,0,0,0,16,9999
rp10_2018,1002302_43,1,103,0074,0,0,0,0,0,0,4,9994
rp10_2018,1002302_43,1,103,0077,0,0,0,0,0,0,4,9999
rp18_2017,1002302_43,1,103,0074,0,0,0,0,0,0,14,9994
rp18_2017,1002302_43,1,103,0077,0,0,0,0,0,0,14,9999


# 2  
Old 09-05-2017
Would using sed be any quicker?
Code:
sed -n '/,9999$/ s///p' ${NLAP_TEMP}/hist1.out|sort -u -T ${NLAP_TEMP}> ${NLAP_TEMP}/hist2.final

Basically, when a line ends in ,9999, the empty regex in s///p reuses the address pattern, so that field is deleted and the modified line is printed; -n suppresses everything else. Also, using sort -u rather than sort | uniq removes one process from the pipeline.
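For example, on one of the sample lines (a quick sketch of the effect):
Code:
$ echo 'rp01_2017,1002302_43,1,103,0077,0,0,0,0,0,0,18,9999' | sed -n '/,9999$/ s///p'
rp01_2017,1002302_43,1,103,0077,0,0,0,0,0,0,18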

Andrew
# 3  
Old 09-05-2017
A BEGIN section is executed only once (at the beginning), so FS and OFS are set a single time instead of reassigning OFS on every input line. And maybe the sort can be skipped altogether by eliminating duplicates in awk? The array s holds one entry per unique line, trading memory for the external sort:
Code:
awk 'BEGIN { FS=OFS="," } ($13 == "9999" && !($0 in s)) { s[$0]; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out > ${NLAP_TEMP}/hist2.final

The following variant works like the previous sed solution, stripping the ,9999 field with sub() and deduplicating in the same pass:
Code:
awk '(sub(/,9999$/,"") && !($0 in s)) { s[$0]; print }' ${NLAP_TEMP}/hist1.out > ${NLAP_TEMP}/hist2.final
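Run against the sample excerpt posted above, either variant prints the following (in input order; nothing is sorted):
Code:
rp01_2017,1002302_43,1,103,0077,0,0,0,0,0,0,18
rp01_2018,1002302_43,1,103,0077,0,0,0,0,0,0,9
rp10_2017,1002302_43,1,103,0077,0,0,0,0,0,0,16
rp10_2018,1002302_43,1,103,0077,0,0,0,0,0,0,4
rp18_2017,1002302_43,1,103,0077,0,0,0,0,0,0,14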

# 4  
Old 09-06-2017
Hoping that sort's implementation is efficient: sorting in reverse on field 13 groups the 9999 rows first, so awk prints each row as the field is stripped and exits at the first row that no longer matches:
Code:
sort -t, -k13r file | awk -F, 'sub (/,9999$/, _) {print; next} {exit}'
rp01_2017,1002302_43,1,103,0077,0,0,0,0,0,0,18
rp01_2018,1002302_43,1,103,0077,0,0,0,0,0,0,9
rp10_2017,1002302_43,1,103,0077,0,0,0,0,0,0,16
rp10_2018,1002302_43,1,103,0077,0,0,0,0,0,0,4
rp18_2017,1002302_43,1,103,0077,0,0,0,0,0,0,14

# 5  
Old 09-07-2017
Thank you all for your response


The command below has really helped in reducing the time. Would this also sort the rows, or do we need to use sort after it?


Code:
awk 'BEGIN { FS=OFS="," } ($13 == "9999" && !($0 in s)) { s[$0]; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out > ${NLAP_TEMP}/hist2.final


# 6  
Old 09-07-2017
That script does not sort the rows; it was not asked to.

It would be interesting to see a comparison between the different approaches. Could you time each and post the results?
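A minimal timing sketch (using the paths from the original post, with output discarded so only the processing cost is measured):
Code:
time awk 'BEGIN { FS=OFS="," } ($13 == "9999" && !($0 in s)) { s[$0]; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out > /dev/null
time sed -n '/,9999$/ s///p' ${NLAP_TEMP}/hist1.out | sort -u -T ${NLAP_TEMP} > /dev/null
time sort -t, -k13r ${NLAP_TEMP}/hist1.out | awk -F, 'sub(/,9999$/, _) {print; next} {exit}' > /dev/null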
# 7  
Old 09-07-2017
Quote:
Originally Posted by Peu Mukherjee
Thank you all for your response

The command below has really helped in reducing the time. Would this also sort the rows, or do we need to use sort after it?
It reduced the time by not sorting at all. The sort is liable to be what took the lion's share of the time.

If you need it sorted, and need it sorted faster, point sort at a different disk for temporary space with -T /path/to/folder. With temp files on a separate disk, reading the input and writing sort's intermediate runs no longer compete for the same device.

GNU sort also has a --parallel option, but this is not much help unless you have extraordinarily fast disks.
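For example (a sketch; /mnt/otherdisk/tmp stands in for a directory on a hypothetical second disk, and --parallel requires GNU sort):
Code:
awk 'BEGIN { FS=OFS="," } ($13 == "9999") { print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12 }' ${NLAP_TEMP}/hist1.out | sort -u -T /mnt/otherdisk/tmp --parallel=4 > ${NLAP_TEMP}/hist2.final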