How to make awk command faster?


 
# 8  
Old 09-08-2017
Sure, I will post the time for each of the commands.

Just wanted to check: would the sort below be faster if I give it the temp folder path, or should I change the path to some other folder?

Code:
sort -T ${NLAP_TEMP} -u ${NLAP_TEMP}/hist1.out > ${NLAP_TEMP}/hist2.final; VerifyExit

# 9  
Old 09-08-2017
Quote:
Originally Posted by Peu Mukherjee
Just wanted to check: would the sort below be faster if I give it the temp folder path, or should I change the path to some other folder?
See, every single disk can do only one thing at a time: reading a byte somewhere means it can't read (or write) a byte somewhere else at that time.

Temporary files are (at least) written once and (at least) read once, your input file is (at least) read once and your output file is written once. For all these tasks you want to involve different disks, so that, while one file is being read or written, another might also be read or written at the same time.

This should answer your question: ideally you want a separate disk for each of the three files involved. Perhaps the fastest disk should be assigned to the temporary files, because they are probably read and written most often.
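For example, if /data, /scratch and /work were mount points on three separate physical disks (the names here are only placeholders), the sort could be invoked like this:

Code:
sort -T /scratch -u /data/hist1.out > /work/hist2.final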

I hope this helps.

bakunin
# 10  
Old 09-08-2017
I could see that NLAP_TEMP is the fastest directory, so I added it to the sort command, but it does not seem to help. The awk command takes only 7 minutes; the issue is just with sort, which is taking a long time.
Code:
sort -T ${NLAP_TEMP} -u ${NLAP_TEMP}/aplymeas5d.dyn.out.tmp1 > ${NLAP_HOME}/backup/aplymeas5d.dyn.final1

---------- Post updated at 05:30 AM ---------- Previous update was at 03:29 AM ----------

Please let me know how I can make sort faster. The file size is 4 GB and the sorting is taking 3 hours. We have only one disk for the TEMP folder, with 50 GB of space.
# 11  
Old 09-08-2017
You received several hints in this thread on how to accelerate the sort process. What are the results of each? Did you consult man sort for additional options?
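For instance, GNU coreutils sort (if that is what is installed) can be given a larger memory buffer and, on multi-CPU machines, extra threads; the values below are only assumptions to be tuned for your system:

Code:
sort -S 2G --parallel=4 -T ${NLAP_TEMP} -u ${NLAP_TEMP}/hist1.out > ${NLAP_TEMP}/hist2.final

-S raises the in-memory buffer so fewer temporary merge files are needed; --parallel only helps if more than one CPU is available.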
# 12  
Old 09-08-2017
Did you check that the directories are on different physical disks? By that, I mean you need to check that they are separate filesystems and what those filesystems are built on, not just that the directory names are different. What you may think of as a single update to a file will cause multiple updates on the disk. There is at least:-
  • the actual disk block for the data
  • the file's inode update with the last modified time
  • the directory (for a new file or rename) and its inode
  • the filesystem superblock (usually plural) when you get a new disk block from the free list by creating or extending the file
You also have to consider contention from other processing and if this is using NFS mounted filesystems, then you have the overhead of network traffic to bring into it.

I don't know how you have your disks provisioned. Can you explain it? If it is SAN, then that might be more difficult to speed up and depends on the disk at the back-end, the fibre capacity etc. At the other extreme, a PC with a single disk is just going to have contention even if you have a large disk cache.

Overall, if you have lots of data it is just going to take a while. I doubt I will be able to better the suggestions from my fellow learned members. How big is your input file anyway (in bytes and records)? If you try to do too much processing in one chunk, then you may also exhaust memory and cause your server to page/swap. Keeping this to discrete steps may alleviate that bottleneck, but may cost more in disk I/O. It is difficult to tell.
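To see whether the disk or memory is the bottleneck while the sort is running, you could watch the system with something like the commands below (flags and column names vary between Linux and Solaris, so treat this as a sketch):

Code:
iostat -x 5    # per-disk utilisation and service times
vmstat 5       # look for paging/swapping activity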

If you don't mind the 13th field still being there (given that they will all be 9999) you might be able to save a little by stripping the pipeline right back and doing this:-
Code:
grep -E ",9999$" hist1.out | sort -uT ${NLAP_TEMP} > hist2.final

Using the -u flag on sort saves running a separate uniq process, and therefore the memory it would use (with its risk of paging/swapping) and the cost of passing the data between the two, so that might help.
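To compare the suggestions objectively (as asked earlier in the thread), each candidate pipeline can simply be wrapped in time; in ksh or bash the time keyword measures a whole pipeline, for example:

Code:
time grep -E ",9999$" hist1.out | sort -uT ${NLAP_TEMP} > hist2.final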


I hope that this is useful, but there will always be a limit we will hit.



Robin
# 13  
Old 09-12-2017
I tried all the options, but sort is not running any faster.

Since we have only one CPU, should we split the file, sort the individual pieces, and then merge them into a single sorted file?
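For reference, that split/sort/merge approach would look roughly like the sketch below (chunk size and file names are assumptions); note that sort -T already splits and merges internally, so on a single CPU and a single disk this is unlikely to beat one big sort:

Code:
split -l 10000000 ${NLAP_TEMP}/hist1.out ${NLAP_TEMP}/chunk.
for f in ${NLAP_TEMP}/chunk.*
do
    sort -u "$f" > "$f.sorted"      # sort each chunk separately
done
sort -m -u ${NLAP_TEMP}/chunk.*.sorted > ${NLAP_TEMP}/hist2.final   # merge the pre-sorted chunks
rm ${NLAP_TEMP}/chunk.*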
# 14  
Old 09-12-2017
What about all the questions people asked you?

What physical disks are your various folders on?

If you don't know, trying random folders is unlikely to help.
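Something like this would show which filesystem (and, via the mount table, which underlying device) each directory actually sits on; the paths are the ones used earlier in the thread:

Code:
df -P ${NLAP_TEMP} ${NLAP_HOME}/backup    # the Filesystem column shows the device behind each directory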