Bash script search, improve performance with large files Post: 303033044

Post 303033044 by SDohmen in Shell Programming and Scripting, Friday 29th of March 2019 04:53:59 AM

Quote:

Originally Posted by Peasant

When processing extremely large files, you might consider using split first.
Then, in multicore environments, spawn several awk or grep processes from a shell script to work on the pieces in parallel.
There are also GNU tools that offer parallelism without shell logic.
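
As a rough sketch of the "parallelism without shell logic" route: xargs can run several commands at once through its -P option (a GNU/BSD extension, not POSIX). The example below builds its own small sample file, so the file names, the part_ prefix, the chunk size and the "needle" pattern are all made up for illustration and are not taken from the original script.

```shell
#!/usr/bin/env bash
# Sketch only: split a file, then let xargs run up to four greps in parallel.
# Every name here (input.txt, part_, "needle") is hypothetical.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a sample input: 400 lines, every 4th line contains "needle".
awk 'BEGIN {
    for (i = 1; i <= 400; i++) {
        word = (i % 4 == 0) ? "needle" : "hay"
        print word, i
    }
}' > "$workdir/input.txt"

# Cut the input into pieces of 100 lines each (part_aa, part_ab, ...).
split -l 100 "$workdir/input.txt" "$workdir/part_"

# -n 1: hand one piece to each grep; -P 4: run up to four greps at a time.
# grep -c prints the number of matching lines in its piece; awk adds them up.
# Assumes the paths contain no whitespace, which holds for mktemp directories.
hits=$(printf '%s\n' "$workdir"/part_* \
    | xargs -n 1 -P 4 grep -c 'needle' \
    | awk '{ total += $1 } END { print total + 0 }')

echo "hits: $hits"
```

The number passed to -P is a guess at a sensible value; matching it to the number of available cores is a reasonable starting point.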

It is a bit tougher to program, but processing time should drop significantly if you have spare cores and the disks are fast enough to keep up.

Memory also comes into play: split reads the files, and the operating system caches them in memory if enough is available.
That makes read operations in those awk or grep processes much faster.

Of course, the limits are the free memory on the system and how file system caching is configured in general.
In default configurations, file system caching can use a large portion of free memory on most Linux/Unix systems I've seen.

Hope that helps
Regards
Peasant.

This sounds very interesting, but there are 2 issues.

1. I have to split the files into smaller files (around 5k, I guess), which isn't a big deal but is a little bit annoying.
2. Since this runs in a script, I have no idea how to call multiple instances of awk at the same time. Everything I know says that a script handles each part one after the other and not at the same time. If you have an idea how to accomplish that, please let me know, since it does sound interesting/promising.
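
For reference, one common way to get several awk instances running at once from a single script is to start each one in the background with & and then call wait, which blocks until all background jobs have finished. This is only a minimal sketch under made-up assumptions: the sample data, the chunk_ prefix, the 250-line chunk size and the ERROR pattern are hypothetical and would need to be swapped for the real files and awk program.

```shell
#!/usr/bin/env bash
# Sketch only: one background awk per chunk, then wait for all of them.
# Every name here (big.log, chunk_, "ERROR") is hypothetical.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build a sample input: 1000 lines, every 10th line contains "ERROR".
awk 'BEGIN {
    for (i = 1; i <= 1000; i++) {
        tag = (i % 10 == 0) ? "ERROR" : "ok"
        print tag, i
    }
}' > "$workdir/big.log"

# Cut the input into chunks of 250 lines each (chunk_aa, chunk_ab, ...).
split -l 250 "$workdir/big.log" "$workdir/chunk_"

# The trailing & puts each awk in the background, so the loop does not
# wait for one chunk to finish before starting the next.
for chunk in "$workdir"/chunk_*; do
    awk '/ERROR/' "$chunk" > "$chunk.out" &
done

# wait with no arguments blocks until every background job has exited.
wait

# Merge the per-chunk results; the glob sorts them back into input order.
cat "$workdir"/chunk_*.out > "$workdir/matches.txt"
match_count=$(wc -l < "$workdir/matches.txt")
match_count=$((match_count))   # drop any padding some wc versions add

echo "matches: $match_count"
```

Writing each job's output to its own file avoids several processes appending to the same file at once. With many chunks it may be worth capping how many jobs run at the same time so they do not compete with the other scripts on the machine.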

CPU and memory aren't the issue, as they are sufficient. The only thing that can stall the script is the other scripts running at the same time. I tried spreading them out as much as possible, but some just take quite long to run, and that's why I want to slim them down so they don't run together.

SDohmen
