Process multiple large files with awk Post: 302961893

Sponsored Content

Top Forums Shell Programming and Scripting Process multiple large files with awk Post 302961893 by Don Cragun on Saturday 5th of December 2015 11:37:47 PM

12-06-2015

Registered User

You are reading all of your large files about 20 times. If you could read those files once (instead of twenty times), you would probably reduce running time to less than 5% of what it is now.

With a 60Gb system and data totaling ~6Gb, is there a reason why you can't schedule a time to run an awk script that will need to be allocated a little ~7Gb of RAM while it is running and let it read all of your data into memory and process it in one pass?

Do you really want 16 output files or do you just want one output file?

Do you care if the output is sorted, or do you just sort the keys to create distinct smaller lists of keys to be processed individually? (I note that the data written into your output files by awk aren't sorted.)

And a trivial performance note... There is no need to fire up a subshell to gather arguments for echo. It is easier to just use:

Code:

date '+START PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S'
...
date '+END PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S'

instead of:

Code:

echo $(date "+START PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S")
...
echo $(date "+END PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S")

And, for consistency with the printf statements in your awk script, the last print statement in your awk script should be:

Code:

print OFS sum

instead of:

Code:

print " "sum

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

how to divide single large log file into multiple files.

Can you please help me with writing script for following purpose. I have to divide single large web access log file into multiple log files based on dates inside the log file. For example: if data is logged in the access file for jan-10-08 , jan-11-08 , Jan-12-08 then make small log file...

2. Shell Programming and Scripting

AWK Shell Program to Split Large Files

Hi, I need some help creating a tidy shell program with awk or other language that will split large length files efficiently. Here is an example dump: <A001_MAIL.DAT> 0001 Ronald McDonald 01 H81 0002 Elmo St. Elmo 02 H82 0003 Cookie Monster 01 H81 0004 Oscar ...

3. UNIX for Dummies Questions & Answers

multiple smaller files from one large file

I have a file with a simple list of ids. 750,000 rows. I have to break it down into multiple 50,000 row files to submit in a batch process.. Is there an easy script I could write to accomplish this task?

4. Shell Programming and Scripting

Using AWK to separate data from a large XML file into multiple files

I have a 500 MB XML file from a FileMaker database export, it's formatted horribly (no line breaks at all). The node structure is basically <FMPXMLRESULT> <METADATA> <FIELD att="............." id="..."/> </METADATA> <RESULTSET FOUND="1763457"> <ROW att="....." etc="...."> ...

5. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is...

6. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files. e.g. my data is like: Row_Num,...

7. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Hello, Error awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt What it is It is a unix shell script that contains an awk program as well as...

8. Shell Programming and Scripting

Splitting large file into multiple files in unix based on pattern

I need to write a shell script for below scenario My input file has data in format: qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26 qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43 qwerty0101CFG 12345...

9. Shell Programming and Scripting

Split large zone file dump into multiple files

I have a large zone file dump that consists of ; DNS record for the adomain.com domain data1 data2 data3 data4 data5 CRLF CRLF CRLF ; DNS record for the anotherdomain.com domain data1 data2 data3 data4 data5 data6 CRLF

10. UNIX for Dummies Questions & Answers

Find common numbers from two very large files using awk or the like

I've got two files that each contain a 16-digit number in positions 1-16. The first file has 63,120 entries all sorted numerically. The second file has 142,479 entries, also sorted numerically. I want to read through each file and output the entries that appear in both. So far I've had no...

LEARN ABOUT DEBIAN

igawk

IGAWK(1)							 Utility Commands							  IGAWK(1)

NAME

       igawk - gawk with include files

SYNOPSIS

       igawk [ all gawk options ] -f program-file [ -- ] file ...
       igawk [ all gawk options ] [ -- ] program-text file ...

DESCRIPTION

       Igawk is a simple shell script that adds the ability to have ``include files'' to gawk(1).

       AWK programs for igawk are the same as for gawk, except that, in addition, you may have lines like

	      @include getopt.awk

       in your program to include the file getopt.awk from either the current directory or one of the other directories in the search path.

OPTIONS

       See gawk(1) for a full description of the AWK language and the options that gawk supports.

EXAMPLES

       cat << EOF > test.awk
       @include getopt.awk

       BEGIN {
	    while (getopt(ARGC, ARGV, "am:q") != -1)
		 ...
       }
       EOF

       igawk -f test.awk

SEE ALSO

       gawk(1)

       Effective AWK Programming, Edition 1.0, published by the Free Software Foundation, 1995.

AUTHOR

       Arnold Robbins (arnold@skeeve.com).

Free Software Foundation					    Nov 3 1999								  IGAWK(1)

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

how to divide single large log file into multiple files.

Discussion started by: kamleshm

2. Shell Programming and Scripting

AWK Shell Program to Split Large Files

Discussion started by: mkastin

3. UNIX for Dummies Questions & Answers

multiple smaller files from one large file

Discussion started by: rtroscianecki

4. Shell Programming and Scripting

Using AWK to separate data from a large XML file into multiple files

Discussion started by: JRy

5. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Discussion started by: Liverpaul09

6. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Discussion started by: kam66

7. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Discussion started by: script_op2a

8. Shell Programming and Scripting

Splitting large file into multiple files in unix based on pattern

Discussion started by: jimmy12

9. Shell Programming and Scripting

Split large zone file dump into multiple files

Discussion started by: Bluemerlin

10. UNIX for Dummies Questions & Answers

Find common numbers from two very large files using awk or the like

Discussion started by: Scottie1954

LEARN ABOUT DEBIAN

igawk