Process multiple large files with awk
Post 302961893 by Don Cragun on Saturday 5th of December 2015 11:37:47 PM
You are reading all of your large files about 20 times. If you could read those files once (instead of twenty times), you would probably reduce running time to less than 5% of what it is now.

With a 60 GB system and data totaling ~6 GB, is there a reason why you can't schedule a time to run an awk script that would be allocated roughly 7 GB of RAM while it runs, so it can read all of your data into memory and process it in one pass?
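
As a rough illustration of the single-pass idea (only a sketch, since your full script and data layout are not reproduced here): the awk program below reads every input file once, accumulates a per-key total in an in-memory array, and writes the results at the end. The file names, the whitespace-separated layout, and the choice of a key in field 1 with a numeric value in field 2 are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the large input files.
tmpdir=$(mktemp -d)
printf 'k1 10\nk2 5\n' > "$tmpdir/part1.txt"
printf 'k1 7\nk3 2\n'  > "$tmpdir/part2.txt"

# Read each file exactly once, keeping per-key sums in memory.
# Assumed layout: key in field 1, numeric value in field 2.
# The trailing sort is only there to make the sample output repeatable;
# "for (k in sum)" visits keys in an unspecified order.
result=$(awk '
    { sum[$1] += $2 }                       # accumulate while reading
    END { for (k in sum) print k, sum[k] }  # emit totals after the last file
' "$tmpdir"/part*.txt | sort)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Memory use grows with the number of distinct keys rather than with the number of input lines, which is why a working set somewhat larger than the raw data is a plausible upper bound.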

Do you really want 16 output files or do you just want one output file?

Do you care if the output is sorted, or do you just sort the keys to create distinct smaller lists of keys to be processed individually? (I note that the data written into your output files by awk aren't sorted.)

And a trivial performance note... There is no need to fire up a subshell to gather arguments for echo. It is easier to just use:
Code:
date '+START PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S'
...
date '+END PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S'

instead of:
Code:
echo $(date "+START PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S")
...
echo $(date "+END PROCESSING DATE: %d/%m/%y - TIME: %H:%M:%S")
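
Besides the extra process, the unquoted command substitution has a side effect. The sketch below uses a made-up format string with two consecutive spaces (an assumption chosen to make the difference visible): the shell splits the output of date into words before echo sees them, so runs of whitespace collapse, while calling date directly prints the format as written.

```shell
#!/bin/sh
# Hypothetical format with two spaces after the colon.
fmt='+START:  %d/%m/%y'

direct=$(date "$fmt")            # date prints the formatted string itself
via_echo=$(echo $(date "$fmt"))  # unquoted: word splitting collapses the spaces

# Brackets make the spacing easy to compare.
printf '[%s]\n[%s]\n' "$direct" "$via_echo"
```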

And, for consistency with the printf statements in your awk script, the last print statement in your awk script should be:
Code:
print OFS sum

instead of:
Code:
print " "sum
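
A small sketch of what the difference looks like once OFS is something other than a space. The input (one number per line) and the tab separator are assumptions for the example, not taken from your script: the hard-coded space ignores OFS, while print OFS sum follows whatever separator the script was given.

```shell
#!/bin/sh
# Hypothetical input: one number per line; OFS set to a tab for illustration.
hard=$(printf '1\n2\n' | awk -v OFS='\t' '{ sum += $1 } END { print " "sum }')
soft=$(printf '1\n2\n' | awk -v OFS='\t' '{ sum += $1 } END { print OFS sum }')

# Brackets make the leading separator visible: a space in the first, a tab in the second.
printf '[%s]\n[%s]\n' "$hard" "$soft"
```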

 
