Process multiple large files with awk
Post 302961902 by RudiC on Sunday 6th of December 2015 07:29:38 AM
You need to replace
Code:
sum += d[key, i]

with
Code:
sum += data[key, i]

to have the sum printed.
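
A minimal sketch of why the misspelled array name goes unnoticed, using a made-up two-element `data` array rather than the script from earlier in the thread: awk creates an unset array element on first reference and treats it as 0 in arithmetic, so summing over `d[...]` raises no error and simply adds nothing.

```shell
# Toy values stored under the hypothetical key "k" (for illustration only)
wrong=$(awk 'BEGIN { data["k", 1] = 5; data["k", 2] = 7
                     for (i = 1; i <= 2; i++) sum += d["k", i]      # misspelled array: every element is unset
                     print sum + 0 }')
right=$(awk 'BEGIN { data["k", 1] = 5; data["k", 2] = 7
                     for (i = 1; i <= 2; i++) sum += data["k", i]   # correct array name
                     print sum + 0 }')
echo "d: $wrong  data: $right"
```

The first run sums to 0 and the second to 12, which is why the symptom is a missing or zero sum rather than an error message.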

---------- Post updated at 13:29 ---------- Previous update was at 11:57 ----------

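I'm not sure whether this will perform better (or even worse), but it hands the memory consumption over to sort, which has options to control it (with GNU sort, for example, -S for the buffer size and -T for the temporary directory). awk needs the file names up front, hence the ls trick. Try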
Code:
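{ ls -x file[1-3]; grep '' file[1-3] | sort -k2,2 -k1,1; } | awk  '
# Line 1 is the ls output: map each file name to its column number, keep the count in FC
NR == 1         {for (FC=n=split ($0, FN); n>0; n--) FILES[FN[n]] = n
                 next
                }

# Append "0" for columns ST .. EN-1 and set LF to the next column to fill
function ADDZERO(ST, EN,    i)  {for (i=ST; i<EN; i++) LINE = LINE FS "0"
                                 LF = ++i
                                }

# grep prefixed every line with "filename:"; split that off and look up the column
                {split ($1, T, ":")
                 FNO = FILES[T[1]]
                }

# New value in field 2: pad and print the finished line, then start the next one
$2 != LAST      {ADDZERO(LF, FC+1)
                 if (NR > 2) print LINE, SUM
                 SUM  = 0
                 LINE =  T[2] FS $2
                 LF = 1
                }
# Every data line: pad the columns of files that lack this key, then append the value
                {LAST = $2
                 ADDZERO(LF, FNO)
                 LINE = LINE FS $3
                 SUM += $3
                }
# Pad the last line as well; otherwise it comes up short when the final key is missing from the last file(s)
END             {ADDZERO(LF, FC+1)
                 print LINE, SUM
                }
'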

Output:
Code:
a sample_1 200 10 1 211
a.b sample_2 10 0 10 20
a sample_3 10 67 0 77
a sample_4 0 0 20 20
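
To see what awk actually receives, here is a rough sketch that builds three small files (their names and contents are made up for illustration, and they live in a scratch directory from `mktemp`) and captures the combined stream. It also repeats the sort with a capped buffer and an explicit temporary directory; `-S` and `-T` are assumed to be available, as they are in GNU sort, and the 64M figure is arbitrary.

```shell
# Hypothetical sample files in a scratch directory (made up for illustration)
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'a sample_1 200\na.b sample_2 10\n' > file1
printf 'a sample_1 10\n'                   > file2
printf 'a sample_1 1\na sample_4 20\n'     > file3

# Line 1 holds the file names; each later line is "name:original line",
# grouped by the second field and ordered by file name within a group
stream=$( { ls -x file[1-3]; grep '' file[1-3] | sort -k2,2 -k1,1; } )
printf '%s\n' "$stream"

# Same sort with a capped buffer and an explicit spill directory
# (assumption: a sort that accepts -S and -T, such as GNU sort)
bounded=$(grep '' file[1-3] | sort -S 64M -T "$tmp" -k2,2 -k1,1)
```

The header line is what the `NR == 1` rule splits into file names, and the `name:` prefix added by `grep ''` is what `split ($1, T, ":")` takes apart. Note that `ls -x` only keeps all names on one line while they fit the default output width, so a long file list would need a different way to pass the names.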


Last edited by RudiC; 12-06-2015 at 08:24 AM..