Sponsored Content
Top Forums Shell Programming and Scripting Process multiple large files with awk Post 302961928 by Don Cragun on Sunday 6th of December 2015 07:50:53 PM
Old 12-06-2015
Quote:
Originally Posted by RudiC
You need to replace
Code:
sum += d[key, i]

with
Code:
sum += data[key, i]

to have the sum printed.

---------- Post updated at 13:29 ---------- Previous update was at 11:57 ----------

Not sure if this will perform better (or even worse?) but it leaves the memory consumption to sort which has options to handle that. We need the files' names upfront, thus the ls trick. Try
Code:
{ ls -x file[1-3]; grep '' file[1-3] | sort -k2,2 -k1,1; } | awk  '
NR == 1         {for (FC=n=split ($0, FN); n>0; n--) FILES[FN[n]] = n
                 next
                }
 ... ... ...

Thank you for catchng my typo. (I made the mistake of trying to make the array name more descriptive after I tested the script, and missed the last needed change.)

Note that ls -x file* can produce more than one line of output depending on the length of the list of file names. We know that the input files are selected by $BASEPATH/dataset/*, but we haven't been given any indication of the length's of the names matched by * in that pattern nor of what the real length of the expansion of BASEPATH will be.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

how to divide single large log file into multiple files.

Can you please help me with writing script for following purpose. I have to divide single large web access log file into multiple log files based on dates inside the log file. For example: if data is logged in the access file for jan-10-08 , jan-11-08 , Jan-12-08 then make small log file... (1 Reply)
Discussion started by: kamleshm
1 Replies

2. Shell Programming and Scripting

AWK Shell Program to Split Large Files

Hi, I need some help creating a tidy shell program with awk or other language that will split large length files efficiently. Here is an example dump: <A001_MAIL.DAT> 0001 Ronald McDonald 01 H81 0002 Elmo St. Elmo 02 H82 0003 Cookie Monster 01 H81 0004 Oscar ... (16 Replies)
Discussion started by: mkastin
16 Replies

3. UNIX for Dummies Questions & Answers

multiple smaller files from one large file

I have a file with a simple list of ids. 750,000 rows. I have to break it down into multiple 50,000 row files to submit in a batch process.. Is there an easy script I could write to accomplish this task? (2 Replies)
Discussion started by: rtroscianecki
2 Replies

4. Shell Programming and Scripting

Using AWK to separate data from a large XML file into multiple files

I have a 500 MB XML file from a FileMaker database export, it's formatted horribly (no line breaks at all). The node structure is basically <FMPXMLRESULT> <METADATA> <FIELD att="............." id="..."/> </METADATA> <RESULTSET FOUND="1763457"> <ROW att="....." etc="...."> ... (16 Replies)
Discussion started by: JRy
16 Replies

5. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is... (3 Replies)
Discussion started by: Liverpaul09
3 Replies

6. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files. e.g. my data is like: Row_Num,... (6 Replies)
Discussion started by: kam66
6 Replies

7. Emergency UNIX and Linux Support

Help to make awk script more efficient for large files

Hello, Error awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt What it is It is a unix shell script that contains an awk program as well as... (4 Replies)
Discussion started by: script_op2a
4 Replies

8. Shell Programming and Scripting

Splitting large file into multiple files in unix based on pattern

I need to write a shell script for below scenario My input file has data in format: qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26 qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43 qwerty0101CFG 12345... (19 Replies)
Discussion started by: jimmy12
19 Replies

9. Shell Programming and Scripting

Split large zone file dump into multiple files

I have a large zone file dump that consists of ; DNS record for the adomain.com domain data1 data2 data3 data4 data5 CRLF CRLF CRLF ; DNS record for the anotherdomain.com domain data1 data2 data3 data4 data5 data6 CRLF (7 Replies)
Discussion started by: Bluemerlin
7 Replies

10. UNIX for Dummies Questions & Answers

Find common numbers from two very large files using awk or the like

I've got two files that each contain a 16-digit number in positions 1-16. The first file has 63,120 entries all sorted numerically. The second file has 142,479 entries, also sorted numerically. I want to read through each file and output the entries that appear in both. So far I've had no... (13 Replies)
Discussion started by: Scottie1954
13 Replies
SYSPROFILE(8)						      System Manager's Manual						     SYSPROFILE(8)

NAME
sysprofile - modular centralized shell configuration DESCRIPTION
sysprofile is a generic approach to configure shell settings in a modular and centralized way mostly aimed at avoiding work for lazy sysad- mins. It has only been tested to work with the bash shell. It basically consists of the small /etc/sysprofile shell script which invokes other small shell scripts having a .bash suffix which are contained in the /etc/sysprofile.d/ directory. The system administrator can drop in any script he wants without any naming convention other than that the scripts need to have a .bash suffix to enable automagic sourcing by /etc/sysprofile. This mechanism is set up by inserting a small shell routine into /etc/profile for login shells and optionally into /etc/bashrc and/or /etc/bash.bashrc for non-login shells from where the actual /etc/sysprofile script is invoked: if [ -f /etc/sysprofile ]; then . /etc/sysprofile fi For using "sysprofile" under X11, one can source it in a similar way from /etc/X11/Xsession or your X display manager's Xsession file to provide the same shell environment as under the console in X11. See the example files in /usr/share/doc/sysprofile/ for illustration. For usage of terminal emulators with a non-login bash shell under X11, take care to enable sysprofile via /etc/bash.bashrc. If not set this way, your terminal emulators won't come up with the environment defined by the scripts in /etc/sysprofile.d/. Users not wanting /etc/sysprofile to be sourced for their environment can easily disable it's automatic mechanism. It can be disabled by simply creating an empty file called $HOME/.nosysprofile in the user's home directory using e.g. the touch(1) command. Any single configuration file in /etc/sysprofile.d/ can be overridden by any user by creating a private $HOME/.sysprofile.d/ directory which may contain a user's own version of any configuration file to be sourced instead of the system default. It's names have just to match exactly the system's default /etc/sysprofile.d/ configuration files. Empty versions of these files contained in the $HOME/.syspro- file.d/ directory automatically disable sourcing of the system wide version. Naturally, users can add and include their own private script inventions to be automagically executed by /etc/sysprofile at login time. OPTIONS
There are no options other than those dictated by shell conventions. Anything is defined within the configuration scripts themselves. SEE ALSO
The README files and configuration examples contained in /etc/sysprofile.d/ and the manual pages bash(1), xdm(1x), xdm.options(5), and wdm(1x). Recommended further reading is everything related with shell programming. If you need a similar mechanism for executing code at logout time check out the related package syslogout(8) which is a very close compan- ion to sysprofile. BUGS
sysprofile in its current form is mainly restricted to bash(1) syntax. In fact it is actually a rather embarrassing quick and dirty hack than anything else - but it works. It serves the practical need to enable a centralized bash configuration until something better becomes available. Your constructive criticism in making this into something better" is very welcome. Before i forget to mention it: we take patches... ;-) AUTHOR
sysprofile was developed by Paul Seelig <pseelig@debian.org> specifically for the Debian GNU/Linux system. Feel free to port it to and use it anywhere else under the conditions of either the GNU public license or the BSD license or both. Better yet, please help to make it into something more worthwhile than it currently is. SYSPROFILE(8)
All times are GMT -4. The time now is 08:03 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy