AWK Shell Program to Split Large Files


 
# 1  
Old 06-29-2009
Hi,

I need some help creating a tidy shell program, using awk or another language, that will split very large files efficiently.

Here is an example dump:

Code:

<A001_MAIL.DAT>
0001  Ronald   McDonald  01 H81
0002  Elmo     St. Elmo  02 H82
0003  Cookie   Monster   01 H81
0004  Oscar    Grouche   03 H83
0005  Dumb     Name      02 H82
0006  Butter   Face      04 H84
0007  Ford     F150      01 H81
0008  Last     One       03 H83

<A001_MAIL_H81.dat>
0001  Ronald   McDonald  01 H81
0003  Cookie   Monster   01 H81
0007  Ford     F150      01 H81

<A001_MAIL_H82.dat>
0002  Elmo     St. Elmo  02 H82
0005  Dumb     Name      02 H82

<A001_MAIL_H83.dat>
0004  Oscar    Grouche   03 H83
0008  Last     One       03 H83

<A001_MAIL_H84.dat>
0006  Butter   Face      04 H84

This is a very small sample; normally the files are 500 bytes per line and between a hundred thousand and a hundred million lines long.

I'm looking for a simple single-line command that passes over the file once and creates files like the ones shown above. I'm very new to awk, but I created something that almost accomplishes my goal.

Code:
awk '!/^$/{
key=substr($0,28,3) 
print $0 > key".dat"
}' A001_MAIL.DAT

This takes the file and essentially produces the following:

<H81.dat>
0001  Ronald   McDonald  01 H81
0003  Cookie   Monster   01 H81
0007  Ford     F150      01 H81

<H82.dat>
0002  Elmo     St. Elmo  02 H82
0005  Dumb     Name      02 H82

<H83.dat>
0004  Oscar    Grouche   03 H83
0008  Last     One       03 H83

<H84.dat>
0006  Butter   Face      04 H84

What I need help with is getting the naming convention right and turning this into something other people can run as a single line that is as simple as possible. I was thinking of something such as: $ awksplit filename

Help me recycle this or point me in a new direction.

Thanks for all the help everyone!
Matthew
# 2  
Old 06-29-2009
Code:
nawk '{if(out) close(out);out=$NF ".dat"; print >> out}' myFile
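
A note on the one-liner above: because `out` is non-empty after the first record, it closes and reopens the output file for every input line, and `>>` appends, so re-running it adds to any `.dat` files that already exist. Below is a minimal sketch of a variant that closes the file only when the key changes. It assumes a POSIX-style awk (plain `awk` rather than `nawk`) and uses a three-line stand-in for `myFile`, whose real contents are not shown in the thread, so treat it as an illustration rather than a tuned replacement.

```shell
# Stand-in input; the real myFile is not shown in the thread.
printf '%s\n' '0001 a H81' '0002 b H82' '0003 c H81' > myFile
rm -f H81.dat H82.dat     # ">>" appends, so start from a clean slate

awk '{
    new = $NF ".dat"                  # output name from the last field
    if (new != out) {                 # key changed since the previous line
        if (out) close(out)           # keep only one output file open
        out = new
    }
    print >> out                      # append, since a file may be reopened
}' myFile
```

Only one output file is open at any time, which sidesteps per-process open-file limits, and runs of lines sharing a key no longer pay for a close and reopen on each line.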

# 3  
Old 06-29-2009
Thanks for the quick reply vgersh99, but I don't have nawk and cannot have it installed.
# 4  
Old 06-29-2009
Quote:
Originally Posted by mkastin
Thanks for the quick reply vgersh99, but I don't have nawk and cannot have it installed.
Why don't you try 'awk' instead?
# 5  
Old 06-29-2009
I guess I should've mentioned that I did try that. It didn't work: it created a copy of the original file (testmail.dat) as X?.dat.
# 6  
Old 06-29-2009
Quote:
Originally Posted by mkastin
I guess I should've mentioned that I did try that. It didn't work: it created a copy of the original file (testmail.dat) as X?.dat.
Given mka.txt:
Code:
0001  Ronald   McDonald  01 H81
0002  Elmo     St. Elmo  02 H82
0003  Cookie   Monster   01 H81
0004  Oscar    Grouche   03 H83
0005  Dumb     Name      02 H82
0006  Butter   Face      04 H84
0007  Ford     F150      01 H81
0008  Last     One       03 H83

running this command:
Code:
nawk '{if(out) close(out);out=$NF ".dat"; print >> out}' mka.txt

produces 4 files: H81.dat, H82.dat, H83.dat and H84.dat.
E.g. H81.dat:
Code:
0001  Ronald   McDonald  01 H81
0003  Cookie   Monster   01 H81
0007  Ford     F150      01 H81
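
The first post also asked for output names like `A001_MAIL_H81.dat` and a command along the lines of `awksplit filename`. One way to sketch that is a small shell function that derives the prefix from the input file name and takes the key from a fixed column. Everything here is an assumption for illustration: the function name `awksplit` comes from the request, but the optional `START` and `LEN` arguments and their defaults (29 and 3, which is where the key sits in the sample as rendered in this thread, one column later than the 28 in the original attempt) are hypothetical and need adjusting for the real 500-byte records. It also assumes an awk that supports `-v`.

```shell
# Hypothetical wrapper: awksplit FILE [START] [LEN]
awksplit() {
    file=$1
    start=${2:-29}      # assumed key column for the sample in this thread
    len=${3:-3}         # assumed key width
    base=${file%.*}     # strip the extension: A001_MAIL.DAT -> A001_MAIL
    awk -v base="$base" -v s="$start" -v n="$len" '
        !/^$/ {                              # skip empty lines
            key = substr($0, s, n)           # fixed-position key
            out = base "_" key ".dat"        # e.g. A001_MAIL_H81.dat
            print > out                      # truncated once, then appended to
        }' "$file"
}

# Demo on three lines laid out like the sample (key in columns 29-31).
printf '%s  %-8s %-8s  %s %s\n' \
    0001 Ronald McDonald 01 H81 \
    0002 Elmo 'St. Elmo' 02 H82 \
    0003 Cookie Monster 01 H81 > A001_MAIL.DAT
awksplit A001_MAIL.DAT
```

This version keeps every output file open until awk exits, which is fast for a single pass but can run into the open-file limit of some awk implementations when there are many distinct keys.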

# 7  
Old 06-29-2009
Okay, it's totally my fault for creating a sample that isn't truly accurate to my needs. It does appear to work in this case, but it doesn't fit my actual file. Here is a more accurate sample:

Code:
0001  Ronald   McDonald  01 H81 0001256 X
0002  Elmo     St. Elmo  02 H82 0089621 X
0003  Cookie   Monster   01 H81 0887141 X
0004  Oscar    Grouche   03 H83 0364471 X
0005  Dumb     Name      02 H82 0000233 X
0006  Butter   Face      04 H84 0014666 X
0007  Ford     F150      01 H81 0000001 X
0008  Last     One       03 H83 7741668 X

Sorry for the confusion.
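
With this layout the last field is always `X`, so a split on `$NF` sends every line to a single file, which matches the copy of testmail.dat reported earlier. The `?` in `X?.dat` may indicate a trailing carriage return from DOS-style line endings, but that is a guess. A minimal sketch under those assumptions takes the key from the third field counting from the end and strips a carriage return if one is present. The demo input below is built with CRLF endings purely to exercise that path, and `$(NF-2)` only holds if exactly two fields follow the key in the real file; a fixed-position `substr` is likely safer for true fixed-width records.

```shell
# Demo input shaped like the more accurate sample, with assumed CRLF endings.
printf '%s  %-8s %-8s  %s %s %s %s\r\n' \
    0001 Ronald McDonald 01 H81 0001256 X \
    0002 Elmo 'St. Elmo' 02 H82 0089621 X \
    0003 Cookie Monster 01 H81 0887141 X > testmail.dat

awk '{
    sub(/\r$/, "")                # drop a trailing carriage return, if any
    if (NF < 3) next              # skip blank or short lines
    out = $(NF-2) ".dat"          # key is the third field from the end
    print > out
}' testmail.dat
```

Counting from the end matters here because names such as "St. Elmo" contain spaces, so the key is not at the same field number on every line.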