I need some help creating a tidy shell program, in awk or another language, that will split very long files efficiently.
Here is an example dump:
Code:
<A001_MAIL.DAT>
0001 Ronald McDonald 01 H81
0002 Elmo St. Elmo 02 H82
0003 Cookie Monster 01 H81
0004 Oscar Grouche 03 H83
0005 Dumb Name 02 H82
0006 Butter Face 04 H84
0007 Ford F150 01 H81
0008 Last One 03 H83
<A001_MAIL_H81.dat>
0001 Ronald McDonald 01 H81
0003 Cookie Monster 01 H81
0007 Ford F150 01 H81
<A001_MAIL_H82.dat>
0002 Elmo St. Elmo 02 H82
0005 Dumb Name 02 H82
<A001_MAIL_H83.dat>
0004 Oscar Grouche 03 H83
0008 Last One 03 H83
<A001_MAIL_H84.dat>
0006 Butter Face 04 H84
This is a very small sample; normally the files are 500 bytes per line and between a hundred thousand and a hundred million lines long.
I'm looking for a simple single-line command that passes over the file once and creates files like the ones shown above. I'm very new to awk, but I created something that almost accomplishes my goal.
Code:
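awk '!/^$/{                  # skip empty lines
key=substr($0,28,3)          # 3-character key starting at column 28
print $0 > (key ".dat")      # parenthesized so every awk parses the file name the same way
}' A001_MAIL.DAT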
This takes the file and essentially produces the following:
<H81.dat>
0001 Ronald McDonald 01 H81
0003 Cookie Monster 01 H81
0007 Ford F150 01 H81
<H82.dat>
0002 Elmo St. Elmo 02 H82
0005 Dumb Name 02 H82
<H83.dat>
0004 Oscar Grouche 03 H83
0008 Last One 03 H83
<H84.dat>
0006 Butter Face 04 H84
What I need help with is getting the naming convention corrected and turning this into something I can have others execute in a single line that is as simple as possible. I was thinking of something such as: $ awksplit filename
Help me recycle this or point me in a new direction.
Okay, it's totally my fault for creating a sample that isn't truly accurate to my needs: it does appear to work in this case, but it doesn't fit my actual file. Here is a more accurate sample:
Code:
0001 Ronald McDonald 01 H81 0001256 X
0002 Elmo St. Elmo 02 H82 0089621 X
0003 Cookie Monster 01 H81 0887141 X
0004 Oscar Grouche 03 H83 0364471 X
0005 Dumb Name 02 H82 0000233 X
0006 Butter Face 04 H84 0014666 X
0007 Ford F150 01 H81 0000001 X
0008 Last One 03 H83 7741668 X
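One possible direction, offered as a minimal sketch and not a tested production tool: wrap the awk one-liner above in a shell function named awksplit (the name suggested in the question) and build the output name from the input file's base name plus the key, which gives names like A001_MAIL_H81.dat. The key position defaults to column 28 and length 3, taken from the substr call above; the optional second and third arguments are an assumption added here so the offset can be adjusted if the real file has the key elsewhere.

```shell
#!/bin/sh
# awksplit FILE [KEYCOL] [KEYLEN]
# Sketch only: splits FILE into <basename>_<key>.dat files in the current
# directory, reading the input once. KEYCOL and KEYLEN are hypothetical
# optional arguments; they default to the 28 and 3 used in the question.
awksplit() {
    infile=$1
    keycol=${2:-28}
    keylen=${3:-3}

    base=$(basename "$infile")   # drop any leading directory
    base=${base%.*}              # A001_MAIL.DAT -> A001_MAIL

    awk -v base="$base" -v col="$keycol" -v len="$keylen" '
        !/^$/ {                              # skip empty lines
            key = substr($0, col, len)       # fixed-width key
            out = base "_" key ".dat"        # e.g. A001_MAIL_H81.dat
            print > out                      # file stays open between lines
        }
    ' "$infile"
}
```

With the function loaded (or saved as a script that ends with awksplit "$@"), the call would be: awksplit A001_MAIL.DAT. Each output file is kept open for the whole run, which is fine for a handful of keys; if the real data has very many distinct keys, some awk implementations may hit their open-file limit, and this sketch does not handle that case.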
I have a large file containing a 24-hour log in the format below. I need to split it into 24 smaller files, one per hour, e.g. from 09:55 to 10:55, 10:55 to 11:55.
Can anyone help me with this?
... (20 Replies)
Split a large XML file into multiple files, each with a header and footer
I tried the command below.
It splits unevenly, and I also need help adding a header and footer.
Command:
csplit -s -k -f my_XML_split.xml extrfile.xml "/<Document>/" {1}
Sample XML:
<?xml version="1.0" encoding="UTF-8"?><Recipient>... (36 Replies)
Dear all,
I need your help with the file manipulation below. I want to split the file into 8 smaller files without cutting through any entry (every small file should start with an entry and end with an empty line). It would be helpful if you could provide a one-liner command for this... (12 Replies)
Dear Users,
I would appreciate help splitting a large file (more than 1 million lines) with sed or awk. Below is the text in the file:
Input file.txt
scaffold1 928 929 C/T +
scaffold1 942 943 G/C +
scaffold1 959 960 C/T +... (6 Replies)
Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk.
I've got a dataset folder with 10 files (16 million rows each, 600MB), and I've got a sorted file with all the keys inside.
For example:
a sample_1 200
a.b sample_2 10
a sample_3 10
a sample_1 10
a... (4 Replies)
Hi all.
I've tried searching the web but could not find a problem similar to mine.
I have one large file to be split into several files based on the matching pattern found in each row.
For example, let's say the file content:
... (13 Replies)
Help needed urgently please.
I have a large file - a few hundred thousand lines.
Sample
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT
I need to split this file each time "CP START... (7 Replies)
I have a large zone file dump that consists of
; DNS record for the adomain.com domain
data1
data2
data3
data4
data5
CRLF
CRLF
CRLF
; DNS record for the anotherdomain.com domain
data1
data2
data3
data4
data5
data6
CRLF (7 Replies)
Hi,
I need help to split lines from a file into multiple files.
My input looks like this:
13
23 45 45 6 7
33 44 55 66 7
13
34 5 6 7 87
45 7 8 8 9
13
44 55 66 77 8
44 66 88 99 6
I want to split every 3 lines from this file to be written to individual files. (3 Replies)
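The fixed-size split described in this request can be sketched with awk as well. This is only an illustration under stated assumptions: the function name split_every, the chunk size argument, and the part_N.txt output naming are all hypothetical choices made here, not something given in the thread.

```shell
#!/bin/sh
# split_every FILE [N] [PREFIX]
# Sketch only: writes every N lines of FILE (default 3) to its own file,
# named PREFIX1.txt, PREFIX2.txt, ... (PREFIX defaults to "part_").
split_every() {
    infile=$1
    n=${2:-3}
    prefix=${3:-part_}

    awk -v n="$n" -v prefix="$prefix" '
        (NR - 1) % n == 0 {                  # first line of each chunk
            if (out != "") close(out)        # release the previous file
            out = prefix (++count) ".txt"
        }
        { print > out }
    ' "$infile"
}
```

Called as split_every input.txt 3, the nine sample lines above would land in part_1.txt, part_2.txt and part_3.txt, three lines each. Closing each file before opening the next keeps the number of open files at one, so the chunk count is not limited by awk's open-file limit.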
I have one large file; after every 200 lines I have to split the file and then add a header and footer to each small file.
Is it possible to add a different header and footer to each file? (7 Replies)