Can I split a 10GB file into 1 GB sizes using my repeating data pattern


 
# 1  
Old 07-09-2009

I'm not a Unix guy, so excuse my ignorance... I'm the database ETL guy.

I'm trying to be proactive and devise a plan B for an ETL process where I expect a file 10x larger than what I process daily, for a recast job. The ETL may handle it, but I just don't know.

This file may need to be split, and we don't want to lose related data. I assume it would be easier to do this at the Unix level rather than in the ETL tool, provided the Unix commands have no file size limitations.

The file will most likely be 10 GB, give or take a few GB; the exact size is unknown at this time.

The basic file format is shown below; the first 3 characters are the record type (100, 401, 404, 410, 411).

The file must be split into segments about the size of a daily run, approximately 1 GB each, and every split has to occur just before a 100 record, as all the rows that follow a 100 belong together.

1001104vvbvnbvd
4011104ghghghgh
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
4103445kkjkljlk
4103445kkjkljlk
4113445kkjkljlk
4043445kkjkljlk
10011ffgfgg1250
4011104fffhghgh
404111kjddfjkdf
404111kjdkrtrdf
etc...

Thanks in advance. I think we use HP-UX.
# 2  
Old 07-10-2009
When posting code, data or logs, use CODE tags for better readability and to keep formatting (indentation) etc., ty.

Code:
$> awk '/^100/ {close("file_" z); z++} {print >> ("file_" z)}' z=0 infile
$> cat file_1
1001104vvbvnbvd
4011104ghghghgh
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
404111kjdkfjkdf
4103445kkjkljlk
4103445kkjkljlk
4113445kkjkljlk
4043445kkjkljlk
$> cat file_2
10011ffgfgg1250
4011104fffhghgh
404111kjddfjkdf
404111kjdkrtrdf

Generally, for splitting files just by size, you can use the "split" command if it is available on your OS.
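For reference, a size-only split might look like the sketch below. The names `infile` and `part_` are placeholders, and the tiny sample is built inline only so the snippet can be run as is. Note that `split` knows nothing about record types, so a cut can land in the middle of a 100 group, as it does here.

```shell
# Build a tiny sample file (placeholder data in the thread's record layout).
printf '%s\n' 1001104vvbvnbvd 4011104ghghghgh 404111kjdkfjkdf \
              10011ffgfgg1250 4011104fffhghgh 404111kjddfjkdf > infile

# Cut every 4 lines; on real data this might be something like -l 1000000.
# The pieces are named part_aa, part_ab, ...
split -l 4 infile part_

# Show how many lines ended up in each piece.
wc -l part_*
```

With this sample, the second piece starts with a 401 record, which is exactly the kind of broken group the original question wants to avoid.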
# 3  
Old 07-10-2009
Thanks for your reply.

That would work if I wanted a million tiny files, one for each record segment.

I would like to take the first million rows, cut them just like your script did, and build 10 files from 1 giant file.

I could easily split a file into equal portions, but the split cannot occur in the middle of a transaction.

Could I spool 1 million rows, then split... spool the next million... split... etc.?
# 4  
Old 07-10-2009
Code:
nawk '
   # every chunk-th row, arm a split; it fires at the next 100 record
   !(FNR % chunk) {limit=1}
   FNR==1 || (limit && /^100/) {close(out); out=FILENAME "_" ++cnt; limit=0}
   { print >> out }' chunk=100000 myHugeFile
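Since the question asks for pieces of roughly 1 GB rather than a fixed row count, the same idea can be driven by an approximate byte count. The following is only a sketch under stated assumptions: `max` is a made-up variable name, the sample data is built inline so the snippet runs as is, and the size is an estimate because `length($0)` counts characters, which equals bytes only for single-byte data.

```shell
# Sample input in the thread's layout: three groups, each starting with a 100 record.
printf '%s\n' 1001104vvbvnbvd 4011104ghghghgh 404111kjdkfjkdf \
              10011ffgfgg1250 4011104fffhghgh \
              1002222aaaaaaaa 4102222bbbbbbbb > myHugeFile
rm -f myHugeFile_[0-9]*          # clear pieces left by an earlier run

# max is the target piece size in bytes; tiny here, roughly 1e9 for real data.
awk -v max=40 '
    # Open a new piece on the first line, or at a 100 record once the
    # current piece has reached the target size.
    NR == 1 || (bytes >= max && /^100/) {
        if (out != "") close(out)
        out = FILENAME "_" ++cnt
        bytes = 0
    }
    {
        print > out
        bytes += length($0) + 1   # +1 for the newline
    }' myHugeFile

# Sanity check: every piece should begin with a 100 record.
for f in myHugeFile_[0-9]*; do head -n 1 "$f" | cut -c1-3; done
```

Each piece ends up slightly larger than `max`, because the cut waits for the next 100 record instead of breaking a group. Concatenating the pieces in order should reproduce the original file.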
