Problem with splitting large file based on pattern


 
# 1  
Old 01-29-2012

Hi Experts,

I have to split a huge file into smaller files based on a pattern. The file is expected to follow this pattern:
Code:
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...
Master...
First...
second..
second..
second..
third...
third..
so on...

The sample above contains three blocks. Each block starts with a Master line and runs through the last third line before the next Master line.

If I split this into two files:
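Since every block begins with a Master line, counting those lines gives the number of blocks in the file. A minimal sketch, assuming the sample above is saved under the hypothetical name sample.txt in a scratch directory:

```shell
# Scratch directory and file name are assumptions made for this illustration.
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...
Master...
First...
second..
second..
second..
third...
third..
EOF

# Each block starts with a Master line, so counting them counts the blocks.
# "n + 0" forces a numeric 0 when the file has no Master line at all.
block_count=$(awk '/^Master/ { n++ } END { print n + 0 }' "$workdir/sample.txt")
echo "$block_count"
```

Knowing the block count up front helps when choosing how many blocks to put in each output file.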

My first file should have:
Code:
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...

And my second file should have:
Code:
Master...
First...
second..
second..
second..
third...
third..

I have written the code below, which writes one block to each file, but I want to write multiple blocks to each file.
Code:
awk '/^1/{f=1} f{ print $0 > "file_"n ; c++} c>10000 && /^3/ { n++; c=1; close("file_"n) }' c=1 n=1 testfile

Can you please suggest how to achieve this?

Thanks,
SS


Moderator's Comments:
Mod Comment How to use code tags

Last edited by Franklin52; 01-30-2012 at 04:33 AM.. Reason: Please use code tags for code and data samples, thank you
# 2  
Old 01-30-2012
Try this...
Code:
awk 'BEGIN {blocks=b=5;i=1} /Master/{p=1} p{ print > i".log" } /third/{--p;if(!--b){++i;b=blocks}} ' infile

Set the number of blocks you want in each file in the BEGIN block (the blocks variable, 5 in this example).

--ahamed
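One caveat with counting on the third lines: a block may end with several consecutive third lines, so the end of a block is only known once the next Master line appears. The sketch below is an alternative under that assumption, not the command posted above: it switches output files on Master lines instead. The blocks and prefix variables and the part_N.log names are hypothetical.

```shell
workdir=$(mktemp -d)
cat > "$workdir/infile" <<'EOF'
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...
Master...
First...
second..
second..
second..
third...
third..
EOF

# blocks = number of Master blocks per output file (2 here, to suit the sample).
awk -v blocks=2 -v prefix="$workdir/part_" '
    /^Master/ {
        # Open a new output file before the 1st, (blocks+1)th, ... block.
        if (seen % blocks == 0) {
            if (out != "") close(out)   # release the finished file
            i++
            out = prefix i ".log"
        }
        seen++
    }
    out != "" { print > out }           # lines before the first Master are skipped
' "$workdir/infile"

ls "$workdir"
```

Because the file only changes when a new Master line arrives, every trailing third line stays with its block. Closing each finished file keeps the number of open files at one, which matters when a large input produces many output files.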
# 3  
Old 01-30-2012
Hi Ahamed,

I have modified the command as shown below:
Code:
awk 'BEGIN {blocks=b=5;i=1} /Master/{p=1;p++} p{ print > i".log" } /third/{--p;if(!--b){++i;b=blocks}} ' infile

but this command drops the second occurrence of the third record in a block.

Can you please look into this?

---------- Post updated at 01:50 AM ---------- Previous update was at 12:59 AM ----------

Hi,

I solved this by using the command below. Thanks a lot for looking into this.
Code:
awk '/Master/{n++}{print >> test int((n+100)/100)}' filename

Thanks
SS
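A note on the arithmetic in that last command: int((n+100)/100) is 1 for n = 1 to 99 and becomes 2 at n = 100, so the first output file receives 99 blocks and each later file receives 100. Also, test appears unquoted as posted (the quotes may have been lost in formatting), and awk treats an unquoted test as an empty variable, so the files would be named 1, 2, and so on. The sketch below is a variant under those observations, not the poster's exact command. It uses int((n-1)/per)+1 so every file gets the same number of blocks, and the per and prefix variables and the generated test data are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical test data: 5 blocks of three lines each.
for k in 1 2 3 4 5; do
    printf 'Master %s\nFirst\nthird\n' "$k"
done > "$workdir/filename"

# per = blocks per output file (2 here; the thread uses 100).
awk -v per=2 -v prefix="$workdir/test" '
    /^Master/ { n++ }                   # n is the current block number
    n {
        idx = int((n - 1) / per) + 1    # blocks 1..per -> 1, per+1..2*per -> 2, ...
        out = prefix idx
        if (out != prev) {
            if (prev != "") close(prev) # only one file open at a time
            prev = out
        }
        print > out
    }
' "$workdir/filename"

ls "$workdir"
```

With > the output files are truncated when first opened in a run, whereas >> would keep appending to files left over from an earlier run.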
