Splitting a file based on two patterns


 
# 1  
Old 05-01-2009

Hi there,

I've an input file as follows:

*START
1001 a1
1002 a2
1003 a3
1004 a4
*END
*START
1001 b1
1002 b2
1004 b4
*END
*START
1001 c1
1004 c4
*END

I would like to split this file into three files: the first file should contain all the rows between the first *START and *END pair, the second file the rows between the second pair, and so on.

My output will be three files:

file1
1001 a1
1002 a2
1003 a3
1004 a4

file2
1001 b1
1002 b2
1004 b4

file3
1001 c1
1004 c4


Any assistance is greatly appreciated.
# 2  
Old 05-01-2009
Code:
$
$ cat kbirde.txt
*START
1001 a1
1002 a2
1003 a3
1004 a4
*END
*START
1001 b1
1002 b2
1004 b4
*END
*START
1001 c1
1004 c4
*END
$
$ perl -ne 'BEGIN {undef $/; $i=1;}
> {
>   while (/\*START.(.*?)\*END/gs) {
>     open (F,">file".$i++); print F $1; close(F);
>   }
> }' kbirde.txt
$
$ cat file1
1001 a1
1002 a2
1003 a3
1004 a4
$
$ cat file2
1001 b1
1002 b2
1004 b4
$
$ cat file3
1001 c1
1004 c4
$

Hope that helps,
tyler_durden

______________________________________________
"Only after disaster can we be resurrected."
# 3  
Old 05-01-2009
Hi Tyler,

Thanks a lot. Can this be written in awk (bash shell)?

Regards,
kbirde
# 4  
Old 05-03-2009
Yes, sure, you can definitely use awk to do it as well.

Try this. It should create as many files as there are *START/*END pairs.

Code:
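awk '/^\*START/ {
    out = "file" (++i); s = ""; n = 0    # new output file, empty buffer
    # collect lines up to *END; stop at end of input so a missing *END cannot loop forever
    while ((getline) > 0 && $0 !~ /^\*END/)
        s = (n++ ? s "\n" : "") $0
    if (n) print s > out                 # skip empty blocks
    close(out)
}' filename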


cheers,
Devaraj Takhellambam
# 5  
Old 05-04-2009
Hello Devaraj,

Thanks so much for the assistance.

Cheers,
kbirde
# 6  
Old 05-04-2009
Another approach:

Code:
awk '/^\*START/{p=1;i++;next}
/^\*END/{p=0;next}
p{print > ("file" i)}' file
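
For inputs with many blocks, a variant of the same flag approach can close each output file once its block ends, since awk implementations typically limit how many files may be open at once. The following is only a sketch under stated assumptions: the temporary directory, the input name input.txt and the output names part1, part2, ... are illustrative and do not come from the thread.

```shell
#!/bin/sh
# Sketch: split *START/*END blocks into part1, part2, ... (hypothetical names).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample input from post #1.
cat > input.txt <<'EOF'
*START
1001 a1
1002 a2
1003 a3
1004 a4
*END
*START
1001 b1
1002 b2
1004 b4
*END
*START
1001 c1
1004 c4
*END
EOF

awk '
/^\*START/ {                      # a new block begins
    if (out != "") close(out)     # previous block had no *END; close it anyway
    n++
    out = "part" n
    printf "" > out               # create the file even if the block is empty
    next
}
/^\*END/   { close(out); out = ""; next }   # block finished, release the file
out != ""  { print > out }                  # copy lines only while inside a block
' input.txt

ls part*
```

Lines outside any *START/*END pair are dropped, which matches the requested output; whether that is the right behaviour for other inputs is an assumption.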

# 7  
Old 05-05-2009
perl:
Code:
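use strict;
use warnings;

$/ = "*END\n";                       # read one *START ... *END block per record
open my $fh, '<', 'a.txt' or die "a.txt: $!";
while (my $block = <$fh>) {
	chomp $block;                    # drop the trailing "*END\n"
	$block =~ s/^\*START\n//;        # drop the leading marker line
	open my $out, '>', "file$..txt" or die "file$..txt: $!";
	print $out $block;
	close $out;
}
close $fh;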

awk:
Code:
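nawk 'BEGIN { n = 1 }
/^\*START/ { next }                      # skip the opening marker
/^\*END/   { close(file); n++; next }    # finish this block, move to the next file
{
    file = sprintf("%s.txt", n)
    print > file
}
' a.txt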
