Splitting a file based on context.


 
# 1  
Old 12-02-2010

I have a file like the one shown below and would like to split it based on the context of the data.
For example, the content between "---- XXX Info ----" and "---- YYY Info ----" should go to its own file.

When I try the command below, the 2nd file contains everything starting from the first "---- YYY Info ----" line.
Code:
csplit -ks pfm.txt '%XXX Info%' '/^---- YYY Info ----/' {2}

Any suggestions on how to split out only the required data, as described above? Sample input:
Code:
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----

---------- Post updated at 03:28 PM ---------- Previous update was at 03:26 PM ----------

For clarification:

I need output files like:
file 1:
Code:
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd

file 2:
Code:
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee

file 3:
Code:
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee


# 2  
Old 12-02-2010
Code:
awk '/^----/{f="file"(++c)".txt"}{print $0 > f}' input

Code:
$ cat in
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
$ awk '/^----/{f="file"(++c)".txt"}{print $0 > f}' in
$ ls *.txt
file1.txt       file2.txt       file3.txt       file4.txt       file5.txt
$ cat file4.txt
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
$
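
The command above writes every section, XXX and YYY alike, to its own file (five files for this input), whereas the original request was for the XXX sections only. A minimal sketch of one way to get that, assuming the header lines look exactly like the ones in the sample. The names `out`, `n` and `in.txt` and the shortened sample data are made up for illustration:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

# Shortened stand-in for the sample input from the first post.
cat > in.txt <<'EOF'
---- XXX Info ----
x1
x2
---- YYY Info ----
y1
---- XXX Info ----
x3
---- YYY Info ----
y2
EOF

awk '
  # An XXX header starts a new numbered output file.
  /^---- XXX Info ----/ { if (out != "") close(out); out = "file" (++n) ".txt" }
  # A YYY header switches output off until the next XXX header.
  /^---- YYY Info ----/ { if (out != "") close(out); out = "" }
  # Print only while an output file is selected.
  out != ""             { print > out }
' in.txt

ls file*.txt
```

Closing each file when its section ends keeps the number of open files small on large inputs. On Solaris, substitute nawk or /usr/xpg4/bin/awk for awk.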

# 3  
Old 12-02-2010
Thanks for the reply. This works great on small files. However, I am seeing the following problem with a big file.
Code:
# awk '/^----/ {print $2}' testy
Port
RG
LU
# awk '/^----/ {f="file"$2".txt"}{print $0 > f}' testy
awk: can't open file
 record number 1

Any idea what's going on?

---------- Post updated at 05:51 PM ---------- Previous update was at 05:08 PM ----------

Actually, there were a couple of lines at the head of the file before the first line starting with ---- (as shown below). This was causing the problem.
As a workaround, I removed those lines using csplit before running the code you suggested. Is there a better solution for this?
Code:
Kwwww zzzz ccc
Buuuu xxx bbb
---- XXX Info ----
Buuuu xxx bbb
Kmmmm rrr ssss uuuu
Kwwww zzzz ccc
Roooowwww eeee
Bxxxx jjjj dddd
---- YYY Info ----
Kuuuu eeeee nnnn
Rpppp cccc vvvv cccc
Rhhhhhhyyyy tttt
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- YYY Info ----
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
hhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee
---- XXX Info ----
Kyyyyy iiiii wwww
Rwwww rrrr sssss eeee
Rnnnnn xxxxxxccccc
Lhhhh rrrrrssssss
Bffff mmmm iiiii
Ktttt eeeeeee


# 4  
Old 12-02-2010
You can skip the leading lines until the first line matching ^---- appears with this very small modification of the code:
Code:
awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' input
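
Why this works: c is still 0 (false) until the first header line has been seen, so the second rule, guarded by c, prints nothing for the leading lines, and awk never tries to write to an empty file name. A small sketch to check the behaviour, using made-up sample data and a scratch directory:

```shell
# Scratch directory and a tiny input with one stray line before the first header.
cd "$(mktemp -d)"
printf '%s\n' 'stray line' '---- XXX Info ----' 'a' '---- YYY Info ----' 'b' > input

# Same logic as the one-liner above, with spacing added for readability.
awk '/^----/ { f = "file" (++c) ".txt" } c { print $0 > f }' input

# The stray line ends up in no output file.
cat file1.txt
cat file2.txt
```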

# 5  
Old 12-02-2010
I am getting the following error.
Code:
# awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' /tmp/tt
awk: syntax error near line 1
awk: bailing out near line 1

# 6  
Old 12-02-2010
If you are using Solaris, use nawk or /usr/xpg4/bin/awk instead of the default awk.
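
If the same script has to run on both Solaris and other systems, one option is to pick the awk implementation at run time. This is only a sketch: the variable name AWK and the order of preference are assumptions, not something from this thread.

```shell
# Prefer the XPG4 awk, then nawk, then whatever "awk" is on the PATH.
AWK=
for candidate in /usr/xpg4/bin/awk nawk awk; do
  if command -v "$candidate" >/dev/null 2>&1; then
    AWK=$candidate
    break
  fi
done

# Stop with a message if no awk was found at all.
: "${AWK:?no awk implementation found}"
echo "using $AWK"
```

The one-liners in this thread can then be run as "$AWK" '...' input.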
# 7  
Old 12-03-2010
nawk works. However, what should I use as the condition if I want $2 instead of ++c in the file name? That is:

Code:
awk '/^----/ {f="file"$2".txt"}?{print $0>f}' /tmp/tt

instead of
Code:
awk '/^----/{f="file"(++c)".txt"}c{print$0>f}' /tmp/tt

Thanks.
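
One possible answer, offered as a sketch rather than a tested fix for the poster's system: use the file name variable f itself as the guard. It is empty until the first header line sets it, so nothing is printed before that. The sample data below is made up, with header names taken from the $2 values shown earlier in the thread:

```shell
# Scratch directory and a small input with a stray leading line.
cd "$(mktemp -d)"
printf '%s\n' 'stray' '---- Port Info ----' 'p1' '---- RG Info ----' 'r1' \
              '---- Port Info ----' 'p2' > tt

# f stays empty until the first header; print only once it has been set.
awk '/^----/ { f = "file" $2 ".txt" } f != "" { print $0 > f }' tt

cat filePort.txt
cat fileRG.txt
```

Note that sections sharing the same $2 land in the same file: within one awk run, > truncates a file only when it is first opened, so the second Port section is appended to filePort.txt. On Solaris, replace awk with nawk as suggested above.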