Splitting textfile based on pattern and name new file after pattern


 
# 1  
Old 03-29-2014

Hi there, I am pretty new to these things, so I couldn't figure out how to solve this, or whether it is actually easy. I just found that awk could help.

So I have a text file with strings and numbers (originally copy-pasted from Word, hence some empty cells) in the following structure:

Code:
SC 3 5 6
<empty> 5 6 7 
<empty> 7 6 2
<empty><empty><empty>
SP 7 2 3
<empty> 9 6 7 
<empty> 7 5 2
<empty><empty><empty>
...

There are 6 such strings in total. Each one marks the first line of a new text file, and the file should be named after the string.
So for example for SC, I would like a text file SC.txt, with content:
Code:
3 5 6
5 6 7
7 6 2

The 6 text files shouldn't contain any stray whitespace, since another program will read them and I want to make sure this won't cause trouble.
(Ah, the number of lines is always 6, except for the last string, if that helps.)

Ideally I am looking for an easy solution in bash or Python which I can understand. My files are not big...

Thank you for any suggestions.

Last edited by Scrutinizer; 03-29-2014 at 04:58 PM.. Reason: code tags
# 2  
Old 03-29-2014
Here is an awk solution:

Code:
awk '{if ($1 !="<empty>") fn=$1; print $2,$3,$4 > fn".txt"}' file.txt

Code:
cat SC.txt
3 5 6
5 6 7
7 6 2

Code:
cat SP.txt
7 2 3
9 6 7
7 5 2

# 3  
Old 03-29-2014
mjf's solution is close, but it doesn't deal with those <empty><empty><empty> lines. Please tell us
- how the fields are separated, especially in the <empty><empty><empty> lines
- what <empty> means
You might want to post a (short) binary listing of a sample file (use od or hexdump).
# 4  
Old 03-29-2014
My awk solution writes blank lines to the file <empty><empty><empty>.txt. Once luja explains how the fields are separated, we can make the proper adjustments.
# 5  
Old 03-29-2014
Sorry, this was confusing. The <empty> just means there is whitespace there, so the solution does not work: there is no literal "<empty>" in the file. The file looks like this:

Code:
SC    32.245    24.153    1
    213.179    24.154    1
    368.069    24.154    1
    464.704    24.153    1
    742.256    24.154    1
    871.102    24.154    1
            
SP    122.712    24.153    1
    245.391    24.153    1
    309.814    24.154    1
    587.383    24.153    1
    710.061    24.154    1
    838.891    24.153    1

Uh, hexdump, OK, I did that (in the terminal I just typed that followed by the file name, yes?!). It gave me something that doesn't look much like the file, but OK, if that helps:
Code:
0000000 53 43 09 30 2e 30 33 33 09 32 34 2e 31 35 33 09
0000010 31 0d 0a 09 32 37 33 2e 36 32 31 09 32 34 2e 31
0000020 35 33 09 31 0d 0a 09 34 30 32 2e 34 36 36 09 32
0000030 34 2e 31 35 33 09 31 0d 0a 09 34 39 30 2e 39 32
0000040 36 09 32 34 2e 31 35 34 09 31 0d 0a 09 36 36 37
0000050 2e 38 38 31 09 32 34 2e 31 35 33 09 31 0d 0a 09
0000060 38 32 30 2e 37 38 30 09 32 34 2e 31 35 33 09 31
0000070 0d 0a 09 09 09 0d 0a 53 50 09 31 32 30 2e 37 32
0000080 32 09 32 34 2e 31 35 33 09 31 0d 0a 09 32 34 31

....

---------- Post updated at 06:31 PM ---------- Previous update was at 06:21 PM ----------

I just realised that the empty fields actually do not matter. When I used awk '{print $1}' I got the first entry in each line, whether or not there was whitespace at the beginning of the line. So I guess that makes the solution easier: it should write columns 2-4 if the line starts with the string, and then continue with columns 1-3 until the next string. Does that make sense?

Last edited by Don Cragun; 03-29-2014 at 08:52 PM.. Reason: Add CODE tags.
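
Since a Python solution was also asked for, here is a minimal sketch of the rule described above: a 4-field line starts a new file named after its first field, 3-field lines continue it, and lines with no fields are dropped. The names `split_blocks` and `write_blocks` are made up for illustration. The sample assumes tab-separated fields and \r\n line endings as in the hexdump, and that each label occurs only once.

```python
import tempfile
from pathlib import Path


def split_blocks(text):
    """Group rows of numbers under the label that starts each block."""
    blocks = {}
    current = None
    for line in text.splitlines():
        fields = line.split()  # splits on tabs/spaces and drops a trailing \r
        if len(fields) == 4:   # label + 3 numbers: start a new block
            current = fields[0]
            blocks[current] = [fields[1:]]
        elif len(fields) == 3 and current is not None:
            blocks[current].append(fields)
        # anything else (e.g. a tab-only separator line) is skipped
    return blocks


def write_blocks(blocks, out_dir):
    """Write each block to <label>.txt with single spaces between columns."""
    for label, rows in blocks.items():
        body = "".join(" ".join(row) + "\n" for row in rows)
        (Path(out_dir) / f"{label}.txt").write_text(body)


# Assumed sample: a few rows from the post, with tabs and \r\n restored.
sample = (
    "SC\t32.245\t24.153\t1\r\n"
    "\t213.179\t24.154\t1\r\n"
    "\t\t\t\r\n"
    "SP\t122.712\t24.153\t1\r\n"
    "\t245.391\t24.153\t1\r\n"
)

out_dir = tempfile.mkdtemp()
blocks = split_blocks(sample)
write_blocks(blocks, out_dir)
print(sorted(p.name for p in Path(out_dir).iterdir()))
```

For a real file, something like `split_blocks(Path("file.txt").read_text())` should work the same way. Everything is held in memory first, which seems fine here since the files are not big.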
# 6  
Old 03-29-2014
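luja, try the below as this seems to fit your request. It first strips the carriage return at the end of each line (your hexdump shows 0d 0a line endings), so the tab-only separator lines end up with no fields and are skipped:

Code:
awk '{sub(/\r$/, "")} NF==4 {fn=$1".txt"; print $2,$3,$4 > fn; next} NF==3 && fn {print $1,$2,$3 > fn}' file.txt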

# 7  
Old 03-29-2014
You might need to add a close() when done with each file, to avoid reaching the open file descriptor limit.

Regards,
Alister
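
The same idea (close each output file before starting the next, so only one is open at a time) can be sketched in Python. This is an illustration rather than a translation of the awk code: `split_stream` is a made-up name, and the sample rows are taken from the post with tabs and \r\n assumed.

```python
import os
import tempfile


def split_stream(lines, out_dir):
    """Write each block to <label>.txt, keeping one file open at a time."""
    out = None
    try:
        for line in lines:
            fields = line.split()      # also drops the trailing \r\n
            if len(fields) == 4:       # label + 3 numbers: new output file
                if out is not None:
                    out.close()        # done with the previous block
                out = open(os.path.join(out_dir, fields[0] + ".txt"), "w")
                print(*fields[1:], file=out)
            elif len(fields) == 3 and out is not None:
                print(*fields, file=out)
    finally:
        if out is not None:
            out.close()                # close the last file as well


# Assumed sample rows (values from the post).
sample_lines = [
    "SC\t32.245\t24.153\t1\r\n",
    "\t213.179\t24.154\t1\r\n",
    "\t\t\t\r\n",
    "SP\t122.712\t24.153\t1\r\n",
]

stream_dir = tempfile.mkdtemp()
split_stream(sample_lines, stream_dir)
print(sorted(os.listdir(stream_dir)))
```

With only 6 output files the descriptor limit is unlikely to matter, but closing as you go costs nothing and also works when the input has many blocks.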