Split a file


 
# 1  
08-23-2011
Split a file

Hi all,

A file, reports.txt (see attachment), contains 17 pages of patient reports. Each patient is identified by a prefix (i.e. 11) and a 7-digit number. There are six patient reports in the file in total, and one patient report may span multiple pages. The page count for each Lab No. (the seven-digit number) is:
Code:
Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1394304 Page count 1
Lab. No:11 1394305 Page count 1
Lab. No:11 1394306 Page count 5
Lab. No:11 1394301 Page count 1

I am looking for an awk or perl solution to split the file by the 7-digit number. The expected file name is the prefix (i.e. 11) followed by the 7-digit number:
Code:
111713951.txt (Should contain 4 pages)
111701269.txt (5 pages)
111394304.txt (1 page)
111394305.txt (1 page)
111394306.txt (5 pages)
111394301.txt (1 page)

So the 17 pages should produce 6 individual files, one per 7-digit number.

Could one of you please give me a hand?

Note: a sample file (reports.txt) is attached for your reference.

Regards - Sraj142

Moderator's comment: Please use code tags, thanks.

Last edited by sraj142; 08-23-2011 at 06:12 AM. Reason: Split a file
# 2  
08-23-2011
What counts as a "page" depends on your paper and font, so I can't tell whether I get the right number of pages, but this splits the file the way you asked.

Code:
nawk '{ print > "11" $3 ".txt" }' < file.txt
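For reference, a self-contained run of that one-liner against the six sample lines from the question. It writes the sample to a scratch file.txt first and uses plain awk instead of nawk, on the assumption that any POSIX awk will do:

```shell
#!/bin/sh
# Recreate the sample lines from the question in a scratch file.
cat > file.txt <<'EOF'
Lab. No:11 1713951 Page count 4
Lab. No:11 1701269 Page count 5
Lab. No:11 1394304 Page count 1
Lab. No:11 1394305 Page count 1
Lab. No:11 1394306 Page count 5
Lab. No:11 1394301 Page count 1
EOF

# Field 3 is the 7-digit lab number; each line goes to "11<number>.txt".
# Parentheses keep the file-name concatenation unambiguous in every awk.
awk '{ print > ("11" $3 ".txt") }' < file.txt

# One file per lab number should now exist.
ls 11*.txt
```

The parentheses around the file name are a portability choice: POSIX leaves an unparenthesized concatenation after ">" ambiguous, although gawk and most other awks read it as intended.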

[edit] Okay, your actual data is nothing like the data you showed in your post. Working on it.

---------- Post updated at 03:08 PM ---------- Previous update was at 02:33 PM ----------

The data was so scrambled that it took a while to see any pattern. This looks for "Lab." in each page and takes the lab number that follows it. If a page contains no "Lab.", it uses the last number it found.

Code:
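awk 'BEGIN { RS="-\\*-" }       # one record per "-*-"-delimited page (gawk/mawk)

{       # Find the "Lab." field; the ";" gives the loop an empty body.
        for(N=1; (N<=NF)&&($N != "Lab."); N++) ;

        if($N == "Lab.")        # the lab number is two fields later
        {
                N+=2;
                FILE="11" $N ".txt";
        }

        # Pages with no "Lab." go to the last file found.
        if(FILE) print > FILE;       }' < reports.txt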
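A self-contained check of the page-splitting approach on made-up data. The three "pages", their -*- separator lines, and the file name sample_reports.txt are assumptions, since the real attachment is not shown. Note that a for loop whose body is the if can never reach "Lab.", because the loop condition stops exactly there; this sketch ends the loop with an empty statement (";") so the test runs after the scan. A multi-character RS is treated as a regex by gawk and mawk, while POSIX awk leaves it unspecified.

```shell
#!/bin/sh
# Hypothetical sample: three "pages" separated by lines containing -*-.
# The second page has no "Lab." field, so it belongs to the previous patient.
cat > sample_reports.txt <<'EOF'
Lab. No:11 1713951 report page 1
-*-
continued text for the same patient
-*-
Lab. No:11 1394304 report page 1
EOF

# Multi-character RS is a regex in gawk and mawk; POSIX leaves it unspecified.
awk 'BEGIN { RS = "-\\*-" }
{
        # Scan for the "Lab." field. The lone ";" is the (empty) loop body,
        # so the test below runs after the scan instead of inside it.
        for (N = 1; N <= NF && $N != "Lab."; N++) ;

        if ($N == "Lab.") {
                N += 2                    # the lab number is two fields later
                FILE = "11" $N ".txt"
        }

        # A page without "Lab." goes to the most recently seen file.
        if (FILE) print > FILE
}' < sample_reports.txt
```

Afterwards 111713951.txt should hold the first two pages and 111394304.txt only the third.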

# 3  
08-24-2011
Hi Corona688,

Thanks a lot for giving me a hand. So far I have copied your code to a file called yy in the same directory as a copy of reports.txt. When I run "awk yy", it doesn't do anything; it has been sitting there for the last 15 minutes. Could you please check whether I am using the wrong command?

Regards
# 4  
08-24-2011
This is meant for the command line. If you want to use it as a script, the simplest way is to run it as
Code:
sh yy

And to save output to OUTPUTFILE:
Code:
sh yy >OUTPUTFILE

# 5  
08-24-2011
Quote:
Originally Posted by sraj142
Thanks a lot for giving me a hand. So far I have copied your code to a file called yy in the same directory as a copy of reports.txt. When I run "awk yy", it doesn't do anything; it has been sitting there for the last 15 minutes. Could you please check whether I am using the wrong command?
You waited 15 minutes? Wow, that's patience; it ought to finish nearly instantly.

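awk doesn't work that way: "awk yy" treats the word yy as the awk program itself and then waits for input on standard input, which is why nothing happened. I suggest you type what I posted into an actual shell, or put it in a shell script.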
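To illustrate the difference, a minimal sketch: the first command shows that "awk yy" takes the word yy as the program text, and the rest keeps an awk program in a file containing only awk code, run with "awk -f". The names split.awk and sample.txt are hypothetical. Running "sh yy" also works, as long as yy holds the whole shell command exactly as posted, quotes and all.

```shell
#!/bin/sh
# "awk yy" uses the word yy itself as the awk program: a pattern that is never
# true (yy is an unset variable) and no action. With no file operands it reads
# standard input, which at a terminal means it just waits. Given input, it
# prints nothing:
printf 'Lab. No:11 1713951 Page count 4\n' | awk yy

# Hypothetical program file holding ONLY awk code, with no shell quoting.
cat > split.awk <<'EOF'
{ print > ("11" $3 ".txt") }
EOF

# Hypothetical one-line input, then run the program file with -f.
printf 'Lab. No:11 1713951 Page count 4\n' > sample.txt
awk -f split.awk sample.txt
cat 111713951.txt
```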

Last edited by Corona688; 08-24-2011 at 10:44 AM.
# 6  
08-25-2011
Hi yazu/Corona688,

As both of you suggested, I put the same code in a shell script and ran it with sh yy, and I have also tried it from the command line. This time it finished instantly, but it produced no output and no error.
# 7  
08-25-2011
Well, the solution doesn't work. (And my second suggestion, about saving the output, was wrong; I wasn't paying attention, sorry.)

Your file was produced by some text processor, not a text editor, and it contains a lot of special escape sequences. Can you export it to plain text from that text processor?
If not, it will be hard to give you a solution: splitting the file would take some binary hacking to find where each chunk begins and ends.
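If the file cannot be re-exported, one thing worth trying before any binary hacking is stripping the escape sequences with standard tools. A minimal sketch, assuming the sequences are ANSI/VT100-style "ESC [ ... letter" codes; the real reports.txt is not shown here, so it may use a different printer language (PCL, for example) that this will not catch. The file names raw.txt and plain.txt are hypothetical.

```shell
#!/bin/sh
esc=$(printf '\033')

# Hypothetical input: text wrapped in ANSI bold on/off codes, with a CRLF ending.
printf 'Lab. No:11 %s[1m1713951%s[0m\r\n' "$esc" "$esc" > raw.txt

# 1. Remove ESC [ <digits/semicolons> <letter> sequences.
# 2. Keep tab, newline and form feed (form feeds may mark page breaks);
#    delete the other control characters, such as CR.
sed "s/${esc}\[[0-9;]*[A-Za-z]//g" raw.txt | tr -d '\000-\010\013\015-\037' > plain.txt

cat plain.txt
```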

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

sed awk: split a large file to unique file names

Dear Users, Appreciate your help if you could help me with splitting a large file > 1 million lines with sed or awk. below is the text in the file input file.txt scaffold1 928 929 C/T + scaffold1 942 943 G/C + scaffold1 959 960 C/T +... (6 Replies)
Discussion started by: kapr0001

2. Shell Programming and Scripting

How to split file into multiple files using awk based on 1 field in the file?

Good day all I need some helps, say that I have data like below, each field separated by a tab DATE NAME ADDRESS 15/7/2012 LX a.b.c 15/7/2012 LX1 a.b.c 16/7/2012 AB a.b.c 16/7/2012 AB2 a.b.c 15/7/2012 LX2 a.b.c... (2 Replies)
Discussion started by: alexyyw

3. Shell Programming and Scripting

Split file based on file size in Korn script

I need to split a file if it is over 2GB in size (or any size), preferably split on the lines. I have figured out how to get the file size using awk, and I can split the file based on the number of lines (which I got with wc -l) but I can't figure out how to connect them together in the script. ... (6 Replies)
Discussion started by: ssemple2000

4. Shell Programming and Scripting

awk to split one field and print the last two fields within the split part.

Hello; I have a file consists of 4 columns separated by tab. The problem is the third fields. Some of the them are very long but can be split by the vertical bar "|". Also some of them do not contain the string "UniProt", but I could ignore it at this moment, and sort the file afterwards. Here is... (5 Replies)
Discussion started by: yifangt

5. Shell Programming and Scripting

Split a file into multiple files based on first two digits of file.

Hi , I do have a fixedwidth flatfile that has data for 10 different datasets each identified by the first two digits in the flatfile. 01 in the first two digit position refers to Set A 02 in the first two digit position refers to Set B and so on I want to genrate 10 different files from my... (6 Replies)
Discussion started by: okkadu

6. Shell Programming and Scripting

Split File by Pattern with File Names in Source File... Awk?

Hi all, I'm pretty new to Shell scripting and I need some help to split a source text file into multiple files. The source has a row with pattern where the file needs to be split, and the pattern row also contains the file name of the destination for that specific piece. Here is an example: ... (2 Replies)
Discussion started by: cul8er

7. Shell Programming and Scripting

How to split a data file into separate files with the file names depending upon a column's value?

Hi, I have a data file xyz.dat similar to the one given below, 2345|98|809||x|969|0 2345|98|809||y|0|537 2345|97|809||x|544|0 2345|97|809||y|0|651 9685|98|809||x|321|0 9685|98|809||y|0|357 9685|98|709||x|687|0 9685|98|709||y|0|234 2315|98|809||x|564|0 2315|98|809||y|0|537... (2 Replies)
Discussion started by: nithins007

8. Shell Programming and Scripting

Split one file to Multiple file with report basis in unix

Hi, Please help on this. i want split the below file(11020111.CLT) to more files with some condition. :b: 1) %s stating of the report 2) %e ending of the report example starting of the report: %sAEGONCA| |MUMBAI | :EXPC|N|D ending of the report %eAEGONCA| |MUMBAI | :EXPC 3)so the... (10 Replies)
Discussion started by: krbala1985

9. Shell Programming and Scripting

Split large file and add header and footer to each file

I have one large file, after every 200 line i have to split the file and the add header and footer to each small file? It is possible to add different header and footer to each file? (1 Reply)
Discussion started by: ashish4422

10. UNIX for Dummies Questions & Answers

Split a file with no pattern -- Split, Csplit, Awk

I have gone through all the threads in the forum and tested out different things. I am trying to split a 3GB file into multiple files. Some files are even larger than this. For example: split -l 3000000 filename.txt This is very slow and it splits the file with 3 million records in each... (10 Replies)
Discussion started by: madhunk