Split one file into many


 
# 1  
Old 11-15-2011

Hello, I have one huge file (maybe 20 GB). The format looks like this:
Code:
>chr1
AGCTTCACTTACTAGATCATGTCA
AGTCGTCATGTTTATTTTAACCAC
....
>chr2
gGCTTCACTTACTAGATCATGTCA
TGTCGTCATGTTTATTTTAACCAC
....
>chr23
tGCTTCACTTACTAGATCATGTCA
AGTCGTCATGTTTATTTTAACCAC
....

There are 23 blocks. What I want is to split the file into 23 files, named "chr1", "chr2", ..., "chr23".
Each file should contain only its own block of data.

Thanks.
# 2  
Old 11-15-2011
If you are asking, I presume it's because the split command is not adequate. Can you explain why, so that we can understand the problem and help?
# 3  
Old 11-15-2011
Code:
awk '/^>/ { OUT=substr($0, 2); }; OUT { print > OUT }' < infile

This should create the files chr1, chr2, ..., chr23, each beginning with its own >chrN header line.

---------- Post updated at 10:46 AM ---------- Previous update was at 10:44 AM ----------

If you want to exclude the header line (the one carrying the file name) from the output files:

Code:
awk '/^>/ { OUT=substr($0, 2); }; OUT && !/^>/ { print > OUT }' < filename

# 4  
Old 11-16-2011
The above code works fine.
But it does not handle empty blocks: if a header has no data lines after it, no file is created for it.

Ex: If chr2 has no data, the file chr2 is never created.

~Srk
# 5  
Old 11-16-2011
Code:
awk '/^>/ { OUT=substr($0, 2); printf "" >OUT; }; OUT && !/^>/ { print > OUT }' < filename
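To see what the `printf "" >OUT` is doing, here is the same idea spread over several lines and run on a small made-up input. The file name sample.fa and its contents are assumptions for illustration only, and `next` is used in place of the `!/^>/` test, which should behave the same way. The header line opens (and truncates) the output file, so a block with no sequence lines still leaves an empty file behind:

```shell
# Hypothetical sample input: chr2 has a header but no sequence lines.
printf '%s\n' '>chr1' 'AGCT' 'TTGA' '>chr2' '>chr3' 'GGCC' > sample.fa

# Header line: remember the name, create/truncate that file, skip to the next line.
# Any other line: write it to the file named by the most recent header.
awk '/^>/ { OUT = substr($0, 2); printf "" > OUT; next }
     OUT  { print > OUT }' sample.fa

# Show the byte counts; chr2 should exist with zero length.
wc -c chr1 chr2 chr3
```

Note that the output files are written into the current directory, so try this in a scratch directory.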

# 6  
Old 11-16-2011
I just found that my file also contains some header lines beginning with
Code:
>DDD]]]

or other odd characters.
I want to exclude these and keep only the blocks whose header begins with
Code:
>chr

How can I do that?
# 7  
Old 11-16-2011
Exclude them in what sense? Exclude the entire section following >DDD]]]? Or just ignore the >DDD]]] line and continue writing to the same file?
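For the first reading (drop the whole section under any header that does not begin with >chr), a minimal sketch might look like the following. The input file sample2.fa and its contents are made up for illustration, and it is an assumption that matching on the prefix `>chr` is strict enough for the real data:

```shell
# Hypothetical sample input with one unwanted section between two wanted ones.
printf '%s\n' '>chr1' 'AGCT' '>DDD]]]' 'NNNN' '>chr2' 'GGCC' > sample2.fa

# ">chr..." header: start a new output file named after it.
# Any other ">" header: clear OUT so the lines that follow are discarded.
# Data line: written only while OUT is non-empty.
awk '/^>chr/ { OUT = substr($0, 2); printf "" > OUT; next }
     /^>/    { OUT = ""; next }
     OUT     { print > OUT }' sample2.fa
```

For the second reading (ignore only the odd header line and keep writing to the current file), changing the middle rule to `/^>/ { next }` should be enough. If a header such as >chrDDD]]] could also occur, a stricter pattern like `/^>chr[0-9]+$/` may be needed.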