Help with file splitting


 
# 1  
Old 06-23-2010
Help with file splitting

Hey everyone,
I would really appreciate some help with organizing some data. I have multiple FASTA files, each containing several different pieces of information. I want to split each file into its parts and save each part in its own file. Here is an example file:

Code:
>mhp001
atgcaaacaaataaaaataatttaaaggttagaacacagcaaattaggcaacaaattga
aaatttattaaatgatcgaatgttgtataacaacttttttagcacaatttatgtactca
acgagacagaaactgaaattattatagattttacagacttaatcgcgaaacaggaagtg
atttcacgctgagttgatacggttgaaaaagctattaaaaatcttgaaatttcaaaaat
cctaacttttaacaatacaaataattataccattaattcaaaagaaagccaaaactttt
ccataaaaaataaatattgcagctttaatattaacaatgttttaaacaaatttaccttt
agaaattttataaaatcaagttataatttccaaatttttagtatttatgacgcaatagt
cgcaaattcaagactaaattactcaccaatttttatttcaggaccatcaggaattggaa
aaacgcattttattaatgcgattggaaatttacttgtagaaaaacagaagaaagttttc
tacattaacgactataaatttatcagttgcgtttcttcctggatgcaaaatggtcaaaa
tgaaaaaattagtgaatttttaaactgattgtctcaagttgacgcttttctttttgatg
atatccaaggtttggctaacaaacaacaaacttcaattgttgcacttgaaattttaaat
agatttatcgaagaggataaaacagtgataataacatctgataaatcgccttctttact
tggtggatttgaagaaagatttataacgcgatttagttcagggttgcacattaaattaa
acaagccgaaaaaagaagactttttgcggatttttaagcataaattagttgaagaaaaa
ttagaaaaacatatttgaacaaatgatgcttttgaatttttgtcaaagcattttcgaaa
ttcgattcgtgagcttgaaggtgcgctaaaatcaattgttttttatatccaaacaaata
aaaataaatttgaggatgaaatttattttgataagaaaaaaatgtttgaaatttttgtt
gaaaaatatgaaatcgaacaaacaatcacccctgatttaatcattgaggttgtctcaaa
atattatggcgtctcaattttagatataaaaagtgaaaaaagaggcaaaaatattgtgc
atgcccgcgatattgcaatctgattaattaaaaatattctggatttaactcataatagc
gtaggaactttttttaacaacagaagacattcaacaataatttctacccttaaaaaaat
tgatactttaaaacaaagcaacaataatgaacttgaaattgcccttaaccatatttata
aacaattaaactgaagttttaaacagcgaaaataa
>mhp001_mycohypo
atgcaaacaaataaaaataatttaaaggttagaacacagcaaattagAcaacaaattga
aaatttattaaatgatcgaatgttgtataacaacttttttagcacaatttatgtactca
acgagacagaaactgaaattattatagattttacagacttaatcgcgaaacaggaagtg
atttcacgctgagttgatacggttgaaaaagctattaaaaatcttgaaatttcaaaaat
cctaacttttaacaatacaaataattataccattaattcaaaagaaagTcaaaactttt
ccataaaaaataaatattgcagctttaatattaacaatgttttaaacaaatttaccttt
GgaaaCtttataaaatcaagttataatttccaaatttttagtatttatgaTgcaatagt
cgcaaattcaCgactaaattactcaccaatttttatttcaggaccatcaggaattggaa
aaacgcattttattaatgcgattggaaatttacttgtGgaaaaacagaagaaaAttttc
tacattaaTgactataaatttatcagCtgcgtttcttcctggatgcaaaaCggtcaaaa
tgaaaaaattagtgaatttttaaactgattgtctcaagttgacgcttttctttttgatg
atatccaaggtttggctaacaaacaacaaacttcaattgttgcGcttgaaattttaaat
agatttatcgaagaggataaaGTagtgataataacatctgaCaaatcAccttctttact
tggtggatttgaagaaagatttataacTcgatttagttcagggttgcacattaaattaa
acaagccgaaaaaagaGgactttttgcggatttttaagcataaattagttgaagaaaaa
ttagaaaaacatatttgaacaaatgatgcttttgaatttttgtcaaaAcattttcgaaa
ttcgattcgCgagcttgaaggtgcgctaaaatcaattgttttttatatccaaacaaata
aaaataaatttgaAAatgaaatttattttgataagaaaaaaatgtttgaaatttttgtt
gaaaaatatgaaatcgaacaaacaatTacccctgatttaatcattgaggttgtctcaaa
atattatggcgtctcaattttagatataaaaagtgaaaaaagaggcaaaaatattgtgc
atgcccgcgatattgcaatctgattaattaaaaatattctggatttaactcataatagc
gtaggaactttttttaacaacagaagacattcaacaataatttctacccttaaaaaaat
tgatactttaaaacaaagcaacaataatgaacttgaaattgcccttaaccatatttata
aacaattaaactgaagttttaaacagcgaaaataa

What I want to do is split the file at the '>' marks and put each piece into a different file. I'm new to Linux, so I really have no idea how to write a script for this.

Thanks for your help!

Last edited by vgersh99; 06-23-2010 at 04:38 PM.. Reason: code tags, please!
# 2  
Old 06-23-2010
Code:
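awk '
    # Header line: close the previous output file, then name the next one
    # after the text that follows ">".
    /^>/ { if (length(f) > 0) close(f); f = substr($0, 2) }
    # Write every line (header included) to the current output file;
    # lines before the first header are skipped because f is still empty.
    length(f) > 0 { print $0 > f }
' filename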

This creates files named after the string following the ">" on each header line.
# 3  
Old 06-23-2010
Ok!

That works great! Now how do I do that for, say, 700 files, and put the generated files into separate folders according to which file they came from?
# 4  
Old 06-23-2010
Hint: tell us everything you want at the very beginning; it is easier for all of us.

Define what you want by directory, i.e., how do you determine the output directory name from the originating directory?
# 5  
Old 06-23-2010
Sorry

Sorry about that, this is my first time on here, so I didn't think my question through as much as I should have.

Also, I'm not sure what you're asking. Basically, each directory created should be named the same as the original file, so that all of that file's components are held within it.
# 6  
Old 06-23-2010
You want the splits to go into the same directory the file came from.
/path/to/files is the top directory for your file tree.
Code:
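#!/bin/ksh
# Split one file into one output file per ">" header line.
awkit()
{
    awk '
        /^>/ { if (length(f) > 0) close(f); f = substr($0, 2) }
        length(f) > 0 { print $0 > f }
    ' "$1"
}

# Visit every directory under the top of the tree.
find /path/to/files -type d |
while read -r dirname
do
    (   # subshell, so the cd does not carry over to the next directory
        cd "$dirname" || exit
        for i in *
        do
            [[ -f $i ]] && awkit "$i"
        done
    )
done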
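
If the splits should instead land in a folder named after each source file, as asked earlier in the thread, something like the following might work. It is only a minimal sketch under assumptions that are not from the original posts: the inputs are taken to end in .fasta, the folder is named after the file with that suffix removed, and the sample files and scratch directory exist purely for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample inputs standing in for the real files.
printf '>a1\natgc\n>a2\nggca\n' > first.fasta
printf '>b1\nttga\n' > second.fasta

for src in *.fasta
do
    [ -f "$src" ] || continue
    outdir=${src%.fasta}        # assumption: first.fasta -> folder "first"
    mkdir -p "$outdir"
    # Same split as above, but each record is written inside the folder.
    awk -v dir="$outdir" '
        /^>/ { if (f != "") close(f); f = dir "/" substr($0, 2) }
        f != "" { print > f }
    ' "$src"
done
```

Run from the directory that holds the real files (without the scratch-directory and sample lines), this would leave one folder per input file, each holding one file per ">" header.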

# 7  
Old 06-24-2010
One more thing

OK, thanks for all your help, it is working well. The only problem is that when I use the script, each output is just a plain "file" with no extension. I didn't think the kind of file mattered, but I am trying to do some things with a program that needs all of the output to be in FASTA format (.fa). So, how does the script you gave need to be changed so that it writes .fa files?
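
One possible way to get the .fa extension is to append it where the awk code builds the output file name. The following is a minimal sketch, not a tested change to the script above; the scratch directory and the two-record sample file are hypothetical and only there to make the example self-contained.

```shell
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-record sample standing in for a real FASTA file.
printf '>mhp001\natgc\nggca\n>mhp001_mycohypo\natga\n' > sample.txt

# Same split as before, but ".fa" is appended to each output name.
awk '
    /^>/ { if (f != "") close(f); f = substr($0, 2) ".fa" }
    f != "" { print > f }
' sample.txt

ls
```

In the awkit function, the equivalent change would be to replace `f=substr($0,2)` with `f = substr($0, 2) ".fa"`; the contents of the files are unchanged, only their names gain the extension.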
 