Breaking a fasta formatted file into multiple files containing each gene separately

03-16-2012

Hey,

I've been trying to break a massive FASTA-formatted file into separate files, one per gene. Could anyone help me? I've tried the following code, but I've received errors every time:

Code:

for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i 
done

Last edited by Corona688; 03-16-2012 at 12:47 PM.

Ann Mc Cartney

03-16-2012

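The first error is that ".out" is not in quotes. awk treats unquoted text as code rather than as a string, so the program fails to parse.

Shell variables also cannot be used inside a single-quoted awk program in that manner. The shell never expands $i there; awk sees it as a field reference, so if an awk variable i held 1, $i would mean field one, not the shell's loop variable. Fortunately you don't need the shell variable in the first place, since awk can handle multiple files by itself.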
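
If a shell value is needed inside awk after all, the usual route is the -v option, which copies it into an awk variable before the program runs. A minimal sketch, assuming a made-up sample file genes.rtf.out; the awk variable names prefix, out and n are my own, not from the original post:

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: two FASTA records.
printf '>gene1\nACGT\n>gene2\nTTGA\nCCAA\n' > genes.rtf.out

for i in *.rtf.out
do
    # -v copies the shell variable $i into the awk variable "prefix".
    # Inside the single quotes only awk variables are visible.
    awk -v prefix="$i" '
        /^>/ { if (out) close(out); out = prefix "." (++n) ".fasta" }
        out  { print > out }
    ' "$i"
done

ls
```

Each header line starts a new output file named after the input, so the sample should leave genes.rtf.out.1.fasta and genes.rtf.out.2.fasta next to the original.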

This will create files 0001.out, 0002.out, and so on, one per sequence. Any lines that appear before the first ">" header go to 0000.out.

Code:

awk 'BEGIN { F=0 }; /^>/ { F++ }; { print > (sprintf("%04d", F) ".out") }' *.rtf.out
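
With a massive input, one thing to watch is that some awk implementations limit how many output files can be open at once. Below is a sketch of the same idea that closes each file before starting the next. The sample files a.rtf.out and b.rtf.out are made up, and the variable name out is my own. Unlike the one-liner, this variant skips any lines that come before the first header instead of writing them to 0000.out.

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs: three FASTA records across two files.
printf '>geneA\nACGT\n>geneB\nGGCC\n' > a.rtf.out
printf '>geneC\nTTAA\n' > b.rtf.out

awk '
    /^>/ {
        if (out) close(out)              # finish the previous record file
        out = sprintf("%04d.out", ++F)   # 0001.out, 0002.out, ...
    }
    out { print > out }                  # write only once a header was seen
' *.rtf.out

ls
```

The counter F is never reset, so numbering continues across input files and the sample should produce 0001.out through 0003.out.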

The name of the file currently being processed is available in the special variable FILENAME, if you want to base the output filename on it.
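
As a rough illustration of the FILENAME idea, the sketch below names each output after its input file and restarts the numbering for every input. The sample files and the naming scheme (input name, four-digit counter, .fasta) are assumptions of mine, not something the original post specifies.

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample inputs.
printf '>geneA\nACGT\n>geneB\nGGCC\n' > a.rtf.out
printf '>geneC\nTTAA\n' > b.rtf.out

awk '
    FNR == 1 {                       # first line of each input file
        if (out) close(out)
        out = ""                     # forget the previous file output
        n = 0                        # restart numbering per input file
    }
    /^>/ {
        if (out) close(out)
        out = sprintf("%s.%04d.fasta", FILENAME, ++n)
    }
    out { print > out }
' *.rtf.out

ls
```

FNR is the line number within the current input file, so the FNR == 1 rule fires once per file; for the sample this should give a.rtf.out.0001.fasta, a.rtf.out.0002.fasta and b.rtf.out.0001.fasta.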

Corona688
