grep FASTA files


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers grep FASTA files
Prev   Next
# 1  
Old 06-16-2010
grep FASTA files

I would like to extract the sequences larger than 10 bases but shorter than 18 along with the identifier from a FASTA file that looks like this:

> Seq I
ACGACTAGACGATAGACGATAGA
> Seq 2
ACGATGACGTAGCAGT
> Seq 3
ACGATACGAT

I know I can extract the IDs alone with the following code
Code:
grep ">" inputfile > Outputfile

and the sequences that are longer than 10 bases with the following one:
Code:
grep "[[:alpha:]]\{10\}" inputfile > outputfile

However, the ID will be also lost since they are pretty short and the output file will also contain the reads that are longer than 18. Thus, I need to generate a file containing the sequence ID followed by the actaul sequence. Therefore, the only sequence that will be present in the outputfile would be sequence 2:

Quote:
> Seq 2
ACGATGACGTAGCAGT
Any ideas?
Thanks!

Last edited by Xterra; 06-16-2010 at 11:58 PM..
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to append two fasta files?

I have two fasta files as shown below, File:1 >Contig_1:90600-91187 AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC >Contig_98:35323-35886 GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG >Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
11 Replies

2. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2 (4 Replies)
Discussion started by: patrick87
4 Replies

3. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies

4. UNIX for Dummies Questions & Answers

Fasta header modification

Hi, I need some help with modifying fasta headers. I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file. File 1 contains the fasta sequences: >contig0001 length=11115 numreads=10777 agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
6 Replies

5. UNIX for Dummies Questions & Answers

How to change sequence name in along fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

6. UNIX for Dummies Questions & Answers

Breaking a fasta formatted file into multiple files containing each gene separately

Hey, I've been trying to break a massive fasta formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but i've recieved errors every time: for i in *.rtf.out do awk '/^>/{f=++d".fasta"} {print > $i.out}' $i done (1 Reply)
Discussion started by: Ann Mc Cartney
1 Replies

7. UNIX for Dummies Questions & Answers

renaming (renumbering) fasta files

I have a fasta file that looks like this: >Noname ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG >Noname ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA ... I want to... (2 Replies)
Discussion started by: Oyster
2 Replies

8. Shell Programming and Scripting

Changing from FASTA to PHYLIP format

I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this: >SeqID1 AACCATGACAGAGGAGATGTGAACAGATAGAGGGATGACAGATGACAGATAGACCCAGAC TGACAGGTTCAAAGGCTGCAGTGCAGTGACGTGACGATTT >Sequence 22... (13 Replies)
Discussion started by: Xterra
13 Replies

9. Shell Programming and Scripting

grep for certain files using a file as input to grep and then move

Hi All, I need to grep few files which has words like the below in the file name , which i want to put it in a file and and grep for the files which contain these names and move it to a new directory , full file name -C20091210.1000-20091210.1100_SMGBSC3:1000... (2 Replies)
Discussion started by: anita07
2 Replies

10. UNIX for Dummies Questions & Answers

fasta format?

Hi, I'm in need of creating a file in the fasta format: >1A6A.A HVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITN VPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCR VEHWGLDEPLLKHWEF >1A6A.B ... (5 Replies)
Discussion started by: lost
5 Replies
Login or Register to Ask a Question