Selectively extracting entries from FASTA file


 
# 1  
Old 11-09-2015

I would like to extract all entries whose sequence contains both of the following patterns, ccccta and ccccccccc, from the following infile:
Code:
>P39PT-1224_Freq_900
cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc
>P39PT-678_Freq_5
cccctacgacggcattggtaatggctcccgcaagtcatctctcttcagccaagg
>P39PT-22_Freq_3
cacctacgacggcattggtaatggctgccgcaagccatctctcttccccccccc

Thus, the desired outfile should look like this:
Code:
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc

I am currently using the following two commands, run one after the other, to accomplish this task:
Code:
awk -v search="ccccta" '$1~/^>/ {buf=sep=""; found=0} found==1 {print; next} {buf=buf sep $0; sep=RS} $0~search {print buf; found=1}' infile > outfile

awk -v search="ccccccccc" '$1~/^>/ {buf=sep=""; found=0} found==1 {print; next} {buf=buf sep $0; sep=RS} $0~search {print buf; found=1}' outfile > outfile1

However, I would like to use a single script that searches for both patterns at once.
Any help will be greatly appreciated.
# 2  
Old 11-09-2015
You mean something like:
Code:
awk -v search1="ccccta" -v search2="ccccccccc" '$1~/^>/ {buf=sep=""; found=0} found==1 {print; next} {buf=buf sep $0; sep=RS} $0~search1 && $0~search2 {print buf; found=1}' infile > outfile1

This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 11-09-2015
Or:
Code:
awk '$2~/ccccta/ && $2~/ccccccccc/{print RS $0}' RS=\> ORS= file

or
Code:
awk -v search1="ccccta" -v search2="ccccccccc" '$2~search1 && $2~search2{print RS $0}' RS=\> ORS= file
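
Note that both variants test $2, so they assume each sequence sits on a single line right after its header. If the sequences are wrapped over several lines, something along the following lines might work instead. This is only a sketch: the function name fasta_both and the awk names emit, hdr, seq and body are made up for illustration, and the patterns are treated as literal substrings (via index()) rather than as regular expressions.

```shell
# Hypothetical helper: print the FASTA records whose sequence contains
# both literal substrings. Usage: fasta_both PATTERN1 PATTERN2 FILE
fasta_both() {
    awk -v p1="$1" -v p2="$2" '
        # Print the buffered record if both substrings occur in its sequence
        function emit() {
            if (hdr != "" && index(seq, p1) && index(seq, p2))
                printf "%s\n%s", hdr, body
        }
        # A header line closes the previous record and starts a new one
        /^>/ { emit(); hdr = $0; seq = body = ""; next }
        # seq is the joined sequence used for matching;
        # body keeps the original line breaks for output
        { seq = seq $0; body = body $0 "\n" }
        # Do not forget the last record in the file
        END { emit() }
    ' "$3"
}
```

It would be called as: fasta_both ccccta ccccccccc infile > outfile. Because the sequence lines are joined before matching, a pattern that straddles a line break is still found, while the output keeps the original wrapping.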


Last edited by Scrutinizer; 11-09-2015 at 11:03 PM..
This User Gave Thanks to Scrutinizer For This Post:
# 4  
Old 11-09-2015
Got Perl?

Code:
perl -076 -ne 'chomp; print ">$_" if /ccccta/ and /ccccccccc/' xterra.fasta

This User Gave Thanks to Aia For This Post:
# 5  
Old 11-09-2015
Awesome!
Thank you guys!
 