Post 302960032 by Xterra on Monday 9th of November 2015 08:30:51 PM
Selectively extracting entries from FASTA file

I would like to extract all entries that contain both of the following patterns, ccccta and ccccccccc, from this infile:
Code:
>P39PT-1224_Freq_900
cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc
>P39PT-678_Freq_5
cccctacgacggcattggtaatggctcccgcaagtcatctctcttcagccaagg
>P39PT-22_Freq_3
cacctacgacggcattggtaatggctgccgcaagccatctctcttccccccccc

Thus, the desired outfile should look like this:
Code:
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc

I am using the following two commands to accomplish this task, running the second on the output of the first:
Code:
awk -v search="ccccta" '$1~/^>/ {buf=sep=""; found=0} found==1 {print; next} {buf=buf sep $0; sep=RS} $0~search {print buf; found=1}' infile > outfile

awk -v search="ccccccccc" '$1~/^>/ {buf=sep=""; found=0} found==1 {print; next} {buf=buf sep $0; sep=RS} $0~search {print buf; found=1}' outfile > outfile1

However, I would like a single script that searches for both patterns at once.
Any help will be greatly appreciated.
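One possible single-pass approach, offered as a sketch rather than an answer from the thread: buffer each entry (the header plus its sequence lines), and print it only when the joined sequence matches both patterns. The names p1, p2, rec, seq, have and emit are illustrative choices, not from the original commands. The sample infile from above is recreated here only so the example runs on its own.

```shell
# Recreate the sample infile from the post so the example is self-contained.
cat > infile <<'EOF'
>P39PT-1224_Freq_900
cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg
>P39PT-784_Freq_2
cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc
>P39PT-678_Freq_5
cccctacgacggcattggtaatggctcccgcaagtcatctctcttcagccaagg
>P39PT-22_Freq_3
cacctacgacggcattggtaatggctgccgcaagccatctctcttccccccccc
EOF

awk -v p1="ccccta" -v p2="ccccccccc" '
# Print the buffered entry only if its sequence matches both patterns.
function emit() {
    if (have && seq ~ p1 && seq ~ p2) print rec
}
# A header line closes the previous entry and starts a new one.
/^>/ { emit(); rec = $0; seq = ""; have = 1; next }
# Sequence lines are appended to the entry and to the joined sequence.
have { rec = rec "\n" $0; seq = seq $0 }
# The last entry has no following header, so handle it at end of input.
END  { emit() }
' infile > outfile
```

Because the sequence lines are joined before matching, this should also handle entries whose sequence is wrapped over several lines, including a pattern that straddles a line break. On the sample data it should leave only the >P39PT-784_Freq_2 entry in outfile.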
 
