Shell Programming and Scripting
Post 302377420 by patrick87 on Friday 4th of December 2009 04:31:19 AM
Match first pattern, then extract second pattern match

My input file:
Code:
<accession>Q91G55</accession>
<name>043L_IIV6</name>
<protein>
<recommendedName>
<location>
<position position="294"/>
</location>
<fullName>Uncharacterized protein 043L</fullName>

<accession>P18556</accession>
<name>1106L_ASFB7</name>
<protein>
<recommendedName>
<fullName>Protein MGF 110-6L</fullName>

<accession>O55734</accession>
<name>120L_IIV6</name>
<fullName>Uncharacterized protein 120L</fullName>
.
.

My desired output (find each accession number first, then extract the <fullName> that belongs to it):
Code:
<fullName>Uncharacterized protein 043L</fullName>
<fullName>Protein MGF 110-6L</fullName>
<fullName>Uncharacterized protein 120L</fullName>

This is the code I tried, but it doesn't work well: because -A8 always prints the next 8 lines, it sometimes picks up the <fullName> that belongs to a different <accession>.

Code:
grep -A8 '<accession>' file | grep '<fullName>'
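A minimal sketch of that pipeline, wrapped in a hypothetical shell function `fullnames_by_context` that takes a single accession ID, makes the overrun visible on the sample input above. `grep -A8` (a GNU/BSD grep option, not POSIX) prints each matching line plus the 8 lines after it, whichever record those lines belong to.

```shell
# Hypothetical helper: the pipeline from the post, restricted to one accession.
# $1 = accession ID, $2 = input file.
# grep -A8 keeps the matching line plus the next 8 lines regardless of where
# the record ends, so a short record lets the context spill into the next one.
fullnames_by_context() {
    grep -A8 "<accession>$1<" "$2" | grep '<fullName>'
}
```

With the sample input saved as `file`, `fullnames_by_context P18556 file` returns two lines, `<fullName>Protein MGF 110-6L</fullName>` and `<fullName>Uncharacterized protein 120L</fullName>`: the P18556 record is only five lines long, so the eight lines of context reach into the O55734 record.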

In the original file, each group starts with <accession> and ends with <fullName>, but the lines in between (and how many there are) differ from group to group.
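Because every group starts with `<accession>` and ends with `<fullName>`, a sed address range can cover exactly one group no matter how many lines it has. A minimal sketch, assuming the accession ID contains no regex metacharacters (true for IDs like `P18556`); the function name `fullname_for` is hypothetical.

```shell
# Hypothetical helper: print the <fullName> line of the record whose accession
# is $1, reading file $2.
# The range starts at "<accession>ID<" and ends at the first <fullName> after
# it, so it stops before the next record. The trailing "<" keeps an ID such as
# P18556 from also matching a longer ID that begins with it.
fullname_for() {
    sed -n "/<accession>$1</,/<fullName>/{/<fullName>/p;}" "$2"
}
```

Here `fullname_for P18556 file` prints only `<fullName>Protein MGF 110-6L</fullName>`. The `;` before `}` is there for BSD sed, which is stricter than GNU sed about terminating commands inside braces.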
Actually, I don't want to extract every <accession>. I only want specific <accession> entries from a long list of data, and for each selected <accession> I want to extract its <fullName>.
Thanks a lot for any suggestions and advice.
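For a long list of selected accessions, running sed once per ID rescans the data each time; a single awk pass can first load the wanted IDs from a second file. This is a sketch under stated assumptions: the hypothetical ID file holds one accession per line, each `<accession>` tag sits on its own line as in the sample, and `select_fullnames` is a made-up name.

```shell
# Hypothetical helper: print the <fullName> of every record whose accession
# is listed in file $1 (one ID per line), reading the data from file $2.
select_fullnames() {
    awk '
        # First file (NR == FNR): remember each non-empty ID as wanted.
        NR == FNR { if (NF) wanted[$1] = 1; next }

        # Data file: strip the tags around the ID, then decide whether the
        # record that starts here is one of the wanted ones.
        /<accession>/ {
            id = $0
            sub(/.*<accession>/, "", id)
            sub(/<\/accession>.*/, "", id)
            keep = (id in wanted)
            next
        }

        # Print the first <fullName> of a wanted record, then stop until the
        # next wanted <accession>.
        keep && /<fullName>/ { print; keep = 0 }
    ' "$1" "$2"
}
```

With an ID file containing `Q91G55` and `O55734`, `select_fullnames ids.txt file` prints the `<fullName>` lines for 043L and 120L and skips P18556. If the ID file is empty, `NR == FNR` also holds for the data file, so nothing is printed.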

Last edited by patrick87; 12-05-2009 at 08:43 PM. Reason: more code tags
 

Bio::Tools::Analysis::Protein::Scansite(3pm)		User Contributed Perl Documentation	      Bio::Tools::Analysis::Protein::Scansite(3pm)

NAME
    Bio::Tools::Analysis::Protein::Scansite - a wrapper around the Scansite server

SYNOPSIS
      use Bio::Tools::Analysis::Protein::Scansite;

      my $seq; # a Bio::PrimarySeqI object

      my $tool = Bio::Tools::Analysis::Protein::Scansite->new
          ( -seq => $seq->primary_seq );

      # run Scansite prediction on a sequence
      $tool->run();

      # alternatively you can say
      $tool->seq($seq->primary_seq)->run;

      die "Could not get a result" unless $tool->status =~ /^COMPLETED/;

      print $tool->result;     # print raw prediction to STDOUT

      foreach my $feat ( $tool->result('Bio::SeqFeatureI') ) {
          # do something to SeqFeature
          # e.g. print as GFF
          print $feat->gff_string, "\n";
          # or store within the sequence - if it is a Bio::RichSeqI
          $seq->add_SeqFeature($feat);
      }

DESCRIPTION
    This class is a wrapper around the Scansite 2.0 server, which produces predictions for serine, threonine and tyrosine phosphorylation sites in eukaryotic proteins. At present this is a basic wrapper for the "Scan protein by input sequence" functionality, which takes a sequence and searches for motifs, with the option to select the search stringency. At present, searches for specific phosphorylation sites are not supported; all predicted sites are returned.

  Return formats
    The Scansite results can be obtained in several formats:

    1. By calling

         my $res = $tool->result('');

       $res holds a string of the predicted sites in tabular format.

    2. By calling

         my $data_ref = $tool->result('value')

       $data_ref is a reference to an array of hashes. Each element in the array represents a predicted phosphorylation site. The hash keys are the names of the data fields, i.e.,

         'motif'      => 'Casn_Kin1'        # name of kinase
         'percentile' => 0.155              # see Scansite docs
         'position'   => 9                  # position in protein
         'protein'    => 'A1'               # protein id
         'score'      => 0.3696             # see Scansite docs
         'sequence'   => 'ASYFDTASYFSADAT'  # sequence surrounding site
         'site'       => 'S9'               # phosphorylated residue
         'zscore'     => '-3.110'           # see Scansite docs

    3. By calling

         my @fts = $tool->result('Bio::SeqFeatureI');

       which returns an array of Bio::SeqFeatureI compliant objects with primary tag value 'Site' and tag names of 'motif', 'score', 'sequence', 'zscore' as above.

    See <http://scansite.mit.edu/>.

    This inherits Bio::SimpleAnalysisI, which hopefully makes it easier to write wrappers on various services. This class uses a web resource and therefore inherits from Bio::WebAgent.

SEE ALSO
    Bio::SimpleAnalysisI, Bio::WebAgent

FEEDBACK
  Mailing Lists
    User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

      bioperl-l@bioperl.org                  - General discussion
      http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

  Support
    Please direct usage questions or support issues to the mailing list:

      bioperl-l@bioperl.org

    rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

  Reporting Bugs
    Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

      https://redmine.open-bio.org/projects/bioperl/

AUTHORS
    Richard Adams, Richard.Adams@ed.ac.uk

APPENDIX
    The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

  result
     Name    : result
     Usage   : $job->result (...)
     Returns : a result created by running an analysis
     Args    : none (but an implementation may choose
               to add arguments for instructions how to process
               the raw result)

    The method returns a scalar representing a result of an executed job. If the job was terminated by an error, the result may contain an error message instead of the real data.

    This implementation returns differently processed data depending on argument:

    undef
        Returns the raw ASCII data stream but without HTML tags.

    'Bio::SeqFeatureI'
        The argument string defines the type of bioperl objects returned in an array. The objects are Bio::SeqFeature::Generic.

    'parsed'
        Returns a reference to an array of hashes, each containing the data of one phosphorylation site prediction. Key values are: motif, percentile, position, protein, score, site, zscore, sequence.

  stringency
     Usage   : $job->stringency(...)
     Returns : The significance stringency of a prediction
     Args    : None (retrieves value) or 'High', 'Medium' or 'Low'.
     Purpose : Getter/setter of the stringency to be submitted for analysis.

  protein_id
     Usage   : $job->protein_id(...)
     Returns : The sequence id of the protein or 'unnamed' if not set.
     Args    : None
     Purpose : Getter of the seq_id. Returns the display_id of the sequence object.

perl v5.14.2                        2012-03-02        Bio::Tools::Analysis::Protein::Scansite(3pm)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.