Extract specific contents from each line (UNIX for Dummies Questions & Answers)
Post 302750567 by Don Cragun on Wednesday 2nd of January 2013 12:10:57 AM
bipinajith's proposal works as long as all of the allele= and alleles= entries are in the 2nd field of the input lines and all of the allele= entries come after the alleles= entry. However, it doesn't put the requested "/" between allele= entries when more than one is present, and it prints an extra leading newline that wasn't requested.

The following should work as requested regardless of the order of the allele= and alleles= entries and of which fields contain them, even if multiple entries appear on the same line. It will also print multiple alleles= entries, if they occur, using a comma to separate subsequent occurrences:
Code:
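# -F: a literal "|" with optional spaces on either side; the brackets in "[|]"
# keep the pipe from being read as regex alternation.
awk -F ' *[|] *' '
# Print the saved record (first field, alleles= values, allele= values), then reset.
function pr() {
        if (r) printf("%s %s %s\n", r, p, s)
        p = r = s = ""
        n1 = 1
}
# The first input line starts a record, just like a line after a blank line.
BEGIN { n1 = 1 }
# A blank line ends the current record.
NF == 0 {
        pr()
        next
}
# First line of a record: remember its first field.
n1 {    n1 = 0
        r = $1
}
# Collect allele= values (joined by "/") and alleles= values (joined by ",").
/allele/ {
        for (i = 1; i <= NF; i++) {
                if ($i ~ /allele=/)
                        s = (s ? s "/" : "") substr($i, index($i, "=") + 1)
                if ($i ~ /alleles=/)
                        p = (p ? p "," : "") substr($i, index($i, "=") + 1)
        }
}
# Flush the last record, which need not be followed by a blank line.
END { pr() }' input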
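For a quick check, here is a self-contained run on a small made-up input. The " | " field layout, the id1/id2 records, the temporary file and the wrapper name allele_summary are illustrative assumptions, since the actual input from the thread is not shown here. The awk program is the one above, with its field separator written as ' *[|] *' and n1 set in a BEGIN block so the first record is not skipped.

```shell
#!/bin/sh
# Hypothetical wrapper so the awk program can be run on any file name.
allele_summary() {
        awk -F ' *[|] *' '
        function pr() {
                if (r) printf("%s %s %s\n", r, p, s)
                p = r = s = ""
                n1 = 1
        }
        BEGIN { n1 = 1 }
        NF == 0 { pr(); next }
        n1 { n1 = 0; r = $1 }
        /allele/ {
                for (i = 1; i <= NF; i++) {
                        if ($i ~ /allele=/)
                                s = (s ? s "/" : "") substr($i, index($i, "=") + 1)
                        if ($i ~ /alleles=/)
                                p = (p ? p "," : "") substr($i, index($i, "=") + 1)
                }
        }
        END { pr() }' "$1"
}

# Made-up input: two blank-line-separated records with " | "-delimited fields.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
id1 | allele=A | alleles=AG

id2 | alleles=CT | allele=C
id2 | allele=T | alleles=CG
EOF

allele_summary "$sample"
# Expected output:
# id1 AG A
# id2 CT,CG C/T
```

Each output line is the first field of the record's first line, then the alleles= values joined by ",", then the allele= values joined by "/". Note that /allele=/ does not match an alleles= field, because "alleles=" never contains the substring "allele=".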

As always, if you're using a Solaris system, use /usr/xpg4/bin/awk or nawk instead of awk.
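If the same script has to run on Solaris as well as elsewhere, one way to follow that advice is to choose the awk once at the top of a shell script. This is a sketch assuming the usual Solaris locations (/usr/xpg4/bin/awk and /usr/bin/nawk); the variable name AWK is only an illustrative choice.

```shell
#!/bin/sh
# Prefer the POSIX awk, then nawk, then whatever awk is on PATH.
# Only "[ -x ... ]" is used so this also works in an old Bourne /bin/sh.
if [ -x /usr/xpg4/bin/awk ]; then
        AWK=/usr/xpg4/bin/awk
elif [ -x /usr/bin/nawk ]; then
        AWK=/usr/bin/nawk
else
        AWK=awk
fi

# Sanity check: the script above needs user-defined functions,
# which the old awk lacks. Prints: A
echo "a" | "$AWK" 'function up(x) { return toupper(x) } { print up($0) }'
```

The script itself would then be started with "$AWK" instead of awk.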

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

extract the lines between specific line number from a text file

Hi I want to extract certain text between two line numbers like 23234234324 and 54446655567567 How do I do this with a simple sed or awk command? Thank you. ---------- Post updated at 06:16 PM ---------- Previous update was at 05:55 PM ---------- found it: sed -n '#1,#2p'... (1 Reply)
Discussion started by: return_user

2. Shell Programming and Scripting

Shell script or command help to extract specific contents from a long list of content

Hi, I got a long list of contents: >sequence_1 ASSSSSSSSSSSDDDDDDDDDDDCCCCCCC ASDSFDFFDFDFFWERERERERFSDFESFSFD >sequence_2 ASDFDFDFFDDFFDFDSFDSFDFSDFSDFDSFASDSADSADASD ASDFFDFDFASFASFASFAFSFFSDASFASFASFAFS >sequence_3 VEDFGSDGSDGSDGSDGSDGSDGSDG dDFSDFSDFSDFSDFSDFSDFSDFSDF... (2 Replies)
Discussion started by: patrick87

3. Shell Programming and Scripting

extract specific line if the search pattern is found

Hi, I need to extract the <APPNUMBER> tag alone if the <college> tag has the value IIT Chennai. The college tag value will have embedded spaces. Those spaces should not be suppressed. My source file <Record><sno>1</sno><empid>E0001</empid><name>Rejsh suderam</name><college>IIT ... (3 Replies)
Discussion started by: Sekar1

4. Shell Programming and Scripting

Extract a specific line from a stream

Hello, I'm trying to code a bash script and I was wondering how to extract a specific line from a stream. E.g. My file "file" contains three lines and I'd like to find a function f which returns after execution a specific line like the second line, which would be : f(file, 2) = Second... (4 Replies)
Discussion started by: Oddant

5. Shell Programming and Scripting

Extract character between specific line numbers

Hi guys, I have a txt file and I need to extract all the contents between specific line numbers. Line 1: apple Line 2: orange Line 3: mango Line 4: grapes Line 5: pine apple I need to extract the content between line 2 and 4, including the contents of Line 2 and 4 so the output... (2 Replies)
Discussion started by: gowrishankar05

6. Shell Programming and Scripting

Using awk to read a specific line and a specific field on that line.

Say the input was as follows: Brat 20 x 1000 32rf Pour 15 p 1621 05pr Dart 10 z 1111 22xx My program prompts for an input, what I want is to use the input to locate a specific field. Like if I type in, "Pou" then it would return "Pour" and just "Pour" I currently have this line but it is... (6 Replies)
Discussion started by: Bungkai

7. Shell Programming and Scripting

how to read the contents of two files line by line and compare the line by line?

Hi All, I'm trying to figure out which are the trusted-ips and which are not using a script file. I have a file named 'ip-list.txt' which contains some ip addresses and another file named 'trusted-ip-list.txt' which also contains some ip addresses. I want to read a line from... (4 Replies)
Discussion started by: mjavalkar

8. Shell Programming and Scripting

sed or awk, cut, to extract specific data from line

Hi guys, I have been trying to do this, but... no luck so maybe you can help me. I have a line like this: Total Handled, Received, on queue Input Mgs: 140 / 14 => 0 I need to, get the number after the / until the =, to get only 14 . Any help is greatly appreciated. Thanks, (4 Replies)
Discussion started by: ocramas

9. Shell Programming and Scripting

sed to replace specific positions on line with file contents

Hi, I am trying to use an awk command to replace specific character positions on a line beginning with 80 with contents of another file. The line beginning with 80 in file1 is as follows: I want to replace the 000000000178800 (positions 34 - 49) on this file with the contents of... (2 Replies)
Discussion started by: nwalsh88

10. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi This is my first post and I'm just a beginner. So please be nice to me. I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
Bio::Variation::VariantI(3pm)				User Contributed Perl Documentation			     Bio::Variation::VariantI(3pm)

NAME
Bio::Variation::VariantI - Sequence Change SeqFeature abstract class

SYNOPSIS
    # get Bio::Variation::VariantI somehow
    print $var->restriction_changes, "\n";
    foreach $allele ($var->each_Allele) {
        # work on Bio::Variation::Allele objects
    }

DESCRIPTION
This superclass defines common methods for basic sequence changes. The instantiable classes Bio::Variation::DNAMutation, Bio::Variation::RNAChange and Bio::Variation::AAChange use them. See Bio::Variation::DNAMutation, Bio::Variation::RNAChange, and Bio::Variation::AAChange for more information.

These classes store information; the heavy computation to determine allele sequences is done elsewhere.

The database cross-references are implemented as Bio::Annotation::DBLink objects. The methods to access them are defined in Bio::DBLinkContainerI. See Bio::Annotation::DBLink and Bio::DBLinkContainerI for details.

Bio::Variation::VariantI redefines and extends Bio::SeqFeature::Generic for sequence variations. This class describes specific sequence change events. These events are always from a specific reference sequence to something different. See Bio::SeqFeature::Generic for more information.

IMPORTANT: The notion of a reference sequence permeates all Bio::Variation classes. This is especially important to remember when dealing with Alleles. In a polymorphic site, there can be a large number of alleles. One of them has to be selected as the reference allele (allele_ori). ALL the rest have to be passed to the Variant using the method add_Allele, including the mutated allele in a canonical mutation. The IO modules and generated attributes depend on this. They ignore the allele linked to using allele_mut; instead they assign each Allele returned by each_Allele to allele_mut in turn and calculate the changes between that and allele_ori.

FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing lists. Your participation is much appreciated.

    bioperl-l@bioperl.org                  - General discussion
    http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support
Please direct usage questions or support issues to the mailing list:

    bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

    https://redmine.open-bio.org/projects/bioperl/

AUTHOR - Heikki Lehvaslaiho
Email: heikki-at-bioperl-dot-org

APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

id
 Title   : id
 Usage   : $obj->id
 Function: Read only method. Returns the id of the variation object.
           The id is the id of the first DBLink object attached to this object.
 Example :
 Returns : scalar
 Args    : none

add_Allele
 Title   : add_Allele
 Usage   : $self->add_Allele($allele)
 Function: Adds one Bio::Variation::Allele into the list of alleles.
           Note that the method forces the convention that nucleotide
           sequence is in lower case and amino acids are in upper case.
 Example :
 Returns : 1 on success, 0 on failure.
 Args    : Allele object

each_Allele
 Title   : each_Allele
 Usage   : $obj->each_Allele();
 Function: Returns a list of Bio::Variation::Allele objects
 Example :
 Returns : list of Alleles
 Args    : none

isMutation
 Title   : isMutation
 Usage   : print join('/', $obj->each_Allele) if not $obj->isMutation;
 Function: Returns or sets the boolean value indicating that the variant
           described is a canonical mutation with two alleles assigned to be
           the original (wild type) allele and mutated allele, respectively.
           If this value is not set, it is assumed that the Variant describes
           polymorphisms.
 Returns : a boolean

allele_ori
 Title   : allele_ori
 Usage   : $obj->allele_ori();
 Function: Links to and returns the Bio::Variation::Allele object.
           If value is not set, returns false. All other Alleles are
           compared to this.

           Amino acid sequences are stored in upper case characters,
           others in lower case.
 Example :
 Returns : string
 Args    : string

           See Bio::Variation::Allele for more.

allele_mut
 Title   : allele_mut
 Usage   : $obj->allele_mut();
 Function: Links to and returns the Bio::Variation::Allele object.
           Sets and returns the mutated allele sequence.
           If value is not set, returns false.

           Amino acid sequences are stored in upper case characters,
           others in lower case.
 Example :
 Returns : string
 Args    : string

           See Bio::Variation::Allele for more.

length
 Title   : length
 Usage   : $obj->length();
 Function: Sets and returns the length of the affected original
           allele sequence. If value is not set, returns false (0).

           Value 0 means that the variant position is before the
           start=end sequence position. (Value 1 would denote a point
           mutation.) This follows the convention of reporting an
           insertion (2insT) in a way equivalent to the corresponding
           deletion (2delT). (Think of the indel polymorphism
           ATC <=> AC, where the original state is not known.)
 Example :
 Returns : string
 Args    : string

upStreamSeq
 Title   : upStreamSeq
 Usage   : $obj->upStreamSeq();
 Function: Sets and returns the upstream flanking sequence string.
           If value is not set, returns false. The sequence should be
           >=25 characters long, if possible.
 Example :
 Returns : string or false
 Args    : string

dnStreamSeq
 Title   : dnStreamSeq
 Usage   : $obj->dnStreamSeq();
 Function: Sets and returns the downstream flanking sequence string.
           If value is not set, returns false. The sequence should be
           >=25 characters long, if possible.
 Example :
 Returns : string or false
 Args    : string

label
 Title   : label
 Usage   : $obj->label();
 Function: Sets and returns mutation event label(s). If value is not
           set, or no argument is given, returns false. Each
           instantiable class needs to implement this method. Valid
           values are listed in 'Mutation event controlled vocabulary' in
           http://www.ebi.ac.uk/mutations/recommendations/mutevent.html.
 Example :
 Returns : string
 Args    : string

status
 Title   : status
 Usage   : $obj->status()
 Function: Returns the status of the sequence change object.
           Valid values are: 'suspected' and 'proven'
 Example : $obj->status('proven');
 Returns : scalar
 Args    : valid string (optional, for setting)

proof
 Title   : proof
 Usage   : $obj->proof()
 Function: Returns the proof of the sequence change object.
           Valid values are: 'computed' and 'experimental'.
 Example : $obj->proof('computed');
 Returns : scalar
 Args    : valid string (optional, for setting)

region
 Title   : region
 Usage   : $obj->region();
 Function: Sets and returns the name of the sequence region type or
           protein domain at this location. If value is not set,
           returns false.
 Example :
 Returns : string
 Args    : string

region_value
 Title   : region_value
 Usage   : $obj->region_value();
 Function: Sets and returns the name of the sequence region_value or
           protein domain at this location. If value is not set,
           returns false.
 Example :
 Returns : string
 Args    : string

region_dist
 Title   : region_dist
 Usage   : $obj->region_dist();
 Function: Sets and returns the distance to the closest region
           (i.e. intron/exon or domain) boundary. If distance is not
           set, returns false.
 Example :
 Returns : integer
 Args    : integer

numbering
 Title   : numbering
 Usage   : $obj->numbering()
 Function: Returns the numbering schema used for locating sequence features.
           Valid values are: 'entry' and 'coding'
 Example : $obj->numbering('coding');
 Returns : scalar
 Args    : valid string (optional, for setting)

mut_number
 Title   : mut_number
 Usage   : $num = $obj->mut_number;
         : $num = $obj->mut_number($number);
 Function: Returns or sets the number identifying the order in which the
           mutation has been issued. Numbers should start from 1.
           If the number has never been set, the method will return ''.

           If you want the output from IO modules to look nice and, for
           multivariant/allele variations, to make sense, you had better
           set this attribute.
 Returns : an integer

SeqDiff
 Title   : SeqDiff
 Usage   : $mutobj = $obj->SeqDiff;
         : $mutobj = $obj->SeqDiff($objref);
 Function: Returns or sets the link-reference to the umbrella
           Bio::Variation::SeqDiff object. If there is no link,
           it will return undef.

           Note: Adding a variant into a SeqDiff object will
           automatically set this value.
 Returns : an obj_ref or undef

           See Bio::Variation::SeqDiff for more information.

add_DBLink
 Title   : add_DBLink
 Usage   : $self->add_DBLink($ref)
 Function: adds a link object
 Example :
 Returns :
 Args    :

each_DBLink
 Title   : each_DBLink
 Usage   : foreach $ref ( $self->each_DBLink() )
 Function: gets an array of DBLink objects
 Example :
 Returns :
 Args    :

restriction_changes
 Title   : restriction_changes
 Usage   : $obj->restriction_changes();
 Function: Returns a string containing a list of restriction
           enzyme changes of the form +EcoRI, separated by commas.
           Strings need to be valid restriction enzyme names
           as stored in REBASE. allele_ori and allele_mut need to be assigned.
 Example :
 Returns : string
 Args    : string

perl v5.14.2                      2012-03-02          Bio::Variation::VariantI(3pm)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.