UNIX for Dummies Questions & Answers: post 302580242 by Cevin21, Thursday 8 December 2011, 02:35:18 AM
Find & Replace command - Fasta file

Hi all!

I have a fasta file that looks like this:

>Sequence1
RTYIPLCASQHKLCPITFLAVK

(It's just an example; in reality I have many pairs of lines like this.)

Using UNIX command(s), would it be possible to replace every character except "C" with a dash, on the second line only, without modifying the first line?
The goal is to obtain this:

>Sequence1
------C------C--------

Or, at least, which command would you use (grep, awk, perl, sed...)?

Thank you very much for your help!
Cevin21
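
One possible approach, as a minimal sketch rather than a definitive answer: sed can restrict a substitution to lines that do not start with ">", so header lines pass through untouched while every non-"C" character on sequence lines becomes a dash. The function name mask_non_c is a hypothetical helper introduced only for illustration. The sketch assumes headers always begin with ">", that only uppercase "C" should be kept, and that the file uses Unix line endings (a trailing carriage return would also be turned into a dash).

```shell
#!/bin/sh
# Hypothetical helper (name chosen for illustration only):
# on every line that does NOT start with ">", replace each
# character other than uppercase "C" with a dash.
# Header lines (starting with ">") are printed unchanged.
mask_non_c() {
    sed '/^>/!s/[^C]/-/g' "$@"
}

# Try it on the two-line record from the question.
printf '>Sequence1\nRTYIPLCASQHKLCPITFLAVK\n' | mask_non_c
```

With the sample record this prints the header followed by ------C------C--------. For a real file, something like mask_non_c input.fasta > masked.fasta should work (both file names are placeholders). Writing to a new file keeps the original intact.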
 

Bio::AlignIO(3pm)					User Contributed Perl Documentation					 Bio::AlignIO(3pm)

NAME
Bio::AlignIO - Handler for AlignIO Formats

SYNOPSIS
    use Bio::AlignIO;

    $inputfilename = "testaln.fasta";
    $in  = Bio::AlignIO->new(-file   => $inputfilename ,
                             -format => 'fasta');
    $out = Bio::AlignIO->new(-file   => ">out.aln.pfam" ,
                             -format => 'pfam');

    while ( my $aln = $in->next_aln() ) {
        $out->write_aln($aln);
    }

    # OR

    use Bio::AlignIO;

    open MYIN,"testaln.fasta";
    $in  = Bio::AlignIO->newFh(-fh     => *MYIN,
                               -format => 'fasta');
    open my $MYOUT, '>', 'testaln.pfam';
    $out = Bio::AlignIO->newFh(-fh     => $MYOUT,
                               -format => 'pfam');

    # World's smallest Fasta<->pfam format converter:
    print $out $_ while <$in>;

DESCRIPTION
Bio::AlignIO is a handler module for the formats in the AlignIO set, for example, Bio::AlignIO::fasta. It is the officially sanctioned way of getting at the alignment objects. The resulting alignment is a Bio::Align::AlignI-compliant object.

The idea is that you request an object for a particular format. All the objects have a notion of an internal file that is read from or written to. A particular AlignIO object instance is configured for either input or output; you can think of it as a stream object.

Each object has functions:

    $stream->next_aln();

And:

    $stream->write_aln($aln);

Also:

    $stream->type() # returns 'INPUT' or 'OUTPUT'

As an added bonus, you can recover a filehandle that is tied to the AlignIO object, allowing you to use the standard <> and print operations to read and write alignment objects:

    use Bio::AlignIO;

    # read from standard input
    $stream = Bio::AlignIO->newFh(-format => 'Fasta');

    while ( $aln = <$stream> ) {
        # do something with $aln
    }

And:

    print $stream $aln; # when stream is in output mode

Bio::AlignIO is patterned on the Bio::SeqIO module and shares most of its features. One significant difference is that Bio::AlignIO usually handles IO for only a single alignment at a time, whereas Bio::SeqIO handles IO for multiple sequences in a single stream. The principal reason for this is that whereas simultaneously handling multiple sequences is a common requirement, simultaneous handling of multiple alignments is not. The only current exception is format "bl2seq", which parses results of the BLAST "bl2seq" program and which may produce several alignment pairs. This set of alignment pairs can be read using multiple calls to next_aln.

CONSTRUCTORS
Bio::AlignIO->new()

    $seqIO = Bio::AlignIO->new(-file => 'filename', -format=>$format);
    $seqIO = Bio::AlignIO->new(-fh => *FILEHANDLE, -format=>$format);
    $seqIO = Bio::AlignIO->new(-format => $format);
    $seqIO = Bio::AlignIO->new(-fh => *STDOUT, -format => $format);

The new class method constructs a new Bio::AlignIO object. The returned object can be used to retrieve or print alignment objects. new accepts the following parameters:

-file
    A file path to be opened for reading or writing. The usual Perl conventions apply:

        'file'       # open file for reading
        '>file'      # open file for writing
        '>>file'     # open file for appending
        '+<file'     # open file read/write
        'command |'  # open a pipe from the command
        '| command'  # open a pipe to the command

-fh
    You may provide new() with a previously-opened filehandle. For example, to read from STDIN:

        $seqIO = Bio::AlignIO->new(-fh => *STDIN);

    Note that you must pass filehandles as references to globs.

    If neither a filehandle nor a filename is specified, then the module will read from the @ARGV array or STDIN, using the familiar <> semantics.

-format
    Specify the format of the file. Supported formats include:

        bl2seq      Bl2seq Blast output
        clustalw    clustalw (.aln) format
        emboss      EMBOSS water and needle format
        fasta       FASTA format
        maf         Multiple Alignment Format
        mase        mase (seaview) format
        mega        MEGA format
        meme        MEME format
        msf         msf (GCG) format
        nexus       Swofford et al NEXUS format
        pfam        Pfam sequence alignment format
        phylip      Felsenstein PHYLIP format
        prodom      prodom (protein domain) format
        psi         PSI-BLAST format
        selex       selex (hmmer) format
        stockholm   stockholm format

    Currently only those formats which were implemented in Bio::SimpleAlign have been incorporated into Bio::AlignIO. Specifically, "mase", "stockholm" and "prodom" have only been implemented for input. See the specific module (e.g. Bio::AlignIO::prodom) for notes on supported versions.

    If no format is specified and a filename is given, then the module will attempt to deduce it from the filename suffix. If this is unsuccessful, "fasta" format is assumed.

    The format name is case insensitive; "FASTA", "Fasta" and "fasta" are all treated equivalently.

Bio::AlignIO->newFh()

    $fh = Bio::AlignIO->newFh(-fh => *FILEHANDLE, -format=>$format);
    # read from STDIN or use @ARGV:
    $fh = Bio::AlignIO->newFh(-format => $format);

This constructor behaves like new, but returns a tied filehandle rather than a Bio::AlignIO object. You can read sequences from this object using the familiar <> operator, and write to it using print. The usual array and $_ semantics work. For example, you can read all sequence objects into an array like this:

    @sequences = <$fh>;

Other operations, such as read(), sysread(), write(), close(), and printf() are not supported.

-flush
    By default, all files (or filehandles) opened for writing alignments will be flushed after each write_aln() making the file immediately usable. If you do not need this facility and would like to marginally improve the efficiency of writing multiple sequences to the same file (or filehandle), pass the -flush option '0' or any other value that evaluates as defined but false:

        my $clustal = Bio::AlignIO->new( -file   => "<prot.aln",
                                         -format => "clustalw" );
        my $msf = Bio::AlignIO->new(-file   => ">prot.msf",
                                    -format => "msf",
                                    -flush  => 0 ); # go as fast as we can!
        while($seq = $clustal->next_aln) { $msf->write_aln($seq) }

OBJECT METHODS
See below for more detailed summaries. The main methods are:

$alignment = $AlignIO->next_aln()
    Fetch an alignment from a formatted file.

$AlignIO->write_aln($aln)
    Write the specified alignment to a file.

TIEHANDLE(), READLINE(), PRINT()
    These provide the tie interface. See perltie for more details.

FEEDBACK
Mailing Lists
    User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated.

        bioperl-l@bioperl.org                  - General discussion
        http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

Support
    Please direct usage questions or support issues to the mailing list:

        bioperl-l@bioperl.org

    rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs
    Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

        https://redmine.open-bio.org/projects/bioperl/

AUTHOR - Peter Schattner
Email: schattner@alum.mit.edu

CONTRIBUTORS
Jason Stajich, jason@bioperl.org

APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

new
    Title   : new
    Usage   : $stream = Bio::AlignIO->new(-file => $filename, -format => 'Format')
    Function: Returns a new seqstream
    Returns : A Bio::AlignIO::Handler initialised with the appropriate format
    Args    : -file => $filename
              -format => format
              -fh => filehandle to attach to
              -displayname_flat => 1 [optional] to force the displayname to not show start/end information

newFh
    Title   : newFh
    Usage   : $fh = Bio::AlignIO->newFh(-file=>$filename,-format=>'Format')
    Function: does a new() followed by an fh()
    Example : $fh = Bio::AlignIO->newFh(-file=>$filename,-format=>'Format')
              $sequence = <$fh>;   # read a sequence object
              print $fh $sequence; # write a sequence object
    Returns : filehandle tied to the Bio::AlignIO::Fh class
    Args    :

fh
    Title   : fh
    Usage   : $obj->fh
    Function:
    Example : $fh = $obj->fh;      # make a tied filehandle
              $sequence = <$fh>;   # read a sequence object
              print $fh $sequence; # write a sequence object
    Returns : filehandle tied to the Bio::AlignIO::Fh class
    Args    :

_load_format_module
    Title   : _load_format_module
    Usage   : *INTERNAL AlignIO stuff*
    Function: Loads up (like use) a module at run time on demand
    Example :
    Returns :
    Args    :

next_aln
    Title   : next_aln
    Usage   : $aln = $stream->next_aln
    Function: reads the next $aln object from the stream
    Returns : a Bio::Align::AlignI compliant object
    Args    :

write_aln
    Title   : write_aln
    Usage   : $stream->write_aln($aln)
    Function: writes the $aln object into the stream
    Returns : 1 for success and 0 for error
    Args    : Bio::Seq object

_guess_format
    Title   : _guess_format
    Usage   : $obj->_guess_format($filename)
    Function:
    Example :
    Returns : guessed format of filename (lower case)
    Args    :

force_displayname_flat
    Title   : force_displayname_flat
    Usage   : $obj->force_displayname_flat($newval)
    Function:
    Example :
    Returns : value of force_displayname_flat (a scalar)
    Args    : on set, new value (a scalar or undef, optional)

alphabet
    Title   : alphabet
    Usage   : $obj->alphabet($newval)
    Function: Get/Set alphabet for purpose of passing to Bio::LocatableSeq creation
    Example : $obj->alphabet('dna');
    Returns : value of alphabet (a scalar)
    Args    : on set, new value (a scalar or undef, optional)

perl v5.14.2                            2012-03-02                            Bio::AlignIO(3pm)