03-09-2011
renaming (renumbering) fasta files
I have a fasta file that looks like this:
>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...
I want to renumber these headers sequentially, starting with ">1" and going up to the number of sequences I have, which is on the order of 10^6.
I hear you can do this with a command-line one-liner. A simple script would also be great.
It should be simple; I am just a shameless newbie and am stuck.
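A minimal sketch of the awk one-liner approach, wrapped in a shell function so it can be reused. The name renumber_fasta is only illustrative. It assumes every header line starts with ">" and every other line is sequence. awk processes the input one line at a time, so memory use stays flat even with ~10^6 records.

```shell
# renumber_fasta [FILE...]  (hypothetical helper name)
# Replaces every header line (one starting with ">") with ">1", ">2", ...
# in order of appearance; sequence lines are printed unchanged.
# Reads standard input when no FILE is given.
renumber_fasta() {
    awk '/^>/ { n++; print ">" n; next } { print }' "$@"
}
```

Usage: renumber_fasta input.fa > renumbered.fa (both file names are placeholders). The awk program inside the function also works directly as a one-liner.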
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
Other than deleting and recreating a user, can a user's ID number be changed?
I need my user ID to be the same on more than one system. (1 Reply)
Discussion started by: thumper
2. Shell Programming and Scripting
I am new to the world of UNIX scripting and would like to write the following script:
I have 100 files numbered 1-100. I would like to continue the file list by adding another 100 files, so that file 101 = 99, file 102 = 98, file 103 = 97, and so on...
(basically ...... (6 Replies)
Discussion started by: AJC1985
3. Shell Programming and Scripting
Hi,
I am a beginner in awk scripting and need your help! I want to replace the number in the fifth column (15 here) in a file like this one:
ATOM 142 N PRO A 15
ATOM 143 CD PRO A 15
ATOM 144 HD1 PRO A 15
ATOM ... (5 Replies)
Discussion started by: adak
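A minimal awk sketch for this, under two assumptions. The replacement value is not shown in the (truncated) post, so it is passed in as a hypothetical argument. In the sample lines, 15 is actually the sixth whitespace-separated field (ATOM, 142, N, PRO, A, 15), so the code edits $6. Only lines whose first field is ATOM are changed.

```shell
# replace_resnum VALUE [FILE...]  (hypothetical helper name)
# Sets the 6th whitespace-separated field (15 in the sample) to VALUE on
# every ATOM line; all other lines pass through unchanged.
replace_resnum() {
    val=$1
    shift
    awk -v val="$val" '$1 == "ATOM" { $6 = val } { print }' "$@"
}
```

Assigning to a field makes awk rebuild the line with single spaces between fields. That is harmless for the sample as shown, but it would break the fixed column layout of a real PDB file, where a column-based edit would be needed instead.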
4. UNIX for Dummies Questions & Answers
I would like to extract the sequences longer than 10 bases but shorter than 18, along with their identifiers, from a FASTA file that looks like this:
> Seq 1
ACGACTAGACGATAGACGATAGA
> Seq 2
ACGATGACGTAGCAGT
> Seq 3
ACGATACGAT
I know I can extract the IDs alone with the following code:
grep... (3 Replies)
Discussion started by: Xterra
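A minimal awk sketch for the length filter, assuming the limits are strict as worded (longer than 10, shorter than 18). It collects each record's sequence, joining wrapped lines, and prints the header and sequence only when the length is in range. With the sample above, only the second record (16 bases) passes; the first has 23 bases and the third exactly 10. The function name is hypothetical.

```shell
# filter_by_length MIN MAX [FILE...]  (hypothetical helper name)
# Prints a record (header + sequence) only if MIN < length < MAX.
# Wrapped sequence lines are joined, so each kept sequence is printed on one line.
filter_by_length() {
    min=$1
    max=$2
    shift 2
    awk -v min="$min" -v max="$max" '
        function emit() {
            if (inrec && length(seq) > min + 0 && length(seq) < max + 0) {
                print hdr
                print seq
            }
        }
        /^>/ { emit(); inrec = 1; hdr = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    ' "$@"
}
```

Usage: filter_by_length 10 18 input.fa > selected.fa (placeholder file names).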
5. Shell Programming and Scripting
Hi All
I have a folder that contains hundreds of files with names like
3.msa
4.msa
21.msa
6.msa
345.msa
456.msa
98.msa
...
...
...
I need to rename each of these files by adding "core_" to the beginning of its name, such as
core_3.msa
core_4.msa
core_21.msa (4 Replies)
Discussion started by: Lucky Ali
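A minimal sketch of the usual shell loop for this, assuming all the .msa files sit directly in one directory (passed as an argument here) and none of them already carries the prefix; running it twice would produce names like core_core_3.msa. The function name is hypothetical.

```shell
# add_prefix PREFIX DIR  (hypothetical helper name)
# Renames every DIR/*.msa to DIR/PREFIX<name>, e.g. 3.msa -> core_3.msa.
add_prefix() {
    prefix=$1
    dir=$2
    for f in "$dir"/*.msa; do
        [ -e "$f" ] || continue      # no .msa files: the pattern stays unexpanded
        mv -- "$f" "$dir/$prefix${f##*/}"
    done
}
```

Usage: add_prefix core_ /path/to/folder. The glob is expanded once before the loop starts, so freshly renamed files are not picked up again.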
6. Shell Programming and Scripting
Hi All,
I have 100 files named rep-0.txt, rep-1.txt, ..., rep-99.txt.
They each contain information in the following format:
abc 1 qwe
asd 2 zxc
poi 3 lkj
pdh 4 ldf
hgf 5 tyu
I would like to renumber so that all the new files (rep0.dat, rep1.dat, ...) have... (1 Reply)
Discussion started by: chen.xiao.po
7. UNIX for Dummies Questions & Answers
Hey,
I've been trying to split a massive FASTA-formatted file into separate files, one per gene. Could anyone help me? I've tried the following code, but I've received errors every time:
for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i
done (1 Reply)
Discussion started by: Ann Mc Cartney
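In the attempt above, $i sits inside the single-quoted awk program, so awk treats it as a field reference rather than the shell variable, and the file name f that the program builds is never used for output. A corrected sketch follows. It assumes each record should go to its own file named after the input file plus a running number; that naming scheme is chosen here only so that files from different inputs do not overwrite each other. The helper name is hypothetical.

```shell
# split_fasta FILE  (hypothetical helper name)
# Writes each record of FILE to its own file: FILE.1.fasta, FILE.2.fasta, ...
# close() releases the previous output file so a huge input does not
# exhaust the limit on simultaneously open files.
split_fasta() {
    awk '
        /^>/ { if (out != "") close(out); n++; out = FILENAME "." n ".fasta" }
        out != "" { print > out }
    ' "$1"
}

for i in *.rtf.out; do
    [ -e "$i" ] || continue          # skip if the pattern matched nothing
    split_fasta "$i"
done
```

Lines before the first header (if any) are skipped, because no output file has been opened yet.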
8. Shell Programming and Scripting
I am new to the world of Linux scripting and would like to write the following two scripts:
I have 67 files named Alk-0001.txt to Alk-0067.txt
I would like them to be renumbered Alk-0002.txt to Alk-0134.txt (each number doubled), e.g.:
Alk-0001.txt > Alk-0002.txt
Alk-0002.txt > Alk-0004.txt
Alk-0003.txt > Alk-0006.txt
... (3 Replies)
Discussion started by: tollyboy_uk
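A minimal sketch for the renaming shown above, assuming the files are in one directory and the count (67 here) is passed in. The loop runs from the highest number down: counting upward, the very first step (Alk-0001.txt to Alk-0002.txt) would overwrite the original Alk-0002.txt before it had been moved. The function name is hypothetical.

```shell
# double_alk DIR COUNT  (hypothetical helper name)
# Renames Alk-0001.txt..Alk-COUNT.txt in DIR so that number N becomes 2*N,
# zero-padded to four digits (Alk-0003.txt -> Alk-0006.txt).
# Counting down means each target name has already been vacated when used.
double_alk() {
    dir=$1
    n=$2
    while [ "$n" -ge 1 ]; do
        src=$(printf '%s/Alk-%04d.txt' "$dir" "$n")
        dst=$(printf '%s/Alk-%04d.txt' "$dir" "$((n * 2))")
        if [ -e "$src" ]; then
            mv -- "$src" "$dst"
        fi
        n=$((n - 1))
    done
}
```

Usage: double_alk . 67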
9. Shell Programming and Scripting
Hi,
In an sftp script that gets files, I have to rename all the files I pick up. The rename command does not work here. Is there any way to do this?
I am using #!/bin/ksh
For example: sftp user@host <<EOF
cd /path
get *.txt
rename *.txt *.txt.done
... (7 Replies)
Discussion started by: jhilmil
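One common workaround, sketched under assumptions: in OpenSSH sftp, rename takes a single old path and a single new path and, unlike get, does not expand wildcards, so a batch file can list one get and one rename per file. This sketch assumes the remote file names are already known (for example from an earlier ls) and contain no spaces; make_sftp_batch is a hypothetical helper name.

```shell
# make_sftp_batch REMOTE_DIR FILE...  (hypothetical helper name)
# Prints sftp batch commands: cd into REMOTE_DIR, then for each FILE a
# "get" followed by "rename FILE FILE.done".
make_sftp_batch() {
    dir=$1
    shift
    printf 'cd %s\n' "$dir"
    for f in "$@"; do
        printf 'get %s\n' "$f"
        printf 'rename %s %s.done\n' "$f" "$f"
    done
}
```

Usage: make_sftp_batch /path a.txt b.txt > batch.txt, then sftp -b batch.txt user@host (batch mode needs non-interactive authentication). In batch mode sftp stops at the first failing command, so a failed get prevents the matching rename from running.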
10. UNIX for Beginners Questions & Answers
I have two fasta files as shown below,
File:1
>Contig_1:90600-91187
AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC
>Contig_98:35323-35886
GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG
>Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
LEARN ABOUT DEBIAN
bio::seqio::tab
Bio::SeqIO::tab(3pm) User Contributed Perl Documentation Bio::SeqIO::tab(3pm)
NAME
Bio::SeqIO::tab - nearly raw sequence file input/output stream. Reads/writes id"\t"sequence"\n"
SYNOPSIS
Do not use this module directly. Use it via the Bio::SeqIO class.
DESCRIPTION
This object can transform Bio::Seq objects to and from tabbed flat file databases.
It is very useful when doing large-scale work with the Unix command-line utilities (grep, sort, awk, sed, split, you name it). Imagine
that you have a format converter 'seqconvert' along the following lines:
my $in = Bio::SeqIO->newFh(-fh => *STDIN , '-format' => $from);
my $out = Bio::SeqIO->newFh(-fh=> *STDOUT, '-format' => $to);
print $out $_ while <$in>;
then you can very easily filter sequence files for duplicates as:
$ seqconvert < foo.fa -from fasta -to tab | sort -u |
seqconvert -from tab -to fasta > foo-unique.fa
Or grep [-v] for certain sequences with:
$ seqconvert < foo.fa -from fasta -to tab | grep -v '^S[a-z]*control' |
seqconvert -from tab -to fasta > foo-without-controls.fa
Or chop up a huge file with sequences into smaller chunks with:
$ seqconvert < all.fa -from fasta -to tab | split -l 10 - chunk-
$ for i in chunk-*; do seqconvert -from tab -to fasta < $i > $i.fa; done
# (this creates files chunk-aa.fa, chunk-ab.fa, ..., each containing 10
# sequences)
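Where BioPerl is not installed, the same one-record-per-line idea can be sketched in plain awk. This is not Bio::SeqIO::tab itself: the id is assumed to be the first word of the header (any description is dropped), and tab_to_fasta writes each sequence on a single unwrapped line. Both function names are hypothetical.

```shell
# fasta_to_tab [FILE...]  (hypothetical helper name)
# Prints one "id<TAB>sequence" line per record; the id is the first word
# of the header, and wrapped sequence lines are joined.
fasta_to_tab() {
    awk '
        function emit() { if (inrec) printf "%s\t%s\n", id, seq }
        /^>/ {
            emit(); inrec = 1; seq = ""
            id = substr($0, 2); sub(/^[ \t]+/, "", id); sub(/[ \t].*$/, "", id)
            next
        }
        { seq = seq $0 }
        END { emit() }
    ' "$@"
}

# tab_to_fasta [FILE...]  (hypothetical helper name)
# The reverse: each "id<TAB>sequence" line becomes a header and one sequence line.
tab_to_fasta() {
    awk 'BEGIN { FS = "\t" } { print ">" $1; print $2 }' "$@"
}
```

For example, fasta_to_tab foo.fa | sort -u | tab_to_fasta > foo-unique.fa mirrors the duplicate filter shown earlier.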
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.
bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
Support
Please direct usage questions or support issues to the mailing list:
bioperl-l@bioperl.org
rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address
it. Please include a thorough description of the problem with code and data examples if at all possible.
Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the
web:
https://redmine.open-bio.org/projects/bioperl/
AUTHORS
Philip Lijnzaad, p.lijnzaad@med.uu.nl
APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with an underscore (_).
next_seq
Title : next_seq
Usage : $seq = $stream->next_seq()
Function: returns the next sequence in the stream
Returns : Bio::Seq object
Args :
write_seq
Title : write_seq
Usage : $stream->write_seq($seq)
Function: writes the $seq object into the stream
Returns : 1 for success and 0 for error
Args : Bio::Seq object
perl v5.14.2 2012-03-02 Bio::SeqIO::tab(3pm)