03-09-2011
renaming (renumbering) fasta files
I have a fasta file that looks like this:
>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...
I want to rename the headers sequentially, starting with ">1" and counting up through every sequence in the file, which holds on the order of 10^6 sequences.
I hear this can be done with a command-line one-liner. A simple script would also be great.
It should be simple, but I am a shameless newbie and I am stuck.
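One possible approach, offered as a minimal sketch rather than the definitive answer: let awk keep a counter and rewrite every line that starts with ">". The function name renumber_fasta and the file names in the usage comment are made up for illustration. It assumes every header line begins with ">" in the first column and that sequence lines never do.

```shell
# Hypothetical helper: renumber FASTA headers as >1, >2, >3, ...
# Assumes header lines start with ">" in column one; all other lines
# (the sequence data) are passed through unchanged.
renumber_fasta() {
    awk '
        /^>/ { n++; print ">" n; next }   # header line: replace with running count
        { print }                         # sequence line: copy as-is
    ' "$@"
}

# Usage (file names are placeholders):
#   renumber_fasta input.fasta > renumbered.fasta
```

awk reads the file line by line, so memory use should stay flat even with around 10^6 sequences. Write the result to a new file instead of redirecting onto the input file, since the shell would truncate the input before awk reads it.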