Post 302503174 by Oyster on Wednesday 9th of March 2011 09:12:12 PM
renaming (renumbering) fasta files

I have a fasta file that looks like this:

>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...

I want to rename the headers sequentially, starting with ">1" and continuing through every sequence in the file; there are on the order of 10^6 of them.

I hear this can be done with a command-line one-liner. A simple script would also be great.

It should be simple, but I am a shameless newbie and I am stuck.
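
One possible approach, as a minimal sketch: an awk one-liner that replaces each header line with ">" followed by a running counter and passes sequence lines through unchanged. This assumes awk is installed, that every header line starts with ">", and that no sequence line does. The file names sample.fasta and renumbered.fasta are placeholders, and the short sequences are made-up test data, not the real file.

```shell
# Build a tiny test file (placeholder name: sample.fasta) to try the command on.
printf '>Noname\nACCAAAAT\nCAGAAGAA\n>Noname\nACTATAAA\n' > sample.fasta

# Header lines (those starting with ">") are replaced by ">" plus a counter
# that increases by one per header; all other lines are printed unchanged.
awk '/^>/ { print ">" (++n); next } { print }' sample.fasta > renumbered.fasta

# Show the result.
cat renumbered.fasta
```

awk reads the input one line at a time, so memory use should stay small even with around 10^6 sequences. The output goes to a new file rather than overwriting the input, so the original stays available for checking the result.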
 
