Change sequence names in fasta file


 
# 1  
Old 02-27-2013

I have FASTA files, each containing multiple sequences. I need to change the sequence header lines from:

Code:
>accD:_59176-60699
ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA
>atpA_(reverse_strand):_showing_revcomp_of_10525-12048
ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC
>rps12_5end_(reverse_strand)
ATGAGAATCAATCCTACTACTTCTGGTTCTGAAGTTTCCGCGGTTG


to look like this:

Code:
>accD
ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA
>atpA
ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC
>rps12_5end
ATGAGAATCAATCCTACTACTTCTGGTTCTGAAGTTTCCGCGGTTG


Any help would be appreciated!

Last edited by vgersh99; 02-27-2013 at 05:40 PM.. Reason: code tags, please!
# 2  
Old 02-27-2013
Are you sure about the last modification? Do you really want to end up with >rps12_5end?
Something to start with:
Code:
sed '/>/s/[:_].*//' myFile
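
Note that the one-liner above cuts each header at the first `:` or `_`, so `>rps12_5end_(reverse_strand)` would come out as `>rps12` rather than `>rps12_5end`. A minimal sketch of a variant that keeps `rps12_5end`, assuming the unwanted suffix always begins at a `:` or at `_(` (an assumption based only on the three sample headers in the question); the function name `clean_fasta_headers` is hypothetical:

```shell
# Hypothetical helper: trim FASTA header lines (those starting with '>').
# Assumption: the part to discard always starts at ':' or at '_('.
clean_fasta_headers() {
    # First expression: drop everything from the first ':' on header lines.
    # Second expression: drop everything from the first '_(' on header lines.
    # Sequence lines do not start with '>' and pass through unchanged.
    sed -e '/^>/s/:.*//' -e '/^>/s/_(.*//' "$@"
}
```

It reads the named files, or standard input if none are given, and writes to standard output, for example `clean_fasta_headers myFile > myFile.renamed`. Headers with a different suffix pattern would need another expression.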

# 3  
Old 02-28-2013
I do want to have
Code:
>rps12_5end

The sequence names have to match a set of prescribed names for me to upload them to a particular database.
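
Since the headers must match a prescribed set of names, it may be worth checking the renamed file before uploading. A rough sketch, assuming the allowed names are kept one per line (without the leading `>`) in a plain text file; both that file and the function name `check_fasta_headers` are hypothetical and not something the database is known to provide:

```shell
# Hypothetical helper: print every header that is NOT in the allowed list.
# Usage: check_fasta_headers allowed_names.txt sequences.fasta
check_fasta_headers() {
    # Take the header lines, strip the leading '>', then keep only names
    # with no exact whole-line match (-x) as a fixed string (-F) in the
    # allowed list (-f), inverted with -v.
    grep '^>' "$2" | sed 's/^>//' | grep -vxF -f "$1"
}
```

No output means every header was found in the list.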

---------- Post updated 02-28-13 at 10:06 AM ---------- Previous update was 02-27-13 at 06:39 PM ----------

Thanks a million. It worked perfectly.
 