Shell script for changing the accession number of DNA sequences in a FASTA file

12-30-2013


Hi,

I have a file of DNA sequences in FASTA format that looks like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat

There are many thousands of such sequences in a single file. I want to replace the accession ID prefix on every header line, so that "admin_1_45" becomes "mydesired_name_45", "admin_1_46" becomes "mydesired_name_46", and so on. The serial number (45, 46, 47, 48, ...) should stay, and my desired name should appear as a fixed prefix right after each > symbol.

Kindly help me with the script.

margarita


12-30-2013


Welcome to the forum.

What output do you expect? Something like this?

Input file:

Code:

$ cat fasta_file
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat

Command to run in the terminal:

Code:

$ awk '(/^>/ && sub(/>.*_/,">"new)) + 1' new="mydesired_name_" fasta_file

Result:

Code:

>mydesired_name_45
atatagcaga
>mydesired_name_46
atatagcagaatatatat
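
The one-liner above is compact, so here is a more explicit sketch of the same idea for readers who want to adapt it. The function name `rename_headers` and the temporary sample file are made up for illustration. It assumes the part of the header to keep is whatever follows the last underscore, and that the new prefix contains no backslashes (awk interprets escape sequences in `-v` assignments).

```shell
#!/bin/sh
# rename_headers PREFIX FILE   (hypothetical helper, not from the thread)
# On each header line, replace everything between ">" and the last "_"
# with PREFIX. Sequence lines are printed unchanged.
rename_headers() {
    awk -v prefix="$1" '
        /^>/ {
            id = $0
            # Strip ">" plus everything up to the last underscore, if any.
            sub(/^>(.*_)?/, "", id)
            print ">" prefix id
            next
        }
        { print }
    ' "$2"
}

# Demonstration on the sample data from this thread.
sample=$(mktemp)
printf '>admin_1_45\natatagcaga\n>admin_1_46\natatagcagaatatatat\n' > "$sample"
rename_headers "mydesired_name_" "$sample"
rm -f "$sample"
```

Because the prefix is concatenated with `print` rather than passed to `sub()` as the replacement, an `&` in the new name is not treated specially. The function writes to standard output, so redirect it to a new file to keep the result.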


Akshay Hegde


12-30-2013


Code:

sed 's/>admin_1_/>mydesired_name_/' filename
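
The sed command above prints the result to the terminal and leaves the file untouched. If the goal is to change the file itself, a sketch along these lines may help. It assumes a sed that accepts `-i` with an attached backup suffix (GNU and BSD sed both do, but the option is not in POSIX), and the helper name `rename_in_place` is hypothetical. The pattern is additionally anchored with `^` so that only lines beginning with `>` are touched.

```shell
#!/bin/sh
# rename_in_place FILE   (hypothetical helper)
# Rewrites FILE in place and keeps the original as FILE.bak.
rename_in_place() {
    sed -i.bak 's/^>admin_1_/>mydesired_name_/' "$1"
}

# Demonstration on a throwaway copy of the sample data from this thread.
fasta=$(mktemp)
printf '>admin_1_45\natatagcaga\n>admin_1_46\natatagcagaatatatat\n' > "$fasta"

total=$(grep -c '^>' "$fasta")                    # header count before the edit
rename_in_place "$fasta"
renamed=$(grep -c '^>mydesired_name_' "$fasta")   # headers carrying the new prefix

# The two counts should match if every header had the old prefix.
echo "headers: $total, renamed: $renamed"

rm -f "$fasta" "$fasta.bak"
```

Comparing the two counts is a cheap sanity check on a file with thousands of sequences: a mismatch means some headers did not start with `admin_1_` and were left as they were.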

pravin27


12-30-2013


The awk command from Akshay Hegde worked nicely. Thank you.

margarita


12-30-2013


Please use code tags for the data you provided in post #1. Select the data and press the code button, which sits between QUOTE and HTML in the editor toolbar; see the attached screenshot.

Last edited by Akshay Hegde; 05-31-2014 at 04:29 PM..

Akshay Hegde


12-30-2013


Hello,

One more approach with sed may help.

Code:

sed 's/^>.*_\(.*\)/>my_desired_text_\1/' file_name

Output will be as follows.

Code:

>my_desired_text_45
atatagcaga
>my_desired_text_46
atatagcagaatatatat

Thanks,
R. Singh

---------- Post updated at 11:34 AM ---------- Previous update was at 10:30 AM ----------

Here is one more approach to the same task.

Code:

awk -v s1="my_desired_output_" -F"_" '/^>/ {print ">" s1 $NF; next} 1' file

Output will be as follows.

Code:

>my_desired_output_45
atatagcaga
>my_desired_output_46
atatagcagaatatatat

Thanks,
R. Singh

RavinderSingh13
