Convert a DNA sequence into Amino Acid Post: 302957294

Posted by cjcox in Shell Programming and Scripting on Thursday 8th of October 2015 06:05:31 PM

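OK, this is a bit messy, but it was done very quickly. First I created a sed script (call the file dna.sed):
(Keep the /g at the end of each rule: a codon can occur more than once in a sequence, and without /g sed replaces only the first match on the line. Again, I created this quickly.)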

Code:

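s/TTT /Phe /g
s/TTC /Phe /g
s/TTA /Leu /g
s/TTG /Leu /g
s/CTT /Leu /g
s/CTC /Leu /g
s/CTA /Leu /g
s/CTG /Leu /g
s/ATT /Ile /g
s/ATC /Ile /g
s/ATA /Ile /g
s/ATG /Met /g
s/GTT /Val /g
s/GTC /Val /g
s/GTA /Val /g
s/GTG /Val /g
s/TCT /Ser /g
s/TCC /Ser /g
s/TCA /Ser /g
s/TCG /Ser /g
s/CCT /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/ACT /Thr /g
s/ACC /Thr /g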
s/ACA /Thr /g
s/ACG /Thr /g
s/GCT /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/TAT /Tyr /g
s/TAC /Tyr /g
s/TAA /Stop /g
s/TAG /Stop /g
s/CAT /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/AAT /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/GAT /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/TGT /Cys /g
s/TGC /Cys /g
s/TGA /Stop /g
s/TGG /Trp /g
s/CGT /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AGT /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GGT /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g
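
A quick way to check that a rule file like this covers every codon is to feed all 64 triplets through it and list any that come out unchanged. The sketch below is not part of the original post: the function name untranslated_codons is made up for illustration, and it assumes the rules have the "XXX " form shown above.

```shell
# untranslated_codons RULEFILE
# Hypothetical helper (not from the original post): prints every codon
# that the given sed rule file leaves unchanged.
untranslated_codons() {
  for a in A C G T; do
    for b in A C G T; do
      for c in A C G T; do
        # The trailing space matches the "XXX " form the rules expect.
        printf '%s%s%s \n' "$a" "$b" "$c"
      done
    done
  done | sed -f "$1" | grep '^[ACGT][ACGT][ACGT] $' | tr -d ' '
}

# Example call (prints nothing when all 64 codons are covered):
# untranslated_codons dna.sed
```

Any codon it prints would pass through the translation untouched and show up as raw bases in the output.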

Then a script to process the DNA sequence lines (it assumes one sequence per line):

Code:

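# Read one DNA sequence per line from standard input.
while IFS= read -r dna; do
  # Split into space-terminated codons, then translate each with dna.sed.
  aawork=$(echo "${dna}" | sed -n -e 's/\(...\)/\1 /gp' | sed -f dna.sed)
  # Print the translated sequence without the separating spaces.
  echo "$aawork" | sed 's/ //g'
  # Count each residue and print it as "Name: count".
  echo "$aawork" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c | sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
done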

Again, the script expects to read the sequences one per line on standard input, so you can redirect from a file, feed it from a pipe, etc.
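
To see what the individual stages of that pipeline do, here is a minimal sketch that isolates the codon split and the counting step. The function names split_codons and count_tokens are hypothetical, introduced only for this illustration.

```shell
# split_codons: insert a space after every three characters, as the
# first sed in the loop does. An incomplete trailing codon is left as-is.
split_codons() {
  sed 's/\(...\)/\1 /g'
}

# count_tokens: print one "Name: count" line per distinct space-separated
# token, mirroring the tr | sort | uniq -c stage of the loop.
count_tokens() {
  tr ' ' '\n' | sed '/^$/d' | sort | uniq -c |
    sed 's/^ *\([0-9]*\) \(.*\)/\2: \1/'
}

echo "GCATGCTGA" | split_codons           # GCA TGC TGA (with a trailing space)
echo "Ala Cys Cys Stop " | count_tokens   # Ala: 1, Cys: 2, Stop: 1 (one per line)
```

The trailing space after each codon is what lets the rules in dna.sed match whole codons only.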

My example below uses just the sample line you provided.

Code:

$ dna.sh
GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG
AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu
Ala: 4
Asn: 1
Asp: 1
Cys: 4
Glu: 3
Gly: 1
Leu: 4
Phe: 1
Stop: 1
Thr: 2
Trp: 1
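
The start of that output can be reproduced without the full rule file. This is only a sketch: translate_demo is a hypothetical name, and it carries just the seven rules that the first eight codons of the sample line need.

```shell
# translate_demo: split stdin into codons, apply a small inline rule set,
# then strip the separating spaces. Only seven codons are covered here;
# anything else would pass through untranslated.
translate_demo() {
  sed 's/\(...\)/\1 /g' |
    sed -e 's/GCA /Ala /g' -e 's/TGC /Cys /g' -e 's/GAT /Asp /g' \
        -e 's/AAC /Asn /g' -e 's/TTT /Phe /g' -e 's/GGC /Gly /g' \
        -e 's/TGA /Stop /g' |
    sed 's/ //g'
}

# First 24 bases of the sample line.
echo "GCATGCTGCGATAACTTTGGCTGA" | translate_demo   # AlaCysCysAspAsnPheGlyStop
```

Note that the script keeps translating past a Stop codon, which is why the sample output continues after Stop.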
