Convert a DNA sequence into Amino Acid


Shell Programming and Scripting: post 302957294 by cjcox on Thursday 8th of October 2015 06:05:31 PM


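OK, a bit messy, but done very quickly. First I created a sed script (call the file dna.sed) that replaces each codon, followed by a space, with its three-letter amino-acid abbreviation.
(Keep the /g at the end of each line: the script below puts the whole sequence on one line, so without /g only the first occurrence of each codon on that line would be translated.)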

Code:

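s/TTT /Phe /g
s/TTC /Phe /g
s/TTA /Leu /g
s/TTG /Leu /g
s/CTT /Leu /g
s/CTC /Leu /g
s/CTA /Leu /g
s/CTG /Leu /g
s/ATT /Ile /g
s/ATC /Ile /g
s/ATA /Ile /g
s/ATG /Met /g
s/GTT /Val /g
s/GTC /Val /g
s/GTA /Val /g
s/GTG /Val /g
s/TCT /Ser /g
s/TCC /Ser /g
s/TCA /Ser /g
s/TCG /Ser /g
s/CCT /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/ACT /Thr /g
s/ACC /Thr /g
s/ACA /Thr /g
s/ACG /Thr /g
s/GCT /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/TAT /Tyr /g
s/TAC /Tyr /g
s/TAA /Stop /g
s/TAG /Stop /g
s/CAT /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/AAT /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/GAT /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/TGT /Cys /g
s/TGC /Cys /g
s/TGA /Stop /g
s/TGG /Trp /g
s/CGT /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AGT /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GGT /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g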
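
Two details of these patterns are easy to miss: each one ends in a space, and each one carries /g. The small sketch below shows what each detail protects against, using a single substitution instead of the full dna.sed. split_codons is a hypothetical helper that performs the same split as the first sed in the script below; the sequences are made-up test strings.

```shell
#!/bin/sh
# Hypothetical helper: split a sequence into space-terminated codons,
# the same split that the first sed in dna.sh performs.
split_codons() {
  echo "$1" | sed 's/\(...\)/\1 /g'
}

# Trailing space: without the split, GCA is found out of frame inside TGCATG.
echo TGCATG | sed 's/GCA/Ala/g'                  # TAlaTG (wrong)
# After the split, "GCA " only matches a real codon, so nothing is replaced.
split_codons TGCATG | sed 's/GCA /Ala /g'        # TGC ATG (unchanged)

# /g flag: the whole sequence is one line, so without /g only the first GCA changes.
split_codons GCAGCAGCT | sed 's/GCA /Ala /'      # Ala GCA GCT
split_codons GCAGCAGCT | sed 's/GCA /Ala /g'     # Ala Ala GCT
```

So the space keeps every match on a codon boundary, and /g is what translates a codon each time it repeats on the line.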

Then a script (dna.sh) to process the DNA sequences. It assumes one sequence per line and prints the translated protein followed by a count of each amino acid:

Code:

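#!/bin/sh
# dna.sh: read one DNA sequence per line from standard input
while read -r dna; do
  # split into space-terminated codons ("GCA TGC ..."), then translate them with dna.sed
  aawork=$(echo "${dna}" | sed -n -e 's/\(...\)/\1 /gp' | sed -f dna.sed)
  # print the protein with the separating spaces removed
  echo "$aawork" | sed 's/ //g'
  # one residue per line, sort, drop blank lines, count, and print as "Name: count"
  echo "$aawork" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c | sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
done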
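
If maintaining a 64-line sed file gets tedious, the same lookup can be sketched in plain POSIX sh: fold -w 3 cuts the sequence into codons, one per line, and a case statement with glob patterns such as GC? covers the standard codon table. This is an alternative, not what dna.sh does; the function names codon_to_aa and translate are hypothetical, and the sketch assumes uppercase input passed as one argument.

```shell
#!/bin/sh
# Hypothetical alternative to dna.sed: map one uppercase codon to its
# three-letter amino-acid name (standard genetic code) with case globs.
codon_to_aa() {
  case $1 in
    TT[TC]) echo Phe ;;
    TT[AG]|CT?) echo Leu ;;
    AT[TCA]) echo Ile ;;
    ATG) echo Met ;;
    GT?) echo Val ;;
    TC?|AG[TC]) echo Ser ;;
    CC?) echo Pro ;;
    AC?) echo Thr ;;
    GC?) echo Ala ;;
    TA[TC]) echo Tyr ;;
    TA[AG]|TGA) echo Stop ;;
    CA[TC]) echo His ;;
    CA[AG]) echo Gln ;;
    AA[TC]) echo Asn ;;
    AA[AG]) echo Lys ;;
    GA[TC]) echo Asp ;;
    GA[AG]) echo Glu ;;
    TG[TC]) echo Cys ;;
    TGG) echo Trp ;;
    CG?|AG[AG]) echo Arg ;;
    GG?) echo Gly ;;
    *) echo '???' ;;   # anything that is not a valid uppercase codon
  esac
}

# Hypothetical helper: print one amino acid per line for the sequence in $1.
# fold -w 3 cuts it into codons; a trailing partial codon is skipped.
translate() {
  echo "$1" | fold -w 3 | while IFS= read -r codon; do
    [ "${#codon}" -eq 3 ] || continue
    codon_to_aa "$codon"
  done
}

translate GCATGCTGCGAT | tr -d '\n'; echo    # prints AlaCysCysAsp
```

Because translate already emits one residue per line, its output can go straight into sort | uniq -c for the per-residue counts.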

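Again, the script reads the sequences one per line from standard input, so you can type them in, redirect a file into it, or pipe them from another command. It also reads dna.sed by a relative path, so run it from the directory that holds dna.sed (or change the path in the script).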
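
A minimal sketch of the two ways to feed the loop, assuming the same while read structure as dna.sh. To stay self-contained it only performs the codon-splitting step (the first sed of dna.sh) instead of calling dna.sed; split_each is a hypothetical name and the sequences are made-up test strings.

```shell
#!/bin/sh
# Hypothetical stand-in for the dna.sh loop: only the codon split is done,
# so the sketch runs without dna.sed.
split_each() {
  while read -r dna; do
    echo "$dna" | sed -n -e 's/\(...\)/\1 /gp'
  done
}

# 1. From a pipe: each line written by printf is one sequence.
printf '%s\n' GCATGC TTTGGC | split_each

# 2. From a redirect: a here-document here; "< sequences_file" works the same way.
split_each <<EOF
GCATGC
TTTGGC
EOF
```

Both calls print the same two lines, GCA TGC and TTT GGC, because the loop only cares that each sequence arrives on its own line of standard input.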

The example below uses just the sample line you provided, typed at the terminal, followed by the script's output:

Code:

$ dna.sh
GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG
AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu
Ala: 4
Asn: 1
Asp: 1
Cys: 4
Glu: 3
Gly: 1
Leu: 4
Phe: 1
Stop: 1
Thr: 2
Trp: 1
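
The tally above comes from the tr | sort | uniq -c | sed pipeline in dna.sh. As an alternative sketch, a single awk pass can do the counting. count_residues is a hypothetical name, and the result goes through LC_ALL=C sort because the order of awk's for (key in array) loop is unspecified.

```shell
#!/bin/sh
# Hypothetical helper: count space-separated residue names on stdin and print
# "Name: count" lines sorted by name (same layout as the dna.sh tally).
count_residues() {
  awk '{ for (i = 1; i <= NF; i++) n[$i]++ }
       END { for (aa in n) print aa ": " n[aa] }' | LC_ALL=C sort
}

echo "Ala Cys Cys Asp Stop Ala" | count_residues
# Ala: 2
# Asp: 1
# Cys: 2
# Stop: 1
```

Fed the space-separated translation of the sample line, this sketch should reproduce the eleven "Name: count" lines shown above.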



