Convert a DNA sequence into Amino Acid Post: 302957294

Post 302957294 by cjcox on Thursday 8th of October 2015 06:05:31 PM

OK, a bit messy, but done very quickly. First I created a sed script (call the file dna.sed).
(Note that the /g at the end of each rule is needed, since the same codon can appear more than once on a line.)

Code:

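s/TTT /Phe /g
s/TTC /Phe /g
s/TTA /Leu /g
s/TTG /Leu /g
s/CTT /Leu /g
s/CTC /Leu /g
s/CTA /Leu /g
s/CTG /Leu /g
s/ATT /Ile /g
s/ATC /Ile /g
s/ATA /Ile /g
s/ATG /Met /g
s/GTT /Val /g
s/GTC /Val /g
s/GTA /Val /g
s/GTG /Val /g
s/TCT /Ser /g
s/TCC /Ser /g
s/TCA /Ser /g
s/TCG /Ser /g
s/CCT /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/ACT /Thr /g
s/ACC /Thr /g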
s/ACA /Thr /g
s/ACG /Thr /g
s/GCT /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/TAT /Tyr /g
s/TAC /Tyr /g
s/TAA /Stop /g
s/TAG /Stop /g
s/CAT /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/AAT /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/GAT /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/TGT /Cys /g
s/TGC /Cys /g
s/TGA /Stop /g
s/TGG /Trp /g
s/CGT /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AGT /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GGT /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g

Then a script to process the DNA sequences (it assumes one sequence per line); I saved it as dna.sh:

Code:

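while IFS= read -r dna; do
  # Put a space after every triplet, then map each codon to its amino acid with dna.sed
  aawork=$(printf '%s\n' "${dna}" | sed -n -e 's/\(...\)/\1 /gp' | sed -f dna.sed)
  # Print the translated sequence with the separating spaces removed
  printf '%s\n' "$aawork" | sed 's/ //g'
  # One residue per line, drop empty lines, count each name, print as "Name: count"
  printf '%s\n' "$aawork" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c | sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
done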
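
The splitting and counting stages of the loop can be tried in isolation. This is a minimal sketch: the helper names `split_codons` and `count_residues` are hypothetical (they are not part of the script above), and awk is swapped in for the final sed reformatting step.

```shell
# Hypothetical helpers, named here only for illustration.

# Insert a space after every three characters of the argument.
split_codons() {
  printf '%s\n' "$1" | sed 's/\(...\)/\1 /g'
}

# Read space-separated residue names on stdin; print "Name: count" per residue.
count_residues() {
  tr ' ' '\n' | sed '/^$/d' | sort | uniq -c | awk '{ print $2 ": " $1 }'
}

split_codons GCATGCTGA
printf 'Ala Cys Cys Ala Stop \n' | count_residues
```

The trailing space that `split_codons` leaves after the last triplet matters: every rule in dna.sed matches a codon followed by a space, so the final codon on a line is only translated because of it.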

Again, the script reads the sequences from standard input, one per line, so you can redirect a file into it, feed it from a pipe, etc.
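
To illustrate the different ways of feeding such a loop, here is a small sketch. The function `dna_sh` is a hypothetical stand-in for dna.sh that only reports the length of each line, and the temporary file is created just for the demonstration.

```shell
# Hypothetical stand-in for dna.sh: reads one sequence per line from stdin
# and reports how many bases each line holds.
dna_sh() {
  while IFS= read -r dna; do
    printf '%s bases\n' "${#dna}"
  done
}

# From a pipe
printf 'GCATGC\nTTTGGCTGA\n' | dna_sh

# From a file, via redirection
tmp=$(mktemp)
printf 'GCATGC\n' > "$tmp"
dna_sh < "$tmp"
rm -f "$tmp"

# From a here-document
dna_sh <<'EOF'
TTTGGCTGA
EOF
```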

The example below uses just the sample line you provided.

Code:

$ dna.sh
GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG
AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu
Ala: 4
Asn: 1
Asp: 1
Cys: 4
Glu: 3
Gly: 1
Leu: 4
Phe: 1
Stop: 1
Thr: 2
Trp: 1
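
For comparison, the translation step can also be sketched as a single awk program that builds the codon table in memory instead of reading dna.sed. This is only an alternative sketch, not the approach used above. The function name `translate` is hypothetical, and three behaviours are assumptions of this sketch rather than of the original script: input is upper-cased, a trailing incomplete codon is ignored, and any unrecognised triplet is printed as `Xaa`.

```shell
# Hypothetical alternative to the sed table: translate each input line with awk.
translate() {
  awk '
    BEGIN {
      # Standard genetic code, with bases ordered T, C, A, G at each position.
      split("T C A G", b, " ")
      aa =    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
      aa = aa "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
      aa = aa "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
      aa = aa "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
      split(aa, name, " ")
      n = 0
      for (i = 1; i <= 4; i++)
        for (j = 1; j <= 4; j++)
          for (k = 1; k <= 4; k++)
            code[b[i] b[j] b[k]] = name[++n]
    }
    {
      out = ""
      # Walk the line three bases at a time; a trailing partial codon is skipped.
      for (p = 1; p + 2 <= length($0); p += 3) {
        c = toupper(substr($0, p, 3))
        # Assumption of this sketch: unknown triplets become "Xaa".
        out = out ((c in code) ? code[c] : "Xaa")
      }
      print out
    }'
}

echo GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG | translate
```

Run on the sample line, this prints the same translated sequence as the first line of the output shown above.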



