Shell Programming and Scripting: Convert a DNA sequence into Amino Acid
Post 302957294 by cjcox on Thursday 8th of October 2015 06:05:31 PM
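OK, a bit messy, but done very quickly. First I created a sed script (call the file dna.sed).
(Keep the /g at the end of each rule: once a sequence is split into triplets, the same codon can occur more than once on a line, and without /g only the first occurrence would be replaced.)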

Code:
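s/TTT /Phe /g
s/TTC /Phe /g
s/TTA /Leu /g
s/TTG /Leu /g
s/CTT /Leu /g
s/CTC /Leu /g
s/CTA /Leu /g
s/CTG /Leu /g
s/ATT /Ile /g
s/ATC /Ile /g
s/ATA /Ile /g
s/ATG /Met /g
s/GTT /Val /g
s/GTC /Val /g
s/GTA /Val /g
s/GTG /Val /g
s/TCT /Ser /g
s/TCC /Ser /g
s/TCA /Ser /g
s/TCG /Ser /g
s/CCT /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/ACT /Thr /g
s/ACC /Thr /g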
s/ACA /Thr /g
s/ACG /Thr /g
s/GCT /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/TAT /Tyr /g
s/TAC /Tyr /g
s/TAA /Stop /g
s/TAG /Stop /g
s/CAT /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/AAT /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/GAT /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/TGT /Cys /g
s/TGC /Cys /g
s/TGA /Stop /g
s/TGG /Trp /g
s/CGT /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AGT /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GGT /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g
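
With a table this long it is easy to lose a line, so here is a rough sketch of a check that lists any of the 64 possible codons without a rule. The function name missing_codons and the demo file are made up for illustration; it only assumes the rules look like the ones above (s/XYZ /Name /g, one per line).

```shell
# List every codon that has no "s/XYZ /" rule in the sed table given as $1.
# missing_codons is a hypothetical helper name, not part of the original script.
missing_codons() {
  for a in A C G T; do
    for b in A C G T; do
      for c in A C G T; do
        grep -q "^s/$a$b$c /" "$1" || printf '%s\n' "$a$b$c"
      done
    done
  done
}

# Demo on a throwaway two-rule table (assumed file, created only for this example).
demo=$(mktemp)
printf 's/AAA /Lys /g\ns/AAG /Lys /g\n' > "$demo"
missing_codons "$demo" | grep -c .    # number of codons without a rule in the demo table
```

Run it as "missing_codons dna.sed"; no output means every codon has a rule.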

Then a script to process the DNA sequences (it assumes one sequence per line):

Code:
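while IFS= read -r dna; do
  # split the line into triplets, then translate each codon via dna.sed
  aawork=$(echo "${dna}" | sed -n -e 's/\(...\)/\1 /gp' | sed -f dna.sed)
  # print the translated sequence without the separating spaces
  echo "$aawork" | sed 's/ //g'
  # count how often each amino acid (or Stop) occurs
  echo "$aawork" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c | sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
done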
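
To see what the two less obvious stages of that loop do, here they are pulled out on their own. This is only a sketch: the function names split_codons and count_names are invented for the example, and the inputs are toy strings rather than real data.

```shell
# Stage 1: put a space after every complete triplet (same sed as in the loop).
split_codons() {
  printf '%s\n' "$1" | sed -n -e 's/\(...\)/\1 /gp'
}

# Stage 2: turn a space-separated list of names into "Name: count" lines.
count_names() {
  printf '%s\n' "$1" | tr ' ' '\012' | sort | sed '/^$/d' | uniq -c |
    sed 's/[ ]*\([0-9]*\) \(.*\)/\2: \1/'
}

split_codons GCATGCTGA      # three complete codons, each followed by a space
split_codons GCATGCTGAAC    # the two leftover bases get no trailing space
count_names 'Ala Cys Cys Ala Stop '
```

Note that leftover bases (a length that is not a multiple of 3) never match a rule in dna.sed, so they pass through untranslated.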

Again, the script expects to read the sequences one per line on standard input, so you can redirect from a file, feed it from a pipe, etc.
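
A minimal sketch of both ways of feeding it, assuming nothing beyond sed and mktemp. To keep it self-contained it uses a throwaway three-rule table and a cut-down translate function (both invented for this example) instead of the full dna.sed and the counting step.

```shell
# Throwaway working directory with a tiny codon table and an input file.
# All names here (workdir, seqs.txt, translate) are assumptions for the demo.
workdir=$(mktemp -d)
cat > "$workdir/dna.sed" <<'EOF'
s/GCA /Ala /g
s/TGC /Cys /g
s/TGA /Stop /g
EOF
printf 'GCATGC\nTGCTGA\n' > "$workdir/seqs.txt"

# Same read loop as the script, minus the counting.
translate() {
  while IFS= read -r dna; do
    printf '%s\n' "$dna" | sed -n -e 's/\(...\)/\1 /gp' |
      sed -f "$workdir/dna.sed" | sed 's/ //g'
  done
}

translate < "$workdir/seqs.txt"    # redirect from a file, one sequence per line
printf 'GCATGA\n' | translate      # or read from a pipe
```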

The example below uses just the sample line you provided.

Code:
$ dna.sh
GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG
AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu
Ala: 4
Asn: 1
Asp: 1
Cys: 4
Glu: 3
Gly: 1
Leu: 4
Phe: 1
Stop: 1
Thr: 2
Trp: 1
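
A quick cross-check of that output, as a sketch: the number of complete codons in the sample line should equal the sum of the counts printed above. The variable names are just for illustration.

```shell
# Sample line from the example above.
dna=GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG

codons=$(( ${#dna} / 3 ))      # complete triplets in the sequence
leftover=$(( ${#dna} % 3 ))    # bases that do not fill a triplet

# Sum of the per-name counts listed above (Ala, Asn, Asp, Cys, Glu, Gly, Leu, Phe, Stop, Thr, Trp).
counted=$(( 4 + 1 + 1 + 4 + 3 + 1 + 4 + 1 + 1 + 2 + 1 ))

echo "codons=$codons leftover=$leftover counted=$counted"
```

If codons and counted differ, either a codon had no rule in dna.sed or the sequence had leftover bases.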
