Convert a DNA sequence into Amino Acid (Shell Programming and Scripting)
Post 302957294 by cjcox on Thursday 8th of October 2015 06:05:31 PM
OK... a bit messy, but done very quickly. First I created a sed script (call the file dna.sed).
(Keep the /g at the end of each command: the whole sequence is processed as one line, so the same codon can appear more than once.)

Code:
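# dna.sed: standard genetic code, one rule per codon (codon followed by a space)
s/TTT /Phe /g
s/TTC /Phe /g
s/TTA /Leu /g
s/TTG /Leu /g
s/CTT /Leu /g
s/CTC /Leu /g
s/CTA /Leu /g
s/CTG /Leu /g
s/ATT /Ile /g
s/ATC /Ile /g
s/ATA /Ile /g
s/ATG /Met /g
s/GTT /Val /g
s/GTC /Val /g
s/GTA /Val /g
s/GTG /Val /g
s/TCT /Ser /g
s/TCC /Ser /g
s/TCA /Ser /g
s/TCG /Ser /g
s/CCT /Pro /g
s/CCC /Pro /g
s/CCA /Pro /g
s/CCG /Pro /g
s/ACT /Thr /g
s/ACC /Thr /g
s/ACA /Thr /g
s/ACG /Thr /g
s/GCT /Ala /g
s/GCC /Ala /g
s/GCA /Ala /g
s/GCG /Ala /g
s/TAT /Tyr /g
s/TAC /Tyr /g
s/TAA /Stop /g
s/TAG /Stop /g
s/CAT /His /g
s/CAC /His /g
s/CAA /Gln /g
s/CAG /Gln /g
s/AAT /Asn /g
s/AAC /Asn /g
s/AAA /Lys /g
s/AAG /Lys /g
s/GAT /Asp /g
s/GAC /Asp /g
s/GAA /Glu /g
s/GAG /Glu /g
s/TGT /Cys /g
s/TGC /Cys /g
s/TGA /Stop /g
s/TGG /Trp /g
s/CGT /Arg /g
s/CGC /Arg /g
s/CGA /Arg /g
s/CGG /Arg /g
s/AGT /Ser /g
s/AGC /Ser /g
s/AGA /Arg /g
s/AGG /Arg /g
s/GGT /Gly /g
s/GGC /Gly /g
s/GGA /Gly /g
s/GGG /Gly /g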
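Any codon that has no rule in dna.sed passes through the translation untouched, so it can be worth checking the table for gaps. Below is a minimal sketch in plain sh; the function name missing_codons is hypothetical, and it assumes every rule is written exactly in the s/XYZ /Name /g form used above.

```sh
#!/bin/sh
# missing_codons FILE: print each of the 64 codons that has no rule in FILE.
# Hypothetical helper; assumes rules look exactly like "s/XYZ /Name /g".
missing_codons() {
  for a in A C G T; do
    for b in A C G T; do
      for c in A C G T; do
        codon="$a$b$c"
        # a rule for this codon must start the line as "s/XYZ "
        grep -q "^s/$codon /" "$1" || printf '%s\n' "$codon"
      done
    done
  done
}
```

Running missing_codons dna.sed prints nothing when all 64 codons are covered; otherwise it lists the codons that will be left untranslated.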

Then a script (I called it dna.sh) to process the DNA sequences; it assumes each sequence is on its own line:

Code:
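#!/bin/sh
# dna.sh: translate each input line (one DNA sequence per line) using ./dna.sed
while IFS= read -r dna; do
  # put a space after every complete codon, then map codons to amino acids
  aawork=$(printf '%s\n' "$dna" | sed -n -e 's/\(...\)/\1 /gp' | sed -f dna.sed)
  # the translated sequence, with the separating spaces removed
  printf '%s\n' "$aawork" | sed 's/ //g'
  # one residue per line, drop blanks, then count each: "Name: count"
  printf '%s\n' "$aawork" | tr ' ' '\012' | sed '/^$/d' | sort | uniq -c | sed 's/^ *\([0-9]*\) \(.*\)/\2: \1/'
done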
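The first sed in dna.sh is what makes the table work: s/\(...\)/\1 /g puts a space after every complete codon, so each s/XYZ / rule can only match on a codon boundary. Pulled out on its own as a hypothetical split_codons function, it also shows two edge cases: a trailing partial codon is left as-is (so it is never translated), and because of -n with the p flag, a line shorter than three bases produces no output at all.

```sh
#!/bin/sh
# split_codons SEQ: the codon-splitting step from dna.sh on its own.
# Every complete codon gets a trailing space; a leftover partial codon
# (1 or 2 bases) is left untouched, so no "s/XYZ /" rule can match it.
split_codons() {
  printf '%s\n' "$1" | sed -n -e 's/\(...\)/\1 /gp'
}
```

For example, split_codons GCATGCTG prints GCA TGC TG.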

Again, the script reads the sequences one line at a time from standard input, so you can redirect a file into it or pipe another command's output into it.
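Since dna.sh wants one sequence per line, FASTA input (a >header line followed by a sequence wrapped over several lines) has to be flattened first. This is a minimal awk sketch, assuming headers start with > and that you want them dropped; it also uppercases the bases, because the dna.sed rules only match uppercase A, C, G and T. The function name fasta_to_lines is hypothetical.

```sh
#!/bin/sh
# fasta_to_lines [FILE...]: print each FASTA record's sequence on one line,
# uppercased, with header lines (starting with ">") dropped.
# Hypothetical helper; reads standard input when no FILE is given.
fasta_to_lines() {
  awk '
    /^>/ { if (seq != "") print seq; seq = ""; next }  # new record: flush the previous one
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }     # strip blanks and CRs, append bases
    END { if (seq != "") print seq }                   # flush the last record
  ' "$@"
}
```

Usage would then be something like fasta_to_lines seqs.fa | sh dna.sh, where seqs.fa stands for your own input file.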

My example below just uses the sample line you provided.

Code:
$ dna.sh
GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG
AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu
Ala: 4
Asn: 1
Asp: 1
Cys: 4
Glu: 3
Gly: 1
Leu: 4
Phe: 1
Stop: 1
Thr: 2
Trp: 1

DIALIGN-TX(1)						      DIALIGN-TX: Parameters						     DIALIGN-TX(1)

NAME
dialign-tx - Segment-based multiple sequence alignment

SYNOPSIS
dialign-tx [OPTIONS] {conf-directory} {fasta-file} [fasta-out-file]

DESCRIPTION
DIALIGN-TX is an improved algorithm for segment-based multiple protein alignments. DIALIGN-TX is a complete reimplementation of the segment-based approach, including several new improvements and heuristics that significantly enhance the quality of the output alignments compared to DIALIGN 2.2. This improvement has been observed on local as well as on global alignment benchmarks.

OPTIONS
-d  Debug mode [DEFAULT 0]: 0 no debug statements, 1 debugs the current phase of the processing, 2 very loquacious debugging, 5 hardcore debugging.
-s  Maximum number of input sequences [DEFAULT 5000].
-a  Maximum number of characters per line in a FASTA file [DEFAULT 100].
-c  Maximum number of characters per line when printing a sequence [DEFAULT 80].
-l  Sensitivity mode: the higher the level, the less likely spurious random fragments are aligned in local alignments [DEFAULT 0]. 0 switched off, 1 level-1 (reduced sensitivity), 2 level-2 (strongly reduced sensitivity).
-m  Score matrix file name (in the configuration directory) [DEFAULT PROTEIN: BLOSUM.scr] / [DEFAULT DNA: dna_matrix.scr].
-w  Defines the minimum weight when the weight formula is changed to 1-pow(1-prob, factor) [DEFAULT 0.000000065].
-p  Probability distribution file name (in the configuration directory) [DEFAULT PROTEIN: BLOSUM.diag_prob_t10] / [DEFAULT DNA: dna_diag_prob_100_exp_550000].
-v  Add to each score (to prevent negative values) [DEFAULT 0].
-t  "Even" threshold for low score for sequences alignment [DEFAULT PROTEIN: 4] / [DEFAULT DNA: 0].
-n  Maximum number of consecutive positions for window containing low scoring positions [DEFAULT PROTEIN: 4] / [DEFAULT DNA: 1].
-g  Global minimum fragment length for stop criterion [DEFAULT PROTEIN: 40] / [DEFAULT DNA: 1].
-m  Minimal allowed average score in frag window containing low scoring positions [DEFAULT PROTEIN: 4.0] / [DEFAULT DNA: 0.9].
-o  Whether overlap weights are calculated or not [DEFAULT 0].
-f  Minimum fragment length [DEFAULT 1].
-r  Threshold weight to consider the fragment at all [DEFAULT 0.0].
-u  [DEFAULT 0] 1: only use a sqrt(amount_of_seqs) stripe of neighbour sequences to calculate pairwise alignments (increases performance). 0: all pairwise alignments will be calculated.
-A  Optional anchor file [DEFAULT none].
-D  Input consists of DNA sequences.
-T  Translate DNA into amino acids from beginning to end (length will be cut to a multiple of 3). Warning: do not use -D with this option (default values for PROTEIN input will be loaded).
-L  Compare only the longest Open Reading Frame. Warning: do not use -D with this option (default values for PROTEIN input will be loaded).
-O  Translate DNA to amino acids, with the reading frame for each sequence determined by its longest ORF. Warning: do not use -D with this option (default values for PROTEIN input will be loaded).
-P  Output in amino acids, no retranslation of DNA sequences [DEFAULT: input = output].
-F  Fast mode (implies -l0, since it already significantly reduces sensitivity).
-C  Generate probability table saved in /usr/share/dialign-tx/prob_table and exit.
-H, -h  Print this message.

FILES
/usr/share/dialign-tx
This is the default conf-directory that dialign-tx expects as its first argument, as supplied in the upstream sources.

SEE ALSO
DIALIGN-TX is a reimplementation of dialign2-2(1) (see http://dialign.gobics.de/ for more information about DIALIGN2). The website of DIALIGN-TX is http://dialign-tx.gobics.de/

REFERENCES
Amarendran R. Subramanian, Michael Kaufmann, Burkhard Morgenstern: DIALIGN-TX: improvement of the segment-based approach for multiple sequence alignment by combining greedy and progressive alignment strategies. Algorithms for Molecular Biology 3:6, 2008.
Amarendran R. Subramanian, Jan Weyer-Menkhoff, Michael Kaufmann, Burkhard Morgenstern: DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment. BMC Bioinformatics 6:66, 2005.

AUTHORS
Amarendran R. Subramanian <subraman@informatik.uni-tuebingen.de>: author of dialign-tx
Volker Menrad: co-author of dialign-tx
Dorothea Emig: co-author of dialign-tx
Charles Plessy <plessy@debian.org>: converted this guide into DocBook XML for the Debian distribution.

COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Amarendran R. Subramanian (DIALIGN-TX)
Copyright (C) 2004 Volker Menrad (DIALIGN-TX)
Copyright (C) 2004 Dorothea Emig (DIALIGN-TX)
Copyright (C) 2007, 2008 Charles Plessy (This document and its XML source.)

DIALIGN-TX is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. DIALIGN-TX is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. On Debian systems, a copy of the GNU Lesser General Public License is available in /usr/share/common-licenses. This documentation and its XML source file can be used, modified and redistributed under the same terms as DIALIGN-TX itself.

dialign-tx 1.0.2    12/15/2008    DIALIGN-TX(1)
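The -L and -O options of DIALIGN-TX work from each sequence's longest open reading frame (ORF). For the shell approach in this thread, a rough equivalent is easy to sketch with awk. This is not DIALIGN-TX's code, and its ORF definition is an assumption: forward strand only, an ORF runs from an ATG to the first in-frame TAA, TAG or TGA (stop codon included), and an ATG with no stop after it is ignored. The name longest_orf is hypothetical.

```sh
#!/bin/sh
# longest_orf SEQ: print the longest ATG...stop stretch found in the three
# forward reading frames of SEQ (uppercase DNA). Assumptions: forward strand
# only, stops are TAA/TAG/TGA, the stop codon is included in the output,
# and an ATG with no in-frame stop after it is ignored.
longest_orf() {
  printf '%s\n' "$1" | awk '{
    best = ""
    for (f = 1; f <= 3; f++) {                 # the three forward frames
      start = 0                                # position of the open ATG, 0 = none
      for (i = f; i + 2 <= length($0); i += 3) {
        c = substr($0, i, 3)
        if (start == 0 && c == "ATG") {
          start = i
        } else if (start && (c == "TAA" || c == "TAG" || c == "TGA")) {
          orf = substr($0, start, i + 3 - start)
          if (length(orf) > length(best)) best = orf
          start = 0
        }
      }
    }
    print best
  }'
}
```

For example, longest_orf ATGTAAATGCCCTAG prints ATGCCCTAG, the longer of the two ORFs in frame 1; the result can then be piped through dna.sh to translate it.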
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.