Sponsored Content
Top Forums Shell Programming and Scripting Convert a DNA sequence into Amino Acid Post 302957292 by faizlo on Thursday 8th of October 2015 05:12:58 PM
Old 10-08-2015
Convert a DNA sequence into Amino Acid

I am trying to write a bash script that would be able to read DNA sequences (each line in the file is a sequence) from a file, where sequences are separated by an empty line. I am then to find the amino acid that these DNA sequences encode per codon (each group of three literals.) For example, if I have a file with the sequence:

Code:
   GCATGCTGCGATAACTTTGGCTGAACTTTGGCTGAAGCATGCTGCGAAACTTTGGCTGAACTTTGGCTG

then starting from GCA (first three literals,) I want to decode the DNA into amino acids based on the following table:

Code:
    Codon(s)                  Amino-acid
    TTT,TTC                    Phe
    TTA,TTG,CTT,CTC,CTA,CTG    Leu
    ATT,ATC,ATA                Ile
    ATG                       Met
    GTT,GTC,GTA,GTG            Val
    TCT,TCC,TCA,TCG            Ser
    CCT,CCC,CCA,CCG            Pro
    ACT,ACC,ACA,ACG            Thr
    GCT,GCC,GCA,GCG            Ala
    TAT,TAC                    Tyr
    TAA,TAG                    Stop
    CAT,CAC                    His
    CAA,CAG                    Gln
    AAT,AAC                    Asn
    AAA,AAG                    Lys
    GAT,GAC                   Asp
    GAA,GAG                   Glu
    TGT,TGC                    Cys
    TGA                        Stop
    TGG                        Trp
    CGT,CGC,CGA,CGG            Arg
    AGT,AGC                    Ser
    AGA,AGG                    Arg
    GGT,GGC,GGA,GGG            Gly

that is, I need to get:
Code:
    AlaCysCysAspAsnPheGlyStopThrLeuAlaGluAlaCysCysGluThrLeuAlaGluLeuTrpLeu

Then I need to print the name of each Amino Acid and how many times it was used. For example:

Code:
    Ala: 4
    Cys: 4

and so on. I have 100s of files with DNA sequences in them, but I am not that good at bash. I tried awk and tr but I did not know how to code the table into a bash script.

Last edited by faizlo; 10-08-2015 at 09:08 PM.. Reason: Add CODE tags.
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

escape sequence for $

Hi all, I have a requirement where the variable name starts with $, like $Amd=/home/student/test/ How to work wit it? can some one help me, am in gr8 confusion:confused: (5 Replies)
Discussion started by: shreekrishnagd
5 Replies

2. Shell Programming and Scripting

How to remove those sequence with same amino acid?What command line I should type?

My input is listed as: giNumber RefAminoAcid VarAminoAcid 10190711 P P 10190711 D D 109255248 I A 110349771 A ... (4 Replies)
Discussion started by: patrick chia
4 Replies

3. Shell Programming and Scripting

Extracting DNA sequences from GenBank files using Perl

Hi all, Using Perl, I need to extract DNA bases from a GenBank file for a given plant species. A sample GenBank file is here... Nucleotide This is saved on my computer as NC_001666.gb. I also have a file that is saved on my computer as NC_001666.txt. This text file has a list of all... (5 Replies)
Discussion started by: akreibich07
5 Replies

4. Shell Programming and Scripting

Tricky task with DNA sequences.

I am trying to reverse and complement my DNA sequences. The file format is FASTA, something like this: Now, to reverse the sequence, I should start reading from right to left. At the same should be complemented. Thus, "A" should be read as "T"; "C" should be read as "G"; "T" should be converted... (8 Replies)
Discussion started by: Xterra
8 Replies

5. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ... (1 Reply)
Discussion started by: manigrover
1 Replies

6. Shell Programming and Scripting

How to convert multiple number ranges into sequence?

Looking for a simple way to convert ranges to a numerical sequence that would assign the original value of the range to the individual numbers that are on the range. Thank you given data 13196-13199 0 13200 4 13201 10 13202-13207 3 13208-13210 7 desired... (3 Replies)
Discussion started by: jcue25
3 Replies

7. Shell Programming and Scripting

Sequence generator

Thanks Guys This really helped (5 Replies)
Discussion started by: robert89
5 Replies

8. Shell Programming and Scripting

Shell script for changing the accession number of DNA sequences in a FASTA file

Hi, I am having a file of dna sequences in fasta format which look like this: >admin_1_45 atatagcaga >admin_1_46 atatagcagaatatatat with many such thousands of sequences in a single file. I want to the replace the accession Id "admin_1_45" similarly in following sequences to... (5 Replies)
Discussion started by: margarita
5 Replies

9. Red Hat

Rm -rf * sequence

If I run rm -rf * command under one parent directory. /data > rm -rf * Is there anyway to know which files will be deleted first ? Start using code tags please, ty. (2 Replies)
Discussion started by: sameermohite
2 Replies
Bio::Perl(3pm)						User Contributed Perl Documentation					    Bio::Perl(3pm)

NAME
Bio::Perl - Functional access to BioPerl for people who don't know objects SYNOPSIS
use Bio::Perl; # will guess file format from extension $seq_object = read_sequence($filename); # forces genbank format $seq_object = read_sequence($filename,'genbank'); # reads an array of sequences @seq_object_array = read_all_sequences($filename,'fasta'); # sequences are Bio::Seq objects, so the following methods work # for more info see Bio::Seq, or do 'perldoc Bio/Seq.pm' print "Sequence name is ",$seq_object->display_id," "; print "Sequence acc is ",$seq_object->accession_number," "; print "First 5 bases is ",$seq_object->subseq(1,5)," "; # get the whole sequence as a single string $sequence_as_a_string = $seq_object->seq(); # writing sequences write_sequence(">$filename",'genbank',$seq_object); write_sequence(">$filename",'genbank',@seq_object_array); # making a new sequence from just a string $seq_object = new_sequence("ATTGGTTTGGGGACCCAATTTGTGTGTTATATGTA", "myname","AL12232"); # getting a sequence from a database (assumes internet connection) $seq_object = get_sequence('swissprot',"ROA1_HUMAN"); $seq_object = get_sequence('embl',"AI129902"); $seq_object = get_sequence('genbank',"AI129902"); # BLAST a sequence (assummes an internet connection) $blast_report = blast_sequence($seq_object); write_blast(">blast.out",$blast_report); DESCRIPTION
Easy first time access to BioPerl via functions. FEEDBACK
Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one of the Bioperl mailing lists. Your participation is much appreciated. bioperl-l@bioperl.org Support Please direct usage questions or support issues to the mailing list: bioperl-l@bioperl.org rather than to the module maintainer directly. Many experienced and reponsive experts will be able look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible. Reporting Bugs Report bugs to the Bioperl bug tracking system to help us keep track the bugs and their resolution. Bug reports can be submitted via the web: https://redmine.open-bio.org/projects/bioperl/ AUTHOR - Ewan Birney Email birney@ebi.ac.uk APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _ read_sequence Title : read_sequence Usage : $seq = read_sequence('sequences.fa') $seq = read_sequence($filename,'genbank'); # pipes are fine $seq = read_sequence("my_fetching_program $id |",'fasta'); Function: Reads the top sequence from the file. If no format is given, it will try to guess the format from the filename. If a format is given, it forces that format. The filename can be any valid perl open() string - in particular, you can put in pipes Returns : A Bio::Seq object. A quick synopsis: $seq_object->display_id - name of the sequence $seq_object->seq - sequence as a string Args : Two strings, first the filename - any Perl open() string is ok Second string is the format, which is optional For more information on Seq objects see Bio::Seq. read_all_sequences Title : read_all_sequences Usage : @seq_object_array = read_all_sequences($filename); @seq_object_array = read_all_sequences($filename,'genbank'); Function: Just as the function above, but reads all the sequences in the file and loads them into an array. For very large files, you will run out of memory. When this happens, you've got to use the SeqIO system directly (this is not so hard! Don't worry about it!). Returns : array of Bio::Seq objects Args : two strings, first the filename (any open() string is ok) second the format (which is optional) See Bio::SeqIO and Bio::Seq for more information write_sequence Title : write_sequence Usage : write_sequence(">new_file.gb",'genbank',$seq) write_sequence(">new_file.gb",'genbank',@array_of_sequence_objects) Function: writes sequences in the specified format Returns : true Args : filename as a string, must provide an open() output file format as a string one or more sequence objects new_sequence Title : new_sequence Usage : $seq_obj = new_sequence("GATTACA", "kino-enzyme"); Function: Construct a sequency object from sequence string Returns : A Bio::Seq object Args : sequence string name string (optional, default "no-name-for-sequence") accession - accession number (optional, no default) blast_sequence Title : blast_sequence Usage : $blast_result = blast_sequence($seq) $blast_result = blast_sequence('MFVEGGTFASEDDDSASAEDE'); Function: If the computer has Internet accessibility, blasts the sequence using the NCBI BLAST server against nrdb. It chooses the flavour of BLAST on the basis of the sequence. This function uses Bio::Tools::Run::RemoteBlast, which itself use Bio::SearchIO - as soon as you want to know more, check out these modules Returns : Bio::Search::Result::GenericResult.pm Args : Either a string of protein letters or nucleotides, or a Bio::Seq object write_blast Title : write_blast Usage : write_blast($filename,$blast_report); Function: Writes a BLAST result object (or more formally a SearchIO result object) out to a filename in BLAST-like format Returns : none Args : filename as a string Bio::SearchIO::Results object get_sequence Title : get_sequence Usage : $seq_object = get_sequence('swiss',"ROA1_HUMAN"); Function: If the computer has Internet access this method gets the sequence from Internet accessible databases. Currently this supports Swissprot ('swiss'), EMBL ('embl'), GenBank ('genbank'), GenPept ('genpept'), and RefSeq ('refseq'). Swissprot and EMBL are more robust than GenBank fetching. If the user is trying to retrieve a RefSeq entry from GenBank/EMBL, the query is silently redirected. Returns : A Bio::Seq object Args : database type - one of swiss, embl, genbank, genpept, or refseq translate Title : translate Usage : $seqobj = translate($seq_or_string_scalar) Function: translates a DNA sequence object OR just a plain string of DNA to amino acids Returns : A Bio::Seq object Args : Either a sequence object or a string of just DNA sequence characters translate_as_string Title : translate_as_string Usage : $seqstring = translate_as_string($seq_or_string_scalar) Function: translates a DNA sequence object OR just a plain string of DNA to amino acids Returns : A string of just amino acids Args : Either a sequence object or a string of just DNA sequence characters reverse_complement Title : reverse_complement Usage : $seqobj = reverse_complement($seq_or_string_scalar) Function: reverse complements a string or sequence argument producing a Bio::Seq - if you want a string, you can use reverse_complement_as_string Returns : A Bio::Seq object Args : Either a sequence object or a string of just DNA sequence characters revcom Title : revcom Usage : $seqobj = revcom($seq_or_string_scalar) Function: reverse complements a string or sequence argument producing a Bio::Seq - if you want a string, you can use reverse_complement_as_string This is an alias for reverse_complement Returns : A Bio::Seq object Args : Either a sequence object or a string of just DNA sequence characters reverse_complement_as_string Title : reverse_complement_as_string Usage : $string = reverse_complement_as_string($seq_or_string_scalar) Function: reverse complements a string or sequence argument producing a string Returns : A string of DNA letters Args : Either a sequence object or a string of just DNA sequence characters revcom_as_string Title : revcom_as_string Usage : $string = revcom_as_string($seq_or_string_scalar) Function: reverse complements a string or sequence argument producing a string Returns : A string of DNA letters Args : Either a sequence object or a string of just DNA sequence characters perl v5.14.2 2012-03-02 Bio::Perl(3pm)
All times are GMT -4. The time now is 07:17 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy