Extracting DNA sequences from GenBank files using Perl
Post: 302328285

LEARN ABOUT DEBIAN

asn2gb

ASN2GB(1)						     NCBI Tools User's Manual							 ASN2GB(1)

NAME

       asn2gb - convert ASN.1 biological data to a GenBank-style flat format

SYNOPSIS

       asn2gb  [-]  [-A accession] [-F] [-a asn-type] [-b] [-c] [-d] [-f format] [-g N] [-h N] [-i filename] [-j N] [-k N] [-l filename] [-m mode]
       [-n filename] [-o filename] [-p] [-q filename] [-r] [-s style] [-t N] [-u N] [-y N]

DESCRIPTION

       asn2gb converts descriptions of biological sequences from NCBI's ASN.1 format to one of several flat-file formats, and is the successor to
       asn2ff(1).
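
A conversion run can be assembled from the options summarized under OPTIONS. The following is a minimal sketch in Python, not part of the NCBI toolkit: the helper name build_asn2gb_argv, its parameters, and the file names in the usage note are hypothetical. It only builds the argument list and does not run asn2gb. The format letters and the accession,0,-1 form are taken from the -f, -A, and -F entries of this page.

```python
# Single-letter codes copied from the -f entry of the option summary.
FORMATS = {
    "genbank": "b",
    "embl": "e",
    "genpept": "p",
    "feature_table": "t",
    "fasta": "Y",
}


def build_asn2gb_argv(input_path=None, output_path=None, fmt="genbank",
                      accession=None, remote_annotations=False):
    """Return an argument list for asn2gb (hypothetical helper).

    Omitting input_path or output_path leaves the documented defaults
    (stdin and stdout) in effect.
    """
    argv = ["asn2gb", "-f", FORMATS[fmt]]
    if input_path is not None:
        argv += ["-i", input_path]
    if output_path is not None:
        argv += ["-o", output_path]
    if accession is not None:
        # The page documents -F as equivalent to -A accession,0,-1.
        spec = f"{accession},0,-1" if remote_annotations else accession
        argv += ["-A", spec]
    return argv
```

For example, build_asn2gb_argv("record.asn", "record.gb") returns the list for `asn2gb -f b -i record.asn -o record.gb`, which could then be passed to subprocess.run on a system where asn2gb is installed.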

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -A accession
	      Accession to fetch; may take the form accession,complexity,flags, where complexity should normally be 0 and a flags value of -1
	      enables fetching of external features (as with the legacy -F option)

       -F     Fetch remote annotations (equivalent to specifying -A accession,0,-1)

       -a asn-type
	      ASN.1 Type:
	      [Single record]
	      a      Any (autodetected; default)
	      e      seq-Entry
	      b      Bioseq
	      s      bioseq-Set
	      m      seq-subMit
	      q      Catenated
	      [Release file; components individually processed and freed]
	      t      baTch bioseq-set
	      u      batch seq-sUbmit

       -b     Input file is binary

       -c     Batch file is compressed

       -d     Seq-loc minus strand

       -f format
	      Format:
	      b      GenBank (default)
	      bp or pb
		     GenBank and GenPept
	      e      EMBL
	      p      GenPept
	      q      nucleotide GBSet (XML)
	      r      protein GBSet (XML)
	      t      Feature table only
	      x      nucleotide INSDSet (XML)
	      y      tiny seq (XML)
	      Y      FASTA
	      z      protein INSDSet (XML)

       -g N   Bit flags (all default to off):
	      1      HTML
	      2      XML
	      4      ContigFeats
	      8      ContigSrcs
	      16     FarTransl

       -h N   Lock/Lookup Flags (all default to off):
	      8      LockProd
	      16     LookupComp
	      64     LookupProd

       -i filename
	      Input file name (default = stdin)

       -j N   Start location (default is 0, beginning of sequence)

       -k N   End location (default is 0, end of sequence)

       -l filename
	      Log file

       -m mode
	      Mode:
	      r      Release
	      e      Entrez
	      s      Sequin (default)
	      d      Dump

       -n filename
	      Asn2Flat Executable (default = asn2flat)

       -o filename
	      Output file name (default = stdout)

       -p     Propagate top descriptors

       -q filename
	      Ffdiff Executable (default = /netopt/genbank/subtool/bin/ffdiff)

       -r     Enable remote fetching

       -s style
	      Style:
	      n      Normal (default)
	      s      Segment
	      m      Master
	      c      Contig

       -t N   Batch:
	      1      Report
	      2      Sequin/Release
	      3      asn2gb SSEC/nocleanup
	      4      asn2flat BSEC/nocleanup
	      5      asn2gb/asn2flat
	      6      asn2gb NEW dbxref/OLD dbxref
	      7      oldasn2gb/newasn2gb

       -u N   Custom flags (all default to off):
	      4      Hide features
	      1792   Hide references
	      8192   Hide sources
	      262144 Hide translations

       -y N   Feature itemID
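
The -g and -u options each take a single integer N. The sketch below assumes, as the "bit flags" wording suggests, that several values are selected at once by combining them with bitwise OR; this page does not state that explicitly. The table and function names are hypothetical, and the numeric values are copied from the -g and -u entries above.

```python
# Values copied from the -u entry (custom flags).
CUSTOM_FLAGS = {
    "hide_features": 4,
    "hide_references": 1792,   # 256 + 512 + 1024: three adjacent bits
    "hide_sources": 8192,
    "hide_translations": 262144,
}

# Values copied from the -g entry (bit flags).
GENERAL_FLAGS = {
    "html": 1,
    "xml": 2,
    "contig_feats": 4,
    "contig_srcs": 8,
    "far_transl": 16,
}


def combine(table, *names):
    """OR together the named flags and return the integer to pass as N."""
    value = 0
    for name in names:
        value |= table[name]
    return value
```

Under that assumption, hiding both features and sources would be requested with -u 8196, since combine(CUSTOM_FLAGS, "hide_features", "hide_sources") is 4 | 8192.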

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       asn2all(1), asn2asn(1), asn2ff(1), asn2fsa(1), asn2xml(1), asndhuff(1), insdseqget(1), /usr/share/doc/libncbi6-dev/asn2gb.txt.gz.

NCBI
								    2011-09-02								 ASN2GB(1)