REQUIRE HELP IN WRITING A PERL SCRIPT


 
# 8  
Old 05-03-2012
Because my supervisor wants me to learn Perl now. That is why I am trying to work on it.

---------- Post updated at 04:48 PM ---------- Previous update was at 04:39 PM ----------

Last edited by kaav06; 05-03-2012 at 05:06 AM..

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Writing xml from excel sheet .xls using perl script

Hi all. I am working on the below requirement of generating an .xml file from an .xls file which I have. Can someone please help me in writing the Perl script for the same? The .xls file format is as below; it has two columns and the number of rows is not fixed: Fixlet Name ... (12 Replies)
Discussion started by: omkar.jadhav
12 Replies

2. Shell Programming and Scripting

How to automate a script that would require authentication?

Hey everyone... I'm just stretching my wings a bit and seeing how things work. If I wanted to write a script that had me ssh to my remote computer, how can this be done? If the script runs without me, how can I enter the required password? The same is true for any type of authentication method like... (2 Replies)
Discussion started by: Lost in Cyberia
2 Replies

3. Shell Programming and Scripting

Need help in writing perl script

Hi, I am new to perl. I am trying to write a small perl script for search and replace in a file : ======================================================== #!/usr/bin/perl my $searchStr = "register_inst\.write_t\("; my $replaceStr = "model\.fc_block\."; open(FILE,"temp.sv") ||... (2 Replies)
Discussion started by: chettyravi
2 Replies

4. Shell Programming and Scripting

Perl script for Calling a function and writing all its contents to a file

I have a function which does awk processing sub mergeDescription { system (q@awk -F'~' ' NR == FNR { A = $1 B = $2 C = $0 next } { n = split ( C, V, "~" ) if... (3 Replies)
Discussion started by: crypto87
3 Replies

5. Shell Programming and Scripting

Writing a Perl Script that processes multiple files

I want to write a Perl script that manipulates multiple files. In the directory, I have files 250.*chr$.ped where * is from 1 to 1000 and $ is from 1 to 22, for a total of 22 x 1,000 = 22,000 files. I want to write a script that only manipulates files 250.1chr*.ped where * is from 1 to 22.... (10 Replies)
Discussion started by: evelibertine
10 Replies

6. Shell Programming and Scripting

Require script to create two files

Hi folks, I have a input.file with the following contents:- flor geor enta vpal domi pegl cars mted four rose annc gabi ward dalv elph beac (8 Replies)
Discussion started by: mithalr
8 Replies

7. Shell Programming and Scripting

Need help with writing a perl script

Hi all! I have to write a Perl script that gets thresholds from a file and matches them against the output of a command. The threshold file looks like this: "pl-it_prod.GW.Sync.reply.*" "500" "-1" "" "" "pl-it_prod.A.*" "100" "-1" "" "" "application.log" ... (29 Replies)
Discussion started by: eliraza6
29 Replies

8. Shell Programming and Scripting

help for a perl script - writing to a data file

Hi, here is my problem. I have 2 files (file1, file2). I have written the first 4 lines and the last two lines of "file2" into two different variables, say: my $firstrec = `head -4 $file2`; my $lastrec = `tail -2 $file2`; and I write the rest of file2 to a tmpfile and cat it with head... (2 Replies)
Discussion started by: meghana
2 Replies

9. Shell Programming and Scripting

Require Help for Shell Script

Hi friends, I am trying to print a warning for partitions whose usage exceeds the 90% limit, and report the others as OK. I am using the command below, which prints the partition that exceeds 90%: # df -h | sort -k5 | head -1 | awk 'END{ print $1" :- Not Having more space on This Partition"}' I want to print... (14 Replies)
Discussion started by: jagnikam
14 Replies

10. UNIX for Dummies Questions & Answers

Perl Unix Script Writing

Hi Folks, I posted a few days ago, thanks for the responses. My original question was for renaming files of sort 3p2325294.dgn in a directory containing multiple files. I need to drop the first 2 characters and the last in a unix script using Perl. How does it differ from using the Unix... (1 Reply)
Discussion started by: Dinkster
1 Reply
BP_GENBANK2GFF3(1p)					User Contributed Perl Documentation				       BP_GENBANK2GFF3(1p)

NAME
       genbank2gff3.pl -- Genbank->gbrowse-friendly GFF3

SYNOPSIS
       genbank2gff3.pl [options] filename(s)

           # process a directory containing GenBank flatfiles
           perl genbank2gff3.pl --dir path_to_files --zip

           # process a single file, ignore explicit exons and introns
           perl genbank2gff3.pl --filter exon --filter intron file.gbk.gz

           # process a list of files
           perl genbank2gff3.pl *gbk.gz

           # process data from URL, with Chado GFF model (-noCDS), and pipe to database loader
           curl ftp://ftp.ncbi.nih.gov/genomes/Saccharomyces_cerevisiae/CHR_X/NC_001142.gbk | perl genbank2gff3.pl -noCDS -in stdin -out stdout | perl gmod_bulk_load_gff3.pl -dbname mychado -organism fromdata

       Options:
           --noinfer     -r  don't infer exon/mRNA subfeatures
           --conf        -i  path to the curation configuration file that contains user preferences
                             for Genbank entries (must be YAML format)
                             (if --manual is passed without --ini, user will be prompted to
                             create the file if any manual input is saved)
           --sofile      -l  path to the so.obo file to use for feature type mapping
                             (--sofile live will download the latest online revision)
           --manual      -m  when trying to guess the proper SO term, if more than
                             one option matches the primary tag, the converter will
                             wait for user input to choose the correct one
                             (only works with --sofile)
           --dir         -d  path to a list of genbank flatfiles
           --outdir      -o  location to write GFF files (can be 'stdout' or '-' for pipe)
           --zip         -z  compress GFF3 output files with gzip
           --summary     -s  print a summary of the features in each contig
           --filter      -x  genbank feature type(s) to ignore
           --split       -y  split output to separate GFF and fasta files for
                             each genbank record
           --nolump      -n  separate file for each reference sequence
                             (default is to lump all records together into one
                             output file for each input file)
           --ethresh     -e  error threshold for unflattener
                             set this high (>2) to ignore all unflattener errors
           --[no]CDS     -c  Keep CDS-exons, or convert to alternate gene-RNA-protein-exon
                             model. --CDS is default. Use --CDS to keep default GFF gene model,
                             use --noCDS to convert to g-r-p-e.
           --format      -f  Input format (SeqIO types): GenBank, Swiss or Uniprot, EMBL work
                             (GenBank is default)
           --GFF_VERSION     3 is default, 2 and 2.5 and other Bio::Tools::GFF versions available
           --quiet           don't talk about what is being processed
           --typesource      SO sequence type for source (e.g. chromosome; region; contig)
           --help        -h  display this message

DESCRIPTION
       This script uses Bio::SeqFeature::Tools::Unflattener and Bio::Tools::GFF to convert GenBank flatfiles to GFF3 with gene containment hierarchies mapped for optimal display in gbrowse.

       The input files are assumed to be gzipped GenBank flatfiles for refseq contigs. The files may contain multiple GenBank records. Either a single file or an entire directory can be processed. By default, the DNA sequence is embedded in the GFF but it can be saved into separate fasta file with the --split(-y) option.

       If an input file contains multiple records, the default behaviour is to dump all GFF and sequence to a file of the same name (with .gff appended). Using the 'nolump' option will create a separate file for each genbank record. Using the 'split' option will create separate GFF and Fasta files for each genbank record.

   Notes
       'split' and 'nolump' produce many files

       In cases where the input files contain many GenBank records (for example, the chromosome files for the mouse genome build), a very large number of output files will be produced if the 'split' or 'nolump' options are selected. If you do have lists of files > 6000, use the --long_list option in bp_bulk_load_gff.pl or bp_fast_load_gff.pl to load the gff and/or fasta files.

       Designed for RefSeq

       This script is designed for RefSeq genomic sequence entries. It may work for third party annotations but this has not been tested. But see below, Uniprot/Swissprot works, EMBL and possibly EMBL/Ensembl if you don't mind some gene model unflattener errors (dgg).

       G-R-P-E Gene Model

       Don Gilbert worked this over with needs to produce GFF3 suited to loading to GMOD Chado databases. Most of the changes I believe are suited for general use. One main chado-specific addition is the

           --[no]cds2protein flag

       My favorite GFF is to set the above as ON by default (disable with --nocds2prot). For general use it probably should be OFF, enabled with --cds2prot.

       This writes GFF with an alternate, but useful Gene model, instead of the consensus model for GFF3

           [ gene > mRNA> (exon,CDS,UTR) ]

       This alternate is

           gene > mRNA > polypeptide > exon

       means the only feature with dna bases is the exon. The others specify only location ranges on a genome. Exon of course is a child of mRNA and protein/peptide.

       The protein/polypeptide feature is an important one, having all the annotations of the GenBank CDS feature, protein ID, translation, GO terms, Dbxrefs to other proteins.

       UTRs, introns, CDS-exons are all inferred from the primary exon bases inside/outside appropriate higher feature ranges. Other special gene model features remain the same.

       Several other improvements and bugfixes, minor but useful are included

       * IO pipes now work:
           curl ftp://ncbigenomes/... | genbank2gff3 --in stdin --out stdout | gff2chado ...

       * GenBank main record fields are added to source feature, e.g. organism, date, and the sourcetype, commonly chromosome for genomes, is used.

       * Gene Model handling for ncRNA, pseudogenes are added.

       * GFF header is cleaner, more informative. --GFF_VERSION flag allows choice of v2 as well as default v3

       * GFF ##FASTA inclusion is improved, and CDS translation sequence is moved to FASTA records.

       * FT -> GFF attribute mapping is improved.

       * --format choice of SeqIO input formats (GenBank default). Uniprot/Swissprot and EMBL work and produce useful GFF.

       * SeqFeature::Tools::TypeMapper has a few FT -> SOFA additions and more flexible usage.

TODO
       Are these additions desired?

       * filter input records by taxon (e.g. keep only organism=xxx or taxa level = classYYY)

       * handle Entrezgene, other non-sequence SeqIO structures (really should change those parsers to produce consistent annotation tags).

   Related bugfixes/tests
       These items from Bioperl mail were tested (sample data generating errors), and found corrected:

           From: Ed Green <green <at> eva.mpg.de>
           Subject: genbank2gff3.pl on new human RefSeq
           Date: 2006-03-13 21:22:26 GMT
             -- unspecified errors (sample data works now).

           From: Eric Just <e-just <at> northwestern.edu>
           Subject: bp_genbank2gff3.pl
           Date: 2007-01-26 17:08:49 GMT
             -- bug fixed in genbank2gff3 for multi-record handling

       This error is for a /trans_splice gene that is hard to handle, and unflattener/genbank2 doesn't

           From: Chad Matsalla <chad <at> dieselwurks.com>
           Subject: genbank2gff3.PLS and the unflatenner - Inconsistent order?
           Date: 2005-07-15 19:51:48 GMT

AUTHOR
       Sheldon McKay (mckays@cshl.edu)

       Copyright (c) 2004 Cold Spring Harbor Laboratory.

   AUTHOR of hacks for GFF2Chado loading
       Don Gilbert (gilbertd@indiana.edu)

perl v5.14.2                          2012-03-02                          BP_GENBANK2GFF3(1p)
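
The invocations listed in the SYNOPSIS can also be assembled from a wrapper program. The following is only a rough sketch in Python, not part of the BioPerl distribution: the function name `build_command` and its parameters are hypothetical, and it uses only flags that appear in the options list above (`--filter`, `--noCDS`, `--zip`, `--outdir`). It builds the argument list and does not run the script.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    inputs: Sequence[str],
    filters: Sequence[str] = (),
    no_cds: bool = False,
    zip_output: bool = False,
    outdir: Optional[str] = None,
    script: str = "genbank2gff3.pl",  # assumed to be on the current path
) -> List[str]:
    """Return an argv list for genbank2gff3.pl (hypothetical helper)."""
    cmd = ["perl", script]
    # --filter may be repeated, once per feature type to ignore
    for feature_type in filters:
        cmd += ["--filter", feature_type]
    # --noCDS selects the alternate gene-RNA-protein-exon model
    if no_cds:
        cmd.append("--noCDS")
    # --zip compresses the GFF3 output files with gzip
    if zip_output:
        cmd.append("--zip")
    # --outdir is the location to write GFF files ('stdout' or '-' for a pipe)
    if outdir is not None:
        cmd += ["--outdir", outdir]
    # input files go last, as in the SYNOPSIS
    cmd.extend(inputs)
    return cmd


if __name__ == "__main__":
    argv = build_command(["file.gbk.gz"], filters=["exon", "intron"])
    # Print a shell-quoted form for inspection; pass argv to subprocess.run to execute.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the command as a list rather than a single string avoids shell quoting problems when a file name contains spaces.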