Boulder::LocusLink(3pm) 				User Contributed Perl Documentation				   Boulder::LocusLink(3pm)

NAME

       Boulder::LocusLink - Fetch LocusLink data records as parsed Boulder Stones

SYNOPSIS

	 use Boulder::LocusLink;

	 # parse a file of LocusLink records
	 my $ll = Boulder::LocusLink->new(-accessor => 'File',
					  -param    => '/home/data/LocusLink/LL_tmpl');
	 while (my $s = $ll->get) {
	   print $s->Identifier, "\n";
	   print $s->Gene, "\n";
	 }

	 # parse flatfile records yourself
	 open(my $fh, '<', '/home/data/LocusLink/LL_tmpl')
	     or die "Cannot open LL_tmpl: $!";
	 local $/ = "*RECORD*";	   # read one record per iteration
	 while (my $record = <$fh>) {
	    my $s = Boulder::LocusLink->parse($record);
	    # etc.
	 }

DESCRIPTION

       Boulder::LocusLink provides retrieval and parsing services for NCBI LocusLink records.  It returns LocusLink entries in Stone format,
       allowing easy access to the various fields and values.  Boulder::LocusLink is a descendant of Boulder::Stream, and provides a stream-like
       interface to a series of Stone objects.

       Access to LocusLink is provided by one accessor, which gives access to a local LocusLink database.  When you create a new
       Boulder::LocusLink stream, you provide the accessor, along with accessor-specific parameters that control what entries to fetch.  The
       accessor is:

       File
	 This provides access to local LocusLink entries by reading from a flat file (typically the Hs.dat file downloadable from NCBI's FTP site).
	 The stream will return a Stone corresponding to each of the entries in the file, starting from the top of the file and working downward.
	 The parameter is the path to the local file.

       It is also possible to parse a single LocusLink entry from a text string stored in a scalar variable, returning a Stone object.
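
       The record-at-a-time reading described above can be illustrated without the module itself.  The following is only a sketch: it assumes
       records are delimited by the "*RECORD*" string used in the SYNOPSIS, and both the sample data and the split_records() helper are
       hypothetical.  It collects raw record text; in real use each chunk would be handed to Boulder::LocusLink->parse().

```perl
use strict;
use warnings;

# Hypothetical sample data; real LocusLink flat files carry many more fields.
my $sample = "*RECORD*\nLOCUSID: 1\nORGANISM: Homo sapiens\n"
           . "*RECORD*\nLOCUSID: 2\nORGANISM: Mus musculus\n";

# Split flat-file text into raw records using the input record separator.
sub split_records {
    my ($text) = @_;
    # Open the string as an in-memory file so the loop mirrors file reading.
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    local $/ = "*RECORD*";          # one read returns text up to the delimiter
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;               # chomp removes the trailing "*RECORD*"
        next unless $chunk =~ /\S/; # skip the empty chunk before the first delimiter
        push @records, $chunk;
    }
    close $fh;
    return @records;
}

my @found = split_records($sample);
print scalar(@found), " records found\n";
```

       Localizing $/ inside the helper keeps the changed record separator from affecting other reads in the program.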

   Boulder::LocusLink methods
       This section lists the public methods that the Boulder::LocusLink class makes available.

       new()
	      # Local fetch via File
	      $ug=new Boulder::LocusLink(-accessor  =>	'File',
				       -param	  =>  '/data/LocusLink/Hs.dat');

	   The new() method creates a new Boulder::LocusLink stream on the accessor provided.  The only possible accessor is File.  If
	   successful, the method returns the stream object.  Otherwise it returns undef.

	   new() takes the following arguments:

		   -accessor	   Name of the accessor to use
		   -param	   Parameters to pass to the accessor

	   Specify the accessor to use with the -accessor argument.  If not specified, it defaults to File.

	   -param is an accessor-specific argument.  The only possibility is:

	   For File, the -param argument must point to a string-valued scalar, which will be interpreted as the path to the file to read LocusLink
	   entries from.

       get()
	   The get() method is inherited from Boulder::Stream, and simply returns the next parsed LocusLink Stone, or undef if there is nothing
	   more to fetch.  It has the same semantics as the parent class, including the ability to restrict access to certain top-level tags.

       put()
	   The put() method is inherited from the parent Boulder::Stream class, and will write the passed Stone to standard output in Boulder
	   format.  This means that it is currently not possible to write a Boulder::LocusLink object back into LocusLink flatfile form.

OUTPUT TAGS

       The tags returned by the parsing operation are taken from the names shown in the flat file Hs.dat, since the database producer does not
       yet provide a better description of them.

   Top-Level Tags
       These are tags that appear at the top level of the parsed LocusLink entry.

       Identifier
	   The LocusLink identifier of this entry.  Identifier is a single-value tag.

	   Example:

		 my $identifierNo = $s->Identifier;

       Current_locusid
	   If a locus has been merged with another, Current_locusid contains the previous LOCUSID line (a bit confusing: it should be called
	   "previous_locusid", but this name is defined in the NCBI README file ... ).

	   Example:
		 my $prevlocusid=$s->Current_locusid;

       Organism
	   Source species, based on NCBI's Taxonomy.

	   Example:
		 my $theorganism=$s->Organism;

       Status
	   Type of reference sequence record.  "PROVISIONAL" means the record is generated automatically from an existing GenBank record and
	   information stored in the LocusLink database, with no curation.  "REVIEWED" means it is generated from the most representative
	   complete GenBank sequence, or a merge of GenBank sequences, and from information stored in the LocusLink database.

	   Example:
		 my $thestatus=$s->Status;

       LocAss
	   A complex record made up of:

		   LOCUS_STRING
		   NM		   The value in the LOCUS field of the RefSeq record
		   NP		   The RefSeq accession number for an mRNA record
		   PRODUCT	   The name of the product of this transcript
		   TRANSVAR	   A variant-specific description
		   ASSEMBLY	   The GenBank accession used to assemble the RefSeq record

	   Example:
		 my $theprod=$s->LocAss->Product;

       AccProt
	   A complex record made up of:

		   ACCNUM	   Nucleotide sequence accession number
		   TYPE		   e=EST, m=mRNA, g=Genomic
		   PROT		   Set of PID values for the coding region or regions annotated on the nucleotide record

	   In PROT, the first value is the PID (an integer or null), then either MMDB or na, separated from the PID by a |.  If MMDB is present,
	   it indicates there are structure data available for a protein related to the protein referenced by the PID.

	   Example:
		 my $theprot=$s->AccProt->Prot;

       OFFICIAL_SYMBOL
	   The symbol used for gene reports, validated by the appropriate nomenclature committee.

       PREFERRED_SYMBOL
	   Interim symbol used for display.

       OFFICIAL_GENE_NAME
	   The gene description used for gene reports, validated by the appropriate nomenclature committee.  If the symbol is official, the
	   gene name will be official.  No records will have both official and interim nomenclature.

       PREFERRED_GENE_NAME
	   Interim gene name used for display.

       PREFERRED_PRODUCT
	   The name of the product used in the RefSeq record.

       ALIAS_SYMBOL
	   Other symbols associated with this gene.

       ALIAS_PROT
	   Other protein names associated with this gene.

       PhenoTable
	   A complex record made up of Phenotype and Phenotype_ID.

       SUmmary

       Unigene

       Omim

       Chr

       Map

       STS

       ECNUM

       ButTable
	   A complex record made up of BUTTON and LINK.

       DBTable
	   A complex record made up of DB_DESCR and DB_LINK.

       PMID
	   A subset of publications associated with this locus, given as comma-separated PubMed unique identifiers.
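
       The PROT value described under AccProt packs two fields into one string.  The following is a minimal sketch of how such a value might be
       taken apart.  It assumes the value arrives as a plain string such as "12345|MMDB", and that a null PID appears as an empty or
       non-numeric field; neither detail is confirmed by the LocusLink documentation.  The parse_prot() helper and the sample value are
       hypothetical.

```perl
use strict;
use warnings;

# Split one PROT value of the form "PID|MMDB" or "PID|na".
# Assumption: a null PID shows up as an empty or non-numeric field.
sub parse_prot {
    my ($value) = @_;
    # Limit of 2 keeps any further "|" characters inside the second field.
    my ($pid, $flag) = split /\|/, $value, 2;
    $pid = undef unless defined $pid && $pid =~ /^\d+$/;
    return {
        pid           => $pid,
        # MMDB signals that structure data exist for a related protein.
        has_structure => (defined $flag && $flag eq 'MMDB') ? 1 : 0,
    };
}

my $prot = parse_prot('12345|MMDB');    # hypothetical sample value
printf "pid=%s structure=%d\n", $prot->{pid}, $prot->{has_structure};
```

       Returning undef for a null PID lets callers tell "no PID" apart from a numeric identifier with a simple defined() test.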

SEE ALSO

       Boulder, Boulder::Blast, Boulder::Genbank

AUTHOR

       Lincoln Stein <lstein@cshl.org>.  Luca I.G. Toldo <luca.toldo@merck.de>

       Copyright (c) 1997 Lincoln D. Stein.  Copyright (c) 1999 Luca I.G. Toldo.

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.  See DISCLAIMER.txt for
       disclaimers of warranty.

perl v5.10.1							    2004-01-09						   Boulder::LocusLink(3pm)