awk to parse html file Post: 302919206

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

shell script to parse html file

hi all, i have a html file something similar to this. <tr class="evenrow"> <td class="data">added</td><td class="data">xyz@abc.com</td> <td class="data">filename.sql</td><td class="modifications-data">08/25/2009 07:58:40</td><td class="data">Added TK prof script</td> </tr> <tr...

2. Shell Programming and Scripting

Parse HTML tag parameters and text

Hi! I have a bunch of HTML files, which I want to parse to CSV files. Every page has a table in it, and I need to parse each row into a csv record. With awk and sed, I managed to put every table row in separate lines. So my file looks like this: <TR> .... </TR> <TR> .... </TR> ...One...

3. Shell Programming and Scripting

Parse file using awk and work in awk output

hi guys, i want to parse a file using public function, the file contain raw data in the below format i want to get the output like this to load it to Oracle DB MARWA1,BSS:26,1,3,0,0,0,0,0.00,22,22,22.00 MARWA2,BSS:26,1,3,0,0,0,0,0.00,22,22,22.00 this the file raw format: Number of...

4. Shell Programming and Scripting

sed to parse html

Hello, I have a html file like this : <html> ... ... ... <table> ....... ...... </table> <table name = "hi"> ...... ..... ... </table> <h1> Welcome </h1> ....... ...... </html>

5. Shell Programming and Scripting

Extract/Parse information from html (website)

Hello, I want to extract some informations from a html (website, http://www.energiecontracting.de/7-mitglieder/von-A-Z.php?a_z=B&seite=2 ) file and save those in a predefined format (.csv).. However it seems that the code on that website is kinda messy and I can't find a way to handle it...

6. Shell Programming and Scripting

Using awk to Parse File

Hi all, I have a file that contains a good hundred of these job definitions below: Job Name Last Start Last End ST Run Pri/Xit ________________________________________________________________ ____________________...

7. Shell Programming and Scripting

Parse excel file with html on each cell

<DIV><P>Pré-condição aceder ao ecrã Home do MRS.</P></DIV><DIV><P>OK.</P></DIV><DIV><P>Seleccionar Pesquisa de Recepção Directa.</P></DIV><DIV><P>Confirmar que abriu ecrã de Recepção Directa.</P></DIV><DIV>

8. Shell Programming and Scripting

Parse multiple html files in directory

I have downloaded source code for 97 files using: wget -x -i link.txt then run a rename loop: for file in * do mv $file $file.txt done to keep the html tags but make the file a text that can be parsed. In each of the 97 txt files the gene # is variable, but the gene is associated...

9. Shell Programming and Scripting

Parse html

I downloaded source code using: wget -qO- http://fulgentdiagnostics.com/test/clinical-exome/ | cat > flugentsource.txt Now I am trying to use sed to parse it to confirm a gene count. Basically, output (flugent.txt) all the gene names with a total count after them I'm not all that...

10. UNIX for Beginners Questions & Answers

How to parse a specifc value between html tags using sed?

Hi, im trying to read a Temperature value from html code. So far i have managed to reduce the whole html page down to this single line with the following sed command:sed -n '/Temperature/p' $temp_temperature | tee temp_string <TD width='350'>Temperature :</td><td>25...

Bio::ASN1::EntrezGene::Indexer(3pm)			User Contributed Perl Documentation		       Bio::ASN1::EntrezGene::Indexer(3pm)

NAME

       Bio::ASN1::EntrezGene::Indexer - Indexes NCBI Entrez Gene files.

SYNOPSIS

	 use Bio::ASN1::EntrezGene::Indexer;

	 # creating & using the index is just a few lines
	 my $inx = Bio::ASN1::EntrezGene::Indexer->new(
	   -filename => 'entrezgene.idx',
	   -write_flag => 'WRITE'); # needed for make_index call, but if opening
				    # existing index file, don't set write flag!
	 $inx->make_index('Homo_sapiens', 'Mus_musculus', 'Rattus_norvegicus');
	 my $seq = $inx->fetch(10); # Bio::Seq obj for Entrez Gene #10
	 # alternatively, if one prefers just a data structure instead of objects
	 $seq = $inx->fetch_hash(10); # a hash produced by Bio::ASN1::EntrezGene
				   # that contains all data in the Entrez Gene record

	 # the input files (e.g. 'Homo_sapiens') are available from the
	 # NCBI Entrez Gene ftp download, DATA/ASN/Mammalia directory

PREREQUISITE

       Bio::ASN1::EntrezGene, and a Bioperl version that contains Stefan Kirov's entrezgene.pm and all of its dependencies.

INSTALLATION

       Same as Bio::ASN1::EntrezGene

DESCRIPTION

       Bio::ASN1::EntrezGene::Indexer is a Perl indexer for NCBI Entrez Gene genome databases. It processes ASN.1-formatted Entrez Gene records
       and stores the file position of each record in a way compliant with the Bioperl standard (it is in fact a subclass of Bioperl's index objects).

       Note that this module does not parse records, because it needs to run fast and grab only the gene ids.  To parse records, use
       Bio::ASN1::EntrezGene or, better yet, Bio::SeqIO with format 'entrezgene'.
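
The idea described above (record only where each record starts, without parsing it) can be sketched in core Perl. This is an illustration, not the module's actual implementation. The helper names build_offset_index and read_record_at are hypothetical, and the sketch assumes that every record starts on a line beginning with "Entrezgene ::=" and that the first "geneid N" after that line is the record's id.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Bio::ASN1::EntrezGene::Indexer):
# scan a file once and map each gene id to the byte offset of its record.
# Assumption: a record starts on a line beginning with "Entrezgene ::="
# and the first "geneid N" after that line is the record's id.
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%offset_for, $record_start);
    while (1) {
        my $pos  = tell $fh;      # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        if ($line =~ /^Entrezgene ::=/) {
            $record_start = $pos;
        }
        elsif (defined $record_start && $line =~ /\bgeneid\s+(\d+)/) {
            $offset_for{$1} = $record_start;
            undef $record_start;  # ignore later geneid fields in this record
        }
    }
    close $fh;
    return \%offset_for;
}

# Hypothetical helper: seek to a stored offset and return the raw record
# text, reading until the next record header or end of file.
sub read_record_at {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my $record = <$fh>;           # the header line itself
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^Entrezgene ::=/;
        $record .= $line;
    }
    close $fh;
    return $record;
}

# Small demonstration on a made-up two-record file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Entrezgene ::= {\n  track-info {\n    geneid 10 ,\n  }\n}\n",
                "Entrezgene ::= {\n  track-info {\n    geneid 22 ,\n  }\n}\n";
close $tmp_fh;

my $index = build_offset_index($tmp_path);
print read_record_at($tmp_path, $index->{22});
```

Because only the offset is stored, the indexing pass never has to interpret the record body; the cost of parsing is paid later, and only for the records that are actually fetched.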

       It takes this module (version 1.07) 21 seconds to index the human genome Entrez Gene file (Apr. 5/2005 download) on one 2.4 GHz Intel Xeon
       processor.

SEE ALSO

       For details on the various parsers I generated for Entrez Gene, and for example scripts that use and benchmark the modules, please see
       <http://sourceforge.net/projects/egparser/>.  Those other parsers etc. are included in the V1.05 download.

AUTHOR

       Dr. Mingyi Liu <mingyi.liu@gpc-biotech.com>

COPYRIGHT

       The Bio::ASN1::EntrezGene module and its related modules and scripts are copyright (c) 2005 Mingyi Liu, GPC Biotech AG and Altana Research
       Institute. All rights reserved. I created these modules while working on a collaboration project between these two companies. Special
       thanks therefore go to the two companies for allowing the release of the code into the public domain.

       You may use and distribute them under the same terms as Perl itself or under the GPL (<http://www.gnu.org/copyleft/gpl.html>).

CITATION

       Liu, M. and Grigoriev, A. (2005) "Fast Parsers for Entrez Gene", Bioinformatics. In press.

OPERATING SYSTEMS SUPPORTED

       Any OS that Perl & Bioperl run on.

METHODS

   fetch
	 Parameters: $geneid - id for the Entrez Gene record to be retrieved
	 Example:    my $seq = $indexer->fetch(10); # get Entrez Gene #10
	 Function:   fetch the data for the given Entrez Gene id.
	 Returns:    A Bio::Seq object produced by Bio::SeqIO::entrezgene
	 Notes:      One needs to have Bio::SeqIO::entrezgene installed before
		       calling this function!

   fetch_hash
	 Parameters: $geneid - id for the Entrez Gene record to be retrieved
	 Example:    my $hash = $indexer->fetch_hash(10); # get Entrez Gene #10
	 Function:   fetch a hash produced by Bio::ASN1::EntrezGene for given Entrez
		       Gene id.
	 Returns:    A data structure containing all data items from the Entrez
		       Gene record.
	 Notes:      Alternative to fetch()

   _file_handle
	 Title	 : _file_handle
	 Usage	 : $fh = $index->_file_handle( INT )
	 Function: Returns an open filehandle for the file
		   index INT.  On opening a new filehandle it
		   caches it in the @{$index->_filehandle} array.
		   If the requested filehandle is already open,
		   it simply returns it from the array.
	 Example : $first_file_indexed = $index->_file_handle( 0 );
	 Returns : ref to a filehandle
	 Args	 : INT
	 Notes	 : This function is copied from Bio::Index::Abstract. Once that module
		     changes its filehandle code, as done below, to work with perl 5.005_03,
		     this sub will be removed from this module
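
The caching behaviour described for _file_handle (open a file the first time it is requested, then hand back the same handle on later requests) can be sketched as follows. This is a minimal stand-in in core Perl, not the code from this module or from Bio::Index::Abstract. The names @files, @handles and cached_file_handle are hypothetical, and taking the file list from @ARGV is an assumption made only to keep the sketch self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the caching described above.
my @files = @ARGV;   # indexed files; position in this list plays the role of INT
my @handles;         # cache of filehandles that are already open

sub cached_file_handle {
    my ($i) = @_;
    die "No file registered at index $i" unless defined $files[$i];
    unless (defined $handles[$i]) {
        # First request for this file: open it and remember the handle.
        open my $fh, '<', $files[$i] or die "Cannot open $files[$i]: $!";
        $handles[$i] = $fh;
    }
    return $handles[$i];   # later requests reuse the cached handle
}
```

With at least one file name on the command line, my $fh = cached_file_handle(0); returns a read handle for the first file, and calling it again returns the same handle instead of reopening the file.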

perl v5.14.2							    2005-05-04				       Bio::ASN1::EntrezGene::Indexer(3pm)