Post 302624109 by gubbu on Sunday 15th of April 2012 10:41:45 PM
extract fields from a downloaded html file

I have around 100 HTML files, and each file contains 5-6 blocks like the one below, one per company. From each block I need to extract the name of the company (taken from either the "title" attribute or the "/company/" path in the link), then the number of employees, and finally the location.

HTML Code:
<div class="search_result">
        <div class="search_result_name">
          <a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
        </div>
       
        </div>
  <div class="search_result_explanation">
          60 employees
        </div>
  <div class="search_result_explanation">
          Office in
  Palo Alto, CA, 94301, USA         

The output I want is just 3 columns:
Company Employees Location
BlahBlah 60 Palo Alto

Any ideas are appreciated
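
One possible approach, as a rough sketch rather than a definitive answer: split each file into per-company chunks and pull the three fields out of each chunk with regular expressions. The sample markup above is not well-formed (there is a stray closing div), so this sketch deliberately avoids a strict HTML parser. It assumes every block starts with exactly `<div class="search_result">`, that the company name is the last part of the `/company/...` link, that the employee count is written as `<number> employees`, and that the city is the text between "Office in" and the first comma. The function name `extract_companies` and the patterns are illustrative, based only on the single sample in the post.

```python
import re
import sys

# Patterns inferred from the one sample block in the post (assumptions).
NAME = re.compile(r'href="/company/([^"]+)"')       # company name from the link path
EMPLOYEES = re.compile(r'(\d[\d,]*)\s+employees')   # e.g. "60 employees"
CITY = re.compile(r'Office\s+in\s+([^,<]+)')        # text up to the first comma

BLOCK_START = '<div class="search_result">'


def extract_companies(html):
    """Return a list of (company, employees, city) tuples found in html."""
    rows = []
    # Everything before the first block marker is page header, so skip it.
    for chunk in html.split(BLOCK_START)[1:]:
        name = NAME.search(chunk)
        if name is None:
            continue  # not a company block after all
        employees = EMPLOYEES.search(chunk)
        city = CITY.search(chunk)
        rows.append((
            name.group(1),
            employees.group(1) if employees else "",
            city.group(1).strip() if city else "",
        ))
    return rows


if __name__ == "__main__":
    # Print one tab-separated line per company for all files given as arguments.
    print("Company\tEmployees\tLocation")
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for row in extract_companies(handle.read()):
                print("\t".join(row))
```

Saved under a hypothetical name such as `extract_companies.py`, it could be run as `python3 extract_companies.py *.html > companies.tsv`. Searching inside one chunk at a time means a block with a missing employee count or location yields an empty column instead of borrowing the value from the next company. If the real pages differ from the sample, the three patterns are the part to adjust.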
 

CGI::FormBuilder::Template(3pm) 			User Contributed Perl Documentation			   CGI::FormBuilder::Template(3pm)

NAME
CGI::FormBuilder::Template - Template adapters for FormBuilder

SYNOPSIS
    # Define a template engine

    package CGI::FormBuilder::Template::Whatever;
    use base 'Whatever::Template::Module';

    sub new {
        my $self  = shift;
        my $class = ref($self) || $self;
        my %opt   = @_;

        # override some options
        $opt{some_setting} = 0;
        $opt{another_var}  = 'Some Value';

        # instantiate the template engine
        $opt{engine} = Whatever::Template::Module->new(%opt);

        return bless \%opt, $class;
    }

    sub render {
        my $self = shift;
        my $form = shift;   # only arg is form object

        # grab any manually-set template params
        my %tmplvar = $form->tmpl_param;

        # example template manipulation
        my $html = $self->{engine}->do_template(%tmplvar);

        return $html;       # scalar HTML is returned
    }

DESCRIPTION
This documentation describes the usage of FormBuilder templates, as well as how to write your own template adapter.

The template engines serve as adapters between CPAN template modules and FormBuilder. A template engine is invoked by using the "template" option to the top-level "new()" method:

    my $form = CGI::FormBuilder->new(
                    template => 'filename.tmpl'
               );

This example points to a filename that contains an "HTML::Template" compatible template to use to lay out the HTML. You can also specify the "template" option as a reference to a hash, allowing you to further customize the template processing options, or use other template engines.

For example, you could turn on caching in "HTML::Template" with something like the following:

    my $form = CGI::FormBuilder->new(
                    fields   => \@fields,
                    template => {
                        filename     => 'form.tmpl',
                        shared_cache => 1
                    }
               );

As mentioned, specifying a hashref allows you to use an alternate template processing system like the "Template Toolkit". A minimal configuration would look like this:

    my $form = CGI::FormBuilder->new(
                    fields   => \@fields,
                    template => {
                        type     => 'TT2',      # use Template Toolkit
                        template => 'form.tmpl',
                    },
               );

The "type" option specifies the name of the engine. Currently accepted types are:

    Builtin  -  Included, default rendering if no template specified
    Div      -  Render form using <div> (no tables)
    HTML     -  HTML::Template
    Text     -  Text::Template
    TT2      -  Template Toolkit
    Fast     -  CGI::FastTemplate
    CGI_SSI  -  CGI::SSI

In addition to one of these types, you can also specify a complete package name, in which case that module will be autoloaded and its "new()" and "render()" routines used. For example:

    my $form = CGI::FormBuilder->new(
                    fields   => \@fields,
                    template => {
                        type     => 'My::Template::Module',
                        template => 'form.tmpl',
                    },
               );

All other options besides "type" are passed to the constructor for that templating system verbatim, so you'll need to consult those docs to see what all the different options do. Skip down to "SEE ALSO".

SUBCLASSING TEMPLATE ADAPTERS
In addition to the above included template engines, it is also possible to write your own rendering module. If you come up with something cool, please let the mailing list know!

To do so, you need to write a module which has a sub called "render()". This sub will be called by FormBuilder when "$form->render" is called. This sub can do basically whatever it wants; the only thing it has to do is return a scalar string which is the HTML to print out.

This is actually not hard. Here's a simple adapter which would manipulate an "HTML::Template" style template:

    # This file is My/HTML/Template.pm
    package My::HTML::Template;

    use CGI::FormBuilder::Template::HTML;
    use base 'CGI::FormBuilder::Template::HTML';

    sub render {
        my $self = shift;   # class object
        my $form = shift;   # $form as only argument

        # the template object (engine) lives here
        my $tmpl = $self->engine;

        # setup vars for our fields (objects)
        for ($form->field) {
            $tmpl->param($_ => $_->value);
        }

        # render output
        my $html = $tmpl->output;

        # return scalar;
        return $html;
    }
    1;  # close module

Then in FormBuilder:

    use CGI::FormBuilder;
    use My::HTML::Template;   # your module

    my $tmpl = My::HTML::Template->new;

    my $form = CGI::FormBuilder->new(
                    fields   => [qw(name email)],
                    header   => 1,
                    template => $tmpl   # pass template object
               );

    # set our company from an extra CGI param
    my $co = $form->cgi_param('company');
    $tmpl->engine->param(company => $co);

    # and render like normal
    print $form->render;

That's it! For more details, the best thing to do is look through the guts of one of the existing template engines and go from there.

SEE ALSO
CGI::FormBuilder, CGI::FormBuilder::Template::HTML, CGI::FormBuilder::Template::Text, CGI::FormBuilder::Template::TT2, CGI::FormBuilder::Template::Fast, CGI::FormBuilder::Template::CGI_SSI

REVISION
$Id: Template.pm 97 2007-02-06 17:10:39Z nwiger $

AUTHOR
Copyright (c) Nate Wiger <http://nateware.com>. All Rights Reserved.

This module is free software; you may copy this under the terms of the GNU General Public License, or the Artistic License, copies of which should have accompanied your Perl kit.

perl v5.14.2 2011-09-16 CGI::FormBuilder::Template(3pm)