Post 302624109 by gubbu on Sunday 15th of April 2012 10:41:45 PM
extract fields from a downloaded html file

I have around 100 HTML files, and each file contains 5-6 blocks like the one below, one per company. From each block I need to extract the company name (taken either from the "title" attribute or from the part after "/company/" in the link), then the number of employees, and finally the location.

HTML Code:
<div class="search_result">
        <div class="search_result_name">
          <a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
        </div>
       
        </div>
  <div class="search_result_explanation">
          60 employees
        </div>
  <div class="search_result_explanation">
          Office in
  Palo Alto, CA, 94301, USA         

The output I want is just 3 columns:
Company Employees Location
BlahBlah 60 Palo Alto

Any ideas are appreciated
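
One possible approach, as a minimal sketch rather than a definitive solution: split each file on the opening `<div class="search_result">` tag and pull the three fields out of every block with regular expressions. This assumes every block is as regular as the sample above (the name after "/company/", a line of the form "N employees", and the city as the first comma-separated item after "Office in"). The function names `extract_companies` and `extract_from_files` are made up for illustration, and regex matching on HTML will break if the markup varies.

```python
import re

# Sample block copied from the post; real files are assumed to repeat it 5-6 times.
SAMPLE = """<div class="search_result">
        <div class="search_result_name">
          <a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
        </div>

        </div>
  <div class="search_result_explanation">
          60 employees
        </div>
  <div class="search_result_explanation">
          Office in
  Palo Alto, CA, 94301, USA
"""

# Assumed patterns, derived only from the sample markup above.
NAME_RE = re.compile(r'<a href="/company/([^"]+)"')   # part after /company/
EMPLOYEES_RE = re.compile(r'(\d[\d,]*)\s+employees')  # e.g. "60 employees"
LOCATION_RE = re.compile(r'Office in\s+([^,<]+)')     # text up to the first comma


def extract_companies(html):
    """Return a list of (company, employees, location) tuples, one per block."""
    rows = []
    # Everything before the first opening tag is not a company block, so skip it.
    for block in html.split('<div class="search_result">')[1:]:
        fields = []
        for pattern in (NAME_RE, EMPLOYEES_RE, LOCATION_RE):
            match = pattern.search(block)
            # A missing field becomes an empty string instead of raising.
            fields.append(match.group(1).strip() if match else "")
        rows.append(tuple(fields))
    return rows


def extract_from_files(paths):
    """Hypothetical helper: collect rows from several downloaded HTML files."""
    rows = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            rows.extend(extract_companies(handle.read()))
    return rows


# Print the three columns tab-separated, since locations can contain spaces.
print("Company\tEmployees\tLocation")
for row in extract_companies(SAMPLE):
    print("\t".join(row))
```

To run it over all of the downloaded files, something like `extract_from_files(glob.glob("*.html"))` (with `import glob`) should work, assuming the files sit in the current directory. If the real pages are less regular than the sample, a proper HTML parser is likely a safer choice than regular expressions.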
 
