Shell Programming and Scripting: Post 302624109 by gubbu on Sunday 15th of April 2012 10:41:45 PM
extract fields from a downloaded html file

I have around 100 HTML files, and each file contains 5-6 blocks like the one below, one per company. From each block I need to extract the company name (taken either from the "title" attribute or from the "/company/" link), the number of employees, and the location.

HTML Code:
<div class="search_result">
        <div class="search_result_name">
          <a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
        </div>
       
        </div>
  <div class="search_result_explanation">
          60 employees
        </div>
  <div class="search_result_explanation">
          Office in
  Palo Alto, CA, 94301, USA         

The output I want is just three columns:
Company   Employees  Location
BlahBlah  60         Palo Alto

Any ideas are appreciated.
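A minimal sketch in Python, assuming every company block is laid out exactly like the sample above: the name is taken from the "/company/..." link (which gives "BlahBlah" rather than the title text "BlahBlah, Inc."), the head count is the number in front of "employees", and the city is the first comma-separated field after "Office in". The function names extract and main and the BLOCK_RE, NAME_RE, EMPLOYEES_RE and CITY_RE patterns are hypothetical, written only against the sample shown here. Regular expressions are brittle for HTML, so an HTML parser is the safer choice if the real markup varies.

```python
import re
import sys
from pathlib import Path

# Assumed layout (from the sample block): each company starts with
# <div class="search_result">; the closing quote plus ">" keeps this from
# matching search_result_name or search_result_explanation.
BLOCK_RE = re.compile(r'<div class="search_result">')
# Company slug from the link, e.g. href="/company/BlahBlah" -> BlahBlah
NAME_RE = re.compile(r'href="/company/([^"/]+)"')
# Head count such as "60 employees" (commas allowed, e.g. "1,200")
EMPLOYEES_RE = re.compile(r'([\d,]+)\s+employees')
# City: text after "Office in" up to the first comma, e.g. "Palo Alto"
CITY_RE = re.compile(r'Office in\s+([^,<]+)')


def extract(html):
    """Return a list of (company, employees, city) tuples, one per block."""
    rows = []
    # split() leaves whatever precedes the first block in chunk 0, so skip it.
    for chunk in BLOCK_RE.split(html)[1:]:
        name = NAME_RE.search(chunk)
        employees = EMPLOYEES_RE.search(chunk)
        city = CITY_RE.search(chunk)
        rows.append((
            name.group(1) if name else "",
            employees.group(1) if employees else "",
            city.group(1).strip() if city else "",
        ))
    return rows


def main(paths):
    # Tab-separated so multi-word cities like "Palo Alto" stay in one column.
    print("Company\tEmployees\tLocation")
    for path in paths:
        html = Path(path).read_text(encoding="utf-8", errors="replace")
        for row in extract(html):
            print("\t".join(row))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as extract_companies.py, it could be run over all the files at once with "python3 extract_companies.py *.html > companies.tsv". If you would rather have the full name from the title attribute ("BlahBlah, Inc."), change NAME_RE to match title="([^"]+)" instead.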
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How do I extract only the text from an HTML file, without the HTML tags?

I have an HTML file called myfile. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags, like <a href=r/26><img src="http://www>. But I want to extract only the text part. The same problem happens with the "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111

2. UNIX for Dummies Questions & Answers

Extract some common fields from 1 file that are present in another file

I have 2 files. FILEA: 720646363*PHILIPPINES 117183970*USA 116274291*USA 107940983*USA 107395824*USA 106632425*USA 105861926*USA 105208607*USA 053077046*USA 065428026*ENGLAND FILEB: 001125236 001408905 002316511 002521094 020050725 035018308 052288735 (1 Reply)
Discussion started by: unxusr123

3. UNIX for Dummies Questions & Answers

extract fields from text file using delimiter!!

Hi all, I am new to UNIX scripting; please help me solve this assignment. I have a scenario, as follows: 1. I have a text file (read1.txt) with the following data: sairam,123 kamal,122 etc. 2. I have to write a unix... (6 Replies)
Discussion started by: G.K.K

4. Shell Programming and Scripting

Extract urls from index.html downloaded using wget

Hi, I basically need to get a list of all the tarballs located at uri. I am currently doing a wget on uri to get the index.html page. Now this index page contains the list of URIs that I want to use in my bash script. Can someone please guide me? I am new to Linux and shell scripting. ... (5 Replies)
Discussion started by: mnanavati

5. UNIX for Dummies Questions & Answers

How to extract fields from the /etc/passwd file?

Hi! I want to extract the user and user info fields from the /etc/passwd file to another file. I've tried this: cut -d ':' -f1 ':' -f6 < file but cut can be used to extract only one field and not two. Maybe this is possible with awk? (4 Replies)
Discussion started by: strawhatluffy

6. Shell Programming and Scripting

Extract expressions between two strings in html file

Hello guys, I'm trying to extract all the expressions between the following tags: <b></b> from an HTML file. This is how it looks: big lines containing several dozen expressions (made of 1,2,3,4,6 or even 7 words) I would like to extract: <b>bla ble</b>bla ble</td><tr valign="top"><td... (3 Replies)
Discussion started by: bobylapointe

7. UNIX for Dummies Questions & Answers

Extract table from an HTML file

I want to extract a table from an HTML file. The table starts with <table class="tableinfo" and ends with the next closing table tag </table>. How can I do this with awk/sed? Also I want to... (4 Replies)
Discussion started by: koutroul

8. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi, this is my first post and I'm just a beginner, so please be nice to me. I have a couple of HTML files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo

9. Shell Programming and Scripting

Extract both contents from a html file and do printing

Hi there, Print IP Address: grep 'HostID :' 10.244.9.124\ nessus.html | awk -F '<br>' '{print $12}' | tr -s ' ' | awk -F ':' '{print "<tr><td>" $2 "</td><td>"}' Print Respective Ports: grep 'classsubsection\|./tcp\|./udp' 10.244.9.124\ nessus.html | grep -v 'h2.classsubsection... (3 Replies)
Discussion started by: alvinoo

10. Shell Programming and Scripting

awk to extract multiple values from file and add two additional fields

In the attached file I am trying to use awk to extract multiple values and create the tab-delimited desired output. In the output, R_Index is the sequential # and Pre_Enrichment is defaulted to .. I can extract the values to the side of the keywords, but most are above and I can not... (2 Replies)
Discussion started by: cmccabe
Authen::Simple::ActiveDirectory(3pm)			User Contributed Perl Documentation		      Authen::Simple::ActiveDirectory(3pm)

NAME
       Authen::Simple::ActiveDirectory - Simple ActiveDirectory authentication

SYNOPSIS
           use Authen::Simple::ActiveDirectory;

           my $ad = Authen::Simple::ActiveDirectory->new(
               host      => 'ad.company.com',
               principal => 'company.com'
           );

           if ( $ad->authenticate( $username, $password ) ) {
               # successful authentication
           }

           # or as a mod_perl Authen handler

           PerlModule Authen::Simple::Apache
           PerlModule Authen::Simple::ActiveDirectory

           PerlSetVar AuthenSimpleActiveDirectory_host      "ad.company.com"
           PerlSetVar AuthenSimpleActiveDirectory_principal "company.com"

           <Location /protected>
             PerlAuthenHandler Authen::Simple::ActiveDirectory
             AuthType          Basic
             AuthName          "Protected Area"
             Require           valid-user
           </Location>

DESCRIPTION
       Authenticate against Active Directory. This implementation differs from Authen::Simple::LDAP in that it tries to bind directly as the user's principal.

METHODS
       o   new

           This method takes a hash of parameters. The following options are valid:

           o   host

               Connection host; can be a hostname, IP number or URI. Defaults to "localhost".

                   host => 'ldap.company.com'
                   host => '10.0.0.1'
                   host => 'ldap://ldap.company.com:389'
                   host => 'ldaps://ldap.company.com'

           o   port

               Connection port; defaults to 389. May be overridden by host if host is a URI.

                   port => 389

           o   timeout

               Connection timeout; defaults to 60.

                   timeout => 60

           o   principal

               The suffix in the user's principal, usually the domain or forest. Required.

                   principal => 'company.com'

           o   log

               Any object that supports "debug", "info", "error" and "warn".

                   log => Log::Log4perl->get_logger('Authen::Simple::ActiveDirectory')

       o   authenticate( $username, $password )

           Returns true on success and false on failure.

SEE ALSO
       Authen::Simple::LDAP. Authen::Simple. Net::LDAP.

AUTHOR
       Christian Hansen "chansen@cpan.org"

COPYRIGHT
       This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.14.2                      2012-04-23            Authen::Simple::ActiveDirectory(3pm)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.