Sponsored Content
Top Forums Shell Programming and Scripting extract fields from a downloaded html file Post 302624109 by gubbu on Sunday 15th of April 2012 10:41:45 PM
Old 04-15-2012
extract fields from a downloaded html file

I have around 100 html files and in each html file I have 5-6 such paragraphs of a company and I need to extract the Name of the company from either the one after "title" or "/company" and then the number of employees and finally the location .

HTML Code:
<div class="search_result">
        <div class="search_result_name">
          <a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
        </div>
       
        </div>
  <div class="search_result_explanation">
          60 employees
        </div>
  <div class="search_result_explanation">
          Office in
  Palo Alto, CA, 94301, USA         

The output I want is just 3 columns
Company Employees Location
BlahBlah 60 Palo Alto

Any ideas are appreciated
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

How do I extract text only from html file without HTML tag

I have a html file called myfile. If I simply put "cat myfile.html" in UNIX, it shows all the html tags like <a href=r/26><img src="http://www>. But I want to extract only text part. Same problem happens in "type" command in MS-DOS. I know you can do it by opening it in Internet Explorer,... (4 Replies)
Discussion started by: los111
4 Replies

2. UNIX for Dummies Questions & Answers

Extract some common fields from 1 file that are presnt in another file

I have 2 files FILEA 720646363*PHILIPPINES 117183970*USA 116274291*USA 107940983*USA 107395824*USA 106632425*USA 105861926*USA 105208607*USA 053077046*USA 065428026*ENGLAND FILEB 001125236 001408905 002316511 002521094 020050725 035018308 052288735 (1 Reply)
Discussion started by: unxusr123
1 Replies

3. UNIX for Dummies Questions & Answers

extract fields from text file using delimiter!!

Hi All, I am new to unix scripting, please help me in solving this assignment.. I have a scenario, as follows: 1. i have a text file(read1.txt) with the following data sairam,123 kamal,122 etc.. 2. I have to write a unix... (6 Replies)
Discussion started by: G.K.K
6 Replies

4. Shell Programming and Scripting

Extract urls from index.html downloaded using wget

Hi, I need to basically get a list of all the tarballs located at uri I am currently doing a wget on urito get the index.html page Now this index page contains the list of uris that I want to use in my bash script. can someone please guide me ,. I am new to Linux and shell scripting. ... (5 Replies)
Discussion started by: mnanavati
5 Replies

5. UNIX for Dummies Questions & Answers

How to extract fields from etc/passwd file?

Hi! i want to extract from /etc/passwd file,the user and user info fileds, to a another file.I've tried this: cut -d ':' -f1 ':' -f6 < file but cut can be used to extract olny one field and not two. maybe with awk is this possible? (4 Replies)
Discussion started by: strawhatluffy
4 Replies

6. Shell Programming and Scripting

Extract expressions between two strings in html file

Hello guys, I'm trying to extract all the expressions between the following tags: <b></b> from a HTML file. This is how it looks: big lines containing several dozens expressions (made of 1,2,3,4,6 or even 7 words) I would like to extract: <b>bla ble</b>bla ble</td><tr valign="top"><td... (3 Replies)
Discussion started by: bobylapointe
3 Replies

7. UNIX for Dummies Questions & Answers

Extract table from an HTML file

I want to extract a table from an HTML file. the table starts with <table class="tableinfo" and ends with next closing table tag </table> how can I do this with awk/sed... ---------- Post updated at 04:34 PM ---------- Previous update was at 04:28 PM ---------- also I want to... (4 Replies)
Discussion started by: koutroul
4 Replies

8. Shell Programming and Scripting

Extract specific line in an html file starting and ending with specific pattern to a text file

Hi This is my first post and I'm just a beginner. So please be nice to me. I have a couple of html files where a pattern beginning with "http://www.site.com" and ending with "/resource.dat" is present on every 241st line. How do I extract this to a new text file? I have tried sed -n 241,241p... (13 Replies)
Discussion started by: dejavo
13 Replies

9. Shell Programming and Scripting

Extract both contents from a html file and do printing

Hi there, Print IP Address: grep 'HostID :' 10.244.9.124\ nessus.html | awk -F '<br>' '{print $12}' | tr -s ' ' | awk -F ':' '{print "<tr><td>" $2 "</td><td>"}' Print Respective Ports: grep 'classsubsection\|./tcp\|./udp' 10.244.9.124\ nessus.html | grep -v 'h2.classsubsection... (3 Replies)
Discussion started by: alvinoo
3 Replies

10. Shell Programming and Scripting

awk to extract multiple values from file and add two additional fields

In the attached file I am trying to use awk to extract multiple values and create the tab-delimited desired output. In the output R_Index is a the sequential # and Pre_Enrichment is defaulted to .. I can extract from the values to the side of the keywords, but most are above and I can not... (2 Replies)
Discussion started by: cmccabe
2 Replies
LWP::Authen::Ntlm(3)					User Contributed Perl Documentation				      LWP::Authen::Ntlm(3)

NAME
LWP::Authen::Ntlm - Library for enabling NTLM authentication (Microsoft) in LWP SYNOPSIS
use LWP::UserAgent; use HTTP::Request::Common; my $url = 'http://www.company.com/protected_page.html'; # Set up the ntlm client and then the base64 encoded ntlm handshake message my $ua = new LWP::UserAgent(keep_alive=>1); $ua->credentials('www.company.com:80', '', "MyDomain\MyUserCode", 'MyPassword'); $request = GET $url; print "--Performing request now...----------- "; $response = $ua->request($request); print "--Done with request------------------- "; if ($response->is_success) {print "It worked!->" . $response->code . " "} else {print "It didn't work!->" . $response->code . " "} DESCRIPTION
"LWP::Authen::Ntlm" allows LWP to authenticate against servers that are using the NTLM authentication scheme popularized by Microsoft. This type of authentication is common on intranets of Microsoft-centric organizations. The module takes advantage of the Authen::NTLM module by Mark Bush. Since there is also another Authen::NTLM module available from CPAN by Yee Man Chan with an entirely different interface, it is necessary to ensure that you have the correct NTLM module. In addition, there have been problems with incompatibilities between different versions of Mime::Base64, which Bush's Authen::NTLM makes use of. Therefore, it is necessary to ensure that your Mime::Base64 module supports exporting of the encode_base64 and decode_base64 functions. USAGE
The module is used indirectly through LWP, rather than including it directly in your code. The LWP system will invoke the NTLM authentication when it encounters the authentication scheme while attempting to retrieve a URL from a server. In order for the NTLM authentication to work, you must have a few things set up in your code prior to attempting to retrieve the URL: o Enable persistent HTTP connections To do this, pass the "keep_alive=>1" option to the LWP::UserAgent when creating it, like this: my $ua = new LWP::UserAgent(keep_alive=>1); o Set the credentials on the UserAgent object The credentials must be set like this: $ua->credentials('www.company.com:80', '', "MyDomain\MyUserCode", 'MyPassword'); Note that you cannot use the HTTP::Request object's authorization_basic() method to set the credentials. Note, too, that the 'www.company.com:80' portion only sets credentials on the specified port AND it is case-sensitive (this is due to the way LWP is coded, and has nothing to do with LWP::Authen::Ntlm) AVAILABILITY
General queries regarding LWP should be made to the LWP Mailing List. Questions specific to LWP::Authen::Ntlm can be forwarded to jtillman@bigfoot.com COPYRIGHT
Copyright (c) 2002 James Tillman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. SEE ALSO
LWP, LWP::UserAgent, lwpcook. perl v5.12.1 2009-06-15 LWP::Authen::Ntlm(3)
All times are GMT -4. The time now is 07:30 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy