04-15-2012
Extract fields from a downloaded HTML file
I have around 100 HTML files, and each file contains 5-6 blocks like the one below, one per company. From each block I need to extract the company name (taken either from the "title" attribute or from the part after "/company/" in the link), then the number of employees, and finally the location.
HTML Code:
<div class="search_result">
<div class="search_result_name">
<a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
</div>
</div>
<div class="search_result_explanation">
60 employees
</div>
<div class="search_result_explanation">
Office in
Palo Alto, CA, 94301, USA
The output I want is just 3 columns:
Company   Employees  Location
BlahBlah  60         Palo Alto
Any ideas are appreciated.
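One possible approach, sketched in Python using only the standard library. The post does not name a language, so Python is an assumption here, and the names CompanyParser and extract are made up for illustration. The sketch also assumes that every company link starts with "/company/", that the "search_result_explanation" blocks follow the link they belong to, and that the city is the text between "Office in" and the first comma, as in the sample.

```python
import re
import sys
from html.parser import HTMLParser


class CompanyParser(HTMLParser):
    """Collect [company, employees, city] rows from the search-result markup."""

    def __init__(self):
        super().__init__()
        self.rows = []               # one [company, employees, city] list per company
        self.in_explanation = False  # True while inside a search_result_explanation div
        self.text = []               # text pieces collected from the current div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("/company/"):
            self._flush()
            # Company name comes from the link path, matching the sample output.
            self.rows.append([href[len("/company/"):], "", ""])
        elif tag == "div":
            self._flush()
            self.in_explanation = attrs.get("class") == "search_result_explanation"

    def handle_endtag(self, tag):
        if tag == "div":
            self._flush()
            self.in_explanation = False

    def handle_data(self, data):
        if self.in_explanation:
            self.text.append(data)

    def _flush(self):
        """Interpret the text collected so far and attach it to the latest row."""
        text = " ".join("".join(self.text).split())
        self.text = []
        if not text or not self.rows:
            return
        row = self.rows[-1]
        match = re.match(r"([\d,]+)\s+employees", text)
        if match:
            row[1] = match.group(1)
        elif text.startswith("Office in"):
            # Keep only the city: everything up to the first comma.
            row[2] = text[len("Office in"):].split(",")[0].strip()

    def close(self):
        super().close()
        self._flush()  # handles a final div that was never closed


def extract(html):
    """Return a list of [company, employees, city] rows found in html."""
    parser = CompanyParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    print("Company\tEmployees\tLocation")
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for row in extract(fh.read()):
                print("\t".join(row))
```

Saved under a name of your choice (extract_companies.py is only an example), it could be run over all the files with: python3 extract_companies.py *.html > companies.tsv. The output is tab-separated, so it opens cleanly in a spreadsheet. If the real pages differ from the sample, the class name and the "Office in" prefix are the two places to adjust.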
LWP::Authen::Ntlm(3) User Contributed Perl Documentation LWP::Authen::Ntlm(3)
NAME
LWP::Authen::Ntlm - Library for enabling NTLM authentication (Microsoft) in LWP
SYNOPSIS
use LWP::UserAgent;
use HTTP::Request::Common;
my $url = 'http://www.company.com/protected_page.html';
# Set up the ntlm client and then the base64 encoded ntlm handshake message
my $ua = new LWP::UserAgent(keep_alive=>1);
# The backslash must be doubled inside a double-quoted string
$ua->credentials('www.company.com:80', '', "MyDomain\\MyUserCode", 'MyPassword');
$request = GET $url;
print "--Performing request now...-----------\n";
$response = $ua->request($request);
print "--Done with request-------------------\n";
if ($response->is_success) {print "It worked!->" . $response->code . "\n"}
else {print "It didn't work!->" . $response->code . "\n"}
DESCRIPTION
"LWP::Authen::Ntlm" allows LWP to authenticate against servers that are using the NTLM authentication scheme popularized by Microsoft.
This type of authentication is common on intranets of Microsoft-centric organizations.
The module takes advantage of the Authen::NTLM module by Mark Bush. Since there is also another Authen::NTLM module available from CPAN by
Yee Man Chan with an entirely different interface, it is necessary to ensure that you have the correct NTLM module.
In addition, there have been problems with incompatibilities between different versions of MIME::Base64, which Bush's Authen::NTLM makes
use of. Therefore, it is necessary to ensure that your MIME::Base64 module supports exporting of the encode_base64 and decode_base64
functions.
USAGE
The module is used indirectly through LWP, rather than including it directly in your code. The LWP system will invoke the NTLM
authentication when it encounters the authentication scheme while attempting to retrieve a URL from a server. In order for the NTLM
authentication to work, you must have a few things set up in your code prior to attempting to retrieve the URL:
o Enable persistent HTTP connections
To do this, pass the "keep_alive=>1" option to the LWP::UserAgent when creating it, like this:
my $ua = new LWP::UserAgent(keep_alive=>1);
o Set the credentials on the UserAgent object
The credentials must be set like this:
$ua->credentials('www.company.com:80', '', "MyDomain\\MyUserCode", 'MyPassword');
Note that you cannot use the HTTP::Request object's authorization_basic() method to set the credentials. Note, too, that the
'www.company.com:80' portion only sets credentials on the specified port AND it is case-sensitive (this is due to the way LWP is coded,
and has nothing to do with LWP::Authen::Ntlm).
AVAILABILITY
General queries regarding LWP should be made to the LWP Mailing List.
Questions specific to LWP::Authen::Ntlm can be forwarded to jtillman@bigfoot.com
COPYRIGHT
Copyright (c) 2002 James Tillman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.
SEE ALSO
LWP, LWP::UserAgent, lwpcook.
perl v5.12.1 2009-06-15 LWP::Authen::Ntlm(3)