04-15-2012
Extract fields from a downloaded HTML file
I have around 100 HTML files, and each file contains 5-6 blocks like the one shown here, one block per company. From each block I need to extract the name of the company (from either the text after "title" or the text after "/company"), then the number of employees, and finally the location.
HTML Code:
<div class="search_result">
<div class="search_result_name">
<a href="/company/BlahBlah" title="BlahBlah, Inc.">BlahBlah, Inc.</a>
</div>
</div>
<div class="search_result_explanation">
60 employees
</div>
<div class="search_result_explanation">
Office in
Palo Alto, CA, 94301, USA
The output I want is just three columns:
Company Employees Location
BlahBlah 60 Palo Alto
Any ideas are appreciated.
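One possible approach, as a minimal sketch in Python rather than awk/sed, is to walk the tags with the standard library's html.parser instead of matching the markup with regular expressions. It assumes every block follows the layout in the post: an <a href="/company/..."> link, then one "search_result_explanation" div holding "N employees" and another starting with "Office in". The names CompanyParser and extract_rows are made up for this illustration, and the columns are written tab-separated because a location such as "Palo Alto" contains a space.

```python
import re
import sys
from html.parser import HTMLParser


class CompanyParser(HTMLParser):
    """Collect company, employee count and city from search-result markup."""

    def __init__(self):
        super().__init__()
        self.rows = []               # one dict per company link seen so far
        self.in_explanation = False  # True while inside an explanation div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("/company/"):
            # A company link starts a new record; the name is the path
            # component after "/company/".
            name = href[len("/company/"):].strip("/")
            self.rows.append({"company": name, "employees": "", "location": ""})
        elif tag == "div":
            classes = (attrs.get("class") or "").split()
            self.in_explanation = "search_result_explanation" in classes

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_explanation = False

    def handle_data(self, data):
        if not (self.in_explanation and self.rows):
            return
        text = " ".join(data.split())  # collapse newlines and repeated spaces
        match = re.match(r"([\d,]+)\s+employees", text)
        if match:
            self.rows[-1]["employees"] = match.group(1)
        elif text.startswith("Office in"):
            # Keep only the city, i.e. the text before the first comma.
            rest = text[len("Office in"):]
            self.rows[-1]["location"] = rest.split(",")[0].strip()


def extract_rows(html_text):
    """Return a list of (company, employees, location) tuples."""
    parser = CompanyParser()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered at end of input
    return [(r["company"], r["employees"], r["location"]) for r in parser.rows]


if __name__ == "__main__":
    print("Company\tEmployees\tLocation")
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for row in extract_rows(fh.read()):
                print("\t".join(row))
```

Saved under a name such as extract.py (hypothetical), it could be run as `python3 extract.py *.html > companies.tsv`. If the real pages nest further divs inside the explanation blocks, or write the employee count differently, the state handling and the regular expression would need adjusting.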