11-28-2007
Quote:
Originally Posted by LanceBoyles
Use Lynx with the --dump option, like this:
lynx --dump myfile.html > myfile.txt
OR
lynx --dump http://some.where.com/whatever.html > myfile.txt
You can write a shell script that will do this for many files without you having to touch it.
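
As a rough sketch of such a script, the loop below runs the same lynx --dump command on every .html file in a directory and writes a matching .txt file next to each one. The function name dump_all and the choice to put the output beside the input are assumptions made for illustration; they do not come from the original post.

```shell
#!/bin/sh
# Sketch only: dump_all is a hypothetical helper name, not part of lynx.
# Usage: dump_all [directory]   (defaults to the current directory)
dump_all() {
    dir=${1:-.}
    for src in "$dir"/*.html; do
        # If no .html files exist, the pattern stays literal; skip it.
        [ -e "$src" ] || continue
        # Assumption: output goes next to the input, e.g. page.html -> page.txt
        out="${src%.html}.txt"
        # Same invocation as in the post: render the page and save the text.
        lynx --dump "$src" > "$out" || echo "failed: $src" >&2
    done
}
```

Calling dump_all /path/to/pages after defining the function would convert every page in that directory in one pass. This assumes lynx is installed and on the PATH.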
Hi,
I cannot get the lynx command on Linux.
What should I do?
Mojo::DOM::HTML(3pm) User Contributed Perl Documentation Mojo::DOM::HTML(3pm)
NAME
Mojo::DOM::HTML - HTML5/XML engine
SYNOPSIS
use Mojo::DOM::HTML;
# Turn HTML5 into DOM tree
my $html = Mojo::DOM::HTML->new;
$html->parse('<div><p id="a">A</p><p id="b">B</p></div>');
my $tree = $html->tree;
DESCRIPTION
Mojo::DOM::HTML is the HTML5/XML engine used by Mojo::DOM.
ATTRIBUTES
Mojo::DOM::HTML implements the following attributes.
"charset"
my $charset = $html->charset;
$html = $html->charset('UTF-8');
Charset used for decoding and encoding HTML5/XML.
"tree"
my $tree = $html->tree;
$html = $html->tree(['root', [qw(text lalala)]]);
Document Object Model.
"xml"
my $xml = $html->xml;
$html = $html->xml(1);
Disable HTML5 semantics in parser and activate case sensitivity, defaults to auto detection based on processing instructions.
METHODS
Mojo::DOM::HTML inherits all methods from Mojo::Base and implements the following new ones.
"parse"
$html = $html->parse('<foo bar="baz">test</foo>');
Parse HTML5/XML document.
"render"
my $xml = $html->render;
Render DOM to XML.
SEE ALSO
Mojolicious, Mojolicious::Guides, <http://mojolicio.us>.
perl v5.14.2 2012-09-05 Mojo::DOM::HTML(3pm)