Here is a possible solution. The script SS_WebPageToCSV ( http://www.biterscripting.com/SS_WebPageToCSV.html ) does exactly what you need. It takes a URL and a table number, and extracts the data in that table as CSV. By default the output is written to the screen, but you can redirect the CSV data to a CSV file. Here are a couple of example commands.
Or,
The first command shows the output on screen. The second command creates the CSV file "Output.CSV" (in the current directory) with the data from the table.
The number of the table you want to extract (an HTML document may have more than one table) is supplied through the number() argument to the script. The URL is supplied through the page() argument. The script can extract tables from many document types: .html, .php, .asp, etc.
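If you would rather not depend on that script, the same idea (pick the Nth table on a page, write its cells out as CSV) can be sketched with Python's standard library. This is only a rough illustration, not how SS_WebPageToCSV works internally. The names TableExtractor and table_to_csv are made up for this example, and it assumes the table markup has explicit closing </td>, </th> and </tr> tags. Nested tables are skipped rather than flattened.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of the Nth top-level <table> (1-based).

    Hypothetical helper for illustration; assumes explicit closing tags.
    """

    def __init__(self, table_number):
        super().__init__()
        self.table_number = table_number
        self.tables_seen = 0   # top-level tables encountered so far
        self.depth = 0         # current nesting depth of <table> elements
        self.rows = []         # finished rows of the target table
        self.row = None        # row being built, or None
        self.cell = None       # text fragments of the cell being built, or None

    def _in_target(self):
        # True only while directly inside the requested table.
        return self.depth == 1 and self.tables_seen == self.table_number

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth == 1:
                self.tables_seen += 1
        elif self._in_target():
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            if self.depth > 0:
                self.depth -= 1
        elif self._in_target():
            if tag in ("td", "th") and self.cell is not None:
                # Join the fragments and collapse runs of whitespace.
                self.row.append(" ".join("".join(self.cell).split()))
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None

    def handle_data(self, data):
        if self.cell is not None and self._in_target():
            self.cell.append(data)


def table_to_csv(html, table_number=1):
    """Return the requested table as CSV text (empty string if not found)."""
    parser = TableExtractor(table_number)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Name</th><th>Qty</th></tr>"
        "<tr><td>apple, red</td><td>3</td></tr></table>"
    )
    # Prints to screen; redirect stdout to a file to get a CSV file.
    print(table_to_csv(sample, 1), end="")
```

To run it against a live page you would first fetch the HTML yourself (for example with urllib.request.urlopen) and pass the decoded text to table_to_csv. For messy real-world HTML, a tree-building parser is usually a safer choice than this event-based sketch.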
I have a html file called myfile. If I simply put "cat myfile.html" in UNIX, it shows all the html tags like <a href=r/26><img src="http://www>. But I want to extract only text part.
Same problem happens in "type" command in MS-DOS.
I know you can do it by opening it in Internet Explorer,... (4 Replies)
Hi everyone:
I want to extract string which is in between certain html tag.
e.g.
I tried with grep,cut, awk but could not find exact syntax for this one. :wall:
PS>Sorry about bad english. (8 Replies)
Hi,
I have the following code in my xml file:
<aaaRule loginIdPattern=".*"
orgIdPattern=".*" deny="false" />
<aaaRuleGroup name="dpaas">
<aaaRule loginIdPattern=".*" orgIdPattern=".*"
deny="false" />
I want to retrieve orgIdPattern and loginIdPattern parameter value based on... (2 Replies)
I have an XML tag like this:
<property name="agent" value="/var/tmp/root/eclipse" />
Is there way using awk that i can get the value from the above tag. So the output should be:
/var/tmp/root/eclipse
Help will be appreciated.
Regards,
Adi (6 Replies)
I have a xml file in where I need to parse only a particular tag and print the output in the shell script.
Here is the tag info in the xml file
<dp:file> This is dp file output </dp:file>
Output should be printed as
This is dp file output.
Please help.Thank you. (5 Replies)
Hi
I am new to string extractions in shell script... I am trying to extract a string such as #1753 from html tag looks like below.
<a class="model-link tl-tr" href="lastSuccessfulBuild/">Last successful build (#1753), 40 min ago</a>
and want the value as
1753
Could someone help me to... (3 Replies)
I want to print from <fruits> to </fruits> tag which have <fruit> as mango. Also i want both <fruits> and </fruits> in output. Please help
eg.
<fruits>
<fruit id="111">mango<fruit>
.
another 20 lines
.
</fruits> (3 Replies)
Hi Guys
Here is my Input :
<?xml version="1.0" encoding="UTF-8"?>
<xn:MeContext id="01736">
<xn:VsDataContainer id="01736">
<xn:attributes>
<xn:vsDataType>vsDataMeContext</xn:vsDataType>
... (12 Replies)
Hello,
I want to parse the contents of a multiline html tag
ex:
<html>
<body>
<p>some other text</p>
<div>
<p class="margin-bottom-0">
text1
<br>
text2
<br>
<br>
text3
</p>
</div>
</body> (15 Replies)
html::parse
HTML::Parse(3) User Contributed Perl Documentation HTML::Parse(3)
NAME
HTML::Parse - Deprecated, a wrapper around HTML::TreeBuilder
VERSION
This document describes version 5.03 of HTML::Parse, released September 22, 2012 as part of HTML-Tree.
SYNOPSIS
See the documentation for HTML::TreeBuilder
DESCRIPTION
Disclaimer: This module is provided only for backwards compatibility with earlier versions of this library. New code should not use this
module, and should really use the HTML::Parser and HTML::TreeBuilder modules directly, instead.
The "HTML::Parse" module provides functions to parse HTML documents. There are two functions exported by this module:
parse_html($html) or parse_html($html, $obj)
This function is really just a synonym for $obj->parse($html) and $obj is assumed to be a subclass of "HTML::Parser". Refer to
HTML::Parser for more documentation.
If $obj is not specified, the $obj will default to an internally created new "HTML::TreeBuilder" object configured with
strict_comment() turned on. That class implements a parser that builds (and is) a HTML syntax tree with HTML::Element objects as
nodes.
The return value from parse_html() is $obj.
parse_htmlfile($file, [$obj])
Same as parse_html(), but reads the HTML to parse from the named file.
Returns "undef" if the file could not be opened, or $obj otherwise.
When a "HTML::TreeBuilder" object is created, the following variables control how parsing takes place:
$HTML::Parse::IMPLICIT_TAGS
Setting this variable to true will instruct the parser to try to deduce implicit elements and implicit end tags. If this variable is
false you get a parse tree that just reflects the text as it stands. Might be useful for quick & dirty parsing. Default is true.
Implicit elements have the implicit() attribute set.
$HTML::Parse::IGNORE_UNKNOWN
This variable controls whether unknown tags should be represented as elements in the parse tree. Default is true.
$HTML::Parse::IGNORE_TEXT
Do not represent the text content of elements. This saves space if all you want is to examine the structure of the document. Default
is false.
$HTML::Parse::WARN
Call warn() with an appropriate message for syntax errors. Default is false.
REMEMBER!
HTML::TreeBuilder objects should be explicitly destroyed when you're finished with them. See HTML::TreeBuilder.
SEE ALSO
HTML::Parser, HTML::TreeBuilder, HTML::Element
AUTHOR
Current maintainers:
o Christopher J. Madsen "<perl AT cjmweb.net>"
o Jeff Fearn "<jfearn AT cpan.org>"
Original HTML-Tree author:
o Gisle Aas
Former maintainers:
o Sean M. Burke
o Andy Lester
o Pete Krawczyk "<petek AT cpan.org>"
You can follow or contribute to HTML-Tree's development at <http://github.com/madsen/HTML-Tree>.
COPYRIGHT AND LICENSE
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy Lester, 2006 Pete Krawczyk, 2010 Jeff Fearn, 2012 Christopher J. Madsen.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The programs in this library are distributed in the hope that they will be useful, but without any warranty; without even the implied
warranty of merchantability or fitness for a particular purpose.
perl v5.16.3 2014-06-10 HTML::Parse(3)