HTML::TreeBuilder::LibXML(3pm) User Contributed Perl Documentation HTML::TreeBuilder::LibXML(3pm)
NAME
HTML::TreeBuilder::LibXML - HTML::TreeBuilder and XPath compatible interface with libxml
SYNOPSIS
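use strict;
use warnings;
use HTML::TreeBuilder::LibXML;

my $html  = '<html><body><p class="intro">Hello</p></body></html>';  # sample input
my $xpath = '//p';                                                    # sample query

my $tree = HTML::TreeBuilder::LibXML->new;
$tree->parse($html);
$tree->eof;

# $tree and $node are compatible with HTML::Element
my @nodes = $tree->findnodes($xpath);    # findnodes returns nodes; findvalue returns a string
for my $node (@nodes) {
    print $node->tag, "\n";
    my %attr = $node->all_external_attr;
}

HTML::TreeBuilder::LibXML->replace_original(); # replace HTML::TreeBuilder::XPath->new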
DESCRIPTION
HTML::TreeBuilder::LibXML is a libxml-based interface compatible with HTML::TreeBuilder, which can be slow for large documents.
HTML::TreeBuilder::LibXML is a drop-in replacement for HTML::TreeBuilder::XPath.
This module doesn't implement all of the HTML::TreeBuilder and HTML::Element APIs, but it defines enough methods for modules such as
Web::Scraper to work.
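As a sketch of the drop-in use with Web::Scraper, the following calls replace_original() (shown in the SYNOPSIS) before scraping, so that HTML::TreeBuilder::XPath->new hands back libxml-backed trees. It assumes Web::Scraper builds its trees through HTML::TreeBuilder::XPath->new; the HTML, the 'h1' selector, and the 'heading' key are illustrative only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::LibXML;
use Web::Scraper;

# Make later HTML::TreeBuilder::XPath->new calls return
# HTML::TreeBuilder::LibXML trees instead.
HTML::TreeBuilder::LibXML->replace_original();

# Illustrative input and scraper definition.
my $html = '<html><head><title>Example</title></head><body><h1>Hello</h1></body></html>';
my $scraper = scraper {
    process 'h1', 'heading' => 'TEXT';    # CSS selector => result key
};

my $res = $scraper->scrape(\$html);
print "heading: $res->{heading}\n";
```

Nothing else in the scraping code changes, which is the point of a drop-in replacement.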
BENCHMARK
These results were produced by tools/benchmark.pl with the following module versions:
Web::Scraper: 0.26
HTML::TreeBuilder::XPath: 0.09
HTML::TreeBuilder::LibXML: 0.01_01
               Rate   no_libxml   use_libxml
no_libxml    5.45/s          --         -94%
use_libxml   94.3/s       1632%           --
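tools/benchmark.pl itself is not reproduced here. As a rough sketch of how a comparable table can be produced, the following uses the core Benchmark module's cmpthese, whose output has the Rate layout shown above. The synthetic 200-row document, the //a query, and the helper name parse_and_query are assumptions, so the rates it prints will not match the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::TreeBuilder::XPath;
use HTML::TreeBuilder::LibXML;

# Hypothetical input: a synthetic table with 200 rows, one link per row.
my $html = '<html><body><table>'
    . join('', map { qq{<tr><td>row $_</td><td><a href="/r/$_">link</a></td></tr>} } 1 .. 200)
    . '</table></body></html>';

# Hypothetical helper: parse the document with the given class and run one XPath query.
sub parse_and_query {
    my ($class) = @_;
    my $tree = $class->new;
    $tree->parse($html);
    $tree->eof;
    my @links = $tree->findnodes('//a');
    # HTML::TreeBuilder::XPath trees hold circular references and need an explicit
    # delete; the libxml tree is released when it goes out of scope.
    $tree->delete if $class eq 'HTML::TreeBuilder::XPath';
    return scalar @links;
}

# Run each variant for at least 1 CPU second and print a comparison chart.
my $rows = cmpthese(-1, {
    no_libxml  => sub { parse_and_query('HTML::TreeBuilder::XPath') },
    use_libxml => sub { parse_and_query('HTML::TreeBuilder::LibXML') },
});
```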
AUTHOR
Tokuhiro Matsuno <tokuhirom slkjfd gmail.com>
Tatsuhiko Miyagawa <miyagawa@cpan.org>
Masahiro Chiba
THANKS TO
woremacx++ http://d.hatena.ne.jp/woremacx/20080202/1201927162
id:dailyflower
SEE ALSO
HTML::TreeBuilder, HTML::TreeBuilder::XPath
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.14.2 2012-04-02 HTML::TreeBuilder::LibXML(3pm)
XML::TreeBuilder(3) User Contributed Perl Documentation XML::TreeBuilder(3)
NAME
XML::TreeBuilder - Parser that builds a tree of XML::Element objects
SYNOPSIS
use XML::TreeBuilder;
foreach my $file_name (@ARGV) {
    my $tree = XML::TreeBuilder->new({ 'NoExpand' => 0, 'ErrorContext' => 0 }); # empty tree
    $tree->parse_file($file_name);
    print "Hey, here's a dump of the parse tree of $file_name:\n";
    $tree->dump; # a method we inherit from XML::Element
    print "And here it is, bizarrely rerendered as XML:\n",
        $tree->as_XML, "\n";
    # Now that we're done with it, we must destroy it.
    $tree = $tree->delete;
}
DESCRIPTION
This module uses XML::Parser to make XML document trees constructed of XML::Element objects (and XML::Element is a subclass of
HTML::Element adapted for XML). XML::TreeBuilder is meant particularly for people who are used to the HTML::TreeBuilder / HTML::Element
interface to document trees, and who don't want to learn some other document interface like XML::Twig or XML::DOM.
The way to use this class is to:
1. Start a new (empty) XML::TreeBuilder object.
2. Set any of the "store" options you want.
3. Parse the document from a source by calling "$tree->parsefile(...)" or "$tree->parse(...)". (See the XML::Parser documentation for the
options these two methods take.)
4. Do whatever you need to do with the syntax tree, presumably traversing it to look for some bit of information in it.
5. Finally, when you're done with the tree, call $tree->delete to erase its contents from memory. This usually isn't necessary with most
Perl objects, but it is necessary for TreeBuilder objects. See HTML::Element for a more detailed explanation of why.
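A minimal sketch of the five steps above, assuming a small inline document (the catalog/book markup is illustrative). Step 4 uses find_by_tag_name, which XML::TreeBuilder inherits from HTML::Element; values are copied out before delete because delete empties the nodes.

```perl
use strict;
use warnings;
use XML::TreeBuilder;

# 1. Start a new, empty tree.
my $tree = XML::TreeBuilder->new();

# 2. Set any "store" options before parsing.
$tree->store_comments(1);

# 3. Parse the document (a string here; parsefile takes a filename).
$tree->parse('<catalog><book id="b1">Perl</book><book id="b2">XML</book></catalog>');
$tree->eof;

# 4. Traverse the tree with inherited HTML::Element methods.
my @ids = map { $_->attr('id') } $tree->find_by_tag_name('book');
print "@ids\n";

# 5. Free the tree explicitly once the needed data has been copied out.
$tree->delete;
```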
METHODS AND ATTRIBUTES
XML::TreeBuilder is a subclass of XML::Element, which in turn is a subclass of HTML::Element. You should read and understand the
documentation for those two modules.
An XML::TreeBuilder object is just a special XML::Element object that allows you to call these additional methods:
$root = XML::TreeBuilder->new()
Construct a new XML::TreeBuilder object.
Parameters:
NoExpand
Passed to XML::Parser. Do not expand external entities.
Default: undef
ErrorContext
Passed to XML::Parser. Number of context lines to generate on errors.
Default: undef
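XML::Parser dies on malformed input, so a parse is usually wrapped in eval. The sketch below assumes a deliberately broken inline document; with ErrorContext set to 2, XML::Parser should include the surrounding lines in the error message. The exact message text comes from XML::Parser and is not reproduced here.

```perl
use strict;
use warnings;
use XML::TreeBuilder;

# ErrorContext is passed through to XML::Parser.
my $tree = XML::TreeBuilder->new({ ErrorContext => 2 });

# Illustrative input: the closing tag on line 3 does not match.
my $bad = "<root>\n  <item>one</item>\n  <item>two</oops>\n</root>\n";

# Trap the parser's die so the program can report the error and continue.
my $ok    = eval { $tree->parse($bad); 1 };
my $error = $ok ? '' : $@;
print "Parse failed:\n$error" unless $ok;

$tree->delete;
```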
$root->eof
Deletes the underlying XML::Parser object.
$root->parse(...options...)
Uses XML::Parser's "parse" method to parse XML from the source(s) specified by the options. See XML::Parser.
$root->parsefile(...options...)
Uses XML::Parser's "parsefile" method to parse XML from the source(s) specified by the options. See XML::Parser.
$root->parse_file(...options...)
Simply an alias for "parsefile".
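A sketch of parsing from a file with parse_file, then calling eof once parsing is done. The temporary file and its config/option markup are illustrative only; the element's name and text are copied out before delete empties the tree.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::TreeBuilder;

# Write a small illustrative document to a temporary file (removed at exit).
my ($fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} qq{<config><option name="debug">1</option></config>\n};
close $fh or die "close: $!";

my $tree = XML::TreeBuilder->new();
$tree->parse_file($path);    # same as $tree->parsefile($path)
$tree->eof;                  # discard the parser object once parsing is done

my ($opt) = $tree->find_by_tag_name('option');
my ($name, $value) = ($opt->attr('name'), $opt->as_text);
print "$name = $value\n";

$tree->delete;
```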
$root->store_comments(value)
This determines whether TreeBuilder will normally store comments found while parsing content into $root. Currently, this is off by
default.
$root->store_declarations(value)
This determines whether TreeBuilder will normally store markup declarations found while parsing content into $root. Currently, this is
off by default.
$root->store_pis(value)
This determines whether TreeBuilder will normally store processing instructions found while parsing content into $root. Currently,
this is off (false) by default.
$root->store_cdata(value)
This determines whether TreeBuilder will normally store CDATA sections found while parsing content into $root. Adds a ~cdata node.
Currently, this is off (false) by default.
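Because these switches are consulted while parsing, they must be set before parse or parsefile is called. A sketch using store_comments, assuming (as with HTML::Element) that a stored comment becomes a ~comment pseudo-element whose text attribute holds the comment body; the document is illustrative.

```perl
use strict;
use warnings;
use XML::TreeBuilder;

my $xml = '<doc><!-- reviewed --><item>value</item></doc>';

my $tree = XML::TreeBuilder->new();
$tree->store_comments(1);    # must be set before parsing
$tree->parse($xml);
$tree->eof;

# Collect the comment bodies before delete empties the tree.
my @comment_text = map { $_->attr('text') } $tree->look_down(_tag => '~comment');
print "comment: $_\n" for @comment_text;

$tree->delete;
```

With store_comments left at its default (off), the same parse should yield no ~comment nodes.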
SEE ALSO
XML::Parser, XML::Element, HTML::TreeBuilder, HTML::DOMbo.
And for alternate XML document interfaces, XML::DOM and XML::Twig.
COPYRIGHT AND DISCLAIMERS
Copyright (c) 2000,2004 Sean M. Burke. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.
AUTHOR
Current Author: Jeff Fearn <jfearn@cpan.org>.
Former Authors: Sean M. Burke, <sburke@cpan.org>
perl v5.16.3 2014-06-09 XML::TreeBuilder(3)