mkdoc::xml(3pm) [debian man page]

MKDoc::XML(3pm) 					User Contributed Perl Documentation					   MKDoc::XML(3pm)

NAME
MKDoc::XML - The MKDoc XML Toolkit

SYNOPSIS
This is an article, not a module.

SUMMARY
MKDoc is a web content management system written in Perl which focuses on standards compliance, accessibility and usability issues, and multi-lingual websites.

At MKDoc Ltd we have decided to gradually break up our existing commercial software into a collection of completely independent, well-documented, well-tested open-source CPAN modules.

Ultimately we want MKDoc code to be a coherent collection of module distributions, yet each distribution should be usable and useful in itself.

MKDoc::XML is part of this effort.

You could help us and turn some of MKDoc's code into a CPAN module. You can take a look at the existing code at http://download.mkdoc.org/.

If you are interested in some functionality which you would like to see as a standalone CPAN module, send an email to <mkdoc-modules@lists.webarch.co.uk>.

DISCLAIMER
MKDoc::XML is a low-level XML library. MKDoc::XML::* modules do not make sure your XML is well-formed. MKDoc::XML::* modules can be used to work with somewhat broken XML.

MKDoc::XML::* modules should not be used as high-level parsers with general purpose XML unless you know what you're doing.

WHAT'S IN THE BOX
XML tokenizer
    MKDoc::XML::Tokenizer splits your XML / XHTML files into a list of MKDoc::XML::Token objects using a single regex.

XML tree builder
    MKDoc::XML::TreeBuilder sits on top of MKDoc::XML::Tokenizer and builds parsed trees out of your XML / XHTML data.

XML stripper
    MKDoc::XML::Stripper objects remove unwanted markup from your XML / HTML data. This is useful, for example, to remove all those nasty presentational tags or 'style' attributes from your XHTML data.

XML tagger
    The MKDoc::XML::Tagger module matches expressions in XML / XHTML documents and tags them appropriately. For example, you could automatically hyperlink certain glossary words or add <abbr> tags based on a dictionary of abbreviations and acronyms.

XML entity decoder
    MKDoc::XML::Decode is a pluggable, configurable entity expander module which currently supports HTML entities, numeric entities and basic XML entities.

XML entity encoder
    MKDoc::XML::Encode does the exact reverse of MKDoc::XML::Decode.

XML Dumper
    MKDoc::XML::Dumper serializes arbitrarily complex Perl structures into XML strings. It is also able to do the reverse operation, i.e. deserializing an XML string into a Perl structure.

AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.

Author: Jean-Michel Hiver

This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO
Petal: http://search.cpan.org/dist/Petal/

MKDoc: http://www.mkdoc.com/

Help us open-source MKDoc. Join the mkdoc-modules mailing list:

mkdoc-modules@lists.webarch.co.uk

perl v5.10.1						2005-03-10					   MKDoc::XML(3pm)

MKDoc::XML::Tokenizer(3pm)				User Contributed Perl Documentation				MKDoc::XML::Tokenizer(3pm)

NAME
MKDoc::XML::Tokenizer - Tokenize XML the REX way

SYNOPSIS
    my $tokens = MKDoc::XML::Tokenizer->process_data ($some_xml);
    foreach my $token (@{$tokens})
    {
        print "'" . $token->as_string() . "' is text\n" if (defined $token->text());
        print "'" . $token->as_string() . "' is a self closing tag\n" if (defined $token->tag_self_close());
        print "'" . $token->as_string() . "' is an opening tag\n" if (defined $token->tag_open());
        print "'" . $token->as_string() . "' is a closing tag\n" if (defined $token->tag_close());
        print "'" . $token->as_string() . "' is a processing instruction\n" if (defined $token->pi());
        print "'" . $token->as_string() . "' is a declaration\n" if (defined $token->declaration());
        print "'" . $token->as_string() . "' is a comment\n" if (defined $token->comment());
        print "'" . $token->as_string() . "' is a tag\n" if (defined $token->tag());
        print "'" . $token->as_string() . "' is a pseudo-tag (NOT text and NOT tag)\n" if (defined $token->pseudotag());
        print "'" . $token->as_string() . "' is a leaf token (NOT opening tag)\n" if (defined $token->leaf());
    }

SUMMARY
MKDoc::XML::Tokenizer is a module which uses Robert D. Cameron's REX technique to parse XML (ignore the line breaks and the indentation):

    [^<]+|<(?:!(?:--(?:[^-]*-(?:[^-][^-]*-)*->?)?|\[CDATA\[(?:[^\]]*]
    (?:[^\]]+])*]+(?:[^\]>][^\]]*](?:[^\]]+])*]+)*>)?|DOCTYPE(?:
    [ \n\t\r]+(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*
    (?:[ \n\t\r]+(?:(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*
    |"[^"]*"|'[^']*'))*(?:[ \n\t\r]+)?(?:\[(?:<(?:!(?:--[^-]*-(?:[^-][^-]*-)*->
    |[^-](?:[^\]"'><]+|"[^"]*"|'[^']*')*>)|\?(?:[A-Za-z_:]|[^\x00-\x7F])
    (?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*(?:\?>|[\n\r\t ][^?]*\?+(?:[^>?][^?]*\?+)*>))
    |%(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*;|[ \n\t\r]+)*]
    (?:[ \n\t\r]+)?)?>?)?)?
    |\?(?:(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*
    (?:\?>|[\n\r\t ][^?]*\?+(?:[^>?][^?]*\?+)*>)?)?
    |/(?:(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*(?:[ \n\t\r]+)?>?)?
    |(?:(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*
    (?:[ \n\t\r]+(?:[A-Za-z_:]|[^\x00-\x7F])(?:[A-Za-z0-9_:.-]|[^\x00-\x7F])*
    (?:[ \n\t\r]+)?=(?:[ \n\t\r]+)?(?:"[^<"]*"|'[^<']*'))*(?:[ \n\t\r]+)?/?>?)?)

That's right. One big regex, and it works rather well.

DISCLAIMER
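A tokenizer built on one regex splits the input into pieces and never checks how tags nest. The sketch below illustrates that behaviour in plain Perl, without any MKDoc module. It assumes a deliberately simplified pattern: `$SHALLOW` and `shallow_tokens` are hypothetical names made up for this illustration, and the pattern is not the REX expression used by MKDoc::XML::Tokenizer.

```perl
use strict;
use warnings;

# Simplified stand-in for a shallow-parsing regex (an assumption for this
# sketch, NOT the REX expression): a comment, a tag-like run starting with
# '<', or a run of text up to the next '<'.
my $SHALLOW = qr/<!--.*?-->|<[^<>]*>?|[^<]+/s;

# Hypothetical helper: return an array reference of shallow tokens.
# Every character of the input ends up in exactly one token.
sub shallow_tokens {
    my ($xml) = @_;
    return [ $xml =~ /$SHALLOW/g ];
}

# The <b> element is never closed; the tokenizer neither notices nor cares.
my $tokens = shallow_tokens('<p>Hello <b>world</p>');
print "[$_]\n" for @{$tokens};
```

Run as is, this prints five tokens: `[<p>]`, `[Hello ]`, `[<b>]`, `[world]` and `[</p>]`. Joining the tokens gives back the original string, so any well-formedness check has to happen in a later layer.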
This module does low-level XML manipulation. It will somehow parse even broken XML and try to do something with it. Do not use it unless you know what you're doing.

API
my $tokens = MKDoc::XML::Tokenizer->process_data ($some_xml);
    Splits $some_xml into a list of MKDoc::XML::Token objects and returns an array reference to the list of tokens.

my $tokens = MKDoc::XML::Tokenizer->process_file ('/some/file.xml');
    Same as MKDoc::XML::Tokenizer->process_data ($some_xml), except that it reads $some_xml from '/some/file.xml'.

NOTES
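The token list returned by process_data can be consumed directly, without building a tree. The sketch below tallies token kinds using only predicates shown in the SYNOPSIS (text, tag_self_close, tag_open, tag_close). `count_kinds` is a hypothetical helper, not part of the distribution, and the demonstration at the end runs only if MKDoc::XML::Tokenizer is installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MKDoc::XML): count tokens by kind.
# It relies only on the SYNOPSIS convention that each predicate returns
# a defined value when the token is of that kind.
sub count_kinds {
    my ($tokens) = @_;
    my %count = (text => 0, open => 0, close => 0, self_close => 0, other => 0);
    for my $token (@{$tokens}) {
        if    (defined $token->text())           { $count{text}++ }
        elsif (defined $token->tag_self_close()) { $count{self_close}++ }
        elsif (defined $token->tag_open())       { $count{open}++ }
        elsif (defined $token->tag_close())      { $count{close}++ }
        else                                     { $count{other}++ }
    }
    return \%count;
}

# Demonstration, skipped silently when the module is not available.
if (eval { require MKDoc::XML::Tokenizer; 1 }) {
    my $tokens = MKDoc::XML::Tokenizer->process_data('<p>Hello <br /> world</p>');
    my $count  = count_kinds($tokens);
    print "$_: $count->{$_}\n" for sort keys %{$count};
}
```

Comments, declarations and processing instructions fall into the `other` bucket here; the pi(), declaration() and comment() predicates from the SYNOPSIS could be added as further branches.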
MKDoc::XML::Tokenizer works with MKDoc::XML::Token, which can be used when building a full tree is not necessary. If you need to build a tree, look at MKDoc::XML::TreeBuilder.

AUTHOR
Copyright 2003 - MKDoc Holdings Ltd.

Author: Jean-Michel Hiver

This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.

SEE ALSO
MKDoc::XML::Token

MKDoc::XML::TreeBuilder

perl v5.10.1						2004-10-06				MKDoc::XML::Tokenizer(3pm)