xml::sax::byrecord(3pm) [debian man page]

XML::SAX::ByRecord(3pm) 				User Contributed Perl Documentation				   XML::SAX::ByRecord(3pm)

NAME

       XML::SAX::ByRecord - Record oriented processing of (data) documents

SYNOPSIS

	   use XML::SAX::Machines qw( ByRecord ) ;

	   my $m = ByRecord(
	       "My::RecordFilter1",
	       "My::RecordFilter2",
	       ...
	       {
		   Handler => $h, ## optional
	       }
	   );

	   $m->parse_uri( "foo.xml" );

DESCRIPTION

       XML::SAX::ByRecord is a SAX machine that treats a document as a series of records.  Everything before and after the records is emitted
       as-is, while the records are excerpted into mini-documents and run one at a time through the filter pipeline contained in ByRecord.

       The output is a document with exactly the same content before, after, and between the records as the input document, but with each
       record run through the filter pipeline.  So if a document has 10 records in it, the per-record filter pipeline will see 10 sets of (
       start_document, body of record, end_document ) events.  An example is below.

       This has several use cases:

       o   Big, record oriented documents

	   Big documents can be treated a record at a time with various DOM oriented processors like XML::Filter::XSLT.

       o   Streaming XML

	   Small sections of an XML stream can be run through a document processor without holding up the stream.

       o   Record oriented style sheets / processors

	   Sometimes it's just plain easier to write a style sheet or SAX filter that applies to a single record at a time, rather than having to
	   run through a series of records.

   Topology
       Here's how the innards look:

	  +-----------------------------------------------------------+
	  |		     An XML::SAX::ByRecord		      |
	  |    Intake						      |
	  |   +----------+    +---------+	  +--------+  Exhaust |
	--+-->| Splitter |--->| Stage_1 |-->...-->| Merger |----------+----->
	  |   +----------+    +---------+	  +--------+	      |
	  |		  			       ^	      |
	  |		   			       |	      |
	  |		    +---------->---------------+	      |
	  |		      Events not in any records 	      |
	  |							      |
	  +-----------------------------------------------------------+

       The "Splitter" is an XML::Filter::DocSplitter by default, and the "Merger" is an XML::Filter::Merger by default.  The line that bypasses
       the "Stage_1 ..." filter pipeline is used for all events that do not occur in a record.  All events that occur in a record pass through
       the filter pipeline.
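
       The routing in the diagram can be modelled in a few lines of plain Perl.  This is only a toy sketch, not the module's implementation:
       the [ name, data ] event representation, the stage() and by_record() functions, and the rule that a record is any element directly
       under the root are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Toy SAX events as [ name, data ] pairs (a hypothetical representation).
my @events = (
    [ 'start_document' ],
    [ start_element => 'root' ],
    [ characters    => ' a ' ],
    [ start_element => 'rec' ],
    [ characters    => 'b' ],
    [ end_element   => 'rec' ],
    [ characters    => ' c ' ],
    [ start_element => 'rec' ],
    [ characters    => 'd' ],
    [ end_element   => 'rec' ],
    [ characters    => ' e ' ],
    [ end_element   => 'root' ],
    [ 'end_document' ],
);

# Stand-in for the "Stage_1 ..." pipeline: uppercases character data and
# counts how many mini-documents it is handed.
my $docs_seen = 0;
sub stage {
    my @doc = @_;
    $docs_seen++ if @doc && $doc[0][0] eq 'start_document';
    return map { $_->[0] eq 'characters' ? [ characters => uc $_->[1] ] : $_ } @doc;
}

# Splitter, bypass line, and Merger folded into one loop.
sub by_record {
    my ( $stage, @in ) = @_;
    my ( @out, @record );
    my $depth = 0;
    for my $ev (@in) {
        my $name = $ev->[0];
        $depth++ if $name eq 'start_element';
        if ( $depth >= 2 ) { push @record, $ev }    # Splitter: event is inside a record
        else               { push @out,    $ev }    # bypass: event is not in any record
        if ( $name eq 'end_element' ) {
            if ( $depth == 2 ) {
                # A record just closed: wrap it as a mini-document and filter it.
                my @doc = $stage->( [ 'start_document' ], @record, [ 'end_document' ] );
                # Merger: drop the wrapper events and splice the rest back in.
                push @out, grep { $_->[0] !~ /_document\z/ } @doc;
                @record = ();
            }
            $depth--;
        }
    }
    return @out;
}

my @merged = by_record( \&stage, @events );
my $text   = join '', map { $_->[1] } grep { $_->[0] eq 'characters' } @merged;
print "text: '$text', mini-documents: $docs_seen\n";
```

       Only the text inside the two <rec> elements is uppercased, and stage() is called once per record, each time with its own
       start_document and end_document.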

   Example
       Here's a quick little filter to uppercase text content:

	   package My::Filter::Uc;

	   use vars qw( @ISA );
	   @ISA = qw( XML::SAX::Base );

	   use XML::SAX::Base;

	   sub characters {
	       my $self = shift;
	       my ( $data ) = @_;
	       $data->{Data} = uc $data->{Data};
	       $self->SUPER::characters( @_ );
	   }

       And here's a little machine that uses it:

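	   use XML::SAX::Machines qw( Pipeline ByRecord );

	   ## $out is any SAX handler or output destination, e.g. \*STDOUT
	   my $m = Pipeline(
	       ByRecord( "My::Filter::Uc" ),
	       $out,
	   );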

       When fed a document like:

	   <root> a
	       <rec>b</rec> c
	       <rec>d</rec> e
	       <rec>f</rec> g
	   </root>

       the output looks like:

	   <root> a
	       <rec>B</rec> c
	       <rec>D</rec> e
	       <rec>F</rec> g
	   </root>

       and the My::Filter::Uc got three sets of events like:

	   start_document
	   start_element: <rec>
	   characters:	  'b'
	   end_element:   </rec>
	   end_document

	   start_document
	   start_element: <rec>
	   characters:	  'd'
	   end_element:   </rec>
	   end_document

	   start_document
	   start_element: <rec>
	   characters:	  'f'
	   end_element:   </rec>
	   end_document

METHODS

       new
	       my $d = XML::SAX::ByRecord->new( @channels, \%options );

	   Longhand for calling the ByRecord function exported by XML::SAX::Machines.
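
	   As a rough sketch of that equivalence, the two calls below should build the same kind of machine.  XML::SAX::Base is used
	   here only as a placeholder pass-through stage and downstream handler; that choice is an assumption for illustration, not
	   something this page prescribes.

```perl
use strict;
use warnings;

use XML::SAX::Machines qw( ByRecord );
use XML::SAX::ByRecord;
use XML::SAX::Base;

# Placeholder downstream handler (assumption: any SAX handler would do here).
my $h = XML::SAX::Base->new;

# Shorthand: the function exported by XML::SAX::Machines.
my $short = ByRecord( "XML::SAX::Base", { Handler => $h } );

# Longhand: calling the constructor directly.
my $long = XML::SAX::ByRecord->new( "XML::SAX::Base", { Handler => $h } );

print ref($short), "\n", ref($long), "\n";
```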

CREDIT

       Proposed by Matt Sergeant, with advice from Kip Hampton and Robin Berjon.

Writing an aggregator.
       To be written.  Essentially, only "start_manifold_processing" and "end_manifold_processing" need to be provided.  See
       XML::Filter::Merger and its source code for a starter.

perl v5.10.0							    2009-06-11						   XML::SAX::ByRecord(3pm)