xml::um(3pm) [debian man page]

UM(3pm) 						User Contributed Perl Documentation						   UM(3pm)

NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding

SYNOPSIS
    use XML::UM;

    # Set directory with .xml files that come with the XML::Encoding distribution
    # Always include the trailing slash!
    $XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

    # Create the encoding routine
    my $encode = XML::UM::get_encode (
        Encoding => 'ISO-8859-2',
        EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

    # Convert a string from UTF-8 to the specified Encoding
    my $encoded_str = $encode->($utf8_str);

    # Remove circular references for garbage collection
    XML::UM::dispose_encoding ('ISO-8859-2');

DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the .xml files that can be found in the maps/ directory in the XML::Encoding distribution. Note that the XML::Encoding distribution does install the .enc files in your perl directory, but not the .xml files they were created from. That's why you have to specify $ENCDIR as in the SYNOPSIS.

This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!

Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code).

get_encode (Encoding => STRING, EncodeUnmapped => SUB)
    The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.

    The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encoding file and creates a hash that maps UTF-8 characters to encoded characters.

    Finally, the get_encode() method of XML::UM::SlowMapper is called; it generates an anonymous subroutine that uses the hash to convert multi-character UTF-8 blocks to the proper encoding.

dispose_encoding ($encoding_name)
    Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user should no longer have references to the subroutines generated by get_encode().

The parameters to the get_encode() method (defined as name/value pairs) are:

o Encoding
    The name of the desired encoding, e.g. 'ISO-8859-2'.

o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
    Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted to decimal entity references, like '&#123;'.

    Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'.

CAVEATS
I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be mapped to themselves. See comments in XML/UM.pm near %DEFAULT_ASCII_MAPPINGS.

The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1) are currently not supported, because there are no .enc files available for these encodings.

This module needs some more work. If you have the time, please help!

AUTHOR
The original author is Enno Derksen.

Send bug reports, hints, tips, suggestions to T.J. Mather at <tjmather@tjmather.com>.

perl v5.10.1 2010-01-03 UM(3pm)
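
The table-driven conversion described in the DESCRIPTION can be illustrated with a small, self-contained sketch. It does not call XML::UM: the two-entry %map and the names make_encoder, utf8_to_codepoint, unmapped_dec and unmapped_hex are hypothetical, invented for this illustration, whereas the real module builds its table from the .xml map files. The sketch also assumes well-formed UTF-8 input held in a byte string.

```perl
use strict;
use warnings;

# Hypothetical miniature table: UTF-8 byte sequence => ISO-8859-2 byte.
# (The module itself derives its table from the .xml map files.)
my %map = (
    "\xC4\x85" => "\xB1",    # U+0105 LATIN SMALL LETTER A WITH OGONEK
    "\xC5\x82" => "\xB3",    # U+0142 LATIN SMALL LETTER L WITH STROKE
);

# Decode one well-formed UTF-8 sequence (1 to 4 bytes) to its code point.
sub utf8_to_codepoint {
    my @bytes = unpack 'C*', $_[0];
    my $first = shift @bytes;
    my $n = @bytes == 0 ? $first
          : @bytes == 1 ? $first & 0x1F
          : @bytes == 2 ? $first & 0x0F
          :               $first & 0x07;
    $n = ($n << 6) | ($_ & 0x3F) for @bytes;
    return $n;
}

# Handlers for characters missing from the table (assumed to receive the
# UTF-8 bytes of a single character).
sub unmapped_dec { return sprintf '&#%d;',  utf8_to_codepoint($_[0]) }
sub unmapped_hex { return sprintf '&#x%X;', utf8_to_codepoint($_[0]) }

# Build a closure that converts every multi-byte UTF-8 character through
# the table, falling back to the handler; ASCII passes through unchanged.
sub make_encoder {
    my ($table, $unmapped) = @_;
    return sub {
        my ($str) = @_;
        $str =~ s{([\xC0-\xF7][\x80-\xBF]{1,3})}{
            my $ch = $1;
            exists $table->{$ch} ? $table->{$ch} : $unmapped->($ch);
        }ge;
        return $str;
    };
}

# Usage: the l-with-stroke is in the table, the euro sign is not.
my $encode = make_encoder(\%map, \&unmapped_hex);
my $out    = $encode->("z\xC5\x82oty \xE2\x82\xAC");
print join(' ', map { sprintf '%02X', ord } split //, $out), "\n";
```

The returned closure keeps a reference to the table, so the table can only be reclaimed once the closure itself is no longer referenced. This is consistent with the note under dispose_encoding about dropping references to the generated subroutines.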

Encoding(3)						User Contributed Perl Documentation					       Encoding(3)

NAME
XML::Encoding - A perl module for parsing XML encoding maps.

SYNOPSIS
    use XML::Encoding;

    # Callbacks are passed as code references
    my $em_parser = XML::Encoding->new(ErrorContext  => 2,
                                       ExpatRequired => 1,
                                       PushPrefixFcn => \&push_prefix,
                                       PopPrefixFcn  => \&pop_prefix,
                                       RangeSetFcn   => \&range_set);

    # Parse the map file; the return value is the name of the encoding map
    my $encmap_name = $em_parser->parsefile($ARGV[0]);

DESCRIPTION
This module, which is built as a subclass of XML::Parser, provides a parser for encoding map files, which are XML files. The file maps/encmap.dtd in the distribution describes the structure of these files. Calling a parse method returns the name of the encoding map (obtained from the name attribute of the root element). The contents of the map are processed through the callback functions push_prefix, pop_prefix, and range_set.

METHODS
This module provides no additional methods to those provided by XML::Parser, but it does take the following additional options.

o ExpatRequired
    When this has a true value, then an error occurs unless the encmap "expat" attribute is set to "yes". Whether or not the ExpatRequired option is given, the parser enters expat mode if this attribute is set. In expat mode, the parser checks if the encoding violates expat restrictions.

o PushPrefixFcn
    The corresponding value should be a code reference to be called when a prefix element starts. The single argument to the callback is an integer which is the byte value of the prefix. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string.

o PopPrefixFcn
    The corresponding value should be a code reference to be called when a prefix element ends. No arguments are passed to this function. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string.

o RangeSetFcn
    The corresponding value should be a code reference to be called when a "range" or "ch" element is seen. The 3 arguments passed to this function are:

        (byte, unicode_scalar, length)

    The byte is the starting byte of a range or the byte being mapped by a "ch" element. The unicode_scalar is the Unicode value that this byte (with the current prefix) maps to. The length of the range is the last argument. This will be 1 for the "ch" element. An undef value should be returned if successful. If in expat mode, a defined value causes an error and is used as the message string.

AUTHOR
Clark Cooper <coopercc@netheaven.com>

SEE ALSO
XML::Parser

perl v5.8.0 1998-12-26 Encoding(3)
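
The three callbacks listed under METHODS can be sketched as follows. This is only an illustration under stated assumptions: it does not load XML::Encoding, the direct calls at the end stand in for what a parser would do while reading a map file, the byte and Unicode values are illustrative rather than taken from a real map, and the names @prefix and %to_unicode are hypothetical.

```perl
use strict;
use warnings;

my @prefix;        # byte values of the currently open prefix elements
my %to_unicode;    # encoded byte string => Unicode scalar value

# Called when a prefix element starts; receives the prefix byte value.
sub push_prefix {
    my ($byte) = @_;
    push @prefix, $byte;
    return undef;    # undef signals success
}

# Called when a prefix element ends; receives no arguments.
sub pop_prefix {
    pop @prefix;
    return undef;
}

# Called for a "range" or "ch" element with (byte, unicode_scalar, length).
# Each byte in the range, preceded by the current prefix, maps to the
# corresponding consecutive Unicode value.
sub range_set {
    my ($byte, $unicode_scalar, $length) = @_;
    for my $i (0 .. $length - 1) {
        my $key = pack 'C*', @prefix, $byte + $i;
        $to_unicode{$key} = $unicode_scalar + $i;
    }
    return undef;
}

# Simulated callback sequence (illustrative values only):
range_set(0x41, 0x0041, 26);    # single bytes 0x41..0x5A => U+0041..U+005A
push_prefix(0x81);              # enter a two-byte region starting with 0x81
range_set(0x40, 0x3000, 1);     # a "ch"-style entry: 0x81 0x40 => U+3000
pop_prefix();                   # leave the prefix again

printf "%d entries collected\n", scalar keys %to_unicode;
```

With the real module, the same subroutines would be passed as code references through the PushPrefixFcn, PopPrefixFcn and RangeSetFcn options, as in the SYNOPSIS.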