xml::um(3pm) [debian man page]

UM(3pm) 						User Contributed Perl Documentation						   UM(3pm)

NAME
XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding

SYNOPSIS
    use XML::UM;

    # Set the directory with the .xml files that come with the XML::Encoding distribution
    # Always include the trailing slash!
    $XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

    # Create the encoding routine
    my $encode = XML::UM::get_encode (
        Encoding => 'ISO-8859-2',
        EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

    # Convert a string from UTF-8 to the specified Encoding
    my $encoded_str = $encode->($utf8_str);

    # Remove circular references for garbage collection
    XML::UM::dispose_encoding ('ISO-8859-2');

DESCRIPTION
This module provides methods to convert UTF-8 strings to any XML encoding that XML::Encoding supports. It creates mapping routines from the .xml files that can be found in the maps/ directory of the XML::Encoding distribution.

Note that the XML::Encoding distribution installs the .enc files in your perl directory, but not the .xml files they were created from. That's why you have to specify $ENCDIR as in the SYNOPSIS.

This implementation uses the XML::Encoding class to parse the .xml file and creates a hash that maps UTF-8 characters (each consisting of up to 4 bytes) to their equivalent byte sequence in the specified encoding. Note that large mappings may consume a lot of memory!

Future implementations may parse the .enc files directly, or do the conversions entirely in XS (i.e. C code).

get_encode (Encoding => STRING, EncodeUnmapped => SUB)

The central entry point to this module is the XML::UM::get_encode() method. It forwards the call to the global $XML::UM::FACTORY, which is defined as an instance of XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper factory.

The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it for subsequent use) that reads in the .xml encoding file and creates a hash that maps UTF-8 characters to encoded characters.

Finally, the get_encode() method of XML::UM::SlowMapper is called, which generates an anonymous subroutine that uses the hash to convert multi-character UTF-8 blocks to the proper encoding.

dispose_encoding ($encoding_name)

Call this to free the memory used by the SlowMapper for a specific encoding. Note that in order to free the big conversion hash, the user should no longer hold references to the subroutines generated by get_encode().

The parameters to the get_encode() method (defined as name/value pairs) are:

o Encoding
    The name of the desired encoding, e.g. 'ISO-8859-2'

o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
    Defines how Unicode characters not found in the mapping file (of the specified encoding) are printed. By default, they are converted to decimal entity references, like '&#123;'

    Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'

CAVEATS
I'm not exactly sure which Unicode characters in the range (0 .. 127) should be mapped to themselves. See the comments in XML/UM.pm near %DEFAULT_ASCII_MAPPINGS.

The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1) are currently not supported, because there are no .enc files available for these encodings.

This module needs some more work. If you have the time, please help!

AUTHOR
Original Author is Enno Derksen.

Send bug reports, hints, tips, suggestions to T.J Mather at <tjmather@tjmather.com>.

perl v5.10.1 2010-01-03 UM(3pm)
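EXAMPLE

The DESCRIPTION of XML::UM says that the mapper builds a lookup hash, returns an anonymous subroutine that consults it, and hands characters missing from the mapping to the EncodeUnmapped routine. The following is only a minimal sketch of that idea and not the module's actual code: make_sketch_encoder, sketch_unmapped_dec, the Map parameter and the two-entry table are all hypothetical, and the sketch works on Perl character strings rather than on raw UTF-8 byte sequences.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an unmapped-character handler: takes one
# character and returns a decimal character reference.
sub sketch_unmapped_dec {
    my ($char) = @_;
    return sprintf('&#%d;', ord $char);
}

# Hypothetical factory: closes over a lookup hash and returns an
# anonymous encoding routine, falling back to the handler for
# characters that have no entry in the hash.
sub make_sketch_encoder {
    my (%args) = @_;
    my $map      = $args{Map};
    my $unmapped = $args{EncodeUnmapped} || \&sketch_unmapped_dec;

    return sub {
        my ($str) = @_;
        my $out = '';
        for my $char (split //, $str) {
            $out .= exists $map->{$char} ? $map->{$char} : $unmapped->($char);
        }
        return $out;
    };
}

# Toy table (assumed for illustration): U+0141 is byte 0xA3 in ISO-8859-2.
my %toy_map = ( 'a' => 'a', "\x{141}" => "\xA3" );
my $encode  = make_sketch_encoder( Map => \%toy_map );

# U+20AC has no entry, so it becomes a decimal character reference.
my $out = $encode->("a\x{141}\x{20AC}");
print $out eq "a\xA3&#8364;" ? "ok\n" : "mismatch\n";   # prints ok
```

Because the returned subroutine closes over the hash, the hash stays alive for as long as the caller keeps that subroutine. This is consistent with the note under dispose_encoding about dropping references to generated subroutines.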


XML::LibXML::Common(3pm)				User Contributed Perl Documentation				  XML::LibXML::Common(3pm)

NAME
XML::LibXML::Common - Constants and Character Encoding Routines

SYNOPSIS
    use XML::LibXML::Common;

    $encodedstring = encodeToUTF8( $name_of_encoding, $string_to_encode );
    $decodedstring = decodeFromUTF8( $name_of_encoding, $string_to_decode );

DESCRIPTION
XML::LibXML::Common defines constants for all node types and provides an interface to the libxml2 charset conversion functions.

Since XML::LibXML uses its own node type definitions, one may want to use XML::LibXML::Common in its compatibility mode:

Exporter TAGS

    use XML::LibXML::Common qw(:libxml);

The ":libxml" tag will use the XML::LibXML Compatibility mode, which defines the old 'XML_' node-type definitions.

    use XML::LibXML::Common qw(:gdome);

The ":gdome" tag will use the XML::GDOME Compatibility mode, which defines the old 'GDOME_' node-type definitions.

    use XML::LibXML::Common qw(:w3c);

This uses the nodetype definition names as specified for DOM.

    use XML::LibXML::Common qw(:encoding);

This tag can be used to export only the charset encoding functions of XML::LibXML::Common.

Exports

By default, XML::LibXML::Common exports the W3 definitions as defined in the DOM specifications and the encoding functions.

Encoding functions

To encode or decode a string to or from UTF-8, XML::LibXML::Common exports two functions, which provide an interface to the encoding support in "libxml2". Which encodings are supported by these functions depends on how "libxml2" was compiled. UTF-16 is always supported and, on most installations, ISO encodings are supported as well.

This interface was useful for older versions of Perl. Since Perl 5.8 and later provide similar functions via the "Encode" module, it is probably a good idea to use those instead.

encodeToUTF8

    $encodedstring = encodeToUTF8( $name_of_encoding, $string_to_encode );

The function will convert a byte string from the specified encoding to a UTF-8 encoded character string.

decodeFromUTF8

    $decodedstring = decodeFromUTF8( $name_of_encoding, $string_to_decode );

This function converts a UTF-8 encoded character string to a specified encoding. Note that the conversion can raise an error if the given string contains characters that cannot be represented in the target encoding.

Both functions report their errors on standard error. If an error occurs, the function will croak(). To catch the error information, and to prevent an encoding error from stopping the entire script, call the encoding function from within an eval block.

A note on history

Before XML::LibXML 1.70, this class was available as a separate CPAN distribution, intended to provide functionality shared between XML::LibXML, XML::GDOME, and possibly other modules. Since there seems to be no progress in this direction, we decided to merge XML::LibXML::Common 0.13 and XML::LibXML 1.70 into one CPAN distribution.

The merge also naturally eliminates a practical and urgent problem experienced by many XML::LibXML users on certain platforms, namely mysterious misbehavior of XML::LibXML occurring if the installed (often pre-packaged) version of XML::LibXML::Common was compiled against an older version of libxml2 than XML::LibXML.

AUTHORS
Matt Sergeant, Christian Glahn, Petr Pajas

VERSION
2.0001

COPYRIGHT
2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.

perl v5.14.2 2012-06-20 XML::LibXML::Common(3pm)
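EXAMPLE

The "Encoding functions" section of XML::LibXML::Common suggests using the core "Encode" module on Perl 5.8 and later, and advises calling a conversion inside an eval block so that a failure does not stop the script. The sketch below combines the two. It uses Encode's own encode() and decode() functions, not the functions exported by XML::LibXML::Common; try_encode is a hypothetical helper name, and ISO-8859-2 is chosen only as a sample target encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: returns the encoded bytes, or undef when the
# text contains a character the target encoding cannot represent.
sub try_encode {
    my ($encoding, $text) = @_;
    my $copy = $text;   # with a CHECK argument, encode() may modify its input

    # FB_CROAK makes encode() die on unmappable characters; eval
    # catches that and leaves the message in $@.
    my $bytes = eval { encode($encoding, $copy, Encode::FB_CROAK) };
    return $bytes;
}

# U+0141 exists in ISO-8859-2, so this yields the single byte 0xA3.
my $ok = try_encode('ISO-8859-2', "\x{141}");

# U+20AC does not exist in ISO-8859-2, so this yields undef.
my $failed = try_encode('ISO-8859-2', "\x{20AC}");

# Round trip back to a character string.
my $text = decode('ISO-8859-2', $ok);
```

Returning undef keeps the failure local to the call; a caller that needs the reason can inspect $@ immediately after try_encode returns.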