RedHat 9 (Linux i386) - man page for xml::um (redhat section 3)

XML::UM(3)		       User Contributed Perl Documentation		       XML::UM(3)

       XML::UM - Convert UTF-8 strings to any encoding supported by XML::Encoding

	use XML::UM;

	# Set the directory holding the .xml files that come with the XML::Encoding distribution
	# Always include the trailing slash!
	$XML::UM::ENCDIR = '/home1/enno/perlModules/XML-Encoding-1.01/maps/';

	# Create the encoding routine
	my $encode = XML::UM::get_encode (
	       Encoding => 'ISO-8859-2',
	       EncodeUnmapped => \&XML::UM::encode_unmapped_dec);

	# Convert a string from UTF-8 to the specified Encoding
	my $encoded_str = $encode->($utf8_str);

	# Remove circular references for garbage collection
	XML::UM::dispose_encoding ('ISO-8859-2');

       This module provides methods to convert UTF-8 strings to any XML encoding that
       XML::Encoding supports. It creates mapping routines from the .xml files found in the
       maps/ directory of the XML::Encoding distribution. Note that the XML::Encoding
       distribution installs the .enc files in your perl directory, but not the .xml files they
       were created from. That is why you have to set $XML::UM::ENCDIR as in the SYNOPSIS.

       This implementation uses the XML::Encoding class to parse the .xml file and creates a hash
       that maps UTF-8 characters (each consisting of up to 4 bytes) to their equivalent byte
       sequence in the specified encoding.  Note that large mappings may consume a lot of memory!
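
       As a rough illustration of such a hash, the sketch below hard-codes three entries of an
       ISO-8859-2 mapping instead of parsing an .xml file. The names %map and lookup are
       hypothetical and are not part of XML::UM; the actual layout of the module's internal hash
       may differ.

```perl
use strict;
use warnings;

# Hypothetical excerpt of an ISO-8859-2 mapping:
# UTF-8 byte sequence of one character => byte sequence in the target encoding.
my %map = (
    "\xC4\x84" => "\xA1",    # U+0104 LATIN CAPITAL LETTER A WITH OGONEK
    "\xC5\x81" => "\xA3",    # U+0141 LATIN CAPITAL LETTER L WITH STROKE
    "\xC5\x82" => "\xB3",    # U+0142 LATIN SMALL LETTER L WITH STROKE
);

# Look up one UTF-8 character (given as raw bytes).
# Returns undef when the character has no entry in the mapping.
sub lookup {
    my ($utf8_char) = @_;
    return $map{$utf8_char};
}
```

       A full mapping holds one entry per character of the target encoding, which is where the
       memory cost mentioned above comes from.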

       Future implementations may parse the .enc files directly, or do the conversions entirely
       in XS (i.e. C code.)

get_encode (Encoding => STRING, EncodeUnmapped => SUB)
       The central entry point to this module is the XML::UM::get_encode() method.  It forwards
       the call to the global $XML::UM::FACTORY, which is defined as an instance of
       XML::UM::SlowMapperFactory by default. Override this variable to plug in your own mapper
       factory.

       The XML::UM::SlowMapperFactory creates an instance of XML::UM::SlowMapper (and caches it
       for subsequent use) that reads in the .xml encoding file and creates a hash that maps
       UTF-8 characters to encoded characters.

       Finally, the get_encode() method of XML::UM::SlowMapper is called. It generates an
       anonymous subroutine that uses the hash to convert multi-character UTF-8 blocks to the
       proper encoding.
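
       The cache-then-closure pattern described here can be sketched without the module itself.
       Everything in the sketch is an assumption made for illustration: load_map, code_point and
       make_encoder are hypothetical names, the mapping is a two-entry excerpt of ISO-8859-2,
       and the unmapped-character callback is assumed to receive the code point as an integer.
       It is not the module's actual implementation.

```perl
use strict;
use warnings;

my %cache;    # encoding name => mapping hash, kept for subsequent calls

# Hypothetical loader: returns a tiny hard-coded excerpt of an ISO-8859-2 map
# instead of parsing an .xml file.
sub load_map {
    my ($encoding) = @_;
    return { "\xC5\x81" => "\xA3", "\xC5\x82" => "\xB3" };
}

# Turn the bytes of one UTF-8 character into its code point number.
sub code_point {
    my ($bytes) = @_;
    utf8::decode($bytes);
    return ord $bytes;
}

# Build (or reuse) the mapping and return an anonymous sub that applies it.
sub make_encoder {
    my (%args) = @_;
    my $map      = $cache{ $args{Encoding} } ||= load_map( $args{Encoding} );
    my $unmapped = $args{EncodeUnmapped};

    # The returned closure keeps $map and $unmapped alive.
    return sub {
        my ($str) = @_;
        # Replace every multi-byte UTF-8 sequence; ASCII bytes pass through.
        $str =~ s{([\xC0-\xF7][\x80-\xBF]+)}
                 {exists $map->{$1} ? $map->{$1} : $unmapped->( code_point($1) )}ge;
        return $str;
    };
}

my $encode = make_encoder(
    Encoding       => 'ISO-8859-2',
    EncodeUnmapped => sub { "&#$_[0];" },
);
my $out = $encode->("Z\xC5\x82oty \xE2\x82\xAC");    # "Z\xB3oty &#8364;"
```

       Because the closure holds a reference to the mapping hash, the hash stays in memory for
       as long as the subroutine is reachable.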

dispose_encoding ($encoding_name)
       Call this to free the memory used by the SlowMapper for a specific encoding.  Note that in
       order to free the big conversion hash, the user should no longer have references to the
       subroutines generated by get_encode().
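
       Why outstanding subroutine references matter can be seen in a small self-contained
       sketch. It does not call XML::UM; the hash stands in for the conversion hash, the closure
       for a subroutine returned by get_encode(), and the weak reference $probe (a hypothetical
       name) is only there to observe when the hash is actually freed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Stands in for the big conversion hash held by the mapper.
my $map = { "\xC5\x81" => "\xA3" };

# Stands in for a subroutine returned by get_encode(): it closes over the hash.
my $encode = do { my $m = $map; sub { $m->{ $_[0] } } };

# A weak reference does not keep the hash alive; it becomes undef once freed.
my $probe = $map;
weaken($probe);

undef $map;                                   # the mapper side lets go of the hash
my $alive_with_closure = defined $probe;      # true: the closure still holds it

undef $encode;                                # the user drops the subroutine
my $alive_after = defined $probe;             # false: the hash has been freed
```

       With the module, the equivalent step is to drop every variable holding a subroutine
       obtained from get_encode() for that encoding, for example with undef, in addition to
       calling dispose_encoding().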

       The parameters to the get_encode() method (defined as name/value pairs) are:

       o Encoding
	   The name of the desired encoding, e.g. 'ISO-8859-2'

       o EncodeUnmapped (Default: \&XML::UM::encode_unmapped_dec)
	   Defines how Unicode characters not found in the mapping file (of the specified
	   encoding) are printed.  By default, they are converted to decimal entity references of
	   the form '&#NNN;'.

	   Use \&XML::UM::encode_unmapped_hex for hexadecimal constants, like '&#xAB;'.
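
       Callbacks in the style of the two routines named above might look like the following
       sketch. It assumes that the callback receives the Unicode code point as an integer and
       returns the replacement string; that calling convention is not spelled out here. The
       names unmapped_dec and unmapped_hex are hypothetical stand-ins, not the module's own
       routines.

```perl
use strict;
use warnings;

# Decimal character reference, e.g. 171 => '&#171;'
sub unmapped_dec {
    my ($code_point) = @_;
    return "&#$code_point;";
}

# Hexadecimal character reference, e.g. 171 => '&#xAB;'
sub unmapped_hex {
    my ($code_point) = @_;
    return sprintf( "&#x%X;", $code_point );
}
```

       Under the same assumption, a reference to a routine of this shape could be passed as the
       EncodeUnmapped value, for instance to substitute a fixed placeholder such as '?' for
       every unmapped character.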

       I'm not exactly sure about which Unicode characters in the range (0 .. 127) should be
       mapped to themselves. See comments in XML/UM.pm near %DEFAULT_ASCII_MAPPINGS.

       The encodings that expat supports by default (e.g. UTF-16, ISO-8859-1) are currently not
       supported, because there are no .enc files available for these encodings.  This module
       needs some more work. If you have the time, please help!

       Send bug reports, hints, tips, suggestions to Enno Derksen at <enno@att.com>.

perl v5.8.0				    2000-02-17				       XML::UM(3)