marc::charset(3pm) debian man page

MARC::Charset(3pm)					User Contributed Perl Documentation					MARC::Charset(3pm)

NAME
       MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS
	   # import the marc8_to_utf8 function
	   use MARC::Charset 'marc8_to_utf8';

	   # prepare STDOUT for utf8
	   binmode(STDOUT, ':encoding(UTF-8)');

	   # print out some marc8 as utf8
	   print marc8_to_utf8($marc8_string);

DESCRIPTION
       MARC::Charset converts MARC-8 encoded strings into UTF-8 strings. MARC-8 is an 8-bit character encoding that predates Unicode and
       allows non-Roman scripts to be stored in MARC bibliographic records. The specification is available from the Library of Congress:

	   http://www.loc.gov/marc/specifications/spechome.html

EXPORTS
   ignore_errors()
       Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting.  This is helpful if you have records
       that contain both MARC-8 and Unicode characters.

	   my $ignore = MARC::Charset->ignore_errors();

	   MARC::Charset->ignore_errors(1); # ignore errors
	   MARC::Charset->ignore_errors(0); # DO NOT ignore errors

   assume_unicode()
       Tells MARC::Charset whether or not to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting.
       This is helpful if you have records that contain both MARC-8 and Unicode characters.

	   my $setting = MARC::Charset->assume_unicode();

	   MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
	   MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

   assume_encoding()
       Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the
       current setting.  This is helpful if you have records that contain both MARC-8 characters and characters in another encoding.

	   my $setting = MARC::Charset->assume_encoding();

	   MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
	   MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
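
       As a rough sketch of how these settings might be combined, the helper below turns on ignore_errors and a fallback encoding for a
       single conversion, then restores whatever was set before by reusing the values the accessors return. The name
       marc8_to_utf8_with_fallback is hypothetical and not part of MARC::Charset; the sketch only assumes the accessors behave as
       described above, and makes no claim about what the converter emits for bytes it cannot decode.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not provided by MARC::Charset): convert one string
# with a fallback encoding, without leaving the global settings changed.
sub marc8_to_utf8_with_fallback {
    my ($marc8, $fallback) = @_;

    # Calling the accessors without an argument returns the current setting.
    my $old_ignore   = MARC::Charset->ignore_errors();
    my $old_encoding = MARC::Charset->assume_encoding();

    # assume_encoding is documented as applying in ignore_errors mode,
    # so both are enabled for this call.
    MARC::Charset->ignore_errors(1);
    MARC::Charset->assume_encoding($fallback);

    # Trap any exception so the settings are restored either way.
    my $utf8 = eval { marc8_to_utf8($marc8) };
    my $error = $@;

    MARC::Charset->ignore_errors($old_ignore ? 1 : 0);
    MARC::Charset->assume_encoding(defined $old_encoding ? $old_encoding : '');

    die $error if $error;
    return $utf8;
}
```

       A caller would then write something like marc8_to_utf8_with_fallback($marc8, 'cp850') for records suspected of mixing MARC-8
       with cp850 bytes.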

   marc8_to_utf8()
       Converts a MARC-8 encoded string to UTF-8.

	   my $utf8 = marc8_to_utf8($marc8);

       If you'd like to ignore errors, pass in a true value as the second parameter or call MARC::Charset->ignore_errors() with a true value:

	   my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

	 or

	   MARC::Charset->ignore_errors(1);
	   my $utf8 = marc8_to_utf8($marc8);

   utf8_to_marc8()
       Attempts to convert a UTF-8 string to MARC-8.

	   my $marc8 = utf8_to_marc8($utf8);

       If you'd like to ignore errors, including characters that can't be converted to MARC-8, pass in a true value as the second parameter:

	   my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

	 or

	   MARC::Charset->ignore_errors(1);
	   my $marc8 = utf8_to_marc8($utf8);
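
       The two conversion functions can be checked against each other with a simple round trip. The following is a minimal sketch that
       assumes both functions are imported by name, as in the SYNOPSIS; the sample string is made up for illustration and is plain ASCII,
       which is expected to pass through both conversions unchanged.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);

# Illustrative sample text (an assumption, not taken from a real record).
my $original = 'Plain ASCII title';

# Convert to MARC-8 and back again.
my $marc8      = utf8_to_marc8($original);
my $round_trip = marc8_to_utf8($marc8);

# Report whether the text survived the round trip unchanged.
print $round_trip eq $original
    ? "round trip ok\n"
    : "round trip changed the text\n";
```

       Text with accented or combining characters may come back in a different Unicode normalization form, so a byte-for-byte
       comparison like this one is only a reasonable check for simple input.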

DEFAULT CHARACTER SETS
       If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the
       appropriate character set code:

	   use MARC::Charset::Constants qw(:all);
	   $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
	   $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
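
       Because these are package variables, a change stays in effect for every later conversion. If only some records need different
       defaults, one option is to scope the change with Perl's local, as sketched below. The helper name arabic_marc8_to_utf8 is
       hypothetical, and the sketch assumes the defaults are read each time marc8_to_utf8() is called.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper (not provided by MARC::Charset): convert one string
# with Arabic default character sets, leaving the global defaults alone.
sub arabic_marc8_to_utf8 {
    my ($marc8) = @_;

    # local restores the previous values when this sub returns,
    # even if the conversion dies.
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

    return marc8_to_utf8($marc8);
}
```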

SEE ALSO
       o   MARC::Charset::Constants

       o   MARC::Charset::Table

       o   MARC::Charset::Code

       o   MARC::Charset::Compiler

       o   MARC::Record

       o   MARC::XML

AUTHOR
       Ed Summers (ehs@pobox.com)

perl v5.12.4							    2011-08-05							MARC::Charset(3pm)