debian man page for marc::charset

Query: marc::charset

OS: debian

Section: 3pm

Format: Original Unix Latex Style Formatted with HTML and a Horizontal Scroll Bar

MARC::Charset(3pm)					User Contributed Perl Documentation					MARC::Charset(3pm)

NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8
SYNOPSIS
# import the marc8_to_utf8 function use MARC::Charset 'marc8_to_utf8'; # prepare STDOUT for utf8 binmode(STDOUT, 'utf8'); # print out some marc8 as utf8 print marc8_to_utf8($marc8_string);
DESCRIPTION
MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single byte character encoding that predates unicode, and allows you to put non-Roman scripts in MARC bibliographic records. http://www.loc.gov/marc/specifications/spechome.html
EXPORTS
ignore_errors() Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC8 and UNICODE characters. my $ignore = MARC::Charset->ignore_errors(); MARC::Charset->ignore_errors(1); # ignore errors MARC::Charset->ignore_errors(0); # DO NOT ignore errors assume_unicode() Tells MARC::Charset whether or not to assume UNICODE when an error is encountered in ignore_errors mode and returns the current setting. This is helepfuli if you have records that contain both MARC8 and UNICODE characters. my $setting = MARC::Charset->assume_unicode(); MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8) MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode assume_encoding() Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode and returns the current setting. This is helpful if you have records that contain both MARC8 and other characters. my $setting = MARC::Charset->assume_encoding(); MARC::Charset->assume_encoding('cp850'); # assume characters are cp850 MARC::Charset->assume_encoding(''); # DO NOT assume any encoding marc8_to_utf8() Converts a MARC-8 encoded string to UTF-8. my $utf8 = marc8_to_utf8($marc8); If you'd like to ignore errors pass in a true value as the 2nd parameter or call MARC::Charset->ignore_errors() with a true value: my $utf8 = marc8_to_utf8($marc8, 'ignore-errors'); or MARC::Charset->ignore_errors(1); my $utf8 = marc8_to_utf8($marc8); utf8_to_marc8() Will attempt to translate utf8 into marc8. my $marc8 = utf8_to_marc8($utf8); If you'd like to ignore errors, or characters that can't be converted to marc8 then pass in a true value as the second parameter: my $marc8 = utf8_to_marc8($utf8, 'ignore-errors'); or MARC::Charset->ignore_errors(1); my $utf8 = marc8_to_utf8($marc8);
DEFAULT CHARACTER SETS
If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code: use MARC::Charset::Constants qw(:all); $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC; $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
SEE ALSO
o MARC::Charset::Constant o MARC::Charset::Table o MARC::Charset::Code o MARC::Charset::Compiler o MARC::Record o MARC::XML
AUTHOR
Ed Summers (ehs@pobox.com) perl v5.12.4 2011-08-05 MARC::Charset(3pm)
Related Man Pages
yaz-iconv(1) - debian
marc::charset(3pm) - debian
marc::charset::table(3pm) - debian
marc::crosswalk::dublincore(3pm) - debian
marc::file::microlif(3pm) - debian
Similar Topics in the Unix Linux Community
Removing all characters on and before specific point
More formatted string
Encoding conversion in PERL script
ignore fields to check in grep
Real UNICODE back to string