marc::charset(3pm) [debian man page]

MARC::Charset(3pm)					User Contributed Perl Documentation					MARC::Charset(3pm)

NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS
    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, ':utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);

DESCRIPTION
MARC::Charset converts MARC-8 encoded strings into UTF-8 strings. MARC-8 is a character encoding that predates Unicode and lets you put non-Roman scripts in MARC bibliographic records. Most of its character sets are single-byte, although its East Asian (CJK) set uses multibyte codes. The specification is available at:

http://www.loc.gov/marc/specifications/spechome.html

EXPORTS
ignore_errors()
    Tells MARC::Charset whether to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

        my $ignore = MARC::Charset->ignore_errors();
        MARC::Charset->ignore_errors(1); # ignore errors
        MARC::Charset->ignore_errors(0); # DO NOT ignore errors

assume_unicode()
    Tells MARC::Charset whether to assume Unicode when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 and Unicode characters.

        my $setting = MARC::Charset->assume_unicode();
        MARC::Charset->assume_unicode(1); # assume characters are Unicode (UTF-8)
        MARC::Charset->assume_unicode(0); # DO NOT assume characters are Unicode

assume_encoding()
    Tells MARC::Charset whether to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC-8 characters and characters in some other encoding.

        my $setting = MARC::Charset->assume_encoding();
        MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
        MARC::Charset->assume_encoding('');      # DO NOT assume any encoding

marc8_to_utf8()
    Converts a MARC-8 encoded string to UTF-8.

        my $utf8 = marc8_to_utf8($marc8);

    To ignore errors, pass a true value as the second parameter, or call MARC::Charset->ignore_errors() with a true value:

        my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

    or

        MARC::Charset->ignore_errors(1);
        my $utf8 = marc8_to_utf8($marc8);

utf8_to_marc8()
    Attempts to convert a UTF-8 string to MARC-8.

        my $marc8 = utf8_to_marc8($utf8);

    To ignore errors, including characters that cannot be converted to MARC-8, pass a true value as the second parameter:

        my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

    or

        MARC::Charset->ignore_errors(1);
        my $marc8 = utf8_to_marc8($utf8);

DEFAULT CHARACTER SETS
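MARC-8 reads each graphic byte through one of two working sets: bytes 0x21-0x7E through G0 and bytes 0xA1-0xFE through G1. By default G0 holds Basic Latin (ASCII) and G1 holds Extended Latin (ANSEL), and escape sequences inside a record can switch either set. The minimal core-Perl sketch below shows only that byte-range dispatch. The helpers working_set() and describe() and the %designated hash are hypothetical illustrations, not part of MARC::Charset, and escape sequences are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $MARC::Charset::DEFAULT_G0 / DEFAULT_G1.
# These are plain descriptive strings, NOT MARC::Charset::Constants values.
my %designated = (
    G0 => 'Basic Latin (ASCII)',
    G1 => 'Extended Latin (ANSEL)',
);

# Hypothetical helper: which working set would interpret this byte value?
sub working_set {
    my ($byte) = @_;
    return 'G0' if $byte >= 0x21 && $byte <= 0x7E;
    return 'G1' if $byte >= 0xA1 && $byte <= 0xFE;
    # Space, control codes (including ESC, 0x1B) and unused positions.
    return 'other';
}

# Hypothetical helper: label each byte of a MARC-8 octet string with the
# working set, and the character set designated to it, that would read it.
# Escape sequences that re-designate G0/G1 are ignored in this sketch.
sub describe {
    my ($octets) = @_;
    return map {
        my $ws = working_set(ord $_);
        $ws eq 'other' ? 'other' : "$ws: $designated{$ws}";
    } split //, $octets;
}

print "$_\n" for describe("K\xE2");
# prints:
#   G0: Basic Latin (ASCII)
#   G1: Extended Latin (ANSEL)

# Changing the defaults changes how the same bytes are read.
$designated{G0} = 'Basic Arabic';
$designated{G1} = 'Extended Arabic';
print "$_\n" for describe("K\xE2");
# prints:
#   G0: Basic Arabic
#   G1: Extended Arabic
```

Reassigning %designated here plays the role that $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 play in the module: the bytes stay the same, and only the character set used to read them changes.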
If you need to alter the default character sets, you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:

    use MARC::Charset::Constants qw(:all);
    $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

SEE ALSO
o MARC::Charset::Constants
o MARC::Charset::Table
o MARC::Charset::Code
o MARC::Charset::Compiler
o MARC::Record
o MARC::XML

AUTHOR
Ed Summers (ehs@pobox.com)

perl v5.12.4 2011-08-05 MARC::Charset(3pm)

MARC::Charset::Table(3pm)				User Contributed Perl Documentation				 MARC::Charset::Table(3pm)

NAME
MARC::Charset::Table - character mapping db

SYNOPSIS
    use MARC::Charset::Table;
    use MARC::Charset::Constants qw(:all);

    # create the table object
    my $table = MARC::Charset::Table->new();

    # get a code using the marc8 character set code and the character
    my $code = $table->lookup_by_marc8(CYRILLIC_BASIC, 'K');

    # get a code using the utf8 value
    $code = $table->lookup_by_utf8(chr(0x043A));

DESCRIPTION
MARC::Charset::Table is a wrapper around the character mapping database, which is implemented as a tied hash on disk. The database is generated by Makefile.PL, using MARC::Charset::Compiler, when MARC::Charset is installed. It is essentially a key/value mapping in which a key is either a MARC-8 character set code plus a MARC-8 character, or an integer representing a UCS code point. Each key maps to a serialized MARC::Charset::Code object.

new()
    The constructor.

add_code()
    Adds a MARC::Charset::Code to the table.

get_code()
    Retrieves a code using a hash key.

lookup_by_marc8()
    Looks up a MARC::Charset::Code entry using a character set code and a MARC-8 value.

        use MARC::Charset::Constants qw(HEBREW);
        $code = $table->lookup_by_marc8(HEBREW, chr(0x60));

lookup_by_utf8()
    Looks up a MARC::Charset::Code object using a UTF-8 value.

db()
    Returns a reference to the tied character database. MARC::Charset::Table wraps access to the db, but you can get at it directly if you want.

db_path()
    Returns the path to the character encoding database. It can also be called as a class method:

        print MARC::Charset::Table->db_path();

brand_new()
    An alternate constructor that removes the existing database and starts afresh. Be careful with this one: it is really only used when MARC::Charset is installed.

perl v5.12.4 2010-09-09 MARC::Charset::Table(3pm)