MARC::Charset - A module for doing MARC-8/UTF8 translation
use MARC::Charset;
## create a MARC::Charset object
my $charset = MARC::Charset->new();
## a string containing the Ansel value for a copyright symbol
my $ansel = chr(0xC3) . ' copyright 1969';
## the same string, but now encoded in UTF8!
my $utf8 = $charset->to_utf8($ansel);
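To make the synopsis concrete at the byte level, here is a minimal, self-contained sketch of what such a conversion has to do for the copyright example. It does not use MARC::Charset: the one-entry table and the sketch_to_utf8() helper are hypothetical, and the sketch ignores escape sequences and Ansel combining characters entirely.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One-entry excerpt of an Ansel-to-Unicode table (illustrative only);
# the real mapping tables published by LC are far larger.
my %ansel_to_unicode = ( 0xC3 => 0x00A9 );    # copyright sign

# Hypothetical helper: map each byte to a code point, then encode as UTF-8.
sub sketch_to_utf8 {
    my ($marc8) = @_;
    my $chars = join '', map {
        my $byte = ord;
        # ASCII passes through; unknown high bytes become U+FFFD
        chr( $byte < 0x80 ? $byte : ( $ansel_to_unicode{$byte} // 0xFFFD ) );
    } split //, $marc8;
    return encode( 'UTF-8', $chars );
}

my $utf8 = sketch_to_utf8( chr(0xC3) . ' copyright 1969' );
print length($utf8), " bytes\n";
```

The single Ansel byte 0xC3 becomes the two UTF-8 bytes 0xC2 0xA9, so the converted string is one byte longer than the input.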
MARC::Charset is a package that allows you to easily convert between the MARC-8 character encodings and Unicode (UTF-8). The Library of Congress maintains some essential mapping tables and information about the MARC-8 and Unicode environments at:
http://www.loc.gov/marc/specifications/spechome.html
MARC::Charset is essentially a Perl implementation of the specifications found at LC, and supports the following character sets:
new()
The constructor, which returns a MARC::Charset object. If you like, you can pass in the default G0 and G1 character sets using the g0 and g1 parameters; if you don't, ASCII/Ansel will be assumed.
## for standard character sets: ASCII and Ansel
my $cs = MARC::Charset->new();
## or if you want to specify Arabic Basic + Extended as the G0/G1 character
## sets.
my $cs = MARC::Charset->new(
    g0 => MARC::Charset::ArabicBasic->new(),
    g1 => MARC::Charset::ArabicExtended->new()
);
If you would like diagnostics turned on, pass in the diagnostics parameter and set it to a value that evaluates to true (e.g. 1).
my $cs = MARC::Charset->new( diagnostics => 1 );
to_utf8()
Pass to_utf8() a string of MARC8 encoded characters and get back a string of UTF8 characters. to_utf8() will handle escape sequences within the string that change the working character sets to Greek, Hebrew, Arabic (Basic + Extended), Cyrillic (Basic + Extended), and East Asian.
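The escape-sequence handling described above can be pictured with a small, self-contained sketch that splits a MARC-8 byte string into runs, one per working G0 set. This is not MARC::Charset's own code: split_g0_runs() is a hypothetical helper, only the single-byte "ESC (" designation form is covered, and the final bytes in the table are assumed from the LC specification and should be checked against it.

```perl
use strict;
use warnings;

# Final bytes of "ESC (" G0 designations. These values are an assumption
# taken from the LC specification; verify them before relying on them.
my %g0_final = (
    'B' => 'ASCII',
    'S' => 'Basic Greek',
    '2' => 'Basic Hebrew',
    '3' => 'Basic Arabic',
    'N' => 'Basic Cyrillic',
);

# Hypothetical helper: split a byte string into [charset name, bytes] runs.
sub split_g0_runs {
    my ($bytes) = @_;
    my $current = 'ASCII';    # default G0 set
    my @runs;
    # Walk the string left to right; stops at any escape it does not know.
    while ( $bytes =~ /\G(?:\x1B\(([BS23N])|([^\x1B]+))/gc ) {
        if ( defined $1 ) { $current = $g0_final{$1} }
        else              { push @runs, [ $current, $2 ] }
    }
    return @runs;
}

my @runs = split_g0_runs("abc\x1B(Sde\x1B(Bf");
print "$_->[0]: $_->[1]\n" for @runs;
```

A real converter would then look up each run's bytes in the mapping table for that run's character set; the sketch stops at the point where the working set is known.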
to_marc8()
When you pass this method a UTF8 string, it returns a MARC8 encoded string. to_marc8() handles creating the appropriate character escapes.
g0()
Returns an object representing the character set that is being used as the first graphic character set (G0). If you pass in a MARC::Charset::* object you will set the G0 character set, and as a side effect you'll get the previous G0 value returned to you. You probably don't ever need to call this since character set changes are handled when you call to_utf8(), but it's here if you want it.
## set the G0 character set to Greek
my $charset = MARC::Charset->new();
$charset->g0( MARC::Charset::Greek->new() );
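Because the setter hands back the previous G0 value, it lends itself to a save-and-restore pattern. The sketch below shows that pattern with a hypothetical stand-in class, Demo::Charset, which uses plain strings instead of MARC::Charset::* objects; it only mimics the accessor behavior described above and is not the module's implementation.

```perl
use strict;
use warnings;

package Demo::Charset;    # hypothetical stand-in, not MARC::Charset itself

sub new {
    my ( $class, %args ) = @_;
    return bless { g0 => $args{g0} // 'ASCII' }, $class;
}

# Combined getter/setter: always returns the value held before the call.
sub g0 {
    my ( $self, $new ) = @_;
    my $previous = $self->{g0};
    $self->{g0} = $new if defined $new;
    return $previous;
}

package main;

my $cs    = Demo::Charset->new();
my $saved = $cs->g0('Greek');    # switch to Greek, remember the old set
my $during = $cs->g0();          # plain read while Greek is active
$cs->g0($saved);                 # restore the original set
print "was $saved, used $during, now ", $cs->g0(), "\n";
```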
g1()
Same as g0() above, but operates on the second graphic set that is available.
to_marc8()