11-17-2009
Unicode programming in C
I'm starting to get a little more serious with C, working on a personal project that will read an XML file, which might contain Unicode characters (I know it will on my system, which is set to es_AR.UTF-8).
I'm using mxml, and the documentation says it uses UTF-8 internally (no worries here).
So I need to be sure I'm using UTF-8 in my program, both so that I can safely interact with mxml and so that my program will work in all languages.
I have been reading a lot, but I don't fully get how I can accomplish this.
I'm going for something simple, something easy that won't demand much of me.
My program will read user input (from the CLI for now, GTK later) and save it as XML (it is a config file for another app). It will also have the option to read an XML file and use it as the base for a new one.
Now, I have a few concrete questions.
a) Do I have to use a special type of variable?
If a) is true, do I then need a whole new set of functions (to replace strcmp or strstr, for example)?
b) Can I work with Unicode characters using char *?
If b) is true, how do I "make" them UTF-8?
c) Is it a mix of the above? Then how do I choose the mix ratio?
I appreciate any help, manual, link, etc. that can help me understand how this works (that includes source code).
Thanks