Unicode programming in C
Post 302372303 by broli on Tuesday 17th of November 2009 03:25:14 PM

I'm starting to get a little serious with C, working on a personal project that will read an XML file, which might contain Unicode characters (I know it will on my system, which is set to es_AR.UTF-8).

I'm using mxml, and the documentation says it uses UTF-8 internally (no worries here).
So I need to be sure I'm using UTF-8 in my program, both so that I can safely interact with mxml and so that my program will work in all languages.
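
One way to "be sure" a buffer really holds UTF-8 before handing it to an XML library is to validate the bytes. The following is only a minimal sketch: `utf8_is_valid` is a hypothetical helper name, not part of mxml or the C standard library, and it checks well-formedness only (it says nothing about normalization or whether the text is meaningful).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the first len bytes of s are
   well-formed UTF-8, 0 otherwise. Rejects overlong encodings,
   UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
int utf8_is_valid(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t need;        /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */

        if (c < 0x80) {                 /* plain ASCII, one byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0)   { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0)   { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                  /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return 0;                   /* sequence is cut off */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;               /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) return 0;                      /* overlong form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* outside Unicode */

        i += need + 1;
    }
    return 1;
}
```

A caller would typically run this on user input, for example `utf8_is_valid(buf, strlen(buf))`, and reject or convert the input when it returns 0.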

I have been reading a lot, but I don't fully get how I can accomplish this.

I'm going for something simple, something easy that won't demand much of me.

My program will read user input (from the CLI for now, GTK later) and save it as XML (it is a config file for another app). It will also have the option to read an XML file and use it as the base for a new one.

Now, I have a few concrete questions.
a) Do I have to use a special type of variable?
If a) is true, do I then need a whole new set of functions (for strcmp or strstr, for example)?
b) Can I work with Unicode characters using char *?
If b) is true, how do I "make" them UTF-8?
c) Is it a mix of the above? Then how do I choose the mix ratio?
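
As a rough illustration of questions a) and b): UTF-8 text is just a sequence of bytes with no embedded NUL, so it fits in an ordinary `char *`, and byte-oriented functions such as `strcmp` and `strstr` keep working on it. What changes is that `strlen` counts bytes, not characters. The sketch below assumes the input is already valid UTF-8; `utf8_length` and `utf8_contains` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated string assumed to be valid UTF-8.
   Every byte that is NOT a continuation byte (10xxxxxx) starts a new
   character, so counting those bytes counts the characters. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Returns 1 if needle occurs in haystack, 0 otherwise. A plain byte search
   is enough when both strings are valid UTF-8: lead bytes and continuation
   bytes use disjoint ranges, so a match cannot begin in the middle of a
   character. This compares bytes only; it does not treat precomposed and
   combining forms of the same letter as equal. */
int utf8_contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

For the string "año" encoded as UTF-8, `strlen` reports 4 bytes while `utf8_length` reports 3 characters. The usual alternative is `wchar_t` with the `wcs*` functions after calling `setlocale(LC_ALL, "")`, which is the "special type plus new set of functions" route from question a).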

I appreciate any help, manual, link, etc. that can help me understand how this works (that includes source code).

Thanks
 

MARC::Charset(3pm)        User Contributed Perl Documentation        MARC::Charset(3pm)

NAME
    MARC::Charset - convert MARC-8 encoded strings to UTF-8

SYNOPSIS
    # import the marc8_to_utf8 function
    use MARC::Charset 'marc8_to_utf8';

    # prepare STDOUT for utf8
    binmode(STDOUT, 'utf8');

    # print out some marc8 as utf8
    print marc8_to_utf8($marc8_string);

DESCRIPTION
    MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8 strings. MARC-8 is a single byte character encoding that predates unicode, and allows you to put non-Roman scripts in MARC bibliographic records.

    http://www.loc.gov/marc/specifications/spechome.html

EXPORTS
    ignore_errors()
        Tells MARC::Charset whether or not to ignore all encoding errors, and returns the current setting. This is helpful if you have records that contain both MARC8 and UNICODE characters.

            my $ignore = MARC::Charset->ignore_errors();

            MARC::Charset->ignore_errors(1); # ignore errors
            MARC::Charset->ignore_errors(0); # DO NOT ignore errors

    assume_unicode()
        Tells MARC::Charset whether or not to assume UNICODE when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC8 and UNICODE characters.

            my $setting = MARC::Charset->assume_unicode();

            MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
            MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode

    assume_encoding()
        Tells MARC::Charset whether or not to assume a specific encoding when an error is encountered in ignore_errors mode, and returns the current setting. This is helpful if you have records that contain both MARC8 and other characters.

            my $setting = MARC::Charset->assume_encoding();

            MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
            MARC::Charset->assume_encoding('');      # DO NOT assume any encoding

    marc8_to_utf8()
        Converts a MARC-8 encoded string to UTF-8.

            my $utf8 = marc8_to_utf8($marc8);

        If you'd like to ignore errors, pass in a true value as the 2nd parameter or call MARC::Charset->ignore_errors() with a true value:

            my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');

        or

            MARC::Charset->ignore_errors(1);
            my $utf8 = marc8_to_utf8($marc8);

    utf8_to_marc8()
        Will attempt to translate utf8 into marc8.

            my $marc8 = utf8_to_marc8($utf8);

        If you'd like to ignore errors, or characters that can't be converted to marc8, then pass in a true value as the second parameter:

            my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');

        or

            MARC::Charset->ignore_errors(1);
            my $marc8 = utf8_to_marc8($utf8);

DEFAULT CHARACTER SETS
    If you need to alter the default character sets you can set the $MARC::Charset::DEFAULT_G0 and $MARC::Charset::DEFAULT_G1 variables to the appropriate character set code:

        use MARC::Charset::Constants qw(:all);
        $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
        $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;

SEE ALSO
    o   MARC::Charset::Constant
    o   MARC::Charset::Table
    o   MARC::Charset::Code
    o   MARC::Charset::Compiler
    o   MARC::Record
    o   MARC::XML

AUTHOR
    Ed Summers (ehs@pobox.com)

perl v5.12.4                        2011-08-05                        MARC::Charset(3pm)