Sponsored Content
Full Discussion: Unicode programing in C
Top Forums Programming Unicode programing in C Post 302372303 by broli on Tuesday 17th of November 2009 03:25:14 PM
Old 11-17-2009
Unicode programing in C

im starting to go a little serious with c, woking in a personal project that will read a xml, which might contain Unicode characters (i know it will on my system, which is set to es_AR.UTF-8)

im using mxml, and the documentation says it uses utf8 internally (no worries here).
so i need to be sure im using utf8 in my program. to be sure that i can safely interact with mxml and to be sure my program will work in all languages.

i have been reading alot, but i dont quite fully get how i can accomplish this.

im going for something simple, something easy that wont demand much of me.

my program will read user input (from cli for now, gtk later), and will save it in xml (is a config file for other app). it will also have the option to read a xml, and use it as a base for a new one

now, i have a few concrete questions.
a) do i have to use a special type of variable?
if a) is true, then i need a hole new set of functions? (for strcmp, or strstr)
b) can i work with unicode characters using char *?
if b) is true, how do i "make" them utf8?
c) is a mix of the above? then how to choose the mix ratio?

i appreciate any help, manual, link, ect that can help me understand how this works. (that includes source code)

thanks
 

9 More Discussions You Might Find Interesting

1. UNIX Desktop Questions & Answers

Graphics programing

Hi all! I`m new in Unix (Linux) and i whant to ask something! What language should i use for Linux developing.I meen applications an GAME DEVELOPING! Should i use C,TCL ??? Please help me on this ...:( (1 Reply)
Discussion started by: Sebastyan
1 Replies

2. Programming

Win programing

I am having a windows and i would like to know whitch program do you prefer for programing in windows P.S. C++ (1 Reply)
Discussion started by: D.Borak
1 Replies

3. Programming

How to display unicode characters / unicode string

I have a stream of characters like "\u8BBE\u5907\u7BA1" and i want to display it. I tried following things already without any luck. 1) printf("%s",L("\u8BBE\u5907\u7BA1")); 2) printf("%lc",0x8BBE); 3) setlocale followed by fwide followed by wprintf 4) also changed the local manually... (3 Replies)
Discussion started by: jackdorso
3 Replies

4. HP-UX

crontab programing

How can program at crontab dayly each 30 minut (2 Replies)
Discussion started by: petroleo
2 Replies

5. Shell Programming and Scripting

Awk Programing (need help)

plx help to solve these problems?? 1. Create a HERE document which will edit multiple files in the same directory, using the ed editor. I give you 3 original files: file1.c , file2.c , file3.c, download them and change each string "stdio.h" to "STDIO.H" in these files. Note: when execute the... (1 Reply)
Discussion started by: SoCalledEngr
1 Replies

6. Shell Programming and Scripting

shell programing...

Hi... i need to write a shell script wich shows the full name and station of every logged user in the system. pls help! (1 Reply)
Discussion started by: relu89
1 Replies

7. IP Networking

Netork programing

Hello experts, please help me as i want to learn the networking concepts in details , as i come know Unix network programming by Richard Stevens volume 1,2 is good please any of you downloaded the Free PDF version of it please direct m e as i want to download these books or the pdf form of it,... (1 Reply)
Discussion started by: vin_pll
1 Replies

8. IP Networking

Help with Unix socket programing

hi I am strucked in a client server program client need to login to server client logins if only username and password are correct i have written a program username is stored as file and password is smilar to username whic is stored in that file when server asks for username... (2 Replies)
Discussion started by: karthik1238
2 Replies

9. Shell Programming and Scripting

Doubt in awk programing

i wrote an awk progarm to calculate throughput from a ns2 trace file. i want this program to act on multiple trace files and it should display each output in a single output file can anyone please clear my doubt i tried with awk -f awkscript inputfile1... (7 Replies)
Discussion started by: sarathyy
7 Replies
UTF8(5) 						      BSD File Formats Manual							   UTF8(5)

NAME
utf8 -- UTF-8, a transformation format of ISO 10646 SYNOPSIS
ENCODING "UTF-8" DESCRIPTION
The UTF-8 encoding represents UCS-4 characters as a sequence of octets, using between 1 and 6 for each character. It is backwards compatible with ASCII, so 0x00-0x7f refer to the ASCII character set. The multibyte encoding of non-ASCII characters consist entirely of bytes whose high order bit is set. The actual encoding is represented by the following table: [0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb [0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb [0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb [0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] -> 11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb [0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb [0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] -> 1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb If more than a single representation of a value exists (for example, 0x00; 0xC0 0x80; 0xE0 0x80 0x80) the shortest representation is always used. Longer ones are detected as an error as they pose a potential security risk, and destroy the 1:1 character:octet sequence mapping. SEE ALSO
euc(5) Rob Pike and Ken Thompson, "Hello World", Proceedings of the Winter 1993 USENIX Technical Conference, USENIX Association, January 1993. F. Yergeau, UTF-8, a transformation format of ISO 10646, January 1998, RFC 2279. The Unicode Standard, Version 3.0, The Unicode Consortium, 2000, as amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode Standard Annex #28: Unicode 3.2. STANDARDS
The utf8 encoding is compatible with RFC 2279 and Unicode 3.2. BSD
April 7, 2004 BSD
All times are GMT -4. The time now is 02:32 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy