11-17-2009
Unicode programing in C
im starting to go a little serious with c, woking in a personal project that will read a xml, which might contain Unicode characters (i know it will on my system, which is set to es_AR.UTF-8)
im using mxml, and the documentation says it uses utf8 internally (no worries here).
so i need to be sure im using utf8 in my program. to be sure that i can safely interact with mxml and to be sure my program will work in all languages.
i have been reading alot, but i dont quite fully get how i can accomplish this.
im going for something simple, something easy that wont demand much of me.
my program will read user input (from cli for now, gtk later), and will save it in xml (is a config file for other app). it will also have the option to read a xml, and use it as a base for a new one
now, i have a few concrete questions.
a) do i have to use a special type of variable?
if a) is true, then i need a hole new set of functions? (for strcmp, or strstr)
b) can i work with unicode characters using char *?
if b) is true, how do i "make" them utf8?
c) is a mix of the above? then how to choose the mix ratio?
i appreciate any help, manual, link, ect that can help me understand how this works. (that includes source code)
thanks
9 More Discussions You Might Find Interesting
1. UNIX Desktop Questions & Answers
Hi all!
I`m new in Unix (Linux) and i whant to ask something!
What language should i use for Linux developing.I meen applications an GAME DEVELOPING!
Should i use C,TCL ??? Please help me on this ...:( (1 Reply)
Discussion started by: Sebastyan
1 Replies
2. Programming
I am having a windows and i would like to know whitch program do you prefer for programing in windows
P.S. C++ (1 Reply)
Discussion started by: D.Borak
1 Replies
3. Programming
I have a stream of characters like "\u8BBE\u5907\u7BA1"
and i want to display it.
I tried following things already without any luck.
1) printf("%s",L("\u8BBE\u5907\u7BA1"));
2) printf("%lc",0x8BBE);
3) setlocale followed by fwide followed by wprintf
4) also changed the local manually... (3 Replies)
Discussion started by: jackdorso
3 Replies
4. HP-UX
How can program at crontab dayly each 30 minut (2 Replies)
Discussion started by: petroleo
2 Replies
5. Shell Programming and Scripting
plx help to solve these problems??
1. Create a HERE document which will edit multiple files in the same directory, using the ed editor. I give you 3 original files: file1.c , file2.c , file3.c, download them and change each string "stdio.h" to "STDIO.H" in these files. Note: when execute the... (1 Reply)
Discussion started by: SoCalledEngr
1 Replies
6. Shell Programming and Scripting
Hi...
i need to write a shell script wich shows the full name and
station of every logged user in the system.
pls help! (1 Reply)
Discussion started by: relu89
1 Replies
7. IP Networking
Hello experts,
please help me as i want to learn the networking concepts in details ,
as i come know Unix network programming by Richard Stevens volume 1,2
is good please any of you downloaded the Free PDF version of it please direct m e as i want to download these books or the pdf form of it,... (1 Reply)
Discussion started by: vin_pll
1 Replies
8. IP Networking
hi
I am strucked in a client server program
client need to login to server
client logins if only username and password are correct
i have written a program
username is stored as file and password is smilar to username whic is stored in that file
when server asks for username... (2 Replies)
Discussion started by: karthik1238
2 Replies
9. Shell Programming and Scripting
i wrote an awk progarm to calculate throughput from a ns2 trace file. i want this program to act on multiple trace files and it should display each output in a single output file can anyone please clear my doubt i tried with awk -f awkscript inputfile1... (7 Replies)
Discussion started by: sarathyy
7 Replies
UTF(6) Games Manual UTF(6)
NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.
In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any
external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
encoding called UTF.
UTF is designed so the 7-bit ASCII set (values hexadecimal 00 to 7F), appear only as themselves in the encoding. Runes with values above
7F appear as sequences of two or more bytes with values only from 80 to FF.
The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on
ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).
Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:
01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent higher-
valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences pro-
posed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
FILES
/lib/unicode
table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
UTF(6)