Concerned about C and UNICODE
# 1  
Old 08-09-2006

Dear experts,

While developing a C Unicode application under AIX 5.3, I ran into the following problem, and after days of investigation I still have not found a solution.

Please note that the application is entirely wchar_t-based (not UTF-8) and that I could compile and run it without any problem on SunOS.


I managed to isolate the problem in a simple C program:

Code:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

  int main()
  {
        wchar_t arab[4] = { 1583, 1575, 1605, 0 };
        wchar_t engl[4] = { 65, 66, 67, 0 };
        wchar_t temp[4] = { 0, 0, 0, 0 };

        printf("\n#1 copy arab into temp");
        wsprintf(temp, "%S", arab);
        printf("\narab bytes : "); for (int i=0; i<4; i++) printf("%d ", (int) arab[i]);
        printf("\ntemp bytes : "); for (int i=0; i<4; i++) printf("%d ", (int) temp[i]);
        printf("\n");

        printf("\n#2 copy engl into temp");
        wsprintf(temp, "%S", engl);
        printf("\nengl bytes : "); for (int i=0; i<4; i++) printf("%d ", (int) engl[i]);
        printf("\ntemp bytes : "); for (int i=0; i<4; i++) printf("%d ", (int) temp[i]);

        printf("\n\n");
        return 0;
  }

Here is the unexpected output:

Code:
#1 copy arab into temp
arab bytes : 1583 1575 1605 0
temp bytes : 32 0 0 0

#2 copy engl into temp
engl bytes : 65 66 67 0
temp bytes : 65 66 67 0

As you can see, the wsprintf call copied 65, 66, 67 (ABC) correctly, but it did NOT copy the characters from the Arabic character set: it wrote a single space (value 32) instead!

It seems to be related to the installed character sets and to the locale configuration... but then, why use wchar_t strings at all? This data type and the C functions that use it (like wsprintf) are supposed to work whatever the language.

So, my questions are:
- What should I do to make this example work?
- Are the C wchar_t functions really character-set independent? (That is the case on Windows and SunOS.)
- If necessary, how do I install and use additional character sets?

Thank you very much for your input about this,

Best regards,

Thomas Gilbert
# 2  
Old 08-09-2006
I cannot find any manpage for wsprintf on my system at all, and when compiling your example the linker cannot find it. Is it possible that this is not a standard function, and therefore it varies from system to system?

My compiler goes bananas when you declare variables in a for statement like that, too. Apparently that syntax is only allowed from C99 onwards (and in C++).
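
For comparison: ISO C does not define wsprintf. The wide-character formatting function it does define is swprintf, declared in <wchar.h>. Here is a minimal sketch of the same copy using it, assuming a C99-conforming library. The helper name copy_wide is made up for this illustration.

```c
#include <stddef.h>
#include <wchar.h>

/* Copy the wide string src into dst, which has room for dst_len wide
 * characters including the terminator. copy_wide is a hypothetical helper
 * name used only for this sketch.
 * Returns 0 on success, -1 if formatting failed or the result did not fit. */
int copy_wide(wchar_t *dst, size_t dst_len, const wchar_t *src)
{
    /* Wide format string, %ls for a wchar_t* argument. */
    int written = swprintf(dst, dst_len, L"%ls", src);
    return (written < 0) ? -1 : 0;
}
```

Called as copy_wide(temp, 4, arab) with the arrays from the first post, this should leave the values 1583 1575 1605 0 in temp on a conforming implementation. This is untested on AIX 5.3.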
# 3  
Old 08-09-2006
You are right, declaring the i variable inside the for statement is not accepted by every C compiler... but I use a C++ compiler :-)

Here is the code without it:

Code:
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

  int main()
  {
        wchar_t arab[4] = { 1583, 1575, 1605, 0 };
        wchar_t engl[4] = { 65, 66, 67, 0 };
        wchar_t temp[4] = { 0, 0, 0, 0 };
        int i = 0;

        printf("\n#1 copy arab into temp");
        wsprintf(temp, "%S", arab);
        printf("\narab bytes : "); for (i=0; i<4; i++) printf("%d ", (int) arab[i]);
        printf("\ntemp bytes : "); for (i=0; i<4; i++) printf("%d ", (int) temp[i]);
        printf("\n");

        printf("\n#2 copy engl into temp");
        wsprintf(temp, "%S", engl);
        printf("\nengl bytes : "); for (i=0; i<4; i++) printf("%d ", (int) engl[i]);
        printf("\ntemp bytes : "); for (i=0; i<4; i++) printf("%d ", (int) temp[i]);

        printf("\n\n");
        return 0;
  }

Regarding wsprintf: it's a rather standard wchar function, declared in /usr/include/wchar.h (by default) on AIX. Under SunOS, I think it's declared in /usr/include/widec.h.

Thomas
# 4  
Old 08-09-2006
Corona - see mbstowcs, which is ANSI C99...
tgilbert - try mbstowcs, but mind what locale you are set to. These functions are sensitive to that, and I don't see where you called setlocale().

Also, the conversion of characters is subject to change - see the man page warning.
If you were on an old AIX system, wide chars/data written to files from that system may have problems on a newer system.

YMMV. I'm not an AIX expert...
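
The mbstowcs suggestion could look roughly like this. It is a sketch, assuming the input is a multibyte string in the encoding of the current locale. to_wide and use_environment_locale are hypothetical helper names, not part of any library.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Adopt the locale named by the environment (LANG, LC_ALL, ...).
 * Returns 1 on success, 0 if that locale is not available. */
int use_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL;
}

/* Convert the multibyte string src to wide characters in dst, which has
 * room for dst_len wide characters including the terminator. How src is
 * decoded depends on the LC_CTYPE category of the current locale.
 * Returns the number of wide characters stored (terminator excluded),
 * or (size_t)-1 on an invalid sequence or if the result does not fit. */
size_t to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    size_t n = mbstowcs(dst, src, dst_len);
    if (n == (size_t)-1)
        return n;                 /* invalid multibyte sequence */
    if (n == dst_len)
        return (size_t)-1;        /* buffer filled, no room for the terminator */
    return n;
}
```

Without a setlocale call the program runs in the "C" locale, where only plain ASCII input is sure to convert. Calling use_environment_locale() first lets LANG decide how non-ASCII bytes are decoded.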
# 5  
Old 08-10-2006
OK, I eventually managed to make it work.

Actually, I had several problems:

- I had to install the UTF-8 codepage sets.
- I had to call setlocale(LC_ALL, "en_US.UTF-8") in my program (or call it with "" and rely on the LANG environment variable). I did not have to do this under SunOS and Windows.
- I had to get rid of my own libiconv.a library and use the one provided by AIX (mine was OK under SunOS and Windows).

Thomas
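
The locale step above can be sketched as follows. It assumes that wchar_t values are Unicode code points, so that 1583 is the Arabic letter DAL, and that a locale named "en_US.UTF-8" is installed. The helper names locale_can_encode and init_unicode_locale are invented for this example.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns 1 if the multibyte encoding of the current locale can
 * represent wc, 0 otherwise. Hypothetical helper for this sketch. */
int locale_can_encode(wchar_t wc)
{
    char buf[MB_LEN_MAX];
    (void)wctomb(NULL, L'\0');    /* reset any shift state */
    return wctomb(buf, wc) > 0;   /* wctomb returns -1 if wc is not representable */
}

/* Try the environment's locale first (LANG etc.), then the explicit
 * name used in the thread. Returns 1 if the active locale can encode
 * the first Arabic character of the test program, 0 otherwise. */
int init_unicode_locale(void)
{
    const wchar_t probe = 1583;   /* assumed to be U+062F, see above */

    if (setlocale(LC_ALL, "") != NULL && locale_can_encode(probe))
        return 1;
    if (setlocale(LC_ALL, "en_US.UTF-8") != NULL && locale_can_encode(probe))
        return 1;
    return 0;
}
```

Calling init_unicode_locale() at the top of main, before any wide-character formatting, and checking its return value shows early on whether the required locale is actually installed.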