How to make gl_get_line read unicode characters


 
# 1  
Old 12-24-2010

Hi,

My program uses gl_get_line() from libtecla to read user input from the terminal. It works fine as long as I enter English at the prompt. However, if I enter text in other languages, such as Chinese characters, either by typing or by cut-and-paste, the characters are cleared from the terminal right away. I set the LC_CTYPE locale category to "en_US.UTF-8". Below is the code:
Code:
    printf("LC_CTYPE=%s\n", setlocale(LC_CTYPE, "en_US.UTF-8"));
    printf("LANG=%s\n", getenv("LANG"));
    gl = new_GetLine(1024, 2048);
    while((line=gl_get_line(gl, "input> ", NULL, -1)) != NULL &&
            strcmp(line, "exit\n") != 0)
    {
        printf("You typed: %s!\n", line);
    }
    gl = del_GetLine(gl);

The output showed that the locale (both LC_CTYPE and LANG) was set to en_US.UTF-8:
Code:
LC_CTYPE=en_US.UTF-8
LANG=en_US.UTF-8

I need to read user input in any language, i.e., any UTF-8 characters. Can someone please help me? Big thanks.

# 2  
Old 12-25-2010
I don't think it's readline that's the problem so much as your terminal's settings. Adjusting the LC_CTYPE variable won't change which charset your terminal is configured for, whatever your terminal is.
# 3  
Old 12-25-2010
Before I start my program, I can enter Chinese characters in my shell (bash). Both input and display work fine. Once I start my program from the shell, my terminal can still display characters in other languages, such as Chinese or German. If I enter a command in English, my program can read it (by calling gl_get_line()), interpret it, generate content in other languages and display it on the terminal just fine. The problem only affects my program taking input from the terminal:

1. If I copy some Chinese characters from a browser and paste them into the terminal where my program is waiting for input (in gl_get_line()), those characters do not appear in the terminal at all.

2. If I type Chinese characters at my program's terminal, using a standard keyboard with a Microsoft input method that converts sequences of English letters to Chinese characters, the Chinese characters do appear in the terminal. But as soon as I press the space key, which ends the Chinese input sequence and (I think) passes control back to gl_get_line(), the characters I have just entered disappear from the terminal.

It feels like gl_get_line() for some reason erases any character beyond the standard ASCII range (0-127). I have read the documentation for gl_get_line(), and it has some information about international character support, but not enough detail to help me resolve my problem. Can anyone provide some help?
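
For illustration, here is a minimal sketch of why every Chinese character falls outside that range: in UTF-8, each byte of a multi-byte character has its high bit set, so a single character such as 你 arrives as three bytes, all greater than 127. The function name count_non_ascii_bytes is a hypothetical helper, not part of libtecla.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes in s that fall outside
 * the 7-bit ASCII range (0-127). */
size_t count_non_ascii_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Cast first: plain char may be signed, making high bytes negative. */
        if ((unsigned char)*s > 127)
            n++;
    }
    return n;
}
```

Calling it on a line returned by an input routine shows whether any non-ASCII bytes survived; a result of 0 for pasted Chinese text would be consistent with the bytes having been dropped before the line was returned.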
# 4  
Old 12-27-2010
Does your program include
Code:
setlocale(LC_CTYPE, "");

and is your locale correctly set?

See the tecla(5) man page ("User interface provided by the tecla library") for how to enter what Sun calls international characters.
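
One way to check whether the locale is "correctly set" from inside the program is sketched below, assuming a POSIX system that provides nl_langinfo(). The function names codeset_is_utf8 and environment_locale_is_utf8 are made up for this example. Note that this only reports what the C library believes; it says nothing about how the terminal itself is configured.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/* Return 1 if the codeset name denotes UTF-8, else 0.
 * Both spellings are accepted because systems report either form. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL &&
           (strcasecmp(codeset, "UTF-8") == 0 ||
            strcasecmp(codeset, "UTF8") == 0);
}

/* Adopt LC_CTYPE from the environment, then report whether the
 * resulting character encoding is UTF-8. */
int environment_locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;   /* the environment names a locale that is not installed */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program could call environment_locale_is_utf8() once at startup and print a warning if it returns 0.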

# 5  
Old 12-28-2010
I explicitly set LC_CTYPE to en_US.UTF-8 to avoid any confusion. At one point I did try setlocale(LC_CTYPE, ""), and the result was the same because the default locale in my environment is en_US.UTF-8. I even tried running the original demo.c that comes with the libtecla download package and hit the same problem: any foreign characters I entered or pasted at the terminal were erased.
# 6  
Old 12-30-2010
I have confirmed with Martin Shepherd, the author of libtecla, that it does not support Unicode input. It only supports 8-bit extended ASCII character sets, such as ISO-8859-1.
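
For anyone who needs UTF-8 input and can live without line editing, one possible fallback is plain fgets(), which passes the terminal's bytes through without interpreting them. This is only a sketch under that assumption, not a libtecla feature: it gives up history, completion and in-line editing. The helper names read_raw_line and utf8_length are hypothetical, and utf8_length assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from in without interpreting its bytes and strip the
 * trailing newline. Returns buf, or NULL on end of file or error. */
char *read_raw_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}

/* Count the characters (code points) in a UTF-8 string, assuming the
 * input is well formed: every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character. */
size_t utf8_length(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

In the loop from the first post, read_raw_line(stdin, buf, sizeof buf) would take the place of gl_get_line(), and the exit check would compare against "exit" without the newline, since the helper strips it.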