Quote (originally posted by sumirmehta):
My terminal is a normal console. I've tried the same with xterm and the emacs shell. So does this mean that it is not possible to see these characters (or any 2-byte UTF-8 character) in a shell terminal?
You may not be able to see these on a normal console, but you should be able to see multibyte UTF-8 characters rendered in an X Window System terminal, provided that:
1. You have set the correct locale in the shell.
2. You are using a terminal emulator (e.g. uxterm, konsole) that handles Unicode properly.
3. You have configured the terminal emulator for the correct encoding.
4. You have the needed X fonts installed, and have selected one that contains a glyph for that specific Unicode character.
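For point 1, running `locale` in the shell shows the relevant variables. The Python sketch below collects the same information from inside a process. It is only an illustration: `locale_report` is a hypothetical helper, and because Python 3.7+ may coerce the C locale to UTF-8 (PEP 538/540), treat its encoding result as a hint rather than proof. Under POSIX, `LC_ALL` overrides `LC_CTYPE`, which in turn overrides `LANG`.

```python
import codecs
import locale
import os
import sys


def locale_report() -> dict:
    """Hypothetical helper: gather the settings that decide this process's text encoding."""
    # LC_ALL beats LC_CTYPE, which beats LANG (POSIX precedence).
    report = {name: os.environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    # Encoding Python derived from the locale at startup (may be coerced to UTF-8).
    report["preferred_encoding"] = locale.getpreferredencoding(False)
    # Encoding used when printing to the terminal; None if stdout has no encoding.
    report["stdout_encoding"] = getattr(sys.stdout, "encoding", None)
    # Normalise aliases such as "UTF8" or "utf_8" via the codec registry.
    report["utf8"] = codecs.lookup(report["preferred_encoding"]).name == "utf-8"
    return report


if __name__ == "__main__":
    for key, value in locale_report().items():
        print(f"{key}: {value}")
```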
On Unix, an X terminal emulator is always started by some parent process (a shell running in another terminal, or a desktop environment such as GNOME, KDE or IceWM), and it inherits that parent's environment, including its locale settings. The parent's locale can therefore affect rendering, and tracing where a wrong setting comes from may lead you up the process chain until you hit the system-wide locale, which is especially nasty.
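The inheritance described above can be demonstrated with any child process. This sketch (the helper `child_locale` is hypothetical) starts a child Python interpreter with a modified copy of the parent's environment and reports the `LC_ALL` value the child sees. A terminal emulator launched from a shell, e.g. `LC_ALL=en_US.UTF-8 uxterm &`, picks up its locale the same way.

```python
import os
import subprocess
import sys


def child_locale(overrides: dict) -> str:
    """Hypothetical helper: report the LC_ALL value a child process inherits."""
    # The child receives a copy of the parent's environment plus the overrides,
    # just as a terminal emulator inherits whatever its launcher had set.
    env = {**os.environ, **overrides}
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_locale({"LC_ALL": "en_US.UTF-8"}))
```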
I tried printing the U+0174 character you mentioned in three terminals: xterm, konsole and gnome-terminal. On the machine I am currently using, gnome-terminal and uxterm displayed it correctly; you can see this in the screenshot. Chinese characters, which take 3 bytes each in UTF-8, rendered properly in all of the terminals I tried.
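To see why U+0174 counts as a "2-byte" character while Chinese characters take 3 bytes, this sketch prints the UTF-8 encoding of U+0174 and of one CJK character. U+4E2D is an arbitrary example chosen here, not one from the thread. Only character names and hex bytes are printed, so the output is readable even on a console that cannot render the glyphs.

```python
import unicodedata


def utf8_bytes(ch: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated hex."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


# U+0174 from the thread, plus U+4E2D as an example CJK character.
for ch in ("\u0174", "\u4e2d"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_bytes(ch)}")
# U+0174 LATIN CAPITAL LETTER W WITH CIRCUMFLEX: c5 b4
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D: e4 b8 ad
```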
Normally, a font contains glyphs for only a subset of the characters in a character set. Given how many characters Unicode covers, fonts that were not designed for a particular range will often fail to render characters from it. In most X-based terminal emulators setting the font is easy; knowing which font to use is the harder part.
In the worst case, if you have no idea whether a font contains glyphs for the characters you need, you may have to inspect it with a tool such as FontForge, as some experts in the field suggested to me while I was working with LaTeX. FontForge is not trivial to use, though, and you can probably find other programs with a simpler interface.
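One lighter alternative on systems that use fontconfig is `fc-list`, which can filter fonts by the characters they cover (for example `fc-list ':charset=174' family`). The Python wrapper below is a sketch: `fonts_with_glyph` is a hypothetical name, it returns an empty list when `fc-list` is not installed, and it only sees fontconfig (Xft) fonts, such as those used by konsole, gnome-terminal or `xterm -fa`. Core X fonts selected with `xterm -fn` are not covered.

```python
import shutil
import subprocess


def fonts_with_glyph(codepoint: int) -> list:
    """Hypothetical helper: font families fontconfig reports as covering codepoint."""
    if shutil.which("fc-list") is None:
        return []  # fontconfig tools are not installed on this machine
    # ':charset=174' selects fonts whose charset includes U+0174 (hex, no prefix);
    # the trailing 'family' limits the output to family names.
    out = subprocess.run(
        ["fc-list", f":charset={codepoint:x}", "family"],
        capture_output=True,
        text=True,
        check=False,
    ).stdout
    return sorted({line.strip() for line in out.splitlines() if line.strip()})


if __name__ == "__main__":
    for family in fonts_with_glyph(0x0174):
        print(family)
```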
I'm not sure exactly what the tests you mentioned do, but in my experience, if you have doubts about the generated byte stream, pipe it through hexdump (or od, if you prefer) and check the individual bytes. The golden rule still applies: check the bytes first, and if the bytes are correct but the rendering isn't, check the environment (terminal, shell, fonts, locale).
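In the shell, `od -An -tx1` or `hexdump -C` gives the byte view. If you would rather check from a script, this sketch (the helper `check_utf8` is hypothetical) produces a similar hex view and also reports whether the bytes form valid UTF-8, and if not, at which byte offset decoding fails.

```python
def check_utf8(data: bytes) -> tuple:
    """Hypothetical helper: return (hex view, verdict) for a byte string."""
    hex_view = " ".join(f"{b:02x}" for b in data)
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return hex_view, f"invalid UTF-8 at byte offset {err.start}: {err.reason}"
    return hex_view, "valid UTF-8"


if __name__ == "__main__":
    print(check_utf8("\u0174".encode("utf-8")))  # correct 2-byte sequence
    print(check_utf8(b"\xc5"))                   # truncated: second byte missing
```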
I must admit that getting Unicode processed and rendered correctly the first time is tricky, but once you have done it, the second time is much easier.