I need to use sort, uniq, grep, wc, and the like to work with lists of words in UTF-8 (the "words" being phonetic transcriptions using the IPA). I have searched Google extensively and even found at least one previous post on this topic, but it didn't help.
I tried following the instructions on: UTF-8 and Unicode FAQ
* I set the locale in my xterm with
(which is installed as per locale -a)
* Then I started a new xterm from within the old one with
which I found using
* Then I tested using some of the example files found on UTF-8 and Unicode FAQ
Unfortunately, the Unicode characters are displayed as boxes when viewing the file with less (after answering "y" to the message warning me that "UTF-8-demo.txt may be a binary file...").
I also tried setting LESSCHARSET=utf-8, but it didn't help either.
Can anyone help?
I am using the latest version of X11.app on Mac OS X (XQuartz 2.6.3). less is version 394, xterm version 269.
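Before changing fonts, it may be worth confirming that the shell inside the xterm really reports a UTF-8 character map. The sketch below is only an illustration: is_utf8_charmap is a hypothetical helper name, and the xterm invocation in the trailing comment is an assumed example, not the command from the original post. Note that LC_ALL, if set, overrides LC_CTYPE, which in turn overrides LANG.

```shell
# Hypothetical helper: succeeds when the given charmap name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the C library which character map the current locale uses.
if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
    echo "locale charmap is UTF-8"
else
    echo "locale charmap is not UTF-8; check LANG, LC_CTYPE and LC_ALL"
fi

# Assumed invocation (placeholder font name, pick a real one from xlsfonts):
#   LESSCHARSET=utf-8 xterm -u8 -fn '<an iso10646-1 font>' &
```

If the check passes and boxes still appear, the remaining suspect is usually the font, which simply may not contain glyphs for every IPA character.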
I'm suspicious of any tutorial that asks you to use a specific font to get Unicode... Those instructions probably only work for one revision of one distro.
xterm has an options menu when running, little known but definitely there, in which you may be able to change fonts and charsets etc.
Unfortunately, I don't have access to an xterm right now to tell you where it is, but it may be something like right-clicking the title bar.
I also tried using the "underspecified" version:
The result is the same :-/
I know of two menus I can call up with the mouse. One of them is titled "Main Options" and the other "VT Fonts". I use the second one every now and then to change the font size, e.g. when using a projector, but it doesn't offer options for changing the font itself.
I have, however, achieved a partial solution using
in the new xterm, but there are still a lot of boxes...
Hi,
I have tried to convert a UTF-8 file to a Windows UTF-16 file from a Unix machine, as below:
unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt
and I am getting some Chinese characters, as below, when I open the converted file on a Windows machine.
LANG=en_US.UTF-8... (3 Replies)
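A common cause of unexpected CJK characters after such a conversion is a byte-order mismatch: the plain UTF-16 target lets iconv choose the byte order and whether to write a byte order mark, and that choice differs between implementations. A minimal sketch, assuming that is the problem here, is to request little-endian output explicitly and write the mark by hand. The function name to_windows_utf16 is made up for illustration, and awk stands in for unix2dos only so that the sketch has no extra dependencies.

```shell
# Hypothetical helper: read UTF-8 text with LF line endings on stdin and
# write UTF-16LE with a leading byte order mark and CRLF line endings.
to_windows_utf16() {
    printf '\377\376'    # byte order mark FF FE (little-endian)
    # Append CR to every line, then convert; UTF-16LE adds no mark itself.
    awk '{ printf "%s\r\n", $0 }' | iconv -f UTF-8 -t UTF-16LE
}

# Usage with the file names from the post:
#   to_windows_utf16 < testing.txt > out.txt
```

The input is assumed to use LF line endings; a file that already has CRLF endings would end up with a doubled CR.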
I am trying to change the file encoding from ASCII to UTF-8 using the command below:
iconv -f ASCII -t UTF-8 <input_file> > <output_file>
But the output_file is not actually in UTF-8 format. If I use the file command to check the file encoding it still says ASCII.
While converting am not... (5 Replies)
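ASCII is a subset of UTF-8: every ASCII byte encodes the same character in UTF-8, so a file that contains only ASCII characters is already valid UTF-8 and the conversion has nothing to change. The file command then has no non-ASCII bytes to go on and keeps reporting ASCII. The sketch below checks this on a throwaway file; unchanged_by_iconv is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if converting the file from ASCII to UTF-8
# leaves every byte unchanged.
unchanged_by_iconv() {
    iconv -f ASCII -t UTF-8 "$1" | cmp -s - "$1"
}

tmp=$(mktemp)
printf 'plain ASCII text\n' > "$tmp"
if unchanged_by_iconv "$tmp"; then
    echo "converted output is byte-for-byte identical"
fi
rm -f "$tmp"
```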
Hello everyone!
I have a problem with printing ru_RU.UTF-8 text from AIX using the lp command.
#locale -a
C
POSIX
RU_RU.UTF-8
RU_RU
en_US.8859-15
en_US.ISO8859-1
en_US
ru_RU.ISO8859-5
ru_RU
#locale
LANG=en_US.UTF-8
LC_COLLATE=RU_RU.UTF-8
LC_CTYPE=RU_RU.UTF-8
LC_MONETARY="en_US" (3 Replies)
My OS (Debian) and gcc use the UTF-8 locale. This code says that the char size is 1 byte but the size of 'a' is really 4 bytes.
int main(void)
{
setlocale(LC_ALL, "en_US.UTF-8");
printf("Char size: %i\nSize of char 'a': %i\nSize of Euro sign '€': %i\nLength of Euro sign: %i\n",... (8 Replies)
We just installed ICU for UTF-8 compliance on our AIX 5.3 system. While using vi on some files we get the following error:
ex: 0602-169 Incomplete or invalid multibyte character encountere
yte character encountered, conversion failed.ex: 0602-169 Incomplete or invalidb
ractersultibyte... (0 Replies)
Hmmm... I was not sure where to post this! I want to emit non-ASCII Chinese and Cyrillic text. I'm running Windows Server 2003 with Cygwin XFree86.
I know I have one font that can render Chinese and Russian: "Arial Unicode MS".
How can I configure my Cygwin xterm so I can emit Russian and... (1 Reply)
Hi,
I am trying to get tr to replace multibyte characters with their ASCII equivalents. For example:
"Je vais à l'école" ---> "Je vais a l'ecole"
But my version of tr (5.97) doesn't seem to support multibyte sets.
$ locale charmap; echo "Je vais à l'école" | tr éà ea
UTF-8
Je vais aa l'aacole
I try to... (2 Replies)
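The doubled letters in that output are what a tr that works byte by byte produces: each accented letter is two bytes in UTF-8, and each byte is mapped separately. One possible workaround, sketched here with a hypothetical strip_accents function, is to use sed substitutions instead, since a literal multibyte sequence in an s command is matched as a whole. The script is assumed to be saved in UTF-8, and the list of letters is only an example.

```shell
# Hypothetical helper: replace a few accented letters with ASCII letters.
# Each substitution matches the complete UTF-8 byte sequence of one letter.
strip_accents() {
    sed -e 's/é/e/g' -e 's/è/e/g' -e 's/à/a/g'
}

printf '%s\n' "Je vais à l'école" | strip_accents
```

Every additional letter needs its own substitution, so this scales poorly to large character sets, but it does not depend on the multibyte support of the installed tr.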
Colleagues,
I tried to manipulate UTF-8 data using the following script.
cat $1 | sed 's/ലായി$/ലായി LAYI/g' | sed 's/ുടെ/ുടെ UTE/g' | sed 's/യില്*/യില്* YIL/g'
But it says "cannot execute binary file". Is there any solution?
Jaganadh.
Linguist (1 Reply)