10-30-2008
Hi cbkihong,
Thanks again for the reply.
My terminal is a normal console. I've tried the same with xterm and the Emacs shell. Does this mean it is not possible to see these characters (or any 2-byte UTF-8 character) in a shell terminal?
I tried the reverse of the cat check you suggested: I copied my file onto Windows and opened it with Notepad/EditPlus. The UTF-8 characters display correctly there.
I also tried a few combinations in a Unix shell with the same file.
Case 1: when I simply print the variable and the output of the decode function, I see the following output:
ŴŶ (Original string)
Wide character in print at ./test1.pl line 11, <FILE> line 19.
ŴŶ (utf-8 decoded string)
Case 2: if I then enable binmode, the warning goes away, but the displayed characters change:
Ã\205´Ã\205¶ (Original string)
ŴŶ (utf-8 decoded string)
In the second case, I can't understand why the original string now shows different characters (unlike case 1, where I simply print the string), while the decoded string displays the same characters the original string did. I expected the decoded string to show different characters as well, since each character is 2 bytes and it is being encoded to UTF-8.
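This is a minimal sketch of what the two cases do at the byte level. It is written in Python rather than Perl. It assumes that the file holds the UTF-8 bytes for "ŴŶ" (U+0174, U+0176) and that enabling binmode adds a UTF-8 output layer, which encodes every character to UTF-8 on print.

The undecoded string is four separate 1-byte characters. Re-encoding each of them gives eight bytes (double encoding). The decoded string has two characters, and they encode back to the original four bytes.

```python
# Sketch (Python, not Perl) of the two print paths described above.
# Assumption: the input file contains the UTF-8 encoding of "ŴŶ".

raw = "ŴŶ".encode("utf-8")       # bytes as read from the file, undecoded
decoded = raw.decode("utf-8")     # 2 characters after decoding

# An undecoded byte string behaves like one character per byte
# (Latin-1 style); encoding those to UTF-8 again double-encodes them.
double_encoded = raw.decode("latin-1").encode("utf-8")

# Encoding the properly decoded string restores the original bytes.
round_trip = decoded.encode("utf-8")

print(len(raw), raw.hex(" "))                        # 4 c5 b4 c5 b6
print(len(decoded))                                  # 2
print(len(double_encoded), double_encoded.hex(" "))  # 8 c3 85 c2 b4 c3 85 c2 b6
print(round_trip == raw)                             # True
```

The first two double-encoded bytes, C3 85, appear to match the "Ã\205" (octal 205 = 0x85) seen in the case 2 output.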
10 More Discussions You Might Find Interesting
1. Programming
While working with Russian text under FreeBSD and MySQL, I need to convert a string from MySQL to Unicode.
I've just started with C++ under FreeBSD, so please explain how I can get the ASCII code of a char variable, and also how I can get a character into a variable with the specified ASCII... (3 Replies)
Discussion started by: macron
2. Red Hat
Hello,
I am trying to convert a 7bit ASCII file to UTF-8.
I have used iconv before, but this time it can't recognize the file for some reason and reports an unknown file encoding.
When I used ascii2uni package with different package, ./ascii2uni -a K -a I -a J -a X test_file > new_test_file
It still... (2 Replies)
Discussion started by: rockf1bull
3. Shell Programming and Scripting
I have a shell script that loads some data from a text file into a database. The text file contains some non-ASCII characters, such as ü. How can I convert these characters to UTF-8 before loading them into the DB? (5 Replies)
Discussion started by: vel4ever
4. Shell Programming and Scripting
Can someone help me convert hex streams to decimal values using a Perl script?
Hex value:
$my_hex_stream="0c07ac14001676";
Every hex value (pair of hex digits) in the above stream should be converted to decimal, with the values separated by commas.
The output should be: 12,07,172,20,00,22,118 (2 Replies)
Discussion started by: Arun_Linux
5. UNIX for Dummies Questions & Answers
Sometimes we receive Excel files containing French/Japanese characters by mail, and these files are manually transferred to the server using SFTP (security is not a huge concern here). The data is converted to text format in Notepad before the transfer.
Problem is: When saving... (4 Replies)
Discussion started by: jawsnnn
6. Linux
Hi,
I have tried to convert a UTF-8 file to a Windows UTF-16 file on a Unix machine as follows:
unix2dos < testing.txt | iconv -f UTF-8 -t UTF-16 > out.txt
and I get some Chinese characters, as shown below, when I open the converted file on a Windows machine.
LANG=en_US.UTF-8... (3 Replies)
Discussion started by: phanidhar6039
7. Shell Programming and Scripting
Hello all
I have a UTF-8 file that I am trying to convert to WINDOWS-1251 on Linux,
without any success.
The file is UTF-8; when I run:
file -bi test.txt
it gives me:
text/plain; charset=utf-8
When I try to convert the file, I run:
/usr/bin/iconv -f UTF-8 -t WINDOWS-1251 test.txt >... (1 Reply)
Discussion started by: umen
8. UNIX for Advanced & Expert Users
Hi All,
I am trying to obtain a count of characters using awk, but the "length" function returns 1 for 2-byte or 3-byte characters as well, unlike the wc -c command.
I have tried to use the commands below within an awk function, but it does not seem to work:
{
cmd="wc -c "stringtocheck
( cmd )... (6 Replies)
Discussion started by: tostay2003
9. Shell Programming and Scripting
I am trying to develop a script that will work on a source UTF-8 file and perform one or more of the following:
It will accept the target encoding as an argument, e.g. US-ASCII or ISO-8859-1.
1. It should replace all occurrences of characters outside target character set by " " (space) or... (3 Replies)
Discussion started by: hemkiran.s
10. UNIX for Beginners Questions & Answers
Dears,
I have a shell script - working perfectly on Oracle Linux - that detects the encoding (the charset to be exact) of the files in a specified directory using the "file" command (The file command outputs the charset in Linux, but doesn't do that in AIX), then if the file isn't a UTF-8 text... (4 Replies)
Discussion started by: JeanM-1
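For item 4 above, here is a minimal sketch in Python. The poster asked for Perl, where the built-in hex function plays the role that decoding each hex pair plays here. The sketch assumes that every two hex digits form one value. Zero-padding to 2 digits is also an assumption, chosen to reproduce the "07" and "00" in the expected output.

```python
# Sketch for item 4: convert a hex stream to comma-separated decimal values.
# Assumption: every two hex digits form one value, and values are
# zero-padded to at least 2 digits to match the expected "07" and "00".

def hex_stream_to_decimal(hex_stream: str) -> str:
    # bytes.fromhex splits the stream into bytes and raises ValueError
    # on an odd number of digits or a non-hex character.
    values = bytes.fromhex(hex_stream)
    return ",".join(f"{value:02d}" for value in values)


my_hex_stream = "0c07ac14001676"
print(hex_stream_to_decimal(my_hex_stream))  # 12,07,172,20,00,22,118
```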
LEARN ABOUT OPENDARWIN
tcl_utftolower
Tcl_UtfToUpper(3) Tcl Library Procedures Tcl_UtfToUpper(3)
__________________________________________________________________________________________________________________________________________________
NAME
Tcl_UniCharToUpper, Tcl_UniCharToLower, Tcl_UniCharToTitle, Tcl_UtfToUpper, Tcl_UtfToLower, Tcl_UtfToTitle - routines for manipulating the
case of Unicode characters and UTF-8 strings.
SYNOPSIS
#include <tcl.h>
Tcl_UniChar
Tcl_UniCharToUpper(ch)
Tcl_UniChar
Tcl_UniCharToLower(ch)
Tcl_UniChar
Tcl_UniCharToTitle(ch)
int
Tcl_UtfToUpper(str)
int
Tcl_UtfToLower(str)
int
Tcl_UtfToTitle(str)
ARGUMENTS
int ch (in) The Tcl_UniChar to be converted.
char *str (in/out) Pointer to UTF-8 string to be converted in place.
_________________________________________________________________
DESCRIPTION
The first three routines convert the case of individual Unicode characters:
If ch represents a lower-case character, Tcl_UniCharToUpper returns the corresponding upper-case character. If no upper-case character is
defined, it returns the character unchanged.
If ch represents an upper-case character, Tcl_UniCharToLower returns the corresponding lower-case character. If no lower-case character is
defined, it returns the character unchanged.
If ch represents a lower-case character, Tcl_UniCharToTitle returns the corresponding title-case character. If no title-case character is
defined, it returns the corresponding upper-case character. If no upper-case character is defined, it returns the character unchanged.
Title-case is defined for a small number of characters that have a different appearance when they are at the beginning of a capitalized
word.
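The title-case rule above can be illustrated with a short sketch that uses Python's built-in Unicode case tables rather than Tcl. U+01C6 (ǆ) is one of the few characters whose title-case form differs from its upper-case form. Per the BUGS section below, the Tcl 8.1 routines only convert ISO 8859-1 characters, so they would leave U+01C6 unchanged.

```python
# Illustration of upper-, title- and lower-case mappings using Python's
# Unicode tables (not Tcl). U+01C6 has distinct upper and title forms.

dz = "\u01c6"              # LATIN SMALL LETTER DZ WITH CARON

upper = dz.upper()         # U+01C4: both letters capital
title = dz.title()         # U+01C5: capital D, small z with caron
lower = upper.lower()      # back to U+01C6

# For most letters the title-case form is just the upper-case form.
fallback = "x".title()     # "X"

for label, ch in (("upper", upper), ("title", title), ("lower", lower)):
    print(f"{label}: U+{ord(ch):04X}")
print(fallback)
```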
The next three routines convert the case of UTF-8 strings in place in memory:
Tcl_UtfToUpper changes every UTF-8 character in str to upper-case. Because changing the case of a character may change its size, the byte
offset of each character in the resulting string may differ from its original location. Tcl_UtfToUpper writes a null byte at the end of
the converted string. Tcl_UtfToUpper returns the new length of the string in bytes. This new length is guaranteed to be no longer than
the original string length.
Tcl_UtfToLower is the same as Tcl_UtfToUpper except it turns each character in the string into its lower-case equivalent.
Tcl_UtfToTitle is the same as Tcl_UtfToUpper except it turns the first character in the string into its title-case equivalent and all following characters into their lower-case equivalents.
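This minimal sketch shows why byte offsets can move. It uses Python's str.upper() as a stand-in for Tcl_UtfToUpper; unlike the Tcl routine, it returns a new string rather than converting in place. Dotless ı (U+0131) takes 2 bytes in UTF-8, but its upper-case form I takes 1, so the following x shifts one byte to the left. U+0131 is outside ISO 8859-1, so per the BUGS section the Tcl 8.1 routines would not convert it.

```python
# Sketch: a case change can shrink a character's UTF-8 encoding, which
# shifts the byte offsets of the characters after it. str.upper() stands
# in for Tcl_UtfToUpper here and does not convert in place.

original = "\u0131x"             # "ıx": dotless i (2 bytes) + "x"
converted = original.upper()     # "IX"

orig_bytes = original.encode("utf-8")
conv_bytes = converted.encode("utf-8")

print(len(orig_bytes), len(conv_bytes))                 # 3 2
print(orig_bytes.index(b"x"), conv_bytes.index(b"X"))   # 2 1
```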
BUGS
At this time, the case conversions are only defined for the ISO8859-1 characters. Unicode characters above 0x00ff are not modified by
these routines.
KEYWORDS
utf, unicode, toupper, tolower, totitle, case
Tcl 8.1 Tcl_UtfToUpper(3)