UNIX for Dummies Questions & Answers: Character Conversion Help Needed. Post 302172170 by cbkihong on Monday 3rd of March 2008 12:30:11 AM
ISO8859-1 is typically regarded as the "English" encoding by most people.

However...

My understanding is that the full-width versions of these characters are distinct characters from their half-width counterparts (they have different code assignments), so I am not inclined to believe that you can use iconv to convert between them without looking up an external mapping table.

For example, with Chinese, which I use, input method editors (IMEs; on some OSes, such as Windows, the IME is part of the OS) typically carry this table as part of the install, but I am not sure whether you can find one that lets you extract the table and use it directly. Of course, as you said, some applications exist that know how to do the conversion, but you mentioned you do not wish to use them. My feeling is that if you know both the half-width and full-width code assignments, you can hack this table together yourself.
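Following that suggestion, here is a minimal sketch of such a hand-built table in Python. It assumes the text has already been decoded to Unicode (for example with iconv) and covers only the full-width ASCII variants U+FF01 through U+FF5E, which sit at a fixed offset of 0xFEE0 from U+0021 through U+007E, plus the ideographic (full-width) space U+3000. Half-width katakana and other forms would need their own entries. The function names are hypothetical.

```python
# Sketch of a hand-built half-width <-> full-width mapping table.
# Assumption: input is already Unicode text; only ASCII-range characters
# and the space are covered (half-width katakana etc. are not).

FULLWIDTH_OFFSET = 0xFEE0      # U+FF01 (full-width '!') minus U+0021 ('!')
IDEOGRAPHIC_SPACE = 0x3000     # full-width space


def build_tables():
    """Return (half_to_full, full_to_half) dicts usable with str.translate."""
    # 0x21..0x7E are the printable ASCII characters except the space.
    half_to_full = {cp: cp + FULLWIDTH_OFFSET for cp in range(0x21, 0x7F)}
    half_to_full[ord(" ")] = IDEOGRAPHIC_SPACE
    full_to_half = {full: half for half, full in half_to_full.items()}
    return half_to_full, full_to_half


HALF_TO_FULL, FULL_TO_HALF = build_tables()


def to_halfwidth(text: str) -> str:
    """Replace full-width ASCII variants with their half-width forms."""
    return text.translate(FULL_TO_HALF)


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII (and space) with full-width forms."""
    return text.translate(HALF_TO_FULL)


if __name__ == "__main__":
    print(to_halfwidth("ＡＢＣ１２３！"))  # ABC123!
```

Characters outside the table (Chinese characters, for instance) pass through unchanged. Python's unicodedata.normalize("NFKC", text) also folds full-width letters and digits to half-width, but it rewrites other compatibility characters as well, so an explicit table gives tighter control over what changes.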
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

ASCII to char conversion

I am writing a script to encrypt and decrypt the contents of a text file. How can I convert ASCII codes to characters and back? I need it for a Bourne shell script. Thanks! (3 Replies)
Discussion started by: woody
3 Replies
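The thread asks about the Bourne shell, but the mapping itself is simple. Here is a sketch of it in Python, assuming plain 7-bit ASCII input; the names to_codes and from_codes are hypothetical.

```python
# Convert between characters and their ASCII codes, in both directions.
# Assumption: input is 7-bit ASCII; anything else raises an error.


def to_codes(text: str) -> list[int]:
    """Return the ASCII code of each character (raises on non-ASCII)."""
    return list(text.encode("ascii"))


def from_codes(codes: list[int]) -> str:
    """Turn a list of ASCII codes back into a string."""
    return bytes(codes).decode("ascii")


if __name__ == "__main__":
    codes = to_codes("Hi!")
    print(codes)              # [72, 105, 33]
    print(from_codes(codes))  # Hi!
```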

2. Filesystems, Disks and Memory

Conversion

How do I convert a .doc document to .pdf? (2 Replies)
Discussion started by: areef4u
2 Replies

3. Shell Programming and Scripting

Help needed in character replacement in Korn Shell

How do I replace a space " " character at a particular position in a line? For example, I have the file below: $ cat i2 111 002 A a 33 0011 B c 2222 003 C a. I want every 1st space replaced with a forward slash "/" and every 3rd space replaced with 5 spaces, to get the output below: 111/002... (8 Replies)
Discussion started by: stevefox
8 Replies
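A sketch of one reading of that request, in Python: it assumes fields are separated by single spaces and that "3rd space" means the third separator on each line. The helper fix_spaces and the sample data handling are hypothetical.

```python
# One interpretation: on each line, the 1st space becomes "/" and the
# 3rd space becomes five spaces; the 2nd space is left alone.

SAMPLE = ["111 002 A a", "33 0011 B c", "2222 003 C a"]  # lines from the post


def fix_spaces(line: str) -> str:
    fields = line.split(" ")
    if len(fields) < 4:  # fewer than 3 spaces: leave the line unchanged
        return line
    head = fields[0] + "/" + fields[1]          # 1st space -> "/"
    return head + " " + fields[2] + " " * 5 + " ".join(fields[3:])


if __name__ == "__main__":
    for line in SAMPLE:
        print(fix_spaces(line))  # e.g. "111/002 A     a"
```

To process a real file, read it line by line, strip the trailing newline, and print fix_spaces(line) for each line.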

4. Shell Programming and Scripting

Hex to Decimal Conversion

Dear all, I have a file like this. EE48 4473 7FC9 EE48 102C D23 EE48 4DD 27D EE48 0 0 EE48 3FFE 854 F230 DC6 ... (1 Reply)
Discussion started by: Nayanajith
1 Replies
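The post does not say what output layout is wanted, so this Python sketch simply converts every whitespace-separated hex field on a line to decimal with int(field, 16); hex_fields_to_decimal is a hypothetical name.

```python
# Convert each whitespace-separated hexadecimal field on a line to decimal.
# Assumption: every field is a valid hex number (no "0x" prefix required).


def hex_fields_to_decimal(line: str) -> str:
    return " ".join(str(int(field, 16)) for field in line.split())


if __name__ == "__main__":
    print(hex_fields_to_decimal("EE48 4473 7FC9"))  # 61000 17523 32713
```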

5. Shell Programming and Scripting

Conversion of a file

Hi, I have a file like this: 1 560017039 575052020 22-11-2003 8,290.00 709545 100239050 11 2 560017006 575052020 13-01-2008 20,000.00 709545 100246770 11. I want to convert it like this: 5600170395750520202211200300000008290000000000000709545010023905011... (8 Replies)
Discussion started by: suryanarayana
8 Replies

6. Shell Programming and Scripting

read in a file character by character - replace any unknown ASCII characters with spa

Can someone help me write a script or command that reads in a file character by character, replaces any unknown ASCII characters with a space, and then writes the result out to a new file? Thanks! (1 Reply)
Discussion started by: raghav525
1 Replies
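A minimal Python sketch, assuming "unknown" means any byte outside printable ASCII (0x20 through 0x7E), while tab, newline, and carriage return are kept. Because it works byte by byte, a multibyte UTF-8 character becomes several spaces. The function names and command-line usage are hypothetical.

```python
import sys

# Bytes kept even though they are not printable: tab, newline, carriage return.
KEEP = {0x09, 0x0A, 0x0D}


def clean_bytes(data: bytes) -> bytes:
    """Replace every byte outside printable ASCII (except KEEP) with a space."""
    return bytes(b if 0x20 <= b <= 0x7E or b in KEEP else 0x20 for b in data)


def clean_file(src: str, dst: str) -> None:
    """Read src in binary mode and write the cleaned bytes to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(clean_bytes(data))


if __name__ == "__main__":
    # Usage (hypothetical): python clean.py input.txt output.txt
    if len(sys.argv) == 3:
        clean_file(sys.argv[1], sys.argv[2])
```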

7. Shell Programming and Scripting

Conversion of Date Format using SQL query in a shell script

When I run "Select date_field from TableA fetch first row only", I get the output 09/25/2009. I want the output in the format 2009-09-25, i.e., YYYY-MM-DD. Please help (7 Replies)
Discussion started by: dinesh1985
7 Replies
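Formatting the date inside the query is database-specific and is not covered here. As a post-processing sketch, Python's datetime.strptime and strftime can rewrite MM/DD/YYYY as YYYY-MM-DD; us_to_iso is a hypothetical name.

```python
from datetime import datetime


def us_to_iso(date_text: str) -> str:
    """Convert an MM/DD/YYYY string (as returned in the post) to YYYY-MM-DD."""
    return datetime.strptime(date_text, "%m/%d/%Y").strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(us_to_iso("09/25/2009"))  # 2009-09-25
```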

8. Shell Programming and Scripting

Conversion from Exponential to Decimal

I am trying to read values from Excel and perform some calculations, but I get the error below: expr 2.326227180240883E7 / 8.509366417956961E8 expr: non-numeric argument. Can anyone tell me how to convert these exponential numbers to decimal? (2 Replies)
Discussion started by: sachinnayyar
2 Replies
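expr does integer arithmetic only, which is why it rejects these arguments. Here is a Python sketch of the same calculation: float() accepts E notation directly, and decimal.Decimal can print the value without an exponent. The helper names are hypothetical.

```python
from decimal import Decimal

A = "2.326227180240883E7"  # values copied from the post
B = "8.509366417956961E8"


def divide(a: str, b: str) -> float:
    """Divide two numbers given as strings; float() understands E notation."""
    return float(a) / float(b)


def plain(text: str) -> str:
    """Rewrite an E-notation number as a plain decimal string."""
    return format(Decimal(text), "f")


if __name__ == "__main__":
    print(plain(A))               # 23262271.80240883
    print(f"{divide(A, B):.6f}")  # 0.027337
```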

9. Shell Programming and Scripting

Perl script database date conversion

Need assistance. The script below produces the output correctly, but I want to convert the date format. Below is the output. Any idea? #!/usr/bin/perl -w use DBI; # Get a database handle by connecting to the database my $db = DBI->connect(... (3 Replies)
Discussion started by: ajayram_arya
3 Replies

10. Shell Programming and Scripting

sed searches a character string for a specified delimiter character, and returns a leading or traili

Hi, can anyone help me use sed to search a character string for a specified delimiter character and return a leading or trailing space/blank? Text file: "1"|"ExternalClassDEA519CF5"|"Art1" "2"|"ExternalClass563EA516C"|"Art3" "3"|"ExternalClass305ED16B8"|"Art9" ... ... ... (2 Replies)
Discussion started by: fspalero
2 Replies
code_page(5)							File Formats Manual						      code_page(5)

NAME
code_page, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861, cp862, cp863, cp865, cp866, cp869, cp874, cp932, cp936, cp949, cp950, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, dingbats, symbol - Coded character sets that are used on Microsoft Windows and NT systems

DESCRIPTION
Code pages are coded character sets that are used on Microsoft Windows, Windows 95, and NT systems. Just as there are different UNIX codesets, there are different PC code pages, each supporting a particular set of character encodings.

A Tru64 UNIX system supplies one locale, en_US.cp850, that directly supports a PC code-page format (MS-DOS Latin 1). For all other locales, data in code-page format is supported only through codeset converters. These converters can be run directly by users or by software or applications that exchange data between PC and Tru64 UNIX systems. Fonts and other kinds of character support are available only for the native UNIX codeset to which a code page can be converted.

See the i18n_intro(5) reference page for introductory information on locales and codesets. See the iconv_intro(5) reference page for an introduction to codeset conversion and the name format and location of codeset converters.

The following table lists and describes the code pages that have conversion support on a Tru64 UNIX system. An asterisk (*) follows the names of code pages that include support for the Euro currency sign (€).

------------------------------------------------------
Code Page    Description
------------------------------------------------------
cp437        MS-DOS United States
cp737        Greek
cp775        Baltic languages (1)
cp850        MS-DOS Multilingual (Latin-1)
cp852        MS-DOS Slavic (Latin-2)
cp855        IBM Cyrillic
cp857        IBM Turkish
cp860        MS-DOS Portuguese
cp861        MS-DOS Icelandic
cp862        Hebrew
cp863        MS-DOS Canadian French
cp865        MS-DOS Nordic languages
cp866        MS-DOS Russian
cp869        IBM Modern Greek
cp874 *      MS-DOS Thai
cp932        Japanese
cp936        Chinese (People's Republic of China)
cp949        Korean
cp950        Chinese (Hong Kong)
cp1250 *     Windows Latin-2
cp1251 *     Windows Cyrillic
cp1252 *     Windows Latin-1
cp1253 *     Windows Greek
cp1254 *     Windows Turkish
cp1255 *     Windows Hebrew
cp1256 *     Windows Arabic
cp1257 *     Windows Baltic (1)
cp1258 *     Windows Vietnamese
dingbats     Microsoft dingbat characters
symbol       Microsoft miscellaneous symbol characters
------------------------------------------------------

(1) Baltic languages include Estonian, Latvian, and Lithuanian.
(2) Latin-2 languages include Albanian, Croatian, Czech, Faeroese, Hungarian, Polish, Romanian, Latin Serbian, Slovak, and Slovenian.
(3) Cyrillic languages include Byelorussian, Bulgarian, and Russian.

In all cases, a code page can be converted to and from the UCS-2, UCS-4, and UTF-8 codesets. In addition, some code pages can be converted directly to ISO codesets as shown in the following table, although some data loss may occur.

------------------------------------------
Code Page    Can Be Converted Directly to:
------------------------------------------
cp437        ISO8859-1
cp737        ISO8859-7
cp775        ISO8859-4
cp850        ISO8859-1
cp852        ISO8859-2
cp855        ISO8859-5
cp857        ISO8859-9
cp860        ISO8859-1
cp861        ISO8859-1
cp862        ISO8859-8
cp863        ISO8859-1
cp865        ISO8859-1
cp866        ISO8859-5
cp869        ISO8859-7
cp874        TACTIS
cp1252       ISO8859-1, ISO8859-15
------------------------------------------

See Unicode(5) for information about UCS-2, UCS-4, and UTF-8.
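
As an illustration of the "some data loss may occur" remark, the sketch below uses Python's built-in codecs (not the Tru64 iconv converters) to convert the cp1252 Euro sign. Python names these codesets cp1252, latin_1 (ISO8859-1), iso8859_15, and cp850; the convert helper is hypothetical.

```python
# Illustration only: Python's codecs, not the Tru64 UNIX codeset converters.


def convert(data: bytes, src: str, dst: str, errors: str = "strict") -> bytes:
    """Convert bytes from codeset src to codeset dst by way of Unicode.

    With errors="strict", a character dst cannot represent raises
    UnicodeEncodeError; errors="replace" substitutes "?" (data loss).
    """
    return data.decode(src).encode(dst, errors)


EURO_CP1252 = b"\x80"  # the Euro sign in cp1252

if __name__ == "__main__":
    print(convert(EURO_CP1252, "cp1252", "utf-8"))       # b'\xe2\x82\xac'
    print(convert(EURO_CP1252, "cp1252", "iso8859_15"))  # b'\xa4'
    try:
        convert(EURO_CP1252, "cp1252", "latin_1")
    except UnicodeEncodeError:
        print("ISO8859-1 has no Euro sign")
    print(convert(EURO_CP1252, "cp1252", "latin_1", errors="replace"))  # b'?'
    print(convert(b"\x82", "cp850", "latin_1"))  # b'\xe9' (e with acute)
```

The Euro sign survives conversion to UTF-8 and ISO8859-15 but has no slot in ISO8859-1, which is why the direct cp1252 to ISO8859-1 path can lose data.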
Reference pages for UNIX implementations of the ISO codesets have the name format iso8859-number(5).

For Traditional Chinese and Japanese, there are no codeset converters whose names include the name of a code page because identical character encoding is provided in existing UNIX codesets. For Traditional Chinese, character encoding in PC code-page format (cp950) is identical to that in the Big-5 (big5) codeset. For Japanese, character encoding in PC code-page format (cp932) is identical to that in the Shift JIS (SJIS) codeset. Therefore, the codeset converters whose names include big5 and SJIS can be used to convert data in and out of PC code-page format for the supported languages.

Caution for Conversion of Korean and Simplified Chinese

Conversion of text that starts out in code-page format (cp949) to the DEC Korean (deckorean) codeset may result in loss of data. All of the Tru64 UNIX codeset equivalents for cp949 support all the Hanja and miscellaneous characters also supported by the code page. However, only the UCS-2, UCS-4, and UTF-8 codesets support the complete set of Hangul characters supported by the cp949 code page. The deckorean codeset supports only a subset of these Hangul characters. Therefore, if data is converted from cp949 format to UCS-2, UCS-4, or UTF-8, no data is lost. However, if the data is then converted from UCS-2, UCS-4, or UTF-8 to deckorean, the unsupported Hangul characters will be lost.

The DEC Hanzi (dechanzi) codeset uses the same encoding format as the PC code page used for Simplified Chinese (cp936) but does not support all the characters supported by the code page. Therefore, you can use converters with dechanzi in the converter name to convert text to and from cp936 format, but the operation may result in some loss of data.

SEE ALSO
Commands: iconv(1)

Functions: iconv(3), iconv_close(3), iconv_open(3)

Others: i18n_intro(5), iconv_intro(5), iso8859-1(5), iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5), iso8859-15(5), Unicode(5)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.