Shell Programming and Scripting: extra character with iconv encoding
Post 302530936 by fpmurphy on Wednesday 15th of June 2011 11:31:08 AM
Quote:
Originally Posted by yazu
It's alright. Change the encoding in your terminal or editor to shift_jis and you can see "pure" Cyrillic letters. Sometimes you can see ツ (like so: ツА ツБ ツВ ツГ ツД ツИ ツЙ ツК ツЛ); it's the leading symbol for Cyrillic (and some other) letters.
Out of curiosity, why did you recommend changing to Shift_JIS, which is a Japanese-language encoding? CP866 does not map to Shift_JIS. In Shift_JIS, the single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters of JIS X 0201.
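To make the mismatch concrete, here is a minimal sketch in C. The two function names are made up for illustration (they are not from iconv or any library), and the ranges are the published single-byte layouts as I understand them: Shift_JIS 0xA1-0xDF correspond to Unicode U+FF61-U+FF9F, while CP866 keeps the basic Russian alphabet in 0x80-0xAF and 0xE0-0xEF.

```c
#include <stdint.h>

/* Hypothetical helper: Shift_JIS single bytes 0xA1..0xDF are the
 * JIS X 0201 katakana, which Unicode encodes as the half-width
 * forms U+FF61..U+FF9F. Returns 0 for any other byte. */
uint32_t sjis_halfwidth_katakana(uint8_t b)
{
    if (b < 0xA1 || b > 0xDF)
        return 0;
    return 0xFF61u + (uint32_t)(b - 0xA1);
}

/* Hypothetical helper: CP866 stores the basic Russian alphabet in
 * 0x80..0xAF (U+0410..U+043F) and 0xE0..0xEF (U+0440..U+044F).
 * Returns 0 for any other byte (box drawing, Ё/ё, and so on). */
uint32_t cp866_cyrillic(uint8_t b)
{
    if (b >= 0x80 && b <= 0xAF)
        return 0x0410u + (uint32_t)(b - 0x80);
    if (b >= 0xE0 && b <= 0xEF)
        return 0x0440u + (uint32_t)(b - 0xE0);
    return 0;
}
```

Calling both helpers on the same byte shows the problem: 0xA1 is the Cyrillic letter б (U+0431) in CP866 but the half-width ideographic full stop (U+FF61) in Shift_JIS. The remaining CP866 letter bytes in 0x81-0x9F and 0xE0-0xEF are lead bytes of two-byte characters in Shift_JIS, so they would swallow the byte that follows.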
 

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

iconv -l and ANSEL character set

I am forced to use the ANSEL character set for some GEDCOM documents but must convert them to a more modern set for another app which doesn't recognize ANSEL. I am unable to locate an ISO code for ANSEL in a search of the web. Would someone please identify the ANSEL character set from the list given... (4 Replies)
Discussion started by: Whiterock
4 Replies

2. UNIX for Dummies Questions & Answers

character encoding in Fedora6

Hello, After upgrading the OS from Fedora4 to Fedora6, the firefox view>character encoding doesn't work anymore. None of the foreign characters can be displayed, no matter what character encoding to select. Any suggestions? Thanks, bsky :confused (1 Reply)
Discussion started by: bsky
1 Replies

3. AIX

Vacation message character encoding

I am trying to send a vacation message (.vacation.msg) from my AIX 5.3 server. Message is UTF-8 characters. Some email clients (like apple mail) have no problems displaying the correct text, however, some, like Windows Outlook, display garbage. Is there a way of forcing the client to use proper... (0 Replies)
Discussion started by: lanny
0 Replies

4. Shell Programming and Scripting

sort file adding extra character

HI all i have this script : #!/bin/bash sort /usr/tmp/"REPORT"$1 -o \ /usr/tmp/"SREPORT"$1 -k 1,7 -S 150 end of script now i'm doing this command : ls -lsgt *REPORT* 4 -rw-r--r-- 300 Sep 16 REPORT54784 4 -rw-r--r-- 301 Sep 16 SREPORT54784 as you can see the sorted file... (5 Replies)
Discussion started by: naamas03
5 Replies

5. Shell Programming and Scripting

how to delete extra character in a line?

And I want to delete the characters longer than 20 for each line start with #. The other lines should remain the same. I think this can be done by sed. Could anyone help me with this? Thanks! my input file: #ZP_05494889.1_Clostridium_papyrosolvens... (3 Replies)
Discussion started by: ritacc
3 Replies

6. Shell Programming and Scripting

Remove extra character

Hi I am using cat <filename> command in one of my datastage job(Command Activity). It is giving actual value but giving extra line. Eg: Displayed Output: 1 and showing extraline(Eg: 1 ) I had checked even wc -c it is giving one character extra. If the file contains 11. wc -c says 3. ... (3 Replies)
Discussion started by: cnrj
3 Replies

7. HP-UX

how to find the character encoding of a file in hp_ux

how to find the character encoding of a file in hp_ux (1 Reply)
Discussion started by: alokjyotibal
1 Replies

8. Shell Programming and Scripting

Awk while-loop printing extra character

Hi, I'm using a while-loop in an awk script. If it matches a regular expression, it prints a line. Unfortunately, each line that is printed in this loop is followed by an extra character, "1". While-statement extracted from my script: getline temp; while (temp ~ /.* x .*/) print temp... (3 Replies)
Discussion started by: redbluefish
3 Replies

9. Solaris

connect to ILOM via ssh character encoding

Hello all, I am connecting to ILOM using ssh client (putty) but when RedHat start booting everything look chinese for me... Probably i have to configure the character set, i tried also utf-8 but the issue remain. Any idea? Thanks in advance (0 Replies)
Discussion started by: @dagio
0 Replies

10. Shell Programming and Scripting

sed removing extra character from end

Hi, Searching through forum I found "sed 's/*$//'" can be used to remove trailing whitespaces and tabs from file. The command works fine but I see minor issue as below. Can you please suggest if I am doing something wrong here. $ cat a.txt upg_prod_test upg_prod_new $ cat a.txt |sed... (11 Replies)
Discussion started by: bhupinder08
11 Replies
wctype_ja(3C)                Standard C Library Functions                wctype_ja(3C)

NAME
     wctype_ja - define a character class for the Japanese locale

SYNOPSIS
     #include <wchar.h>

     wctype_t wctype(const char *charclass);

DESCRIPTION
     wctype() builds a value of type wctype_t from the character class named by the charclass argument. iswctype() then uses that value to perform the actual classification of a wide character; that is, wctype() returns the argument that iswctype() needs.

     The following character class names are defined in every locale:

          alnum alpha blank cntrl digit graph lower print punct space upper xdigit

     In addition to the above, the Japanese locales (ja, ja_JP.eucJP, ja_JP.PCK and ja_JP.UTF-8) define the following character classes specific to the Japanese locale:

          jkanji jkata jhira jdigit jparen line jisx0201r jisx0208 jisx0212 udc vdc

     The following character classes are supported in the ja and ja_JP.eucJP locales only:

          jalpha jspecial jgreek jrussian junit jsci jgen jpunct

     The following character classes are supported in the ja_JP.eucJP and ja_JP.UTF-8 locales only:

          ascii paren jisx0201 gaiji jhankana jspace

     These can also be used as charclass arguments to wctype(). However, the use of these classes is limited to applications for the Japanese locale.

     upper
          Character class that represents any uppercase letter.
          JIS X 0201 Roman character graphic set
               Alphabet uppercase letters (C/1-D/10)
          JIS X 0208
               Roman character uppercase letters (3/33-3/58)
               Greek character uppercase letters (6/1-24)
               Russian character uppercase letters (7/1-33)
          JIS X 0212
               Greek alphabet uppercase letters with diacritical marks (6/65-69, 71, 73, 74, 76)
               Cyrillic alphabet uppercase letters (7/34-46)
               Latin alphabet uppercase letters (9/1, 2, 4, 6, 8, 9, 11, 12, 13, 15, 16)
               Latin alphabet uppercase letters with diacritical marks (10/01-24, 26-87)

     lower
          Character class that represents any lowercase letter.
          JIS X 0201 Roman character graphic set
               Alphabet lowercase letters (E/1-F/10)
          JIS X 0208
               Roman character lowercase letters (3/65-90)
               Greek character lowercase letters (6/33-56)
               Russian character lowercase letters (7/49-81)
          JIS X 0212
               Greek alphabet lowercase letters with diacritical marks (6/81-92)
               Cyrillic alphabet lowercase letters (7/82-94)
               Latin alphabet lowercase letters (9/33-48)
               Latin alphabet lowercase letters with diacritical marks (11/1-27, 29-35, 37-87)

     digit
          Class that determines the numbers 0 to 9 for decimal representation.
          JIS X 0201 Roman character graphic set
               Numbers (B/0-9)

     space
          Class that determines a space.
          JIS X 0201 Control character set
               Space (A/9-13)
          Space characters
          JIS X 0208
               Space (1/1)

     punct
          Class that determines symbols and special characters.
          JIS X 0201 Roman character graphic set
               A/1-15, B/10-C/0, D/11-E/0, F/11-14

     cntrl
          Class that determines control characters.
          JIS X 0201 Control character set
               All characters
          Kill characters
          C1 control characters
               All characters

     blank
          Class that determines field delimiters.
          JIS X 0201 Control character set
               A/9
          Space characters
          JIS X 0208
               Space (1/1)

     xdigit
          Class that determines alphanumerics used for hexadecimal representation.
          JIS X 0201 Roman character graphic set
               Numbers (B/0-9)
               A-F, a-f (C/1-6, E/1-6)

     alpha
          Class that determines alphabetic characters: the letters in classes upper and lower.

     print
          Class that determines printable characters.
          JIS X 0201 Roman character graphic set
               Space characters
          JIS X 0201 Katakana character graphic set
               All the characters except in character undefined areas
          JIS X 0208
               All the characters except in character undefined areas
          JIS X 0212
               All the characters except in character undefined areas
          Vendor-defined character area
               All the characters except in character undefined areas in class vdc
          User-defined character areas
               All the characters including character undefined areas in class udc

     graph
          Class that determines graphic characters. All the characters in class print except those in class space.

     jkanji
          Class that determines Kanji (symbol or ideographic characters used for Kanji representation).
          JIS X 0208
               Character defined areas from Ku 16 to Ku 84
          JIS X 0212
               Character defined areas from Ku 16 to Ku 77

     jkata
          Class that determines Katakana.
          JIS X 0208
               5/1-86, 1/11, 12, 19, 20

     jhira
          Class that determines Hiragana.
          JIS X 0208
               4/1-83, 1/11, 12, 21, 22, 26

     jdigit
          Class that determines numbers except those in digit.
          JIS X 0208
               3/16-25

     jparen
          Class that determines characters such as parentheses.
          JIS X 0208
               1/38-59

     line
          Class that determines ruled line primitives.
          JIS X 0208
               8/1-32

     jisx0201r
          Class that determines characters included in the JIS X 0201 Katakana character graphic set.
          JIS X 0201 Katakana character graphic set
               All the characters from A/1 to D/15

     jisx0208
          Class that determines characters included in JIS X 0208. All the characters including those in JIS X 0208 character undefined areas: from Ku 1 to Ku 84 (the Ku 13 vendor-defined character area is included).

     jisx0212
          Class that determines characters included in JIS X 0212. All the characters including those in JIS X 0212 character undefined areas: from Ku 1 to Ku 84 (the Ku 83 and 84 vendor-defined character areas are also included). No characters in the ja_JP.PCK locale are included in this class.

     udc
          Class that determines user-defined characters. All the characters including those in character undefined areas in the user-defined character area.
          ja locale
               User-defined characters 0xf5a1-0xfefe, 0x8ff5a1-0x8ffefe
          ja_JP.PCK locale
               User-defined characters 0xf040-0xf9fc
          ja_JP.UTF-8 locale
               User-defined characters 0xe000-0xf8ff

     vdc
          Class that determines vendor-defined characters. All the characters including those in character undefined areas in the vendor-defined character area.
          ja and ja_JP.eucJP locale
               JIS X 0208 Ku 13: Special symbols
               JIS X 0212 Ku 83 - 84: IBM Extended characters not included in JIS X 0212
          ja_JP.PCK locale
               JIS X 0208 Ku 13: Special symbols
               NEC-selective IBM Extended characters: 0xed40-0xeffc
               IBM Extended characters: 0xfa40-0xfcfc
          ja_JP.UTF-8 locale
               Not defined

     jalpha
          Class that determines alphabet letters.
          JIS X 0208
               3/33-58, 3/65-90

     jspecial
          Class that determines special symbol characters.
          JIS X 0208
               1/2-94, 2/1-14, 2/26-33, 2/42-48, 2/60-74, 2/82-89, 94
          JIS X 0212
               2/15-25, 2/34-36, 2/75-81
          JIS X 020?
               IBM Extended characters
               Special characters defined by NEC-selective IBM Extended characters

     jgreek
          Class that determines Greek characters.
          JIS X 0208
               6/1-24, 6/33-56

     jrussian
          Class that determines Russian characters.
          JIS X 0208
               7/1-7/33, 7/49-81

     junit
          Class that determines unit symbols.
          JIS X 0208
               1/75-83, 2/82, 83
          JIS X 0212
               2/80

     jsci
          Class that determines scientific symbols.
          JIS X 0208
               1/60-74, 2/26-33, 2/42-48, 2/60-74

     jgen
          Class that determines general symbols.
          JIS X 0208
               1/84-94, 2/1-14, 2/84-89, 94
          JIS X 0212
               2/35, 75, 2/79-81

     jpunct
          Class that determines punctuation symbols.
          JIS X 0208
               1/2-37
          JIS X 0212
               2/34, 36

     ascii
          Class that determines the JIS X 0201 Functional character set, Space characters, Roman character graphic set, and Kill characters.

     paren
          Class that determines characters such as parentheses.

     jisx0201
          Class that determines characters included in JIS X 0201.

     gaiji
          Class that determines implementer-defined characters. The udc and vdc classes are included.

     jhankana
          Class that determines characters used for Japanese representation included in JIS X 0201.

     jspace
          Class that determines space characters included in JIS X 0208 and JIS X 0212.

     XX/YY in the JIS X 0201 Functional character set, Roman character graphic set, and Katakana character graphic set denotes Column XX and Row YY. XX/YY in JIS X 0208 and JIS X 0212 denotes Ku XX and Point YY. In the case of JIS X 0212 characters, this rule applies only to the ja or ja_JP.UTF-8 locale.

EXAMPLES
     The following example shows how to determine whether the wide character wc is included in class udc:

          iswctype(wc, wctype("udc"))

SEE ALSO
     iswctype(3C), wctype(3C), wctrans_ja(3C), jctype(3x), eucJP(5), PCK(5)

SunOS 5.10                          10 Jan 2003                          wctype_ja(3C)
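
The EXAMPLES call in the man page above assumes the class name exists in the current locale. As a rough sketch only, the wrapper below (in_wide_class is a hypothetical name, not a library function) also reports the case where wctype() returns 0 because the locale does not define the class, which is what I would expect for "udc" outside the Solaris Japanese locales.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of the C library.
 * Returns 1 if wc belongs to the named class, 0 if it does not,
 * and -1 if the current locale does not define that class
 * (wctype() returns 0 for a class name it does not know). */
int in_wide_class(wint_t wc, const char *class_name)
{
    wctype_t desc = wctype(class_name);

    if (desc == (wctype_t)0)
        return -1;
    return iswctype(wc, desc) != 0;
}
```

A caller would normally run setlocale(LC_ALL, "") first (declared in <locale.h>) so that the locale-specific classes become visible, then test something like in_wide_class(wc, "udc"). Standard names such as "upper" work in any locale.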