Help required for Kanji characters in UNIX (UNIX for Dummies Questions & Answers): Post 302284430 by quirkasaurus on Thursday 5th of February 2009 01:06:41 PM
I think the problem is that the kanji characters are stored in a different character
encoding (EUC-JP, Shift JIS, or JIS, for example) from one box to the next.

This phenomenon occurs a lot in Japanese email and, I think, is called "mojibake".

The solution is nasty:

You must translate the binary values on the computer where the kanji look correct
into the decimal Unicode code point of each character, spelled out in ASCII digits
(i.e. code point 0110101000101110 in binary becomes "27182").

This could be done using a C program.
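
A minimal sketch of such a C program follows, assuming the text has already been converted to UTF-8 on the box where the kanji look correct (raw EUC-JP or Shift JIS input would need a conversion step first, for instance with iconv). The names utf8_decode and to_numeric_refs are made up for illustration, and the decoder is deliberately simple: it does not reject overlong or surrogate encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s (len bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or 0 if the sequence is truncated or malformed. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n;
    unsigned long c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {                     /* plain ASCII byte */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {    /* 2-byte sequence */
        n = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {    /* 3-byte sequence (most kanji) */
        n = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {    /* 4-byte sequence */
        n = 4; c = s[0] & 0x07;
    } else {
        return 0;                          /* stray continuation or invalid lead byte */
    }
    if (len < n)
        return 0;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                      /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return n;
}

/* Copy a NUL-terminated UTF-8 string into out, leaving ASCII as-is and
 * writing every other character as a decimal reference such as &#27182;
 * Returns 0 on success, -1 on malformed input or if out is too small. */
int to_numeric_refs(const char *utf8, char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t len = strlen(utf8);
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (len > 0) {
        unsigned long cp;
        size_t n = utf8_decode(s, len, &cp);
        int w;

        if (n == 0)
            return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsz - used, "%c", (int)cp);
        else
            w = snprintf(out + used, outsz - used, "&#%lu;", cp);
        if (w < 0 || (size_t)w >= outsz - used)
            return -1;                     /* output buffer too small */
        used += (size_t)w;
        s += n;
        len -= n;
    }
    return 0;
}
```

For example, calling to_numeric_refs("\xE6\xA8\xAE", buf, sizeof buf) with a 64-byte buf leaves the string &#27182; in buf, since those three bytes are the UTF-8 form of code point 27182.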

Then, within HTML, you can get the Japanese characters by specifying:

<meta charset='x-euc-jp'>

at the top of your HTML,
and then writing the spelled-out numbers as numeric character references like:

&#27182;

This is the preferred method, as numeric references always name Unicode code points
and so are not confused between machines with different character encodings.
 

jiskanji(5)                     File Formats Manual                     jiskanji(5)

NAME
    jiskanji, jiskanji7, JIS7 - A character encoding system (codeset) for Japanese

DESCRIPTION
    JIS Kanji is a codeset that uses the JIS X0202 symbol extension method for
    encoding the JIS X0208 and JIS X0201 character sets. There are two types of
    JIS Kanji encoding: 7-bit JIS Kanji code and 8-bit JIS Kanji code.

  7-bit JIS Kanji Code
    In 7-bit JIS Kanji encoding, all character values are 7-bit bytes.
    Characters are interpreted according to preceding in and out sequences as
    follows:

    Kanji in sequence (ESC $ B)
        The code values following the Kanji in sequence (ESC $ B) are treated
        as characters in the JIS X0208 Kanji character set.

    Kanji out sequence (ESC ( B)
        The code values following the Kanji out sequence (ESC ( B) are treated
        as ASCII characters.

    Supplementary Kanji in sequence (ESC $ ( D)
        The code values following the supplementary Kanji in sequence
        (ESC $ ( D) are treated as characters in the JIS X0212 supplementary
        Kanji character set.

    User-Defined Character (UDC) in sequence (ESC $ ( 0)
        The code values following the UDC in sequence (ESC $ ( 0) are treated
        as characters in the vendor-defined or user-defined character set.

    Kana in (SO) and Kana out (SI) sequences
        The code values following SO (0x0e) and preceding SI (0x0f) are treated
        as characters in the JIS X0201 Katakana character set.

    Katakana in sequence (ESC ( I)
        Code values following the Katakana in sequence (ESC ( I) are treated as
        characters in the JIS X0201 Katakana character set. In this case, the
        Kanji out sequence is used to switch back to ASCII code. The Katakana
        in and Kanji out sequences are an alternative to using the Kana in and
        out sequences (SO/SI).

  8-bit JIS Kanji Code
    In 8-bit JIS Kanji encoding, the JIS X0201 Katakana characters are
    represented as 8-bit bytes. Using this form of encoding, in and out
    sequences have the following effect:

    Kanji in sequence (ESC $ B)
        Code values following the Kanji in sequence (ESC $ B) are treated as
        characters in the JIS X0208 Kanji character set.

    Supplementary Kanji in sequence (ESC $ ( D)
        Code values following the supplementary Kanji in sequence (ESC $ ( D)
        are treated as characters in the JIS X0212 supplementary Kanji
        character set.

    User-Defined Character (UDC) in sequence (ESC $ ( 0)
        Code values following the UDC in sequence (ESC $ ( 0) are treated as
        vendor-defined or user-defined characters.

    Kanji out sequence (ESC ( B)
        Code values following the Kanji out sequence (ESC ( B) are treated as
        ASCII characters.

    Kana in and out sequences (SI/SO)
        These sequences are ignored.

  Codeset Conversion
    The following codeset converter pairs are available for converting Japanese
    characters between jiskanji7 or JIS7 and other encoding formats. The
    RESTRICTIONS section discusses some conversion limitations that apply to
    these converters. Refer to iconv_intro(5) for an introduction to codeset
    conversion. For more information about the other codeset for which
    jiskanji7 or JIS7 is the input or output, see the reference page specified
    in the list item.

    deckanji_jiskanji7 or deckanji_JIS7, jiskanji7_deckanji or JIS7_deckanji
        Converting from and to the DEC Kanji codeset: deckanji(5).

    eucJP_jiskanji7 or eucJP_JIS7, jiskanji7_eucJP or JIS7_eucJP
        Converting from and to Japanese Extended UNIX Code: eucJP(5).

    eucTW_jiskanji7 or eucTW_JIS7, jiskanji7_eucTW or JIS7_eucTW
        Converting from and to Taiwanese Extended UNIX Code: eucTW(5).

    sdeckanji_jiskanji7 or sdeckanji_JIS7, jiskanji7_sdeckanji or JIS7_sdeckanji
        Converting from and to the Super DEC Kanji codeset: sdeckanji(5).

    SJIS_jiskanji7 or SJIS_JIS7, jiskanji7_SJIS or JIS7_SJIS
        Converting from and to Shift JIS format: SJIS(5).

        Shift JIS encoding format is identical to encoding in Microsoft
        code-pages used on PC systems. Therefore, you can use these converters
        to convert Japanese characters between JIS Kanji and PC code-page
        format. For general information on how the operating system supports
        PC code pages, see code_page(5).

RESTRICTIONS
    The JIS Kanji codeset is not supported directly by a locale but through
    code conversion (through the iconv utility, Japanese terminal (tty) code
    conversion, and so forth).

    In the codeset naming conventions used by the iconv utility, the string
    JIS7 indicates 7-bit JIS Kanji code that follows a Katakana in sequence and
    the string jiskanji7 indicates 7-bit JIS Kanji code entered between Kana in
    and out sequences.

    The following sequences are valid for input to the iconv utility but are
    not generated when code is converted to jiskanji7:

        Kanji in (ESC $ @)
        Kanji in (ESC & @ ESC $ B)
        Kanji in (ESC $ ( B)
        Kanji in (ESC $ ( @)
        Supplementary Kanji in (ESC $ D)
        Kana in (ESC ( J)
        Kana in (ESC ( H)

    In the code naming conventions of the Japanese terminal, the string jis7
    indicates 7-bit JIS Kanji code and the string jis8 indicates 8-bit JIS
    Kanji code. When the terminal code is set to jis7, the Kana in and out
    sequences (SI/SO) are used for JIS X0201 Katakana character representation.

SEE ALSO
    Commands: locale(1)

    Others: ascii(5), code_page(5), deckanji(5), eucJP(5), i18n_intro(5),
    i18n_printing(5), iconv_intro(5), iso2022jp(5), Japanese(5), l10n_intro(5),
    sdeckanji(5), shiftjis(5)
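
To make the in and out sequences described in the reference page above more concrete, here is a rough C sketch that walks a 7-bit JIS byte stream, tracks which character set is currently selected, and counts the two-byte JIS X0208 characters it sees. It is only an illustration under stated assumptions: the function name jis7_count_kanji and the enum are invented here, only ESC $ B, ESC ( B, ESC ( I, SO and SI are recognized, and SI is assumed to return to ASCII. The supplementary Kanji and UDC sequences are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Character set currently selected in the stream (hypothetical names). */
enum jis_mode { JIS_ASCII, JIS_KANJI, JIS_KATAKANA };

/* Count two-byte JIS X0208 characters in a 7-bit JIS Kanji byte stream.
 * The stream is assumed to start in ASCII mode. */
size_t jis7_count_kanji(const unsigned char *buf, size_t len)
{
    enum jis_mode mode = JIS_ASCII;
    size_t count = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        /* Three-byte escape sequences: ESC followed by two more bytes. */
        if (buf[i] == 0x1B && i + 2 < len) {
            if (memcmp(buf + i + 1, "$B", 2) == 0) {        /* Kanji in */
                mode = JIS_KANJI;
                i += 3;
                continue;
            }
            if (memcmp(buf + i + 1, "(B", 2) == 0) {        /* Kanji out */
                mode = JIS_ASCII;
                i += 3;
                continue;
            }
            if (memcmp(buf + i + 1, "(I", 2) == 0) {        /* Katakana in */
                mode = JIS_KATAKANA;
                i += 3;
                continue;
            }
        }
        if (buf[i] == 0x0E) {              /* SO: Kana in */
            mode = JIS_KATAKANA;
            i++;
        } else if (buf[i] == 0x0F) {       /* SI: Kana out (assumed back to ASCII) */
            mode = JIS_ASCII;
            i++;
        } else if (mode == JIS_KANJI && i + 1 < len) {
            count++;                       /* one kanji occupies two bytes */
            i += 2;
        } else {
            i++;                           /* ASCII, katakana, or unrecognized byte */
        }
    }
    return count;
}
```

Given the bytes 'A', ESC $ B, 0x30 0x21, 0x30 0x22, ESC ( B, 'B', the function returns 2: the four bytes between the Kanji in and Kanji out sequences are read as two JIS X0208 characters, while 'A' and 'B' are read as ASCII.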