
iconv_keis(5) [osf1 man page]

iconv_KEIS(5)							File Formats Manual						     iconv_KEIS(5)

NAME
iconv_KEIS - Specification for controlling conversion between Hitachi KEIS and Tru64 UNIX Japanese codesets

DESCRIPTION
The iconv utility supports the ability to convert the encoding of characters between Hitachi KEIS (Kanji processing Extended Information System) code and one of the following Tru64 UNIX codesets: DEC Kanji, Super DEC Kanji, Japanese EUC, or Shift JIS. You choose the type of conversion by specifying the appropriate values for the utility's from-code and to-code parameters, as follows:

-------------------------------------------------
Type of Code Conversion    from-code    to-code
-------------------------------------------------
KEIS to DEC Kanji          KEIS         deckanji
KEIS to Super DEC Kanji    KEIS         sdeckanji
KEIS to Japanese EUC       KEIS         eucJP
KEIS to Shift JIS          KEIS         SJIS
DEC Kanji to KEIS          deckanji     KEIS
Super DEC Kanji to KEIS    sdeckanji    KEIS
Japanese EUC to KEIS       eucJP        KEIS
Shift JIS to KEIS          SJIS         KEIS
-------------------------------------------------

Conversion behavior for the following items is affected by the definition of environment variables or profile entries in the user's environment. For more information, see the "Environment Variables" and "Profile File" sections.

The UDC (User-Defined Character) mapping table that is used for UDC conversion
     This table must be an ASCII text file that contains UDC mapping information. The table affects conversion of user-defined characters between the codesets.

The EBCDIC to/from ISO code (ASCII, JIS Roman characters) mapping table that is used for conversion
     This table must be an ASCII text file that contains information on how to map characters between EBCDIC and ISO code.

The K-shift code
     This is a one- or two-byte hexadecimal code that marks the beginning of Kanji mode.

The A-shift code
     This is a one- or two-byte hexadecimal code that marks the beginning of EBCDIC mode.

The initial mode (Kanji or EBCDIC) in effect when the iconv command starts, or when a program first calls the iconv() function after initializing the converter with iconv_open()
     The status keyword is either kanji_mode or ebcdic_mode.

How to treat undefined characters when these are detected in Kanji mode
     Specify this action by using one of the following keywords:
          Stop codeset conversion.
          Output the undefined characters without any processing and continue codeset conversion.
          Output padding characters instead of the undefined characters and continue codeset conversion.
          Ignore the undefined characters and continue codeset conversion.

The two-byte padding character used in Kanji mode
     This value is meaningful when replace is chosen for the processing of undefined characters in Kanji mode. Specify the padding character by its hexadecimal value.

How to treat undefined characters when these are detected in EBCDIC mode
     Specify this action by using one of the following keywords:
          Stop codeset conversion.
          Output the undefined characters without any processing and continue codeset conversion.
          Output padding characters instead of the undefined characters and continue codeset conversion.
          Ignore the undefined characters and continue codeset conversion.

The one-byte padding character used in EBCDIC mode
     This value is meaningful when replace is chosen for the processing of undefined characters in EBCDIC mode. Specify the padding character by its hexadecimal value.

When the to-code parameter for the conversion is KEIS, you can also specify the following items for conversion behavior:

Whether the initial shift code is output at the start of conversion if the status of the initial mode (Kanji or EBCDIC) is different from the mode of the first input character
     The start of conversion is the time the iconv utility starts processing, or when the iconv() function is called just after opening the converter with iconv_open(). Keyword values for this item are yes or no.

Whether the utility outputs the last shift code when iconv() is called with a zero-length input string and the current mode (Kanji or EBCDIC) is different from the mode specified by the last shift state
     Keyword values for this item are yes or no.

The last status (Kanji mode or EBCDIC mode)
     Specify kanji_mode or ebcdic_mode for this value. It is meaningful only when the utility is set to output the last shift code (yes).

If the items that control conversion behavior are specified by both environment variables and the profile file, values set by environment variables override values set by comparable entries in the profile. Note that values for all conversion control items are case-sensitive, whether they are set by environment variables or in the profile.

The following table contains the default values for each conversion control item:

-----------------------------------------------------------------
Conversion Control Item                             Default Value
-----------------------------------------------------------------
UDC mapping table                                   None
K shift code                                        0x0a42
A shift code                                        0x0a41
Initial state                                       ebcdic_mode
Processing for undefined characters in Kanji mode   abort
Processing for undefined characters in EBCDIC mode  pass
-----------------------------------------------------------------

The default padding characters are space characters, whose code values depend on the destination codeset, as noted in the following table. These padding characters are output when you specify replace for processing of undefined characters and do not explicitly specify the padding character.

-----------------------------------------------------------------
Mode          Default Value   Destination Codeset
-----------------------------------------------------------------
Kanji mode    0xa1a1          KEIS, deckanji, sdeckanji, or eucJP
              0x8140          SJIS
EBCDIC mode   0x40            KEIS
              0x20            deckanji, sdeckanji, eucJP, or SJIS
-----------------------------------------------------------------

The default EBCDIC-ISO mapping tables are as follows:

For conversion from KEIS to other codesets:
     /usr/lib/nls/loc/iconv/data/ebcdic_kana.tbl

For conversion from other codesets to KEIS:
     /usr/lib/nls/loc/iconv/data/kana_ebcdic.tbl

These mapping tables map characters between EBCDIC and ISO code, which includes JIS Roman characters. The kana_ebcdic.tbl mapping table also maps ISO lowercase characters to EBCDIC uppercase characters.

The following default values for conversion control items are meaningful when the iconv utility's to-code conversion parameter is KEIS:

---------------------------------------------
Conversion Control Item            Default
---------------------------------------------
Output the initial shift code?     yes
Output the last shift code?        yes
Last status                        ebcdic_mode
---------------------------------------------

Environment Variables
This section discusses the environment variables that you can set to control conversion behavior.
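
The lookup order described earlier (an environment variable overrides the matching profile entry, and the built-in defaults apply when neither is set) can be modeled with a small resolver. This is a hypothetical Python sketch, not the Tru64 UNIX implementation: the names env_var_name, default_for, and resolve are invented for illustration, and the profile is assumed to have already been read into a dictionary keyed by entry_name. Variable names are built in the fromcode_tocode_controlitem format described in this section, and values are returned unchanged because they are case-sensitive.

```python
import os
from typing import Mapping, Optional

# Codeset keywords used in the fromcode and tocode name segments.
CODESET_KEYWORDS = {
    "KEIS": "KEIS",
    "deckanji": "DECKANJI",
    "sdeckanji": "SDECKANJI",
    "eucJP": "EUCJP",
    "SJIS": "SJIS",
}

# profile entry_name -> (controlitem keyword, fixed default or None)
CONTROL_ITEMS = {
    "udc_mapping_table": ("UDC_TABLE", None),
    "ebcdic_mapping_table": ("EBCDIC_TABLE", None),    # depends on direction
    "k_shift_code": ("K_SHIFT_CODE", "0x0a42"),
    "a_shift_code": ("A_SHIFT_CODE", "0x0a41"),
    "initial_state": ("INITIAL_STATE", "ebcdic_mode"),
    "kanji_except_proc": ("KANJI_EXCEPT_PROC", "abort"),
    "ebcdic_except_proc": ("EBCDIC_EXCEPT_PROC", "pass"),
    "padding_2byte_char": ("PADDING_2BYTE_CHAR", None),  # depends on tocode
    "padding_1byte_char": ("PADDING_1BYTE_CHAR", None),  # depends on tocode
    # The next three defaults are meaningful only when tocode is KEIS.
    "output_initial_shift_code": ("INITIAL_SHIFT_CODE", "yes"),
    "output_trailer_shift_code": ("TRAILER_SHIFT_CODE", "yes"),
    "last_state": ("LAST_STATE", "ebcdic_mode"),
}

DATA_DIR = "/usr/lib/nls/loc/iconv/data"


def env_var_name(fromcode: str, tocode: str, keyword: str) -> str:
    """Build a fromcode_tocode_controlitem environment variable name."""
    return f"{CODESET_KEYWORDS[fromcode]}_{CODESET_KEYWORDS[tocode]}_{keyword}"


def default_for(entry_name: str, fromcode: str, tocode: str) -> Optional[str]:
    """Return the built-in default; some depend on the conversion direction."""
    if entry_name == "ebcdic_mapping_table":
        name = "ebcdic_kana.tbl" if fromcode == "KEIS" else "kana_ebcdic.tbl"
        return f"{DATA_DIR}/{name}"
    if entry_name == "padding_2byte_char":   # Kanji mode padding
        return "0x8140" if tocode == "SJIS" else "0xa1a1"
    if entry_name == "padding_1byte_char":   # EBCDIC mode padding
        return "0x40" if tocode == "KEIS" else "0x20"
    return CONTROL_ITEMS[entry_name][1]


def resolve(entry_name: str, fromcode: str, tocode: str,
            profile: Mapping[str, str],
            environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Effective value: environment variable, then profile entry, then default."""
    keyword = CONTROL_ITEMS[entry_name][0]
    name = env_var_name(fromcode, tocode, keyword)
    if name in environ:
        return environ[name]          # environment variables win
    if entry_name in profile:
        return profile[entry_name]    # then the profile entry
    return default_for(entry_name, fromcode, tocode)
```

For example, if both EUCJP_KEIS_K_SHIFT_CODE and the profile's k_shift_code entry are set, resolve("k_shift_code", "eucJP", "KEIS", profile) returns the environment value.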
The names for these variables adhere to the following format:

     fromcode_tocode_controlitem

The name segments for fromcode or tocode can be one of the following keywords:

------------------------------
For Codeset:         Use:
------------------------------
Hitachi KEIS         KEIS
DEC Kanji            DECKANJI
Super DEC Kanji      SDECKANJI
Japanese EUC         EUCJP
Shift JIS            SJIS
------------------------------

The name segment for controlitem can be one of the following keywords:

----------------------------------------------------------------------
For Control Item:                                   Use:
----------------------------------------------------------------------
UDC mapping table                                   UDC_TABLE
EBCDIC-ISO mapping table                            EBCDIC_TABLE
K shift code                                        K_SHIFT_CODE
A shift code                                        A_SHIFT_CODE
Initial state                                       INITIAL_STATE
Processing of undefined characters in Kanji mode    KANJI_EXCEPT_PROC
Processing of undefined characters in EBCDIC mode   EBCDIC_EXCEPT_PROC
Padding characters in Kanji mode                    PADDING_2BYTE_CHAR
Padding characters in EBCDIC mode                   PADDING_1BYTE_CHAR
Output initial shift code                           INITIAL_SHIFT_CODE
Output last shift code                              TRAILER_SHIFT_CODE
Last status                                         LAST_STATE
File path of the profile                            PROFILE
----------------------------------------------------------------------

Following are examples of using the setenv C shell command to define environment variables to control conversion behavior.
In these examples, the fromcode name segment indicates Japanese EUC and the tocode name segment indicates KEIS:

     setenv EUCJP_KEIS_UDC_TABLE eucjp_keis_udc.tbl
     setenv EUCJP_KEIS_EBCDIC_TABLE ebcdic_kana.tbl
     setenv EUCJP_KEIS_K_SHIFT_CODE 0x0a42
     setenv EUCJP_KEIS_A_SHIFT_CODE 0x0a41
     setenv EUCJP_KEIS_INITIAL_STATE ebcdic_mode
     setenv EUCJP_KEIS_KANJI_EXCEPT_PROC replace
     setenv EUCJP_KEIS_EBCDIC_EXCEPT_PROC replace
     setenv EUCJP_KEIS_PADDING_2BYTE_CHAR 0xa1a1
     setenv EUCJP_KEIS_PADDING_1BYTE_CHAR 0x40
     setenv EUCJP_KEIS_INITIAL_SHIFT_CODE yes
     setenv EUCJP_KEIS_TRAILER_SHIFT_CODE yes
     setenv EUCJP_KEIS_LAST_STATE ebcdic_mode
     setenv EUCJP_KEIS_PROFILE .eucjp_keis_profile

Directory Search Path
When you specify a file name without a directory, the iconv utility searches the following directories and uses the first file found:

     Current directory
     Home directory
     The subdirectory iconv/data of the directory specified by the environment variable LOCPATH
     /usr/lib/nls/loc/iconv/data
     /usr/i18n/lib/nls/loc/iconv/data

If you specify a relative directory path for a file, the utility searches these same directories in the same order and uses the first file found.

Profile File
Entry lines in the profile file adhere to the following format:

     entry_name string_value

The entry_name and string_value fields are separated by spaces or tabs. Do not append a colon (:) after entry_name. The file can also include blank lines and comment entries, which begin with the # character.
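
The directory search order and the profile entry format can be modeled with a few small helpers. This Python sketch is hypothetical: search_dirs, find_file, and parse_profile are illustrative names, LOCPATH is assumed to name a single directory, absolute paths are assumed to be used as given, and text after a # on an entry line is assumed to be a comment, as in the sample profile in this section.

```python
import os
from typing import Dict, List, Mapping, Optional

# System directories searched after the current, home, and LOCPATH directories.
SYSTEM_DIRS = [
    "/usr/lib/nls/loc/iconv/data",
    "/usr/i18n/lib/nls/loc/iconv/data",
]


def search_dirs(cwd: str, home: str, environ: Mapping[str, str]) -> List[str]:
    """Directories in the order given under "Directory Search Path"."""
    dirs = [cwd, home]
    locpath = environ.get("LOCPATH")
    if locpath:  # assumption: LOCPATH names a single directory
        dirs.append(os.path.join(locpath, "iconv", "data"))
    return dirs + SYSTEM_DIRS


def find_file(name: str, cwd: str, home: str,
              environ: Mapping[str, str]) -> Optional[str]:
    """Return the first existing match for a bare or relative file name."""
    if os.path.isabs(name):  # assumption: absolute paths are used as given
        return name if os.path.isfile(name) else None
    for directory in search_dirs(cwd, home, environ):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def parse_profile(text: str) -> Dict[str, str]:
    """Parse "entry_name string_value" lines into a dictionary."""
    entries: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comment text
        if not line:
            continue                           # blank or comment-only line
        fields = line.split(None, 1)           # spaces or tabs separate fields
        if len(fields) != 2 or fields[0].endswith(":"):
            raise ValueError(f"line {lineno}: malformed entry {raw!r}")
        entries[fields[0]] = fields[1]
    return entries
```

A caller would typically pass os.getcwd(), os.path.expanduser("~"), and os.environ to find_file, and fall back to the built-in defaults when it returns None.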
Following are the entry_name values for different conversion control items:

--------------------------------------------------------------------------
Conversion Control Item                          entry_name
--------------------------------------------------------------------------
UDC mapping table                                udc_mapping_table
EBCDIC-ISO mapping table                         ebcdic_mapping_table
K shift code                                     k_shift_code
A shift code                                     a_shift_code
Initial state                                    initial_state
Processing undefined characters in Kanji mode    kanji_except_proc
Processing undefined characters in EBCDIC mode   ebcdic_except_proc
Padding character in Kanji mode                  padding_2byte_char
Padding character in EBCDIC mode                 padding_1byte_char
Output initial shift code                        output_initial_shift_code
Output last shift code                           output_trailer_shift_code
Last state                                       last_state
--------------------------------------------------------------------------

Following is a sample profile for converting from Japanese EUC to Hitachi KEIS:

     #
     # sample profile for eucJP_KEIS
     #
     udc_mapping_table          eucjp_keis_udc.tbl
     ebcdic_mapping_table       kana_ebcdic.tbl
     k_shift_code               0x0a42    # ebcdic -> kanji
     a_shift_code               0x0a41    # kanji -> ebcdic
     initial_state              ebcdic_mode
     kanji_except_proc          replace
     ebcdic_except_proc         replace
     padding_2byte_char         0xa1a1    # kanji mode
     padding_1byte_char         0x40      # ebcdic mode
     output_initial_shift_code  yes
     output_trailer_shift_code  yes
     last_state                 ebcdic_mode

The default file names for the profile are as follows:

--------------------------------------------------
Code Conversion             Default Profile Name
--------------------------------------------------
KEIS to DEC Kanji           .keis_deckanji_profile
KEIS to Super DEC Kanji     .keis_sdeckanji_profile
KEIS to Shift JIS           .keis_sjis_profile
KEIS to Japanese EUC        .keis_eucjp_profile
DEC Kanji to KEIS           .deckanji_keis_profile
Super DEC Kanji to KEIS     .sdeckanji_keis_profile
Shift JIS to KEIS           .sjis_keis_profile
Japanese EUC to KEIS        .eucjp_keis_profile
--------------------------------------------------

By default, the iconv utility checks the directory search path described in the "Directory Search Path" section and uses the first profile it finds. However, instead of using the default names, you can specify an arbitrary file path for your profile by defining the following environment variables:

-------------------------------------------------------------
Code Conversion             Profile Path Environment Variable
-------------------------------------------------------------
KEIS to DEC Kanji           KEIS_DECKANJI_PROFILE
KEIS to Super DEC Kanji     KEIS_SDECKANJI_PROFILE
KEIS to Shift JIS           KEIS_SJIS_PROFILE
KEIS to Japanese EUC        KEIS_EUCJP_PROFILE
DEC Kanji to KEIS           DECKANJI_KEIS_PROFILE
Super DEC Kanji to KEIS     SDECKANJI_KEIS_PROFILE
Shift JIS to KEIS           SJIS_KEIS_PROFILE
Japanese EUC to KEIS        EUCJP_KEIS_PROFILE
-------------------------------------------------------------

UDC Mapping Table
Entries in a UDC mapping table adhere to the following format:

     fromcode tocode

Each of these values is a two-byte hexadecimal number. In the case of Super DEC Kanji and Japanese EUC, three-byte hexadecimal values that begin with SS3 (0x8f), such as 0x8fxxxx, are also valid.

You can specify ranges of UDC from and to values in the same file entry by using a hyphen to separate the codes that start and end each range:

     start_fromcode-end_fromcode start_tocode-end_tocode

When specifying entries that include ranges of values, the number of codes in the from range must always equal the number of codes in the to range. A UDC mapping table can also include blank lines and comment lines, which begin with the # character.

Following is an example of a UDC mapping table:

     # KEIS           eucJP
     0x81a1-0x8afe    0xf5a1-0xfefe        # udc
     0x8ba1-0x94fe    0x8ff5a1-0x8ffefe    # udc
     0x95a1-0x9afe    0x8feea1-0x8ff3fe    # udc
     0x9ba1-0x9bfe    0x8ff4a1-0x8ff4fe    # udc

The first entry in this file specifies a range of KEIS values from 0x81a1 to 0x8afe that are mapped to Japanese EUC code values in the range 0xf5a1 to 0xfefe. You can find additional sample UDC mapping table files in the /usr/i18n/examples/iconv/data directory.
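
The entry format above can be checked and expanded with a short parser. In this hypothetical Python sketch, parse_mapping_table turns each line into dictionary entries from integer from code to integer to code and rejects ranges whose lengths differ. It treats a range as a run of consecutive integers, which is an assumption: the converter may instead count only valid row and cell positions, although for the sample entries above both interpretations give ranges of equal length. The EBCDIC-ISO mapping table described in the next section uses the same line layout with one-byte codes, so the same parser would apply there.

```python
from typing import Dict, Tuple


def parse_code_range(token: str) -> Tuple[int, int]:
    """Parse '0x81a1' or '0x81a1-0x8afe' into an inclusive (start, end) pair."""
    start_text, _, end_text = token.partition("-")
    start = int(start_text, 16)          # base 16 accepts the 0x prefix
    end = int(end_text, 16) if end_text else start
    if end < start:
        raise ValueError(f"range runs backwards: {token}")
    return start, end


def parse_mapping_table(text: str) -> Dict[int, int]:
    """Expand "fromcode tocode" lines (single codes or ranges) into a dict."""
    table: Dict[int, int] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # ignore comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'fromcode tocode'")
        from_start, from_end = parse_code_range(fields[0])
        to_start, to_end = parse_code_range(fields[1])
        # The from and to ranges must contain the same number of codes.
        if from_end - from_start != to_end - to_start:
            raise ValueError(f"line {lineno}: from and to ranges differ in length")
        for offset in range(from_end - from_start + 1):
            table[from_start + offset] = to_start + offset
    return table
```

With the first sample entry, the parser maps 0x81a1 to 0xf5a1 and 0x8afe to 0xfefe.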
EBCDIC-ISO Mapping Table
Entries in an EBCDIC-ISO mapping table adhere to the following format:

     fromcode tocode

Each code is a one-byte hexadecimal number. You can specify a range of character codes as follows:

     start_fromcode-end_fromcode start_tocode-end_tocode

When using the range format, the number of hexadecimal values in the from range must be the same as the number of hexadecimal values in the to range. The EBCDIC-ISO mapping table can also include blank lines and comment entries, which begin with the # character.

Following is an example of an EBCDIC-ISO code mapping table:

     # EBCDIC      Kana
     0x40          0x20         # space
     0x4f          0x21         # '!'
     0x7f          0x22         # '"'
       .             .
       .             .
       .             .
     0xc1-0xc9     0x41-0x49    # 'A' - 'I'
     0xd1-0xd9     0x4a-0x52    # 'J' - 'R'
     0xe2-0xe9     0x53-0x5a    # 'S' - 'Z'
       .             .
       .             .
       .             .

In this example, the values in the first column are from codes and the values in the second column are to codes. The first three entry lines map single characters, whereas the last three entry lines map ranges of characters. You can find additional sample EBCDIC-ISO mapping tables in the /usr/i18n/lib/nls/loc/iconv/data directory.

NOTES
This reference page contains code conversion specifications that apply only to conversion between Hitachi KEIS code and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_ibmkanji(5) for code conversion specifications between IBM Kanji System characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_JEF(5) for code conversion specifications between Fujitsu JEF characters and the DEC Kanji, Super DEC Kanji, Japanese EUC, and Shift JIS codesets. Refer to iconv_intro(5) for information about conversion between DEC Kanji, Super DEC Kanji, Japanese EUC, Shift JIS, and other Tru64 UNIX codesets.

SEE ALSO
Commands: iconv(1)

Functions: iconv(3), iconv_close(3), iconv_open(3)

Others: deckanji(5), eucJP(5), iconv_ibmkanji(5), iconv_intro(5), iconv_KEIS(5), Japanese(5), sdeckanji(5), SJIS(5)