euc(5) [osx man page]

EUC(5)							      BSD File Formats Manual							    EUC(5)

NAME
euc -- EUC encoding of wide characters

SYNOPSIS
ENCODING "EUC"
VARIABLE len1 mask1 len2 mask2 len3 mask3 len4 mask4 mask

DESCRIPTION
EUC implements a system of 4 multibyte codesets. A multibyte character in the first codeset consists of len1 bytes starting with a byte in the range of 0x00 to 0x7f. To allow use of ASCII, len1 is always 1. A multibyte character in the second codeset consists of len2 bytes starting with a byte in the range of 0x80 to 0xff, excluding 0x8e and 0x8f. A multibyte character in the third codeset consists of len3 bytes starting with the byte 0x8e. A multibyte character in the fourth codeset consists of len4 bytes starting with the byte 0x8f.

The wchar_t encoding of EUC multibyte characters depends on the len and mask arguments. First, the bytes are moved into a wchar_t as follows:

      byte0 << ((lenN-1) * 8) | byte1 << ((lenN-2) * 8) | ... | byte(lenN-1)

The result is then ANDed with ~mask and ORed with maskN. Codesets 3 and 4 are special in that the leading byte (0x8e or 0x8f) is first removed and the lenN argument is reduced by 1.

For example, the ja_JP.eucJP locale has the following VARIABLE line:

      VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080

Codeset 1 consists of the values 0x0000 - 0x007f.
Codeset 2 consists of the values that have the bits 0x8080 set.
Codeset 3 consists of the values 0x0080 - 0x00ff.
Codeset 4 consists of the values 0x8000 - 0xff7f, excluding the values that have the 0x0080 bit set.

Notice that the global mask is set to 0x8080. This implies that the codeset can be determined from those 2 bits.

SEE ALSO
mklocale(1), setlocale(3)

BSD								   November 8, 2003								       BSD
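
The conversion described in the euc(5) DESCRIPTION can be sketched in Python as follows. This is an illustrative sketch, not the libc implementation: the names JA_JP_EUCJP, codeset_index, euc_to_wchar and wchar_codeset are hypothetical, and the table simply restates the ja_JP.eucJP VARIABLE line quoted in that page. It assumes that the len values in the VARIABLE line count the leading 0x8e or 0x8f byte.

```python
# Illustrative sketch only; all names below are hypothetical.

SS2 = 0x8E  # leading byte of the third codeset
SS3 = 0x8F  # leading byte of the fourth codeset

# (len, mask) for codesets 1-4 and the global mask, restating:
#   VARIABLE 1 0x0000 2 0x8080 2 0x0080 3 0x8000 0x8080
JA_JP_EUCJP = {
    "codesets": [(1, 0x0000), (2, 0x8080), (2, 0x0080), (3, 0x8000)],
    "mask": 0x8080,
}


def codeset_index(lead):
    """Return 0..3 (codesets 1..4) for a leading byte."""
    if lead < 0x80:
        return 0
    if lead == SS2:
        return 2
    if lead == SS3:
        return 3
    return 1


def euc_to_wchar(data, table=JA_JP_EUCJP):
    """Convert one EUC multibyte character (a bytes object) to its wide value."""
    idx = codeset_index(data[0])
    length, set_mask = table["codesets"][idx]
    if len(data) != length:
        raise ValueError("expected %d bytes, got %d" % (length, len(data)))
    if idx >= 2:
        # Codesets 3 and 4: remove the 0x8e/0x8f byte, so len shrinks by 1.
        data = data[1:]
    # byte0 << ((len-1) * 8) | byte1 << ((len-2) * 8) | ... | last byte
    value = int.from_bytes(data, "big")
    # AND with ~mask, then OR with the codeset's own mask.
    return (value & ~table["mask"]) | set_mask


def wchar_codeset(wc, table=JA_JP_EUCJP):
    """Recover the codeset number (1..4) from the bits under the global mask."""
    bits = wc & table["mask"]
    for number, (_, set_mask) in enumerate(table["codesets"], start=1):
        if bits == set_mask:
            return number
    raise ValueError("value does not belong to any codeset")
```

With this table, the two-byte sequence 0xa4 0xa2 maps to 0xa4a2 (codeset 2), while 0x8e 0xb1 maps to 0x00b1 (codeset 3) because the 0x8e byte is dropped before the masks are applied.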

eucset(1)						      General Commands Manual							 eucset(1)

NAME
eucset - Sets and gets EUC code widths for the terminal

SYNOPSIS
eucset [cswidth]
eucset -p

OPTIONS
-p    Displays the current settings of the EUC character widths for the terminal.

DESCRIPTION
The eucset command sets or gets the encoding and display widths of the Extended UNIX Code (EUC) characters processed by the current input terminal. EUC is an encoding method for code sets composed of single or multiple bytes. It permits applications and the terminal hardware to use the 7-bit US ASCII code and up to three single- or multibyte code sets simultaneously.

If you use the eucset command to set EUC character widths, but do not specify the cswidth argument, 7-bit U.S. ASCII is applied as a default code set. You must use the command to specify any other EUC code sets, whether they are single-byte or multibyte.

EUC Code Set Classes

EUC divides code sets into four classes. Each code set class has two characteristics: the number of bytes for encoding the characters in the class, and the number of display columns to display the characters in the class. All characters within a class possess the same characteristics.

Class 0 consists of all 7-bit, single-byte ASCII characters. The most-significant bit of each of these characters is 0 (zero). Characters in class 0 require one byte for encoding, and occupy one display column. These values are fixed for class 0 (zero). The 7-bit US ASCII code is the primary EUC code set, which is available to users without direct specification.

A class 1 code set is a supplementary EUC code set. Class 1 characters have an initial byte whose most-significant bit is 1. If character classes 2 or 3 are to be used, this initial byte must not be the SS2 or SS3 character, as these designate character classes 2 and 3. Characters in class 1 may require more than 1 byte for encoding, and may require more than 1 display column. The eucset command must be used to set the characteristics for code set class 1.

Class 2 and 3 code sets are supplementary EUC code sets. Characters in these classes have an initial byte of SS2 or SS3, respectively. They require more than 1 byte for encoding, and may require more than 1 display column. The eucset command must be used to set the characteristics for code set classes 2 and 3.

The cswidth argument in the eucset command line is a character string that describes the character widths for code set classes 1 through 3. The string is of the following format:

      X1[:Y1], X2[:Y2], X3[:Y3]

The value X1 is the number of bytes required to encode a character in code set class 1. Y1 is the number of display columns needed to display characters in this class. X2 is the number of bytes required to encode a character in code set class 2, not counting the SS2 byte, and Y2 is the number of display columns for code set class 2 characters. X3 is the number of bytes needed to encode characters in code set class 3, not counting the SS3 byte, and Y3 is the number of display columns required for these characters.

The values for the column widths can be omitted if they are equal to the number of encoding bytes. If the encoding value of any of the EUC code sets is set to 0 (zero), this indicates that the code set does not exist.

If no cswidth argument is supplied, the eucset command uses the value of the CSWIDTH environment variable. If this variable is not present, the default string 1:1,0:0,0:0 is substituted. This default string designates that the environment uses a single-byte EUC code set that has characters in the EUC code set class 1 format. If the environment uses a multibyte EUC code set in the code set class 1 format, single- or multibyte EUC code sets in the code set class 2 or 3 format, or both, the default setting cannot be used.

DIAGNOSTICS
Your standard input is not an interactive terminal.

The maximum character width of 8 was exceeded.

EXAMPLES
To display the encoding and display widths for the EUC code set classes 1-3 in your environment, enter:

      eucset -p

To change the current settings of the encoding and display widths for the EUC characters in code set classes 1 and 2 to 2 bytes each, enter:

      eucset 2:2,2:2,0:0

or

      eucset 2,2,0

SEE ALSO
Interfaces: eucioctl(7)

eucset(1)
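
The cswidth format described in the eucset(1) DESCRIPTION can be parsed along these lines. This is a sketch in Python, not the eucset implementation: parse_cswidth is a hypothetical helper, and tolerating blanks around the commas is an assumption. It applies the rule that an omitted column width equals the number of encoding bytes.

```python
# Illustrative sketch only; parse_cswidth is a hypothetical helper.

def parse_cswidth(spec):
    """Parse 'X1[:Y1],X2[:Y2],X3[:Y3]' into three (bytes, columns) pairs.

    The pairs describe code set classes 1, 2 and 3 in that order.
    A byte count of 0 means the code set class does not exist.
    """
    pairs = []
    for field in spec.split(","):
        encoded, sep, columns = field.strip().partition(":")
        nbytes = int(encoded)
        # An omitted column width defaults to the number of encoding bytes.
        ncols = int(columns) if sep else nbytes
        pairs.append((nbytes, ncols))
    if len(pairs) != 3:
        raise ValueError("cswidth needs exactly three fields")
    return pairs
```

With this helper, the two forms shown under EXAMPLES, 2:2,2:2,0:0 and 2,2,0, yield the same three (bytes, columns) pairs, and the default string 1:1,0:0,0:0 describes a single-byte class 1 code set with classes 2 and 3 absent.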