wnn_automaton(4)						   File Formats 						  wnn_automaton(4)

NAME
wnn_automaton - Automaton

DESCRIPTION
Automaton performs roman character-Kana conversion for xjsi and uum by referring to the entries in a table (called a conversion table). The tables can be replaced to enable versatile conversion.

Automaton performs three conversions in series, each according to its own conversion tables (preprocessing, main processing, and postprocessing, in that order), and outputs the final result.

Automaton also has a mode function. Switching the mode dynamically changes the combination of tables used for the three processing stages. The modes and the switchover codes are also set in conversion tables. Because the conversion tables are text files, they can be replaced easily.

You can use a backspace to return to the previous status after a conversion has been completed and before the next conversion is completed.

Although xjsi performs roman character-Kana conversion only between uppercase alphabetic characters and Hiragana in the main processing stage, the preprocessing and postprocessing stages can handle various types of inputs and outputs. For example, preprocessing can convert lowercase alphabetic characters to uppercase, and postprocessing can convert Hiragana to Katakana or to half-width Katakana.

Automaton proceeds as follows.

   1. Input. Uppercase or lowercase alphabetic characters (half-width).
   2. Preprocessing. Converts lowercase characters to uppercase characters.
   3. Main processing. Converts uppercase alphabetic characters to Hiragana according to the conversion table.
   4. Postprocessing. Converts Hiragana to Katakana or half-width Kana as required.
   5. Output.

Conversion Tables

Automaton uses the following conversion tables.

   o Mode definition table
        Declares the modes and the correspondence tables to use. The file name is mode.

   o Correspondence tables

        o Preprocessing tables
             The correspondence tables used for preprocessing. The file name begins with "1".
        o Main processing tables
             The correspondence tables used for main processing. The file name begins with "2".

        o Postprocessing tables
             The correspondence tables used for postprocessing. The file name begins with "3".

The mode definition table contains the mode declarations, the correspondence tables to be used for each mode, and the rules for using them. The correspondence tables contain lists of input codes and their corresponding output codes. They are divided into tables for preprocessing, main processing, and postprocessing, and any number of correspondence tables can be used for each stage.

xjsi searches for the mode definition table in the following order.

   1. The file specified by the setrkfile entry in the xjsi initialization file uumrc
   2. The file /usr/lib/locale/ja/wnn/ja/rk/mode

The following notation is used in the descriptions of the mode definition table and the correspondence tables.

   o ...      Indicates repetition zero or more times.
   o ......   Indicates repetition one or more times.
   o []       Indicates an optional item.

Mode Definition Table

The mode definition table contains the mode declarations, the correspondence tables to be used for each mode, the criteria for selecting them, and the mode display text strings.

The mode definition table consists of items 1, 2, 3, and 4 below. The remainder of a line is treated as a comment if a semicolon (;) appears at the beginning of the line or follows leading space(s) (including tabs), unless the semicolon is escaped.

1. Special path specifications

   The following special strings represent path names.

   @HOME      Indicates the value of the environment variable HOME.
   @MODEDIR   Indicates the directory containing the mode definition table.
   @LIBDIR    Indicates /usr/lib/locale/ja/wnn/.
   ~user      If user is a user name, indicates that user's home directory.
   ~          Indicates your home directory.

2. Mode declaration

   The mode is declared as follows:

   defmode mode_name
              The mode name is specified with alphanumeric characters.

   [on|off]   Specifies the initial status. The default is off.
   [r|nr]     A flag that allows Automaton to recognize a roman character-Kana conversion mode. A mode in which r is set is used regardless of the current mode when converting Hiragana with the F6 key, etc. The default is nr.

   A mode must be declared before it is used.

3. Search specifications

   Search specifications for correspondence tables are made using the following formats.

   search directory_name
              Specifies the directory name(s) to be searched when the correspondence tables specified in the mode definition table are not in the same directory as the mode definition table. Multiple directory names can be specified by separating them with spaces. The search directory name must be specified before the correspondence tables.

   path directory_name
              Overrides any directory names previously stored for searching for correspondence tables and searches the directory name(s) specified as the argument instead. Multiple directory names can be specified by separating them with spaces. The path directory name must be specified before the correspondence tables.

4. Specifications for correspondence tables and mode display text strings

   There are three ways to make the specifications.

   (1) Correspondence table file name or mode display text string
   (2) if Conditional_expression Correspondence_table_specifications or Mode_display_text_string
   (3) when Conditional_expression Correspondence_table_specifications or Mode_display_text_string

   File names for correspondence tables must begin with "1", "2", or "3". Path names can also be specified. Mode display text strings are text strings enclosed in quotation marks that are used to display the current mode.

   (a) "string"
              Indicates the mode display text string when conversion is ON.

   (b) (on_dispmode "string")
              Indicates the mode display text string when conversion is ON.

   (c) (off_dispmode "string")
              Indicates the mode display text string when conversion is OFF.

   (d) (on_unchg)
              Indicates that the mode display text string used before the mode was changed continues to be used when conversion is ON.
   (e) (off_unchg)
              Indicates that the mode display text string used before the mode was changed continues to be used when conversion is OFF.

   This text string is used by xjsi to display the mode.

   (2) and (3) are used to change the correspondence tables depending on specified conditions. If the condition in the if statement in (2) is true, the specification inside the if statement is referenced and the specifications following the if statement are not referenced. If the condition is false, the if statement is skipped and the specifications following it are referenced.

   If the condition in the when statement in (3) is true, the specification inside the statement is referenced. The specifications following the when statement are referenced regardless of whether the condition is true or false.

   (2) and (3) can be nested to specify correspondence tables.

   Any one of the following can be used as the conditional statement.

   Conditional statement         Value
   ---------------------------   ----------------------------------------------
   mode_name                     True when the mode is ON
   (and conditional_statement    True when both conditional statements are true
    conditional_statement)
   (or conditional_statement     True when either conditional statement is true
    conditional_statement)
   (not conditional_statement)   True when the conditional statement is false
   (false)                       Always false
   (true)                        Always true

   For example, when (defmode kana) and (defmode romajikana) are both in the mode definition table, (and kana romajikana) is true when both modes are ON.
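The if and when rules and the conditional statements described above can be sketched in Python. This is only an illustration, not the Automaton implementation: the nested-tuple representation of a specification, the function names cond_true and select, and the mode names p, q, and r are all assumptions made for the sketch.

```python
def cond_true(cond, modes):
    """Evaluate a conditional statement against the set of modes that are ON."""
    if isinstance(cond, str):              # bare mode_name: true when the mode is ON
        return cond in modes
    op, *args = cond
    if op == "and":
        return cond_true(args[0], modes) and cond_true(args[1], modes)
    if op == "or":
        return cond_true(args[0], modes) or cond_true(args[1], modes)
    if op == "not":
        return not cond_true(args[0], modes)
    if op == "true":
        return True
    if op == "false":
        return False
    raise ValueError(f"unknown conditional: {op!r}")


def select(series, modes):
    """Return the tables and display strings selected from one series."""
    chosen = []
    for item in series:
        if isinstance(item, str):          # plain table name or display string
            chosen.append(item)
            continue
        kind, cond, body = item            # ("if" | "when", condition, nested series)
        if not cond_true(cond, modes):
            continue                       # false: skip the body, keep examining
        chosen.extend(select(body, modes))
        if kind == "if":
            break                          # true if: the rest of this series is skipped
    return chosen


# Hypothetical specification: (when p A (if q B) C) (if r D) E
spec = [("when", "p", ["A", ("if", "q", ["B"]), "C"]), ("if", "r", ["D"]), "E"]
print(select(spec, {"p", "q", "r"}))       # ['A', 'B', 'D']
print(select(spec, set()))                 # ['E']
```

A true if statement ends only the series that directly contains it, which is why the recursion returns to the enclosing series and continues there when that series belongs to a when statement.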
   Where conditional statements are represented by @, #, and *, and conversion table names are represented by A, B, C, D, and E, assume the following statement.

      (when @ A (if # B) C) (if * D) E

   Also assume that the conditional statements @, #, and * are all true. Examine the statement from the beginning.

   First comes (when @ A (if # B) C). Because @ is true, "A (if # B) C" is examined and table A is selected. Next comes (if # B), and because # is true, table B is selected. Because this is an if statement whose condition is true, the rest of the current series "A (if # B) C" is not examined.

   Although this ends examination of "A (if # B) C", this series is contained in a when statement, so the remainder of "(when @ A (if # B) C) (if * D) E" is examined. The next portion is (if * D). Table D is selected because the conditional statement * is true. Because this is an if statement, the rest of "(when @ A (if # B) C) (if * D) E" is not examined.

   As a result, tables A, B, and D are selected.

   Next, the mode definition table used by xjsi is shown as an example. Three modes are defined in the mode definition table. The lines from 2A_CTRL to the end specify the correspondence tables and mode display text strings to be used. This table is referenced each time the mode changes, and the tables to be used are selected as described above.

      (defmode romkan)
      (defmode katakana)
      (defmode zenkaku)

      2A_CTRL
      (if romkan 1B_TOUPPER 2B_ROMKANA 2B_JIS
          (if (not katakana) "[Ar]")
          (if zenkaku 3B_KATAKANA "[Ar]")
          3B_HANKATA "[AIr]")       ; "A" and "I" are half-width Katakana.
      2B_DAKUTEN
      (if (not katakana) 1B_ZENHIRA
          (if zenkaku 3B_ZENKAKU "[A ]")
          "[AA]")
      (if zenkaku 1B_ZENKATA 3B_ZENKAKU "[A ]")
      "[AIA]"                       ; "A" and "I" are half-width Katakana.

   Initially romkan, katakana, and zenkaku are all OFF. 2A_CTRL is selected as the table at this point. Because romkan is OFF, the following if statement is not referenced, and 2B_DAKUTEN is selected.
   The conditional statement of the next if statement, (not katakana), is true because katakana is OFF. The inside of the if statement is referenced and 1B_ZENHIRA is selected. Next, the if statement nested inside it is referenced. Because zenkaku is OFF, its conditional statement is false, so the nested if statement is not referenced. Next, the mode display text string "[A[hiragana-A]]" is selected, and the rest of the conversion table series is not examined.

Correspondence Tables

The correspondence tables contain the conversion data (input codes and corresponding output codes) for preprocessing, main processing, and postprocessing.

Preprocessing and postprocessing play supplemental roles for main processing. The following restrictions therefore apply to preprocessing and postprocessing correspondence tables.

   Preprocessing tables
        Item (2), below, is not possible. Also, each input code and output code in item (1) must be a single form that evaluates to a character. A buffer remainder cannot be entered.

   Postprocessing tables
        Item (2), below, is not possible. Also, each input code in item (1) must be a single form that evaluates to a character. A buffer remainder cannot be entered.

Every line in a correspondence table must contain one of the following items (1) to (3) or be empty. Lines of these forms are repeated to make up the correspondence table.

   (1) input_code [output_code [buffer_remainder]]
   (2) input_code function
   (3) Variable declaration

Each entry must occupy no more than one line. The remainder of a line is treated as a comment if a semicolon (;) appears at the beginning of the line or follows leading space(s) (including tabs), unless the semicolon is escaped. An omitted output code or buffer remainder is treated as a null string.

Input codes, output codes, and buffer remainders consist of the following, written without intervening spaces: forms that evaluate to characters and forms that evaluate to text strings.
A form is said to evaluate to a character or a text string if the form is replaced by that character or text string.

The following types of forms evaluate to characters.

(1) Character notation

   Character notations are shown below (these differ from the character notations treated as forms that evaluate to text strings).

   Character
        Characters excluding "(", ")", "'", """, "\", ";", and space characters.

   'Character'
        Characters excluding "'", "\", and "^".

   '^Character'
        Represents a control character. The character can be ASCII code 32 to 126. '^?' indicates the DEL code.

   '\Character'
        The character must not be a numeral, "o", "d", or "x". "\n", "\t", "\b", "\r", and "\f" indicate the same characters as the corresponding escape codes in the C language. "\e" and "\E" indicate the escape character. Other characters are interpreted literally.

   '\octal_code'
        Indicates the character corresponding to the specified octal code.

   '\ooctal_code'
        Indicates the character corresponding to the specified octal code.

   '\ddecimal_code'
        Indicates the character corresponding to the specified decimal code.

   '\xhexadecimal_code'
        Indicates the character corresponding to the specified hexadecimal code.

(2) Function name with a form that evaluates to a character

   Function name   Description
   -------------   -----------------------------------------------------------
   toupper         If the argument is a lowercase ASCII alphabetic character,
                   it is changed to uppercase. Example: (toupper a) -> A
   tolower         If the argument is an uppercase ASCII alphabetic character,
                   it is changed to lowercase. Example: (tolower A) -> a
   toupdown        The case of an ASCII alphabetic character is changed from
                   upper to lower or from lower to upper.
                   Examples: (toupdown a) -> A, (toupdown A) -> a
   tozenalpha      If the argument is an ASCII character, it is converted to
                   a full-width Japanese roman character.
                   Example: (tozenalpha A) -> A
   tohira          If the argument is full-width Katakana, it is converted to
                   Hiragana. Example: (tohira A) -> A
   tokata          If the argument is Hiragana, it is converted to full-width
                   Katakana. Example: (tokata A) -> A
   tozenhira       If the argument is half-width Katakana, it is converted to
                   Hiragana. Example: (tozenhira A) -> A
                   ; "A" is half-width Katakana.
   tozenkata       If the argument is half-width Katakana, it is converted to
                   full-width Katakana. Example: (tozenkata A) -> A
                   ; "A" is half-width Katakana.
   value           Converts a character code to its actual numeric value.
                   Examples: value 0 -> '\x0', value A -> '\xA', value F -> '\xf'

(3) Function name with two forms that evaluate to characters

   Function name   Description
   -------------   -----------------------------------------------------------
   +               Finds the sum of the arguments.
                   Examples: (+ A '\d256') -> A
                             (+ 0 (value 3)) -> 3
   -               Finds the difference of the arguments.
   *               Finds the product of the arguments.
   /               Finds the quotient of the arguments.

(4) Variable

   Variable names are any alphanumeric text strings that begin with an alphabetic character and do not coincide with function names or declarations (defvar). An underscore (_) is treated as an alphabetic character.

The following types of forms evaluate to text strings.

(1) "Character string" notation

   Character notations are shown below (these differ from the character notations treated as forms that evaluate to characters).

   Character
        Characters excluding """, "^", and "\".

   ^Character
        Indicates a control character. The character can be ASCII code 32 to 126. ^? indicates the DEL code.
   \Character
        The character must not be a numeral, "o", "d", or "x". "\n", "\t", "\b", "\r", and "\f" indicate the same characters as the corresponding escape codes in the C language.

   \octal_code[;]
        Indicates the character corresponding to the specified octal code. Specify a semicolon (;) if number(s) follow.

   \ooctal_code[;]
        Indicates the character corresponding to the specified octal code. Specify a semicolon (;) if number(s) follow.

   \ddecimal_code[;]
        Indicates the character corresponding to the specified decimal code. Specify a semicolon (;) if number(s) follow.

   \xhexadecimal_code[;]
        Indicates the character corresponding to the specified hexadecimal code. Specify a semicolon (;) if number(s) follow.

   "" indicates an empty character string.

(2) Function name with a form that evaluates to a character

   Function name   Description
   -------------   -----------------------------------------------------------
   tohankata       If the argument is full-width Hiragana or full-width
                   Katakana, it is converted to half-width Katakana.
                   Example: (tohankata [hiragana-GA]) -> [hankakukana-GA]
   last=           Examines the argument (a form that evaluates to a
                   character) to see if it matches the last character of the
                   last-matched text string. If the character matches, an
                   empty text string is returned. Example: last= A -> [A]
                   last= can only be entered for input codes.
   todigit         Converts the code given as the first argument to a number
                   in the number base given as the second argument.
   dakuadd         Adds a Dakuten (voiced consonant mark) after the argument.
   handakuadd      Adds a Handakuten (semivoiced consonant mark) after the
                   argument.

(3) Function name with a mode name as the argument

   The mode name must be defined in the mode definition table.

   Function name   Description
   -------------   -----------------------------------------------------------
   if              If the mode given as the argument is ON, (if mode_name)
                   returns an empty string.
                   Example: (if katakana)VU [katakana-VU]
   unless          If the mode given as the argument is OFF, (unless
                   mode_name) returns an empty string.
                   Example: (unless katakana)VU [hiragana-BU]
   on              Turns ON the mode given as the argument.
                   Example: (on katakana)
   off             Turns OFF the mode given as the argument.
                   Example: (off katakana)
   switch          Switches the state of the mode given as the argument,
                   that is, ON to OFF or OFF to ON.
                   Example: (switch katakana)

   However, if and unless can only be entered for input codes, and on, off, and switch can only be entered for output codes in the main processing table.

(4) Function name without arguments

   The following function names can only be entered for output codes in the main processing table.

   Function name   Description
   -------------   -----------------------------------------------------------
   allon           Turns ON all modes.
   alloff          Turns OFF all modes.

Precautions on functions

   Because functions are forms that evaluate to characters or text strings, they can be nested, as in (toupper (tolower Y)). However, a function that evaluates to a text string cannot be used as an argument of another function, as in the following.

      (toupper (tohankata [Hiragana KA]))

Functions

   The following functions can be used. These functions can be used independently.

   (error)    An error is generated if the corresponding input code is received.

   (restart)  The previous mode definition table is read again to reset conversion.
              If there is an error in the new conversion table, an error message is displayed and the settings in the previous (original) conversion table are used.

Variable Declarations

   (defvar variable_notation (list character_notation......))
        list uses its arguments as the variable range.

   (defvar variable_notation (all))
        all uses all characters as the variable range.

   (defvar variable_notation (between character_notation1 character_notation2))
        between uses the characters from character_notation1 to character_notation2 (both inclusive), in code order, as the variable range.

   A variable declaration defines a variable that can be used as a form that evaluates to a character, together with the range of the variable. A variable is declared in the table that uses it. Variable notations are given as variable names or as (variable_name......). Character notations are the same as forms that evaluate to characters.

Variables

   Variables can be used effectively when the same patterns appear many times in conversions, as in the following example.

      (defvar a1 (list K S T H Y R W G Z D B P))
      (a1)(a1)   [small tsu]   (a1)

   The above two lines achieve the same conversions as the following lines. Both show methods of handling assimilated sounds (Sokuon) in roman character-Kana conversion.

      KK   [small tsu]   K
      SS   [small tsu]   S
      TT   [small tsu]   T
      ...
      (omitted)
      PP   [small tsu]   P

   The variables declared in the variable declaration are processed. (between A E) and (list A B C D E) are the same.

Precautions on variables

   Variables to be used must be defined by a variable declaration in the table. A variable definition is effective throughout the table in which it appears. You can define the variable a1 in two different tables as required, and the two a1 variables are treated as separate variables. You cannot, however, define the same variable twice within one table.

   A variable always has the same value within a single line in a correspondence table.

      (defvar a1 (list A B))
      (a1)(tolower (a1))   3

   In the above example, the text strings "Aa" and "Bb" are converted to "3", but "Ab" and "Ba" are not.

   Input is matched against the input codes in the tables starting at the left. Thus, when the input codes in a table are examined from the left, a variable must not be used as the argument of a function before it has been matched to a specific character, as in the following example.

      (defvar a1 (list a b))
      (toupper (a1))(a1)   3

   "Aa" is not converted to "3", because the argument of (toupper (a1)) is the variable a1, which does not yet have a value. This type of setting is checked when the tables are read into the system. In this case, either of the following changes gives a working entry.

      (defvar a1 (list a b))
      (a1)(toupper (a1))   3

   If "aA" or "bB" is input, it is converted to "3".

      (defvar a1 (list A B))
      (a1)(tolower (a1))   3

   If "Aa" or "Bb" is input, it is converted to "3".

   Any variable appearing in the output code or buffer remainder must also appear in the input code, that is, it must have been assigned a value when matched to an input code.

      (defvar a1 (list K S))
      (defvar a2 (list a))
      (a1)(a1)   (a2)   (a1)

   The above entry is not correct because the variable a2 appears in the output code without having been matched in the input code.
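The way a variable keeps one value across a whole line, as in the (a1)(a1) entry for Sokuon, can be sketched in Python by expanding a line into the literal entries it stands for. This is an illustration under assumptions, not how Automaton stores its tables: the helper names between, render, expand, and var are hypothetical, and only lines that use a single variable are handled.

```python
def between(first, last):
    """Range of (between first last): every character from first to last, in code order."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


def render(parts, value):
    """Build one field: literal strings are kept, functions receive the variable's value."""
    return "".join(p if isinstance(p, str) else p(value) for p in parts)


def expand(values, input_code, output_code=(), remainder=()):
    """Expand a table line that uses one variable into literal entries.

    The variable has the same value everywhere on the line, so every value in
    its range produces exactly one entry: input -> (output, buffer remainder).
    """
    return {
        render(input_code, v): (render(output_code, v), render(remainder, v))
        for v in values
    }


def var(value):
    """Stands for a bare use of the variable, such as (a1)."""
    return value


# (defvar a1 (list K S T H Y R W G Z D B P))
# (a1)(a1)   [small tsu]   (a1)
a1 = list("KSTHYRWGZDBP")
sokuon = expand(a1, [var, var], ["[small tsu]"], [var])
print(sokuon["KK"])      # ('[small tsu]', 'K')
print("KS" in sokuon)    # False

# (defvar a1 (list A B))
# (a1)(tolower (a1))   3
print(sorted(expand(["A", "B"], [var, str.lower], ["3"])))   # ['Aa', 'Bb']
```

Because each value of the range is substituted into every field at once, mixed inputs such as "KS" or "Ab" never appear among the expanded entries.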
Conversion Method by Correspondence Table

Preprocessing

   First, the input is divided into character units (a character with a 2-byte code is also treated as one character). Each unit is called an input code. In preprocessing, each input code corresponds to one output code. The output code from preprocessing becomes the input code for main processing.

   The input codes in the preprocessing table currently in use are examined in order from the beginning. When a match is found for the input code, the corresponding output code (that is, the output code written on the same line as the input code) is output. If more than one table is specified in the mode definition table, the tables are examined in the order in which they are listed there. If no matching input code is found in any table (including when no table is specified), the input code is output unchanged. This is also true for main processing and postprocessing.

Main processing

   In main processing, input codes are added to a buffer for as long as there is still a chance that a longer match will be found among the input codes in the table (that is, while the characters in the buffer match the beginning of some entry in the table). Each time an input code is added to the buffer, comparison starts again from the beginning of the input codes listed in the main processing table. While a longer match is still possible, the conversion is not finalized and more input is awaited. The code in the buffer is, however, output as nonfinalized characters to enable displaying and other processing. Codes for input errors and mode changes are also output. These codes are differentiated from normal output codes and do not undergo postprocessing.

   When the contents of the buffer match the longest possible input code in the table (if more than one entry matches, the first one in the table is used), the corresponding output code is output. If no buffer remainder has been specified, the part of the buffer that was matched is deleted from the buffer. If a buffer remainder was specified, it replaces the portion of the buffer that was matched and the above operation is repeated. If no possibility of a match is found in the table, the first character in the buffer is output unchanged.

   If the output code for a matched input code is a function that changes the mode (on, off, switch, etc.), the correspondence tables are changed according to the specifications in the mode definition table. The functions that change the mode should be placed in the tables where they are required, regardless of the status of the modes.

   If a match is found for the input code corresponding to the function (restart), the mode definition table is reread. The same file as the one used for the previous mode definition table is read. This function can be used to switch to an edited version of the conversion tables (including the mode definition table) while Automaton is running, without having to stop it.

Postprocessing

   In postprocessing, more than one output code can be output for one input code as the final output. In all other respects, postprocessing is the same as preprocessing.

   In the following example, "ls -la (carriage_return)" is output when "Ls" or "LS" is input.

   Preprocessing table

      (defvar a1 (list s))
      (a1)   (toupper (a1))

   Main processing table

      LS   "LS -la\r"

   Postprocessing table

      (defvar a1 (all))
      (a1)   (tolower (a1))

SunOS 5.10                        10 Jan 2003                   wnn_automaton(4)
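The three-stage conversion described under "Conversion Method by Correspondence Table" can be sketched in Python as follows. This is a simplified model, not the Automaton implementation: the function names per_char, main_process, and convert are hypothetical, tables are modeled as a Python function or dictionary instead of table files, modes and nonfinalized output are left out, and the carriage return in the example is assumed to be written as \r.

```python
def per_char(convert_one, text):
    """Preprocessing or postprocessing: convert each character on its own."""
    return "".join(convert_one(ch) for ch in text)


def main_process(table, text):
    """Main processing; table maps input code -> (output code, buffer remainder)."""
    out, buf = [], ""

    def drain(final):
        nonlocal buf
        while buf:
            # A longer entry could still match: wait for more input.
            if not final and any(k != buf and k.startswith(buf) for k in table):
                return
            # Otherwise use the longest entry that matches the start of the buffer.
            hits = [k for k in table if buf.startswith(k)]
            if hits:
                key = max(hits, key=len)
                output, remainder = table[key]
                out.append(output)
                buf = remainder + buf[len(key):]   # remainder replaces the match
            else:
                out.append(buf[0])                 # no match possible: pass through
                buf = buf[1:]

    for ch in text:
        buf += ch
        drain(final=False)
    drain(final=True)                              # end of input: flush the buffer
    return "".join(out)


# The example tables from this page, modeled in Python.
MAIN = {"LS": ("LS -la\r", "")}


def convert(text):
    pre = per_char(lambda ch: ch.upper() if ch == "s" else ch, text)  # s -> S
    return per_char(str.lower, main_process(MAIN, pre))              # all -> lowercase


print(repr(convert("Ls")))   # 'ls -la\r'
print(repr(convert("LS")))   # 'ls -la\r'
```

Flushing the buffer at the end of input is a choice made for the sketch; the page itself only describes waiting for more input while a longer match is possible.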