localedef(4) [hpux man page]

localedef(4)						     Kernel Interfaces Manual						      localedef(4)

NAME
localedef - format and semantics of locale definition file

DESCRIPTION
This is a description of the syntax and meaning of the locale definition that is provided as input to the localedef command to create a locale (see localedef(1M)).

The following is a list of category tags, keywords and subsequent expressions which are recognized by localedef. The order of keywords within a category is irrelevant with the exception of the copy keyword and other exceptions noted under the LC_COLLATE description. (Note that, as a convention, the category tags are composed of uppercase characters, while the keywords are composed of lowercase characters.)

Category Tags and Keywords

The following keywords do not belong to any category and should appear in the beginning of the locale definition file:

     comment_char
          Single character indicating the character to be interpreted as starting a comment line within the locale definition file. This character should be in the first column of a comment line. The default comment_char is the pound sign (#). All lines with a comment_char in the first column are ignored.

     escape_char
          A single character indicating the character to be interpreted as an escape character within the script. The default escape_char is the backslash (\). escape_char is used to escape localedef metacharacters to remove special meaning and in the character constant decimal, octal, and hexadecimal formats. It is also used to continue a line onto the next, if escape_char is the last character on the line (before the new-line character).

The following keywords can be used in any category:

     copy
          A string naming another valid locale available on the system. This causes the category in the locale being created to be a copy of the same category in the named locale. Since the copy keyword defines the entire category, if used, it must be the only keyword in the category.

The following six categories are recognized:

     LC_CTYPE
          This category defines character classification, case conversion and other character attributes.

The following predefined character classifications are recognized:

     upper
          Character codes classified as uppercase letters.
          Characters specified in the or classifications cannot be specified in this category.

     lower
          Character codes classified as lowercase letters. The same restrictions applicable to the upper category apply to this classification.

     digit
          Character codes classified as numeric. Only ten characters in contiguous ascending sequence by numerical value can be specified. Alternative digits cannot be specified here.

     space
          Character codes classified as white-space. No character specified for the or categories can be included in this classification.

     punct
          Character codes classified as punctuation characters. No character included in the or categories can be specified.

     cntrl
          Character codes classified as control characters. No character included in the or can be included here.

     blank
          Character codes classified as blank characters. The <space> and <tab> characters are automatically included.

     xdigit
          Character codes classified as hexadecimal digits. Only the characters defined for the digit class can be specified, followed by one or more sets of six characters, with each set in ascending order.

     alpha
          Character codes classified as letters. Characters classified as or cannot be specified. Characters specified as upper and lower classes are automatically included in this class.

     print
          Character codes classified as printable characters. Characters specified for and classes and the <space> character are automatically included. No character from the cntrl category can be specified.

     graph
          Character codes classified as printable characters, except the <space> character. In all other respects this classification is similar to the print category.

The following two are special classifications, used to designate valid first-of-two and second-of-two bytes of two-byte characters. Note that these are byte classifications and not character classifications; hence, they cannot be used with the iswctype interface (see wctype(3C)) in the same manner as the other classifications.

     first
          Valid first bytes of two-byte characters.

     second
          Valid second bytes of two-byte characters.
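The digit and xdigit rules above are purely structural, so they can be checked mechanically. The following Python fragment is a minimal sketch of such a check, not a description of what localedef itself does. The function names are hypothetical, the character codes are plain integers, and "the class" in the xdigit rule is read as the digit class, which is an assumption.

```python
def is_valid_digit_class(codes):
    """True if codes holds exactly ten contiguous, ascending code values."""
    return len(codes) == 10 and all(b - a == 1 for a, b in zip(codes, codes[1:]))


def is_valid_xdigit_class(codes, digit_codes):
    """True if codes is the digit class followed by one or more ascending sets of six.

    Assumption: "ascending" is taken as strictly increasing code values
    within each set of six; the sets need not be contiguous.
    """
    if list(codes[:10]) != list(digit_codes):
        return False
    rest = list(codes[10:])
    if not rest or len(rest) % 6 != 0:
        return False
    for i in range(0, len(rest), 6):
        group = rest[i:i + 6]
        if any(b <= a for a, b in zip(group, group[1:])):
            return False
    return True


# Illustration with ASCII code values.
ascii_digits = [ord(c) for c in "0123456789"]
ascii_xdigits = ascii_digits + [ord(c) for c in "ABCDEFabcdef"]
```

With the ASCII values above, both checks succeed; dropping one digit, or supplying the digits without any set of six letters, makes the corresponding check fail.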
Character case conversion definitions:

     toupper
          Lowercase to uppercase character relationships.

     tolower
          Uppercase to lowercase character relationships.

Miscellaneous character attributes and classifications:

     alt_punct
          String mapped into the ASCII equivalent string ``b!"#$%&'()*+,-./:;<=>?@[]^_`{}~'', where b is a blank (a langinfo(5) item).

     charclass
          Defines one or more locale-specific character class names as strings separated by semicolons. Each named character class can then be defined subsequently in the definition. The first character of a character class name must be a letter, and the class name cannot match any of the predefined classifications listed above.

     direction
          String operand indicates text direction (a langinfo(5) item). String operand "1" indicates right-to-left text direction.

     context
          String operand indicates character context analysis. String "1" indicates Arabic context analysis is required.

The LC_COLLATE category provides collation sequence definition for relative ordering between collating elements (single and multi-character collating elements) in the locale. The following keywords belong to this category and should come between the category tag and END LC_COLLATE. The first two keywords can be in any order, but must come before the order_start keyword. Any number of the first two keywords can be specified.

     collating-element
          Defines a multi-character collating element, symbol, composed of the characters in string. String is limited to two characters.

     collating-symbol
          Makes symbol a collating symbol which can be used to define a place in the collating sequence. Symbol does not represent any actual character.

     order_start
          Denotes the start of the collation sequence. The directives have an effect on string collation. The lines following the order_start keyword and before the order_end keyword contain collating element entries, one per line. Operands can optionally appear after the order_start keyword to define rules for string comparison using a multiple-weight scheme (if no operands are specified, a single forward operand is assumed).
          The possible operands are:

          forward
               Specifies that comparison operations proceed from the start of the string towards the end of it.

          backward
               Specifies that comparison operations proceed from the end of the string towards the beginning of it.

     order_end
          Marks the end of the list of collating element entries.

The LC_MONETARY category defines the rules and symbols used to format monetary numeric information. The following keywords belong to this category and should come between the category tag and END LC_MONETARY.

     int_curr_symbol
          The operand is a four-character string used to designate the international currency symbol. The first three characters should contain the alphabetic international currency symbol in accordance with those specified in the ISO 4217 standard. The fourth character is the character used to separate the international currency symbol from the monetary quantity.

     currency_symbol
          The operand is a string used as the local currency symbol.

     mon_decimal_point
          The operand is a string containing the symbol used as the decimal delimiter (radix character).

     mon_thousands_sep
          The operand is a string containing the symbol used as a separator for groups of digits to the left of the decimal delimiter.

     mon_grouping
          The operand is a semicolon-separated list of integers. The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any) will be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping will be performed.

     positive_sign
          The operand is a string to indicate a non-negative monetary quantity.

     negative_sign
          The operand is a string to indicate a negative monetary quantity.

     int_frac_digits
          The operand is an integer representing the number of fractional digits used in formatted monetary values using int_curr_symbol.

     frac_digits
          The operand is an integer representing the number of fractional digits used in formatted monetary values using currency_symbol.

     p_cs_precedes
          The operand is an integer which, if set to 1, indicates the currency_symbol precedes a monetary quantity, and if set to 0, the symbol succeeds the value.
     p_sep_by_space
          The operand is an integer which indicates the separation of the currency_symbol, the sign string, and the value for a non-negative formatted monetary quantity. The values of p_sep_by_space and n_sep_by_space are interpreted according to the following:

          0    No space separates the currency symbol and value.

          1    If the currency symbol and sign string are adjacent, a space separates them from the value; otherwise, a space separates the currency symbol from the value.

          2    If the currency symbol and sign string are adjacent, a space separates them; otherwise, a space separates the sign string from the value.

     n_cs_precedes
          The operand is an integer which, if set to 1, indicates the currency_symbol precedes a negative monetary quantity, and if set to 0, the symbol succeeds the negative value.

     n_sep_by_space
          The operand is an integer which indicates the separation of the currency_symbol, the sign string, and the value for a negative formatted monetary quantity.

     p_sign_posn
          The operand is an integer which indicates the positioning of the positive_sign for a positive monetary quantity. The possible values are:

          0    Parentheses surround the quantity and the currency_symbol or int_curr_symbol.

          1    The sign string precedes the quantity and the currency_symbol or int_curr_symbol.

          2    The sign string succeeds the quantity and the currency_symbol or int_curr_symbol.

          3    The sign string precedes the currency_symbol or int_curr_symbol.

          4    The sign string succeeds the currency_symbol or int_curr_symbol.

     n_sign_posn
          The operand is an integer set to a value indicating the positioning of the negative_sign for a negative formatted monetary quantity.

     int_p_cs_precedes
          The operand is an integer which, if set to 1, indicates the int_curr_symbol precedes a monetary quantity, and if set to 0, the symbol succeeds the value.

     int_p_sep_by_space
          The operand is an integer which indicates the separation of the int_curr_symbol, the sign string, and the value for a non-negative internationally formatted monetary quantity.

     int_n_cs_precedes
          The operand is an integer which, if set to 1, indicates the int_curr_symbol precedes a negative monetary quantity, and if set to 0, the symbol succeeds the negative value.

     int_n_sep_by_space
          The operand is an integer which indicates the separation of the int_curr_symbol, the sign string, and the value for a negative internationally formatted monetary quantity.
     int_p_sign_posn
          The operand is an integer which indicates the positioning of the positive_sign for a positive monetary quantity formatted with the international format.

     int_n_sign_posn
          The operand is an integer which indicates the positioning of the negative_sign for a negative monetary quantity formatted with the international format.

The LC_NUMERIC category defines rules and symbols used to format non-monetary numeric information. The following keywords belong to this category and should come between the category tag and END LC_NUMERIC.

     decimal_point
          The operand is a string containing the symbol used as the decimal delimiter (radix character) in numeric, non-monetary formatted quantities. This keyword cannot be omitted and cannot be set to the empty string.

     thousands_sep
          The operand is a string containing the symbol used as a separator for groups of digits to the left of the decimal delimiter.

     grouping
          The operand is a semicolon-separated list of integers. The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any) will be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping will be performed.

     alt_digit
          String mapped into the ASCII equivalent string "", where b is a blank (a langinfo(5) item). The alt_digit keyword is an HP extension to the POSIX standards and it has a different meaning than the alt_digits keyword defined in POSIX standards.

The LC_TIME category defines the rules for generating locale-specific formatted date strings. The following mandatory keywords belong to this category and should come between the category tag and END LC_TIME.

     abday
          Seven semicolon-separated strings giving abbreviated names for the days of the week beginning with Sunday.

     day
          Seven semicolon-separated strings giving full names for the days of the week beginning with Sunday.

     abmon
          Twelve semicolon-separated strings giving abbreviated names for the months, beginning with January.
     mon
          Twelve semicolon-separated strings giving full names for the months, beginning with January.

     d_t_fmt
          The operand is a string defining the appropriate date and time representation.

     d_fmt
          The operand is a string defining the appropriate date representation.

     t_fmt
          The operand is a string defining the appropriate time representation.

     am_pm
          The operand is two semicolon-separated strings giving the representations for AM and PM.

     t_fmt_ampm
          The operand is a string defining the appropriate time representation in the 12-hour clock format with am_pm.

     era
          The operand is a semicolon-separated list of strings. Each string defines the name and date of an era or emperor for a locale. Each string should conform to the following format:

               direction:offset:start_date:end_date:name:format

          where:

          direction
               Either a + or - character. The + character indicates the time axis should be such that the years count in the positive direction when moving from the starting date towards the ending date. The - character indicates the time axis should be such that the years count in the negative direction when moving from the starting date towards the ending date.

          offset
               A number in the range indicating the number of the first year of the era.

          start_date
               A date in the form yyyy/mm/dd where yyyy, mm, and dd are the year, month and day numbers, respectively, of the start of the era. Years prior to the year 0 A.D. are represented as negative numbers. For example, an era beginning March 5th in the year 100 B.C. would be represented as Years in the range are supported.

          end_date
               The ending date of the era in the same form as the start_date above or one of the two special values -* or +*. A value of -* indicates the ending date of the era extends to the beginning of time while +* indicates it extends to the end of time. The ending date can be chronologically either before or after the starting date of an era. For example, the expressions for the Christian eras A.D. and B.C. would be:

          name
               A string representing the name of the era which is substituted for the directive of date and strftime() (see date(1) and strftime(3C)).

          format
               A string for formatting the directive of date and strftime(). This string is usually a function of the and directives. If format is not specified, the string specified for the LC_TIME category keyword era_d_fmt (see below) is used as a default.

     era_d_fmt
          The operand is a string defining the format of date in era notation.

     era_t_fmt
          The operand is a string defining the format of time in era notation.

     era_d_t_fmt
          The operand is a string defining the format of date and time in era notation.

     alt_digits
          The operand is a semicolon-separated list of strings. The first string is the alternative symbol corresponding to zero, the second string is the alternative symbol corresponding to one, and so on. Note that if the HP-UX-proprietary keyword alt_digit has been specified in the same locale, the first ten symbols should be identical for these two keywords.

In addition to the above, the following HP-UX-proprietary keywords are recognized (these are provided for backward compatibility and their use is otherwise not recommended):

The LC_MESSAGES category defines the format and values for affirmative and negative responses. The following keywords belong to this category and should come between the category tag and END LC_MESSAGES.

     yesexpr
          The string operand is an Extended Regular Expression matching acceptable affirmative responses to yes/no queries.

     noexpr
          The string operand is an Extended Regular Expression matching acceptable negative responses to yes/no queries.

     yesstr
          The string operand identifies the affirmative response for yes/no questions. This keyword is now obsolete and yesexpr should be used instead.

     nostr
          The string operand identifies the negative response for yes/no questions. This keyword is now obsolete and noexpr should be used instead.

Keyword Operands

Keyword operands consist of character-code constants and symbols, strings, and metacharacters.
The types of legal expressions are:

and operands consist of single character-code constants or symbolic names separated by semicolons, or a character-code range consisting of a constant or symbolic name followed by an ellipsis followed by another constant or symbolic name. The constant preceding the ellipsis must have a smaller code value than the constant following the ellipsis. A range represents a set of consecutive character codes. If the list is longer than a single line, the escape character must be used at the end of each line as a continuation character. It is an error to use any symbolic name that is not defined in an accompanying charmap file (see charmap(4)).

operands consist of strings separated by semicolons. If longer than one line, the escape character must be used for continuation.

operands consist of a sequence of zero or more characters surrounded by double quotes ("). Within a string, the double-quote character must be preceded by an escape character. The following escape sequences also can be used:

     \n      newline
     \t      horizontal tab
     \b      backspace
     \r      carriage return
     \f      form feed
     \\      backslash
     \'      single quote
     \ddd    bit pattern

The escape \ddd consists of the escape character followed by 1, 2, or 3 octal digits specifying the value of the desired character (for other possible bit pattern specifications, see below). Also, an escape character (\) and an immediately-following newline are ignored. Although the backslash (\) has been used for illustration, another escape character can be substituted by the escape_char keyword.

Constants represent character codes in the operands. They can be used in the following forms:

     decimal constants
          An escape character followed by a d followed by up to three decimal digits.

     octal constants
          An escape character followed by up to three octal digits.

     hexadecimal constants
          An escape character followed by an x followed by two hexadecimal digits.
     Unicode constants
          An escape character followed by a followed by four to eight hexadecimal digits which specifies a Unicode scalar value in a charmap file to be used with the option of the localedef command.

     character constants
          A single character (for example, A) having the numerical value of the character in the machine's character set.

     symbolic names
          A string enclosed between < and > is a symbolic name.

localedef input files are recommended to be written entirely in symbolic names, utilizing a user-defined or system-supplied charmap file. This aids portability of localedef input files between different encoded character sets (see charmap(4)).

Symbolic names can be defined within a locale definition file by the collating-element and collating-symbol keywords. These are not character constants. It is an error if such an internally defined symbolic name collides with one defined in a charmap file.

operands consist of one or more decimal digits separated by semicolons.

operands follow keywords toupper and tolower and must consist of two character-code constants enclosed by left and right parentheses and separated by a comma. Each such character pair is separated from the next by a semicolon. For tolower, the first constant represents an uppercase character and the second the corresponding lowercase character. For toupper, the first constant represents a lowercase character and the second the corresponding uppercase character.

The order_start keyword is followed by collating element entries, one per line, in ascending order by collating position.
The collating element entries have the form:

collation_element can be a character, a collating symbol enclosed in angle brackets representing a character or collating element, the special symbol UNDEFINED, or an ellipsis (...).

A character stands for itself; a collating symbol can be a symbolic name for a character that is interpreted by the charmap file, a multi-character collating element defined by a collating-element keyword, or a collating symbol defined by the collating-symbol keyword.

The special symbol UNDEFINED specifies the collating position of any characters not explicitly defined by collating element entries. For example, if some group of characters is to be omitted from the collation sequence and just collate after all defined characters, a collating symbol might be defined before the order_start keyword:

Then somewhere in the list of collating element entries:

Notice that there is no second weight. This means that on a second pass all characters collate by their encoded value.

An ellipsis is interpreted as a list of characters with an encoded value higher than that of the character on the preceding line and lower than that on the following line. Because it is tied to the encoded value of characters, the ellipsis is inherently non-portable. If it is used, a warning is issued and no output is generated unless the option was given.

The weight operands provide information about how the collating element is to be collated on first and subsequent passes. Weight can be a two-character string, the special symbol IGNORE, or a collating element of any of the forms specified for collating_element except

If there are no weights, the character collates strictly by its position in the list. If there is only one weight given, the character sorts by its relative position in the list on the second collation pass.

An equivalence class is defined by a series of collating element entries all having the same character or symbol in the first weight position. For example, in many locales all forms of the character 'A' collate equal on the first pass.
This is represented in the collating element entries as:

Two-to-one collating elements are specified by collating-elements defined before the order_start keyword. For example, the two-to-one collating element in Spanish, would be defined before the order_start keyword as

It would then be used in a collating element entry as

A one-to-two collating element is defined by having a two-character string in one of the weight positions. For example, if the character collates equal to the pair "AE", the collating element entry would be:

A don't-care character is defined by the special symbol IGNORE. For example, the dash character, may be a don't-care on the first collation pass. The collating element entry is:

Symbols defined by the collating-symbol keyword can be used to indicate that a given character collates higher or lower than some position in the sequence. For example, if all characters with an encoded value less than that of '0' are to collate lower than all other characters on the first pass, and in relative order on the second pass, define a collating symbol before the order_start keyword:

The first two collating element entries are then:

This also illustrates the use of the ellipsis to indicate a range. The first ellipsis is interpreted as "all characters in the encoded character set with a value lower than '0'"; the second ellipsis means that all characters in the range defined by the first collate in relative order.

operands conform to the Extended Regular Expressions specifications as described in regexp(5).

Metacharacters

Metacharacters are characters having a special meaning to localedef in operands. To escape the special meaning of these characters, surround them with single quotes or precede them by an escape character. localedef metacharacters include:

     <    Indicates the beginning of a symbolic name.

     >    Indicates the end of a symbolic name.

     (    Indicates the beginning of a character shift pair following the toupper and tolower keywords.

     )    Indicates the end of a character shift pair.

     ,    Used to separate the characters of a character shift pair.
     "    Used to quote strings.

     ;    Used as a separator in list operands.

     escape character
          Used to escape special meaning from other metacharacters and itself. It is backslash (\) by default, but can be redefined by the escape_char keyword.

Comments

Comments are lines beginning with a comment character. The comment character is the pound sign (#) by default, but can be redefined by the comment_char keyword. Comments and blank lines are ignored.

Separators

Separator characters include blanks and tabs. Any number of separators can be used to delimit the keywords, metacharacters, constants and strings that comprise a localedef script, except that all characters between < and > are considered to be part of the symbolic name even if they are <blank>s.

EXAMPLES
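As an illustration of the comment and line-continuation rules described in this page, the following Python fragment is a minimal sketch of how a reader of a locale definition file might reduce it to logical lines. It is not the localedef implementation. The function name is hypothetical, and the sketch assumes the default comment character (#) and escape character (\); it does not honor comment_char or escape_char redefinitions inside the file, does not treat a doubled escape character at end of line specially, and assumes that a line is only a comment when it does not continue a previous line.

```python
def logical_lines(text, comment_char="#", escape_char="\\"):
    """Yield logical lines: comments and blank lines dropped, continuations joined."""
    pending = ""
    for raw in text.splitlines():
        # A comment line has the comment character in the first column.
        if not pending and raw.startswith(comment_char):
            continue
        # An escape character as the last character continues the line.
        if raw.endswith(escape_char):
            pending += raw[:-1]
            continue
        line = pending + raw
        pending = ""
        if line.strip():
            yield line
    if pending.strip():
        yield pending


# Hypothetical input; <A>, <B> and <a> stand for charmap symbolic names.
sample = "# a comment\nupper <A>;\\\n<B>\n\nlower <a>\n"
lines = list(logical_lines(sample))
```

For the sample above, the comment and the blank line are dropped and the two physical lines of the upper operand are joined into one logical line.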
Please see the files under for examples of locale description files. These files were used to create the various locales which are delivered with HP-UX.
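The digit grouping rule given for the grouping and mon_grouping keywords can also be illustrated in code. The following Python fragment is a minimal sketch of that rule as worded in this page, not the formatting code used by the C library. The function name and the default separator are hypothetical, and the operand is passed as a Python list of integers rather than a semicolon-separated string.

```python
def group_digits(digits, grouping, sep=","):
    """Insert sep into a string of digits according to a grouping list.

    The first integer is the size of the group nearest the decimal
    delimiter. The last size is reused for remaining digits unless the
    list ends in -1, which stops further grouping.
    """
    groups = []
    sizes = list(grouping)
    size = None
    rest = digits
    while rest:
        if sizes:
            nxt = sizes.pop(0)
            size = None if nxt == -1 else nxt
        if size is None or size <= 0:
            # No (further) grouping: the remaining digits form one group.
            groups.append(rest)
            break
        groups.append(rest[-size:])
        rest = rest[:-size]
    return sep.join(reversed(groups))


grouped_plain = group_digits("1234567", [3])
grouped_mixed = group_digits("1234567", [3, 2])
grouped_stop = group_digits("1234567", [3, -1])
```

Here a list of 3 groups every three digits, a list of 3;2 uses a group of three followed by groups of two, and a list of 3;-1 forms a single group of three and leaves the remaining digits ungrouped.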