localedef(4) Kernel Interfaces Manual localedef(4)
NAME
localedef - format and semantics of locale definition file
DESCRIPTION
This is a description of the syntax and meaning of the locale definition that is provided as input to the localedef command to create a
locale (see localedef(1M)).
The following is a list of category tags, keywords, and subsequent expressions that are recognized by localedef. The order of keywords
within a category is irrelevant, with the exception of the copy keyword and other exceptions noted under the description. (Note that, by
convention, category tags are composed of uppercase characters, while keywords are composed of lowercase characters.)
Category Tags and Keywords
The following keywords do not belong to any category and should appear in the beginning of the locale definition file:
comment_char Single character indicating the character
to be interpreted as starting a comment line within the locale definition file. This character should be in the first column
of a comment line. The default comment_char is the pound sign (#). All lines with a comment_char in the first column are ignored.
escape_char A single character indicating the character
to be interpreted as an escape character within the script. The default escape_char is the backslash (\). escape_char is used
to escape localedef metacharacters to remove special meaning and in the character constant decimal, octal, and hexadecimal
formats. It is also used to continue a line onto the next, if escape_char is the last character on the line (before the
new-line character).
The following keywords can be used in any category:
copy A string naming another valid locale available on the system.
This causes the category in the locale being created to be a copy of the same category in the named locale. Since the
keyword defines the entire category, if used, it must be the only keyword in the category.
The following six categories are recognized:
LC_CTYPE This category defines character classification, case conversion, and other
character attributes. The following predefined character classifications are recognized:
upper Character codes classified as uppercase letters. Characters specified
in the cntrl, digit, punct, or space classifications cannot be specified in this category.
lower Character codes classified as lowercase letters. The same restrictions
that apply to the upper classification apply to this classification.
digit Character codes classified as numeric. Only ten characters in contiguous
ascending sequence by numerical value can be specified. Alternative digits cannot be specified here.
space Character codes classified as white-space. No character specified for
the upper, lower, alpha, digit, graph, or xdigit classifications can be included in this classification.
punct Character codes classified as punctuation characters. No character
included in the upper, lower, alpha, digit, cntrl, xdigit, or space classifications can be specified.
cntrl Character codes classified as control characters. No character included in
the upper, lower, alpha, digit, punct, graph, print, or xdigit classifications can be included here.
blank Character codes classified as blank characters. The <space> and
<tab> characters are automatically included.
xdigit Character codes classified as hexadecimal digits. Only the characters
defined for the digit class can be specified, followed by one or more sets of six characters, with each set in ascending
order.
alpha Character codes classified as letters. Characters classified as
cntrl, digit, punct, or space cannot be specified. Characters specified in the upper and lower classes are automatically included in this class.
print Character codes classified as printable characters.
Characters specified for the upper, lower, alpha, digit, xdigit, and punct classes and the <space> character are automatically
included. No character from the cntrl category can be specified.
graph Character codes classified as printable characters,
except the <space> character. In all other respects this classification is similar to the print category.
The following two are special classifications, used to designate valid first-of-two and second-of-two bytes of two-byte characters. Note
that these are byte classifications and not character classifications; hence, they cannot be used with the iswctype interface (see
wctype(3C)) in the same manner as the other classifications.
Valid first bytes of two-byte characters.
Valid second bytes of two-byte characters.
Character case conversion definitions:
toupper Lowercase to uppercase character relationships.
tolower Uppercase to lowercase character relationships.
Miscellaneous character attribute and classifications:
String mapped into the ASCII
equivalent string ``b!"#$%&'()*+,-./:;<=>?@[]^_`{}~'', where b is a blank (a langinfo(5) item).
Defines one or more locale-specific character class names as
strings separated by semicolons. Each named character class can then be defined subsequently in the definition.
The first character of a character class name must be a letter and the class name cannot match any of the prede-
fined classifications (for example,
String operand indicates text direction (a
langinfo(5) item). String operand "1" indicates right-to-left text direction.
String operand indicates character context analysis. String "1"
indicates Arabic context analysis is required.
The LC_COLLATE category provides the collation sequence definition for relative ordering between collating elements (single and
multi-character collating elements) in the locale. The following keywords belong to this category and should come between the category
tag and END LC_COLLATE. The first two keywords can be in any order, but must come before the order_start keyword. Any number of the
first two keywords can be specified.
collating-element <symbol> from string Defines a multi-character collating element,
symbol, composed of the characters in string. String is limited to two characters.
collating-symbol <symbol> Makes symbol a collating symbol which can be used to define a place in the collating sequence. Symbol does
not represent any actual character.
order_start Denotes the start of the collation sequence.
The directives have an effect on string collation.
The lines following the order_start keyword and before the order_end keyword contain collating element entries, one per line.
Operands can optionally appear after the order_start keyword to define rules for string comparison using a multiple-weight
scheme (if no operands are specified, a single forward operand is assumed). The possible operands are:
forward Specifies that comparison operations proceed from the start of the string towards
the end of it.
backward Specifies that comparison operations proceed from the end of the string towards
the beginning of it.
order_end Marks the end of the list of collating element entries.
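The effect of the forward and backward operands on a multiple-weight comparison can be illustrated with a small sketch. This is not how
localedef or the C library is implemented; the weight table, the function names, and the two-level layout are hypothetical and exist
only to show that a later level is consulted when earlier levels tie, and that a backward level compares weights from the end of the
string.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical two-level weight table: character -> (first-pass, second-pass).
WEIGHTS: Dict[str, Tuple[int, int]] = {
    "a": (1, 1), "A": (1, 2),
    "b": (2, 1), "B": (2, 2),
}


def level_key(s: str, level: int, direction: str) -> List[int]:
    """Weights of s at one level, in the order that level compares them."""
    weights = [WEIGHTS[ch][level] for ch in s]
    return weights[::-1] if direction == "backward" else weights


def compare(s1: str, s2: str,
            directions: Sequence[str] = ("forward", "forward")) -> int:
    """Return -1, 0, or 1. Later levels are used only to break ties."""
    for level, direction in enumerate(directions):
        k1 = level_key(s1, level, direction)
        k2 = level_key(s2, level, direction)
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0
```

With both levels forward, "aB" sorts before "Ab" because the first differing second-pass weight is found at the start of the string;
with a backward second level the comparison starts at the end and the result is reversed.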
The LC_MONETARY category defines the rules and symbols used to format monetary numeric information. The following keywords belong to
this category and should come between the category tag and END LC_MONETARY.
int_curr_symbol The operand is a four-character string used to designate the international
currency symbol. The first three characters should contain the alphabetic international currency symbol in accordance
with those specified in the ISO 4217 standard. The fourth character is the character used to separate the
international currency symbol from the monetary quantity.
currency_symbol The operand is a string used as the local currency symbol.
mon_decimal_point The operand is a string containing the symbol used as the decimal
delimiter (radix character).
mon_thousands_sep The operand is a string containing the symbol used as a separator for
groups of digits to the left of the decimal delimiter.
mon_grouping The operand is a semicolon-separated list of integers.
The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following
integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any)
will be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping will
be performed.
positive_sign The operand is a string to indicate a non-negative monetary quantity.
negative_sign The operand is a string to indicate a negative monetary quantity.
int_frac_digits The operand is an integer representing the number of fractional digits
used in formatted monetary values using int_curr_symbol.
frac_digits The operand is an integer representing the number of fractional digits
used in formatted monetary values using currency_symbol.
p_cs_precedes The operand is an integer which, if set to 1, indicates that the currency_symbol
precedes a monetary quantity, and if set to 0, that the symbol succeeds the value.
p_sep_by_space The operand is an integer which indicates the separation of the currency_symbol,
the sign string, and the value for a non-negative formatted monetary quantity.
The values of p_sep_by_space and n_sep_by_space are interpreted according to the following:
0 No space separates the currency symbol and value.
1 If the currency symbol and sign string are adjacent, a space separates
them from the value; otherwise, a space separates the currency symbol from the value.
2 If the currency symbol and sign string are adjacent, a space separates them;
otherwise, a space separates the sign string from the value.
n_cs_precedes The operand is an integer which, if set to 1, indicates that the currency_symbol
precedes a negative monetary quantity, and if set to 0, that the symbol succeeds the negative value.
n_sep_by_space The operand is an integer which indicates the separation of the currency_symbol,
the sign string, and the value for a negative formatted monetary quantity.
p_sign_posn The operand is an integer which indicates the positioning of the positive_sign
for a positive monetary quantity. The possible values are:
0 Parentheses surround the quantity and the currency_symbol
or int_curr_symbol.
1 The sign string precedes the quantity and the currency_symbol
or int_curr_symbol.
2 The sign string succeeds the quantity and the currency_symbol
or int_curr_symbol.
3 The sign string precedes the currency_symbol
or int_curr_symbol.
4 The sign string succeeds the currency_symbol
or int_curr_symbol.
n_sign_posn The operand is an integer set to a value indicating the positioning of
the negative_sign for a negative formatted monetary quantity.
int_p_cs_precedes The operand is an integer which, if set to 1, indicates that the int_curr_symbol
precedes a monetary quantity, and if set to 0, that the symbol succeeds the value.
int_p_sep_by_space The operand is an integer which indicates the separation of the int_curr_symbol,
the sign string, and the value for a non-negative internationally formatted monetary quantity.
int_n_cs_precedes The operand is an integer which, if set to 1, indicates that the int_curr_symbol
precedes a negative monetary quantity, and if set to 0, that the symbol succeeds the negative value.
int_n_sep_by_space The operand is an integer which indicates the separation of the int_curr_symbol,
the sign string, and the value for a negative internationally formatted monetary quantity.
int_p_sign_posn The operand is an integer which indicates the positioning of the positive_sign
for a positive monetary quantity formatted with the international format.
int_n_sign_posn The operand is an integer which indicates the positioning of the negative_sign
for a negative monetary quantity formatted with the international format.
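The interaction between the currency-symbol placement, space separation, and sign position settings is easier to follow with a worked
sketch. The function below is illustrative only and is not the algorithm used by any particular formatting routine; its name and
parameters are hypothetical, and the parameter names follow the POSIX keyword suffixes (cs_precedes, sep_by_space, sign_posn). Two
points are assumptions rather than statements from this manual: with sign position 0 a space is inserted only for separation value 1,
and an empty sign string is treated as if no sign were present.

```python
def format_money(value: str, symbol: str, sign: str,
                 cs_precedes: int, sep_by_space: int, sign_posn: int) -> str:
    """Arrange an already-formatted value, a currency symbol, and a sign."""
    if sep_by_space not in (0, 1, 2) or sign_posn not in (0, 1, 2, 3, 4):
        raise ValueError("unsupported sep_by_space or sign_posn")
    if sign_posn == 0:
        # Parentheses replace the sign string (assumption: only 1 adds a space).
        sep = " " if sep_by_space == 1 else ""
        body = symbol + sep + value if cs_precedes else value + sep + symbol
        return "(" + body + ")"
    # Lay out the three parts in display order.
    pair = ["sym", "val"] if cs_precedes else ["val", "sym"]
    if sign_posn == 1:
        parts = ["sign"] + pair
    elif sign_posn == 2:
        parts = pair + ["sign"]
    elif sign_posn == 3:
        parts = ["sign", "sym", "val"] if cs_precedes else ["val", "sign", "sym"]
    else:
        parts = ["sym", "sign", "val"] if cs_precedes else ["val", "sym", "sign"]
    if not sign:
        parts.remove("sign")  # assumption: an empty sign takes no position
    # Decide which neighbouring pair, if any, is separated by a space.
    gap = None
    if "sign" not in parts:
        if sep_by_space == 1:
            gap = {"sym", "val"}
    else:
        adjacent = abs(parts.index("sign") - parts.index("sym")) == 1
        if sep_by_space == 1:
            # When symbol and sign are adjacent, the value sits at one end
            # and the middle part is its only neighbour.
            gap = {"val", parts[1]} if adjacent else {"sym", "val"}
        elif sep_by_space == 2:
            gap = {"sym", "sign"} if adjacent else {"sign", "val"}
    text = {"sign": sign, "sym": symbol, "val": value}
    out = text[parts[0]]
    for prev, cur in zip(parts, parts[1:]):
        if gap == {prev, cur}:
            out += " "
        out += text[cur]
    return out
```

For example, a symbol that precedes the value with separation 1 and sign position 1 yields the sign, the symbol, a space, and then the
value; changing the separation to 2 moves the space between the sign and the symbol.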
The LC_NUMERIC category defines rules and symbols used to format non-monetary numeric information. The following keywords belong to
this category and should come between the category tag and END LC_NUMERIC.
decimal_point The operand is a string containing the symbol used as the decimal
delimiter (radix character) in numeric, non-monetary formatted quantities. This keyword cannot be omitted and
cannot be set to the empty string.
thousands_sep The operand is a string containing the symbol used as a separator
for groups of digits to the left of the decimal delimiter.
grouping The operand is a semicolon-separated list of integers.
The initial integer defines the size of the group immediately preceding the decimal delimiter, and the following
integers define the preceding groups. If the last integer is not -1, then the size of the previous group (if any)
will be repeatedly used for the remainder of the digits. If the last integer is -1, then no further grouping will
be performed.
alt_digit String mapped into the ASCII
equivalent string "", where b is a blank (a langinfo(5) item). The alt_digit keyword is an HP extension to the POSIX
standards and has a different meaning than the alt_digits keyword defined in POSIX standards.
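The grouping rule described for the semicolon-separated list of integers (the grouping keyword here and mon_grouping in the monetary
category, to use the POSIX names) can be sketched as follows. The function name and signature are hypothetical; the logic follows the
rule as stated: sizes are consumed from the decimal delimiter leftwards, the last size repeats unless the list ends in -1, and -1 stops
grouping.

```python
from typing import List


def group_digits(digits: str, grouping: List[int], sep: str) -> str:
    """Insert sep into the integer part of a number per a grouping list."""
    sizes = list(grouping)
    repeat = True
    if sizes and sizes[-1] == -1:
        sizes.pop()          # -1 ends the list: no further grouping
        repeat = False
    groups = []
    rest = digits
    i = 0
    while rest:
        if i < len(sizes):
            size = sizes[i]
        elif repeat and sizes:
            size = sizes[-1]     # reuse the previous group size
        else:
            size = len(rest)     # remaining digits form one group
        if size <= 0:
            size = len(rest)
        groups.append(rest[-size:])
        rest = rest[:-size]
        i += 1
    return sep.join(reversed(groups))
```

A list of 3 groups every three digits, 3;2 gives one group of three followed by groups of two, and 3;-1 separates only the last three
digits.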
The LC_TIME category defines the rules for generating locale-specific formatted date strings. The following mandatory keywords belong to
this category and should come between the category tag and END LC_TIME.
abday Seven semicolon-separated strings
giving abbreviated names for the days of the week beginning with Sunday.
day Seven semicolon-separated strings
giving full names for the days of the week beginning with Sunday.
abmon Twelve semicolon-separated strings giving abbreviated names for the months,
beginning with January.
mon Twelve semicolon-separated strings giving full names for the months,
beginning with January.
d_t_fmt The operand is a string defining the appropriate date and time
representation.
d_fmt The operand is a string defining the appropriate date
representation.
t_fmt The operand is a string defining the appropriate time
representation.
am_pm The operand is two semicolon-separated strings giving
the representations for AM and PM.
t_fmt_ampm The operand is a string defining the appropriate time representation
in the 12-hour clock format with am_pm.
era The operand is a semicolon-separated list of strings. Each string
defines the name and date of an era or emperor for a locale. Each string should conform to the following format:
direction:offset:start_date:end_date:name:format
where:
direction Either a + or - character. The + character indicates the time axis should be such that the years count
in the positive direction when moving from the starting date towards the ending date. The - character
indicates the time axis should be such that the years count in the negative direction when
moving from the starting date towards the ending date.
offset A number in the range indicating the number of the first year of the era.
start_date A date in the form where yyyy, mm, and dd are the year, month and day numbers, respectively, of
the start of the era. Years prior to the year 0 A.D. are represented as negative numbers. For
example, an era beginning March 5th in the year 100 B.C. would be represented as Years in the
range are supported.
end_date The ending date of the era in the same form as the start_date above or one of the two special
values -* or +*. A value of -* indicates the ending date of the era extends to the beginning of time while
+* indicates it extends to the end of time. The ending date can be chronologically either before or
after the starting date of an era. For example, the expressions for the Christian eras A.D. and
B.C. would be:
name A string representing the name of the era which is substituted for the directive of and (see
date(1) and strftime(3C)).
format A string for formatting the directive of and This string is usually a function of the and direc-
tives. If format is not specified, the string specified for the category keyword (see below) is
used as a default.
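The colon-separated era string can be taken apart with a few lines of code. The sketch below is illustrative and is not taken from any
HP-UX tool; the names Era, parse_date, and parse_era are hypothetical. It assumes the POSIX conventions for the parts of the format: a
direction of + or -, dates written as year/month/day with an optional leading minus on the year, and the special end dates -* and +*.
The era names used in the checks are made up.

```python
from typing import NamedTuple, Optional, Tuple, Union

Date = Tuple[int, int, int]


class Era(NamedTuple):
    direction: str
    offset: int
    start: Date
    end: Union[Date, str]   # a date, or the special value "-*" or "+*"
    name: str
    fmt: Optional[str]      # None when the format field is omitted


def parse_date(text: str) -> Date:
    """Parse year/month/day; the year may be negative."""
    year, month, day = text.split("/")
    return int(year), int(month), int(day)


def parse_era(spec: str) -> Era:
    """Split direction:offset:start_date:end_date:name:format."""
    fields = spec.split(":", 5)
    if len(fields) < 5:
        raise ValueError("era string needs at least five fields")
    direction, offset, start, end, name = fields[:5]
    if direction not in ("+", "-"):
        raise ValueError("direction must be + or -")
    fmt = fields[5] if len(fields) == 6 and fields[5] else None
    end_value = end if end in ("-*", "+*") else parse_date(end)
    return Era(direction, int(offset), parse_date(start), end_value, name, fmt)
```

Splitting at most five times keeps any colon in the trailing format field intact, and an omitted format is reported as None so that a
caller can fall back to the default era format.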
era_d_fmt The operand is a string defining the format of date in era notation.
era_t_fmt The operand is a string defining the format of time in era notation.
era_d_t_fmt The operand is a string defining the format of date and
time in era notation.
alt_digits The operand is a semicolon-separated list of strings. The first
string is the alternative symbol corresponding to zero, the second string is the alternative symbol corresponding
to one, and so on. Note that if the HP-UX-proprietary keyword alt_digit has been specified in the same locale, the
first ten symbols should be identical for these two keywords.
In addition to the above, the following HP-UX-proprietary keywords are recognized (these are provided for backward compatibility and
their use is otherwise not recommended):
The LC_MESSAGES category defines the format and values for affirmative and negative responses. The following keywords belong to this
category and should come between the category tag and END LC_MESSAGES.
yesexpr The string operand is
an Extended Regular Expression matching acceptable affirmative responses to yes/no queries.
noexpr The string operand is
an Extended Regular Expression matching acceptable negative responses to yes/no queries.
yesstr The string operand identifies the affirmative response for yes/no questions.
This keyword is now obsolete and yesexpr should be used instead.
nostr The string operand identifies the negative response for yes/no questions.
This keyword is now obsolete and noexpr should be used instead.
Keyword Operands
Keyword operands consist of character-code constants and symbols, strings, and metacharacters. The types of legal expressions are: and
operands consist of single character-code constants or symbolic names
separated by semicolons, or a character-code range consisting of a constant or symbolic name followed by an ellipsis fol-
lowed by another constant or symbolic name. The constant preceding the ellipsis must have a smaller code value than the
constant following the ellipsis. A range represents a set of consecutive character codes. If the list is longer than a
single line, the escape character must be used at the end of each line as a continuation character. It is an error to use
any symbolic name that is not defined in an accompanying charmap file (see charmap(4)).
operands consist of strings separated by semicolons. If longer than one line, the escape character must be used for continuation.
operands consist of a sequence of zero or more characters
surrounded by double quotes ("). Within a string, the double-quote character must be preceded by an escape character.
The following escape sequences also can be used:
\n newline
\t horizontal tab
\b backspace
\r carriage return
\f form feed
\\ backslash
\' single quote
\ddd bit pattern
The \ddd escape consists of the escape character followed by 1, 2, or 3 octal digits specifying the value of the
desired character (for other possible bit pattern specifications, see below). Also, an escape character (\) and an
immediately-following newline are ignored.
Although the backslash (\) has been used for illustration, another escape character can be substituted by the escape_char keyword.
Constants represent character codes in the operands.
They can be used in the following forms:
decimal constants An escape character followed by a d followed by up to three decimal digits.
octal constants An escape character followed by up to three octal digits.
hexadecimal constants An escape character followed by an x followed by two hexadecimal digits.
Unicode constants An escape character followed by a followed by four to eight hexadecimal digits which specifies a
Unicode scalar value in a charmap file to be used with the option of the command.
character constants A single character (for example, A) having the numerical value of the character in the machine's
character set.
symbolic names A string enclosed between < and > is a symbolic name. It is recommended that localedef input files be
written entirely in symbolic names, using a user-defined or system-supplied charmap file. This aids
portability of input files between different encoded character sets (see charmap(4)).
Symbolic names can be defined within a locale definition file by the collating-element and
collating-symbol keywords. These are not character constants. It is an error if such an internally
defined symbolic name collides with one defined in a charmap file.
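The numeric constant forms can be decoded mechanically. The following sketch is illustrative, and its function name is hypothetical.
It assumes the POSIX letters d for decimal and x for hexadecimal after the escape character, covers only the decimal, octal,
hexadecimal, and single-character forms, and uses Python's ord() as a stand-in for "the numerical value of the character in the
machine's character set", which agrees with ASCII for the characters shown.

```python
import re


def decode_constant(token: str, escape_char: str = "\\") -> int:
    """Return the character code denoted by one constant token."""
    esc = re.escape(escape_char)
    # Decimal: escape character, "d", then up to three decimal digits.
    m = re.fullmatch(esc + r"d([0-9]{1,3})", token)
    if m:
        return int(m.group(1), 10)
    # Hexadecimal: escape character, "x", then two hexadecimal digits.
    m = re.fullmatch(esc + r"x([0-9A-Fa-f]{2})", token)
    if m:
        return int(m.group(1), 16)
    # Octal: escape character, then up to three octal digits.
    m = re.fullmatch(esc + r"([0-7]{1,3})", token)
    if m:
        return int(m.group(1), 8)
    # Character constant: a single character stands for its own code.
    if len(token) == 1:
        return ord(token)
    raise ValueError("not a recognized constant: " + repr(token))
```

Passing a different escape_char models a file that redefines the escape character, so the same code value can be written with, for
example, a slash in place of the backslash.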
operands consists of one or more decimal digits separated by semicolons.
Character shift pair operands follow the keywords
toupper and tolower and must consist of two character-code constants enclosed by left and right parentheses and separated by a comma.
Each such character pair is separated from the next by a semicolon. For tolower, the first constant represents an uppercase
character and the second the corresponding lowercase character. For toupper, the first constant represents a lowercase character and
the second the corresponding uppercase character.
The keyword is followed by collating element entries, one per line, in ascending order by collating position. The collating
element entries have the form:
collation_element can be a character, a collating symbol enclosed in angle brackets representing a character or collating
element, the special symbol UNDEFINED, or an ellipsis (...).
A character stands for itself; a collating symbol can be a symbolic name for a character that is interpreted by the
charmap file, a multi-character collating element defined by a collating-element keyword, or a collating symbol defined by the
collating-symbol keyword.
The special symbol UNDEFINED specifies the collating position of any characters not explicitly defined by collating element entries.
For example, if some group of characters is to be omitted from the collation sequence and just collate after all defined
characters, a collating symbol might be defined before the keyword:
Then somewhere in the list of collating element entries:
Notice that there is no second weight. This means that on a second pass all characters collate by their encoded value.
An ellipsis is interpreted as a list of characters with an encoded value higher than that of the character on the preced-
ing line and lower than that on the following line. Because it is tied to encoded value of characters, the ellipsis is
inherently non-portable. If it is used, a warning is issued and no output generated unless the option was given.
The weight operands provide information about how the collating element is to be collated on first and subsequent passes.
Weight can be a two-character string, the special symbol or a collating element of any of the forms specified for collat-
ing_element except If there are no weights, the character is collating strictly by its position in the list. If there is
only one weight given, the character sorts by its relative position in the list on the second collation pass.
An equivalence class is defined by a series of collating element entries all having the same character or symbol in the
first weight position. For example, in many locales all forms of the character 'A' collate equal on the first pass. This
is represented in the collating element entries as:
Two-to-one collating elements are specified by collating-elements defined before the keyword. For example, the two-to-one
collating element in Spanish, would be defined before the keyword as
It would then be used in a collating element entry as
A one-to-two collating element is defined by having a two-character string in one of the weight positions. For example,
if the character collates equal to the pair "AE", the collating element entry would be:
A don't-care character is defined by the special symbol For example, the dash character, may be a don't care on the first
collation pass. The collating element entry is:
Symbols defined by the keyword can be used to indicate that a given character collates higher or lower than some position
in the sequence. For example if all characters with an encoded value less than that of are to collate lower than all
other characters on the first pass, and in relative order on the second pass, define a collating symbol before the key-
word:
The first two collating element entries are then:
This also illustrates the use of the ellipsis to indicate a range. The first ellipsis is interpreted as "all characters
in the encoded character set with a value lower than '0'"; the second ellipsis means that all characters in the range
defined by the first collate in relative order.
operands conform to
the Extended Regular Expressions specifications as described in regexp(5).
Metacharacters
Metacharacters are characters having a special meaning to localedef in operands. To escape the special meaning of these characters,
surround them with single quotes or precede them by an escape character. localedef metacharacters include:
< Indicates the beginning of a symbolic name.
> Indicates the end of a symbolic name.
( Indicates the beginning of a character shift pair following the
toupper and tolower keywords.
) Indicates the end of a character shift pair.
, Used to separate the characters of a character shift pair.
" Used to quote strings.
; Used as a separator in list operands.
escape character
Used to escape special meaning from other metacharacters and itself. It is backslash (\) by default, but can be redefined
by the escape_char keyword.
Comments
Comments are lines beginning with a comment character. The comment character is pound sign (#) by default, but can be redefined by the
comment_char keyword. Comments and blank lines are ignored.
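The comment and line-continuation rules can be combined into a small pre-processing step. The sketch below is illustrative and is not
localedef's own reader; the function name is hypothetical. It assumes that comment and blank lines are skipped only outside a
continued line, and it makes no attempt to honor a comment_char or escape_char keyword found in the input or to treat a doubled
escape character at the end of a line specially.

```python
from typing import Iterable, Iterator


def logical_lines(lines: Iterable[str],
                  comment_char: str = "#",
                  escape_char: str = "\\") -> Iterator[str]:
    """Yield logical lines: comments dropped, continued lines joined."""
    pending = ""
    for raw in lines:
        line = raw.rstrip("\n")
        # A comment character counts only in the first column.
        if not pending and (not line.strip() or line.startswith(comment_char)):
            continue
        if line.endswith(escape_char):
            pending += line[:-1]     # drop the escape and keep reading
            continue
        yield pending + line
        pending = ""
    if pending:
        yield pending                # input ended inside a continuation
```

Feeding it the lines of a definition file yields one string per logical line, which can then be split into a keyword and its operands.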
Separators
Separator characters include blanks and tabs. Any number of separators can be used to delimit the keywords, metacharacters, constants,
and strings that comprise a localedef script, except that all characters between < and > are considered to be part of the symbolic name
even if they are <blank>s.
EXAMPLES
Please see the files under for examples of locale description files. These files were used to create the various locales which are deliv-
ered with HP-UX.