NetBSD 6.1.5 - man page for nls (netbsd section 7)


NLS(7)			       BSD Miscellaneous Information Manual			   NLS(7)

NAME
     NLS -- Native Language Support Overview

DESCRIPTION
     Native Language Support (NLS) provides commands for a single worldwide operating system
     base.  An internationalized system has no built-in assumptions or dependencies on language-
     specific or cultural-specific conventions such as:

	   o   Character classifications
	   o   Character comparison rules
	   o   Character collation order
	   o   Numeric and monetary formatting
	   o   Date and time formatting
	   o   Message-text language
	   o   Character sets

     All information pertaining to cultural conventions and language is obtained at program run
     time.

     ``Internationalization'' (often abbreviated ``i18n'') refers to the operation by which sys-
     tem software is developed to support multiple cultural-specific and language-specific con-
     ventions.	This is a generalization process by which the system is freed from assuming only
     English strings or other English-specific conventions.  ``Localization'' (often abbreviated
     ``l10n'') refers to the operations by which the user environment is customized to handle its
     input and output appropriate for specific language and cultural conventions.  This is a spe-
     cialization process, by which generic methods already implemented in an internationalized
     system are used in specific ways.	The formal description of cultural conventions for some
     country, together with all associated translations targeted to the native language, is
     called the ``locale''.

     NetBSD provides extensive support to programmers and system developers to enable interna-
     tionalized software to be developed.  NetBSD also supplies a large variety of locales for
     system localization.

   Localization of Information
     All locale information is accessible to programs at run time so that data is processed and
     displayed correctly for specific cultural conventions and language.

     A locale is divided into categories.  A category is a group of language-specific and cul-
     ture-specific conventions as outlined in the list above.  ISO C and POSIX together specify
     the following six standard categories supported by NetBSD (LC_MESSAGES comes from POSIX):

     LC_COLLATE     string-collation order information
     LC_CTYPE	    character classification, case conversion, and other character attributes
     LC_MESSAGES    the format for affirmative and negative responses
     LC_MONETARY    rules and symbols for formatting monetary numeric information
     LC_NUMERIC     rules and symbols for formatting nonmonetary numeric information
     LC_TIME	    rules and symbols for formatting time and date information
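
     As a minimal sketch of how a program can inspect these categories, the following C frag-
     ment asks setlocale(3) which locale is in effect for each one.  The helper names
     category_locale() and print_categories() are illustrative only and are not part of any
     NetBSD interface.

```c
#include <locale.h>
#include <stdio.h>

/* The six categories listed above, paired with their names. */
struct category {
    int id;
    const char *name;
};

static const struct category categories[] = {
    { LC_COLLATE,  "LC_COLLATE"  },
    { LC_CTYPE,    "LC_CTYPE"    },
    { LC_MESSAGES, "LC_MESSAGES" },
    { LC_MONETARY, "LC_MONETARY" },
    { LC_NUMERIC,  "LC_NUMERIC"  },
    { LC_TIME,     "LC_TIME"     },
};

/* Return the locale currently in effect for one category.
 * Passing NULL as the locale makes setlocale(3) a pure query. */
const char *category_locale(int id)
{
    return setlocale(id, NULL);
}

/* Print one "NAME=locale" line per category.
 * Returns the number of categories printed. */
int print_categories(FILE *out)
{
    size_t i, n = sizeof(categories) / sizeof(categories[0]);

    for (i = 0; i < n; i++)
        fprintf(out, "%s=%s\n", categories[i].name,
            category_locale(categories[i].id));
    return (int)n;
}
```

     A program that has not yet called setlocale(3) with a non-NULL locale runs in the C
     locale, so every category is reported as C until the program adopts another locale.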

     Localization of the system is achieved by setting appropriate values in environment vari-
     ables to identify which locale should be used.  The environment variables have the same
     names as their respective locale categories.  Additionally, the LANG, LC_ALL, and NLSPATH
     environment variables are used.  The NLSPATH environment variable specifies a colon-sepa-
     rated list of directory names where the message catalog files of the NLS database are
     located.  The LC_ALL and LANG environment variables also determine the current locale.

     The values of these environment variables are strings of the form:

	     language[_territory][.codeset][@modifier]

     Valid values for the language field come from the ISO 639 standard, which defines two-charac-
     ter codes for many languages.  Locale names use the codes in lower case (for example, da).
     Some common language codes are:

     Language Name	Code	   Language Family
     ABKHAZIAN		AB	   IBERO-CAUCASIAN
     AFAN (OROMO)	OM	   HAMITIC
     AFAR		AA	   HAMITIC
     AFRIKAANS		AF	   GERMANIC
     ALBANIAN		SQ	   INDO-EUROPEAN (OTHER)
     AMHARIC		AM	   SEMITIC
     ARABIC		AR	   SEMITIC
     ARMENIAN		HY	   INDO-EUROPEAN (OTHER)
     ASSAMESE		AS	   INDIAN
     AYMARA		AY	   AMERINDIAN
     AZERBAIJANI	AZ	   TURKIC/ALTAIC
     BASHKIR		BA	   TURKIC/ALTAIC
     BASQUE		EU	   BASQUE
     BENGALI		BN	   INDIAN
     BHUTANI		DZ	   ASIAN
     BIHARI		BH	   INDIAN
     BISLAMA		BI
     BRETON		BR	   CELTIC
     BULGARIAN		BG	   SLAVIC
     BURMESE		MY	   ASIAN
     BYELORUSSIAN	BE	   SLAVIC
     CAMBODIAN		KM	   ASIAN
     CATALAN		CA	   ROMANCE
     CHINESE		ZH	   ASIAN
     CORSICAN		CO	   ROMANCE
     CROATIAN		HR	   SLAVIC
     CZECH		CS	   SLAVIC
     DANISH		DA	   GERMANIC
     DUTCH		NL	   GERMANIC
     ENGLISH		EN	   GERMANIC
     ESPERANTO		EO	   INTERNATIONAL AUX.
     ESTONIAN		ET	   FINNO-UGRIC
     FAROESE		FO	   GERMANIC
     FIJI		FJ	   OCEANIC/INDONESIAN
     FINNISH		FI	   FINNO-UGRIC
     FRENCH		FR	   ROMANCE
     FRISIAN		FY	   GERMANIC
     GALICIAN		GL	   ROMANCE
     GEORGIAN		KA	   IBERO-CAUCASIAN
     GERMAN		DE	   GERMANIC
     GREEK		EL	   LATIN/GREEK
     GREENLANDIC	KL	   ESKIMO
     GUARANI		GN	   AMERINDIAN
     GUJARATI		GU	   INDIAN
     HAUSA		HA	   NEGRO-AFRICAN
     HEBREW		HE	   SEMITIC
     HINDI		HI	   INDIAN
     HUNGARIAN		HU	   FINNO-UGRIC
     ICELANDIC		IS	   GERMANIC
     INDONESIAN 	ID	   OCEANIC/INDONESIAN
     INTERLINGUA	IA	   INTERNATIONAL AUX.
     INTERLINGUE	IE	   INTERNATIONAL AUX.
     INUKTITUT		IU
     INUPIAK		IK	   ESKIMO
     IRISH		GA	   CELTIC
     ITALIAN		IT	   ROMANCE
     JAPANESE		JA	   ASIAN
     JAVANESE		JV	   OCEANIC/INDONESIAN
     KANNADA		KN	   DRAVIDIAN
     KASHMIRI		KS	   INDIAN
     KAZAKH		KK	   TURKIC/ALTAIC
     KINYARWANDA	RW	   NEGRO-AFRICAN
     KIRGHIZ		KY	   TURKIC/ALTAIC
     KIRUNDI		RN	   NEGRO-AFRICAN
     KOREAN		KO	   ASIAN
     KURDISH		KU	   IRANIAN
     LAOTHIAN		LO	   ASIAN
     LATIN		LA	   LATIN/GREEK
     LATVIAN		LV	   BALTIC
     LINGALA		LN	   NEGRO-AFRICAN
     LITHUANIAN 	LT	   BALTIC
     MACEDONIAN 	MK	   SLAVIC
     MALAGASY		MG	   OCEANIC/INDONESIAN
     MALAY		MS	   OCEANIC/INDONESIAN
     MALAYALAM		ML	   DRAVIDIAN
     MALTESE		MT	   SEMITIC
     MAORI		MI	   OCEANIC/INDONESIAN
     MARATHI		MR	   INDIAN
     MOLDAVIAN		MO	   ROMANCE
     MONGOLIAN		MN
     NAURU		NA
     NEPALI		NE	   INDIAN
     NORWEGIAN		NO	   GERMANIC
     OCCITAN		OC	   ROMANCE
     ORIYA		OR	   INDIAN
     PASHTO		PS	   IRANIAN
     PERSIAN (farsi)	FA	   IRANIAN
     POLISH		PL	   SLAVIC
     PORTUGUESE 	PT	   ROMANCE
     PUNJABI		PA	   INDIAN
     QUECHUA		QU	   AMERINDIAN
     RHAETO-ROMANCE	RM	   ROMANCE
     ROMANIAN		RO	   ROMANCE
     RUSSIAN		RU	   SLAVIC
     SAMOAN		SM	   OCEANIC/INDONESIAN
     SANGHO		SG	   NEGRO-AFRICAN
     SANSKRIT		SA	   INDIAN
     SCOTS GAELIC	GD	   CELTIC
     SERBIAN		SR	   SLAVIC
     SERBO-CROATIAN	SH	   SLAVIC
     SESOTHO		ST	   NEGRO-AFRICAN
     SETSWANA		TN	   NEGRO-AFRICAN
     SHONA		SN	   NEGRO-AFRICAN
     SINDHI		SD	   INDIAN
     SINGHALESE 	SI	   INDIAN
     SISWATI		SS	   NEGRO-AFRICAN
     SLOVAK		SK	   SLAVIC
     SLOVENIAN		SL	   SLAVIC
     SOMALI		SO	   HAMITIC
     SPANISH		ES	   ROMANCE
     SUNDANESE		SU	   OCEANIC/INDONESIAN
     SWAHILI		SW	   NEGRO-AFRICAN
     SWEDISH		SV	   GERMANIC
     TAGALOG		TL	   OCEANIC/INDONESIAN
     TAJIK		TG	   IRANIAN
     TAMIL		TA	   DRAVIDIAN
     TATAR		TT	   TURKIC/ALTAIC
     TELUGU		TE	   DRAVIDIAN
     THAI		TH	   ASIAN
     TIBETAN		BO	   ASIAN
     TIGRINYA		TI	   SEMITIC
     TONGA		TO	   OCEANIC/INDONESIAN
     TSONGA		TS	   NEGRO-AFRICAN
     TURKISH		TR	   TURKIC/ALTAIC
     TURKMEN		TK	   TURKIC/ALTAIC
     TWI		TW	   NEGRO-AFRICAN
     UIGUR		UG
     UKRAINIAN		UK	   SLAVIC
     URDU		UR	   INDIAN
     UZBEK		UZ	   TURKIC/ALTAIC
     VIETNAMESE 	VI	   ASIAN
     VOLAPUK		VO	   INTERNATIONAL AUX.
     WELSH		CY	   CELTIC
     WOLOF		WO	   NEGRO-AFRICAN
     XHOSA		XH	   NEGRO-AFRICAN
     YIDDISH		YI	   GERMANIC
     YORUBA		YO	   NEGRO-AFRICAN
     ZHUANG		ZA
     ZULU		ZU	   NEGRO-AFRICAN

     For example, the locale for the Danish language spoken in Denmark using the ISO 8859-1 char-
     acter set is da_DK.ISO8859-1.  The da stands for the Danish language and the DK stands for
     Denmark.  The short form of da_DK is sufficient to indicate this locale.
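
     The name format can be taken apart mechanically.  The following C fragment is a sketch
     of one way to split a locale name into its four fields; struct locale_name and
     parse_locale_name() are hypothetical names chosen for this example, and the field sizes
     are arbitrary assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Fields of language[_territory][.codeset][@modifier]; absent fields are "". */
struct locale_name {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy len bytes from src into dst, truncating to fit, and NUL-terminate. */
static void copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Split name into its fields.  Returns 0 on success, -1 if the
 * mandatory language field is empty. */
int parse_locale_name(const char *name, struct locale_name *out)
{
    const char *end = name + strlen(name);
    const char *p;

    memset(out, 0, sizeof(*out));

    /* The modifier is everything after the first '@'. */
    p = strchr(name, '@');
    if (p != NULL) {
        copy_field(out->modifier, sizeof(out->modifier), p + 1,
            (size_t)(end - (p + 1)));
        end = p;
    }

    /* The codeset lies between '.' and the modifier (or the end). */
    p = memchr(name, '.', (size_t)(end - name));
    if (p != NULL) {
        copy_field(out->codeset, sizeof(out->codeset), p + 1,
            (size_t)(end - (p + 1)));
        end = p;
    }

    /* The territory lies between '_' and the codeset (or the end). */
    p = memchr(name, '_', (size_t)(end - name));
    if (p != NULL) {
        copy_field(out->territory, sizeof(out->territory), p + 1,
            (size_t)(end - (p + 1)));
        end = p;
    }

    /* Whatever remains at the front is the language. */
    copy_field(out->language, sizeof(out->language), name,
        (size_t)(end - name));
    return out->language[0] != '\0' ? 0 : -1;
}
```

     Applied to da_DK.ISO8859-1, this yields the language da, the territory DK, the codeset
     ISO8859-1, and an empty modifier.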

     The environment variable settings are queried by their priority level in the following man-
     ner:

     o	 If the LC_ALL environment variable is set, all six categories use the locale it speci-
	 fies.

     o	 If the LC_ALL environment variable is not set, each individual category uses the locale
	 specified by its corresponding environment variable.

     o	 If the LC_ALL environment variable is not set, and a value for a particular LC_* envi-
	 ronment variable is not set, the value of the LANG environment variable specifies the
	 default locale for all categories.  Only the LANG environment variable should be set in
	 /etc/profile, since this makes it easiest for the user to override the system default
	 using the individual LC_* variables.

     o	 If the LC_ALL environment variable is not set, a value for a particular LC_* environment
	 variable is not set, and the value of the LANG environment variable is not set, the
	 locale for that specific category defaults to the C locale.  The C or POSIX locale
	 assumes the ASCII character set and defines information for the six categories.
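
     The priority rules above can be sketched in C as follows.  This is an illustration, not
     the code used by setlocale(3); resolve_locale() and effective_locale() are hypothetical
     names.  The sketch also assumes that a variable set to the empty string counts as not
     set, which the text above does not state.

```c
#include <stdlib.h>

/* Treat a NULL or empty value as "not set" (an assumption of this sketch). */
static const char *nonempty(const char *s)
{
    return (s != NULL && *s != '\0') ? s : NULL;
}

/* Apply the priority order: LC_ALL, then the category's own
 * variable, then LANG, then the C locale. */
const char *resolve_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
    const char *v;

    if ((v = nonempty(lc_all)) != NULL)
        return v;
    if ((v = nonempty(lc_category)) != NULL)
        return v;
    if ((v = nonempty(lang)) != NULL)
        return v;
    return "C";
}

/* Resolve one category from the process environment,
 * e.g. effective_locale("LC_TIME"). */
const char *effective_locale(const char *category_var)
{
    return resolve_locale(getenv("LC_ALL"), getenv(category_var),
        getenv("LANG"));
}
```

     Keeping the decision in resolve_locale(), separate from the getenv(3) calls, lets the
     rule be checked without modifying the environment.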

   Character Sets
     A character is any symbol used for the organization, control, or representation of data.  A
     group of such symbols used to describe a particular language makes up a character set.  It is
     the encoding values in a character set that provide the interface between the system and its
     input and output devices.

     The following character sets are supported in NetBSD:

     ASCII	      The American Standard Code for Information Interchange (ASCII) standard speci-
		      fies 128 Roman characters and control codes, encoded in a 7-bit character
		      encoding scheme.

     ISO 8859 family  Industry-standard character sets specified by the ISO/IEC 8859 standard.
		      The standard is divided into 15 numbered parts, with each part specifying
		      broad script similarities.  Examples include Western European, Central
		      European, Arabic, Cyrillic, Hebrew, Greek, and Turkish.  The character sets
		      use an 8-bit character encoding scheme which is compatible with the ASCII
		      character set.

     Unicode	      The Unicode character set is the full set of known abstract characters of
		      all real-world scripts.  It can be used in environments where multiple
		      scripts must be processed simultaneously.  Unicode is compatible with ISO
		      8859-1 (Western European) and ASCII.  Many character encoding schemes are
		      available for Unicode, including UTF-8, UTF-16 and UTF-32.  These encoding
		      schemes are multi-byte encodings.  The UTF-8 encoding scheme uses 8-bit,
		      variable-width encodings and is compatible with ASCII.	The UTF-16 encod-
		      ing scheme uses 16-bit, variable-width encodings.  The UTF-32 encoding
		      scheme uses 32-bit, fixed-width encodings.
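
     To make the variable-width property of UTF-8 concrete, the following C fragment sketches
     an encoder for a single Unicode code point.  utf8_encode() is a hypothetical helper
     written for this example; programs would normally rely on the locale-aware conversion
     functions instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into buf, which must hold at least
 * 4 bytes.  Returns the number of bytes written (1 to 4), or 0 if cp
 * is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {
        /* ASCII range: one byte, identical to the ASCII encoding. */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;       /* surrogates are not valid code points */
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

     The first branch is what makes UTF-8 compatible with ASCII: every code point below 128
     is stored as the same single byte that ASCII uses.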

   Font Sets
     A font set contains the glyphs to be displayed on the screen for a corresponding character
     in a character set.  A display must support a suitable font to display a character set.  If
     suitable fonts are available to the X server, then X clients can include support for differ-
     ent character sets.  xterm(1) includes support for Unicode with UTF-8 encoding.  xfd(1) is
     useful for displaying all the characters in an X font.

     The NetBSD wscons(4) console provides support for loading fonts using the wsfontload(8)
     utility.  Currently, only fonts for the ISO8859-1 family of character sets are supported.

   Internationalization for Programmers
     To facilitate translations of messages into various languages and to make the translated
     messages available to the program based on a user's locale, it is necessary to keep messages
     separate from the programs and provide them in the form of message catalogs that a program
     can access at run time.

     Access to locale information is provided through the setlocale(3) and nl_langinfo(3) inter-
     faces.  See their respective man pages for further information.
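
     A typical use of these two interfaces might look like the following sketch.  The func-
     tion name init_locale_codeset() is hypothetical, and falling back to the C locale on
     failure is a design choice of this example rather than required behaviour.

```c
#include <langinfo.h>
#include <locale.h>

/* Adopt the locale named by the environment (LC_ALL, LC_*, LANG) and
 * return the name of the codeset of the resulting LC_CTYPE locale. */
const char *init_locale_codeset(void)
{
    /* An empty locale string asks setlocale(3) to consult the environment. */
    if (setlocale(LC_ALL, "") == NULL) {
        /* The environment names a locale that is not available;
         * this sketch falls back to the C locale explicitly. */
        setlocale(LC_ALL, "C");
    }
    return nl_langinfo(CODESET);
}
```

     The returned string depends on the installed locales and on the environment, so it
     should be treated as informational and copied if it must outlive later locale changes.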

     Message source files containing application messages are created by the programmer and con-
     verted to message catalogs.  These catalogs are used by the application to retrieve and dis-
     play messages, as needed.

     NetBSD supports two message catalog interfaces: the X/Open catgets(3) interface and the Uni-
     forum gettext(3) interface.  The catgets(3) interface has the advantage that it belongs to a
     standard which is well supported.	Unfortunately the interface is complicated to use and
     maintenance of the catalogs is difficult.	The implementation also doesn't support different
     character sets.  The gettext(3) interface has not been standardized yet; however, it is being
     supported by an increasing number of systems.  It also provides many additional tools which
     make programming and catalog maintenance much easier.
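
     As a rough sketch of the catgets(3) style, the fragment below opens a catalog and looks
     up a message with a built-in fallback.  The wrapper names open_catalog() and message()
     are hypothetical, as is any catalog name passed to them; a real catalog would first be
     compiled from a message source file with gencat(1).

```c
#include <nl_types.h>

/* Open a message catalog, selecting the language from the LC_MESSAGES
 * category.  Returns (nl_catd)-1 if no catalog could be found. */
nl_catd open_catalog(const char *name)
{
    return catopen(name, NL_CAT_LOCALE);
}

/* Fetch message msg_id from set set_id.  If the catalog is missing or
 * does not contain the message, the fallback text is returned, so the
 * program still works without any catalog installed. */
const char *message(nl_catd cat, int set_id, int msg_id, const char *fallback)
{
    if (cat == (nl_catd)-1)
        return fallback;
    return catgets(cat, set_id, msg_id, fallback);
}
```

     A caller would keep the descriptor open while the returned strings are in use and
     release it with catclose(3) when finished, skipping that call if open_catalog()
     returned (nl_catd)-1.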

   Support for Multi-byte Encodings
     Some character sets with multi-byte encodings may be difficult to decode, or may contain
     state (i.e., adjacent characters are dependent).  ISO C specifies a set of functions using
     'wide characters' which can handle multi-byte encodings properly.	The behaviour of these
     functions is affected by the LC_CTYPE category of the current locale.

     A wide character is specified in ISO C as being a fixed number of bits wide and is state-
     less.  There are two types for wide characters: wchar_t and wint_t.  wchar_t is a type which
     can contain one wide character and operates like 'char' type does for one character.  wint_t
     can contain one wide character or WEOF (wide EOF).

     There are functions that operate on wchar_t, and substitute for functions operating on
     'char'.  See wmemchr(3) and towlower(3) for details.  There are some additional functions
     that operate on wchar_t.  See wctype(3) and wctrans(3) for details.
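
     For instance, the following sketch walks a wide string using the wide-character coun-
     terparts of isalpha(3) and tolower(3).  The helper name wcs_lower() is made up for this
     example; its results depend on the LC_CTYPE category of the current locale.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Convert a wide string to lower case in place.
 * Returns the number of alphabetic characters encountered. */
size_t wcs_lower(wchar_t *s)
{
    size_t alpha = 0;

    for (; *s != L'\0'; s++) {
        if (iswalpha((wint_t)*s))
            alpha++;
        /* towlower() takes and returns wint_t, like tolower() with int. */
        *s = (wchar_t)towlower((wint_t)*s);
    }
    return alpha;
}
```

     Because each wchar_t holds one complete character, the loop can step through the string
     one element at a time, which is not safe for a multi-byte encoded char string.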

     Wide characters should be used for all I/O processing which may rely on locale-specific
     strings.  The two primary approaches to using wide characters for I/O are:

	   o   All I/O is performed using multibyte characters.  Input data is converted into
	       wide characters immediately after reading and data for output is converted from
	       wide characters to multi-byte encoding immediately before writing.  Conversion is
	       handled by the mbstowcs(3), mbsrtowcs(3), wcstombs(3), wcsrtombs(3), mblen(3),
	       mbrlen(3), and mbsinit(3) functions.

	   o   Wide characters are used directly for I/O, using getwchar(3), fgetwc(3), getwc(3),
	       ungetwc(3), fgetws(3), putwchar(3), fputwc(3), putwc(3), and fputws(3).	They are
	       also used with the formatted I/O functions for wide characters, such as fwscanf(3),
	       wscanf(3), swscanf(3), fwprintf(3), wprintf(3), swprintf(3), vfwprintf(3),
	       vwprintf(3), and vswprintf(3), and with the wide-character conversion specifiers
	       %lc, %C, %ls, and %S of the conventional formatted I/O functions.
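
     The first approach can be sketched with mbstowcs(3) as follows.  The helper names
     mb_char_count() and mb_to_wide() are hypothetical.  The sketch assumes that calling
     mbstowcs(3) with a NULL destination returns the required length, which is the behaviour
     described by POSIX but is not stated in this page.

```c
#include <stdlib.h>
#include <wchar.h>

/* Count the characters (not bytes) in multi-byte string s under the
 * current LC_CTYPE locale.  Returns (size_t)-1 on an invalid sequence. */
size_t mb_char_count(const char *s)
{
    return mbstowcs(NULL, s, 0);
}

/* Convert s to a newly allocated wide string.  The caller frees the
 * result.  Returns NULL on an invalid sequence or allocation failure. */
wchar_t *mb_to_wide(const char *s)
{
    size_t n = mb_char_count(s);
    wchar_t *w;

    if (n == (size_t)-1)
        return NULL;
    w = malloc((n + 1) * sizeof(*w));
    if (w == NULL)
        return NULL;
    /* n + 1 leaves room for the terminating null wide character. */
    mbstowcs(w, s, n + 1);
    return w;
}
```

     A program would call setlocale(3) before these helpers so that the input is decoded
     according to the user's character set; in the C locale only ASCII input is portable.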

SEE ALSO
     gencat(1), xfd(1), xterm(1), catgets(3), gettext(3), nl_langinfo(3), setlocale(3),
     wsfontload(8)

BUGS
     This man page is incomplete.

BSD					February 21, 2007				      BSD