Unix/Linux Go Back    


CentOS 7.0 - man page for gendict (centos section 1)

Linux & Unix Commands - Search Man Pages
Man Page or Keyword Search:   man
Select Man Page Set:       apropos Keyword Search (sections above)


GENDICT(1)				ICU 50.1.2 Manual			       GENDICT(1)

NAME
       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS
       gendict	[ --uchars | --bytes --transform transform ] [ -h, -?, --help ] [ -V, --version ]
       [ -c, --copyright ] [ -v, --verbose ] [ -i, --icudatadir directory ]    input-file    out-
       put-file

DESCRIPTION
       gendict	reads  the  word  list	from dictionary-file and creates a string trie dictionary
       file. Normally this data file has the .dict extension.

       Words begin at the beginning of a line and are terminated by the first whitespace.   Lines
       that begin with whitespace are ignored.

OPTIONS
       -h, -?, --help
	      Print help about usage and exit.

       -V, --version
	      Print the version of gendict and exit.

       -c, --copyright
	      Embeds the standard ICU copyright into the output-file.

       -v, --verbose
	      Display extra informative messages during execution.

       -i, --icudatadir directory
	      Look  for  any  necessary  ICU  data  files  in  directory.   For example, the file
	      pnames.icu must be located when ICU's data is not built as a shared  library.   The
	      default ICU data directory is specified by the environment variable ICU_DATA.  Most
	      configurations of ICU do not require this argument.

       --uchars
	      Set the output trie type to UChar. Mutually exclusive with --bytes.

       --bytes
	      Set the output trie type to Bytes. Mutually exclusive with --uchars.

       --transform
	      Set the transform type. Should only be specified with --bytes.  Currently supported
	      transforms are: offset-<hex-number>, which specifies an offset to subtract from all
	      input characters.  It should be noted that the offset transform also maps U+200D to
	      0xFF  and U+200C to 0xFE, in order to offer compatibility to languages that require
	      these characters.  A transform must be specified for a bytes trie, and when applied
	      to  the non-value characters in the input-file must produce output between 0x00 and
	      0xFF.

	input-file
	      The source file to read.

	output-file
	      The file to write the output dictionary to.

CAVEATS
       The input-file is assumed to be encoded in UTF-8.  The integers in the input-file that are
       used  as  values  must be made up of ASCII digits. They may be specified either in hex, by
       using a 0x prefix, or in decimal.  Either --bytes or --uchars must be specified.

ENVIRONMENT
       ICU_DATA  Specifies the directory containing ICU data. Defaults to /usr/share/icu/50.1.2/.
		 Some  tools  in  ICU  depend  on  the presence of the trailing slash. It is thus
		 important to make sure that it is present if ICU_DATA is set.

AUTHORS
       Maxime Serrano

VERSION
       1.0

COPYRIGHT
       Copyright (C) 2012 International Business Machines Corporation and others

SEE ALSO
       http://www.icu-project.org/userguide/boundaryAnalysis.html

ICU MANPAGE				   1 June 2012				       GENDICT(1)
Unix & Linux Commands & Man Pages : ©2000 - 2018 Unix and Linux Forums


All times are GMT -4. The time now is 11:27 PM.