debian man page for mmseg

Query: mmseg

OS: debian

Section: 1

Format: Original Unix Latex Style Formatted with HTML and a Horizontal Scroll Bar

MMSEG(1)						User Contributed Perl Documentation						  MMSEG(1)

NAME
mmseg - maximum matching segment Chinese text.
SYNOPSIS
mmseg -d dict_file [option]... [corpus_file]...
DESCRIPTION
mmseg is a tool for segmenting Chinese text into words using maximum matching algorithm. mmseg segments corpus_file, or standard input if no filename is specified, and write the segmented result to standard output.
OPTIONS
-d dict_file Use dict_file as lexicon. A default lexicon can be found at /usr/share/sunpinyin-slm/dict.utf8. -f,--format (text|bin) Output Format, can be 'text' or 'bin'. default 'bin'. Normally, in text mode, word text are output, while in binary mode, binary short integer of the word-ids are written to stdout. -s, --stok STOK_ID Sentence token id. Default 10. It will be written to output in binary mode after every sentence. -i, --show-id Show Id info. Under text output format mode, attach id after known words. If under binary mode, print id(s) in text. -a, --ambiguious-id AMBI-ID Ambiguious means ABC => A BC or AB C. If specified (AMBI-ID != 0), The sequence ABC will not be segmented, in binary mode, the AMBI-ID is written out; in text mode, "<ambi>ABC</ambi>" will be output. Default is 0.
NOTES
Under binary mode, consecutive id of 0 are merged into one 0. Under text mode, no space are inserted between unknown-words.
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO
slmseg(1), ids2ngram (1). perl v5.14.2 2012-06-09 MMSEG(1)
Related Man Pages
asn2asn(1) - debian
catod(1) - debian
prezip-bin(1) - debian
tslmendian(1) - debian
prezip-bin(1) - suse
Similar Topics in the Unix Linux Community
Removing spaces at particular position
Word search in awk
remove trailing newline characters
Check whether the pattern is present or not?
Removing text from a line in a file