debian man page for slmbuild

SLMBUILD(1)						User Contributed Perl Documentation					       SLMBUILD(1)

NAME
slmbuild - generate language model from idngram file
SYNOPSIS
slmbuild [option]... idngram_file...
DESCRIPTION
slmbuild generates a back-off smoothing language model from a given idngram file. Generally, the idngram_file is created by ids2ngram.
OPTIONS
All the following options are mandatory.
-n, --NMax N
    1 for unigram, 2 for bigram, 3 for trigram. Any number outside the range 1..3 is invalid.
-o, --out output-file
    Specify the output file name.
-l, --log
    Use -log(pr); pr is used directly by default.
-w, --wordcount N
    Lexicon size, i.e. the number of different words.
-b, --brk id...
    Set the ids which should be treated as breakers.
-e, --e id...
    Set the ids which should not be put into the LM.
-c, --cut c...
    k-grams whose freq <= c[k] are dropped.
-d, --discount method,param...
    The k-th -d parameter specifies the discount method for k-grams. Possible values for method/param are:
    GT,R,dis  : GT discount for r <= R, where r is the frequency of an n-gram. Linear discount for those r > R, i.e. r'=r*dis, 0 << dis < 1.0, for example 0.999.
    ABS,[dis] : Absolute discount r'=r-dis. dis is optional; 0 << dis < cut[k]+1.0, normally dis < 1.0.
    LIN,[dis] : Linear discount r'=r*dis. dis is optional; 0 < dis < 1.0.
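The ABS and LIN formulas described under -d can be illustrated with a few lines of Python. This is only a sketch of the arithmetic r'=r-dis and r'=r*dis as stated above, not slmbuild's actual implementation; the function names are hypothetical, and the Good-Turing part of GT is deliberately left out.

```python
def linear_discount(r, dis):
    """Linear discount r' = r * dis, with 0 < dis < 1.0 (as documented for LIN)."""
    if not 0.0 < dis < 1.0:
        raise ValueError("dis must satisfy 0 < dis < 1.0")
    return r * dis


def absolute_discount(r, dis, cut):
    """Absolute discount r' = r - dis, with 0 < dis < cut + 1.0 (as documented for ABS)."""
    if not 0.0 < dis < cut + 1.0:
        raise ValueError("dis must satisfy 0 < dis < cut + 1.0")
    return r - dis


# Hypothetical counts, chosen only to show the arithmetic.
print(linear_discount(1000, 0.5))      # discounted count under LIN
print(absolute_discount(5, 0.5, 3))    # discounted count under ABS with cut-off 3
```

Since -c drops k-grams whose frequency is <= c[k], every surviving count is at least cut[k]+1, so the documented bound dis < cut[k]+1.0 keeps the discounted count positive.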
NOTE
-n must be given before -c and -b. -c must give the right number of cut-offs, and -d must appear exactly N times, specifying the discounts for 1-gram, 2-gram, ..., respectively. BREAKER-IDs could be SentenceTokens or ParagraphTokens. Conceptually, these ids have no meaning when they appear in the middle of an n-gram. EXCLUDE-IDs could be ambiguous ids. Conceptually, n-grams which contain those ids are meaningless. We cannot erase n-grams according to BREAKER-IDs and EXCLUDE-IDs directly from the IDNGRAM file, because some low-level information in it is still useful.
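The counting rules in this note can be checked before invoking the tool. The following is a minimal sketch, assuming that "the right number" of cut-offs means one value per n-gram level (as in the EXAMPLE section); check_options is a hypothetical helper and not part of slmbuild.

```python
def check_options(n, cutoffs, discounts):
    """Return a list of problems; an empty list means the counts are consistent."""
    problems = []
    if n not in (1, 2, 3):
        problems.append("-n must be 1, 2 or 3")
    # Assumption: one cut-off per n-gram level, i.e. exactly n values for -c.
    if len(cutoffs) != n:
        problems.append(f"-c needs {n} cut-off value(s), got {len(cutoffs)}")
    # -d must appear exactly n times, one per level (1-gram, 2-gram, ...).
    if len(discounts) != n:
        problems.append(f"-d must appear {n} time(s), got {len(discounts)}")
    return problems


print(check_options(3, [0, 3, 2], ["GT,8,0.9995", "ABS", "ABS"]))
print(check_options(3, [0, 3], ["ABS"]))
```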
EXAMPLE
The following example reads 'all.id3gram' and writes the trigram model 'all.slm'. At the 1-gram level, it uses Good-Turing discount with cut-off 0, R=8, dis=0.9995. At the 2-gram level, it uses Absolute discount with cut-off 3, dis auto-calculated. At the 3-gram level, it uses Absolute discount with cut-off 2, dis auto-calculated. Word ids 10, 11 and 12 are breakers (sentence/paragraph/paper breakers, etc.). The Exclude-ID is 9. The lexicon contains 200000 words. The resulting language model uses -log(pr).
    slmbuild -l -n 3 -o all.slm -w 200000 -c 0,3,2 -d GT,8,0.9995 -d ABS -d ABS -b 10,11,12 -e 9 all.id3gram
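When slmbuild is driven from a script, the same command line can be assembled programmatically. The sketch below only builds and prints the argument list for the example above; build_slmbuild_argv is a hypothetical helper, and it places -n ahead of -c and -b as the NOTE section requires. Passing the list to subprocess.run would execute it, assuming slmbuild is installed.

```python
import shlex


def build_slmbuild_argv(idngram_file, out_file, n, wordcount,
                        cutoffs, discounts, breakers, excludes, use_log=True):
    """Assemble an slmbuild argument list using only the documented options."""
    argv = ["slmbuild"]
    if use_log:
        argv.append("-l")
    # -n comes first so that it precedes -c and -b.
    argv += ["-n", str(n), "-o", out_file, "-w", str(wordcount)]
    argv += ["-c", ",".join(str(c) for c in cutoffs)]
    for d in discounts:  # one -d per n-gram level
        argv += ["-d", d]
    argv += ["-b", ",".join(str(i) for i in breakers)]
    argv += ["-e", ",".join(str(i) for i in excludes)]
    argv.append(idngram_file)
    return argv


argv = build_slmbuild_argv(
    "all.id3gram", "all.slm", n=3, wordcount=200000,
    cutoffs=[0, 3, 2], discounts=["GT,8,0.9995", "ABS", "ABS"],
    breakers=[10, 11, 12], excludes=[9],
)
print(shlex.join(argv))
```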
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO
ids2ngram(1), slmprune(1).

perl v5.14.2						2012-06-09					       SLMBUILD(1)