LANGIDENT(1p)         User Contributed Perl Documentation        LANGIDENT(1p)

NAME
       langident - identifies the language files are written in

SYNOPSIS
       langident [OPTIONS] file1 [file2 ...]

DESCRIPTION
       Identifies the language files are written in using the Perl module
       Lingua::Identify.

   OPTIONS
       -a  Show all results (not just the most probable language).

       -c  Show the confidence level for the most probable language (it is
           the first value right after the most probable language).

       -d  Debug (development only).

       -E ENCODING
           Select an input encoding. Defaults to UTF-8.

               # use ISO-8859-1 (latin1)
               langident -E ISO-8859-1 file

       -e METHODS
           Select the method(s) to use. There are three ways of doing this:

               # simply using a method
               langident -e ngrams3 file

               # using several methods (separate them with a comma)
               langident -e prefixes3,suffixes3

               # using several methods and assigning a different weight to each of them
               langident -e smallwords=2,prefixes=1,ngrams3=1.3

           The available methods are the following: smallwords, prefixes1,
           prefixes2, prefixes3, prefixes4, suffixes1, suffixes2, suffixes3,
           suffixes4, ngrams1, ngrams2, ngrams3 and ngrams4.

       -h  Display help message and exit.

       -l  List all available languages and exit.

       -m NUMBER
           Set the maximum number of results (languages) to display (shows
           the N most probable languages, in descending order of
           probability). Overrides the -a switch.

       -o LANGUAGES
           Only work with the specified languages.

               # identify between Portuguese and English only
               langident -o pt,en *

       -p  Also show percentages.

       -s SIZE
           Maximum size to examine.

       -v  Show version and exit.

EXAMPLES
       Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of
       ngrams1 (-e switch); the output will include the three most probable
       languages (-m switch) with their percentages (-p switch) and also the
       confidence level (-c switch) of the first result.

           $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
           README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
           $

TO DO
       o   Add a switch to ignore HTML tags (and maybe other formats too)

SEE ALSO
       Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3),
       Text::Affixes(3).

       A linguist and/or a shrink.

       The latest CVS version of "Lingua::Identify" (which includes
       langident) can be obtained at
       http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

       ISO 639 Language Codes, at
       http://www.w3.org/WAI/ER/IG/ert/iso639.htm

AUTHOR
       Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE
       Copyright 2004 by Jose Alves de Castro

       This library is free software; you can redistribute it and/or modify
       it under the same terms as Perl itself.

perl v5.14.2                      2010-05-21                     LANGIDENT(1p)
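When langident is used from a script, the file name and the most probable language can be pulled out of each result line. The following is only a minimal sketch: the helper name top_language is hypothetical (it is not part of langident), and it assumes the "FILE:LANG value ..." layout shown in EXAMPLES, with no colon in the file name.

```shell
# Hypothetical helper (not part of langident): print the file name and the
# most probable language from one result line.
# Assumes the "FILE:LANG value ..." layout shown in EXAMPLES and a file
# name that contains no colon.
top_language() {
    line=$1
    file=${line%%:*}    # everything before the first colon
    rest=${line#*:}     # everything after the first colon
    lang=${rest%% *}    # first space-delimited field, i.e. the language code
    printf '%s\t%s\n' "$file" "$lang"
}

# Sample line copied from the EXAMPLES section
top_language "README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505"
```

In a pipeline, the same function could be fed each line of langident's output in a "while read" loop; whether that is robust depends on the output staying in the layout assumed here.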