langident(1p) debian man page

LANGIDENT(1p)						User Contributed Perl Documentation					     LANGIDENT(1p)

NAME
       langident - identifies the language files are written in

SYNOPSIS
	 langident [OPTIONS] file1 [file2 ...]

DESCRIPTION
       Identifies the language that files are written in, using the Perl module Lingua::Identify.

   OPTIONS
   -a
       Show all results (not just the most probable language).

   -c
       Show the confidence level for the most probable language. It is printed as the first value right after that language.

   -d
       Debug (development only).

   -E ENCODING
       Select an input encoding. Defaults to UTF-8.

	 # use ISO-8859-1 (latin1)
	 langident -E ISO-8859-1 file

   -e METHODS
       Select the method(s) to use. There are three ways of doing this:

	 # using a single method
	 langident -e ngrams3 file

	 # using several methods (separate them with commas)
	 langident -e prefixes3,suffixes3 file

	 # using several methods, assigning a different weight to each
	 langident -e smallwords=2,prefixes=1,ngrams3=1.3 file

       The available methods are the following: smallwords, prefixes1, prefixes2, prefixes3, prefixes4, suffixes1, suffixes2, suffixes3,
       suffixes4, ngrams1, ngrams2, ngrams3 and ngrams4.
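
       The METHODS syntax can be illustrated with a small shell function that splits such a string into method/weight pairs. This is a
       minimal sketch, not langident's own parser: parse_methods is a hypothetical helper, and it assumes that a method given without
       "=weight" counts with weight 1 (which is what the EXAMPLES section implies for "ngrams2=2,ngrams1").

```shell
# Hypothetical helper: print one "method weight" pair per line
# for a METHODS string in the form accepted by -e.
parse_methods() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F= '
        NF == 1 { print $1, 1; next }   # no "=weight": assume weight 1
                { print $1, $2 }'
}

parse_methods 'ngrams2=2,ngrams1'
```

       The function does not check method names against the list above; it only shows how the comma and "=" separators are read.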

   -h
       Display help message and exit.

   -l
       List all available languages and exit.

   -m NUMBER
       Set the maximum number of results (languages) to display (shows the NUMBER most probable languages, in descending order of probability).

       Overrides the -a switch.

   -o LANGUAGES
       Only work with specified languages.

	 # identify between Portuguese and English only
	 langident -o pt,en *

   -p
       Also show percentages.

   -s SIZE
       Maximum size to examine.

   -v
       Show version and exit.

EXAMPLES
       Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of ngrams1 (-e switch). The output includes the three most probable
       languages (-m switch) with their percentages (-p switch), plus the confidence level (-c switch) of the first result, which is
       printed immediately after the first language code.

	 $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
	 README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
	 $
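
       The sample line above can be split into labelled fields with a small shell function. This is a minimal sketch, not part of
       langident: parse_langident_line is a hypothetical helper, and it assumes the layout produced by -c -p as shown in the example
       (FILE:LANG CONFIDENCE PERCENT, followed by LANG PERCENT pairs) and a file name without spaces or colons.

```shell
# Hypothetical helper: split one line of "langident -c -p" output
# into labelled fields. Assumed layout (taken from the example):
#   FILE:LANG CONFIDENCE PERCENT [LANG PERCENT]...
parse_langident_line() {
    printf '%s\n' "$1" | awk '{
        split($1, head, ":")            # "README:en" -> file, language
        printf "file=%s\n", head[1]
        printf "confidence=%s\n", $2    # first value after the top language
        printf "%s=%s\n", head[2], $3   # percentage of the top language
        for (i = 4; i < NF; i += 2)     # remaining LANG PERCENT pairs
            printf "%s=%s\n", $i, $(i + 1)
    }'
}

parse_langident_line 'README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505'
```

       In this sample, the confidence value 65.72 equals 100 * 7.897 / (7.897 + 4.119), that is, the top percentage divided by the sum
       of the top two. This is an observation about these particular numbers, not a formula documented on this page.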

TO DO
       o     Add a switch to ignore HTML tags (and maybe other formats too)

SEE ALSO
       Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3), Text::Affixes(3).

       A linguist and/or a shrink.

       The latest CVS version of "Lingua::Identify" (which includes langident) can be obtained at
       http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

       ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

AUTHOR
       Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE
       Copyright 2004 by Jose Alves de Castro

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.14.2							    2010-05-21							     LANGIDENT(1p)