kinosearch1::analysis::polyanalyzer(3pm) [debian man page]

KinoSearch1::Analysis::PolyAnalyzer(3pm)		User Contributed Perl Documentation		  KinoSearch1::Analysis::PolyAnalyzer(3pm)

NAME
KinoSearch1::Analysis::PolyAnalyzer - multiple analyzers in series

SYNOPSIS
    my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        language => 'es',
    );

    # or...
    my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        analyzers => [
            $lc_normalizer,
            $custom_tokenizer,
            $snowball_stemmer,
        ],
    );

DESCRIPTION
A PolyAnalyzer is a series of Analyzers (objects which inherit from KinoSearch1::Analysis::Analyzer), each of which will be called upon to "analyze" text in turn. You can either provide the Analyzers yourself, or specify a supported language, in which case a PolyAnalyzer consisting of an LCNormalizer, a Tokenizer, and a Stemmer will be generated for you.

Supported languages:

    en => English
    da => Danish
    de => German
    es => Spanish
    fi => Finnish
    fr => French
    it => Italian
    nl => Dutch
    no => Norwegian
    pt => Portuguese
    ru => Russian
    sv => Swedish

CONSTRUCTOR
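The chaining a PolyAnalyzer performs, and the ordering rule for the "analyzers" parameter, can be illustrated in plain Perl without KinoSearch1. This is a minimal sketch, not the module's implementation. Every name in it is hypothetical: `%analyzer`, `run_series`, the toy stoplist, and the toy stemmer. Each hypothetical stage takes an array ref of token texts and returns a new one, and the stages run in series. The same text is run through the recommended order and through an order with stemming before stopalizing.

```perl
use strict;
use warnings;

# Hypothetical toy stoplist: stopwords as keys, values set to 1.
my %stoplist = map { $_ => 1 } qw(the themselves);

# Each hypothetical "analyzer" maps an array ref of token texts to a new one.
my %analyzer = (
    # lowercase every token text (stand-in for an LCNormalizer)
    normalize => sub { [ map { lc($_) } @{ $_[0] } ] },
    # split texts into runs of word characters (stand-in for a Tokenizer)
    tokenize  => sub { [ map { $_ =~ /(\w+)/g } @{ $_[0] } ] },
    # replace stopwords with '' (stand-in for a Stopalizer)
    stopalize => sub { [ map { exists( $stoplist{$_} ) ? '' : $_ } @{ $_[0] } ] },
    # toy stemmer, NOT Snowball: strip a trailing "es"
    stem      => sub { [ map { my $w = $_; $w =~ s/es\z//; $w } @{ $_[0] } ] },
);

# Run the named analyzers in series, feeding each stage's output to the next,
# and return the resulting token texts joined with '|'.
sub run_series {
    my ( $text, @names ) = @_;
    my $tokens = [$text];    # the whole text starts out as a single token
    $tokens = $analyzer{$_}->($tokens) for @names;
    return join '|', @$tokens;
}

my $text = 'The walruses hid themselves';

# Recommended order: normalize, tokenize, stopalize, stem.
print run_series( $text, qw(normalize tokenize stopalize stem) ), "\n";
# prints "|walrus|hid|"

# Stemming first turns "themselves" into "themselv", which the stoplist misses.
print run_series( $text, qw(normalize tokenize stem stopalize) ), "\n";
# prints "|walrus|hid|themselv"
```

The toy stemmer only strips a trailing "es". It happens to produce the stemmed form "themselv" quoted on this page, but it is no substitute for a real Snowball stemmer.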
new()

    my $analyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        language => 'en',
    );

Construct a PolyAnalyzer object. If the parameter "analyzers" is specified, it overrides "language", and no attempt is made to generate a default set of Analyzers.

o   language - Must be an ISO code from the list of supported languages.

o   analyzers - Must be an arrayref. Each element in the array must inherit from KinoSearch1::Analysis::Analyzer. The order of the analyzers matters. Don't put a Stemmer before a Tokenizer (you can't stem whole documents or paragraphs, only individual words), or a Stopalizer after a Stemmer (stemmed words, e.g. "themselv", will not appear in a stoplist). In general, the sequence should be: normalize, tokenize, stopalize, stem.

COPYRIGHT
Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.00.

perl v5.14.2                2011-11-15                KinoSearch1::Analysis::PolyAnalyzer(3pm)


KinoSearch1::Analysis::Stopalizer(3pm)			User Contributed Perl Documentation		    KinoSearch1::Analysis::Stopalizer(3pm)

NAME
KinoSearch1::Analysis::Stopalizer - suppress a "stoplist" of common words

SYNOPSIS
    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        language => 'fr',
    );
    my $polyanalyzer = KinoSearch1::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer, $stopalizer, $stemmer ],
    );

DESCRIPTION
A "stoplist" is a collection of "stopwords": words which are common enough to be of little value when determining search results. For example, so many documents in English contain "the", "if", and "maybe" that it may improve both performance and relevance to block them.

    # before
    @token_texts = ('i', 'am', 'the', 'walrus');

    # after
    @token_texts = ('', '', '', 'walrus');

CONSTRUCTOR
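The stoplist behavior and the constructor's precedence rule ("stoplist" overrides "language") can be sketched in plain Perl without KinoSearch1. This is not the module's code. `%DEFAULT_STOPLISTS` is a tiny hypothetical stand-in for the lists Lingua::StopWords provides, and `resolve_stoplist` and `stopalize` are hypothetical helpers. The stoplist is a hash with stopwords as keys and values set to 1. Matching token texts are replaced by empty strings rather than removed, as in the before/after example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the defaults Lingua::StopWords would supply;
# the real lists are much longer.
my %DEFAULT_STOPLISTS = (
    en => { map { $_ => 1 } qw(i am the if maybe) },
    de => { map { $_ => 1 } qw(der die das) },
);

# Mirror the documented precedence: an explicit stoplist hashref wins,
# otherwise use the default for the language. (This sketch dies if
# neither is available; that choice is an assumption.)
sub resolve_stoplist {
    my (%args) = @_;
    return $args{stoplist} if defined $args{stoplist};
    my $lang = $args{language} // '';
    return $DEFAULT_STOPLISTS{$lang}
        // die "no stoplist for language '$lang'\n";
}

# Replace stopwords with empty strings rather than deleting them,
# so the list keeps its length, as in the before/after example.
sub stopalize {
    my ( $stoplist, @token_texts ) = @_;
    return map { exists( $stoplist->{$_} ) ? '' : $_ } @token_texts;
}

my @before = ( 'i', 'am', 'the', 'walrus' );

my @after = stopalize( resolve_stoplist( language => 'en' ), @before );
print join( '|', @after ), "\n";     # prints "|||walrus"

# A supplied stoplist overrides the language default.
my @custom = stopalize(
    resolve_stoplist( language => 'en', stoplist => { walrus => 1 } ),
    @before,
);
print join( '|', @custom ), "\n";    # prints "i|am|the|"
```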
new

    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        language => 'de',
    );

    # or...
    my $stopalizer = KinoSearch1::Analysis::Stopalizer->new(
        stoplist => \%stoplist,
    );

new() takes two possible parameters, "language" and "stoplist". If "stoplist" is supplied, it will be used, overriding the behavior indicated by the value of "language".

o   stoplist - Must be a hashref, with stopwords as the keys of the hash and values set to 1.

o   language - Must be the ISO code for a language. Loads a default stoplist supplied by Lingua::StopWords.

SEE ALSO
Lingua::StopWords

COPYRIGHT
Copyright 2005-2010 Marvin Humphrey

LICENSE, DISCLAIMER, BUGS, etc.

See KinoSearch1 version 1.00.

perl v5.14.2                2011-11-15                KinoSearch1::Analysis::Stopalizer(3pm)