lingua::stem::enbroken(3pm) [debian man page]

Lingua::Stem::EnBroken(3pm)				User Contributed Perl Documentation			       Lingua::Stem::EnBroken(3pm)

NAME
Lingua::Stem::EnBroken - Porter's stemming algorithm for 'generic' English

SYNOPSIS
    use Lingua::Stem::EnBroken;

    my $stems = Lingua::Stem::EnBroken::stem({
        -words      => $word_list_reference,
        -locale     => 'en',
        -exceptions => $exceptions_hash,
    });

DESCRIPTION
This routine MIS-applies the Porter Stemming Algorithm to its parameters, returning the stemmed words. It is an intentionally broken version of Lingua::Stem::En for people needing backwards compatibility with Lingua::Stem 0.30 and Lingua::Stem 0.40. Do not use it if you aren't one of those people.

It is derived from the C program "stemmer.c" as found in freewais and elsewhere, which contains these notes:

    Purpose:    Implementation of the Porter stemming algorithm documented in:
                Porter, M.F., "An Algorithm For Suffix Stripping,"
                Program 14(3), July 1980, pp. 130-137.

    Provenance: Written by B. Frakes and C. Cox, 1986.

I have re-interpreted areas that use Frakes and Cox's "WordSize" function. My version may misbehave on short words starting with "y", but I can't think of any examples.

The step numbers correspond to Frakes and Cox, and are probably in Porter's article (which I've not seen). Porter's algorithm still has rough spots (e.g. current/currency, -ings words), which I've not attempted to cure, although I have added support for the British -ise suffix.

CHANGES
2003.09.28 - Documentation fix.
2000.09.14 - Forked from the Lingua::Stem::En.pm module to provide a backward-compatible, deliberately broken version for people who need behavior consistent with Lingua::Stem 0.30 and 0.40 more than they need accurate stemming.

METHODS
    stem({ -words => \@words, -locale => 'en', -exceptions => \%exceptions });
        Stems a list of passed words using the rules of US English. Returns an
        anonymous array reference to the stemmed words.

        Example:

            my $stemmed_words = Lingua::Stem::EnBroken::stem({
                -words      => \@words,
                -locale     => 'en',
                -exceptions => \%exceptions,
            });

    stem_caching({ -level => 0|1|2 });
        Sets the level of stem caching.

        '0' means 'no caching'. This is the default level.

        '1' means 'cache per run'. This caches stemming results during a
        single call to 'stem'.

        '2' means 'cache indefinitely'. This caches stemming results until
        either the process exits or the 'clear_stem_cache' method is called.

    clear_stem_cache;
        Clears the cache of stemmed words.

NOTES
This code is almost entirely derived from the Porter 2.1 module written by Jim Richardson.

SEE ALSO
Lingua::Stem

AUTHOR
Jim Richardson, University of Sydney
jimr@maths.usyd.edu.au or http://www.maths.usyd.edu.au:8000/jimr.html

Integration in Lingua::Stem by Benjamin Franz, FreeRun Technologies,
snowhare@nihongo.org or http://www.nihongo.org/snowhare/

COPYRIGHT
Jim Richardson, University of Sydney
Benjamin Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

BUGS
TODO
perl v5.10.1 2007-10-23 Lingua::Stem::EnBroken(3pm)
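
The three caching levels described under METHODS can be illustrated with a small, self-contained Perl sketch. It does not load Lingua::Stem: the names sketch_stem, sketch_stem_caching and sketch_clear_stem_cache are hypothetical stand-ins that only mimic the documented calling style, toy_stem is a placeholder rule (strip one trailing "s") rather than the Porter-derived rules, and the -exceptions hash is assumed to map a word to the stem it should be forced to.

```perl
use strict;
use warnings;

# Illustrative sketch only: the sketch_* names are hypothetical and are NOT
# the Lingua::Stem::EnBroken API. They mimic the documented calling style
# (a hash reference of -named parameters) and the three caching levels.

my $cache_level = 0;     # 0 = no caching, the documented default
my %persistent_cache;    # used only at level 2; kept until explicitly cleared
our $toy_calls = 0;      # counts real stemming work, to make caching visible

# Placeholder stemmer (assumption): strips one trailing "s". It stands in
# for the Porter-derived rules, which are not reproduced here.
sub toy_stem {
    my ($word) = @_;
    $toy_calls++;
    (my $stem = $word) =~ s/s\z//;
    return $stem;
}

sub sketch_stem_caching {
    my ($args) = @_;
    my $level = $args->{-level};
    if (defined $level) {
        die "caching level must be 0, 1 or 2\n" unless $level =~ /\A[012]\z/;
        $cache_level = $level;
    }
    return $cache_level;
}

sub sketch_clear_stem_cache {
    %persistent_cache = ();
    return;
}

sub sketch_stem {
    my ($args) = @_;
    my $words      = $args->{-words}      || [];   # array reference of words
    my $exceptions = $args->{-exceptions} || {};   # assumed: word => forced stem

    # Level 1 gets a fresh cache for this call only; level 2 reuses the
    # persistent cache; level 0 does no caching at all.
    my %run_cache;
    my $cache = $cache_level == 2 ? \%persistent_cache
              : $cache_level == 1 ? \%run_cache
              :                     undef;

    my @stems;
    for my $word (@$words) {
        if (exists $exceptions->{$word}) {
            push @stems, $exceptions->{$word};   # exceptions bypass the rules
        }
        elsif ($cache && exists $cache->{$word}) {
            push @stems, $cache->{$word};        # cache hit: no new work
        }
        else {
            my $stem = toy_stem($word);
            $cache->{$word} = $stem if $cache;
            push @stems, $stem;
        }
    }
    return \@stems;   # anonymous array reference, as the man page describes
}

# Usage: cache indefinitely, with one exception entry.
sketch_stem_caching({ -level => 2 });
my $stems = sketch_stem({
    -words      => [qw(cats dogs news cats)],
    -exceptions => { news => 'news' },
});
print "@$stems\n";       # prints "cat dog news cat"
print "$toy_calls\n";    # prints "2": the second "cats" came from the cache
```

The sketch keeps levels 1 and 2 apart by giving level 1 a fresh hash on every call, so nothing leaks from one call into the next, while level 2 reuses one package-level hash until sketch_clear_stem_cache empties it.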

Lingua::Stem::De(3pm)					User Contributed Perl Documentation				     Lingua::Stem::De(3pm)

NAME
Lingua::Stem::De - Stemming algorithm for German

SYNOPSIS
    use Lingua::Stem::De;

    my $stems = Lingua::Stem::De::stem({
        -words      => $word_list_reference,
        -locale     => 'de',
        -exceptions => $exceptions_hash,
    });

DESCRIPTION
This routine applies a stemming algorithm to a passed anonymous array of German words, returning the stemmed words as an anonymous array. It is a 'convenience' wrapper for 'Text::German' that provides a standardized interface and caching.

CHANGES
1.01 2003.09.28 - Documentation fix.
1.00 2003.04.05 - Initial release.

METHODS
    stem({ -words => \@words, -locale => 'de', -exceptions => \%exceptions });
        Stems a list of passed words using the rules of German. Returns an
        anonymous array reference to the stemmed words.

        Example:

            my $stemmed_words = Lingua::Stem::De::stem({
                -words      => \@words,
                -locale     => 'de',
                -exceptions => \%exceptions,
            });

    stem_caching({ -level => 0|1|2 });
        Sets the level of stem caching.

        '0' means 'no caching'. This is the default level.

        '1' means 'cache per run'. This caches stemming results during a
        single call to 'stem'.

        '2' means 'cache indefinitely'. This caches stemming results until
        either the process exits or the 'clear_stem_cache' method is called.

    clear_stem_cache;
        Clears the cache of stemmed words.

NOTES
This code is almost entirely derived from Text::German written by Ulrich Pfeifer.

SEE ALSO
Lingua::Stem
Text::German

AUTHOR
Ulrich Pfeifer

Integration in Lingua::Stem by Benjamin Franz, FreeRun Technologies,
snowhare@nihongo.org or http://www.nihongo.org/snowhare/

COPYRIGHT
Ulrich Pfeifer
Benjamin Franz, FreeRun Technologies

This code is freely available under the same terms as Perl.

BUGS
TODO
perl v5.10.1 2007-10-23 Lingua::Stem::De(3pm)
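
Both SYNOPSIS sections pass -words as $word_list_reference, that is, an array reference. The plain-Perl demonstration below makes no Lingua::Stem calls (the German words are arbitrary sample strings); it only shows why a bare @words inside the anonymous hash constructor does not work: Perl flattens the array into the key/value list.

```perl
use strict;
use warnings;

my @words = qw(Kinder Hunde Katzen);

# Wrong: inside an anonymous hash constructor, @words is flattened into the
# list, so -words receives only the first word and the remaining words are
# paired up as an extra key and value.
my $flattened = { -words => @words, -locale => 'de' };

# Right: pass a reference, so the whole list arrives under -words.
my $by_ref = { -words => \@words, -locale => 'de' };

printf "flattened -words: %s\n", $flattened->{-words};
# prints "flattened -words: Kinder"
printf "keys: %s\n", join ', ', sort keys %$flattened;
# prints "keys: -locale, -words, Hunde"
printf "by_ref -words: %s\n", join ' ', @{ $by_ref->{-words} };
# prints "by_ref -words: Kinder Hunde Katzen"
```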