LanguageTool 0.9.5 (Default branch)

11-02-2008

Registered User

Join Date: Sep 2000

Last Activity: 1 August 2008, 3:09 PM EDT

Posts: 26,240

Thanks Given: 0

Thanked 27 Times in 26 Posts

LanguageTool is a style and grammar checker that currently supports English, Polish, German, French, Dutch, and other languages to varying degrees. It scans the words and their part-of-speech tags for occurrences of error patterns, which are defined in an XML file. More powerful error rules can be written in Java. LanguageTool should be used after the spelling of a text has been corrected.

License: GNU Lesser General Public License (LGPL)

Changes: The rules for English and Polish have been updated, crashes with OpenOffice.org integration have been fixed, and false alarms in the German agreement rule have been reduced.

Linux Bot

Mail::SpamAssassin::Plugin::TextCat(3)    User Contributed Perl Documentation    Mail::SpamAssassin::Plugin::TextCat(3)

NAME
    Mail::SpamAssassin::Plugin::TextCat - TextCat language guesser

SYNOPSIS
    loadplugin Mail::SpamAssassin::Plugin::TextCat

DESCRIPTION
    This plugin will try to guess the language used in the message text.

    You can then specify which languages are considered okay for incoming
    mail, and if the guessed language is not okay, "UNWANTED_LANGUAGE_BODY"
    is triggered.

    It will always add the results to an "X-Language" name-value pair in the
    message metadata data structure. This may be useful as Bayes tokens. The
    results can also be added to marked-up messages using "add_header", with
    the _LANGUAGES_ tag. See Mail::SpamAssassin::Conf for details.

    Note: the language cannot always be recognized with sufficient
    confidence. In that case, "UNWANTED_LANGUAGE_BODY" will not trigger.

USER OPTIONS
    ok_languages xx [ yy zz ... ] (default: all)
        This option is used to specify which languages are considered okay
        for incoming mail. SpamAssassin will try to detect the language used
        in the message text.

        Note that the language cannot always be recognized with sufficient
        confidence. In that case, no points will be assigned.

        The rule "UNWANTED_LANGUAGE_BODY" is triggered based on how this is
        set.

        In your configuration, you must use the two- or three-letter
        language specifier in lowercase, not the English name for the
        language. You may also specify "all" if a desired language is not
        listed, or if you want to allow any language. The default setting is
        "all".

        Examples:

            ok_languages all         (allow all languages)
            ok_languages en          (only allow English)
            ok_languages en ja zh    (allow English, Japanese, and Chinese)

        Note: if there are multiple ok_languages lines, only the last one is
        used.

        Select the languages to allow from the list below:

            af - Afrikaans
            am - Amharic
            ar - Arabic
            be - Byelorussian
            bg - Bulgarian
            bs - Bosnian
            ca - Catalan
            cs - Czech
            cy - Welsh
            da - Danish
            de - German
            el - Greek
            en - English
            eo - Esperanto
            es - Spanish
            et - Estonian
            eu - Basque
            fa - Persian
            fi - Finnish
            fr - French
            fy - Frisian
            ga - Irish Gaelic
            gd - Scottish Gaelic
            he - Hebrew
            hi - Hindi
            hr - Croatian
            hu - Hungarian
            hy - Armenian
            id - Indonesian
            is - Icelandic
            it - Italian
            ja - Japanese
            ka - Georgian
            ko - Korean
            la - Latin
            lt - Lithuanian
            lv - Latvian
            mr - Marathi
            ms - Malay
            ne - Nepali
            nl - Dutch
            no - Norwegian
            pl - Polish
            pt - Portuguese
            qu - Quechua
            rm - Rhaeto-Romance
            ro - Romanian
            ru - Russian
            sa - Sanskrit
            sco - Scots
            sk - Slovak
            sl - Slovenian
            sq - Albanian
            sr - Serbian
            sv - Swedish
            sw - Swahili
            ta - Tamil
            th - Thai
            tl - Tagalog
            tr - Turkish
            uk - Ukrainian
            vi - Vietnamese
            yi - Yiddish
            zh - Chinese (both Traditional and Simplified)
            zh.big5 - Chinese (Traditional only)
            zh.gb2312 - Chinese (Simplified only)

    inactive_languages xx [ yy zz ... ] (default: see below)
        This option is used to specify which languages will not be
        considered when trying to guess the language. For performance
        reasons, supported languages that have fewer than about 5 million
        speakers are disabled by default. Note that listing a language in
        "ok_languages" automatically enables it for that user.

        The default setting is:

            bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

        That list is Bosnian, Welsh, Esperanto, Estonian, Basque, Frisian,
        Irish Gaelic, Scottish Gaelic, Icelandic, Latin, Lithuanian,
        Latvian, Rhaeto-Romance, Sanskrit, Scots, Slovenian, and Yiddish.

    textcat_max_languages N (default: 5)
        The maximum number of languages before the classification is
        considered unknown.

    textcat_optimal_ngrams N (default: 0)
        If the number of ngrams is lower than this number then they will be
        removed. This can be used to speed up the program for longer inputs.
        For shorter inputs, this should be set to 0.

    textcat_max_ngrams N (default: 400)
        The maximum number of ngrams that should be compared with each of
        the language models (note that each of those models is used
        completely).

    textcat_acceptable_score N (default: 1.05)
        Include any language that scores at least
        "textcat_acceptable_score" in the returned list of languages.

perl v5.12.1                          2010-03-16          Mail::SpamAssassin::Plugin::TextCat(3)
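
The options above might be combined in a site or user configuration roughly as follows. This is a minimal sketch, not a recommended setup: the file it would live in (for example a local.cf), the score value of 2.0, and the header name "Languages" are assumptions chosen for illustration. The "score" and "add_header" directives come from Mail::SpamAssassin::Conf rather than from this plugin.

```
# Load the TextCat language guesser (from the SYNOPSIS above).
loadplugin Mail::SpamAssassin::Plugin::TextCat

# Accept only English, Japanese, and Chinese mail.
# If several ok_languages lines are present, only the last one is used.
ok_languages en ja zh

# Keep the documented default: skip low-speaker-count languages when guessing.
inactive_languages bs cy eo et eu fy ga gd is la lt lv rm sa sco sl yi

# Documented defaults, written out explicitly.
textcat_max_languages 5
textcat_max_ngrams 400
textcat_acceptable_score 1.05

# Assumed value: points added when the guessed language is not in ok_languages.
score UNWANTED_LANGUAGE_BODY 2.0

# Assumed header name: expose the guessed languages on every marked-up message.
add_header all Languages _LANGUAGES_
```

Because the language cannot always be recognized with sufficient confidence, a message in a language outside "ok_languages" will not necessarily trigger the rule.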
