RedHat 9 (Linux i386) - man page for cleanfeed (redhat section 8)

cleanfeed(8)			  Cleanfeed - Because spam sucks		     cleanfeed(8)

NAME
       Cleanfeed - spam filter for Usenet news servers

SYNOPSIS
       INN: Installed as filter_innd.pl, location is configured into INN at compile time.

       Highwind servers: <command line> -program cleanfeed -body

       NNTPRelay: ExternalFilter=c:/perl/bin/perl.exe c:/news/cleanfeed.pl

DESCRIPTION
       A spam filter for Usenet servers.  Cleanfeed blocks spam on the way into your server,
       before it is written to disk or propagated to outbound feeds.  It can also block binaries
       in non-binary newsgroups and includes several other features to keep your newsfeed clean.

       Cleanfeed currently works with INN, Cyclone, Typhoon, Breeze, and NNTPRelay servers.  See
       my webpage (listed at the end of this document) for pointers to information about using
       Cleanfeed with CNews, Diablo, Collabra, or INN versions earlier than 1.5.1.

USAGE
       For all versions, place the cleanfeed.conf configuration file somewhere, then edit the
       Cleanfeed source file and change the $config_dir option at the top to point to the
       directory where the config file lives.

       INN    Install the filter file (called cleanfeed) as filter_innd.pl, and cleanfeed.conf, in
	   the location you specified in config.data (INN 1.7.2 and earlier) or when configuring
	   INN 2.x (usually the bin/filter directory under the installation root).  Make sure
	   both files are readable by the news user.  Once in place, the filter is loaded with
	   the command ctlinnd reload filter.perl meow.  Filtering can be turned on with ctlinnd
	   perl y and turned off with ctlinnd perl n.

       Cyclone/Typhoon/Breeze
	   Add the -program <file> and -body options to the bin/start script, where <file> is the
	   location and name of the Cleanfeed program. Restart the server.  Cleanfeed will run as
	   an external process (standalone mode).  IMPORTANT: make sure both cleanfeed and
	   cleanfeed.conf are readable by the news user!  Double-check the permissions as this is
	   a fairly common mistake!

       NNTPRelay
	   Find the ExternalFilter directive in config.txt and make it look like:

	   ExternalFilter=c:/perl/bin/perl.exe c:/news/cleanfeed.pl

	   Cleanfeed will run as an external process (standalone mode).

       More detailed installation instructions are provided later in this document.

CONFIGURATION OPTIONS
       Configuration is accomplished by setting the various options in the cleanfeed.conf
       configuration file.  This file is evaluated as Perl code, so comments can be included in
       the usual Perl # syntax.  A sample default file is included with the distribution.

       If you would rather not use cleanfeed.conf, you can set its location to "undef" in the
       source and edit the configuration variables directly in the source file.

       cleanfeed.conf has two sections (which define perl hashes): %config_local and
       %config_append.	Entries in %config_local will override the default settings of the same
       name in the Cleanfeed source.  Entries in %config_append can be used to add to most of the
       default regular expressions, for items such as badguys, bin_allowed, poison_groups, etc.
       Settings in %config_append for these items will be appended to the default regexps,
       separated by "|" (or).

       If you want to completely override the default regexps for these options, rather than just
       add to the defaults, you can add an entry for them into the %config_local section of
       cleanfeed.conf.

       All of this is done quite blindly, so if you do anything odd, be careful.  (Cleanfeed will
       remove the common mistake of including two "|" (or) signs in a row.)  All config options
       are exposed to %config_local, including any that may not be present in the sample file.
       Only the defined list of options is exposed to %config_append.
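
       The override-and-append behaviour described above can be sketched roughly as follows.
       This is an illustrative Python model, not Cleanfeed's own Perl code: the default values,
       the APPENDABLE set, and the merge_config name are all assumptions made for the example.

```python
import re

# Hypothetical stand-ins for defaults compiled into the Cleanfeed source.
DEFAULTS = {"maxgroups": 14, "poison_groups": r"^alt\.test\.spam"}

# Assumed subset of the options that may be appended to.
APPENDABLE = {"badguys", "bin_allowed", "poison_groups"}


def merge_config(defaults, local, append):
    """Apply overrides from `local`, then append regexps from `append`."""
    merged = dict(defaults)
    merged.update(local)  # config_local replaces defaults of the same name
    for name, extra in append.items():
        if name not in APPENDABLE:
            continue  # option is not exposed to config_append
        parts = [p for p in (merged.get(name, ""), extra) if p]
        joined = "|".join(parts)
        # Collapse accidental runs of "|" into a single "or" sign.
        merged[name] = re.sub(r"\|{2,}", "|", joined)
    return merged
```

       In this model an entry in the local table simply wins, while an appended pattern is
       joined to the existing one with "|" and any doubled "|" is collapsed.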

       Options that are on/off or yes/no should be set to 1 for on/yes, or 0 for off/no.

       First, you need to tell Cleanfeed which news server software you are using.  At the top of
       the file, set the appropriate variable to 1.  For INN, set $inn; for Cyclone, Typhoon, or
       Breeze, set $highwind; and for NNTPRelay, set $nntprelay.  Ensure the other two (the ones
       you're not using) are set to 0.

       General Settings

       aggressive
	   Set this to 0 to disable all content-based filters.	Helpful to please paranoid
	   lawyers, or paranoid customers.

       active_file
	       Set this to the full path to an active file, to allow Cleanfeed to know what
	       groups are moderated.  This is normally your server's active file, but it doesn't
	       have to be; it is possible, for example, to run Cyclone with no active file, but
	       give one to Cleanfeed anyway.

       MD5 Body Filter Settings

       do_md5  When turned on, the MD5 EMP checks will be done.  This should be left on unless
	       you have a really good reason to turn it off.  If you're running Hippo along with
	       Cleanfeed, you might feel Cleanfeed's MD5 checks are redundant and want to turn
	       them off, for example.  It would probably be better to leave it on with the
	       history turned down, instead.

       md5maxmultiposts
	       Start rejecting articles after we have seen this many copies, according to the MD5
	       checksum filter.

       MD5History
	       How many articles to remember for MD5-based EMP comparison.  Since the MD5 filter
	       is not prone to false positives, setting this higher is a good idea to catch more
	       spam, if you have the RAM to spare.

       MD5maxlife
	       When a spam is identified by the MD5 EMP filter, it is saved for continual
	       rejection. MD5maxlife specifies how long, in hours, to keep a saved MD5 id which
	       is no longer getting any hits.  (A spam id which is still getting matches will be
	       saved regardless of age.)  24 hours works well.

       fuzzy_md5
	       When turned on, the message bodies will be munged up a bit before MD5 checksums
	       are generated.  Whitespace and other non-alphanumeric characters are stripped and
	       letters are forced to lowercase, as well as a couple other bits of treachery to
	       try to defeat the "hashbuster" spam-bots.  This adds a bit of "fuzziness" to the
	       MD5 filter, and results in a performance hit as well.

	       Since the smarter spammers have discovered hashbusting, I recommend that this be
	       turned on.

       fuzzy_max_length
	       Sets the maximum number of lines for an article body to be subject to the
	       fuzzy_md5 munging (above).  This keeps extremely large articles out of those nasty
	       regular expressions.

       md5_skips_followups
	       Determines whether the MD5 filter checks articles with References headers.  The
	       default is to skip them.  Setting this option to 0 will result in all articles
	       passing through the MD5 filter, which can result in a major performance hit, but
	       does close another hole in the filter.  If you turn this off, you should increase
	       MD5History as well to avoid shortening your "window".

       MD5HistSize
	       The maximum allowed size of the EMP memory for the MD5-checksum EMP filter.  Use
	       this as a "sanity check" to prevent a sudden burst of spam from eating up all of
	       your memory.  It should be set high enough so that you normally never hit this
	       number; use MD5maxlife to expire the hash instead.
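
       As a rough illustration of how the settings above interact, here is a simplified Python
       model of a fuzzy MD5 EMP check.  It is a sketch under assumptions, not the filter's
       actual Perl implementation: the class and method names are invented, the default limit
       of 6 is hypothetical, and the "other bits of treachery" applied by fuzzy_md5 are not
       modelled.

```python
import hashlib
import re
import time


def fuzzy_md5(body):
    """Hash the body after lowercasing and dropping non-alphanumerics."""
    normalized = re.sub(r"[^a-z0-9]", "", body.lower())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class Md5Emp:
    """Count copies per checksum; reject once the count passes the limit."""

    def __init__(self, max_multiposts=6, max_life_hours=24):
        self.max_multiposts = max_multiposts   # plays the role of md5maxmultiposts
        self.max_life = max_life_hours * 3600  # plays the role of MD5maxlife
        self.seen = {}  # checksum -> [copies seen, time of last hit]

    def check(self, body, now=None):
        """Return True if this article should be rejected."""
        now = time.time() if now is None else now
        entry = self.seen.setdefault(fuzzy_md5(body), [0, now])
        entry[0] += 1
        entry[1] = now  # a hit keeps the entry alive regardless of age
        return entry[0] > self.max_multiposts

    def trim(self, now=None):
        """Drop checksums that have had no hits for max_life seconds."""
        now = time.time() if now is None else now
        self.seen = {k: v for k, v in self.seen.items()
                     if now - v[1] <= self.max_life}
```

       Because the body is normalized first, copies that differ only in case, spacing, or
       punctuation share one checksum and count toward the same limit.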

       Header-Based EMP Filter Settings

       do_phl  Turns on the NNTP-Posting-Host/Lines EMP filter.  This filter identifies spam by
	       identical posting-host headers and article sizes in a short period of time.  You
	       really don't want to turn this off.

       do_fsl  Turns on the From/Subject/Lines EMP filter.  This filter identifies spam by
	       identical From and Subject headers and article sizes in a short period of time.
	       This is the one that gets the fewest hits these days, so you won't lose
	       much by shutting it off.

       maxmultiposts
	       Start rejecting articles after we have seen this many copies, according to the
	       header-based EMP filter.  Since false positives are somewhat more likely with this
	       filter than with MD5, this should be set appropriately higher to reduce the odds.

       ArticleHistory
	       How many ids to remember for header-based EMP comparison.  Setting this higher
	       will catch more spam because there will be a larger "window" to look at.  Larger
	       settings will also consume more memory and have a (small) impact on performance,
	       as well as slightly increase the chance of a false positive (since the sample size
	       will be larger).  Most articles will actually take up two entries in this history
	       because there are two different header-based filters.

       EMPmaxlife
	       Same as MD5maxlife but for the header-based EMP filter.

       EMPHistSize
	       Same as MD5HistSize but for the header-based EMP filter.  If you are running the
	       header-based filter but not the MD5 filter for whatever reason, set this high.
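
       The remark that most articles take up two history entries can be pictured with the
       following Python sketch.  The key format and the function names are assumptions chosen
       for illustration; only the choice of headers (NNTP-Posting-Host/Lines and
       From/Subject/Lines) comes from the descriptions above.

```python
def emp_keys(headers):
    """Build the two header-based history keys for one article."""
    lines = headers.get("Lines", "")
    phl = "phl:{}:{}".format(headers.get("NNTP-Posting-Host", ""), lines)
    fsl = "fsl:{}:{}:{}".format(headers.get("From", ""),
                                headers.get("Subject", ""), lines)
    return [phl, fsl]


def record(history, headers, maxmultiposts):
    """Count each key; return True if any key has passed the limit."""
    reject = False
    for key in emp_keys(headers):
        history[key] = history.get(key, 0) + 1
        if history[key] > maxmultiposts:
            reject = True
    return reject
```

       Each call adds or updates one entry per filter, which is why the history fills roughly
       twice as fast as the article count.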

       Excessive Crosspost Settings

       maxgroups
	       Reject articles crossposted so that followups will be to more than this many
	       newsgroups.

       low_xpost_maxgroups
	       Specify a special, lower crosspost limit for certain groups, specified by regular
	       expression in low_xpost_groups (below).	Useful for being more strict in groups
	       plagued by crossposting, such as sex, binaries, and jobs groups.  (Replaces the
	       old tfjmaxgroups option.)
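
       A minimal Python sketch of the crosspost check follows.  It assumes that followups go
       to the Followup-To header when one is present, and that the lower limit applies as soon
       as any group in Newsgroups matches low_xpost_groups; both points are assumptions, and
       the function name is invented for the example.

```python
import re


def too_many_groups(headers, maxgroups, low_xpost_groups, low_xpost_maxgroups):
    """Return True if the article's followups would reach too many groups."""
    # Followups go to Followup-To when present, otherwise to Newsgroups.
    target = headers.get("Followup-To") or headers["Newsgroups"]
    followup_groups = [g.strip() for g in target.split(",") if g.strip()]
    posted = [g.strip() for g in headers["Newsgroups"].split(",") if g.strip()]
    limit = maxgroups
    # An empty or undefined pattern disables the special limit.
    if low_xpost_groups and any(re.search(low_xpost_groups, g) for g in posted):
        limit = low_xpost_maxgroups
    return len(followup_groups) > limit
```

       An article that sets Followup-To to a single group passes this check however widely it
       is crossposted, since only the followup targets are counted.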

       Misplaced Binaries Filter

       block_binaries
	       Enables blocking of binary posts in non-binary newsgroups.  Which newsgroups allow
	       binaries is configured with bin_allowed (below).

       max_encoded_lines
	       Sets the number of uuencoded or base64-encoded lines to allow before considering a
	       post to be a binary.  This should be set high enough to pass regular PGP
	       signatures.  (Those satanic Netscape crypto-sigs can die along with the other
	       binaries.)  Default is 15 lines, which may be a little low if you are lenient,
	       which you're not.

       binaries_in_mod_groups
	       If set, binaries are allowed in spite of block_binaries if they are posted only to
	       moderated groups (requires active_file).
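
       The idea behind max_encoded_lines and bin_allowed can be sketched in Python as below.
       The two line patterns are rough guesses at what a full uuencoded or base64 line looks
       like, not the expressions Cleanfeed uses, and the function names are hypothetical.

```python
import re

# Rough shapes of encoded lines (assumed patterns, not Cleanfeed's own).
UUENCODE_LINE = re.compile(r"^M[\x20-\x60]{60}$")
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]{60,76}=*$")


def looks_binary(body, max_encoded_lines=15):
    """Return True if the body has more encoded lines than allowed."""
    encoded = sum(
        1 for line in body.splitlines()
        if UUENCODE_LINE.match(line) or BASE64_LINE.match(line)
    )
    return encoded > max_encoded_lines


def binary_allowed(newsgroups, bin_allowed):
    """Binaries pass only if every group in Newsgroups matches bin_allowed."""
    if not bin_allowed:
        return False  # null pattern: block binaries everywhere
    groups = [g.strip() for g in newsgroups.split(",") if g.strip()]
    return all(re.search(bin_allowed, g) for g in groups)
```

       A short PGP signature stays under the limit, while a post with more than
       max_encoded_lines encoded lines is treated as a binary unless every group it is posted
       to allows binaries.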

       HTML

       block_mime_html
	       Enables blocking of MIME-encapsulated HTML posts.  This does NOT affect straight
	       text/html or multipart/alternative posts of the type created by misconfigured
	       Netscape and Microsoft "newsreaders", but ONLY posts which are MIME-encapsulated
	       HTML, a favorite format of sex spammers which often sneaks in under the EMP radar.

       block_html
	       Enables blocking of HTML and multipart/alternative posts.  You can specify group
	       patterns where HTML is allowed by setting html_allowed (below).

       Cancel Message Filtering

       block_late_cancels
	       If turned on, cancels for recently rejected articles will be rejected.  Set the
	       window with MIDmaxlife (below).	This will result in a huge number of rejections
	       if you have multiple full feeds and you aren't backlogging.  If you are concerned
	       about your downstream sites receiving the cancels, leave this off. If you need a
	       performance boost, turn it on.

       MIDmaxlife
	       How long to remember rejected message-ids so cancels for these posts can later be
	       rejected.  Specified in hours.  This only has an effect if block_late_cancels is
	       enabled (above).
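
       The relationship between block_late_cancels and MIDmaxlife can be modelled with a small
       Python sketch.  The class and its method names are invented for illustration, and the
       parsing of the Control header is deliberately simplistic.

```python
class RejectedIds:
    """Remember rejected message-ids so their cancels can be rejected too."""

    def __init__(self, mid_max_life_hours):
        self.max_life = mid_max_life_hours * 3600  # MIDmaxlife is in hours
        self.rejected = {}  # message-id -> time of rejection (seconds)

    def note_rejection(self, message_id, now):
        self.rejected[message_id] = now

    def reject_cancel(self, control_header, now):
        """Return True if this cancel targets a recently rejected article."""
        parts = control_header.split()
        if len(parts) != 2 or parts[0].lower() != "cancel":
            return False  # not a cancel message
        when = self.rejected.get(parts[1])
        return when is not None and now - when <= self.max_life
```

       Cancels for articles the filter never rejected, or rejected longer ago than the window,
       are left alone in this model.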

       Disabling Other Filters

       do_scoring_filter
	       Enables the (new) "scoring" filter.  You probably want to leave this on, even if
	       you need to turn off aggressive mode (turning off aggressive mode will disable the
	       content-based parts of the scoring filter).

       do_mid_filter (INN only)
	       Enables the message-id filter.  This requires an additional patch to INN 1.7.2,
	       which is included with Cleanfeed (but optional).  The patch adds a new Perl hook
	       to check message-ids during the NNTP CHECK transaction, and decide whether to
	       refuse the article.  There is a patch for this for INN 2.0 which may get
	       incorporated into the INN distribution at some point.  The default is off.

       do_bot_checks
	       Enables the filters that check for spam bot signatures.	The only reason you would
	       ever want to turn this off is if you've written your own version, or something.
	       Otherwise, leave it on.

       do_supersedes_filter
	       Enables the Excessive Supersedes filter, to catch rogue Supersedes attacks.  This
	       filter begins dropping articles with Supersedes headers if too many appear from
	       the same posting-host in a short time.  Moderated groups are given a higher limit
	       (if active_file is set), as is news.answers.  Default is on.

       check_supersedes_path
	       If set, bad_cancel_paths will also be applied to Supersedes articles.  Articles
	       with Supersedes headers, where a path element matches the regexp in
	       bad_cancel_paths, will be dropped.  Default is on.

       drop_useless_controls
	       If set, all control messages of types sendsys, senduuname, and version will be
	       dropped.  These are no longer useful and are a hole for denial-of-service attacks
	       due to the way INN and some other servers handle them.  On by default.

       drop_ihave_sendme
	       If set, control messages of types ihave and sendme will be dropped.  See
	       drop_useless_controls.  If you use these types of control messages, turn this off.
	       If you're not sure, then you're not using them.

       drop_control_with_supersedes
	       Drops any and all control messages which contain a Supersedes header.  Since
	       control messages are not passed through the same filters as regular messages, a
	       rogue Supersedes attack can use control messages to avoid filtering; this option
	       closes this hole.  Legitimate control messages don't have Supersedes headers.  On
	       by default.
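
       The three control-message options can be summarized with the following Python sketch.
       It assumes the configuration is available as a plain dictionary keyed by option name
       and that headers are a dictionary as well; the drop_control name is hypothetical.

```python
USELESS = {"sendsys", "senduuname", "version"}
IHAVE_SENDME = {"ihave", "sendme"}


def drop_control(headers, config):
    """Return True if a control message should be dropped."""
    control = headers.get("Control", "").split()
    if not control:
        return False  # not a control message
    kind = control[0].lower()
    if config.get("drop_useless_controls") and kind in USELESS:
        return True
    if config.get("drop_ihave_sendme") and kind in IHAVE_SENDME:
        return True
    if config.get("drop_control_with_supersedes") and "Supersedes" in headers:
        return True
    return False
```

       Note that the Supersedes test applies to every control type, so an otherwise ordinary
       control message is dropped if it carries that header.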

       Hash-Trimming

       trimcycles
	       The EMP memories are trimmed every trimcycles times through the filter.

       EMPstarttrimming
	       Tells the filter not to waste time trimming the EMP memories until they have this
	       many entries.  Just a minor performance enhancement during the first hours the
	       filter is running or when you first start innd.

       Logging

       verbose When turned on, verbose logging to news.notice will happen; spam domains will be
	       listed, etc.  When off, only general messages will be logged, making the
	       news.daily summaries less interesting but much shorter and more to the point.
	       (There is, alas, no way to shut off news.notice logging entirely.)  (news.notice
	       only applies to INN.)  Note that this will not reduce the number of log entries,
	       but only their verbosity.

       logfile (Standalone Mode)
	       If set to the path to a file, this will enable logging of message-ids of all
	       articles processed by the filter.  Rejections will be logged with the reason for
	       rejection.  Note that this will create a very large logfile which you will need to
	       rotate or delete (see max_log_size, below).

       reportfile (Standalone Mode)
	       If set to the path to a file, this will enable generation of a simple report of
	       articles accepted and rejected.	The report file will contain one entry per line
	       with the start time, end time, number of articles accepted, and number of articles
	       rejected, tab-separated.

       log_accepts (Standalone Mode)
	       When using the above logfiles, this setting determines whether articles accepted
	       should be logged.  When disabled, only rejections will be logged.

       max_log_size (Standalone Mode)
	       The size at which to rotate the logfile.  This will be replaced by time-based
	       rotation at some point.

       statfile
	       If this is set to the full path of a file, a crude stats file will be written each
	       time the filter is reloaded with ctlinnd reload filter.perl meow (for INN) or
	       whenever the Cleanfeed process receives a SIGUSR1 (for standalone mode).  The file
	       shows how many entries are present in each of the EMP histories, MID history and
	       excessive supersedes history; timer information if enabled (see timer_info); and
	       the contents of all configuration settings.  Posting-hosts for each supersedes
	       entry will be listed, along with their counts; these are not being rejected unless
	       they are over the threshold.  The default for this is undef, which disables
	       creation of the stat file.

	       More comprehensive stats are planned for the future.
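
       Since the reportfile is plain tab-separated text, it is easy to post-process.  The
       Python sketch below totals the accepted and rejected columns; the function name is
       invented, and the time fields are passed over untouched because their format is not
       specified here.

```python
def summarize_report(text):
    """Total the accepted and rejected columns of a report file."""
    accepted = rejected = 0
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _start, _end, acc, rej = fields
        accepted += int(acc)
        rejected += int(rej)
    return accepted, rejected
```

       The function takes the file contents as a string, so it can be fed from
       open(path).read() with whatever path you gave to reportfile.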

       Timing Info

       timer_info
	       When enabled, Cleanfeed will generate timing statistics telling you how many
	       articles per second are being examined by the filter and being accepted by the
	       filter.	This information will appear in the statfile if this is enabled, and in
	       the output of INN's ctlinnd mode if the mode.patch is applied to INN.  Note that
	       the accepted/second rate is not necessarily the rate at which your server is
	       accepting articles; articles can be rejected by the server after Cleanfeed passes
	       them, for example if they are posted to groups not in your active file.

       timer_interval
	       The period over which to average timing information, in seconds.  The default is
	       600 seconds, or 10 minutes.

       Debugging

       debug_batch_directory
	       Specifies a directory where debugging "batchfiles" can be written.  See the
	       Hacker's Guide in this document for more information.

       debug_batch_size
	       The maximum size of a debugging batchfile before it gets rotated.  Rotation is
	       done by renaming the file to file.1, file.2, etc., using the lowest number that
	       doesn't already exist.
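
       The rotation rule described for debug_batch_size (rename to the lowest-numbered suffix
       that is not already taken) can be written in a few lines.  This Python sketch only
       mirrors the naming rule; the rotate function is hypothetical and is not how the filter
       itself is implemented.

```python
import os


def rotate(path):
    """Rename path to path.N, where N is the lowest unused number."""
    n = 1
    while os.path.exists("{}.{}".format(path, n)):
        n += 1
    rotated = "{}.{}".format(path, n)
    os.rename(path, rotated)
    return rotated
```

       One consequence of this rule is that higher numbers are not necessarily newer: if
       file.1 is deleted, the next rotation reuses that name.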

       Regular Expressions

       You can add to most of these regular expressions in the %config_append section of
       cleanfeed.conf; settings you add there will be added to the defaults, rather than
       overriding them.  If you want to completely override the default settings you can add
       entries for these to the %config_local section instead.

       bin_allowed
	       This is a regular expression telling the anti-binary filter in which newsgroups
	       binaries are allowed.  If all groups in the Newsgroups header match this pattern,
	       binaries are allowed through the filter.  (This obviously has no effect when the
	       binary filter is disabled.)  If the binary filter is enabled and this is set to a
	       null string (by overriding the default in the local config) the result will be
	       blocking all binaries regardless of where they are posted.

       poison_groups
	       If any groups in the Newsgroups header match this regexp, the article will be
	       rejected.  Thus you can reject crossposts to certain groups even if they are also
	       posted to groups you carry.

       html_allowed
	       This is a regular expression telling the anti-HTML filter in which newsgroups HTML
	       and multipart/alternative posts are allowed.  This only has an effect if
	       block_html is turned on (above).  The default (to allow HTML in microsoft.*
	       groups) can be added to in cleanfeed.conf.

	       If you don't want to allow HTML anywhere, not even the microsoft.*  groups,
	       override this setting in the local configuration and set it to a null string or
	       undef.

       md5exclude
	       If an article is posted only to groups matching this regexp, the MD5 EMP filter
	       will not be applied.  Useful for "test" groups where it's okay for lots of the
	       posts to be the same.

       allexclude
	       If an article is posted only to groups matching this regexp, NO checks are applied
	       at all.

       low_xpost_groups
	       If a group matches this regular expression, it gets a special crosspost limit, set
	       in low_xpost_maxgroups, rather than the general crosspost limit set in maxgroups.
	       This is useful for groups plagued by excessive crossposting, such as sex,
	       binaries, and jobs groups.  The default is to limit crossposts to 6 groups in
	       test, forsale, and jobs groups.	Setting this to a null string, or undef, will
	       disable this feature.

       badguys This is a monster regular expression containing domains of known spammers.  Only
	       the "middle" part of the domains are listed; these are checked as email addresses
	       in From headers by appending a list of top-level domains to the end, and as URLs
	       by prepending http:// and an optional "www.".  If you modify this list, be very
	       careful not to end up with "||" in there (two "or" signs in a row); this will
	       match every single post that comes through, which is Bad.

       baddomainpat
	       If a post contains a URL for a site whose domain name matches this pattern (in
	       .com, .net, and .nu TLDs only) the post will be rejected.  For example, there are
	       hundreds of spamming porn sites whose domain names begin or end with "xxx".  This
	       prevents us from having to keep up with their nonsense.	Yes, it's a little
	       aggressive, but it works.

       exempt  Regular expression of NNTP-Posting-Hosts that are exempt from the posting-host-
	       based EMP filter.  This is for high-output systems where all posts contain the
	       same NNTP-Posting-Host header, such as AOL, which if not exempted would end up
	       hitting the posting-host EMP filter with all of their posts.  There aren't many of
	       these out there; a "regular" multi-user system does not present a problem because
	       the filter doesn't kick in until it sees a large number of posts from the same
	       posting-host and also of the same length, in a short period of time.

       supersedes_exempt
	       Regular expression of NNTP-Posting-Hosts that are exempt from the excessive
	       supersedes filter.  Generally this will be systems which post a lot of FAQs.

       bad_cancel_paths
	       Cancel messages will be rejected if the Path header contains elements matching
	       this regular expression.  Also applied to the NNTP-Posting-Host.  If
	       check_supersedes_path is set, this will also be checked against the Path header of
	       articles with Supersedes headers.  This list contains sites which are or have
	       recently been the source of rogue cancel attacks.

       refuse_messageids (INN only)
	       If you have do_mid_filter (above) enabled, and you have the optional message-id
	       patch applied to INN (or otherwise have obtained the hook for filter_messageid in
	       INN 2.0), this regular expression will be applied to message-ids as they are
	       offered to your server, and they will be refused if it matches.

       net_abuse_groups

       spam_report_groups
	       These regular expressions are used to exempt certain groups from certain filters;
	       for example, groups expected to contain spam reports, example spams, NoCeM
	       notices, etc.  These are not in cleanfeed.conf; if you need to add to them please
	       let me know.
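
       Two points from the list above are easy to get wrong: some patterns apply when any
       group matches while others require all groups to match, and a doubled "|" creates an
       empty alternative that matches everything.  The Python sketch below illustrates both;
       the function names and the sample patterns are made up for the example.

```python
import re


def split_groups(newsgroups):
    """Split a Newsgroups header into individual group names."""
    return [g.strip() for g in newsgroups.split(",") if g.strip()]


def poisoned(newsgroups, poison_groups):
    """poison_groups rejects if ANY group in the header matches."""
    return any(re.search(poison_groups, g) for g in split_groups(newsgroups))


def fully_excluded(newsgroups, pattern):
    """md5exclude and allexclude apply only if ALL groups match."""
    groups = split_groups(newsgroups)
    return bool(groups) and all(re.search(pattern, g) for g in groups)


def has_empty_alternative(pattern):
    """Detect the "||" mistake: an empty branch matches any text."""
    return "||" in pattern
```

       For instance, re.search("spamco||junkmail", text) succeeds for any text at all, because
       the empty branch between the two "|" signs matches at the first position.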

       After modifying the filter file, always check for mistakes by typing:

	perl -cw filter_innd.pl (or cleanfeed or whatever you called it)

       There should be no errors and no warnings.

       You can check cleanfeed.conf with:

	perl -cw cleanfeed.conf

       You will get several warnings about variables being used only once; these can be ignored.

       If you are running INN, you can modify the file and reload it with ctlinnd reload
       filter.perl meow while the server is running.  The configuration in cleanfeed.conf will
       be reloaded at this time as well.

       With the Highwind servers, modifying the program will require a server restart (use the
       bin/restart script).  Note that this will result in all connections (including newsreader
       clients) being dropped.	This is not my fault. :)

       When in standalone mode, configuration from cleanfeed.conf can be reloaded by sending
       Cleanfeed a SIGHUP.

       I have no idea what NNTPRelay does, but I'm guessing it needs a restart as well.

       IMPORTANT NOTE:	A common mistake is not setting file permissions on
       cleanfeed/filter_innd.pl, cleanfeed.conf, and cleanfeed.local so that they are readable by
       the news user.  Please double-check your permissions!  If Cleanfeed is running, and fails
       to successfully load cleanfeed.conf, it will use the default settings instead of those you
       specified in the config file.

INSTALLATION - INN
       These instructions assume you have the Perl hooks compiled into INN.  If you don't, you
       will need to add them and rebuild the INN distribution before proceeding.

       With INN, Perl is embedded into the innd program.  The filter file defines subroutines
       that are called by innd at the appropriate times.

       SYSTEM REQUIREMENTS

       In order to run Cleanfeed with INN, you will need:

       o   INN 1.5.1 or later (1.7.2+insync1.1d or 2.1 recommended)

       o   Perl 5.004 or later

       o   Perl hooks compiled into INN

       o   The MD5 Perl module

       INN is available from:
	   http://www.isc.org/inn.html

       The Insync distribution of INN (highly recommended if you aren't running INN 2.1) is
       available from:
	   http://www.insync.net/~aos/inn.html

       The MD5 Perl module is available from:
	   http://www.perl.com/CPAN-local/modules/by-module/MD5/

       Perl itself is available from the Perl home page:
	   http://www.perl.com/

       PATCHES AND STUFF

       INN 2.0 includes everything you need to run Cleanfeed, except the MD5 Perl module.

       With earlier versions, Cleanfeed requires some patches to INN in order to function
       properly.

       If you are running INN 1.7.2+insync1.1d, you already have the original filter.patch and
       the dynamic-load.patch; you need only apply the upgrade.patch.

       None of these patches are against INN 2.1; the "extra feature" ones like mode.patch may
       not apply to 2.1.  Ports are always welcome.

       filter.patch
	   This patch provides the basic functionality for Cleanfeed by making some extra headers
	   available to the Perl filter, as well as message bodies.  This patch was changed in
	   version 0.95.3.  It is against INN 1.7.2 and should be applied in the innd directory.
	   This patch is included in the insync "megapatch" for INN as of version 1.1c, so if you
	   are running this version of INN you need not apply this patch.  Not necessary for INN
	   2.x.

       dynamic-load.patch
	   This patch enables INN's Perl interpreter to load dynamic modules.  It is necessary
	   for MD5 support.  The patch is against INN 1.7+insync and should be applied in the lib
	   directory (NOT the innd directory).	It applies cleanly to other versions of INN
	   including 1.5.1 and 1.7.2.  This patch is included in the insync "megapatch" for INN
	   as of version 1.1d, so if you are running this version of INN you need not apply this
	   patch.  Not necessary for INN 2.x.

	   If you are still using INN 1.5.1, you can use dynamic-1.5.1.patch instead.

	   In order to compile INN with the new patch, you need to edit the PERL_LIB entry in
	   config.data.  Type this command at the shell, and paste its output into config.data as
	   PERL_LIB:

	       perl -MExtUtils::Embed -e ldopts

	   Most systems also allow you to simply enter that line in backquotes as PERL_LIB.

	   This patch requires Perl 5.004 or later!  INN will not compile linked with Perl 5.003
	   after following these instructions!

	   AIX: There is a problem with Perl dynamic loading from INN under the AIX operating
	   system.  In simple terms, it doesn't work.  This seems to be a problem with the gcc
	   compiler.  Success has been reported by rebuilding both Perl and INN with IBM's
	   commercial compiler CSet (a.k.a. xlC).

	   Solaris: There have been multiple reports of Cleanfeed not working under Solaris if
	   any part of the system -- INN, Perl, or the MD5 module -- is compiled using egcs.
	   Success has been reported by recompiling everything with gcc, and by upgrading to the
	   very newest egcs.

       upgrade.patch
	   For current users of Cleanfeed, this is a patch for an already-patched INN, or for
	   1.7.2+insync1.1d, to bring you up to the new version of the Cleanfeed patch.  Not
	   applying this patch right now will only lose you a couple of filters, and nothing will
	   break if you don't apply it (no changes to the filter source or configuration will be
	   required).

       messageid.patch
	   This is a patch which adds a new Perl hook to innd, filter_messageid.  This allows you
	   to run a Perl subroutine against each message-id as it is offered to your server, and
	   decide whether to refuse the article before it is even sent to your server.	Cleanfeed
	   includes a small filter_messageid.  This patch is entirely optional.

       mode.patch
	   This patch adds a line to INN's ctlinnd mode output for Perl filter status.	The
	   output line is generated by the filter_stats subroutine.  The default output contains
	   the number of articles accepted, rejected and refused since the filter started, and
	   the sizes of the EMP, Message-ID, and Excessive Supersedes hashes.  If timer_info is
	   enabled, this will also include the rate in articles per second (rounded to the
	   nearest tenth) at which articles were examined (total sent through the filter) and
	   accepted by the filter, averaged over the timer_interval number of seconds.

       After applying the patches, rebuild all of INN and do a "make update".  The first patch
       (filter.patch) only requires innd to be rebuilt, but the dynamic-load.patch requires you
       to rebuild the whole distribution.  Current users upgrading with upgrade.patch need only
       rebuild innd and reinstall that executable.

       Thus:

	   cd inn    [to the top-level source directory]
	   make clean
	   cd innd
	   cp wherever/filter.patch .	  [from the Cleanfeed distribution]
	   patch <filter.patch
	   cd ../lib
	   cp wherever/dynamic-load.patch .   [from the Cleanfeed distribution]
	   patch <dynamic-load.patch
	   cd ../config
	   emacs config.data	[edit the PERL_LIB entry as above]
	   make all
	   make update

       Finally, you need to install the MD5 Perl module, no matter what version of INN you are
       running.

       INSTALLING CLEANFEED - INN

       In INN 1.7.2 and earlier, the location where INN looks for the Perl filter is set in
       config.data, as _PATH_PERL_FILTER_INND.	By default, the filename is filter_innd.pl.  The
       Cleanfeed filter program file should be installed in this location.  INN comes with an
       example filter_innd.pl file; move this file (or whatever other filter is in place) out of
       the way first.

       Before putting the filter in place, edit the file, changing $config_dir to the location of
       your cleanfeed.conf file.

       After editing the file, always check for errors with the command:

	   perl -cw filter_innd.pl

       Once the file is in place, tell innd to reload it:

	   ctlinnd reload filter.perl meow

       And, if Perl filtering is currently disabled, enable it:

	   ctlinnd perl y

       Now, you can watch it working by looking at your news.notice log:

	   tail -f /var/log/news/news.notice

       If your server is running a full feed, you should start seeing a constant stream of
       rejections almost immediately.

INSTALLATION - HIGHWIND SERVERS
       The various Highwind server packages (Cyclone, Typhoon, and Breeze) all have the same
       external filter interface.  The filter runs as its own process, reading from standard
       input and writing to standard output.

       SYSTEM REQUIREMENTS

       In order to run Cleanfeed with a Highwind server, you will need:

       o   Cyclone, Typhoon or Breeze

       o   Perl 5.003 or later

       o   The MD5 Perl module

       The Highwind servers are commercial products.  For more information:
	   http://www.highwind.com/

       The MD5 Perl module is available from:
	   http://www.perl.com/CPAN-local/modules/by-module/MD5/

       Perl itself is available from the Perl home page:
	   http://www.perl.com/

       INSTALLING CLEANFEED - HIGHWIND

       The Cleanfeed program file should be installed as "cleanfeed" in your news server's bin
       directory (cyclone/bin, etc).  Make it owned by news:news and make it executable.

       Before putting the filter in place, edit the file, changing $config_dir to the location of
       your cleanfeed.conf file.  Also ensure that the shebang line (the first line of the file,
       starting with #!) points to the correct location of your perl executable.

       After editing the file, always check for errors with the command:

	   perl -cw cleanfeed

       There should be no warnings.

       Now, edit your bin/start script.  You need to add two options to the command line that
       starts up the server process: the -program option, which tells it what program to use as
       a filter, and the -body option, which tells it to send the bodies as well as the headers.

	   typhoond -program /typhoon/bin/cleanfeed -body

       ...along with whatever else you have cluttering up the command line.

       (Highwind has indicated that this may/will be a config file option in a future release.)

       Now you can restart the server with the bin/restart script.  Check to make sure Cleanfeed
       is running, with "ps -ef" or "top".  If Cyclone/Typhoon is unable to start the filter for
       some reason, it will log an error via syslog.  The error will not be terribly helpful.

       You can make Cleanfeed reload its configuration from cleanfeed.conf and local code from
       cleanfeed.local by sending it a SIGHUP.

INSTALLATION - NNTPRELAY
       Please note that I do not have an NNTPRelay server, nor access to one, nor much interest
       in mucking around with Windows NT, and thus I have not tested the NNTPRelay filtering
       support myself.	The necessary changes and notes were contributed by someone else.
       Additions and improvements to this documentation would be most welcome.

       The filter interface in NNTPRelay is pretty much the same as in the Highwind servers.

       SYSTEM REQUIREMENTS

       In order to run Cleanfeed with NNTPRelay, you will need:

       o   NNTPRelay version 1.1b4 or later

       o   Perl 5.003 or later

       o   The MD5 Perl module

       NNTPRelay is available from:
	   http://nntprelay.maxwell.syr.edu/

       An NT binary release of Perl 5.004, which apparently includes the MD5 module, can be found
       at:
	   http://www.perl.com/CPAN/ports/win32/Standard/x86

       The MD5 module (in source code) can be found at:
	   http://www.perl.com/CPAN-local/modules/by-module/MD5/

       INSTALLING CLEANFEED - NNTPRELAY

       Before putting the filter in place, edit the file, changing $config_dir to the location of
       your cleanfeed.conf file.

       Install the Cleanfeed program file wherever is appropriate on your system, as
       "cleanfeed.pl".	Edit NNTPRelay's config.txt file, adding an entry like this:

	   ExternalFilter=c:/perl/bin/perl.exe c:/news/cleanfeed.pl

       Of course, use the correct path to your Perl executable and to the Cleanfeed program file.
       Now restart NNTPRelay.  If you defined a logfile in Cleanfeed, it should appear.

THE HACKER'S GUIDE
       Cleanfeed will look for a file called cleanfeed.local, in the same directory as
       cleanfeed.conf.	If this file exists, it will be loaded and evaluated as Perl code right
       after the config file.  This enables you to provide your own local filter code which will
       survive an upgrade of the main Cleanfeed source.

       It will be reloaded when the filter is reloaded with ctlinnd reload filter.perl meow (for
       INN), or when configuration is reloaded with a SIGHUP (in standalone mode).  This means
       that you can modify the running code without restarting Cleanfeed.

       cleanfeed.local can define a number of different subroutines, which, if defined, will be
       called at various points in the filter process.	Other subroutines can, of course, be
       defined as required by your code.

       The file is simply re-evaluated on each reload.  So, if you remove a subroutine from the
       file completely, that subroutine will remain defined after the reload, because nothing
       replaced it.  To make it go away, define it as an empty subroutine or explicitly undef
       it.
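
       To see why a deleted subroutine lingers, here is a minimal standalone sketch.  It assumes
       Cleanfeed decides whether to call a hook with a test equivalent to "defined &name"
       (the text implies this but does not spell it out).  reload_local is a hypothetical
       stand-in for the real reload, with the "file" reduced to a string of Perl code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Cleanfeed re-evaluating cleanfeed.local:
# here the "file" is just a string handed to eval.
sub reload_local {
    my ($code) = @_;
    eval $code;
    die $@ if $@;
    return;
}

# First load: the file defines local_filter_last.
reload_local(q{ sub local_filter_last { return "rejected by local rule" } });
my $after_load = defined &local_filter_last ? 1 : 0;

# Second load: the subroutine was deleted from the file, but the old
# definition is still in memory because nothing replaced it.
reload_local(q{ 1; });
my $after_delete = defined &local_filter_last ? 1 : 0;

# Third load: the file explicitly undefines the subroutine.
reload_local(q{ undef &local_filter_last; });
my $after_undef = defined &local_filter_last ? 1 : 0;

print "defined after load=$after_load, delete=$after_delete, undef=$after_undef\n";
```

       The alternative mentioned above, replacing the body with an empty subroutine, keeps the
       hook defined but makes it do nothing.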

       STUFF YOU CAN DEFINE

       Cleanfeed will call the following subroutines, if they are defined.  See the section on
       return values for instructions on what your code should return.

       local_config
	   This is called after configuration is loaded, each time.  It will be called when the
	   filter is reloaded (with INN) or when configuration is reloaded with SIGHUP (running
	   standalone), as well as when the filter is first run.  No return value is expected.
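
	   One plausible use, sketched here under assumptions, is to do per-reload setup once
	   rather than for every article.  The pattern list and $local_subject_re are
	   hypothetical names invented for this example; only local_config itself comes from
	   Cleanfeed.

```perl
use strict;
use warnings;

# Hypothetical variable for this sketch; other local subroutines could
# match against it without rebuilding the pattern for each article.
our $local_subject_re;

sub local_config {
    # Hypothetical local phrases; edit the list and reload to apply changes.
    my @phrases = ('make money fast', 'free long distance');

    # Escape each phrase so it matches literally, then compile once.
    my $joined = join '|', map { quotemeta($_) } @phrases;
    $local_subject_re = qr/(?:$joined)/i;

    return;    # no return value is expected
}
```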

       local_filter_before_emp
	   Called for each (non-control) article, before any other filters.  General-purpose spam
	   filters shouldn't go here, because you really want to populate the EMP hashes first.

       local_filter_after_emp
	   Called for each (non-control) article, after the EMP filters but before any other
	   filters.

       local_filter_middle
	   Called for each (non-control) article, after the "simple" filters but before the
	   "expensive" body checks.

       local_filter_scoring
	   Called during the scoring filter.  Return the value, positive or negative, by which to
	   adjust the article's score.

	   Warning:  Here there be dragons!  If you're going to play with this please examine the
	   existing source, and use the debugging routines to watch what you're doing.

       local_filter_last
	   Called for each (non-control) article, after all other filters are done.

       local_filter_cancel
	   Called for all cancel control messages.

       local_filter_newrmgroup
	   Called for all newgroup and rmgroup control messages.

       RETURN VALUES

       The general filtering subroutines you can define (local_filter_before_emp,
       local_filter_after_emp, local_filter_middle, local_filter_last, local_filter_cancel, and
       local_filter_newrmgroup) are expected to return a value indicating whether you want to
       accept the article being examined.  If the article is okay, you should return "" (empty
       string), in which case filtering will proceed as usual.	If you want to reject the
       article, you return any other string, which will be used as the reason.

       The rejection code actually expects two return values -- the first string is the "verbose"
       rejection message, and the second is the "non-verbose" message (see the verbose
       configuration option).  If only one is supplied, it will be used for both purposes.
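
       A minimal sketch of a filter subroutine that follows this convention is shown below.  The
       rule itself (three dollar signs in the Subject) and the message texts are made up for
       illustration.  The "our" declaration is only there so the sketch compiles by itself;
       it assumes Cleanfeed's variables live in the same package as your local code.

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our %hdr;

sub local_filter_middle {
    my $subject = defined $hdr{'Subject'} ? $hdr{'Subject'} : '';

    if ($subject =~ /\$\$\$/) {
        # Two values: the verbose reason first, then the non-verbose one.
        return ("Local: dollar signs in Subject ($subject)",
                'Local: dollar signs');
    }

    return '';    # empty string: accept, and let filtering proceed
}
```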

       The scoring filter calls local_filter_scoring, which is expected to return the value,
       positive or negative, by which the article's score should be adjusted.
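
       A hedged sketch of a scoring hook follows.  The two rules and their weights are arbitrary,
       and the sketch assumes that a higher score counts against the article; check the scoring
       filter in the Cleanfeed source before relying on either assumption.

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our %hdr;

sub local_filter_scoring {
    my $subject = defined $hdr{'Subject'} ? $hdr{'Subject'} : '';
    my $adjust  = 0;

    # Arbitrary example weights, assuming higher means more suspicious.
    $adjust += 2 if $subject =~ /[A-Z]/ && $subject !~ /[a-z]/;   # all caps
    $adjust -= 1 if $subject =~ /^Re:/i;                          # looks like a reply

    return $adjust;    # a number, not a rejection string
}
```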

       WHAT YOU GET

       Your subroutines get information about the article in several variables.

       %hdr
	   A hash containing the article headers.  The key is the header name, in "canonical"
	   case as INN likes them; the value is the content of the header.  When running under
	   INN, only headers known to INN will be included in the hash (which includes any header
	   used anywhere in Cleanfeed).  In standalone mode, all headers will be present, but
	   only the known headers will be sent in canonical case; others will have the header
	   name (and thus hash key) in whatever case they are in the article itself, making them
	   difficult to find and use consistently.

	   The message body is in this hash under the key __BODY__.  If running INN 2.x with
	   storageapi, it will be provided in wireformat, with lines terminated in \r\n rather
	   than just \n.  With the traditional spool format (and in all cases with INN prior to
	   2.x) lines will be terminated only with \n.

	   Examples:

	   To get the Subject header as a scalar:  $hdr{'Subject'}

	   To get the entire message body as a scalar:	$hdr{'__BODY__'}
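
	   Because the line terminator depends on the storage format, local code that works
	   line by line may want to normalize it first.  A minimal sketch (local_body_lines is
	   a hypothetical helper name, not something Cleanfeed provides):

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our %hdr;

# Hypothetical helper: return the body as a list of lines, whether the
# body arrived with \r\n (wireformat) or plain \n terminators.
sub local_body_lines {
    my $body = defined $hdr{'__BODY__'} ? $hdr{'__BODY__'} : '';
    $body =~ s/\r\n/\n/g;
    return split /\n/, $body;
}
```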

       %lch
	   A hash containing lowercased versions of some of the article headers.  The hash keys
	   are the header names in all lowercase; the values are the contents of the headers,
	   with all letters forced to lowercase.

	   Currently, the only headers added to this hash are From, Organization, Subject,
	   Content-Type, X-Newsreader, X-Newsposter, Message-ID, and Sender.

	   This hash is not available to local_filter_before_emp.

       @groups
	   An array containing the newsgroups the article is posted to (from the Newsgroups
	   header).  You can find out how many groups the article is crossposted to with "scalar
	   @groups".

       @followups
	   An array containing the newsgroups to which followups are set (from the Followup-To
	   header).  If the article has no Followup-To header, this array will be identical to
	   @groups.  You can find out how many groups followups are set to with "scalar
	   @followups".  This is the preferred way to limit crossposting, because limiting only
	   by the Newsgroups header will catch FAQs and such.
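
	   A sketch of a local limit based on @followups is shown below.  The limit of 5 and
	   the message texts are arbitrary values chosen for the example.

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our (@groups, @followups);

my $max_followups = 5;    # hypothetical local limit

sub local_filter_after_emp {
    # Equals the crosspost count when there is no Followup-To header,
    # because @followups is then identical to @groups.
    my $count = scalar @followups;

    if ($count > $max_followups) {
        return ("Local: followups set to $count groups",
                'Local: excessive followups');
    }
    return '';
}
```

	   An article crossposted widely but with followups directed to a single group, as many
	   FAQs are, passes this check.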

       $lines
	   The number of lines in the message body.  This is not taken from the Lines header,
	   which is client-supplied and can be forged to fool filtering; it is determined by
	   counting the lines in the message body.

       %gr A hash containing information about the groups the article is posted to.  This isn't
	   very straightforward and may not be useful to you, but I'm including it in this
	   documentation for completeness.  The following entries may be present in this hash:

	   $gr{'net'} - the number of net.* (Usenet II) newsgroups the article is posted to, if
	   any.

	   $gr{'other'} - the number of non-net.* groups the article is posted to.

	   $gr{'md5skip'} - true if the article should be exempted from the MD5 body checks (if
	   all newsgroups match the regexp in md5exclude).

	   $gr{'binary'} - true if the article is posted only to groups where binaries are
	   allowed (if all newsgroups match bin_allowed).

	   $gr{'html'} - true if the article is posted only to groups where html is allowed (if
	   all newsgroups match html_allowed).

	   $gr{'poison'} - number of 'poison' newsgroups this article is posted to (matching
	   poison_groups).  If this is present, you'll only see this entry in
	   local_filter_before_emp and local_filter_after_emp, because the article will be
	   rejected after that.

	   $gr{'abuse'} - number of 'net abuse' newsgroups this article is posted to (matching
	   net_abuse_groups).

	   $gr{'reports'} - number of 'spam reports' newsgroups this article is posted to
	   (matching spam_report_groups).

	   $gr{'low_xpost'} - number of 'low crosspost limit' groups this article is posted to
	   (matching low_xpost_groups).

	   $gr{'mod'} - number of moderated groups this article is posted to (requires that
	   Cleanfeed have an active file).

	   $gr{'allmod'} - true if this article is posted only to moderated groups.

	   $gr{'faq'} - true if this article is crossposted to news.answers.
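
	   As an illustration of how these entries might be used, the sketch below applies a
	   local length limit except where binaries are allowed or the article is crossposted to
	   news.answers.  The limit of 2000 lines and the message texts are invented for the
	   example.

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our (%gr, $lines);

my $max_lines = 2000;    # hypothetical local limit

sub local_filter_last {
    return '' if $gr{'binary'};   # posted only to binary-allowed groups
    return '' if $gr{'faq'};      # crossposted to news.answers

    my $body_lines = defined $lines ? $lines : 0;
    if ($body_lines > $max_lines) {
        return ("Local: $body_lines lines in a non-binary group",
                'Local: too long');
    }
    return '';
}
```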

       %config
	   A hash containing all configuration options.

       DEBUGGING

       When you make filtering changes, you should always check the results for false positives.
       I've provided two subroutines to help you do this: writeheaders() and writefull().

       First, make sure debug_batch_directory is set in your configuration.  Set this to a
       directory that is writable by the news user.

       Call either of these subroutines with one argument, the basename of the batch file you
       want to write the current article to.  writeheaders will dump the article's headers out to
       the file (with INN this will only give you the known headers).  writefull will dump the
       full article, headers (again, known headers with INN) and body.  The file will be rotated
       if it becomes larger than debug_batch_size, set in your configuration.  The rotation is
       simple: a number is appended to the end of the filename and incremented until the
       resulting filename does not exist.  You'll have to delete the old files yourself.
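
       The rotation scheme described above can be sketched as follows.  This is not Cleanfeed's
       own code: both helper names are hypothetical, and the "." separator and the starting
       number 1 are assumptions, since the text only says that a number is appended.

```perl
use strict;
use warnings;

# Hypothetical helper: first "path.N" (N = 1, 2, ...) that does not exist.
# The "." separator and starting at 1 are assumptions for this sketch.
sub next_rotated_name {
    my ($path) = @_;
    my $n = 1;
    $n++ while -e "$path.$n";
    return "$path.$n";
}

# Hypothetical helper: rename the batch file out of the way once it has
# grown past $max_bytes.  Returns the new name, or nothing if no rotation
# was needed.
sub rotate_if_large {
    my ($path, $max_bytes) = @_;
    my $size = -s $path;
    return unless defined $size && $size > $max_bytes;

    my $target = next_rotated_name($path);
    rename $path, $target or die "cannot rename $path to $target: $!";
    return $target;
}
```

       Old files are never removed by a scheme like this, which matches the note that cleanup is
       left to you.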

       When testing a new filter, simply call writeheaders("batchfile") or writefull("batchfile")
       just before you reject an article.  Then you can look at the file to make sure you're
       doing what you think you're doing.
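
       A sketch of that workflow is shown below.  The From pattern and the batch file name
       "local-from" are invented.  Under Cleanfeed, writeheaders is supplied by the filter; the
       stub here is installed only when no such subroutine exists, so that the example runs
       standalone.

```perl
use strict;
use warnings;

# Set by Cleanfeed for each article; declared here only so the sketch
# compiles standalone.
our %hdr;

# Stand-in for Cleanfeed's writeheaders(), installed only if the real
# one is absent.  It just records which batch file was requested.
our @debug_calls;
*writeheaders = sub { push @debug_calls, $_[0]; return; }
    unless defined &writeheaders;

sub local_filter_middle {
    my $from = defined $hdr{'From'} ? $hdr{'From'} : '';

    if ($from =~ /\@spamdomain\.example\b/i) {    # hypothetical rule
        writeheaders('local-from');    # keep the headers for later review
        return 'Local: banned From domain';
    }
    return '';
}
```

       Reviewing the batch file after a few hours of traffic shows whether the new rule is
       catching anything it should not.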

SIGNALS
       When running under Cyclone, Typhoon, Breeze, or NNTPRelay (standalone mode), Cleanfeed
       will catch SIGHUP, and reload its configuration from cleanfeed.conf.  It will also reload
       and reevaluate cleanfeed.local if you're using it.  Note that, unlike INN, there is no way
       to reload the filter code itself without restarting the server.

       Cleanfeed in standalone mode will also catch SIGUSR1 and write its crude current-status
       file (see statfile in the config section) on the next cycle through the filter.

       (I honestly don't know if SIGUSR1 and SIGHUP are things which exist on NT for NNTPRelay.)
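
       The phrase "on the next cycle through the filter" suggests the usual deferred-signal
       pattern: the handler only sets a flag, and the main loop acts on it at a safe point.  The
       sketch below illustrates that general pattern on a Unix-like system.  It is an assumption
       about the approach, not a copy of Cleanfeed's code, and filter_cycle and the flag names
       are hypothetical.

```perl
use strict;
use warnings;

# Flags set by the signal handlers and consumed by the main loop.
my ($want_reload, $want_status) = (0, 0);
$SIG{HUP}  = sub { $want_reload = 1 };
$SIG{USR1} = sub { $want_status = 1 };

my @log;

# Hypothetical main-loop step: handle pending requests, then one article.
sub filter_cycle {
    if ($want_reload) { $want_reload = 0; push @log, 'reload'; }
    if ($want_status) { $want_status = 0; push @log, 'status'; }
    push @log, 'article';
    return;
}

# Demonstration: signal ourselves, then run a cycle each time.
kill 'HUP', $$;
filter_cycle();
kill 'USR1', $$;
filter_cycle();

print join(',', @log), "\n";
```

       Keeping the handlers this small means a signal that arrives mid-article never interrupts
       a half-finished check.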

CREDITS
       Written by Jeremy Nixon <jeremy@exit109.com>.

       Originally based on Jeff Garzik's EMP filter.

       I can't possibly mention everyone who has submitted ideas or fixes for the filter, but I'd
       like to acknowledge the substantial contributions of several people:  Danhiel Baker, Frank
       Copeland, Brian Moore, John Payne, Russ Allbery, David Riley, and SeokChan LEE.	Thanks,
       guys.

       dynamic-load.patch is from Piers Cawley.  The body-filtering portion of the INN
       filter.patch is from Jeff Garzik.  messageid.patch is from Ed Mooring.  mode.patch is from
       John Payne.

COPYRIGHT
       Copyright 1997-1998 by Jeremy Nixon, All Rights Reserved.

LICENSE
       This software may be distributed freely, provided it is intact (including all the files
       from the original archive).  You may modify it, and you may distribute your modified
       version, provided the original work is credited to the appropriate authors, and your work
       is credited to you (don't make changes and pass them off as my work), and that you aren't
       charging for it.

AVAILABILITY
       This filter is available at:

       http://www.exit109.com/~jeremy/news/antispam.html
       ftp://ftp.exit109.com/users/jeremy/

3rd Berkeley Distribution		 Version 0.95.7b			     cleanfeed(8)

