Post 34315 by Neo on Saturday 15th of February 2003 02:03:59 PM
Stopping Language Character Set Spam

I have noticed a significant decrease in spam by using procmail to filter on the language character sets that I don't read.

Nothing against our fine friends in Japan, China, Korea and all over the world who use different languages, but I now find that over 50% of my daily mountain of spam is in character sets I cannot read.

Here is my current set of procmail recipes for this problem. They are working well and are based on the character sets I cannot and do not read. If you can read any of these and need them, it is easy to remove them from the recipes:

Code:
# PARTIAL LIST OF CHARSET SPAM
#big5   - chinese
#gb2312 - chinese
#koi8-r     - cyrillic
#iso-8859-2 - Latin-2 for Eastern Europe
#iso-ir-111 - cyrillic (ECMA)
#iso-8859-5 - cyrillic
#euc-kr         - korean
#ks_c_5601-1987 - korean
#iso-2022-kr    - korean
#euc-jp         - japanese
#iso-2022-jp    - japanese

# Parentheses make every alternative depend on "charset"; without them
# a bare "utf-8" or "big5" anywhere in the header would match on its own.
:0:
* charset.*(ks_c_5601|euc-kr|big5|gb2312|utf-8|koi8|iso-ir-111)
charset_spam

:0:
* charset.*(iso-8859-[2-8]|euc-jp|iso-2022|windows-125)
charset_spam

:0:
* charset.*(shift_jis|x-johab|x-unified-hangul)
charset_spam

:0:
* charset.*(cn-gb|cn-big5|x-euc-tw|iso_2022_cn)
charset_spam

# Encoded Subject lines name the charset too, e.g. =?euc-kr?B?...?=
:0:
* ^Subject:.*(ks_c_5601-1987|euc-kr|big5|gb2312|utf-8)
charset_spam
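
When refining patterns like these, it can help to try them on a few sample header lines before putting them in .procmailrc. The following is only a rough sketch in Python, not procmail itself: Python's `re` module approximates procmail's egrep-style matching, and the sample headers and the helper name `matches_charset` are made up for illustration. It shows what difference parentheses around the alternatives after `charset.*` make.

```python
import re

# Hypothetical sample header lines, made up for illustration.
SAMPLES = {
    "korean": 'Content-Type: text/html; charset="euc-kr"',
    "plain": 'Content-Type: text/plain; charset="us-ascii"',
    "mention": "X-Note: converted from big5 by gateway",
}

# Ungrouped: "|" binds loosest, so this reads as
# (charset.*euc-kr) OR (big5) OR (gb2312).
# IGNORECASE mirrors procmail, which ignores case unless the D flag is set.
UNGROUPED = re.compile(r"charset.*euc-kr|big5|gb2312", re.IGNORECASE)

# Grouped: every alternative must follow "charset".
GROUPED = re.compile(r"charset.*(euc-kr|big5|gb2312)", re.IGNORECASE)


def matches_charset(pattern, header_line):
    """Return True if the pattern is found anywhere in the header line."""
    return pattern.search(header_line) is not None


if __name__ == "__main__":
    # Print one row per sample: name, ungrouped result, grouped result.
    for name, line in SAMPLES.items():
        print(name, matches_charset(UNGROUPED, line), matches_charset(GROUPED, line))
```

With these samples, both patterns flag the euc-kr Content-Type line and leave the us-ascii one alone, but only the ungrouped pattern flags the X-Note line, which merely mentions big5.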

I recreated the simple filters above by looking at lots of spam and also visiting sites that list charsets:

http://www.terena.nl/library/multili...ncharsets.html

http://java.sun.com/j2se/1.4.1/docs/...oding.doc.html

http://msdn.microsoft.com/workshop/d...ce/charset.asp

Anyone have any ideas or suggestions for improving these recipes? They are working OK and I'm refining them daily. Neo