Spam Filtering: Understanding SEP and CEP


Greg Reemler
Mon, 14 Apr 2008 04:56:52 +0000

To help folks further understand the differences between CEP (complex event processing) and SEP (simple event processing), and prompted by Marc’s reply in the blogosphere, More Cloudy Thoughts, here is the scoop.
In the early days of spam filtering, around 10 years ago, spam was detected with rule-based systems. In fact, one of the first papers to document rule-based approaches in spam filtering was E-Mail Bombs and Countermeasures: Cyber Attacks on Availability and Brand Integrity, published in IEEE Network Magazine, Volume 12, Issue 2, p.10-17 (1998). At the time, rule-based approaches were the state of the art in antispam filtering.
Over time, however, spammers got more clever and found many ways to poke holes in rule-based detection approaches. They learned to write with spaces between the letters of words, they changed the subject and message text frequently, they randomized their originating IP addresses, they used the IP addresses of your best friends, they changed the timing and frequency of the spam, and so on, ad infinitum.
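To make the evasion problem concrete, here is a minimal sketch in Python of a keyword rule filter and the letter-spacing trick that defeats it. The rule list and the sample messages are made up for illustration; they are not taken from any real filter.

```python
import re

# Hypothetical rule base: each rule is a regular expression for a known spam phrase.
RULES = [
    re.compile(r"\blottery\b", re.IGNORECASE),
    re.compile(r"\bfree money\b", re.IGNORECASE),
]


def is_spam_by_rules(message: str) -> bool:
    """Flag a message as spam if any rule pattern matches it."""
    return any(rule.search(message) for rule in RULES)


plain = "Claim your lottery prize"
obfuscated = "Claim your l o t t e r y prize"  # same message, letters spaced out

print(is_spam_by_rules(plain))
print(is_spam_by_rules(obfuscated))
```

The first message matches a rule and the second slips through. You could add a rule for the spaced-out spelling, but the spammer only has to switch to dots, digits, or misspellings, and the rule base keeps growing.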
Not to sound like an elitist for speaking the truth, but the more operational experience you have with detection-oriented solutions, the more you will understand that rule-based approaches (alone) are neither scalable nor efficient. If you followed a rule-based approach (only) against heavy, complex spam (the type of spam we see in cyberspace today), you would spend much of your time writing rules and still not stop very much of the spam!
The same is true for the security situation-detection scenario in Marc’s example.
Like Google’s Gmail spam filter, and Microsoft’s old Mr Clippy (the goofy help algorithm of the past), you need detection techniques that use advanced statistical methods to detect complex situations as they emerge. With rules, you can only detect simple situations unless you have a tremendous amount of resources to build and maintain very complex rule bases (and even then rules have limitations for real-time analytics).
We did not make this up at Techrotech, BTW. Neither did our favorite search engine and leading free email provider, Google!
This is precisely why Gmail has a great spam filter. Google detects spam with a Bayesian classifier, not a rule-based system. If they used (only) a rule-based approach, your Gmail inbox would be full of spam!
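For readers who have not seen one, here is a minimal sketch of a Bayesian (naive Bayes) text classifier in Python. It is not Gmail's implementation, which is not public; the class name, the whitespace tokenizer, the add-one smoothing, and the four toy training messages are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        # Vocabulary is every word seen in either class during training.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(self.doc_counts[label] / total_docs)
        for tok in tokens:
            count = self.word_counts[label][tok]
            # Smoothed log likelihood of the word given the label.
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def spam_probability(self, text):
        tokens = tokenize(text)
        log_spam = self._log_score(tokens, "spam")
        log_ham = self._log_score(tokens, "ham")
        # Normalize in log space to avoid underflow on long messages.
        m = max(log_spam, log_ham)
        p_spam = math.exp(log_spam - m)
        p_ham = math.exp(log_ham - m)
        return p_spam / (p_spam + p_ham)


# Toy training set (made up); both classes need at least one example.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch with the team monday", "ham")

print(spam_filter.spam_probability("free money"))
print(spam_filter.spam_probability("team meeting monday"))
```

Nobody writes a rule for "free money" here. The classifier learns from labeled examples which words lean toward spam, and it can be retrained as the spam changes, which is where it differs from a hand-maintained rule base.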
The same is true for search and retrieval algorithms, but that is a topic for another day. However, you can bet your annual paycheck that Google uses a Bayesian type of classifier in their highly confidential search and retrieval (and - hint - classification) algorithms.
In closing, don’t let the folks selling software and analysts promoting three-letter acronyms (TLAs) cloud your thinking.
What we are seeing in the marketplace, the so-called CEP marketplace, are simple event processing engines. CEP is already happening in the operations of Google, a company that needs real-time CEP for spam filtering and also for search and retrieval. We also see real-time CEP in top-quality security products that use advanced neural networks and Bayesian networks to detect problems (fraud, abuse, denial-of-service attacks, phishing, identity theft) in cyberspace.