How Spammers Fool Bayesian Filters - And How to Stop Them
by: Paul Judge, CTO, CipherTrust, Inc.
Effectively
stopping spam over the long term requires
much more than blocking individual IP addresses
and creating rules based on keywords that
spammers typically use. The increasing sophistication
of spam tools coupled with the increasing
number of spammers in the wild has created
a hyper-evolution in the variety and volume
of spam. The old ways of blocking the bad
guys just don't work anymore.
Examining spam and spam-blocking technology
can illuminate how this evolution is taking
place and what can be done to combat spam
and reclaim e-mail as the efficient, effective
communication tool it was intended to be.
One method used to combat spam is Bayesian
filtering. Named after Thomas Bayes, an
English mathematician, Bayesian logic is
used in decision making and inferential
statistics. Bayesian filters maintain a database
of known spam and ham, or legitimate email.
Once the database is large enough, the system
ranks the words according to the probability
they will appear in a spam message.
Words more likely to appear in spam are
given a high score (between 51 and 100),
and words likely to appear in legitimate
email are given a low score (between 1 and
50). For example, the words "free" and "sex"
generally have values between 95 and 98,
whereas the words "emphasis" or "disadvantage"
may have a score between 1 and 4. Commonly
used words such as "the" and "that", and
words new to the Bayesian filters are given
a neutral score between 40 and 50 and would
not be used in the system's algorithm.
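
The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular product: the function names `train` and `word_score`, the per-message frequency ratio, and the neutral value of 45 for unseen words are all assumptions chosen to match the 1 to 100 scale used in this article.

```python
from collections import Counter

NEUTRAL_SCORE = 45  # assumed midpoint of the article's 40-50 neutral band


def train(spam_messages, ham_messages):
    """Count, per word, how many spam and ham messages contain it."""
    spam_counts = Counter()
    ham_counts = Counter()
    for text in spam_messages:
        spam_counts.update(set(text.lower().split()))
    for text in ham_messages:
        ham_counts.update(set(text.lower().split()))
    return spam_counts, ham_counts, len(spam_messages), len(ham_messages)


def word_score(word, spam_counts, ham_counts, n_spam, n_ham):
    """Return a 1-100 score; higher means more likely to appear in spam."""
    s = spam_counts[word] / n_spam if n_spam else 0.0
    h = ham_counts[word] / n_ham if n_ham else 0.0
    if s + h == 0:
        return NEUTRAL_SCORE  # word never seen: treat as neutral
    probability = s / (s + h)  # assumed estimate, not taken from the article
    return max(1, round(probability * 100))  # keep the result on the 1-100 scale


# Toy training data, invented for illustration only.
model = train(
    ["free pills now", "free offer for the winner"],
    ["the meeting emphasis is on budget", "see the report"],
)
print(word_score("free", *model))      # seen only in spam
print(word_score("emphasis", *model))  # seen only in ham
```

With so little training data the scores sit at the extremes. A real database of known spam and ham would produce the more graded values quoted above.
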
When the system receives an email, it breaks
the message down into tokens, or words with
values assigned to them. The system utilizes
the tokens with scores on the high and low
end of the range and develops a score for
the email as a whole. If the email has more
spam tokens than ham tokens, the email will
have a high spam score. The email administrator
determines a threshold score the system
uses to allow email to pass through to users.
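
As a rough sketch of that process, the Python below tokenizes a message, drops tokens in the neutral band, and averages what is left. The simple average is an assumption made for readability; production filters generally combine token probabilities in a more careful statistical way. The names `tokenize`, `message_score`, and `is_spam`, and the threshold of 70, are hypothetical.

```python
import re

NEUTRAL_LOW, NEUTRAL_HIGH = 40, 50   # neutral band from the article
UNKNOWN_SCORE = 45                   # assumed score for words not in the table


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def message_score(text, word_scores):
    """Average the scores of non-neutral tokens; 1-100, higher is spammier."""
    scores = [word_scores.get(tok, UNKNOWN_SCORE) for tok in set(tokenize(text))]
    significant = [s for s in scores if not NEUTRAL_LOW <= s <= NEUTRAL_HIGH]
    if not significant:
        return float(UNKNOWN_SCORE)  # nothing informative: stay neutral
    return sum(significant) / len(significant)


def is_spam(text, word_scores, threshold=70):
    """Apply the administrator's threshold (70 is an arbitrary example value)."""
    return message_score(text, word_scores) >= threshold


# Hypothetical word scores, chosen inside the ranges quoted in the article.
word_scores = {"free": 96, "sex": 97, "emphasis": 2, "disadvantage": 3, "the": 45}
print(is_spam("Free sex", word_scores))      # True
print(is_spam("The emphasis", word_scores))  # False
```

Raising the threshold lets more borderline mail through to users, while lowering it blocks more aggressively at the cost of more false positives.
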
Bayesian filters are effective at filtering
spam and minimizing false positives. Because
they adapt and learn based on user feedback,
Bayesian filters produce better results as
they are used within an organization over
time. They are not, however, foolproof.
Spammers have learned which words Bayesian
filters consider spammy and have developed
ways to insert non-spammy words into emails
to lower the message's overall spam score.
By adding in paragraphs of text from novels
or news stories, spammers can dilute the
effects of high-ranking words. Text insertion
has also caused normally legitimate words
that are found in novels or news stories
to have an inflated spam score. This may
potentially render Bayesian filters less
effective over time.
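
The dilution effect is easy to see with a toy scorer. The sketch below assumes a filter that simply averages the scores of its non-neutral tokens, which is a simplification, and the word scores are hypothetical values taken from the ranges quoted earlier. The helper name `mean_score` is invented for this example.

```python
def mean_score(tokens, word_scores, neutral=45):
    """Average score of tokens outside the article's 40-50 neutral band."""
    scores = [word_scores.get(t, neutral) for t in tokens]
    significant = [s for s in scores if not 40 <= s <= 50]
    return sum(significant) / len(significant) if significant else float(neutral)


# Hypothetical scores, chosen inside the ranges the article quotes.
word_scores = {"free": 96, "sex": 97, "emphasis": 2, "disadvantage": 3}

spam = ["free", "sex"]
padded = spam + ["emphasis", "disadvantage"]  # stands in for pasted novel text

plain_score = mean_score(spam, word_scores)     # 96.5
padded_score = mean_score(padded, word_scores)  # 49.5
print(plain_score, padded_score)
```

Two low-scoring words are enough to pull this message from 96.5 down to 49.5, below any reasonable spam threshold. If users then report the padded message as spam, the filter also learns to associate the inserted words with spam, which is the score inflation described above.
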
Another approach spammers use to fool Bayesian
filters is to create less spammy emails.
For example, a spammer may send an email
containing only the phrase "Here's the
link." This approach can neutralize the
spam score and entice users to click on
a link to a Web site containing the spammer's
message. To block this type of spam, the
filter would have to be designed to follow
the link and scan the content of the Web
site users are asked to visit. This type
of filtering is not currently employed by
Bayesian filters because it would be prohibitively
expensive in terms of server resources and
could potentially be used as a method of
launching denial of service attacks against
commercial servers.
As with all single-method spam filtering
methodologies, Bayesian filters are effective
against certain techniques spammers use
to fool spam filters, but are not a magic
bullet for solving the spam problem. Bayesian
filters are most effective when combined
with other methods of spam detection.
The Solution
When used individually, each anti-spam technique
has been systematically overcome by spammers.
Grandiose plans to rid the world of spam,
such as charging a penny for each e-mail
received or forcing servers to solve mathematical
problems before delivering e-mail, have
been proposed with few results. These schemes
are not realistic and would require a large
percentage of the population to adopt the
same anti-spam method in order to be effective.
You can learn more about the fight against
spam by visiting our website at www.ciphertrust.com
and downloading our whitepapers.
About the author:
Dr. Paul Judge is a noted scholar and entrepreneur.
He is Chief Technology Officer at CipherTrust,
the industry's largest provider of enterprise
email security. The company's flagship product,
IronMail, provides a best-of-breed enterprise
anti-spam solution designed to stop
spam, phishing attacks and other email-based
threats. Learn more by visiting www.ciphertrust.com/products/spam_and_fraud_protection
today.
Circulated by Bandoni Media