I just received an email which is clearly spam. However, SpamBayes thinks there is only a 13% probability that it is spam.

I have a big corpus - 623 good and 3113 spam messages.

In a previous post I discussed how SpamBayes is not working for me anymore. This message is a good example of that.

Frankly, there isn't much that SpamBayes, or any naive Bayesian filter, can do about it. Take a look at the message below.

Subject: Re: Spyware

Desktop icons are automatically added to the desktop ?
Suffering from Unexplained home page change ?

It's very likely that they are being served up by spyware software

Try 2005 Highest-Rated Spyware Remover:

Free Download Here: http://[Spam Affiliate Link...edited]

Prevent the installation of hijackers spyware
Prevent the installation of hijackers spyware
Prevent the installation of adware spyware
and other potentially unwanted pests.

Try our online scan now: http://[Spam Affiliate Link...edited]

Q-u^1*t [Spam Affiliate Link...edited]

The message headers are equally uninteresting for SpamBayes. Here is what SpamBayes thinks about it.

Spam Score: 13% (0.130563)

word                                spamprob         #ham  #spam
'*H*'                               0.740598            -      -
'*S*'                               0.001723            -      -
'header:In-Reply-To:1'              0.0879684         164     78
'potentially'                       0.147771            8      6
'page'                              0.175691           91     96
'likely'                            0.195508           18     21
'installation'                      0.197697            8      9
'served'                            0.201793            7      8
'subject:: '                        0.227479          282    414
'software'                          0.241129          112    177
'change'                            0.247864           77    126
'suffering'                         0.252365            4      6
'download'                          0.254508           45     76
'to:addr:angsuman'                  0.262497          411    730
'header:Received:4'                 0.265284           88    158
'added'                             0.288257           29     58
'try'                               0.312818           57    129
'other'                             0.313797          171    390
'prevent'                           0.315326           12     27
'scan'                              0.345157            4     10
'being'                             0.34637            58    153
'very'                              0.360483           95    267
'skip:a 10'                         0.361028          183    516
'now:'                              0.370986           10     29
'that'                              0.375101          345   1034
'they'                              0.375572          101    303
'are'                               0.385233          349   1092
'reply-to:none'                     0.393789          504   1635
'here:'                             0.608336           15    117
'adware'                            0.653949            0      1
'unwanted'                          0.665617            2     21
'2005'                              0.79075             1     22
'spyware'                           0.820111            0      4
'url:discon'                        0.820111            0      4
'url:700'                           0.844931            0      5
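For the curious, the 13% figure can be roughly reproduced from the table. The sketch below assumes SpamBayes's default chi-squared combining as I understand it: the 33 word probabilities are folded into a ham-evidence value ('*H*') and a spam-evidence value ('*S*'), and the final score is (S - H + 1) / 2. The names `CLUES`, `chi2q` and `combine` are my own for illustration, not SpamBayes's API, and the inputs are the rounded values printed above, so expect the results to be close to the reported numbers rather than identical.

```python
import math

# Word probabilities copied from the table above (the 33 clue rows,
# excluding the '*H*' and '*S*' summary rows).
CLUES = [
    0.0879684, 0.147771, 0.175691, 0.195508, 0.197697, 0.201793,
    0.227479, 0.241129, 0.247864, 0.252365, 0.254508, 0.262497,
    0.265284, 0.288257, 0.312818, 0.313797, 0.315326, 0.345157,
    0.34637, 0.360483, 0.361028, 0.370986, 0.375101, 0.375572,
    0.385233, 0.393789, 0.608336, 0.653949, 0.665617, 0.79075,
    0.820111, 0.820111, 0.844931,
]


def chi2q(x2, dof):
    """Return P(chi-square >= x2) for an even number of degrees of freedom."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Fold per-word spam probabilities into (H, S, score)."""
    n = len(probs)
    ln_spam = sum(math.log(1.0 - p) for p in probs)  # drives the spam evidence S
    ln_ham = sum(math.log(p) for p in probs)         # drives the ham evidence H
    s = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return h, s, (s - h + 1.0) / 2.0


h, s, score = combine(CLUES)
print(f"H={h:.6f}  S={s:.6f}  score={score:.6f}")
```

The point of the exercise: 26 of the 33 clues sit below 0.4, so the ham evidence dominates and the handful of spammy tokens at the bottom of the table cannot pull the score up.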

Handling this spam is very hard for an N-B-C (naive Bayes classifier). It doesn't include any of the standard keywords. It doesn't directly try to sell you anything. The choice of language shows signs of an intelligent spammer: the message includes lots of non-spammy yet contextually relevant words, which lower the score. The only spammy word (quit) has been masked. It even includes ham words in the URL.

To the human eye this is clearly spam. To a computer, however, it is not.

Note: You could assign a very high score to the words spyware or adware, but then spammers can always pollute the word space with misspellings. Also, your friends may want to tell you about AdAware, a legitimate spyware removal tool.

We need a layered spam-removal approach at the source to handle this type of spammer.