How naive bayesian classifier can be made ineffective
March 18th, 2005
A discussion on a failure vector of naive Bayesian classifiers
New techniques of spamming...
July 28th, 2004
For quite some time the naive Bayesian classifier based SpamBayes filtered my emails very accurately, with very few false positives. Recently, however, I have noticed a few trends in spamming that are alarming.
How to combat small-font-spam..
March 3rd, 2005
A recent trend in spamming, observed in Larry Seltzer's blog and elsewhere, is to use characters in a very small font size to create ASCII art that spells out the spam message. What is interesting is that, unlike the example above, spammers can use legitimate email messages chosen randomly from computers they have hacked to create the ASCII art.
Can Bayes Theorem Give Perception To Robots?
July 17th, 2006
The Max Planck Institute for Biological Cybernetics is a partner in the Integrated Research Project BACS (Bayesian Approach to Cognitive Systems), where researchers are investigating the extent to which Bayes' theorem can be used in artificial systems capable of managing complex tasks in a real-world environment. Computers, and hence robots, today fail miserably when given incomplete and/or imprecise knowledge.
Total Security Solution for Windows Computers
February 6th, 2006
This is a short list of highly recommended free software for total protection of your Windows PC (computers with the Windows operating system). It protects you from spyware, adware, malware, viruses, trojans, spam etc.
Downloading 1 of 4538 Mail Messages
March 30th, 2007
I have just started downloading my email and guess what: I have 4538 email messages in the queue, most of them obviously spam, as even Bill Gates isn't that popular. Unfortunately my new stopgap setup doesn't have a Bayesian filter set up, so I will have to filter them manually.
Spam + Donation = Spamation? Is it acceptable?
March 2nd, 2009
I got this interesting spam in my comments today:
Please, do not delete the given message. Money obtained from spam will go to the help hungry to children ugandf
What do I do?
I figured someone who can stoop so low as to spam is not likely to donate his ill-gotten wealth.
Comment Guard Pro in Final Stages of Development
November 4th, 2007
Comment Guard Pro is finally turning out to be the uber anti-comment spam protection plugin I dreamt of; everything you would ever want in a comment spam protection plugin and more, much more. The plugin itself is composed of several modules, each of which can be individually enabled / disabled, tweaked and configured, all from the user interface.
How To Report Spam From GMail Account
June 23rd, 2008
Would you believe me if I said that GMail does not provide an email address to report spam? After some investigation I found out how you can report GMail spam to Google.
You have received spam from a GMail user in your GMail account
This one is simple to address.
Landmark judgement of 1 Billion $ awarded to ISP in Anti-SPAM suit
December 18th, 2004
A wonderful Christmas/Hanukkah present, I must say :)
http://tech.nytimes.com/aponline/technology/AP-Spam-Lawsuit.html
Looks like if you are in Iowa you can claim damages of $10 per spam! I am wondering if I should move to Iowa. Looking at the amount of spam I get, I could be a millionaire in no time :)
Python: A recipe for cryptic code?
April 13th, 2005
I have heard that Python is a great programming language, far superior to everything else around. The following code in Python is touted as the world's smallest p2p client & server.
Bad Karma by Spam Karma Filter
December 24th, 2004
I have been unable to submit a comment on a site with Spam Karma enabled. So I wrote a comment on the Spam Karma plugin site.
Beware GMail Goofs Up On Spam Protection
November 28th, 2007
GMail has apparently gone bonkers with its vaunted spam protection feature, or maybe spammers have finally managed to corrupt its (naive Bayesian?) filters. GMail is generating tons of false positives (valid emails being marked as spam), with no hope on the horizon so far.
Comment Guard Pro: Over 1 Million Comment Spam Blocked
December 20th, 2007
Since we installed Comment Guard Pro, an anti-comment-spam plugin, on this blog, it has stopped over 1 million spam comments so far: 1001011 spam comments were blocked by Angsuman's Comment Guard plugin in 335 days 8 hours 38 minutes, and 99.338% of the comments received during this time were spam (look in the right sidebar for the latest stats).
USA and China Tops Dirty Dozen List of Spammers
April 20th, 2006
The US and China are competing for leadership as the top spam-relaying countries. This is a lead which I am sure the US wouldn't mind relinquishing, and its position is improving.
March 14th, 2005 at 8:37 am
For Outlook I use Matador. It cost around $30 but was well worth it. I can tell you that I get about 300 spams over a weekend. Matador catches about 95% of that. I’m still looking for a good one that is free. -Mark
March 17th, 2005 at 5:56 pm
For your amount of SPAM I would say SpamBayes is good enough. It is free.
I get anywhere between 2500 and 5000 spams every day. SpamBayes is just not scalable enough to handle this huge load.
March 18th, 2005 at 12:48 am
Sounds like you need to re-train your filter or change the threshold. I use SpamBayes to filter out about 1800-2500 spam per day and have been doing so since last spring. So I’m just under your levels. It does suck if you have to restart Outlook all the time, but it’s not so bad once it is up. I used to train it on all spam that I got, but I’ve gotten a little more particular over time.
Maybe it is time to change email addresses?
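The threshold suggestion above can be sketched in a few lines. SpamBayes gives each message a score between 0 and 1 and sorts it into ham, unsure, or spam using two cutoffs. The cutoff values 0.20 and 0.90 below, the function name `classify`, and the exact boundary handling are assumptions for illustration, not taken from this thread.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0, 1] to 'ham', 'unsure', or 'spam'.

    The default cutoffs are assumed values; tune them to your own mail.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


# Lowering the spam cutoff catches more spam at the risk of more false positives.
print(classify(0.85))                    # unsure with the assumed defaults
print(classify(0.85, spam_cutoff=0.80))  # spam once the cutoff is lowered
```

Changing a cutoff only moves the decision boundary; retraining changes the scores themselves, so the two remedies are complementary.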
March 18th, 2005 at 1:03 am
Thanks for the ideas, Glenn. I did retrain once about a year ago. Looks like it’s time again. What thresholds do you use?
Changing my email is unfortunately not an option because too many people, including my clients, friends etc. have it.
I have been using it for 5-6 years now, maybe more. I still get emails at the old Hotmail address.
March 18th, 2005 at 2:34 pm
How naive bayesian classifier can be made ineffective
A discussion on a failure vector of naive bayesian classifiers…
March 19th, 2005 at 1:46 am
You should all try POPFile, available freely, including Perl source code, at SourceForge.net. It will run on Windows, Linux and maybe many other OSes that have a Perl interpreter available. It has POP3, SMTP, NNTP and even IMAP support and a nice web interface for configuration and training.
I don’t know if it will scale seamlessly to handle 5000 spams daily, but since it’s open source and supports MySQL databases, there is a good chance that such an amount won’t be a problem at all, at least after making some modifications to it.
Personally, I’ve been using this piece of cake for about three months now and, while receiving about 2500-3000 spams in a period of 30 days, it achieves an accuracy of 99.58%. This means only 12 spams got through and there was only 1 false positive, while 2,832 spams were blocked successfully.
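The figures in this comment can be checked with a little arithmetic. The sketch below reads them as 99.58% accuracy and 2,832 blocked spams (European-style separators in the original). The comment does not say how many ham messages were classified, so the code tries two readings: the percentage as a spam catch rate, and the percentage as overall accuracy with the total message count back-computed. Both readings are assumptions, and the variable names are illustrative.

```python
spam_blocked = 2832       # spams caught, from the comment
spam_missed = 12          # spams that got through, from the comment
false_positives = 1       # ham wrongly marked as spam, from the comment
reported_accuracy = 0.9958

spam_total = spam_blocked + spam_missed

# Reading 1 (assumption): the percentage is the share of spam that was caught.
spam_catch_rate = spam_blocked / spam_total

# Reading 2 (assumption): the percentage is overall accuracy, i.e.
# 1 - errors / total, so the total message count can be back-computed.
errors = spam_missed + false_positives
implied_total = round(errors / (1 - reported_accuracy))
implied_ham = implied_total - spam_total

print(f"spam catch rate: {spam_catch_rate:.2%}")
print(f"implied total messages: {implied_total}, implied ham: {implied_ham}")
```

The first reading reproduces the quoted percentage from the spam counts alone; the second would imply only a few hundred ham messages over the same period.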
March 19th, 2005 at 10:45 pm
I will definitely try that. I have now reinstalled SpamBayes as a POP3 filter and am retraining it from scratch. Let’s see if it does better than last time.
Already it is complaining that I have too high a spam-to-ham ratio (10:1) and that SpamBayes doesn’t give good results in this scenario.
August 13th, 2005 at 12:07 am
My SpamBayes runs through procmail, which autofilters into Maildir folders for mutt, and is retrained every night at 2 AM on viewed hams and spams (unviewed messages are left out) that have been touched in the last month (old messages are autoarchived after a month, actually). I set the spam threshold to 0.01% spam, and mutt is able to display the individual spam score in the spam folder index.
One in maybe five thousand spams gets through, and I get 9 times as much ham as spam. On average, 1400 spam messages per month get filtered. I do have a percentage of false positives due to the low threshold; however, 90% of the time I don’t want to see them anyway. The rest are from corporatey sales people (less than a dozen a month) who I mostly expect messages from already, which I suspect training on my sent mail would help fix.
I really don’t see naive Bayesian filters as a failure. Training on only the last month’s email really helps, as the spam corpus changes frequently, and retraining nightly helps spot spam evolution by learning the latest 80-90% spam scorers (instead of 100%, which 80% of spam registers as).
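The idea of retraining on only the last month of mail can be sketched as follows. This is a textbook multinomial naive Bayes with add-one smoothing, not SpamBayes's actual scoring method, and the function names, message tuples, and 30-day window are all illustrative assumptions rather than the commenter's setup.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def recent(messages, now, days=30):
    """Keep only (timestamp, label, text) tuples from the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [m for m in messages if m[0] >= cutoff]


def train(messages):
    """Count tokens and messages per label ('spam' or 'ham')."""
    token_counts = {"spam": Counter(), "ham": Counter()}
    message_counts = Counter()
    for _, label, text in messages:
        message_counts[label] += 1
        token_counts[label].update(text.lower().split())
    return token_counts, message_counts


def spam_probability(text, token_counts, message_counts):
    """Posterior probability of spam under naive Bayes with add-one smoothing."""
    vocab = set(token_counts["spam"]) | set(token_counts["ham"])
    total_messages = sum(message_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        # Smoothed class prior, then one smoothed likelihood term per known token.
        log_p = math.log((message_counts[label] + 1) / (total_messages + 2))
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:
                log_p += math.log((token_counts[label][token] + 1) / denom)
        log_score[label] = log_p
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_score.values())
    weights = {k: math.exp(v - top) for k, v in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


now = datetime(2005, 8, 13)
messages = [
    (datetime(2005, 8, 10), "spam", "cheap pills buy now"),
    (datetime(2005, 8, 11), "ham", "meeting notes attached"),
    (datetime(2005, 5, 1), "spam", "meeting notes attached"),  # stale, dropped
]
model = train(recent(messages, now))
print(spam_probability("buy cheap pills", *model))
print(spam_probability("meeting notes", *model))
```

A nightly job would rerun `train(recent(...))` so that stale examples, like the May message here, stop influencing the scores once they age out of the window.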
August 13th, 2005 at 8:03 pm
@Seth
Thanks for the valuable insight about using only the last month’s email in the corpus.
May 23rd, 2008 at 10:48 pm
How can I get the code for the SpamBayes algorithms?