New techniques of spamming…
Angsuman Chakraborty
July 28th, 2004
For quite some time, SpamBayes, which is based on a naive Bayesian classifier, filtered my email very accurately, with very few false positives.
Recently, however, I have noticed a few alarming trends in spamming:
- Database poisoning: Using otherwise innocuous words (ham words) in spam, which gradually poisons the classifier’s database
- Junk Tags: Hiding spam words by inserting invalid HTML tags inside them. HTML renderers ignore tags they don’t understand, so the message still displays properly
- Invalid Words: Spam words such as “mortgage” are masked by inserting special or junk characters between the letters
Solutions I could think of:
- Database poisoning: Most database-poisoning emails tend to land in the Not Sure category. I suggest deleting them instead of classifying them as spam. However, this still requires spending some time on them, which I don’t like.
- Junk Tags: Add a filter in front of the Bayesian classifier to eliminate junk tags
- Invalid Words: Inexact (fuzzy) matching algorithms, such as those in Lucene, should help.
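The last two ideas can be sketched together in a few lines of Python. This is only a rough illustration, not how SpamBayes works: the `SPAM_WORDS` list, the function names, and the 0.8 cutoff are all assumptions made for the example. The standard library's `difflib` stands in for Lucene's fuzzy matching, and a crude regular expression stands in for a real HTML-aware filter.

```python
import difflib
import re

# Hypothetical watch list; a real filter would take its vocabulary
# from the classifier's training data instead of a fixed list.
SPAM_WORDS = ["mortgage", "viagra", "refinance"]

# Crude tag pattern: anything between "<" and ">", valid HTML or not.
# It will also eat text such as "a < b and c > d", so treat it as a sketch.
TAG_RE = re.compile(r"<[^>]*>")

# Everything that is not a lowercase ASCII letter counts as junk.
JUNK_RE = re.compile(r"[^a-z]")


def strip_tags(html):
    """Remove tag-like spans so that words split by junk tags rejoin."""
    return TAG_RE.sub("", html)


def normalize(token):
    """Lowercase a token and drop every character that is not a letter."""
    return JUNK_RE.sub("", token.lower())


def find_masked_words(text, cutoff=0.8):
    """Return the watch-list words that approximately match a token in text."""
    hits = set()
    for token in strip_tags(text).split():
        cleaned = normalize(token)
        if not cleaned:
            continue
        # get_close_matches keeps candidates whose similarity ratio
        # is at least `cutoff`; n=1 keeps only the best candidate.
        hits.update(
            difflib.get_close_matches(cleaned, SPAM_WORDS, n=1, cutoff=cutoff)
        )
    return sorted(hits)
```

Calling `find_masked_words("Low mort<xyz>gage rates")` or `find_masked_words("m.o.r.t.g.a.g.e")` should report `mortgage` in both cases. The cleaned tokens, rather than the raw message, would then be passed on to the Bayesian classifier. A lower cutoff catches heavier obfuscation at the price of more false matches on legitimate words.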
I have recently noticed a significant increase in mortgage spam. It should be easy to tackle by legal means.
Overall, the game is becoming tougher for spam prevention. A combination of existing techniques is required for any spam filter to remain effective.
Looking forward to hearing your thoughts.
Filed under Spam Watch, Web

July 29th, 2004 at 2:49 am
I have tried all the software solutions for thwarting spam. I have yet to see one that works as well as simply owning a domain and creating many email addresses, one for each site I visit, like the one I used here. If I start getting spam at that address, I simply forward it to null@null.net and that’s that. I have about 30 email addresses generating well over 250 spams a day. They are all being forwarded to null@null.net (sure hope no one ever gets that address).
I *NEVER* give out my main email address to anyone! All the non-spam addresses get forwarded to my real email account so I can read and respond to the messages. Sure, at that point my real address gets sent out. However, it’s not accidentally published on the web, at least not by posting it on a blog or a web store.
October 15th, 2004 at 5:31 am
I am facing the same problem. The new genre of spam I have noticed has a bunch of unrelated words pushed in at the end of the e-mail. These are really rare words gathered from different contexts.
Do you have any suggestions for it?