Comment spam on blogs seems to be growing worse with each passing day, and as such the talk of the town in the Movable Type community has been how to stop as much of it as possible. There’s a thread devoted to this topic over at ScriptyGoddess that lists five possible solutions, including the MT-Blacklist plugin by Jay Allen that I talked about the other day (and which I now have running successfully here at SEB). A couple of the other interesting solutions include a variation on the Captcha Turing Test, written as an MT plugin by James Seng, who has since gone on to develop a new method using a Bayesian filtering process that’s even better. In fact, I think he has removed his first plugin from his site.
The Captcha Turing Test is familiar to anyone who has signed up for a Hotmail or Yahoo webmail account. It presents a randomly generated code inside a graphic image, which the user has to enter correctly before the account will be set up. The SEB Forums use this method of verification during registration. It works pretty well at defeating bots because of the way the code is displayed, but it makes it difficult for visually impaired users, or people who have graphics turned off in their browsers, to participate.
So James went on to develop a new plugin that uses a Bayesian comment filter, a technique that is already gaining popularity as a means of fighting email spam. This method takes the idea of a blacklist, similar to what MT-Blacklist generates, and builds on it. Instead of just looking for known URL fragments, the Bayesian method looks at the entire content of the comment submission and scores it based on how likely the words used and URLs listed are to come from a spam comment. Working from probabilities rather than exact matches, the filter makes a guess at whether or not the comment is spam and blocks it if it thinks it is.
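To make the idea a little more concrete, here is a minimal sketch of a naive Bayes comment scorer in Python. This is not James' plugin (which I haven't looked inside); the class name, the tokenizer, and the add-one smoothing are all my own assumptions, chosen only to show how words and URLs from past comments can be turned into a spam probability.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: words plus dotted tokens such as host names.
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9.\-]*")


def tokenize(text):
    """Lowercase the comment and split it into word and URL-like tokens."""
    return TOKEN_RE.findall(text.lower())


class BayesCommentFilter:
    """Illustrative naive Bayes filter; not the API of any real MT plugin."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # token counts per label
        self.totals = {"spam": 0, "ham": 0}                  # comments seen per label

    def train(self, text, label):
        """Record one comment that a human has marked 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.totals[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0..1) that the comment is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        n_docs = self.totals["spam"] + self.totals["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: how common this label has been so far (add-one smoothed).
            log_p = math.log((self.totals[label] + 1) / (n_docs + 2))
            n_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Likelihood of each token under this label (add-one smoothed).
                seen = self.counts[label][tok]
                log_p += math.log((seen + 1) / (n_tokens + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In use, you would call `train()` on a handful of comments you have already sorted by hand, then block anything where `spam_probability()` comes back above a threshold you are comfortable with (0.5 is only a starting guess).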
The disadvantage to this method is that you have to “train” the filter at first, as it will generate some false positives and miss some real spam at the start. The advantage, though, is that after a little training the filter will block new spam comments without having to be taught about each one, whereas MT-Blacklist can only block sites whose URL fragments are already in its blacklist. If it’s implemented well, the Bayesian method should take the least work to maintain for the benefits it offers. Those of you who use Thunderbird or Mozilla as your email client may already be familiar with the Bayesian method, as it’s implemented very well in those clients.
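For comparison, the blacklist approach boils down to something like the Python sketch below. It is only my guess at the general idea of matching known URL fragments, not MT-Blacklist's actual code (which is a Perl plugin); the function name and the example domains are made up for illustration.

```python
import re

# Rough pattern for URLs in a comment body (an assumption, not MT-Blacklist's).
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def find_blacklisted(comment, blacklist):
    """Return the blacklist fragments found in any URL in the comment."""
    urls = [u.lower() for u in URL_RE.findall(comment)]
    hits = {frag for frag in blacklist for u in urls if frag.lower() in u}
    return sorted(hits)


# Hypothetical blacklist entries; a real list would be much longer.
blacklist = ["cheap-pills.example", "casino-online.example"]

known = find_blacklisted("Visit http://www.cheap-pills.example/buy now", blacklist)
unknown = find_blacklisted("Visit http://brand-new-spam.example/ today", blacklist)
```

Here `known` picks up the listed fragment, while `unknown` comes back empty because that domain has never been added. That gap is the maintenance work a trained filter is meant to cut down.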
I like MT-Blacklist, but I’ll probably try out James’ Bayesian filter as well this weekend to see if it’s a worthy successor as I’m already sold on the Bayesian method from using Thunderbird.
Either way, it’s impressive, and a testament to the MT community, that there are already several different ways to try to combat this growing problem. If you’re running an MT-based blog, you’ll benefit from having several different methods to choose from, depending on your needs and preferences.