Julien,
The majority of comment spam can be dealt with very simply by including
a Turing test. On my blog, when I first started getting comment spam, I
added a check box asking if the poster was a human. For a human, it’s
not a massive inconvenience to tick a box, but for an automated tool, it’s
a major problem. It was implemented in 3 lines of HTML and one line of
Python. Since I added it, I haven’t received a single piece of
spam.
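A minimal sketch of what such a check might look like. The original HTML and Python are not shown in the comment, so the field name `is_human`, the value `yes`, and the function name are all assumptions made for illustration:

```python
# Hypothetical form markup (not the commenter's actual HTML).
FORM_SNIPPET = """
<label>
  <input type="checkbox" name="is_human" value="yes"> I am a human
</label>
"""


def is_probably_human(form):
    """Return True when the hypothetical 'is_human' checkbox was ticked.

    `form` is a dict of submitted field names to values. Browsers only
    send a checkbox field when it is ticked, so a missing key means the
    box was left alone, as a naive bot would leave it.
    """
    return form.get("is_human") == "yes"
```

A comment handler would then reject (or hold for review) any submission where `is_probably_human(form)` is false.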
I don’t believe it’s had a major effect on people commenting,
although I currently can’t tell. I could change it to hide posts that claim to be non-human
until I’ve checked them. If spam tools work out this simple problem, I
could change the nature of the test to randomly change between “I am a
human” and “I am not a human”. After that I could include a simple sum
or some other simple question. It also has the advantage over CAPTCHAs
that it is accessible.
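The two escalation steps described here (a randomly chosen statement, then a simple sum) could be sketched roughly as follows. All names are hypothetical, and the sketch assumes the server remembers the generated challenge (for example in a session) between showing the form and receiving the post:

```python
import random

# The two statements the checkbox label can show.
STATEMENTS = {"human": "I am a human", "not_human": "I am not a human"}


def make_challenge(rng=random):
    """Pick which statement the checkbox shows and build a small sum."""
    kind = rng.choice(sorted(STATEMENTS))
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return {
        "kind": kind,
        "label": STATEMENTS[kind],
        "question": f"What is {a} + {b}?",
        "answer": a + b,
    }


def passes(challenge, box_ticked, sum_reply):
    """Check the checkbox state and the typed answer against the challenge."""
    # A human ticks "I am a human" and leaves "I am not a human" unticked.
    expected_tick = challenge["kind"] == "human"
    try:
        sum_ok = int(sum_reply.strip()) == challenge["answer"]
    except ValueError:
        sum_ok = False  # non-numeric reply
    return box_ticked == expected_tick and sum_ok
```

A bot that always ticks (or never ticks) the box would fail roughly half of the randomised challenges before the sum is even considered.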
It is a simple change which massively reduces spam by increasing the
cost of spamming and I’m surprised that most people don’t do something
similar.
on said:
This only works until one of the bigger sites adopts a similar scheme to fend off the bots.
Then the bot-writers will adapt (“We are spammers of Borg, resistance is futile…”) and your simple test will stop working.
The problem is that as soon as you become a worthy target for spam (because your site is big enough), the bot writers will write code specially crafted for your site, and then such simple tests will not be effective any more.
on said:
But the big sites won’t use exactly the same test. Every site will use a different test, which just makes the spam bot harder to write, be it different HTML, questions, parameters, etc.
And as I said, once one test becomes less effective, there are plenty of harder tests you can use.
on said:
Remember having this conversation with Phil, only then it was about wikispam:
http://www.nooranch.com/synaesmedia/wiki/wiki.cgi?SpammingThoughtStorms
(see various subpages.)
Personally, I go with word-recognition cos that’s what Blogger has, and all technology is a compromise over accessibility. But I think there are parallels here with e-mail spam and how it tries to bypass automated anti-spam tools, which is the same as what is happening here but with signal and noise reversed (automated filtering vs automated filter-breaking).
In which case, Bayesian logic is the way forwards in terms of deciding which comments are acceptable and which aren’t… except there’s no way each blog will have (on average) enough comments to calibrate it, so centralised comment-block services are the way forwards. (Maybe even use an AJAX setup to transparently filter incoming comments from the browser, somehow?)
Anyway, that is only true if comment spam (c-spam? 🙂) gets to ridiculous levels, and only if people are really concerned about people commenting on their blogs (which would be nice, but not a given).
(Disclaimer: I run my own hand-built mail filter, so I have no idea just how well Bayesian tactics cope with images, etc.)
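For what it's worth, the Bayesian idea can be sketched in a few lines. This is a toy naive Bayes word filter with add-one smoothing, not any particular service's method; the class and method names are made up for illustration, and a real centralised service would train on comments pooled from many blogs rather than on a handful of examples:

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-based naive Bayes classifier for comments (illustrative only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled comment; label is 'spam' or 'ham'."""
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the word counts seen so far."""
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_words = sum(self.words[label].values())
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero the score.
                count = self.words[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700), -700)
        return 1 / (1 + math.exp(diff))
```

With only a few comments per blog the counts stay tiny and the estimates are unreliable, which is the calibration problem mentioned above.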
The other thing about different sites and different tests (i.e. a system of diversity to defeat “automation”) is that it doesn’t really cost spammers that much to run through every single test they can, brute-force style. Once they have plug-ins for their little spambots that deal with all the easy ones, you then have the problem that harder questions are naturally less accessible if, say, English isn’t your first language – the difficulty that bot programmers face is the same difficulty faced by people trying to learn a language and all its foibles (which you may or may not use without realising).
The last point, while I’m on the subject and my brain is in the right place, is more philosophical – are you implying that machines can’t generate interesting content? 😉
on said:
Interesting comment effect 😉
on said:
Interesting comment effect
on said:
Aha, maybe this is a side effect of running Session Saver in Firefox – looks like the page that gets loaded when I open Firefox is the POST by the form, and the data is cached so it gets resubmitted each time. Interesting.
If I close this tab now, you should stop getting pestered… in theory.
on said:
The best Turing test I know is to ask for the owner’s name. Only people who actually read the blog will pass it, and it’s hard enough for spammers to automate.