Follow-up to For better blogging…, which explains a simple yet effective way to stop comment spam for Movable Type installations.
A week ago (I’m catching up again) Six Apart released the Six Apart Guide to Comment Spam, in which different methods for avoiding the scourge are evaluated. Among them is the Turing Test class of protection, to which my preferred solution belongs, as well as Six Apart’s own TypeKey authentication service, in which the company has invested a lot of effort over the past year or so.
The document ends up recommending TypeKey (and some other techniques like MT-Blacklist) but not any Turing Test solution. I think the reasoning behind this recommendation is faulty.
Turing Tests that operate by showing you some kind of picture — easy for you to decipher but hard for computers and blind people — are indeed not a good idea; the best software available today is far better at reading such images than a legally blind person. But the guide also writes this:
One simplistic example would be to require commenters to answer a natural language question, such as “What is the last name of the author of this weblog?”, or “Which month immediately precedes August?” The problem with this technique is that to be effective, the questions need to change frequently. If you ask the same question, spammers will seed their scripts with the answer.
This is not quite true. For this technique to be effective, you only need to change the question every time a spammer personally makes the effort to answer it in order to spam you. And it turns out from experience that this practically never happens.
That’s because the Turing Test does a sufficiently good job of raising the cost of spamming. If a spammer wants to spam your site, he first needs to visit your site personally. The reward is that he can then spam away, but only on your site; you then change the question, clean up his mess, and he’ll have to visit again. That’s no way to make a living as a spammer: imagine being forced to read all those blogs. This “simplistic” Turing Test technique works because the reward-to-effort ratio for spammers is so low. That’s good enough for the vast majority of blogs, which do not have the traffic of A-list bloggers. (And if more blogs were to use it, my guess is spam wouldn’t pay at all anymore.)
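A minimal sketch of the idea in Python, not the actual Movable Type hack: the comment form shows one question, a comment is accepted only if the answer matches, and the blog owner swaps the question whenever a spammer has bothered to answer it by hand. The names `CommentChallenge`, `rotate` and `accepts` are made up for illustration, and the sample question is the one from the guide.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so '  July ' matches 'july'."""
    return " ".join(text.lower().split())


class CommentChallenge:
    """Holds the single question currently shown on the comment form (hypothetical helper)."""

    def __init__(self, question: str, answer: str) -> None:
        self.rotate(question, answer)

    def rotate(self, question: str, answer: str) -> None:
        """Swap in a new question, e.g. after a spammer answered the old one by hand."""
        self.question = question
        self.answer = normalize(answer)

    def accepts(self, submitted: str) -> bool:
        """Return True if the submitted answer matches the current one."""
        return normalize(submitted) == self.answer


# Example question taken from the guide; any easy question would do.
challenge = CommentChallenge("Which month immediately precedes August?", "July")
```

After a call such as `challenge.rotate("What is 2 plus 2?", "4")`, a script seeded with the old answer fails again, so each round of spam costs the spammer another personal visit.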
Now compare this with part of Six Apart’s own description of how the TypeKey authentication service works:
The worst case scenario when using TypeKey in this way would be if a spammer created a TypeKey account, and used it to send spam to your weblog. However, because the first comment from any TypeKey user must be approved by your [sic] before being published, the only way a spammer could sneak spam onto your site would be to first submit a comment that appears to be legitimate. While it’s possible that some spammers might attempt this, it is highly unlikely that they would be able to do this using automated scripts. If they do and are reported to Six Apart, TypeKey’s terms of service allows us to disable their accounts.
Here too, the spammer needs to sit down, get a key, pretend to be human for a minute and behave until he gets a comment approved. That’s really just a Turing Test: “Can you write a comment that does not look like spam?” If he passes, he can then use his key to spam with abandon until his account is terminated, not just on your site but on any TypeKey-enabled site that automatically approves TypeKey user comments. (Six Apart is thus being a little optimistic even in its worst-case scenario above.) And that’s potentially a much, much bigger prize. As for “TypeKey’s terms of service allows us to disable their accounts”: I’m sorry, but that doesn’t sound very scary.
The kicker, however, is this: “Also, creating a new TypeKey account requires solving a CAPTCHA (only once, during account creation), which entails certain accessibility problems.” Not to mention that after you jump through all these hoops, your comments still sit in a moderation queue the first time on many participating sites. That raises the effort bar for legitimate commenters much more than simply asking them to add 2 and 2, which needs no extra user ID and password.
Basically, I prefer Turing Tests to TypeKey.
PS: I too remember reading about the ingenious tactic mentioned in the guide of grabbing CAPTCHAs in realtime from high-profile services and asking a continuous supply of horny guys to solve them as a condition for access to free porn. This obviously works with CAPTCHAs (as far as I know, blind people don’t surf for porn), and it would also work for questions like “What is the atomic symbol for hydrogen?” and “Type the letter ‘A’”. (I think it is only a matter of time before a script tries whatever is inside quote marks as the solution.) If spammers ever apply this level of sophistication to try to spam this site, then the kind of question will have to change to something like “What is the first letter of the title of this blog?” or “How many characters are there between the www. and .com of my URL?”. These will not be answerable by horny guys given a snippet of a comment form.
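As a rough illustration of such site-dependent questions, here is a Python sketch that computes the expected answers from the blog’s own title and URL, so they cannot be read off a snippet of the comment form. The function names, the title “Example Blog” and the URL are placeholders, not details of this site.

```python
from urllib.parse import urlparse


def first_letter_of_title(title: str) -> str:
    """Expected answer to 'What is the first letter of the title of this blog?'"""
    return title.strip()[0].lower()


def chars_between_www_and_com(url: str) -> int:
    """Expected answer to 'How many characters are there between the www. and .com of my URL?'"""
    host = urlparse(url).hostname or ""
    prefix, suffix = "www.", ".com"
    # Only hosts of the form www.<name>.com with a non-empty <name> make sense here.
    if not (host.startswith(prefix) and host.endswith(suffix)
            and len(host) > len(prefix) + len(suffix)):
        raise ValueError("expected a host of the form www.<name>.com")
    return len(host) - len(prefix) - len(suffix)


# Placeholder values, assumed for illustration only.
print(first_letter_of_title("Example Blog"))                 # e
print(chars_between_www_and_com("http://www.example.com/"))  # 7
```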
Finally, it is not beyond the bounds of possibility that spammers will outsource their efforts, paying English-speaking third worlders pennies an hour to compile a database of answers for blogs all day long. In that case, the tactic would have to change again, and we’d have to make a Turing Test that attempts to differentiate between those who have a broad education and those who do not, such as “What’s the main language spoken in Ireland?”. However, horny western guys would likely know the answers to such questions, so in a worst-case scenario, posting a comment would entail answering two questions: one that stumps horny guys, and another that stumps sweatshop spammers… To be continued, for sure.
Blog Spam
Stefan Geens has a long post on why SixApart’s TypeKey system is not a good solution to blog spam. He points out that the system has bad economies of scale: Here too, the spammer needs to sit down, get…
Why not publicize your spam solution more? Make up a name for it, submit it to scriptygoddess.com, mt-plugins.org and similar sites.
Noooooooo! Surely that would only encourage some clever spammer to find a hack around the hack.
The more blogs use Turing Test hacks, the more spammers will bother circumventing them, but then you could use the more advanced strategies Stefan mentioned, I suppose. Though that would be more work.
From an egotistical standpoint I definitely shouldn’t want him to promote it.
What’s with all this running away and appeasement of spammers, Matthew? Bring it on, I say. The best defences have had the most attacks.
If everyone had this hack, spammers probably couldn’t make their pursuits pay and they’d have to get a real job.
One thing I was thinking about was spammers cooperating: sharing databases of Turing Test answers. But then I realized that it’s a zero-sum game for them: Google juice for each page is divided among all the links, so the more links, the less juice each link gets. You don’t want other spammers getting in on your act, so there is no incentive to share.