When I was a kid, I always wondered why TV variety shows had a policy that performers must not mention any commercial brand names. The primary reason behind this policy was to maintain neutrality and to prevent bribery on national broadcast channels. As a result, a lot of parodies lost their meaning because they were disconnected from their real-life references. Obviously those talk show comedians weren't being paid by the businesses to make jokes about them. However, a policy is (well, was) a policy, and they'd "rather kill a thousand innocent men than miss a real one" (a historical quote from a Chinese General).
Lately, Stuff has been getting more and more spam comments that are harder to detect, not only for machines but also for human beings. The cost of moderation has increased as a result.
Spam comments used to be easy to detect -- they followed a few recognizable templates containing obvious keywords. A spam-killing engine could simply compare the URLs and the product names, compute a hash, give the comment a score, and then decide its "spam-mality".
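Here is a rough sketch of what such an early filter might look like. It is not any particular product's implementation: the keyword list, the hashed-URL blocklist, the weights, and the threshold are all made-up values for illustration.

```python
import hashlib
import re

# Hypothetical blocklists; a real filter would build these from flagged comments.
SPAM_KEYWORDS = {"viagra", "casino", "replica"}
KNOWN_SPAM_URL_HASHES = set()

URL_PATTERN = re.compile(r"https?://\S+")


def url_fingerprint(url: str) -> str:
    """Hash a URL so the blocklist can be compared cheaply."""
    return hashlib.sha256(url.lower().encode("utf-8")).hexdigest()


def spam_score(comment: str) -> int:
    """One point per distinct spam keyword, two per known spam URL (assumed weights)."""
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    score = len(words & SPAM_KEYWORDS)
    for url in URL_PATTERN.findall(comment):
        if url_fingerprint(url) in KNOWN_SPAM_URL_HASHES:
            score += 2
    return score


def is_spam(comment: str, threshold: int = 2) -> bool:
    """Treat a comment as spam once its score reaches the (assumed) threshold."""
    return spam_score(comment) >= threshold
```

Exact matching like this is precisely what makes the filter easy to fool: change one character in the product name or the URL and both the keyword match and the hash match fail.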
Then spammers go: "Let the battle begin!"
To bypass the machine spam killer, spammers use obscured product names (using symbols that look similar to their letter counterparts), or add random parameters to the URLs. Spam killers also get smarter by comparing the contents of the target page. Out of frustration, blog owners start to use picture identification schemes to stop machine spamming, at the cost of inconveniencing actual commenters. Then spammers get even smarter by using OCR to bypass picture identification. So Captcha starts to distort the picture or even challenge users with a mind-twister, making it even more inconvenient for actual commenters.
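To make the first round of that arms race concrete, here is a minimal sketch of how a filter might undo the two tricks before matching: mapping look-alike symbols back to letters and dropping URL parameters. The look-alike table is a small hypothetical sample, and both function names are my own.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical sample of look-alike symbols; a real table would be much larger.
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s", "!": "i"})


def normalize_text(text: str) -> str:
    """Lowercase the text and map look-alike symbols back to letters."""
    return text.lower().translate(LOOKALIKES)


def strip_url_params(url: str) -> str:
    """Drop the query string and fragment so random parameters do not change the URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))
```

The obvious cost is false positives: a real comment that legitimately contains digits or symbols gets rewritten too, so this only makes sense as one input to a score, not as a verdict.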
The war continues.
The internet advertising companies start to hire cheap labor overseas to solve Captchas manually, in batches. And with cheap labor available, even the spam messages become smarter. We start to see a lot of comments that say, "I agree with most of your points on iPod, but this is what I think (link to their site selling iPod)." Because their keywords match the page they are posted on, such comments blur the once-clear line between spam and real comments, making it more and more costly to maintain a clean board.
Some message boards use a flagging system, e.g. Craigslist.org -- readers can flag a message as spam, and if enough readers do so, then by operational definition, that message is spam.
Another smart approach is the rating system, e.g. Amazon.com -- each comment gets a rating from users, and readers have the option to filter out comments with low ratings, or to sort by rating.
Both schemes get users involved and are excellent examples of decentralization.
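The two schemes can be sketched together in a few lines. This is only an illustration of the idea, not how Craigslist or Amazon actually implement it; the `Comment` class, the flag threshold of three, and the rating scale are assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    text: str
    flagged_by: set = field(default_factory=set)  # reader ids; a set ignores repeat flags
    ratings: list = field(default_factory=list)   # numeric ratings given by readers

    def flag(self, reader_id: str) -> None:
        self.flagged_by.add(reader_id)

    def is_spam(self, threshold: int = 3) -> bool:
        """Spam by operational definition: enough distinct readers said so."""
        return len(self.flagged_by) >= threshold

    def average_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def visible_comments(comments, min_rating=0.0, flag_threshold=3):
    """Hide flagged or low-rated comments and sort the rest, best rated first."""
    kept = [
        c for c in comments
        if not c.is_spam(flag_threshold) and c.average_rating() >= min_rating
    ]
    return sorted(kept, key=lambda c: c.average_rating(), reverse=True)
```

Neither function decides anything by itself; each one only counts what readers have already decided, which is the whole point.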
One could not afford to inspect a thousand suspects, so killing them all was a convenient, albeit inhumane, solution. However, if there are a thousand inspectors, then it is no longer too costly to do it properly.


Unfortunately, self-moderation doesn't work for the typical small personal blog. That calls for something like Akismet, a central service that learns as everyone submits data to it.
SixApart has a similar product called TypePad AntiSpam. This is something we need to test with blogs@psu.
Not sure if you've noticed, but a spam roach has infected our space. Amazing how hard it is to have and manage an open web space.
That's what inspired me to write this post, Cole. I think it's just part of openness -- to allow/tolerate voices that we don't like, at least for a period of time. Mr. iPad Keyboard above left a message showing that he actually read the article. It's a perfect example. How can we determine whether this is more about his agreeing with my points or about his link to the site selling iPad keyboards (link removed)?
It is about selling iPad keyboards.