Comments, Commenters, Spammers and Mollom

LISNews has been running on Drupal for about a year now. Before that we ran on Slashcode for a few years, before that it was PHPSlash for a couple of years, and even before that I did it all by hand. If you run (or read) blogs you know comment spam is a big problem. If all you do is run or read a blog you actually have NO idea just how bad it really is. I'd estimate about 80% of all POST requests to all the LISHosted sites are from spammers. When I had LISNews on the LISHost servers I worked hard at fine-tuning the mod_security rules to combat spam.

Within hours of moving to Ibiblio I could see they have very different rules, and I'd need to do something else. I'm actually surprised at just how well my rules were working. So I turned on CAPTCHAs. I tried some images, reCAPTCHA, Math, and finally the basic text CAPTCHAs to fight spam. They also worked. A few weeks ago I got a complaint that the CAPTCHAs were getting in the way. This wasn't the first time, so I thought I'd try something new: I turned to Mollom.

I was shocked that within a day the number of comments went up. It's been a few weeks now, and I continue to be shocked at the number of comments we're seeing. Mollom is doing a decent job blocking spam, but more importantly it's letting more people comment. The bad guys are kept out (for the most part) and the good guys have a very low hurdle to get over. (Or at least I think so. If the current trend holds, then I'll be convinced that it is indeed Mollom and not just a coincidence.) Two charts illustrate what I'm seeing on this end:

1. The number of comments per day over the past 90 days. You can see the big upward trend cutting right through the middle.
A STRONG upward trend in comments

2. Mollom generates these nice little graphs to show you how much it's blocking and allowing. The big orange area is spam, the small green area is real. The numbers here surprised me right away: Mollom catches just about everything that mod_security now misses.
How much Mollom says it is blocking

So to summarize, this is exactly what I've been looking for, and I love Mollom. It does a really good job (not perfect) of blocking garbage spam comments, and apparently does a really good job (better than standard Drupal CAPTCHAs) of letting real people leave real comments. I guess we can debate how much of that is garbage :-)


Is it possible to expand on the comment about CAPTCHAs getting in the way? What did the site visitors say were their biggest concerns/pain points with CAPTCHAs? I know you can tweak the settings for CAPTCHA in Drupal to make them easier to read, so I'm wondering if you tried that before switching to another solution?

I don't know exactly what was wrong, other than that every one I tried ended up getting more complaints, and I haven't gotten a single one since I switched. I tried the Drupal math, text, and image CAPTCHAs, and then also tried reCAPTCHA, which I REALLY wanted to use, but for whatever reason it didn't work.

No matter how much I tried, and I tried A LOT, I couldn't get any of them to fail on me, so I couldn't figure out what was wrong. Might've been user error, or just an obscure bug that only showed itself on certain computers.

I'm reluctant to use an outside service to manage comments that keeps track of patrons' names, IP addresses, e-mail addresses, and the words they post.

At least at a public library and publicly funded academic libraries, patrons' comments are public record and depending on local law, may be subject to public records retention laws, including ones that protect a patron's identity.

Sure, the scenario in which law enforcement agencies approach a Belgian spam-blocking company to get access to who said what on your library blog seems unlikely, but the YouTube-Viacom brouhaha should give us all pause.

I may be one of the only ones, but I wonder about the ethics of knowingly circumventing laws that protect patron privacy by agreeing to Mollom's terms of service. I mean that honestly - I wonder - ethics was not a required class in library school and I can't even remember it being offered as an elective. I would like to read anything providing guidance on questions like this.

And this is part of what makes Mollom so interesting - their terms of service say they will anonymize data after two months, and their paid subscription may provide other options. Even if it's not the perfect way for the public sector to manage spam, it's certainly a step in the right direction.

Good points. Luckily here I don't need to worry too much about that. I would think more about it if this was a library site though. I think your ethics questions are valid.

No of course! As someone using Drupal to manage a reference service, I think I'll try to stick with the local spam filters for as long as I can.

Thinking about this some more, I'm not sure that our library policies protect anonymous speech on our websites. No one is allowed to know what you checked out (without a warrant/court order), or what you did on a library computer (without a warrant/court order), but what you said on the library website? It fits for me, but I'm not sure libraries have really grappled with that one yet.

I think what is most troublesome is that law enforcement can approach Mollom (or or ...) without needing to approach the library. This is true for myriad library services already anyway.

"At least at a public library and publicly funded academic libraries, patrons' comments are public record and depending on local law, may be subject to public records retention laws, including ones that protect a patron's identity."

Actually, I'm glad to hear this--but hadn't thought about it. Too many libraries ignore the public's concerns. And far too many people in all walks of life and business dash off comments and replies in e-mail without any thought of who will read them down the road, or what a lawyer will do with them, or how they will be misinterpreted.