Overview

This article describes my anti-junk defense grid for Movable Type. It is a layered defense: several components work together to reduce the burden of keeping junk out, without overly restricting users of the weblog [1].

Layered Defense

I use a combination of technologies as a defense in depth, so that even if there is penetration, it is rarely much effort to clean up.

SpamLookup

SpamLookup is a junk-filtering plugin that is installed in Movable Type by default, starting with version 3.2. It is a very useful plugin and I use it extensively, primarily the word filter, to scan for indicators of junk posters and to ban specific domains. While SpamLookup doesn’t have any bulk delete functionality, MT 3.2 itself allows one to search comments for specific strings, which is enough to make bulk cleanup tolerable. With my defense grid in place, however, I haven’t had to do that for quite a while.

While the distributed SpamLookup is useful, I observed many things I wanted to check for that it could not handle. Initially I wrote my own filter to do these checks, but later I created my own SpamLookup Extension so that filters can be applied to specific fields in the entries. This permits much broader filters and more complex patterns. For instance, one can simply forbid the character sequence “poker” or “sex” in a home page URL or commenter name without affecting normal use of the word in the actual comments. Or, in terms of patterns, one can forbid commenter names that are longer than some specific number of characters.
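
The idea of scoping a filter to one field can be sketched in a few lines of Python. This is only an illustration, not the extension's code (Movable Type plugins are written in Perl); the field names `author`, `url`, and `text`, the rule layout, and the 40-character limit are all assumptions made for the example.

```python
import re

# Hypothetical rules: each pattern is tied to the one field it applies to.
# Field names and the 40-character limit are illustrative assumptions.
FIELD_RULES = [
    ("url", re.compile(r"poker|sex", re.IGNORECASE)),
    ("author", re.compile(r"poker|sex", re.IGNORECASE)),
    ("author", re.compile(r"^.{41,}$", re.DOTALL)),  # names over 40 characters
]


def is_junk(comment):
    """Return True if any rule matches the field it is scoped to."""
    for field, pattern in FIELD_RULES:
        if pattern.search(comment.get(field, "")):
            return True
    return False


spam = {"author": "Bob", "url": "http://best-poker.example/", "text": "Nice post"}
ham = {"author": "Alice", "url": "http://alice.example/", "text": "I lost at poker last night."}

print(is_junk(spam))  # True: "poker" appears in the URL field
print(is_junk(ham))   # False: "poker" appears only in the comment text
```

Because no rule is attached to the `text` field, the word can still be used freely in the body of a comment.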

This layer provides an excellent first defense that gets at least 90% of the incoming junk. Here is my base set of filters. It has a lot of examples of how to take advantage of the field capability. I do tend to tweak it a bit, but it’s got most categories that are useful so it’s mostly a matter of cloning an existing filter and using a new word.

Simple Junk Filters

Simple Junk Filters is a small filter framework that lets me create Perl-based filters, set them up to run easily, and tune them via the plugin configuration screens. It used to do much more, but now that I have extended SpamLookup it is down to two tests. Both of these still catch quite a bit of junk. Also, should I find some other property of junk that needs code to check for, I can add it to this plugin very rapidly, with full configuration.

AutoBan

AutoBan is an IP-banning plugin that helps:

  • Control the number of junk objects so that it is easier to detect inappropriately junked objects.
  • Control floods by banning IP addresses that are repeatedly hitting the weblog.

AutoBan keeps a ‘.htaccess’ file updated to ban the IP addresses of junk objects. As new comments or trackbacks are junked, their IP addresses are automatically banned if there are too many junk objects from that IP address. IP addresses are unbanned when the junk entries are deleted (directly or via automatic cleanup) or the junk object is marked as not junk. The key rationale is that the cost of starting up the Movable Type application is large, so any filtering that prevents that startup should reduce the load on the server. See the page for MTAutoBan for a more detailed discussion of the performance issue.
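
The ban and unban cycle described above can be sketched in Python roughly as follows. This is not AutoBan's code: the threshold of 3, the function names, and the choice of `Deny from` lines (the older Apache access-control syntax) are assumptions for illustration, and the IP addresses are documentation examples.

```python
from collections import Counter

BAN_THRESHOLD = 3  # assumed value; the real limit is a plugin setting


def banned_ips(junk_ips, threshold=BAN_THRESHOLD):
    """Return the sorted IPs that have at least `threshold` junk objects."""
    counts = Counter(junk_ips)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


def htaccess_lines(ips):
    """Render one deny directive per banned IP (assumed output format)."""
    return ["Deny from " + ip for ip in ips]


# One entry per junk comment or trackback currently stored.
junk = ["192.0.2.7", "192.0.2.7", "198.51.100.4", "192.0.2.7"]
print(htaccess_lines(banned_ips(junk)))  # ['Deny from 192.0.2.7']

# Deleting or un-junking an object removes its entry, so the address
# drops out of the ban list the next time the file is rebuilt.
junk.remove("192.0.2.7")
print(htaccess_lines(banned_ips(junk)))  # []
```

Rebuilding the list from the stored junk objects, rather than tracking bans separately, is what makes unbanning automatic in this sketch.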

While this doesn’t help as much as I hoped, it still seems to keep flooding under control. It also helps protect multiple weblogs in a single install because banned addresses are banned for all the weblogs. This weblog gets very little junk because the junkers hit Thought Mesh first and are banned before they can hit this weblog.

Trackbacks By Name

I use MT 3.2 Patch: Trackback By Name to stop junkers who guess at trackback ping URLs, along with ModRewrite to redirect numeric ID based trackbacks to a sand trap. This definitely seems to have slowed down the trackback junk.

No pop ups

I have removed the pop-up windows for trackbacks and comments and used ModRewrite to send any requests for those pop-ups to the sand trap. That should help reduce the server load, both by slowing down the junkers and by using far fewer webhost resources (as I use the PHP variant, there is no CGI overhead).

The ModRewrite logic used is

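# Requires "RewriteEngine On" earlier in the same configuration.
# Send trackback pop-up requests to the sand trap.
RewriteCond %{QUERY_STRING} __mode=view
RewriteRule mt-tb[.]cgi /sand-trap.php [last]
# Send comment pop-up requests to the sand trap.
RewriteCond %{QUERY_STRING} __mode=view
RewriteRule mt-comments[.]cgi /sand-trap.php [last]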
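
To see which requests these rules catch, here is a small Python imitation of their matching logic. It is only a sketch: it mirrors the unanchored pattern tests on the path and the query string, does not model the rest of Apache's rewrite processing, and the example URLs are made up.

```python
import re
from urllib.parse import urlsplit

# (path pattern, query-string pattern) pairs mirroring the two rules.
RULES = [
    (re.compile(r"mt-tb[.]cgi"), re.compile(r"__mode=view")),
    (re.compile(r"mt-comments[.]cgi"), re.compile(r"__mode=view")),
]


def is_trapped(url):
    """Return True if the request would be sent to the sand trap."""
    parts = urlsplit(url)
    return any(p.search(parts.path) and q.search(parts.query) for p, q in RULES)


print(is_trapped("/cgi-bin/mt-tb.cgi?__mode=view&entry_id=12"))  # True
print(is_trapped("/cgi-bin/mt-tb.cgi/12"))                       # False
print(is_trapped("/cgi-bin/mt-comments.cgi?__mode=view"))        # True
```

Only requests that carry `__mode=view` in the query string are trapped, so other requests to the same scripts are left alone.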


[1] For instance, some weblogs forbid any URL in a trackback excerpt, which is very burdensome because one normally puts the link to the target post early, which puts it in the excerpt. I’ve just stopped linking to such weblogs. Also, I hate captchas; I don’t comment on websites with those, either.