Spring, and Return of the Botweed

Spring has come to Changing Way, with new content blooming. But weeds, in the form of spam comments, are also appearing, and Akismet isn’t a completely effective weedkiller. I’ll help it, by manually marking the comments in question as spam.

I hope that the change of seasons is going well for you.

Photo: Teahouse in Spring 3. I took the photo about a year ago, at Brookside Gardens.

Premium Spam

My spam filters seem to be having a tough time recently. I’m thinking more of email filters, rather than Akismet. That said, I wish that Akismet was a little less hospitable to certain Russian-writing agents. While I took a little Russian in high school, the main result was that I realized how bad at languages I am.

Three messages that gmail somehow let through (to andrew at changingway dot org) made me smile, though.

  • Healthier way to smoke.
  • Make happy the girlfriend! Present to the girlfriend unforgettable night!
  • hi! My neighbor died because his viral infection was mistaken for bacterial…

Each one pure spam comedy gold, but I have to give first place to the last on the list. The switch from the chirpy “hi!” to the details of death has a sort of brilliance.

If gmail is going to let spam through, then I’m not too unhappy that it picked these three.

WordPress (not com) Themes: Search and Spam

After the good news about themes at WordPress.com comes some bad news about themes for self-hosted WordPress sites. Siobhan Ambrose at WPMU.org wondered what she’d find if she Googled “Free WordPress Themes.” She examined themes from each of the top 10 hits for that search.

The result? Only one of the 10 theme sites was “safe.” Another was “iffy.” For the other 8, Siobhan’s advice is “avoid,” on the basis that some of the themes use Base64 encoding in order to sneak spammy links into the theme. Base64 can also be used to include malware.
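As a rough illustration of what a check for hidden links might look for, here is a minimal Python sketch that pulls Base64-looking string literals out of theme source and decodes them to see whether they contain URLs. The function name, the regular expressions, and the 16-character minimum are my own assumptions, not Siobhan’s method, and a real scanner would need to do far more.

```python
import base64
import binascii
import re

# Assumption: a quoted run of 16+ Base64-alphabet characters is worth decoding.
B64_LITERAL = re.compile(r"""['"]([A-Za-z0-9+/]{16,}={0,2})['"]""")
URL = re.compile(rb"https?://[^\s'\"<>]+")


def hidden_links(source: str) -> list[str]:
    """Return URLs found inside Base64-encoded string literals in source."""
    found = []
    for match in B64_LITERAL.finditer(source):
        try:
            decoded = base64.b64decode(match.group(1), validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64 after all
        found.extend(url.decode("ascii", "replace") for url in URL.findall(decoded))
    return found
```

Running something like this over a theme’s PHP files before activating it would flag the simplest cases; an empty result is not proof that a theme is clean.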

The safe site is the WordPress.org themes directory. Since it currently includes well over a thousand themes, there seems little danger of a free theme shortage. Each of the themes there is under the GPL, and so is free as in freedom as well as free as in beer. In other words, you are free to modify the code of those themes.

This doesn’t mean that every source of free themes other than the official WordPress.org directory is bad. What it does mean is that, just as social media attracts spam, social media tools attract spam-producing components. It also means that some of the people who make those components also study the dark side of SEO.

What Costumes Do Spam Villains Wear?

I just marked a couple of comments as spam. They were from “Mortgage Man” and “Credit Guy.” Each linked to the same web site. Which raised the questions:

  • What other spam villains lurk in the same lair? Bankruptcy Boy? Loan Lad? Hedge Fund Hombre? Refinance Girl?
  • Why didn’t Akismet mark this nonsense as spam? Because it’s not the most blatant example of spam, and not enough people had previously warned Akismet about that particular nest of spam villains?

Hello Spam

I blog (and I allow comments), therefore I get spammed. I have many blogs, some of which I set up for test purposes and don’t use much. The “typical” Andrew blog uses WordPress and the Akismet spamfighting service.

It seems as though posts with a title starting with “Hello” attract a disproportionate amount of spam. This of course includes the “Hello World” post that comes in every new WordPress blog.

I wonder if:

  • Hello posts are targeted by spambots?
  • They have characteristics targeted by spambots?
  • Spamfighting services are particularly suspicious of comments on Hello posts?

Mollom: Milestones and Money

Mollom is one of four spam comment fighting services that I’ve covered before. Mollom has enough recent news to merit a fresh post. Centernetworks’ Allen Stern summarized as follows: Mollom Leaves Beta, Hits 10 Million Blocked Spams, Launches Paid Plans.

The Mollom site provides further detail. Dries posted that: Drupal is still the main platform for users with Mollom subscriptions, with Joomla! coming second, and WordPress third. The pricing page contrasts two levels of service: Free and Plus. Plus costs 30 Euros a month (which, at current exchange rates, is about $40, rather than the $30 Allen quotes).

Comment Systems and the Spam in the Sandwich

What does a comment system for a blog or other web site actually do? Let’s think about what needs to happen when you read a blog post and leave a comment. The system needs to:

  1. Display existing comments (or some subset of them or information about them).
  2. Allow entry of a new comment.
  3. Validate the comment. For example, has the commenter provided an email address?
  4. Assess the spam-ness or otherwise of the comment. This may involve a captcha.
  5. Store the comment as appropriate, depending on whether it is spam, requires moderation, or immediately joins the ranks of approved comments.
  6. If necessary, notify the admin of the action taken.

The comment system actually needs to do more than this: provide the admin with access to the moderation queue, for example. But I want to focus on the six-layer sandwich described above, and regard the admin interface as chips (or crisps) served to the side of the sandwich.
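The six layers can be sketched in a few lines of Python. This is a minimal sketch of steps 3 to 6 only (steps 1 and 2 are display and form handling). Everything in it is hypothetical: the Comment and CommentStore classes, and the toy rule that one link means moderation and two or more mean spam, stand in for whatever a real system actually does.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    email: str
    body: str


@dataclass
class CommentStore:
    approved: list = field(default_factory=list)
    moderation: list = field(default_factory=list)
    spam: list = field(default_factory=list)
    notifications: list = field(default_factory=list)


def validate(comment: Comment) -> bool:
    # Step 3: has the commenter provided an email address and some text?
    return "@" in comment.email and bool(comment.body.strip())


def assess(comment: Comment) -> str:
    # Step 4: toy stand-in for a spam filter (hypothetical rules).
    links = comment.body.lower().count("http")
    if links >= 2:
        return "spam"
    if links == 1:
        return "moderation"
    return "approved"


def submit(store: CommentStore, comment: Comment) -> str:
    """Run a newly entered comment through steps 3 to 6 and report its fate."""
    if not validate(comment):
        return "rejected"
    verdict = assess(comment)
    getattr(store, verdict).append(comment)  # Step 5: store as appropriate
    if verdict != "approved":  # Step 6: notify the admin if necessary
        store.notifications.append(f"{verdict}: comment from {comment.author}")
    return verdict
```

The point of writing it this way is that `assess` is a single replaceable function, which is exactly the layer that a spam plugin takes over.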

Having asserted that a comment system has those six layers, I want to focus on four ways in which it can be implemented. The comment system can be part of a larger system; for example, WordPress Classic (WPC) includes all six layers, as well as a whole bunch of other stuff. In an attempt to be clear, I’ll note that WPC refers to self-hosted WordPress.

I’ll turn to a table to highlight the contrasts between the four cases, and I’ll continue to use concrete examples. I’ll stick with WordPress for the examples; that said, the points I want to make aren’t WordPress-specific.

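                               | Spam filter is not a separate plugin                     | Spam filter is a separate plugin
-------------------------------|----------------------------------------------------------|-----------------------------------------------------------
Comment system is not a plugin | WordPress Classic (WPC) unplugged (i.e. with no plugins) | WPC with Akismet plugin; WordPress.com, which uses Akismet
Comment system is a plugin     | WPC with Disqus plugin                                   | ?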

The other cell of the top row represents the use of a plugin to handle step 4 (assess the spam-ness). There are many such plugins. A previous post focused on four of them. One of them is Akismet, which handles spam at the hosted blogging service WordPress.com.

Moving to the second row reflects the replacement by a plugin, not just of step 4, but of all the comment system steps. Disqus provides such a plugin; in fact, I just started using it at my WordPress test blog.

I know of no example for the last cell of the table: hence the ? The cell would be of interest to a blog admin whose preferred spam plugin is Akismet, but who also wants Disqus features such as a cross-site discussion community.

The idea of combining a comment plugin with a spam plugin is a little tricky. It’s probably tricky in technical terms: if Disqus ever invokes Akismet, it will probably use the Akismet API rather than the plugin.

The business trickiness is about revenue sharing. If a comment service invokes a spam service, and each service wants to make money, how should the money be divided? I believe that these tricky issues will be addressed. Disqus may hold to its own spam fighting. But, if it does, it will present an opportunity to competitors willing to work with specialized spam services.

Two Four-Letter Words: Spam and Free

Spam is, for many of us, the worst aspect of Web 2.0. The threat of spam of course creates a need, and hence an opportunity, for spam-fighting services. Last week, I compared four of them: Akismet, Defensio, Mollom, and TypePad AntiSpam. The comparison was prompted by the launch of the last of these (the list, like the comparison table in the previous post, is in order of launch date).

TPAS is interesting, not just because it is the most recent, but because it has claims to be the most free. I use the plural claims because TPAS seems to make that claim with respect to each sense of the word free: free of charge (gratis) and free (libre, open source) software.

In this post, I’ll extend the comparison between the four services with respect to each sense of free. First, free of charge. The last two lines of the comparison table refer to this kind of free. The first of these lines shows that each of the four services is free for personal use.

The last line of the table asks whether each service is free for commercial use. It answers “Yes” for TPAS, and “No” for each of the other services. Following some email exchanges and some thinking, it seems that the pricing issue needs clarification.

Akismet has multiple levels of commercial API key. For example, a problogger key is $5/month. Given that a problogger is defined for this purpose as one who makes more than $500/month, the cost seems reasonable (but then, I’m not a problogger). That an enterprise key starts at $50/month also seems reasonable (but then, I’m not an enterprise).

Defensio is free for commercial use up to a limited amount of traffic. That’s a paraphrase of an email. Defensio.com is down at the moment. I don’t know whether that means that the service is down.

Mollom currently describes its future pricing model as follows.

The basic Mollom service will be free… but it will be limited in volume and features… Our goal is to make sure that the free version of Mollom goes well beyond meeting the needs of the average site…

For large and mission-critical business and enterprise websites, we will offer commercial subscriptions. We are currently working out our commercial pricing scheme for access to more advanced features, unlimited traffic, enhanced performance, reliability and support.

TPAS, per its FAQ, “is free, and will always be free, regardless of the number of comments your blog receives.” The FAQ also addresses how Six Apart will support the service; the firm “may choose to provide enterprise-class services on top of TypePad AntiSpam at some point in the future.”

TPAS is the outlier on this “free as in beer” issue, but I now think that it’s closer to the others than I first thought and implied. Like the other three, it seeks to make money from enterprise clients (and I don’t see anything wrong with that). The difference is that it doesn’t attach the price tag to AntiSpam itself.

TPAS is also the outlier on the free software, or “free as in freedom,” issue. As I remarked in the earlier post, “while the TPAS inference engine is open, the rules are hidden.”

I wouldn’t be at all surprised to see Akismet, Mollom, or both move to a similar model. I base this on the following assumptions.

  1. Spam-fighting software has the classic intelligent system split between inference engine and rules base. In particular, Akismet and Mollom already have this architecture.
  2. The action is in the rules, which are specific to the domain of spam-fighting.
  3. Following from the above, you don’t give much away to spammers or to competitors if you free/open-source your engine.
  4. The people behind Akismet and Mollom don’t want to cede the “free high ground” to TPAS.
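The split in assumption 1 can be illustrated with a small Python sketch: a generic engine that could be published, plus a rules base that stays private. The Rule class, the weights, and the 0.5 threshold are all invented for illustration; the sketch is not based on the code of Akismet, Mollom, or TPAS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    weight: float
    matches: Callable[[str], bool]


def score(comment: str, rules: list[Rule]) -> float:
    """The 'engine': domain-neutral, so little is lost by opening it."""
    return sum(rule.weight for rule in rules if rule.matches(comment))


def is_spam(comment: str, rules: list[Rule], threshold: float = 0.5) -> bool:
    return score(comment, rules) >= threshold


# The 'rules base': this is where the spam-fighting knowledge lives,
# so this is the part a service would keep hidden. (Made-up rules.)
PRIVATE_RULES = [
    Rule("many links", 0.4, lambda c: c.lower().count("http") >= 2),
    Rule("loan talk", 0.3, lambda c: "mortgage" in c.lower()),
]
```

Publishing `score` and `is_spam` tells a spammer almost nothing; publishing `PRIVATE_RULES` would tell them exactly what to avoid.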

With respect to this aspect of free (libre), as with respect to the first aspect (gratis), I may have exaggerated TPAS’ outlier status. TPAS does have a legitimate claim to being more free than its competitors in each of the two senses of free. But the gap between TPAS and, say, Akismet, may not be as great or as durable as might at first appear.

That conclusion is, of course, my opinion. Comments (or email: andrew at changingway etc.) would be a good way of telling me that you draw a different conclusion or that my conclusion is based on faulty premises or reasoning. I’d welcome other relevant comments. For example, you might know of a spam-fighting service other than the four I’ve focused on.

AntiSpam: TypePad and the Trio

There’s a new spam fighting service in town: TypePad AntiSpam. To put it another way, the spam sheriff of TypePad town is now available to lay down the law elsewhere.

TPAS competes directly with Akismet. The table compares the two spamfighting services with each other, and with two other competitors. I’ve ordered the columns from earliest to most recent (so the alphabetical order is coincidental).

                                        | Akismet    | Defensio  | Mollom                             | TypePad AntiSpam
----------------------------------------|------------|-----------|------------------------------------|---------------------
Previous post at Changing Way?          | Yes        | Yes       | Yes                                | No
Service offered by                      | Automattic | Karabunga | Mollom: shares founder with Acquia | Six Apart
If in doubt, challenge with CAPTCHA?    | No         | No        | Yes                                | No
Service has own API?                    | Yes        | Yes       | Yes                                | No, uses Akismet API
Open source engine?                     | No         | No        | No                                 | Yes
Free of charge for personal use?        | Yes        | Yes       | Yes                                | Yes
Free for commercial use?*               | No         | No        | No                                 | Yes

Each of the four is the odd one out in at least one sense. Akismet was first out, and remains the service against which each rival positions itself.

Defensio is the one that doesn’t share developers or an organization with a prominent publishing or content management platform (Akismet/WordPress, Mollom/Drupal, TPAS/TypePad and Movable Type).

Mollom uses CAPTCHA when unsure whether a comment is ham (the good stuff) or spam, whereas each of the others queues the suspect comment for moderation. That’s something of an oversimplification about the others: for example, a TPAS client can use CAPTCHA when told about a suspect comment by the server.
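The difference comes down to what happens in the uncertain middle. Here is a minimal Python sketch, assuming that a service produces a spam probability between 0 and 1; the thresholds and the function name are hypothetical, not taken from any of the four services.

```python
def next_step(spam_probability: float, unsure_action: str = "moderate") -> str:
    """Decide what to do with a comment given a spam probability in [0, 1].

    unsure_action is "captcha" for a Mollom-style challenge, or "moderate"
    to queue the comment for the admin. Thresholds are illustrative only.
    """
    if unsure_action not in ("captcha", "moderate"):
        raise ValueError(f"unknown unsure_action: {unsure_action!r}")
    if spam_probability < 0.2:
        return "publish"  # confident ham
    if spam_probability > 0.8:
        return "discard"  # confident spam
    return unsure_action  # the uncertain middle
```

With `unsure_action="captcha"` the commenter does the extra work; with `"moderate"` the admin does.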

TPAS is open source (GPL V2). I found this particularly interesting, given that the other three are not. Their providers explain that source code access would help spammers. I then realized that while the TPAS inference engine is open, the rules are hidden.

TechCrunch is currently using TPAS via the WordPress plugin that Six Apart provide. Mike Arrington reports that TPAS is doing well so far.

Anil Dash wrote the announcement post at the Six Apart blog. TPAS also has its own blog.

Missing from the table are two of the most interesting potential comparisons: performance and market share. I suspect that we will before long see data relevant to these comparisons, and challenges to the data, and…

Update, after a few hours’ sleep and some further research. I made a few changes to the above.

I’d like to add that I find the name TypePad AntiSpam interesting. Or rather, I find the choice of name interesting. The name may give the impression that it’s more specific to TypePad than it really is. My guess is that Six Apart think they have a winner on their hands here, and that the success of TPAS will raise awareness and reputation for TypePad.

* Final update to this post. I decided that the last line of the table, while close to the mark, needs clarification. Hence the followup post (see the first comment to the current post).

Automattic Making Money From Other Projects

By other, I mean other than WordPress. We are almost at the end of my series of posts on Automattic, and how the firm makes money. We’ll start by noting that the firm provides a handy summary of its projects. Some of them are covered in earlier posts in this series (e.g., WordPress.com).

There are three non-WordPress projects: Akismet, bbPress, and Gravatar. (Actually, to describe them as “non-WordPress” is to simplify since, as we will see, each has firm connections to WordPress.) I find the first of these the most interesting, and I know I’m not alone in that. Akismet is an ambitious project.

Automattic Kismet (Akismet for short) is a collaborative effort to make comment and trackback spam a non-issue and restore innocence to blogging, so you never have to worry about spam again.

Although Akismet is an Automattic project and is WordPress.com’s spam cop, it is not only for WordPress blogs. The Akismet API is published so that the server can be invoked from other applications.
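For a sense of what invoking the server from another application involves, here is a hedged Python sketch that builds (but does not send) a comment-check request. The endpoint and parameter names follow my understanding of version 1.1 of the Akismet API and should be checked against the current documentation; the function names are my own, and the key, blog URL, and comment values are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; verify against the Akismet API documentation.
COMMENT_CHECK_URL = "https://rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key, blog, user_ip, user_agent, author, content):
    """Build a POST request asking whether a comment is spam (not sent here)."""
    fields = {
        "api_key": api_key,
        "blog": blog,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    body = urlencode(fields).encode("utf-8")
    return Request(COMMENT_CHECK_URL, data=body, method="POST")


def is_spam(response_body: str) -> bool:
    # As I understand it, the response body is the literal text "true" or "false".
    return response_body.strip() == "true"
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` with a real key and feeding the decoded response to `is_spam`.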

The Akismet server is unusual among Automattic projects in that it is closed source. This seems to be the norm for spam-fighting server code: it is also true of Akismet’s rivals Defensio and Mollom.

Automattic, as a privately-held firm, is under no obligation to provide details of how much money it makes from specific projects. But Duncan Riley at TechCrunch described Akismet as Automattic’s biggest money earner. Toni, Automattic’s CEO, was quick to counter what he described as “misconceptions,” stating that Akismet is not even close to being Automattic’s biggest earner.

Direct earnings from Akismet come from commercial licenses. Indirect earnings arise from the extent to which Akismet helps convince bloggers to choose WordPress.com.

Moving on to the other other projects, bbPress is forum software. It runs the various WordPress forums. To put it another way, bbPress is the name under which Automattic released the software on which the WordPress.org support forums have been running for years. Automattic intends to offer hosted forums under the name TalkPress (rather as it offers hosted blogging at WordPress.com).

Gravatar is notable among Automattic projects for having been acquired; I believe it to be Automattic’s only acquisition so far. At the time of the acquisition, Om Malik described Gravatar as a small project that gives WordPress users the ability to add avatars to their profiles. It is clear from the Gravatar about page that there are far loftier ambitions for the project. Today, an avatar. Tomorrow, Your Identity—Online.

I’ll stop there, rather than speculate about the future of online identity. I’ll add one more post to this series: a wrapup.