19 July 2006

Email is broken

I have recently come to the conclusion that the current Internet email infrastructure is broken and can never be fixed.

Like you, I am beleaguered by junk email (spam) flooding all of my email accounts. My primary email comes through the Mozilla Thunderbird client, which has a powerful Bayesian filter that I’ve trained by marking spam as “junk.” The filter compares the words (and other elements) of each incoming email with those in the messages I marked as junk and with those in the good ones (the messages I did not mark as junk). Over time it becomes very good at identifying spam, which then goes automatically to the junk mail folder. I occasionally check this “junk” folder for false positives (legitimate emails incorrectly classified as spam, which rarely happens) and click the “not junk” button, training the filter that these messages are OK.
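To make that training loop concrete, here is a minimal naive Bayes sketch. It is not Thunderbird’s actual algorithm, which has its own tokenizer and its own way of combining word statistics; the class and method names (`NaiveBayesFilter`, `train`, `spam_probability`), the word-only tokenizer, and the sample messages are all hypothetical. Clicking “junk” or “not junk” corresponds to calling `train` with the matching label.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens (a deliberately crude, hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Toy two-class (junk / good) naive Bayes filter trained by user clicks."""

    def __init__(self):
        self.word_counts = {"junk": Counter(), "good": Counter()}
        self.message_counts = {"junk": 0, "good": 0}

    def train(self, text, label):
        """Record a message the user marked as 'junk' or 'good' (not junk)."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        """Log P(label) plus the sum of log P(word | label), with add-one smoothing."""
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        for tok in tokens:
            # Add-one (Laplace) smoothing: unseen words get a small, nonzero probability.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def spam_probability(self, text):
        """Return P(junk | words), assuming words occur independently given the label.

        Both labels must have at least one training message before calling this.
        """
        tokens = tokenize(text)
        vocab = set(self.word_counts["junk"]) | set(self.word_counts["good"]) | set(tokens)
        junk = self._log_score(tokens, "junk", len(vocab))
        good = self._log_score(tokens, "good", len(vocab))
        # Normalize the two log scores into a probability without underflow.
        top = max(junk, good)
        p_junk, p_good = math.exp(junk - top), math.exp(good - top)
        return p_junk / (p_junk + p_good)


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap viagra online buy now", "junk")
spam_filter.train("refinance your debt now low rates", "junk")
spam_filter.train("are we still on for lunch tomorrow", "good")
spam_filter.train("here are the meeting notes from tomorrow's call", "good")
print(spam_filter.spam_probability("buy cheap viagra now") > 0.5)  # True
print(spam_filter.spam_probability("lunch tomorrow") > 0.5)        # False
```

Each click simply adds a message’s word counts to one side of the ledger, so the filter’s judgment depends entirely on which messages you choose to mark.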

I recently received the following strange spam email. It wasn’t trying to sell anything and didn’t even contain a link; it just contained these excerpts from a story:
sopping wet from head to toe. I locked myself in a stall, got my flask,
faster. He was flying now straight down, at two hundred fourteen miles per
something like a vessel, like a glass jar with blue syrup. We looked at it
The next night from the Flock came Kirk Maynard Gull, wobbling across

I’ve seen a paragraph or two of generic text like this at the bottom of emails selling Viagra, debt refinancing, etc. It’s an attempt to make the email look more “normal” to Bayesian filters so that it doesn’t get automatically marked as spam. But this one made no sense: why would anyone send out millions of useless messages like these?

Well, I think I’ve figured it out: this is likely a concerted effort by spammers to make our Bayesian filters produce more false positives. If we mark messages full of ordinary literary prose as junk, the filter learns to treat ordinary words as signs of spam. That would make us either abandon the filters or spend a lot of time sifting through our junk folders (hopefully, from the spammers’ point of view, pausing on one of their subsequent spam messages). Therefore, if you receive an email like this, DO NOT mark it as spam; just delete it. Otherwise, you will start to get false positives (e.g., emails from acquaintances may get marked as spam).
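The poisoning effect can be illustrated with a toy model of per-word spam probabilities, loosely in the spirit of Paul Graham’s “A Plan for Spam” but without its refinements, and not Thunderbird’s actual filter. The function names, the training messages, the “friend’s email,” and the simple averaging used to score a whole message are hypothetical assumptions. The only real data is the excerpt quoted above, marked as junk.

```python
import re


def tokenize(text):
    """Return the set of distinct lowercase words in a message."""
    return set(re.findall(r"[a-z']+", text.lower()))


def word_spamminess(word, junk_docs, good_docs):
    """Share of the word's (normalized) document frequency that falls in junk mail."""
    junk_freq = sum(word in doc for doc in junk_docs) / max(len(junk_docs), 1)
    good_freq = sum(word in doc for doc in good_docs) / max(len(good_docs), 1)
    if junk_freq + good_freq == 0:
        return 0.5  # never seen in training: neutral
    return junk_freq / (junk_freq + good_freq)


def mean_spamminess(text, junk_docs, good_docs):
    """Average per-word spamminess (a simplification; real filters combine scores differently)."""
    words = tokenize(text)
    if not words:
        return 0.5
    return sum(word_spamminess(w, junk_docs, good_docs) for w in words) / len(words)


# Hypothetical training history.
junk = [tokenize("buy cheap viagra now"), tokenize("refinance your debt at low rates now")]
good = [tokenize("we looked at the photos from the trip"), tokenize("see you at lunch tomorrow")]
friend_email = "we looked at the boat and it was wet"  # hypothetical legitimate message

before = mean_spamminess(friend_email, junk, good)

# The user marks the literary "spam" from the post as junk.
poison = (
    "sopping wet from head to toe. I locked myself in a stall, got my flask, "
    "faster. He was flying now straight down, at two hundred fourteen miles per "
    "something like a vessel, like a glass jar with blue syrup. We looked at it "
    "The next night from the Flock came Kirk Maynard Gull, wobbling across"
)
after = mean_spamminess(friend_email, junk + [tokenize(poison)], good)

print(f"friend's email before: {before:.2f}, after: {after:.2f}")  # 0.31, 0.62
```

Everyday words such as “we,” “at,” “it,” and “was” now count partly as evidence of spam, which pushes the innocent message over the 0.5 line in this toy model.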

BTW, these spammers operate by spreading a virus or Trojan horse that compromises thousands of machines around the world, turning them into unwitting “spam-bots” that churn out spam on remote command. The distributed nature of this system makes it nearly impossible to strike back in any meaningful way. These creeps are then hired by shady businesspeople selling questionable products to send out millions of emails. These “businesses” deliberately set up websites under shell identities with incorrect or missing contact information to avoid the onslaught of negative emails, phone calls, faxes, and personal visits they would otherwise receive from millions of irate email users. The point being: don’t waste any time or effort complaining.

My real problem is not so much with these ingenious people as with the few idiots who actually respond to their offers. Coming from the direct mail sector, I know that you generally need about a 1% response rate to make a mailing profitable (covering printing, postage, and list rental costs). However, since email has $0 printing cost, $0 delivery cost, and a list cost of $0 to $3 per million addresses, you can still be quite profitable with a response rate below 1 in 10,000. Therefore, to the idiots out there who think that male enhancement products really work, or that you can get safe Viagra without going to your doctor: please, PLEASE, do us all a favor and don’t respond to these emails. Just Google these products and find a less slimy vendor to do business with!
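The economics can be checked with a quick break-even calculation: a mailing pays for itself once the response rate reaches the cost per piece divided by the profit per response. The $50 profit per response and the $0.50 direct-mail cost per piece below are hypothetical figures, chosen only so that direct mail breaks even at the 1% mentioned above. The email cost uses the top of the $0 to $3 per million list cost and, following the post, treats printing and delivery as free.

```python
def break_even_response_rate(cost_per_piece, profit_per_response):
    """Smallest response rate at which revenue covers the cost of the mailing."""
    return cost_per_piece / profit_per_response


# Hypothetical figures for illustration only (not from the post):
PROFIT_PER_RESPONSE = 50.00     # dollars earned per customer who orders
DIRECT_MAIL_COST = 0.50         # printing + postage + list rental, per piece

# From the post: $0 printing, $0 delivery, at most $3 per million addresses.
EMAIL_COST = 3.00 / 1_000_000

mail_rate = break_even_response_rate(DIRECT_MAIL_COST, PROFIT_PER_RESPONSE)
email_rate = break_even_response_rate(EMAIL_COST, PROFIT_PER_RESPONSE)

print(f"direct mail breaks even at {mail_rate:.2%}")       # 1.00%
print(f"email breaks even at 1 in {1 / email_rate:,.0f}")  # 1 in 16,666,667
```

Under these assumptions the break-even point for email sits far below 1 in 10,000, which is why a tiny number of responders is enough to keep spam profitable.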

[UPDATE] I’ve posted this to my favorite social bookmarking site, reddit.com, where it has started an interesting discussion. My detractors think these messages are just dumb mistakes by the spammers; I would agree, but I’ve now seen three separate instances of this. Obviously, if these emails sounded more like standard friendly email, they would be more effective at subverting Bayesian filters (even though those filters become quite specific, tailored to the way your friends write). But I suspect each message is unique (to prevent filtering at the ISP level), and coming up with millions of realistic email conversations is practically impossible. So my theory is that they simply take excerpts from literature found online, so the messages don’t sound like ad copy. Another factor is that most spam originates from a part of the world (Eastern Europe and Russia) with few native English speakers.

In any case, it looks like the spammers are not satisfied with reaching just a dim-witted audience; they want to make sure that tech-savvy Internet users have to eyeball their clients’ offerings as well, which makes no sense.
