19 July 2006

Email is broken

I have recently come to the conclusion that the current Internet email infrastructure is broken, and can never be fixed.

Like you, I am beleaguered by junk email (spam) flooding all of my email accounts. My primary email comes through the Mozilla Thunderbird client, which has a powerful Bayesian filter that I’ve trained by marking spam as “junk.” It then compares the words (and other elements) of incoming email with those in the messages I marked as junk and those in the good messages (the ones not marked as “junk”). Eventually it becomes very good at picking out spam, which automatically goes to the junk mail folder. I occasionally check this “junk” folder for false positives (legitimate emails incorrectly determined to be spam, something that rarely happens) and click on the “not junk” button, training it that these are OK.
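To make the idea concrete, here is a minimal sketch of word-based Bayesian filtering in Python. It is not Thunderbird's actual algorithm: the class name TinyBayesFilter, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-frequency spam filter (illustrative only, not Thunderbird's code)."""

    def __init__(self):
        self.word_counts = {"junk": Counter(), "good": Counter()}
        self.message_counts = {"junk": 0, "good": 0}

    def train(self, text, label):
        """Record the words of one message under 'junk' or 'good'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def junk_probability(self, text):
        """Estimate the probability that a message is junk (0.0 to 1.0)."""
        vocabulary = set(self.word_counts["junk"]) | set(self.word_counts["good"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("junk", "good"):
            # Prior: share of training messages with this label (add-one smoothed).
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words never give a zero probability.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_scores["good"] - log_scores["junk"]))
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = TinyBayesFilter()
spam_filter.train("cheap viagra buy now", "junk")
spam_filter.train("refinance your debt now", "junk")
spam_filter.train("lunch tomorrow at noon", "good")
spam_filter.train("meeting notes attached", "good")

print(spam_filter.junk_probability("buy cheap viagra"))
print(spam_filter.junk_probability("lunch meeting tomorrow"))
```

The first message should score well above 0.5 and the second well below it, because each shares its words with only one of the two training piles.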

I recently received the following strange spam email. It wasn’t trying to sell anything, it didn’t even contain a link; it just contained the following excerpts from a story:
sopping wet from head to toe. I locked myself in a stall, got my flask,
faster. He was flying now straight down, at two hundred fourteen miles per
something like a vessel, like a glass jar with blue syrup. We looked at it
The next night from the Flock came Kirk Maynard Gull, wobbling across

I’ve seen a paragraph or two of generic text like this at the bottom of emails selling Viagra, debt refinancing, etc. It’s an attempt to make the email look more “normal” to Bayesian filters, and therefore not get automatically marked as spam. But this one didn’t make any sense; why would anyone send out millions of useless messages like these?

Well, I think I’ve figured it out: this is likely a concerted effort by spammers to cause our Bayesian filters to mark more false positives, which will then make us either abandon them or spend a lot of time sifting through our junk folders (hopefully pausing on one of their subsequent spam mails). Marking one of these messages as junk teaches the filter that ordinary, everyday words are signs of spam. Therefore, if you receive an email like this, DO NOT mark it as spam, just delete it; otherwise, you will start to get false positives (i.e., emails from acquaintances may get marked as spam).

BTW, the way these spammers operate is to propagate a virus or Trojan horse that compromises thousands of machines around the world, turning them into unwitting “spam-bots” that churn out spam at the spammers' remote command. The distributed nature of this system makes it nearly impossible to strike back in any meaningful way. These creeps are then contracted to send out millions of emails by shady business people selling questionable products. These “businesses” purposely create websites in the guise of shell identities with incorrect or missing contact information to avoid the onslaught of negative emails, phone calls, faxes, and personal visits they would otherwise receive from millions of irate email users. The point being: don’t waste any time and effort complaining.

My real problem is not so much with these ingenious people, but with the few idiots who actually respond to these offers. Coming from the direct mail sector, I know that you generally have to get a 1% response rate to make a mailing profitable (covering printing, postage, and list rental cost). However, since emails have $0 printing cost, $0 delivery cost, and a $0-$3 per million list cost, you can still be quite profitable with less than a 1 in 10,000 response rate. Therefore, to the idiots out there who think that male enhancement products really work, or that you can get safe Viagra without going to your doctor: please, PLEASE, do us all a favor and don’t respond to these emails. Just Google these products and find a less-slimy vendor to do business with!
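Here is a rough sketch of that break-even arithmetic in Python. Only the $0-$3 per million list cost and the 1% and 1-in-10,000 response rates come from the paragraph above; the $30 profit per sale and the $0.30 cost per direct mail piece are hypothetical figures I picked for illustration, and the function name is made up.

```python
def break_even_response_rate(cost_per_message, profit_per_sale):
    """Smallest fraction of recipients who must buy for a mailing to cover its cost."""
    if profit_per_sale <= 0:
        raise ValueError("profit_per_sale must be positive")
    return cost_per_message / profit_per_sale


# Hypothetical figures, chosen only for illustration.
PROFIT_PER_SALE = 30.00          # assumed profit on one order, in dollars
DIRECT_MAIL_COST = 0.30          # assumed printing + postage + list rental per piece
EMAIL_COST = 3.00 / 1_000_000    # top of the $0-$3 per million list cost quoted above

direct_mail_rate = break_even_response_rate(DIRECT_MAIL_COST, PROFIT_PER_SALE)
email_rate = break_even_response_rate(EMAIL_COST, PROFIT_PER_SALE)

print(f"direct mail break-even: {direct_mail_rate:.4%}")
print(f"email break-even:       {email_rate:.6%}")
print(f"email needs roughly 1 response per {1 / email_rate:,.0f} messages")
```

With these assumed numbers the direct mail case lands on the 1% figure, while the email case breaks even at about one response per ten million messages, far below the 1 in 10,000 mentioned above.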

[UPDATE] I’ve posted this to my favorite social book-marking site, reddit.com, where it has started an interesting discussion. My detractors think that these messages are just dumb mistakes by the spammers; I would agree, but I’ve seen 3 separate instances of this now. Obviously, if these emails sounded more like your standard friendly email, they would be more effective at subverting Bayesian filters (even though these get quite specific, tailored to the way your friends write). But I suspect each one is unique (to prevent filtering at the ISP level), and coming up with millions of real email conversations is practically impossible. Therefore, I suspect they just take excerpts from literature found online, so as not to sound like ad copy. The other issue is that most spam originates from a part of the world (Eastern Europe & Russia) bereft of native English speakers.

In any case, it looks like the spammers are not satisfied with just reaching a dim-witted audience; they want to make sure that tech-savvy Internet users will have to eyeball their (clients') offerings as well, which makes no sense.

26 comments:

Anonymous said...

ya society is broken cause there is crime

Anonymous said...

This has been going on for at least a couple of years. We recently implemented greylisting at our tiny little site and we've had great success with it. Greylisting isn't appropriate for all sites (e.g., you don't want to delay inbound customer service or support e-mails) but the results are hard to ignore in any case.
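For anyone unfamiliar with it, greylisting temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) combination and accepts a retry made after a short delay. A minimal sketch of that logic in Python follows; the class name Greylist, the 300-second delay, and the in-memory dictionary are assumptions for illustration, not how any particular mail server implements it.

```python
import time


class Greylist:
    """Toy greylisting table keyed on (client IP, envelope sender, recipient)."""

    def __init__(self, min_delay_seconds=300, clock=time.time):
        self.min_delay_seconds = min_delay_seconds  # assumed delay, tune per site
        self.clock = clock                          # injectable for testing
        self.first_seen = {}                        # triplet -> time of first attempt

    def check(self, client_ip, mail_from, rcpt_to):
        """Return an (SMTP status code, message) pair for this delivery attempt."""
        triplet = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        # Remember when this triplet was first seen; keep the earlier time on retries.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay_seconds:
            # 4xx is a temporary failure: a well-behaved sending server retries later.
            return 451, "Greylisted, please try again later"
        return 250, "OK"
```

A real deployment would also expire old entries and exempt known-good senders, which is one way to avoid delaying the support mail mentioned above.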

Anonymous said...

Your theory is a decent one... I've gotten dozens of these emails.

They will pass. Email isn't broken.

Anonymous said...

There are filters that only allow approved senders to send you anything. Getting approved is easy for the sender: upon sending the first email to your address, an automatic response asks the sender to type the letters/numbers shown in a JPEG. After this one-time hassle they can freely email you (unless you then decide to block them).

Bill@spamitup.com

Anonymous said...

Email is not broken, it is a failed design.

They should have designed it so that, after a glance, you could press a button to say that you do not accept the email. That button press would send it back to the previous node, and so on, until all mails return to their sources (hitting the spammer with a big return flood of bogus mails, so they DDoS themselves).

Anonymous said...

Time to move on to more aggressive adaptive spam filtering. Paul Graham posted a follow-up article to "A Plan for Spam" that proposed that retaliatory measures be added to spam filtering. Basically, any links embedded in a spam e-mail should cause a web client to connect to the spammer's remote site 1 or more times (as determined by the user). With enough users using such a system, you would simply be DDoS attacking their web server, denying access to the site to stupid would-be buyers and raising bandwidth charges to the spammer's site dramatically. Problem solved! Onward to phase two of the filtering.

Anonymous said...

Greylisting is great, but seems to break so many non-RFC-compliant MTAs. I've seen some seriously broken MTAs that were sending NDRs to people who were trying to email our accounts.

Anonymous said...

this is why we combine strategies and use things like bayesian filters AND autowhitelisting. my friends' email will never get marked as spam because my system knows that I trust them.
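a rough sketch of what I mean, in python. the function names, the 0.9 threshold, and the rule that anyone I write to becomes trusted are my own assumptions for illustration, real autowhitelisting tools work differently in the details.

```python
def record_outgoing(recipient, trusted_senders):
    """Auto-whitelist: anyone I send mail to becomes a trusted sender (assumed rule)."""
    trusted_senders.add(recipient.strip().lower())


def classify(sender, spam_probability, trusted_senders, threshold=0.9):
    """Trusted senders bypass the Bayesian score; everyone else is judged by it."""
    if sender.strip().lower() in trusted_senders:
        return "inbox"
    return "junk" if spam_probability >= threshold else "inbox"


trusted = set()
record_outgoing("Friend@example.com", trusted)

print(classify("friend@example.com", 0.99, trusted))    # trusted sender
print(classify("stranger@example.net", 0.99, trusted))  # unknown sender, high score
```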

Anonymous said...

Hmm what about when your friends get infected by a virus and start automatically sending you "I love you" emails? I want my filter to correctly identify those as spam.

Anonymous said...

Anonymous: Your virus scanner on the server (clamscan!) will catch those before the Bayesian filters. I run clamscan, then spamassassin on my mail server and find it extremely effective.

Anonymous said...

Back in the 1880s, the U.S. Government offered a bounty for each Native American ("Indian") killed of approximately $20. Since corpses were too cumbersome to tote back for payment, the officially recognized token of Indian Eradication was a scalp of long black hair.

I digress with the specifics... I think it's time for a new bounty to be instated. $2000 for each spammer's head brought into the office of the treasury.

Anonymous said...

If you all would get Linux, you wouldn't get all the rubbish effects, from trojans and viruses and the whole collection of other crap you get. However the only thing is that you may not receive your other messages.

Linux Rocks!

Anonymous said...

You're right, this is an effort to make Bayesian filtering ineffective, and it has been going on for at least the last two years. Blacklisting services like SpamCop are useful in avoiding these messages and they also provide a way to report spam. The spam is "dissected" and admins for the network hosting the spammer can be notified. Spam-friendly servers will also be blocked by the SpamCop blacklisting service if certain criteria are met, keeping subscribers to the service free of more spam. (No, I don't work for them, but have used their service for years.)

Anonymous said...

I call it "Moron Poetry" and keep a folder of it when I receive it. It's kinda funny and weird at the same time. Sometimes a gif attached with garbage in it.

Anonymous said...

if you own a email server, turn off you server for about a week and there wont be any more spam. Spammers stop sending mail if they cant reach u.

Anonymous said...

well you can't say spammers aren't clever. just wish they would use their intelligence for some good.

Anonymous said...

Linux Sucks. NetBSD PWNS J00!

Anonymous said...

if you own a email server, turn off you server for about a week and there wont be any more spam. Spammers stop sending mail if they cant reach u.

That might just be the stupidest thing I've read today. Thanks for making me chuckle.

Anonymous said...

There was one solution that was working very well in fact: Blue Security. They had managed to get several major spammers to stop sending spam to its members altogether. Unfortunately, one of the top spammers got very pissed that this system was working and almost brought the entire Internet to its knees. BlueSecurity decided to pull its service to avoid being blamed for causing this situation. It is a very sad, but interesting story that is fully documented on Wikipedia.

An open-source alternative to BlueSecurity, called Okopipi, is now being developed that will try to accomplish what BlueSecurity had done but in a distributed computing, P2P-type arrangement that spammers could never bring down.

Anonymous said...

Filtering at the mail client is always going to be less effective than filtering at the mail server.

I stop the vast majority of SPAM before it ever gets sent to my mailserver by a combination of techniques, including:

- dropping obviously fraudulent connections (for example, a foreign host that tries to HELO with an invalid string, or my own IP or Domain Name)
- dropping connections that attempt to send before my server has presented its greeting banner
- judicious use of RBLs
- dropping connections where a foreign mailserver claims MAIL FROM: is my own Domain or IP

These simple tests, plus some more complex ones (checking DNS, SPF, etc.) kill upwards of 80% of SPAM. The rest is handled by ClamAV and SpamAssassin, running on the mail server.
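As a rough illustration, the simple connection-level tests listed above could be sketched in Python like this. It is not my actual mail server configuration: the function name, the example domain and addresses, the hostname pattern, and the local set standing in for a real RBL lookup are all hypothetical.

```python
import ipaddress
import re

MY_DOMAIN = "example.org"            # hypothetical: this server's own domain
MY_IP = "203.0.113.10"               # hypothetical: this server's own address
OWN_IDENTITIES = {MY_DOMAIN, MY_IP, f"[{MY_IP}]"}
LOCAL_BLOCKLIST = {"198.51.100.7"}   # stand-in for a real RBL (DNS) lookup

# Loose check for a dotted hostname; real servers apply their own rules.
HOSTNAME_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def _is_address_literal(value):
    """True for bracketed IP literals such as [192.0.2.1] or [IPv6:2001:db8::1]."""
    if not (value.startswith("[") and value.endswith("]")):
        return False
    inner = value[1:-1]
    if inner.startswith("ipv6:"):
        inner = inner[5:]
    try:
        ipaddress.ip_address(inner)
        return True
    except ValueError:
        return False


def rejection_reason(client_ip, helo, mail_from, spoke_before_greeting=False):
    """Return why a connection should be dropped, or None to let it continue."""
    if spoke_before_greeting:
        return "client sent data before the server greeting"
    helo = helo.strip().lower()
    if helo in OWN_IDENTITIES:
        return "HELO claims to be this server"
    if not (HOSTNAME_RE.match(helo) or _is_address_literal(helo)):
        return "HELO is not a valid hostname or address literal"
    if client_ip in LOCAL_BLOCKLIST:
        return "client IP is on a blocklist"
    sender_domain = mail_from.strip("<> ").rpartition("@")[2].lower()
    if sender_domain in OWN_IDENTITIES:
        return "MAIL FROM claims this server's own domain"
    return None
```

Anything that returns None here would still go on to the heavier checks (DNS, SPF, ClamAV, SpamAssassin).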

Doing anti-SPAM filtering in your mail client means the spammer has already gotten to chew up bandwidth and disk space, and you'll never filter it all. Plus, you're open to Bayes poisoning, which has been a problem for quite some time.

Stop the SPAM before it ever gets into your mailbox; don't wait until it's already in there to begin trying to filter. Your mail client's SPAM filters should be the *last* line of defense, not the first.

Anonymous said...

Kaizyn said...

Time to move on to more aggressive adaptive spam filtering. Paul Graham posted a follow-up article to "A Plan for Spam" that proposed that retaliatory measures be added to spam filtering. Basically, any links embedded in a spam e-mail should cause a web client to connect to the spammer's remote site 1 or more times (as determined by the user). With enough users using such a system, you would simply be DDoS attacking their web server, denying access to the site to stupid would-be buyers and raising bandwidth charges to the spammer's site dramatically. Problem solved! Onward to phase two of the filtering.


Nope, wrong, never gonna work. Here's why:

If I were a spammer, I'd simply put a unique link (perhaps something like "scamsite.com/verify/?adr=you@yourdomain.com") into each spam mail, replacing the adr variable with the address it's being sent to. Then I'd spam millions of people, and end up with a dataset of people who have that feature.

With that data I could:
a) Have a list of verified real e-mail addresses. Obviously doing the exact opposite of what you'd want, which is confirming to the spammers that you're a real address.
b) Create a blacklist of people not to spam. This *might* be beneficial to you, but I really doubt that any spammer is going to want to *not* spam certain people.
c) And finally, if your mail client connected to every website it thought was spam, I'd hope that no e-mail client would mistake a newsletter for spam, as it'd kill the newsletter's site as soon as it was sent out.
d) Didn't they try this with a Lycos screen saver a while back? I remember it ended very badly for the good guys and eventually shut down.

Anonymous said...

e) If I hated some company, I'd send out spam with that company's website. Bam, instant DDoS against some company I didn't like.
f) A lot of scam sites are hosted on hacked (or worse, shared) servers. You'd be then participating in a DoS attack on many, many websites that are basically innocent bystanders. I think this is why the Lycos one ended up shutting down, but I'm not sure. Basically, it's WAY too easy to trick a system like that into attacking innocent systems.

The list goes on.

Anonymous said...

g) And finally, even if you're attacking a "bad" site, it's still fairly illegal to knowingly participate in a DDoS. If the client was configured to only connect once, I could see people getting away with it, but for it to work you'd probably have to have everyone connect hundreds of times, and that'd be a real shady/grey area, legally.

Anonymous said...

I think your advice about not classifying the random text as spam messages is wrong, unless the Bayesian filter you're using sucks. If the filters are designed correctly, the random text messages will at worst cause a few false positives, which will quickly be corrected once you tell it it's a false positive. What ends up happening (at least in my case) is the filters start relying more heavily on the content of the headers.

I use popfile and have never had a problem with false positives, even though I have been marking these random text messages as spam for several years. If it's really an issue in Thunderbird, then the problem probably has more to do with Thunderbird being poorly designed than with a weakness in Bayesian filters with regard to poisoning.

Anonymous said...

I think this isn't targeting individual users of spam filtering like you, but rather the bigger mail providers which might still be sharing a global spam filter for the whole system. It would be much more important to follow the advice you give in Yahoo! mail than in your individual Thunderbird.

Wesley

Anonymous said...

Well, I am not really a geek on that topic, but I never had any problems with my Yahoo! mail spam filter, so I think I have to agree with Wesley. Anyway, I see you got a lot of comments on this :)