Fighting Spam
Randy Appleton
Northern Michigan University
[email protected]

What is Spam
• Probably, it’s “unsolicited and unwanted commercial email sent in bulk”.

Sometimes It’s Not Spam
• You did sign up for it.
• You accidentally signed up for it.
• You still don’t want it.

How Is It Delivered?
• Anyone can fake email.
• 80% of all spam came from bot-nets
– We helped
• Open relays are mostly gone.
• You can hire someone to do this for you (see Google).

How Much Spam Is There?
• In absolute numbers
• 1978 - An e-mail spam is sent to 600 addresses.
• 1994 - First large-scale spam sent to 6000 bulletin boards, reaching millions of people.
• 2005 - (June) 30 billion per day
• 2006 - (June) 55 billion per day

How Much Spam Is There #2
• As a percentage of the total volume of e-mail
• MAAWG estimates that 80-85% of incoming mail is "abusive email", as of the last quarter of 2005. The sample size for the MAAWG's study was over 100 million mailboxes.
• More is coming!!!

Why They Spam
• Money
• Political causes.
• Money
• It’s fun
• Money
• Money

Sell You Something
• It’s just mass electronic marketing
• They give you a web site, you click over and buy the product.
• Email might even be targeted.
• weight loss.html

Does Selling By Email Work?
• Kodak settled a CAN-SPAM suit with the FTC. Their Ofoto unit sent two million commercial messages that didn't comply with the CAN-SPAM Act. The messages didn't include a notice that they were ads, opt-out info, or Kodak's postal address. Kodak paid the FTC $26,000, the revenue it got from the mailing.

Pure Fraud
“There is a sucker born every minute.”
• Send email to lots of people.
• Wait for sucker to respond.
• Convince them to give you money.
• Nigerian bank fraud

Identity Theft
• Send an email message.
• Direct them with a bad URL.
• Capture their info.
• Reject login and send them to the right site.
• Microsoft says to manually check every link.

Identity Theft #2
• An Example
• Who Did It.

Stock Manipulation
• Pick a small cap stock
• Buy some.
• Send spam telling people about the stock.
• Sell when price rises.
• stock-spam.txt
• spam-stock.jpg
• New York Times

Yes, Spam Works
• 5% response rate for sexual material.
• 0.02% response rate for drugs.
• 0.0075% response rate for Rolex watches.

Avoiding Spam
• Don’t let them get your email address.
– Don’t use AOL, etc.
– Don’t put address on web page.
– Don’t use mailing lists.
• Throw away email addresses.
– Mailinator, spamgourmet, sneakermail
• Annoying …. but possible.

List Removal
• For a reputable company, you can always click “remove me from the list”.
• A disreputable company will merely take that to be confirmation you’re reading the email.
• It’s a calculated gamble.

Auto Detecting Spam
• Blacklist
• Whitelist
• Bayesian Analysis
• Other Analysis
• These are all things your email server does for you.

Blacklist
• A list of sites from which you don’t take mail.
• Automatically interfaced to your email server.
• Spamhaus Block List
– Zealots
– Many choices.

Defeating Blacklists
• The spammers can switch ISPs.
• The spammers can use a botnet.

Whitelist
• There is no global whitelist; you make your own.
• Your own contact group is a good start.
• Add your institution.
• Add people to whom you have sent mail.
• Semiautomatic at best.

Bayesian Analysis
• Make two piles of mail: spam and ham.
• Find words or phrases that can be used to identify mail.
• Check all incoming mail for those phrases.
• Normally you get a starter database that can be customized.

Example Bayesian Analysis
• My friends don’t email me about Viagra.
• They do email me about Linux.
• The phrase “stupid freshmen” appears in email to me.
• The phrase “hot freshman” does not.
• Result is a score.

Defeating Bayesian Analysis
• Intentionally misspell words
• Humans will still understand Vi agra.
• But this is an even more potent indicator of spamhood (unless you have unique misspellings, and you don't).
• Use pictures instead of words.
• But that indicates spam-hood too.
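The Bayesian Analysis slides above (two piles of mail, telltale words, a resulting score) can be sketched in a few lines of Python. This is a minimal illustration, not the filter any particular mail server ships: the class name `NaiveBayesFilter`, the tokenizer, the add-one smoothing, and the four training messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical naive-Bayes-style scorer built from two piles of mail."""

    def __init__(self):
        # Number of messages in each pile that contain a given word.
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one message to the 'spam' or 'ham' pile."""
        self.counts[label].update(set(tokenize(text)))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return a score between 0 (ham-like) and 1 (spam-like)."""
        # Start from the prior odds of spam, with add-one smoothing.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for word in set(tokenize(text)):
            p_spam = (self.counts["spam"][word] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][word] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so the logistic function below cannot overflow.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny made-up training set echoing the slide's examples.
demo_filter = NaiveBayesFilter()
demo_filter.train("Cheap Viagra, buy now", "spam")
demo_filter.train("Hot freshman wants to meet you", "spam")
demo_filter.train("The Linux install party is tonight", "ham")
demo_filter.train("Those stupid freshmen broke the Linux lab again", "ham")

if __name__ == "__main__":
    print(demo_filter.spam_probability("Buy Viagra now"))
    print(demo_filter.spam_probability("Linux lab tonight"))
```

A real filter would start from a much larger starter database and keep learning from the user's own mail, as the slide notes.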

Defeating Bayesian Analysis
• Add happy words
• Add poetry
• move closer playoff berth Yankees straight East CNBC Stocks Source: HP got Fiorinas phone records Energy price drop makes Feds job easier farmers Oil continues slide gas Down Oprahs America live Clay Aiken releases third album turn Crocodile AND NASA spotting debris Probe spots around Saturn Distorted solar system Schools crack amp FITNESS More potent Test Pattern: Multilink Monday World poor real headlines Watch netcast man who believes planning make next nuclear

Other Analysis
• Look for a feature that is more likely in spam than ham.
• Give that a score.
• If the score is high enough, reject.
• Can (and should) be combined with other techniques.

SpamAssassin Rules
• http://spamassassin.apache.org/tests_3_1_x.html

Other Analysis Example
• 1.3 HELO_DYNAMIC_HEXIP Relay HELO'd using suspicious hostname (Hex IP)
• 0.8 EXTRA_MPART_TYPE Header has extraneous Content-type:...type= entry
• 2.0 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received: date
• 0.9 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
• 0.0 HTML_MESSAGE BODY: HTML included in message
• 2.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address

Other Analysis Example #2
• 0.9 FORGED_YAHOO_RCVD 'From' yahoo.com does not match 'Received' headers
• 0.8 HTML_IMAGE_ONLY_32 BODY: HTML: images with 2800-3200 bytes of words
• 2.2 REPTO_QUOTE_YAHOO Yahoo! doesn't do quoting like this
• 0.2 MIME_BOUND_NEXTPART Spam tool pattern in MIME boundary

Other Fun Ideas
• Return “temporary error” code on first time senders.
• Return “temporary error” code if you think the last email was spam.
• Ring them back.
• Challenge-response.

Current Best Technique
• If on whitelist, admit mail.
• Check connection address via blacklist(s).
• Check for spammy features.
• Use Bayesian Analysis
• Produce a combined score.
• Sort on combined score.

Success Rate
• You can expect 80%-95% of spam to be detected.
• You can expect nearly 0% of real mail to be misclassified.
• Adding anti-virus detection is easy.
• Adding porn detection is not reasonably possible.

Fighting Back
• Don’t.
• The nasty email goes to an innocent.
• Or it confirms you exist.
• Or it bounces back to you.

Using Gmail
• Gmail filters.
• Gmail allows POP downloads.
• You can even forward the mail to Gmail to keep your old account name.

Using JunkEmailFilter.com
• Small business, $100 per year.
• Large business, ¼ cent per email.
• Progressive charities are free.
• Lots of other choices.

Buy a Box
• Barracuda Networks will sell you a box.
• It will filter spam, viruses, and more.
• Costs $2000.
• Very low administration once set up.

Summary
• Keep email address private.
• Use spam detecting software.
• Use spam detecting software.
• Or use Google.
• Suffer.
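The steps on the Current Best Technique slide (whitelist, blacklist, spammy features, Bayesian analysis, combined score, sort) can be sketched as one small Python pipeline. Everything specific here is an assumption made for illustration: the `Message` fields, the example addresses, the rule names, and every weight are invented and are not taken from SpamAssassin or any other product. The Bayesian step is passed in as a function so any scorer can be plugged in.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str         # envelope sender address
    connecting_ip: str  # address of the host that delivered the mail
    body: str


# Hypothetical lists; a real server would query DNS blocklists and
# build the whitelist from the user's contacts and sent mail.
WHITELIST = {"friend@example.edu"}
BLACKLIST = {"203.0.113.7"}

# Hypothetical feature rules: (name, weight, test). Weights are made up.
FEATURE_RULES = [
    ("BODY_MENTIONS_VIAGRA", 2.0, lambda m: "viagra" in m.body.lower()),
    ("MANY_EXCLAMATIONS", 0.8, lambda m: m.body.count("!") >= 3),
]

BLACKLIST_WEIGHT = 5.0  # assumed penalty for a blacklisted connection
BAYES_WEIGHT = 4.0      # assumed scale for the Bayesian probability


def combined_score(msg, bayes_probability):
    """Return a spam score; higher means more spam-like."""
    # Step 1: whitelisted senders are admitted outright.
    if msg.sender in WHITELIST:
        return 0.0
    score = 0.0
    # Step 2: check the connecting address against the blacklist.
    if msg.connecting_ip in BLACKLIST:
        score += BLACKLIST_WEIGHT
    # Step 3: add the weight of every spammy feature that matches.
    score += sum(weight for _, weight, test in FEATURE_RULES if test(msg))
    # Step 4: fold in the Bayesian estimate (a value between 0 and 1).
    score += BAYES_WEIGHT * bayes_probability(msg.body)
    return score


def rank_by_score(messages, bayes_probability):
    """Sort messages so the least spam-like come first."""
    return sorted(messages, key=lambda m: combined_score(m, bayes_probability))


def stub_bayes(body):
    """Stand-in for a trained Bayesian filter, for demonstration only."""
    return 0.9 if "viagra" in body.lower() else 0.1
```

Sorting rather than rejecting outright means a borderline message ends up near the bottom of the inbox instead of being lost, which fits the goal of misclassifying nearly 0% of real mail.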