Fighting Spam
Randy Appleton
Northern Michigan University
[email protected]
What is Spam
• Probably, it’s “unsolicited and unwanted
commercial email sent in bulk”.
Sometimes It’s Not Spam
• You did sign up for it.
• You accidentally signed up for it.
• You still don’t want it.
How Is It Delivered?
• Anyone can fake email.
• 80% of all spam came from bot-nets
– We helped 
• Open relays are mostly gone.
• You can hire this done for you (see Google).
How Much Spam Is There?
• In absolute numbers
• 1978 - An e-mail spam is sent to 600
addresses.
• 1994 - First large-scale spam sent to 6000
bulletin boards, reaching millions of
people.
• 2005 - (June) 30 billion per day
• 2006 - (June) 55 billion per day
How Much Spam Is There #2
• As a percentage of the total volume of
e-mail
• MAAWG estimates that 80-85% of
incoming mail is "abusive email", as of the
last quarter of 2005. The sample size for
the MAAWG's study was over 100 million
mailboxes.
• More is coming!!!
Why They Spam
• Money
• Political causes.
• Money
• It’s fun
• Money
• Money
Sell You Something
• It’s just mass electronic marketing
• They give you a web site, you click over
and buy the product.
• Email might even be targeted. 
• weight loss.html
Does Selling By Email Work?
• Kodak settled a CAN-SPAM suit with the
FTC. Their Ofoto unit sent two million
commercial messages that didn't comply
with the CAN-SPAM Act. They didn't
include a notice that it was an ad, opt-out
info, or Kodak's postal address. They
paid the FTC $26,000, the revenue the
messages brought in.
Pure Fraud
“There is a sucker born every minute.”
• Send email to lots of people.
• Wait for sucker to respond.
• Convince them to give you money.
• Nigerian bank fraud
Identity Theft
• Send an email message.
• Direct them with a bad URL.
• Capture their info.
• Reject login and send them to the right
site.
• Microsoft says to manually check every
link.
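Part of that manual check can be automated. A minimal Python sketch, assuming the message body is HTML and that a link is suspect when its visible text names one host while its href points at another; the names `LinkChecker` and `suspicious_links` are hypothetical, and link text without a scheme (such as "www.mybank.example") is not examined.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkChecker(HTMLParser):
    """Collect links whose visible text names a different host than the href."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> tag currently open, if any
        self._text = []        # visible text seen inside that tag
        self.suspicious = []   # (shown text, real href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            shown_host = urlparse(shown).hostname
            real_host = urlparse(self._href).hostname
            # Only compare when the visible text itself looks like a URL.
            if shown_host and real_host and shown_host != real_host:
                self.suspicious.append((shown, self._href))
            self._href = None


def suspicious_links(html):
    checker = LinkChecker()
    checker.feed(html)
    checker.close()
    return checker.suspicious


body = (
    '<a href="http://198.51.100.5/login">https://www.mybank.example/login</a> '
    '<a href="https://example.org/">https://example.org/</a>'
)
print(suspicious_links(body))
```

Only the first link is reported, because its text shows the bank's host while the href goes to a bare IP address.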
Identity Theft #2
• An Example
• Who Did It.
Stock Manipulation
• Pick a small cap stock
• Buy some.
• Send spam telling people about the stock.
• Sell when price rises.
• stock-spam.txt
• spam-stock.jpg
• New York Times
Yes, Spam Works
• 5% response rate from sexual material.
• 0.02% response rate for drugs.
• 0.0075% response rate for Rolex
Watches.
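Tiny rates still add up because sending is nearly free. A short Python sketch of the arithmetic, using the three response rates above; the batch size of one million messages is a hypothetical figure chosen only for illustration.

```python
# Response rates from the slide, in percent.
RESPONSE_RATES_PERCENT = {
    "sexual material": 5.0,
    "drugs": 0.02,
    "Rolex watches": 0.0075,
}


def expected_responses(messages_sent, rate_percent):
    """Expected number of responses for a batch of messages."""
    return round(messages_sent * rate_percent / 100)


BATCH = 1_000_000  # hypothetical batch size
for product, rate in RESPONSE_RATES_PERCENT.items():
    print(f"{product}: {expected_responses(BATCH, rate)} responses per {BATCH} messages")
```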
Avoiding Spam
• Don’t let them get your email address.
– Don’t use AOL, etc.
– Don’t put address on web page.
– Don’t use mailing lists.
• Throw away email addresses.
– Mailinator, spamgourmet, sneakermail
• Annoying …. but possible.
List Removal
• For a reputable company, you can always
click “remove me from the list”.
• A disreputable company will merely take
that to be confirmation you’re reading the
email.
• It’s a calculated gamble.
Auto Detecting Spam
• Blacklist
• Whitelist
• Bayesian Analysis
• Other Analysis
• These are all things your email server
does for you.
Blacklist
• A list of hosts (IP addresses or domains)
from which you don’t take mail.
• Automatically interfaced to your email
server.
• Spamhaus Block List
– Zealots
– Many choices.
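Mail servers usually consult such a list through DNS: the octets of the connecting IPv4 address are reversed and prepended to the list's zone, and a name that resolves means the address is listed. A minimal Python sketch of building that query name follows. The function name `dnsbl_query_name` is hypothetical, the default zone is only an illustration, and no network lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip, zone="sbl.spamhaus.org"):
    """Build the DNS name a server would look up to check `ip` against `zone`."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation-only address, used here as a stand-in.
print(dnsbl_query_name("192.0.2.99"))
```

A real server would then resolve the returned name and treat "no such name" as "not listed".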
Defeating Blacklists
• The spammers can switch ISPs.
• The spammers can use a botnet.
Whitelist
• There is no global whitelist; you make your
own.
• Your own contact group is a good start.
• Add your institution.
• Add people to whom you have sent mail.
• Semiautomatic at best. 
Bayesian Analysis
• Make two piles of mail: spam and ham.
• Find words or phrases that can be used to
identify mail.
• Check all incoming mail for those phrases.
• Normally you get a starter database that
can be customized.
Example Bayesian Analysis
• My friends don’t email me about Viagra.
• They do email me about Linux.
• The phrase “stupid freshmen” appears in
email to me.
• The phrase “hot freshman” does not.
• Result is a score.
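The two-pile idea can be sketched in a few lines of Python. This is a toy naive Bayes filter with add-one smoothing, not the algorithm of any particular product; the function names and the six training messages are made up for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count how many messages in a pile contain each word."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes over the words present in `text`; returns a value in 0..1."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from the pile sizes
    for word in set(text.lower().split()):
        # Add-one smoothing so an unseen word never yields a zero probability.
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical training piles.
spam_pile = ["cheap viagra now", "hot freshman pics", "viagra discount"]
ham_pile = ["linux kernel patch", "stupid freshmen broke linux lab", "lab meeting"]

spam_counts, ham_counts = train(spam_pile), train(ham_pile)
for msg in ["viagra now", "linux lab"]:
    score = spam_probability(msg, spam_counts, ham_counts, len(spam_pile), len(ham_pile))
    print(f"{msg!r}: {score:.2f}")
```

Scores near 1 mean "looks like the spam pile", scores near 0 mean "looks like the ham pile".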
Defeating Bayesian Analysis
• Intentionally misspell words
• Humans will still understand Vi agra.
• But this is an even more potent indicator of spamhood (unless you have a unique misspelling,
and you don't).
• Use pictures instead of words.
• But that indicates spam-hood too.
Defeating Bayesian Analysis #2
• Add happy words
• Add poetry
• move closer playoff berth Yankees straight East CNBC
Stocks Source: HP got Fiorinas phone records Energy
price drop makes Feds job easier farmers Oil continues
slide gas Down Oprahs America live Clay Aiken releases
third album turn Crocodile AND NASA spotting debris
Probe spots around Saturn Distorted solar system
Schools crack amp FITNESS More potent Test Pattern:
Multilink Monday World poor real headlines Watch netcast
man who believes planning make next nuclear
Other Analysis
• Look for a feature that is more likely in
spam than ham.
• Give that a score.
• If the score is high enough, reject.
• Can (and should) be combined with other
techniques.
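A feature-scoring filter of this kind might look like the Python sketch below. The three rules, their scores, and the threshold are all hypothetical and untuned; they only show the shape of the technique.

```python
import re

# Hypothetical rules: (name, score, test). Each test takes a message dict.
RULES = [
    ("ALL_CAPS_SUBJECT", 1.5, lambda m: m["subject"].isupper()),
    ("MANY_EXCLAMATIONS", 1.0, lambda m: m["subject"].count("!") >= 3),
    ("MENTIONS_FREE_MONEY", 2.0,
     lambda m: re.search(r"free money", m["body"], re.IGNORECASE) is not None),
]
THRESHOLD = 3.0  # hypothetical cut-off


def score_message(message):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, score) for name, score, test in RULES if test(message)]
    return sum(score for _, score in hits), [name for name, _ in hits]


msg = {"subject": "BUY NOW!!!", "body": "Get free money today"}
total, fired = score_message(msg)
print(total, fired, "reject" if total >= THRESHOLD else "accept")
```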
SpamAssassin Rules
• http://spamassassin.apache.org/tests_3_1_x.html
Other Analysis Example
• 1.3 HELO_DYNAMIC_HEXIP Relay HELO'd using suspicious
hostname (Hex IP)
• 0.8 EXTRA_MPART_TYPE Header has extraneous Content-
type:...type= entry
• 2.0 DATE_IN_FUTURE_03_06 Date: is 3 to 6 hours after Received:
date
• 0.9 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-
2400 bytes of words
• 0.0 HTML_MESSAGE BODY: HTML included in message
• 2.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from
dynamic IP address
Other Analysis Example #2
• 0.9 FORGED_YAHOO_RCVD 'From'
yahoo.com does not match 'Received'
headers
• 0.8 HTML_IMAGE_ONLY_32 BODY:
HTML: images with 2800-3200 bytes of
words
• 2.2 REPTO_QUOTE_YAHOO Yahoo!
doesn't do quoting like this
• 0.2 MIME_BOUND_NEXTPART Spam
tool pattern in MIME boundary
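Adding up the scores in the two examples shows how the verdict is reached. The Python sketch below copies the rule scores from the listings above and assumes the listed rules were the only ones that fired; the cut-off of 5.0 is SpamAssassin's stock required score, which a site may have changed.

```python
# Rule scores copied from the two example listings.
EXAMPLE_1 = {
    "HELO_DYNAMIC_HEXIP": 1.3,
    "EXTRA_MPART_TYPE": 0.8,
    "DATE_IN_FUTURE_03_06": 2.0,
    "HTML_IMAGE_ONLY_24": 0.9,
    "HTML_MESSAGE": 0.0,
    "RCVD_IN_SORBS_DUL": 2.0,
}
EXAMPLE_2 = {
    "FORGED_YAHOO_RCVD": 0.9,
    "HTML_IMAGE_ONLY_32": 0.8,
    "REPTO_QUOTE_YAHOO": 2.2,
    "MIME_BOUND_NEXTPART": 0.2,
}
REQUIRED_SCORE = 5.0  # assumption: the default threshold is in use


def total_score(hits):
    """Sum the scores of the rules that fired, rounded to one decimal."""
    return round(sum(hits.values()), 1)


for name, hits in [("example 1", EXAMPLE_1), ("example 2", EXAMPLE_2)]:
    total = total_score(hits)
    verdict = "spam" if total >= REQUIRED_SCORE else "not spam"
    print(f"{name}: {total} -> {verdict}")
```

Under these assumptions the first message crosses the threshold on rule scores alone, while the second would need help from another technique, such as a Bayesian score.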
Other Fun Ideas
• Return “temporary error” code on first time
senders.
• Return “temporary error” code if you think
the last email was spam.
• Ring them back.
• Challenge-Response.
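The first idea is commonly called greylisting: a legitimate server retries after a temporary error, while much spam software does not. A minimal Python sketch follows; the class name `Greylist`, the five-minute delay, and the choice of reply codes 451 and 250 are assumptions made for illustration.

```python
RETRY_DELAY = 300  # seconds a new sender must wait before retrying; hypothetical


class Greylist:
    """Temporarily reject mail from sender triples not seen before."""

    def __init__(self, retry_delay=RETRY_DELAY):
        self.retry_delay = retry_delay
        self.first_seen = {}  # (client_ip, sender, recipient) -> first attempt time

    def check(self, client_ip, sender, recipient, now):
        """Return an SMTP reply code: 451 (try again later) or 250 (accept)."""
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first < self.retry_delay:
            return 451
        return 250


greylist = Greylist()
triple = ("192.0.2.10", "alice@example.org", "bob@example.edu")
print(greylist.check(*triple, now=0))     # first attempt
print(greylist.check(*triple, now=600))   # retry after the delay
```

The time is passed in explicitly so the behaviour is easy to test; a real server would use its clock and expire old entries.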
Current Best Technique
• If on whitelist, admit mail.
• Check connection address via blacklist(s).
• Check for spammy features.
• Use Bayesian Analysis
• Produce a combined score.
• Sort on combined score.
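The order of those steps can be sketched in Python as below. The addresses, the weights, and the way the parts are added together are all hypothetical; the rule score and Bayesian probability are taken as already computed by the earlier stages.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    sender: str
    client_ip: str
    rule_score: float   # total from the spammy-feature rules
    bayes_prob: float   # Bayesian spam probability, 0..1


WHITELIST = {"friend@example.edu"}   # hypothetical entries
BLACKLISTED_IPS = {"203.0.113.7"}    # hypothetical entries
BLACKLIST_SCORE = 3.0                # hypothetical weight
BAYES_WEIGHT = 4.0                   # hypothetical weight


def combined_score(mail):
    """Whitelisted senders are admitted with score 0; others get a weighted sum."""
    if mail.sender in WHITELIST:
        return 0.0
    score = mail.rule_score + BAYES_WEIGHT * mail.bayes_prob
    if mail.client_ip in BLACKLISTED_IPS:
        score += BLACKLIST_SCORE
    return score


def sort_inbox(mails):
    """Least spammy first, so likely spam sinks to the bottom."""
    return sorted(mails, key=combined_score)


inbox = [
    Mail("stranger@example.net", "203.0.113.7", 2.0, 0.5),
    Mail("friend@example.edu", "203.0.113.7", 5.0, 0.9),
    Mail("stranger@example.com", "192.0.2.1", 1.0, 0.5),
]
for mail in sort_inbox(inbox):
    print(mail.sender, combined_score(mail))
```

Checking the whitelist first means a trusted sender is never penalized by the later stages.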
Success Rate
• You can expect 80%-95% of spam to be
detected.
• You can expect nearly 0% of real mail to be
misclassified.
• Adding anti-virus detection is easy.
• Adding porn detection is not reasonably
possible.
Fighting Back
• Don’t.
• The nasty email goes to an innocent.
• Or it confirms you exist.
• Or it bounces back to you.
Using Gmail
• Gmail filters.
• Gmail allows POP downloads.
• You can even forward the mail to Gmail to
keep your old account name.
Using JunkEmailFilter.com
• Small business, $100 per year.
• Large business, ¼ cent per email.
• Progressive charities are free.
• Lots of other choices.
Buy a Box
• Barracuda Networks will sell you a box.
• It will filter spam, viruses, and more.
• Costs $2000.
• Very low administration once set up.
Summary
• Keep email address private.
• Use spam detecting software.
• Use spam detecting software.
• Or use Google.
• Suffer.