Class4.1 - Other Methods and Success Stories

More Data Mining
Success Stories for Marketing and Related Fields
Wolfgang Jank
RH Smith School of Business
University of Maryland
What is “Data Mining”?
Many definitions:
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data
- Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Related Fields
Data Mining and Knowledge Discovery
- Machine Learning
- Visualization
- Statistics
- Databases
Why Mine Data?
Because there are data floods…
Why Mine Data?

- Lots of data is being automatically collected and warehoused
  - Web data, e-commerce
  - Scanner data at department/grocery stores
  - Bank/credit card/insurance transactions
- Computers have become cheaper and more powerful
- Competitive pressure is strong
  - Provide better, customized services for an edge
Big Data Examples

- Europe's Very Long Baseline Interferometry (VLBI) has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
  - Storage and analysis are a big problem
- AT&T handles billions of calls per day
  - So much data that it cannot all be stored; analysis has to be done “on the fly”, on streaming data
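As a rough back-of-the-envelope check of the VLBI figure, the sketch below converts 16 telescopes at 1 Gigabit/second over 25 days into a total data volume. It assumes continuous recording for the whole session and decimal units (1 Gigabit = 10^9 bits); the slide states neither, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope data volume for the VLBI example.
# Assumptions (not from the slide): continuous recording, decimal units.
TELESCOPES = 16
GIGABITS_PER_SECOND = 1
DAYS = 25

SECONDS_PER_DAY = 24 * 60 * 60


def total_petabytes(telescopes, gbit_per_s, days):
    """Total volume in petabytes for the given number of telescopes and days."""
    total_gigabits = telescopes * gbit_per_s * days * SECONDS_PER_DAY
    total_gigabytes = total_gigabits / 8        # 8 bits per byte
    return total_gigabytes / 1_000_000          # 10^6 gigabytes per petabyte


volume_pb = total_petabytes(TELESCOPES, GIGABITS_PER_SECOND, DAYS)
print(f"{volume_pb:.2f} PB per observation session")
```

Under these assumptions a single session lands in the petabyte range, which is why storage and analysis are described as a problem.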
Data Growth
In 2 years, the size of the largest database TRIPLED!
Data Mining is Particularly Promising Online
- Why?
  - Because every “click” leaves a digital footprint
  - We can use these footprints to better understand our customers…
    - Coupons, ads, discounts, dynamic pricing, …
  - …or guard them against predators
    - Fraud detection, account protection, spam, junk mail, viruses, …
Blog Pulse

- Measures what the world (= the internet) is thinking
  - Measured in terms of the blogging activity
[Chart annotations: “The ‘Obama Buzz’ started here!”; “The Republican Convention & Sarah Palin”]
Google Trends

- Measures what the world is looking for
  - Measured in terms of search words
[Chart: The world’s interest in “Lehman Brothers” and “AIG”]
Google Flu Trends

- Detects outbreaks of flu early, based only on search terms
- More accurate and faster than CDC
- Read more at http://www.google.org/flutrends/
Data Mining Success Stories
The Netflix Recommendation Engine
- Netflix uses data mining to make recommendations to its users
  - Based on past user behavior
  - Based on movie similarities
- Helps cross-selling of products
- Improves the search experience for users
- However, developing good recommendation engines is not easy; therefore, Netflix has initiated the Netflix Challenge
The Netflix Challenge
“The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences”
- Netflix offers $1 million to the person/team that can improve on its current data mining method by 10% (measured as a reduction in prediction error, RMSE, on movie ratings rather than as classification accuracy)
- http://www.netflixprize.com/
- Incremental progress prizes of $50,000 every year
- An AT&T team won the progress prize in 2007
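The Netflix Prize scored entries by root mean squared error (RMSE) between predicted and actual ratings, so the 10% target is a 10% reduction in RMSE relative to the baseline system. The sketch below shows how such an improvement could be computed; the ratings are invented for illustration and the function names are not taken from the competition.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Hypothetical star ratings, for illustration only.
actual = [4, 3, 5, 2, 4]
baseline = [3, 3, 4, 3, 5]   # predictions of the existing method
improved = [4, 3, 4, 3, 4]   # predictions of a candidate method

base = rmse(baseline, actual)
new = rmse(improved, actual)
print(f"baseline RMSE={base:.3f}, new RMSE={new:.3f}, "
      f"improvement={improvement_pct(base, new):.1f}%")
```

A lower RMSE is better, so a positive improvement percentage means the candidate method predicts ratings more closely than the baseline.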
Amazon’s Recommendation Engine
- Every time we buy a book on Amazon, we receive recommendations about similar books
- How are they doing this?
- The answer: massive data mining
Google’s Search Algorithm
- Google continuously collects data about web pages using web spiders
- It transforms this massive data into search information using the famous “PageRank” algorithm
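The slide names the algorithm but not its details. The sketch below is the textbook power-iteration formulation of PageRank on a tiny hypothetical link graph, with an assumed damping factor of 0.85; it is meant to convey the idea and is not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share ("random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while D, which nobody links to, keeps only the baseline share.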
AT&T’s Fraud Detection
In the AT&T telephone network, every day old nodes drop out (terminated accounts) and new nodes pop up (new accounts)

Fraudulent account: terminated!
  Name: Elizabeth Harmon
  Address: APT 1045, 4301 ST JOHN RD, SCOTTSDALE, AZ
  Balance: $149.00
  Disconnected: 2/19/04 (nonpayment)

Should this new account be allowed?
  Name: Elizabeth Harmon
  Address: 180 N 40TH PL, APT 40, PHOENIX, AZ
  Balance: $72.00
  Connected: 1/31/04
AT&T’s Fraud Detection
AT&T uses massive graph mining to detect fraud in their telephone network data
Mining Accounting Fraud at PricewaterhouseCoopers
- PwC uses data mining for the automatic analysis of company general ledgers to detect accounting fraud
- Helps conform with the Sarbanes-Oxley Act
  - Improves efficiency
  - Improves accuracy
Sales Lead Identification at IBM

- IBM uses predictive modeling to estimate opportunities for cross-selling to existing customers and for selling existing services to new customers
  - Uses analytic tools to estimate
    - A potential customer’s wallet size
    - A potential customer’s probability of purchasing a service

Data Mining at IBM
[Diagram labels: Firmographics; Historical total Software sales; State is CA; Sector is IT; IBM Relationship; Historical Lotus sales; Historical System p sales; Company is HQ; Historical System x sales; Historical System z sales; New Rational sales]
zata3: Data-Driven Decisions in Election Campaigns
- zata3 is an election campaign consulting company
- They recently decided to add data mining technology to their services
zata3: Lots of data on voters and past voting behavior
- Goal: to predict who will vote in the next election
- Idea: better targeted spending of election campaign resources
Raw voting-history fields: General(00-03), Presidential, Primary
Sample values: Y, YYYY, R, YY, YYYY, YYY, YYYY, DD, RR, RR, YYYY, DD
VH General   VH Presidential   VH Primary   G04 Voted
1            0                 0            0
4            1                 0            1
2            0                 0            1
0            0                 0            1
4            2                 2            1
3            2                 1            1
4            2                 1            1
0            0                 0            1
4            2                 3            1
0            0                 0            1
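The VH ("voting history") columns look like counts derived from the raw history strings, for example "YYYY" under General(00-03) becoming a VH General value of 4. That reading is an assumption based on the sample values, not something the slides state. Under that assumption, the encoding step could be sketched as follows; the record layout and field names are hypothetical.

```python
def history_count(history):
    """Count the elections recorded in a raw history string such as 'YYYY' or 'DD'."""
    return len(history.strip()) if history else 0


# Raw strings of the kind shown on the slide; pairing them into records is hypothetical.
raw_records = [
    {"general": "Y", "presidential": ""},
    {"general": "YYYY", "presidential": "R"},
    {"general": "", "presidential": ""},
]

# Turn each raw record into numeric features a predictive model can use.
features = [
    {
        "vh_general": history_count(r["general"]),
        "vh_presidential": history_count(r["presidential"]),
    }
    for r in raw_records
]
print(features)
```

Numeric counts like these can then serve as inputs when predicting the G04 Voted outcome.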
PARTY_CODE, Gender, Education, Children, Home_Owner, Income, Times Donated (one record per line; empty fields omitted)
A 0 3 1 4 3 0
R 0 0 4 2 0
A 1 0 3 1
D 1 0 0 0
D 1 5 0 2 6 0
R 1 0 4 3 0
R 1 0 4 1 0
R 1 0 0 0
D 1 7 1 2 4 0
D 1 0 0 0
zata3: Huge savings with data mining
- zata3 anticipates savings of over 30% using data mining models
Analysis           Total Cost     Voted    Cost Per Vote   % Savings
Traditional        $74,522.50     14664    $5.08
With Data Mining   $52,806.64     15626    $3.38
Savings            $21,715.86     962      $1.70           34%
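The savings figures follow from simple arithmetic on total cost and number of votes. The sketch below recomputes them from the numbers in the table; the variable and function names are illustrative. Note that the 34% refers to the reduction in cost per vote, while total spending falls by a somewhat smaller percentage.

```python
def cost_per_vote(total_cost, votes):
    """Campaign cost divided by the number of people who voted."""
    return total_cost / votes


# Figures taken from the slide.
traditional_cost, traditional_votes = 74522.50, 14664
mining_cost, mining_votes = 52806.64, 15626

cpv_traditional = cost_per_vote(traditional_cost, traditional_votes)
cpv_mining = cost_per_vote(mining_cost, mining_votes)

cost_saved = traditional_cost - mining_cost
extra_votes = mining_votes - traditional_votes
cpv_saved = cpv_traditional - cpv_mining
cpv_saved_pct = 100.0 * cpv_saved / cpv_traditional
total_saved_pct = 100.0 * cost_saved / traditional_cost

print(f"Cost per vote: ${cpv_traditional:.2f} -> ${cpv_mining:.2f}")
print(f"Total cost saved: ${cost_saved:,.2f} with {extra_votes} more votes")
print(f"Saving per vote: ${cpv_saved:.2f} ({cpv_saved_pct:.0f}%)")
print(f"Reduction in total spending: {total_saved_pct:.1f}%")
```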
Data Mining and Mass e-Customization
Customization for Online Services

- Opportunities:
  - Combination of countless features for highly individualized solutions
  - “A single personalized solution for every customer”
- Challenges:
  - How does the customer understand what’s right for them?
  - Moving from consultative selling to self-consultative buying
Ex.: Freddie Mac Mortgage Services
- Freddie Mac mass-customizes mortgage products
  - Combines hundreds of different loan characteristics
- Challenge: How does the customer find the loan that’s right for them?
Ex.: Mass Customization at eBay

- eBay offers any possible product & service in “garage-type” sales
  - However, it does not assist the customer much in finding the right product/service.
Ex.: Books on Amazon

- Amazon.com offers books for every taste
  - But: How can we find the book that’s right for us?
Managing Mass Customization at Amazon
- How does Amazon ensure that customers find what they are looking for?
  - Answer: by making (automated) recommendations
Managing Mass Customization

From Expert Salesperson to Expert System: How can we ensure that our customers get what they are looking for?

Pre-Internet customization: Expert Salesperson
- Experienced with product, process
- Consultative selling
- Salesperson provides expertise, identifies needs, defines configuration

Early/current-Internet customization: Expert Customer
- Experienced with product
- Revelation, transaction buying
- Customer provides expertise, knows needs, defines configuration

Future Internet customization: Non-Expert Customer
- Inexperienced with product, process
- Self-consultative buying
- System provides expertise, identifies needs, defines configuration
Providing the non-expert customer with decision support
Moving from Expert to Non-Expert Buyers: Computerization

Assisted service
- Telephone, email, instant messaging
- Drawback: requires human interaction, only limited scalability

Self service
- Search, user ratings, forums, blogs, expert recommendations
- Drawback: does not help the customer that is unsure about their needs

Automated service
- Expert systems for the non-expert
- Replaces the salesperson
- Translates customer characteristics and usage requirements into recommended product configurations
- Consists of rule-based systems and data mining algorithms
- Advantage: fully automatic, scalable, updatable
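The rule-based part of such an automated service can be pictured as a list of "if the customer looks like this, set that part of the configuration" rules. The sketch below is a minimal illustration; the customer attributes, rules, and product fields are all invented and do not describe any system mentioned in these slides.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    segment: str          # e.g. "home" or "small_business" (hypothetical labels)
    travels_often: bool
    edits_video: bool


# Each rule pairs a condition on the customer with the configuration fields it sets.
# The rules and product fields are invented for illustration.
RULES = [
    (lambda c: c.travels_often, {"form_factor": "ultralight laptop"}),
    (lambda c: not c.travels_often, {"form_factor": "desktop"}),
    (lambda c: c.edits_video, {"memory_gb": 32, "graphics": "dedicated"}),
    (lambda c: c.segment == "small_business", {"support_plan": "next-day onsite"}),
]

DEFAULTS = {"memory_gb": 8, "graphics": "integrated", "support_plan": "standard"}


def recommend(customer):
    """Start from defaults and apply every rule whose condition matches."""
    config = dict(DEFAULTS)
    for condition, settings in RULES:
        if condition(customer):
            config.update(settings)
    return config


print(recommend(Customer("small_business", travels_often=True, edits_video=False)))
```

In a fuller system, data mining algorithms would help discover or tune such rules from customer data instead of having an expert write each one by hand.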
Ex.: Automated Service at AmEx
- Offers online tool that, based on desired features, recommends best card
  - Compensates only for lack of product knowledge, but assumes customer knows why they need the product.
Ex.: Blockbuster’s Recommendation System
- Blockbuster recommends similar movies based on movie features and user behavior
  - “If you liked Indiana Jones, then you will also like Tomb Raider”
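One simple way to recommend "similar movies based on movie features" is to describe each movie by a set of tags and compare the sets. The sketch below uses Jaccard similarity on hand-made tags; both the tags and the choice of similarity measure are assumptions for illustration, and the user-behavior side mentioned on the slide is not covered.

```python
def jaccard(a, b):
    """Share of features two movies have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hand-made feature tags, purely illustrative.
MOVIES = {
    "Indiana Jones": {"action", "adventure", "archaeology", "treasure"},
    "Tomb Raider": {"action", "adventure", "archaeology", "video-game"},
    "Notting Hill": {"romance", "comedy"},
}


def most_similar(title, catalog):
    """Return the other movie whose feature set overlaps most with `title`."""
    others = [
        (jaccard(catalog[title], features), other)
        for other, features in catalog.items()
        if other != title
    ]
    return max(others)[1]


print(most_similar("Indiana Jones", MOVIES))
```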
Key Component for Automated Service Systems: Data Mining
Collect and mine customer information in order to, e.g.,
- Segment the market
  - Understand customers’ different needs, expertise, profitability
  - E.g. Dell distinguishes between the segments “Home”, “Small Business”, “Medium/Large Business”, “Public Sector”
- Analyze behaviors and events
  - Understand when customer has needs and the events that lead to them
  - E.g. path tracking, click stream analysis
- Optimize pricing
  - Bundling, price discrimination
  - E.g. Amazon’s price testing; Zilliant’s data-driven pricing software
Key requirement: understand customer data
Dangers of Data Mining
- The danger of using data mining software/technology as a “black box”
  - Data does not mine itself!
  - We still need the domain knowledge and expertise of the user; otherwise outcomes may be meaningless
- Data quality
  - Junk-in, junk-out
What Data Mining Isn’t
Data Mining Isn’t…

- …smarter than you
  - Example from DeVeaux:
    - A new backpack inkjet printer is showing higher than expected warranty claims
    - A neural network analysis shows that Zip code is the most important predictor
Data Mining Isn’t…

- …always about algorithms
  - Sometimes collecting and plotting the right data is enough
    - Blogpulse
More Data Mining Resources

- Repository:
  - http://www.kdnuggets.com/
  - http://www.the-data-mine.com/
- Tutorials
  - http://www.autonlab.org/tutorials/
- Software
  - SAS Enterprise Miner, SPSS Clementine, Orange, Weka, Rattle, R, …