ECLT 5810
E-Commerce Data Mining Techniques
Introduction
Prof. Wai Lam
Data Opportunities
● Business infrastructure has improved the ability to collect data
● Virtually every aspect of business is now open to data collection:
◆ customer behavior, marketing campaign performance, supply-chain management, workflow procedures, etc.
● Information is now widely available on external events
◆ market trends, industry news, etc.
● Broad availability of data
◆ increasing interest in methods for extracting useful information and knowledge from data
Data Opportunities
● Companies focus on exploiting data for competitive advantage
● In the past, statisticians and analysts explored datasets manually
◆ but the volume and variety of data have far outstripped the capacity of manual analysis
● Computers have become far more powerful
◆ algorithms have been developed that can connect datasets to enable broader and deeper analyses
● This has given rise to the increasingly widespread business application of data mining techniques
Data Mining Adoption
● Data Mining: extraction of useful knowledge from data
● Knowledge may refer to models, rules, regularities, patterns
◆ non-trivial, implicit, previously unknown
● Used for general customer relationship management
◆ analyze customer behavior in order to manage attrition and maximize expected customer value
● Used for credit scoring and trading, fraud detection, and workforce management
● Major retailers from Walmart to Amazon apply data mining from marketing to supply-chain management.
A Data Mining Case in Retail Industry - Walmart
● Largest retailer in the world
● Over 10,000 stores and half a trillion dollars in annual revenue
A Data Mining Case in Retail Industry - Walmart
● Collects 2.5 petabytes of information from 1 million customers every hour
● Employs data analytics to improve operational efficiency and marketing campaigns
● Mining sales data
◆ Finds patterns that can be used to provide personalized product recommendations and promotional offers
◆ Correlates sales trends with external events such as trending topics on Twitter and local weather.
◆ In 2013, it found that strawberry Pop-Tarts sales increased by 700% before a hurricane
A Data Mining Case in Retail Industry - Walmart
● Connecting in-store and online customer behavior
◆ It has customer data on about 60% of all US adults.
◆ Tracks when shoppers are looking at a product online using in-store Wi-Fi.
◆ Social Genome: reaches customers who have mentioned something on social media to inform them about that product and include a discount.
◆ Combines information from online and in-store sources
A Data Mining Case in Retail Industry - Walmart
● Personalized Recommendation
◆ Send tailored product offers based on customer behavior online and inside their stores.
◆ Track campaign’s email open rate and re-align delivery times based on user patterns.
◆ Email subject line personalization
Data-Driven Decision Making (DDD)
● In 2012, Walmart’s competitor Target cared about consumers’ shopping habits, what drives them, and what can influence them.
◆ Consumers tend to have inertia in their habits, and getting them to change is very difficult.
● Decision makers at Target knew that the arrival of a new baby in a family is one point where people do change their shopping habits significantly.
● Most retailers compete with each other trying to sell baby-related products to new parents.
● Since most birth records are public, retailers obtain information on births and send out special offers to the new parents.
Data-Driven Decision Making (DDD)
● Target wanted to get a jump on their competition. They were interested in whether they could predict that people are expecting a baby.
◆ If they could, they would gain an advantage by making offers before their competitors.
● Using data mining techniques, Target analyzed historical data on customers who later were revealed to have been pregnant.
◆ Pregnant mothers often change their diets, their wardrobes, their vitamin regimens, and so on. These indicators could be extracted from historical data and assembled into predictive models.
Data-Driven Decision Making (DDD)
● In both the Walmart and the Target examples, the data analysis was not testing a simple hypothesis.
● Instead, the data were explored with the hope that something useful would be discovered.
● Also referred to as data mining, predictive analytics, business intelligence
Video for predictive analytics:
https://www.youtube.com/watch?v=TZerxbj5uL8
Data Mining: A General KDD Process
● Data mining: the core of the knowledge discovery process
[Figure: KDD pipeline. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Data Mining Procedures
[Figure: Input Data → Data Pre-Processing → Data Mining → Post-Processing]
● Data Pre-Processing
◆ Data integration
◆ Normalization
◆ Feature selection
◆ Dimension reduction
● Data Mining
◆ Pattern discovery
◆ Association & correlation
◆ Classification
◆ Clustering
◆ Outlier analysis
◆ …
● Post-Processing
◆ Pattern evaluation
◆ Pattern selection
◆ Pattern interpretation
◆ Pattern visualization
This is a view from typical machine learning and statistics communities
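As a small illustration of the normalization step listed under pre-processing, the sketch below rescales a numeric attribute to the range [0, 1] using min-max normalization. The slides do not say which normalization method is meant, so this choice is an assumption; the function name `min_max_normalize` and the income values are made up for illustration.

```python
def min_max_normalize(values):
    """Linearly rescale numbers so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical annual incomes (illustrative numbers only).
incomes = [20000, 25000, 30000, 40000]
normalized = min_max_normalize(incomes)
print(normalized)
```

Rescaling like this keeps attributes with large raw ranges (such as income) from dominating attributes with small ranges (such as age) in distance-based methods.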
Steps of a General KDD Process
● Learning the application domain:
◆ relevant prior knowledge and goals of application
● Creating a target data set: data selection
● Data cleaning and preprocessing: (may take 60% of effort!)
● Data reduction and transformation:
◆ Find useful features, dimensionality/variable reduction, invariant representation.
● Choosing functions of data mining
◆ summarization, classification, regression, association, clustering.
● Choosing the mining algorithm(s)
● Pattern evaluation and knowledge presentation
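To make the data cleaning step above a little more concrete, here is a minimal sketch that drops records with missing fields and exact duplicates. Real cleaning usually involves much more (inconsistent formats, outliers, conflicting sources); the `clean` function and the sample records are hypothetical and only illustrate the idea.

```python
def clean(records):
    """Drop records with missing fields and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # skip incomplete records
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical raw customer records (illustrative only).
raw = [
    {"customer": "a", "age": 25, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},   # missing age
    {"customer": "a", "age": 25, "spend": 120.0},    # duplicate of the first record
    {"customer": "c", "age": 41, "spend": 300.0},
]
target = clean(raw)
print([rec["customer"] for rec in target])
```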
Data Mining: Functionalities (1)
● Classification and Prediction
◆ Finding models (functions) that describe and distinguish classes or concepts for future prediction
◆ e.g., classify countries based on climate, or identify good clients
◆ Model: decision-tree, classification rule, neural network
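The "identify good clients" example can be sketched with the simplest possible decision tree: a single threshold on one attribute (a decision stump). This is only an illustration of learning a classification rule from labeled data, not a method prescribed by the slides; the function names and the training data are hypothetical.

```python
def fit_stump(xs, labels):
    """Find the threshold t for which the rule 'predict True when x >= t'
    misclassifies the fewest training examples."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((x >= t) != y for x, y in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical training data: yearly income (in thousands) and whether the client was "good".
incomes = [15, 22, 28, 35, 50, 61]
good = [False, False, False, True, True, True]

threshold = fit_stump(incomes, good)


def predict(income):
    """Apply the learned rule to a new client."""
    return income >= threshold


print(threshold, predict(40), predict(18))
```

The learned model here is the rule "income >= threshold implies good client", which is then applied to clients that were not in the training data.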
Data Mining: Functionalities (2)
● Cluster analysis
◆ Class label is unknown: group data to form new classes
◆ e.g., cluster houses to find distribution patterns
◆ Clustering based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
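The clustering principle can be checked numerically. The sketch below compares two groupings of the same points, using absolute difference as a stand-in for dissimilarity (so high similarity means small distance). The house prices and function names are hypothetical, and each cluster is assumed to hold at least two points.

```python
from itertools import combinations


def mean_intra_distance(clusters):
    """Average absolute difference between pairs of points in the same cluster."""
    dists = [abs(a - b) for c in clusters for a, b in combinations(c, 2)]
    return sum(dists) / len(dists)


def mean_inter_distance(clusters):
    """Average absolute difference between pairs of points in different clusters."""
    dists = [abs(a - b)
             for c1, c2 in combinations(clusters, 2)
             for a in c1 for b in c2]
    return sum(dists) / len(dists)


# Hypothetical house prices (in thousands), grouped two different ways.
good_grouping = [[100, 110, 120], [300, 310, 320]]
poor_grouping = [[100, 110, 300], [120, 310, 320]]

for name, grouping in [("good", good_grouping), ("poor", poor_grouping)]:
    print(name, mean_intra_distance(grouping), mean_inter_distance(grouping))
```

The better grouping has a smaller within-cluster distance and a larger between-cluster distance, which is the criterion a clustering algorithm tries to optimize.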
Data Mining: Functionalities (3)
● Association (correlation and causality)
◆ age(X, “20..29”) ∧ income(X, “20K..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]
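Support and confidence for a rule like the one above can be computed by counting. In this sketch, support is the fraction of all records that satisfy both sides of the rule, and confidence is the fraction of records satisfying the left side that also satisfy the right side. The records and their string encoding are hypothetical, so the resulting numbers differ from the 2% and 60% quoted on the slide.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing antecedent and consequent
    confidence = among transactions containing the antecedent,
                 the fraction that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical customer records encoded as sets of attribute labels.
transactions = [
    {"age:20..29", "income:20K..29K", "buys:PC"},
    {"age:20..29", "income:20K..29K", "buys:PC"},
    {"age:20..29", "income:20K..29K"},
    {"age:30..39", "income:40K..49K", "buys:PC"},
    {"age:30..39", "income:20K..29K"},
]
s, c = support_and_confidence(
    transactions, {"age:20..29", "income:20K..29K"}, {"buys:PC"}
)
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

Here 2 of 5 records match the whole rule and 3 records match its left side, so support is 2/5 and confidence is 2/3.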
Data Mining: Relationship to Other Disciplines
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Information Science, Visualization, and Other Disciplines]