Introduction to Data Mining
Part 1
Introductory Material
Overview
This module will introduce Data Mining and provide examples of successful applications.
What is Data Mining?
Why is it important?
How do we mine data?
Textbook: Building Better Models with JMP Pro, by Jim Grayson, Sam Gardner, and Mia L. Stephens
Software: Excel, JMP Pro Version 12
What is Data Mining?
Data Mining is extracting useful information from the massive quantities of data that businesses generate.
It is the exploration and analysis, by automatic means, of large quantities of data to discover actionable insights and rules.
More applications?
• E-mail spam filters: examine millions of emails to classify each as spam or not
• Banks trying to identify applicants more likely to default on loans
• Fraud prediction
    credit card companies: questioning a charge?
    insurance companies: which claims are most likely fraudulent
    government: which tax returns are most likely fraudulent
• Text mining at Google and Yahoo helps order websites by relevance to your search.
• Retaining good customers for cell phone carriers (and banks): which customers are more likely to abandon service (that is, churn)
    discounts or other enticements might be offered
Why is Data Mining so effective now?
• Data are warehoused and computerized.
• Collected through bar coding, scanning, RFID, internet, ERP systems, etc.
• Computing power is cheap
• Competitive pressure
• Commercial products available
How do we do it?
Two Basic Styles
1. Top-Down: HYPOTHESIS TESTING (SUPERVISED)
    Science: have a theory, experiment to prove/disprove
2. Bottom-Up: KNOWLEDGE DISCOVERY (UNSUPERVISED)
    Creativity: start with data, see new patterns
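The difference between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call-minute values, the churn labels, and the function names `fit_threshold` and `two_means` are hypothetical and do not come from the course or from JMP Pro. The supervised part learns a cut-off from known labels; the unsupervised part groups the same values without any labels.

```python
# Hypothetical toy data: monthly call minutes for six customers.
minutes = [12.0, 15.0, 14.0, 95.0, 102.0, 99.0]

# Top-down / supervised: the outcome is already known for each customer
# (labels invented for illustration).
churned = [True, True, True, False, False, False]


def fit_threshold(values, labels):
    """Pick the cut-off where 'value <= cut' agrees with the most labels."""
    best_cut, best_correct = None, -1
    for cut in sorted(values):
        correct = sum((v <= cut) == y for v, y in zip(values, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


def two_means(values, iterations=10):
    """Bottom-up / unsupervised: 1-D k-means with k=2, no labels used.

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_group = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(low_group) / len(low_group)
        hi = sum(high_group) / len(high_group)
    return low_group, high_group


cut = fit_threshold(minutes, churned)        # learned from labelled data
low_group, high_group = two_means(minutes)   # discovered from the data alone
print("supervised cut-off:", cut)
print("unsupervised groups:", low_group, high_group)
```

In the supervised case the labels drive the result; in the unsupervised case the grouping emerges from the data, and a person still has to decide what the groups mean.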
Data Mining Process: CRISP-DM
How does it work?
CRISP-DM: Cross-Industry Standard Process for Data Mining
• A standard process model for data mining
• Independent of industry sector & technology
CRISP-DM Phases
Phase                     Result
Business understanding    Problem definition
Data understanding        Data collected
Data preparation          Clean data table
Modeling                  Choose model(s)
Evaluation                Do model results achieve objectives?
Deployment                Use the model to make decisions
CRISP-DM Phases: Business understanding
• Solve a specific problem
    Clear definition helps
    Measurable success criteria
• Convert business objectives to a set of data-mining goals
    What to achieve in technical terms
CRISP-DM Phases: Data understanding
Data can come from many sources
• Internal: ERP system, data warehouse
• External: government data, commercial
• Created: research
CRISP-DM Phases: Data preparation
• Clean data
    format, identify gaps, filter outliers & redundancies
    sampling rare events
• Transform & create dataset for modeling
• Types of data
    Nominal
    Ordinal
    Continuous
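The cleaning steps named above (identify gaps, filter outliers and redundancies) might look like this in Python. It is a minimal sketch, not the course's procedure: the records, the `clean` function, and the outlier rule (anything above ten times the median) are assumptions chosen only to make the example concrete.

```python
from statistics import median

# Hypothetical raw records: (customer_id, monthly_spend); None marks a gap.
raw = [
    (1, 40.0),
    (2, 42.0),
    (2, 42.0),     # redundant duplicate row
    (3, None),     # gap: spend was not recorded
    (4, 38.0),
    (5, 4000.0),   # suspicious outlier
    (6, 45.0),
]


def clean(records, max_ratio=10.0):
    """Drop duplicate rows, set aside rows with gaps, and filter outliers."""
    # Remove exact duplicate rows while preserving the original order.
    unique = list(dict.fromkeys(records))
    # Identify gaps (missing values) and keep them for later review.
    gaps = [r for r in unique if r[1] is None]
    complete = [r for r in unique if r[1] is not None]
    # Assumed outlier rule: spend above max_ratio times the median is dropped.
    mid = median(spend for _, spend in complete)
    kept = [r for r in complete if r[1] <= max_ratio * mid]
    return kept, gaps


kept, gaps = clean(raw)
print("clean rows:", kept)
print("rows with gaps:", gaps)
```

Rows with gaps are returned rather than silently discarded, so the analyst can decide whether to drop them or fill them in.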
CRISP-DM Phases: Modeling
• Data Treatment
    How many data sets? Training, validation, test.
• Techniques
    Regression
    Decision Trees
    Neural Nets
    Boosted Trees
    Bootstrap Forest
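Splitting one table into training, validation, and test sets can be sketched as follows. The 60/20/20 proportions, the fixed seed, and the helper name `split_rows` are assumptions for illustration; the slides do not prescribe them, and JMP Pro has its own tools for this step.

```python
import random


def split_rows(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows, then split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                  # used to fit the model
        shuffled[n_train:n_train + n_val],   # used to tune and compare models
        shuffled[n_train + n_val:],          # held back for a final check
    )


rows = list(range(100))  # stand-in for 100 data rows
train_set, validation_set, test_set = split_rows(rows)
print(len(train_set), len(validation_set), len(test_set))
```

Every row lands in exactly one of the three sets, so the test set stays untouched until the final check.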
CRISP-DM Phases: Evaluation
Check that the model is good and evaluate it to ensure that nothing is missing.
• Does the model meet business objectives?
• Are any important business objectives not addressed?
• Does the model make sense?
• Is it actionable?
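One way to make "does the model meet business objectives?" concrete is to compare a measured score with a success criterion fixed during business understanding. The sketch below assumes a hypothetical target accuracy of 0.75 and invented holdout results; accuracy is only one possible measure, and the right one depends on the business problem.

```python
def accuracy(predicted, actual):
    """Fraction of cases where the prediction matches the actual outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical holdout results (1 = event occurred, 0 = it did not).
actual = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]

# Assumed success criterion, agreed on before any modeling was done.
TARGET_ACCURACY = 0.75

score = accuracy(predicted, actual)
meets_objective = score >= TARGET_ACCURACY
print(f"accuracy = {score:.2f}, meets objective: {meets_objective}")
```

A passing score answers only the first question on the slide; whether the model makes sense and is actionable still needs human judgment.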
CRISP-DM Phases: Deployment
• Ongoing monitoring & maintenance
    evaluate performance against success criteria
    market reaction?
    competitor changes?