Download Enterprise Miner: What is new in version 5.2

Predictive Modeling
Concepts and Algorithms
Russ Albright and David Duling
SAS Institute
Copyright © 2006, SAS Institute Inc. All rights reserved.
Predictive Modeling Landscape
 1. Background
 2. Modeling Overview
 3. Models
 4. Model Assessment and Selection
 5. Model Deployment / Scoring
Use Cases for Data Mining
 1. Offline applications
    • Campaign planning
    • Adverse event detection
 2. On-demand applications
    • Front Office data collection & recommendation
 3. Real-time applications
    • Transaction processing
    • Fraud detection
    • Website product recommendation
 4. Real-time modeling and scoring of data streams (the future!)
    • Mega data streams
    • Internet traffic
    • Satellite transmissions
    • Digital data acquisition
Background - Enterprise Miner Functionality
Sample
Explore
Modify
Model
Assess
Background - Predictive Modeling Terminology
[Figure: data sets laid out with Observations as rows and Variables/Features/Attributes as columns. Panels: Training Data; Validation and Test Data; Scoring Data. Column labels: Actual Target; Predicted Target (Output).]
Modeling Overview
 What do we mean by prediction?
 What is a predictive model?
• Classification/discriminant model: target is categorical, usually binary
• Regression model: target is continuous
 Given {x(i),y(i)},
y=f(x,θ)
E(y|x,θ)
p(y|x,θ)
Consider the following data
Predict the Response for a new value of Attribute
[Figure: plot of Response versus Attribute]
The Simplest Model: y = Ȳ (the mean Response)
[Figure: plot of Response versus Attribute]
What about a polynomial?
[Figure: plot of Response versus Attribute]
What about a better polynomial?
[Figure: plot of Response versus Attribute]
Now acquire more data and call it “validation data”
The blue model is said to overfit the training data.
The mean model is said to underfit the training data.
[Figure: plots of Response versus Attribute in two panels, Training and Validation]
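The underfit/overfit contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are made up, and the three fits (the mean, a least-squares straight line, and a polynomial that passes through every training point) stand in for the curves on the slides, whose actual degrees are not given.

```python
def mean_model(xs, ys):
    """Degree-0 fit: always predict the training mean."""
    y_bar = sum(ys) / len(ys)
    return lambda x: y_bar

def line_model(xs, ys):
    """Degree-1 least-squares fit via the closed-form slope and intercept."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def interpolating_model(xs, ys):
    """Degree n-1 polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: a linear trend with alternating noise in training.
x_train = [0, 1, 2, 3, 4, 5]
y_train = [1.5, 2.5, 5.5, 6.5, 9.5, 10.5]
x_valid = [0.5, 1.5, 2.5, 3.5, 4.5]
y_valid = [2.0, 4.0, 6.0, 8.0, 10.0]

fitters = {
    "mean": mean_model,
    "line": line_model,
    "interpolating": interpolating_model,
}

# errors[name] = (training MSE, validation MSE)
errors = {}
for name, fit in fitters.items():
    model = fit(x_train, y_train)
    errors[name] = (mse(model, x_train, y_train), mse(model, x_valid, y_valid))

for name, (train_err, valid_err) in errors.items():
    print(f"{name:14s} train MSE = {train_err:.4f}  validation MSE = {valid_err:.4f}")
```

On this made-up data the interpolating polynomial has zero training error but a worse validation error than the straight line (overfitting), while the mean model has the largest error on both sets (underfitting).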
Models
 Linear Regression
     y = β0 + β1x1 + β2x2
 Logistic Regression (Generalized Linear Model)
     0-1 target/response variable
     Fit pj = p(yj=0|x) = 1 - p(yj=1|x)
     log(pj/(1-pj)) = β0 + β1x1 + β2x2
[Figure: scatter plot with axes Y, X1, and X2]
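To make the log-odds equation concrete, here is a minimal sketch of how a fitted logistic regression turns two inputs into a probability. The coefficient values and the input point are hypothetical, chosen only for illustration; p stands for whichever event probability the model is fitted to (the slide writes it as pj = p(yj=0|x)).

```python
import math

def linear_predictor(beta, x):
    """Compute β0 + β1*x1 + β2*x2 for beta = (β0, β1, β2) and x = (x1, x2)."""
    return beta[0] + beta[1] * x[0] + beta[2] * x[1]

def logit(p):
    """Log-odds of a probability: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

beta = (-1.0, 0.8, 0.5)  # hypothetical fitted coefficients
x = (2.0, 1.0)           # hypothetical observation (x1, x2)

eta = linear_predictor(beta, x)  # the right-hand side of the log-odds equation
p = inverse_logit(eta)           # the modeled probability

print(f"log-odds = {eta:.3f}, probability = {p:.3f}")
```

Applying logit to p recovers the linear predictor, which is exactly the relationship log(p/(1-p)) = β0 + β1x1 + β2x2.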
Idea: What if we break the data into smaller chunks to identify local phenomena?
[Figure: plot of Response versus Attribute]
Decision Trees
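The idea of breaking the data into chunks can be sketched as a single regression-tree split (a "stump"): try every cut point on the attribute, keep the one that minimizes the squared error within the two chunks, and predict each chunk's mean. The data and function names below are hypothetical, and a real tree would apply the same step recursively with a stopping rule.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the chunk mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total within-chunk SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no cut point exists between identical attribute values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]

def predict(split, x):
    """Predict the mean response of the chunk that x falls into."""
    threshold, left_mean, right_mean = split
    return left_mean if x < threshold else right_mean

# Hypothetical data with a jump in the response between attribute values 3 and 4.
attribute = [1, 2, 3, 4, 5, 6]
response = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

split = best_split(attribute, response)
print("split:", split)
print("prediction at 2.5:", predict(split, 2.5))
print("prediction at 5.5:", predict(split, 5.5))
```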
Neural Networks
ftp://ftp.sas.com/pub/neural/FAQ.html
Evolution of model training error and validation error
[Figure: Model Error curves for Training Error and Validation Error. Labeled regions: Initialization, Underfitting, Optimal fit, Overfitting.]
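One common way to act on this picture is to keep the model state at which the validation error is lowest. The sketch below assumes the errors are recorded once per training iteration (the slide does not name its horizontal axis), and the error values are made up.

```python
def optimal_iteration(validation_errors):
    """Index of the iteration with the lowest validation error."""
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)

def label_iterations(validation_errors):
    """Label each iteration relative to the optimal fit."""
    best = optimal_iteration(validation_errors)
    labels = []
    for i in range(len(validation_errors)):
        if i < best:
            labels.append("underfitting")
        elif i == best:
            labels.append("optimal fit")
        else:
            labels.append("overfitting")
    return labels

# Hypothetical error curves: training error keeps falling,
# validation error falls and then rises again.
training_error = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10]
validation_error = [0.95, 0.70, 0.52, 0.45, 0.47, 0.55, 0.68]

best_iteration = optimal_iteration(validation_error)
labels = label_iterations(validation_error)

for i, (t, v, label) in enumerate(zip(training_error, validation_error, labels)):
    print(f"iteration {i}: train = {t:.2f}  validation = {v:.2f}  {label}")
```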
Memory Based Reasoning (Nearest Neighbors)
[Figure: scatter plot with axes Y, X1, and X2; label: Neighbors]
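A nearest-neighbor prediction can be sketched as follows: store the training observations, find the k stored points closest to the new point, and take a majority vote of their targets. The data, the choice of Euclidean distance, and k = 3 are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `training` is a list of ((x1, x2), target) pairs; distance is Euclidean.
    """
    by_distance = sorted(training, key=lambda row: math.dist(row[0], query))
    votes = Counter(target for _, target in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training observations: two well-separated groups.
training = [
    ((1.0, 1.0), 0),
    ((1.5, 2.0), 0),
    ((2.0, 1.0), 0),
    ((6.0, 6.0), 1),
    ((6.5, 7.0), 1),
    ((7.0, 6.0), 1),
]

print(knn_predict(training, (1.2, 1.5)))
print(knn_predict(training, (6.4, 6.2)))
```

There is no separate fitting step here: the "model" is the stored training data itself, which is why the approach is called memory-based.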
Model Assessment and Selection – Lift charts
Test Data

Actual Target   Predicted Target (Output)   Decision
0               0.3                         0
1               0.9                         1
1               0.8                         1
0               0.6                         1
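The four test rows on this slide are enough to sketch how a decision and a lift value are computed. The 0.5 cutoff is an assumption (it reproduces the Decision values shown), and the `cumulative_lift` helper is a hypothetical name, not a SAS function.

```python
actual = [0, 1, 1, 0]               # Actual Target
predicted = [0.3, 0.9, 0.8, 0.6]    # Predicted Target (Output)

def decide(scores, cutoff=0.5):
    """Turn predicted probabilities into 0/1 decisions (cutoff is assumed)."""
    return [1 if s >= cutoff else 0 for s in scores]

def cumulative_lift(actual, scores, depth):
    """Response rate in the top `depth` fraction of scores, divided by the overall rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]
    n_top = max(1, round(depth * len(ranked)))
    top_rate = sum(ranked[:n_top]) / n_top
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate

decisions = decide(predicted)
print("decisions:", decisions)
for depth in (0.25, 0.5, 0.75, 1.0):
    print(f"depth {depth:.2f}: lift = {cumulative_lift(actual, predicted, depth):.2f}")
```

A lift chart plots these values against depth: the higher the curve sits above 1.0 at small depths, the better the model concentrates actual responders among its top-scored observations.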
Model Assessment and Selection – ROC Curves
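An ROC curve traces the true positive rate against the false positive rate as the decision cutoff is lowered. The sketch below reuses the four test rows from the lift chart slide; the function names are hypothetical, and the area under the curve is computed with the trapezoid rule.

```python
def roc_points(actual, scores):
    """(false positive rate, true positive rate) at each distinct score cutoff."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= cutoff and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= cutoff and a == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

actual = [0, 1, 1, 0]
predicted = [0.3, 0.9, 0.8, 0.6]

points = roc_points(actual, predicted)
auc = area_under_curve(points)
print("ROC points:", points)
print("AUC:", auc)
```

On these rows both actual 1s are scored above both actual 0s, so the curve reaches a true positive rate of 1 before any false positive appears. That holds even though a 0.5 cutoff would misclassify the row scored 0.6: the ROC curve summarizes ranking quality across all cutoffs rather than at one.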
5. Model Deployment / “Scoring”
 It is definitely not (just) about building the models.
 Scoring and Score Code
 Monitoring
Batch Score Delivery to Offline Applications
ETL for model development and scoring
Scores generated on nightly basis
ID and Score data pre-loaded into data store
Score requests contain ID
Decision server translates score to action
[Figure: architecture diagram. Labels: ETL engine; Scheduled Scoring; ETL process; Data Mining; SAS Scoring; RDB Scoring; C code; PMML engine; Data Store; Scores; Model Development; BI Application; Campaign Planning; Operations; Campaign Execution]
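The batch delivery flow described on this slide can be sketched as three small steps: a scheduled job scores every ID and loads the results into a data store, a request arrives carrying only an ID, and a decision step translates the stored score into an action. Everything named below (the scoring rule, the cutoff, the action names, the customer records) is hypothetical and stands in for whatever score code and decision server a real deployment would use.

```python
def nightly_scoring(customers, score_fn):
    """Scheduled batch job: score every customer and return an ID -> score store."""
    return {customer_id: score_fn(features) for customer_id, features in customers.items()}

def translate_score(score, cutoff=0.5):
    """Decision step: translate a score into an action (cutoff is an assumption)."""
    return "send_offer" if score >= cutoff else "no_action"

def handle_request(customer_id, data_store):
    """Serve a score request that contains only an ID, using pre-loaded scores."""
    score = data_store.get(customer_id)
    if score is None:
        return "no_score_available"
    return translate_score(score)

def toy_score(features):
    """Hypothetical stand-in for deployed score code."""
    return min(1.0, features["purchases"] / 10)

# Hypothetical customer records prepared by an ETL process.
customers = {
    "C001": {"purchases": 8},
    "C002": {"purchases": 2},
}

data_store = nightly_scoring(customers, toy_score)

for customer_id in ("C001", "C002", "C999"):
    print(customer_id, "->", handle_request(customer_id, data_store))
```

Because the model runs only in the scheduled job, a request costs one lookup, but scores can be up to a day old and IDs created since the last run have no score.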
Thanks!