Predictive Modeling Concepts and Algorithms
Russ Albright and David Duling
SAS Institute
Copyright © 2006, SAS Institute Inc. All rights reserved.

Predictive Modeling Landscape
1. Background
2. Modeling Overview
3. Models
4. Model Assessment and Selection
5. Model Deployment / Scoring

Use Cases for Data Mining
1. Offline applications
   • Campaign planning
   • Adverse event detection
2. On-demand applications
   • Front Office data collection & recommendation
3. Real-time applications
   • Transaction processing
   • Fraud detection
   • Website product recommendation
4. Real-time modeling and scoring of data streams (the future!)
   • Mega data streams
   • Internet traffic
   • Satellite transmissions
   • Digital data acquisition

Background - Enterprise Miner Functionality
Sample, Explore, Modify, Model, Assess (SEMMA)

Background - Predictive Modeling Terminology
[Figure: data tables for Training Data, Validation and Test Data, and Scoring Data. Rows are observations; columns are variables/features/attributes. Labeled columns: Actual Target; Predicted Target (Output).]

Modeling Overview
What do we mean by prediction? What is a predictive model?
• Classification/discriminant model: target is categorical, usually binary
• Regression model: target is continuous
Given {x(i), y(i)}: y = f(x, θ), E(y | x, θ), p(y | x, θ)

Consider the following data
Predict the Response for a new value of Attribute.
[Plot: Response vs. Attribute]

The Simplest Model: y = Ȳ, the mean of the Response
[Plot: Response vs. Attribute]

What about a polynomial?
[Plot: Response vs. Attribute]

What about a better polynomial?
[Plot: Response vs. Attribute]
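The progression from the mean model to a polynomial to a "better" polynomial can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six data points are hypothetical (the slide's actual data are not recoverable), the helper names are invented for this sketch, and the fit uses plain least squares via the normal equations rather than any SAS procedure.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of a polynomial of the given degree."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coefs))


def sse(coefs, xs, ys):
    """Sum of squared errors of the fitted polynomial on (xs, ys)."""
    return sum((y - predict(coefs, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data (NOT from the slides): Attribute -> Response
attribute = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [1.0, 2.7, 3.1, 4.2, 4.0, 6.1]

# Degree 0 is the mean model; higher degrees are the "polynomial" models.
training_sse = {
    d: sse(fit_polynomial(attribute, response, d), attribute, response)
    for d in (0, 1, 3)
}
for degree, err in training_sse.items():
    print(f"degree {degree}: training SSE = {err:.3f}")
```

A degree-0 fit reproduces the mean model, and each increase in degree can only lower (or keep) the error on the training data. That is why training error alone cannot tell us which polynomial to prefer.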
Now acquire more data and call it "validation data"
The blue model is said to overfit the training data.
The mean model is said to underfit the training data.
[Plot: Response vs. Attribute; series: Training, Validation]

Models
Linear Regression
   y = β₀ + β₁x₁ + β₂x₂
   [Plot: Y with fitted line]
Logistic Regression (Generalized Linear Model)
   0-1 target/response variable
   pj = p(yj=0 | x) = 1 - p(yj=1 | x)
   log(pj / (1 - pj)) = β₀ + β₁X₁ + β₂X₂
   [Plot: observations in the X1-X2 plane]

Idea: What if we break the data into smaller chunks to identify local phenomena?
[Plot: Response vs. Attribute]

Decision Trees

Neural Networks
ftp://ftp.sas.com/pub/neural/FAQ.html

Evolution of model training error and validation error
[Plot: Model Error during training, with Training Error and Validation Error curves; marked regions: Initialization, Underfitting, Optimal fit, Overfitting]

Memory Based Reasoning (Nearest Neighbors)
[Plot: scatter with axes Y, X1, X2; label: Neighbors]

Model Assessment and Selection - Lift Charts
Test Data
Actual Target   Predicted Target (Output)   Decision
0               .3                          0
1               .9                          1
1               .8                          1
0               .6                          1

Model Assessment and Selection - ROC Curves

5. $ Model Deployment / "Scoring" $
It is definitely not (just) about building the models.
Scoring and Score Code
Monitoring
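As a small illustration of the assessment step, the four test observations from the lift-chart slide (actual targets 0, 1, 1, 0 with predicted outputs .3, .9, .8, .6) can be turned into decisions, a cumulative lift value, and ROC points. This is a minimal sketch: the 0.5 cutoff is an assumption chosen because it reproduces the slide's Decision values, and the function names are invented for this example rather than taken from Enterprise Miner.

```python
# Test data as given on the lift-chart slide
actual = [0, 1, 1, 0]
predicted = [0.3, 0.9, 0.8, 0.6]

CUTOFF = 0.5  # assumption: not stated on the slide, but consistent with its Decision values

# Turn each predicted probability into a 0/1 decision
decision = [1 if p >= CUTOFF else 0 for p in predicted]


def confusion(actual, decision):
    """Return (true pos, false pos, true neg, false neg) counts."""
    pairs = list(zip(actual, decision))
    tp = sum(1 for a, d in pairs if a == 1 and d == 1)
    fp = sum(1 for a, d in pairs if a == 0 and d == 1)
    tn = sum(1 for a, d in pairs if a == 0 and d == 0)
    fn = sum(1 for a, d in pairs if a == 1 and d == 0)
    return tp, fp, tn, fn


def cumulative_lift(actual, predicted, fraction):
    """Response rate in the top `fraction` of cases ranked by score, over the overall rate."""
    ranked = sorted(zip(predicted, actual), reverse=True)
    n_top = max(1, round(fraction * len(ranked)))
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


def roc_points(actual, predicted):
    """(false positive rate, true positive rate) at each distinct score cutoff."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(predicted), reverse=True):
        tp = sum(1 for a, p in zip(actual, predicted) if p >= cut and a == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if p >= cut and a == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


print("decisions:", decision)
print("TP, FP, TN, FN:", confusion(actual, decision))
print("lift in top 50%:", cumulative_lift(actual, predicted, 0.5))
print("ROC points:", roc_points(actual, predicted))
print("AUC:", auc(roc_points(actual, predicted)))
```

On these four rows the model ranks both actual 1s above both actual 0s, so the top half of the ranked list has twice the overall response rate and the ROC curve passes through the top-left corner. The single false positive (.6) comes from the choice of cutoff, not from the ranking.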
Batch Score Delivery to Offline Applications
• ETL for model development and scoring
• Scores generated on nightly basis
• ID and Score data pre-loaded into data store
• Score requests contain ID
• Decision server translates score to action
[Diagram labels: ETL engine; Scheduled Scoring; ETL process; Data Mining; SAS Scoring; RDB Scoring; C code; PMML engine; Data Store; Scores; Model Development; BI Application; Campaign Planning; Operations; Campaign Execution]

Thanks!