Saskatoon SAS User Group
Efficiency and data mining?
Copyright © SAS Institute Inc. All rights reserved.

Agenda
• Background
• Case Study

Predictive analytics… Data science… Statistics… Machine learning… Data mining
It means different things to different people.
• Uses a variety of tools: "Show me the power"
• Heavy Excel user: "Show me the easy button"
• Tries to avoid the next migraine
How do we manage this? Consistent answers.

So what?
CRISP-DM: The Data Mining Process Methodology
• CRISP-DM is a good methodology.
• SEMMA is a process in Enterprise Miner. It aligns well with CRISP-DM.
• This process is your friend. Use it. Iterate. Fail fast.
SEMMA process: Sample, Explore, Modify, Model, Assess, Deploy

Building a predictive model: 3 approaches
Rapid Predictive Modeler (RPM)
• Preconfigured Enterprise Miner workflow in Enterprise Guide
• Easy
• Quick
• Good models
• Auditable and reusable
Enterprise Miner
• Visual workflows
• Powerful
• Medium difficulty
• Great models
• Auditable and reusable
Programming
• Difficult to learn
• Some data scientists prefer this
• Not suitable for the business analyst

The Data Mining Process: How to add efficiency
1. Use visualization early in the process
2. Don't be afraid to build models; start with RPM
3. Fail fast
• Understand the problem
• Understand the data

Case Study
The Data Mining Process
We have a problem!
Use actionable, in-memory, big data, cloud, machine-learning analytics to fix it!
You mean: use predictive modeling to find the trucks that are going to blow up?
Last time it was altitude related.

• 40,000 vehicles; the fleet is ageing
• Trucks are equipped with telematics
• The data scientist is on vacation
• Dataset = 1.5 GB (2M rows)!!! My spreadsheet won't open it…

Case study: what I am going to show you
Use visualization early in the process to formulate a strategy.
Demo 1
• Visual exploration of the timeline
• Cluster analysis

Case study: what I am going to show you
Don't be afraid to model (Rapid Predictive Modeler, Enterprise Miner).
Demo 2
• Feature engineering
• 2-minute model
• Enterprise model

Case study: what I am going to show you
This is how we derive value from the model.
Demo 3
• Create score code
• Geospatial representation of scored data

Sample & Explore Data
• Missing data is a landmine. Identify and remediate.
• Visualize: reconstruct a timeline.
• Explore before subsetting or filtering.
Demo 1
• Visual exploration of the timeline
• Cluster analysis

Sample & Explore Data: Cluster Analysis in Visual Analytics
Now that I understand the data, I have a plan:
• Sample only alternator faults.
• Focus on recent data. Using all the history may pollute my model.
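The "identify and remediate" advice on the Sample & Explore slide can be sketched in a few lines of Python. This is a minimal illustration, not the SAS workflow used in the demo: it assumes each telematics record is a dict holding None for a missing reading, the field names (oil_temp, water_temp) are hypothetical, and median imputation plus a missing-value indicator is only one common remediation.

```python
from statistics import median

def missing_report(rows, fields):
    """Count missing (None) readings per field."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}

def impute_median(rows, field):
    """Return copies of rows with missing `field` values replaced by the observed
    median, plus a boolean `<field>_missing` indicator column.
    Raises statistics.StatisticsError if every value is missing."""
    fill = median(r[field] for r in rows if r.get(field) is not None)
    out = []
    for r in rows:
        copy = dict(r)  # leave the original record untouched
        is_missing = r.get(field) is None
        copy[field + "_missing"] = is_missing  # keep "missingness" as a model input
        if is_missing:
            copy[field] = fill
        out.append(copy)
    return out

# Hypothetical sample records
rows = [
    {"truck_id": 1, "oil_temp": 95.0, "water_temp": 88.0},
    {"truck_id": 2, "oil_temp": None, "water_temp": 90.0},
    {"truck_id": 3, "oil_temp": 101.0, "water_temp": None},
]
print(missing_report(rows, ["oil_temp", "water_temp"]))  # {'oil_temp': 1, 'water_temp': 1}
print(impute_median(rows, "oil_temp")[1])
# {'truck_id': 2, 'oil_temp': 98.0, 'water_temp': 90.0, 'oil_temp_missing': True}
```

Keeping the indicator column matters: if readings go missing for a reason (a failing sensor, say), the model can still learn from that pattern after the gap is filled.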
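The sampling plan ("sample only alternator faults", "focus on recent data") comes down to a filter on fault type and event date. The sketch below is one reading of that plan. It assumes hypothetical fields fault_component and event_date and an arbitrary 365-day window, because the deck does not say how "recent" was defined.

```python
from datetime import date, timedelta

def sample_recent_alternator(rows, as_of, window_days=365):
    """Keep alternator fault records whose event_date falls within the
    `window_days` before `as_of` (inclusive). Field names are hypothetical."""
    cutoff = as_of - timedelta(days=window_days)
    return [
        r for r in rows
        if r["fault_component"] == "alternator" and cutoff <= r["event_date"] <= as_of
    ]

# Hypothetical fault-code events
events = [
    {"truck_id": 1, "fault_component": "alternator", "event_date": date(2016, 5, 2)},
    {"truck_id": 2, "fault_component": "starter", "event_date": date(2016, 5, 3)},
    {"truck_id": 3, "fault_component": "alternator", "event_date": date(2013, 1, 9)},
]
recent = sample_recent_alternator(events, as_of=date(2016, 6, 1))
print([r["truck_id"] for r in recent])  # [1]
```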
Modify, Model, Assess
• Use Rapid Predictive Modeler to fail fast.
• Look at the variable importance chart.
• Engineer features into the data.
• Mitigate the risk of overfitting (holdouts, model selection criteria).
Demo 2
• Feature engineering
• RPM Advanced
• EM model

Modify Data: Engineered Features
• Binning into deciles
• Binning into quartiles
Variables: altitude, speed, engine hours, RPM, years in service, water temp * oil temp, odometer mileage, days since service origin, oil temp, water temp
Computed variables
• RPM
• Days since service origin
• Water temp * oil temp

We improve the model by iterating
Step | Misclassification rate (%) | Improvement over previous step (%) | Champion model
Just do it: model on full dataset | 10.30 | n/a | Logistic regression
RPM: regression on segmented data | 8.56 | 16.89 |
RPM: Intermediate | 8.02 | 6.31 | Decision tree 2
RPM: Advanced | 7.27 | 9.35 | Decision tree 3
Add feature-engineered variables | 6.94 | 4.54 | Decision tree 3
Use Enterprise Miner | 6.46 | 6.92 | Ensemble (neural network and decision tree)

[Screenshot: Logistic regression (segmented dataset; sampled), pre-release version of SAS Visual Data Mining and Machine Learning]

Deploy
• How will the model output be used by someone who knows nothing about data science?
• Score code is useful. A model is not.
• Visualize the output.
Demo 3
• Create score code
• Geospatial representation of scored data
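The Engineered Features slide combines quantile binning (deciles, quartiles) with computed variables such as water temp * oil temp and days since service origin. Below is a stdlib-only sketch of both ideas. The record fields (water_temp, oil_temp, event_date, service_origin) are hypothetical, and statistics.quantiles (Python 3.8+) stands in for whatever binning method the SAS tools applied.

```python
from bisect import bisect_right
from datetime import date
from statistics import quantiles

def quantile_bins(values, n):
    """Map each value to a bin 1..n using empirical quantile cut points
    (n=10 for deciles, n=4 for quartiles)."""
    cuts = quantiles(values, n=n)  # n - 1 cut points
    return [bisect_right(cuts, v) + 1 for v in values]

def add_computed(row):
    """Return a copy of a record with two computed variables added."""
    out = dict(row)
    out["water_x_oil_temp"] = row["water_temp"] * row["oil_temp"]  # interaction term
    out["days_since_service_origin"] = (row["event_date"] - row["service_origin"]).days
    return out

print(quantile_bins([3, 8, 1, 9, 5, 7, 2, 6], n=4))  # [2, 4, 1, 4, 2, 3, 1, 3]

rec = add_computed({"water_temp": 90.0, "oil_temp": 100.0,
                    "event_date": date(2016, 3, 1), "service_origin": date(2016, 1, 1)})
print(rec["water_x_oil_temp"], rec["days_since_service_origin"])  # 9000.0 60
```

Binning trades precision for robustness: a decile index is insensitive to outliers and lets a tree or regression treat a sensor reading as ordered categories.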
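Holdouts and model selection criteria guard against overfitting because each candidate model is judged on rows it never saw during training. This minimal sketch assumes a random 70/30 split with a fixed seed and uses misclassification rate (the measure in the results table) as the selection criterion. The two candidate "models" are hypothetical stand-ins for fitted models.

```python
import random

def holdout_split(rows, holdout_frac=0.3, seed=42):
    """Shuffle a copy of rows reproducibly and return (train, holdout)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_hold = round(len(shuffled) * holdout_frac)
    return shuffled[n_hold:], shuffled[:n_hold]

def misclassification_rate(y_true, y_pred):
    """Fraction of predictions that disagree with the actual labels."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def pick_champion(candidates, x_holdout, y_holdout):
    """candidates maps a name to a predict(x) function; the champion is the
    one with the lowest misclassification rate on the holdout rows."""
    scores = {name: misclassification_rate(y_holdout, [predict(x) for x in x_holdout])
              for name, predict in candidates.items()}
    return min(scores, key=scores.get), scores

train, holdout = holdout_split(list(range(100)))
print(len(train), len(holdout))  # 70 30

# Hypothetical stand-ins for models fitted on `train`
candidates = {
    "always_no_fault": lambda x: 0,
    "threshold_0.5": lambda x: 1 if x > 0.5 else 0,
}
print(pick_champion(candidates, [0.1, 0.7, 0.9, 0.2], [0, 1, 1, 0]))
# ('threshold_0.5', {'always_no_fault': 0.5, 'threshold_0.5': 0.0})
```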
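In the iteration table, the % Improvement column is the relative drop in misclassification rate compared with the previous step: (previous − current) / previous × 100. Recomputing it from the table's own rates reproduces every listed value. The same formula gives the overall gain from the first step to the last, about 37%.

```python
# Misclassification rates (%) from the results table, in iteration order.
STEPS = [
    ("Just do it: model on full dataset", 10.30),
    ("RPM: regression on segmented data", 8.56),
    ("RPM: Intermediate", 8.02),
    ("RPM: Advanced", 7.27),
    ("Add feature-engineered variables", 6.94),
    ("Use Enterprise Miner", 6.46),
]

def relative_improvement(previous, current):
    """Percentage reduction in misclassification rate, relative to `previous`."""
    return 100 * (previous - current) / previous

def step_improvements(steps):
    """Improvement of each step over the one before it, rounded to 2 decimals."""
    return [
        (name, round(relative_improvement(prev, cur), 2))
        for (_, prev), (name, cur) in zip(steps, steps[1:])
    ]

for name, pct in step_improvements(STEPS):
    print(f"{name}: {pct}")
print("Overall:", round(relative_improvement(STEPS[0][1], STEPS[-1][1]), 2))  # Overall: 37.28
```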
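"Score code is useful. A model is not." Score code is the fitted model exported as standalone code. Downstream systems can run it on new rows without the modelling tool; in Enterprise Miner it is generated SAS code. The snippet below is an illustration only, showing what score code for a logistic regression boils down to. The intercept, coefficients, and input names are hypothetical, and the case study's champion (an ensemble) would export different logic.

```python
import math

# Hypothetical parameters, for illustration only (not from the case study).
INTERCEPT = -6.0
COEFFICIENTS = {
    "days_since_service_origin": 0.002,
    "water_x_oil_temp": 0.0003,
}

def score(row):
    """Return the predicted probability of an alternator fault for one record."""
    z = INTERCEPT + sum(w * row[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

p = score({"truck_id": 17, "days_since_service_origin": 1000, "water_x_oil_temp": 9000.0})
print(round(p, 3))  # 0.214
```

Because the function only needs the input fields, it can run inside a dashboard, a database job, or a maintenance system. That is the point of the slide.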
Deploy
Out of a truck fleet of 2,000+:
• 72 have fault codes on alternators
• 12 are prioritized for maintenance based on the prediction
This is where they are.

The Data Mining Process: How to add efficiency
1. Use visualization early in the process
2. Don't be afraid to build models; it is easy. Start with RPM.
3. Fail fast

Ideas? Questions?
sas.com
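The deployment result (72 trucks with alternator fault codes, 12 prioritized for maintenance) suggests ranking flagged trucks by their model score. This minimal sketch assumes the 12 are chosen from the 72 flagged trucks by highest predicted probability. The deck does not say whether a top-k cut or a probability threshold was used, so both are exposed as parameters, and the field names are hypothetical.

```python
def prioritize(trucks, k=12, min_score=0.0):
    """Return IDs of up to k trucks with an alternator fault code,
    highest predicted probability first, ignoring scores below min_score."""
    flagged = [t for t in trucks if t["alternator_fault_code"] and t["score"] >= min_score]
    ranked = sorted(flagged, key=lambda t: t["score"], reverse=True)
    return [t["truck_id"] for t in ranked[:k]]

# Hypothetical scored records
fleet = [
    {"truck_id": "T1", "alternator_fault_code": True, "score": 0.81},
    {"truck_id": "T2", "alternator_fault_code": False, "score": 0.95},
    {"truck_id": "T3", "alternator_fault_code": True, "score": 0.40},
    {"truck_id": "T4", "alternator_fault_code": True, "score": 0.66},
]
print(prioritize(fleet, k=2))  # ['T1', 'T4']
```

T2 has the highest score but no alternator fault code, so under this assumption it is not on the maintenance list.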