Saskatoon SAS user group
Efficiency and data mining?
Copyright © SAS Institute Inc. All rights reserved.
Agenda
• Background
• Case Study
Predictive Analytics…Data science…Statistics…Machine Learning…Data mining
It means different things to different people?
Uses a variety of tools
Show me the power
Heavy Excel user
Show me the easy button
Tries to avoid next migraine
How do we manage this?
Consistent answers
So what?
The Data Mining Process
Methodology
CRISP-DM
CRISP-DM is a good methodology
SEMMA is a process in Enterprise Miner. It aligns well with CRISP-DM
This process is your friend. Use it.
Iterate. Fail fast.
SEMMA Process
Sample → Explore → Modify → Model → Assess → Deploy
Building a predictive model
3 Approaches
Rapid Predictive Modeler (RPM)
• Preconfigured Enterprise Miner workflow in Enterprise Guide
• Easy
• Quick
• Good models
• Auditable and reusable
Enterprise Miner
• Visual workflows
• Powerful
• Medium difficulty
• Great models
• Auditable and reusable
Programming
• Difficult to learn
• Some Data Scientists prefer this
• Not suitable for the business analyst
The Data Mining Process
How to add efficiency
1. Use visualization early in the process
2. Don’t be afraid to build models, start with RPM
3. Fail fast
• Understand the problem
• Understand the data
The Data Mining Process
Case study
We have a problem!
Use actionable, in-memory, big data, cloud, machine-learning analytics to fix it
You mean use predictive modeling to find the trucks that are going to blow up
Last time it was altitude related
• 40 000 vehicles – Fleet is ageing
• Trucks are equipped with Telematics
• The data scientist is on vacation
• Dataset = 1.5 GB (2M rows)! My spreadsheet won’t open it…
Case study
What I am going to show you
Use visualization early in the process to formulate a strategy
Sample → Explore → Modify → Model → Assess → Deploy
Demo 1
• Visual exploration of timeline
• Cluster analysis
Case study
What I am going to show you
Don’t be afraid to model
Sample → Explore → Modify → Model → Assess → Deploy
Rapid Predictive Modeler
Enterprise Miner
Demo 2
- Feature engineering
- 2 Minute model
- Enterprise Model
Case study
What I am going to show you
This is how we derive value from the model
Sample → Explore → Modify → Model → Assess → Deploy
Demo 3
- Create score-code
- Geo spatial representation of scored data
Sample & Explore Data
Sample → Explore → Modify → Model → Assess → Deploy
• Missing data is a landmine. Identify and remediate.
• Visualize: reconstruct a timeline
• Explore before subsetting or filtering
Demo 1
• Visual exploration of timeline
• Cluster Analysis
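The "identify" half of the missing-data advice can be sketched in a few lines. This is a minimal illustration only, not the SAS workflow shown in the demo: the column names, the sample rows, and the tokens treated as missing are all assumptions made up for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical telematics extract; column names and values are illustrative.
SAMPLE_CSV = """truck_id,altitude,oil_temp,water_temp
T1,1200,95,88
T2,,101,
T3,800,,90
T4,1500,99,91
"""

def count_missing(rows, missing_tokens=("", "NA", ".")):
    """Return {column: number of rows where the value is missing}."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # DictReader yields None when a row has fewer fields than the header.
            if value is None or value.strip() in missing_tokens:
                counts[column] += 1
    return dict(counts)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
print(missing)
```

Columns that never appear in the result have no missing values. How to remediate the rest (drop, impute, or flag) depends on the problem and is left open here.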
Sample & Explore Data
Sample → Explore → Modify → Model → Assess → Deploy
Cluster Analysis in Visual Analytics
Now that I understand the data, I have a plan
Sample only Alternator faults
Focus on recent data.
Using all the history may pollute my model
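The plan of keeping only alternator-related rows from a recent window could look roughly like the sketch below. It streams the file row by row, so it would also cope with an extract too large for a spreadsheet. The column names (`reading_date`, `fault_component`), the fault label, and the 180-day window are hypothetical choices for illustration, not values from the case study.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical extract; column names and fault labels are assumptions.
SAMPLE_CSV = """truck_id,reading_date,fault_component
T1,2016-01-10,alternator
T2,2016-05-02,brakes
T3,2016-05-20,alternator
T4,2014-03-15,alternator
"""

def recent_alternator_rows(lines, as_of, window_days=180):
    """Yield alternator rows dated within window_days before as_of."""
    cutoff = as_of - timedelta(days=window_days)
    for row in csv.DictReader(lines):
        reading_date = date.fromisoformat(row["reading_date"])
        if row["fault_component"] == "alternator" and reading_date >= cutoff:
            yield row

selected = list(
    recent_alternator_rows(io.StringIO(SAMPLE_CSV), as_of=date(2016, 6, 1))
)
print([row["truck_id"] for row in selected])
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object.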
Modify Model Assess
Sample → Explore → Modify → Model → Assess → Deploy
• Use Rapid Predictive Modeler to fail fast
• Look at the variable importance chart
• Engineer features into the data
• Mitigate the risk of overfitting (holdouts, model selection criteria)
Demo 2
- Feature engineering
- RPM Advanced
- EM Model
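One way to picture the holdout idea: set aside part of the data before modeling and measure the misclassification rate only on that part. The sketch below is a generic illustration, not what Rapid Predictive Modeler or Enterprise Miner do internally. The 30% holdout fraction, the toy data, and the threshold rule standing in for a model are all assumptions.

```python
import random

def holdout_split(records, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of records and split it into (train, holdout)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted label differs from the actual one."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    errors = sum(1 for a, p in zip(actual, predicted) if a != p)
    return errors / len(actual)

# Toy data: (feature, label) pairs. The true rule is "x > 50".
records = [(x, int(x > 50)) for x in range(100)]
train, holdout = holdout_split(records)

def toy_model(x, threshold=45):
    """Stand-in for a fitted model: a deliberately imperfect threshold rule."""
    return int(x > threshold)

actual = [label for _, label in holdout]
predicted = [toy_model(x) for x, _ in holdout]
rate = misclassification_rate(actual, predicted)
print(f"holdout size={len(holdout)}, misclassification rate={rate:.2%}")
```

Because the holdout rows play no part in fitting, a model that merely memorizes the training rows gets no credit for it here.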
Modify Data
Sample → Explore → Modify → Model → Assess → Deploy
Engineered Features
• Binning into deciles
• Binning into quartiles
• Altitude
• Speed
• Engine hours
• RPM
• Years in service
• Water temp * Oil temp
• Odometer mileage
• Days since service origin
• Oil temp
• Water temp
Computed variables
• RPM
• Days since service origin
• Water temp * Oil temp
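The engineered features listed above (decile and quartile bins, the water temp × oil temp product, days since service origin) can be sketched in plain Python. The slide does not make clear which variable received which binning, so binning altitude into deciles and speed into quartiles is an assumption here, as are the field names and the generated readings.

```python
from bisect import bisect_right
from datetime import date
from statistics import quantiles

def bin_edges(values, n_bins):
    """Cut points that split values into n_bins roughly equal-sized groups."""
    return quantiles(values, n=n_bins)

def assign_bin(value, edges):
    """Return the 1-based bin number for value, given sorted cut points."""
    return bisect_right(edges, value) + 1

# Hypothetical readings; field names and values are illustrative only.
readings = [
    {
        "truck_id": f"T{i}",
        "altitude": 100 * i,
        "speed": 40 + i,
        "oil_temp": 90 + i % 7,
        "water_temp": 80 + i % 5,
        "service_origin": date(2010, 1, 1),
        "reading_date": date(2016, 1, 1 + i % 28),
    }
    for i in range(1, 41)
]

altitude_edges = bin_edges([r["altitude"] for r in readings], n_bins=10)  # deciles
speed_edges = bin_edges([r["speed"] for r in readings], n_bins=4)         # quartiles

for r in readings:
    r["altitude_decile"] = assign_bin(r["altitude"], altitude_edges)
    r["speed_quartile"] = assign_bin(r["speed"], speed_edges)
    # Interaction term: product of the two temperature readings.
    r["temp_interaction"] = r["water_temp"] * r["oil_temp"]
    # Age of the truck in days at the time of the reading.
    r["days_since_service_origin"] = (r["reading_date"] - r["service_origin"]).days

print(readings[0])
```

In practice the cut points would be computed on the training data only and then reused when scoring new data, so that the bins mean the same thing in both places.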
Modify Model Assess
Sample → Explore → Modify → Model → Assess → Deploy
Step                               | Misclassification rate % | % Improvement | Champion Model
Just do it – Model on full dataset | 10.30                    |               | Logistic regression
RPM - Regression on segmented data | 8.56                     | 16.89         | Logistic regression (segmented dataset; sampled)
RPM - Intermediate                 | 8.02                     | 6.31          | Decision tree 2
RPM - Advanced                     | 7.27                     | 9.35          | Decision tree 3
Add feature engineered variables   | 6.94                     | 4.54          | Decision tree 3
Use Enterprise Miner               | 6.46                     | 6.92          | Ensemble (neural network and decision tree)
We improve the model by iterating
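The "% Improvement" figures are consistent with a relative reduction in misclassification rate against the previous step, (previous − current) / previous × 100. The slide does not state the formula, so that reading is an inference from the numbers. A short check using the rates from the slide:

```python
# Misclassification rates (%) per step, as listed on the slide.
steps = [
    ("Just do it: model on full dataset", 10.30),
    ("RPM: regression on segmented data", 8.56),
    ("RPM: Intermediate", 8.02),
    ("RPM: Advanced", 7.27),
    ("Add feature engineered variables", 6.94),
    ("Use Enterprise Miner", 6.46),
]

def relative_improvement(previous_rate, current_rate):
    """Percentage reduction in misclassification rate versus the previous step."""
    return (previous_rate - current_rate) / previous_rate * 100

improvements = [
    round(relative_improvement(prev, curr), 2)
    for (_, prev), (_, curr) in zip(steps, steps[1:])
]
overall = round(relative_improvement(steps[0][1], steps[-1][1]), 2)

for (name, _), gain in zip(steps[1:], improvements):
    print(f"{name}: {gain}%")
print(f"Overall, first step to last: {overall}%")
```

The same function applied to the first and last rows gives the cumulative gain across all iterations.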
Pre-release version of SAS Visual Data Mining and Machine Learning
Deploy
Sample → Explore → Modify → Model → Assess → Deploy
• How will the model output be used by someone who knows nothing about data science?
• Scorecode is useful. A model is not.
• Visualize the output
Demo 3
- Create score-code
- Geo spatial representation of scored data
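To make "score code" concrete: it is a self-contained piece of logic that turns a new record into a prediction without needing the modeling tool. The sketch below applies a logistic regression by hand. It is not the score code generated in the demo, and the intercept, coefficients, and input names are invented purely for illustration.

```python
import math

# Hypothetical model parameters; NOT taken from the presentation's model.
INTERCEPT = -4.0
COEFFICIENTS = {
    "altitude_decile": 0.25,
    "engine_hours_thousands": 0.10,
    "temp_interaction_scaled": 0.80,
}

def score(record):
    """Return the predicted fault probability for one record (logistic model)."""
    linear = INTERCEPT + sum(
        weight * record[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Example record with made-up values.
truck = {
    "altitude_decile": 9,
    "engine_hours_thousands": 12.0,
    "temp_interaction_scaled": 1.5,
}
print(f"Predicted fault probability: {score(truck):.3f}")
```

Because the function depends only on the record and a few constants, it can run wherever the data lives, which is what makes the output usable by people outside the modeling team.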
Deploy
Sample → Explore → Modify → Model → Assess → Deploy
Out of a truck fleet of 2000+
• 72 have fault codes on alternators
• 12 are prioritized for maintenance based on the prediction
This is where they are
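Turning scored trucks into a maintenance shortlist can be as simple as ranking by predicted probability and taking the top few. The slide does not say how the 12 trucks were selected, so the ranking rule, the field names, and the toy scores below are assumptions.

```python
def prioritize(scored_trucks, capacity):
    """Return the `capacity` trucks with the highest predicted fault probability."""
    ranked = sorted(
        scored_trucks, key=lambda truck: truck["fault_probability"], reverse=True
    )
    return ranked[:capacity]

# Toy scored data; identifiers and probabilities are made up.
scored = [
    {"truck_id": "T1", "fault_probability": 0.12},
    {"truck_id": "T2", "fault_probability": 0.81},
    {"truck_id": "T3", "fault_probability": 0.47},
    {"truck_id": "T4", "fault_probability": 0.66},
]

shortlist = prioritize(scored, capacity=2)
print([truck["truck_id"] for truck in shortlist])
```

If each record also carried a location, the shortlist could be plotted on a map, in the spirit of the geospatial view from Demo 3.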
The Data Mining Process
How to add efficiency
1. Use visualization early in the process
2. Don’t be afraid to build models, it is easy, start with RPM
3. Fail fast
Ideas?
Questions?
sas.com