MACHINE LEARNING IN REAL LIFE
PATRICK HALL, SAS INSTITUTE
Copyright © 2014, SAS Institute Inc. All rights reserved.
MACHINE LEARNING: AGENDA
• Machine learning background
• Machine learning in real life
• Supervised learning
• Unsupervised learning: clustering example
• Closing
MACHINE LEARNING BACKGROUND
MACHINE LEARNING BACKGROUND: WHAT IS MACHINE LEARNING?
• Wikipedia: Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
• SAS: Machine learning is a branch of artificial intelligence that automates the building of systems that learn from data, identify patterns, and predict future results, with minimal human intervention.
MACHINE LEARNING BACKGROUND: VOCABULARY

Machine Learning Term      Multidisciplinary Synonyms
Feature, input             Independent variable, variable, column
Case, instance, example    Observation, record, row
Label                      Dependent variable, target
Train                      Fit
Class                      Categorical target variable level
MACHINE LEARNING BACKGROUND: WHY WOULD YOU USE MACHINE LEARNING?
• Machine learning is often used in situations where the predictive accuracy of a model is more important than the interpretability of a model.
• Common applications of machine learning include:
  • Pattern recognition
  • Anomaly detection
  • Medical diagnosis
  • Document classification
• Machine learning shares many approaches with statistical modeling, data mining, data science, and other related fields.
MACHINE LEARNING BACKGROUND: MULTIDISCIPLINARY NATURE OF BIG DATA ANALYSIS
[Venn diagram: machine learning sits among overlapping fields, including statistics, data science, pattern recognition, data mining, databases, information retrieval, computational neuroscience, and AI.]
MACHINE LEARNING AND DATA MINING: LEARNING PARADIGMS

SUPERVISED LEARNING (know y)
• Regression: LASSO regression, logistic regression, ridge regression
• Decision tree, gradient boosting, random forests
• Neural networks, SVM, naïve Bayes, neighbors, Gaussian processes

UNSUPERVISED LEARNING (don't know y)
• A priori rules
• Clustering: k-means clustering, mean shift clustering, spectral clustering
• Kernel density estimation
• Nonnegative matrix factorization, PCA, kernel PCA, sparse PCA, singular value decomposition, SOM

SEMI-SUPERVISED LEARNING (sometimes know y)
• Prediction and classification*, clustering*
• EM, TSVM, manifold regularization
• Multilayer perceptron, restricted Boltzmann machines, autoencoders

REINFORCEMENT LEARNING

Related paradigms: transduction, developmental learning

*In semi-supervised learning, supervised prediction and classification algorithms are often combined with clustering.
MACHINE LEARNING IN REAL LIFE
MACHINE LEARNING IRL: SERIOUS CHALLENGES
• Buy-in from decision makers
• Big data
• Dirty data
• Deployment
MACHINE LEARNING IRL: BUY-IN FROM DECISION MAKERS
• No equations or statistics
• Tell visual stories
• Value must be proven
• Autopilot
MACHINE LEARNING IRL: BIG DATA
• Distributed storage platform
  • Hadoop Distributed File System (HDFS)
  • Massively parallel (MPP) databases
• Distributed analytics platform
  • Hadoop MapReduce, disk-enabled
  • SAS® High-Performance Analytics or SAS® LASR Analytic Server, in-memory
  • Spark MLlib, in-memory
  • MPI-based distributed computing
[Diagram: the data scientist's client software connects to data and software spread across multiple servers.]
HADOOP
[Architecture diagram, commodity server cluster:
• HDFS layer: a name node holds centralized metadata; data nodes 1 through N hold the distributed data.
• MapReduce layer: a job tracker handles scheduling and management and sends MapReduce instructions to task trackers 1 through N, which run map and reduce steps against the local data, exchange intermediate results, and write the final results to a location in HDFS.]
MACHINE LEARNING IRL: BIG DATA: HADOOP
• Bulk ETL
• Batch processing
• Online transactions (Impala)
• Advanced analytics
MapReduce is a difficult framework for iterative algorithms.
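To see why, consider one iteration of k-means written as a map step and a reduce step. This is a minimal plain-R sketch (not actual Hadoop code, and not part of the original deck): each full pass over the data corresponds to one MapReduce job, so an algorithm that needs many iterations must re-read the data from disk again and again.

# One k-means iteration expressed as a map step and a reduce step (plain R sketch)
set.seed(42)
x <- matrix(rnorm(200), ncol = 2)            # toy data: 100 rows, 2 features
centroids <- x[sample(nrow(x), 3), ]         # 3 initial cluster centers

# "Map": assign each row to its nearest centroid
assignments <- apply(x, 1, function(row) {
  which.min(colSums((t(centroids) - row)^2))
})

# "Reduce": recompute each centroid as the mean of its assigned rows
centroids <- t(sapply(1:3, function(k) {
  colMeans(x[assignments == k, , drop = FALSE])
}))

# A full k-means run repeats these two steps until the assignments stop changing.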
MACHINE LEARNING IRL: BIG DATA: IN-MEMORY
[Diagram, commodity server cluster: each node keeps its portion of the data in memory and works on it with multi-threaded processing.]
MACHINE LEARNING IRL: DIRTY DATA
• Missing values
  • MCBS
• High cardinality
[Chart: frequency of the character variable Blind_submodel, an example of a high-cardinality feature.]
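As a small illustration (generic R with made-up column names, not the presentation's own data preparation), two common dirty-data fixes:

# Minimal sketch: handling missing values and high-cardinality character columns
df <- data.frame(
  age      = c(34, NA, 51, 28, NA),
  submodel = c("A.1", "B.7", "A.1", "C.2", "D.9"),
  stringsAsFactors = FALSE
)

# Missing values: simple median imputation plus a missingness indicator
df$age_missing <- is.na(df$age)
df$age[is.na(df$age)] <- median(df$age, na.rm = TRUE)

# High cardinality: collapse rare levels into an "OTHER" category
counts <- table(df$submodel)
df$submodel[df$submodel %in% names(counts[counts < 2])] <- "OTHER"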
MACHINE LEARNING IRL: DEPLOYMENT
• "Scoring" is the process of making predictions on new data
• Be aware of barriers to production implementation of scoring procedures
• In-database scoring is the current industry standard
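A minimal sketch of the idea of scoring (generic R, standing in for production score code): fit once on training data, then apply the fitted model to new records as they arrive.

# Fit a model, then "score" unseen records
train <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
train$y <- ifelse(train$x1 + train$x2 + rnorm(100) > 0, 1, 0)

fit <- glm(y ~ x1 + x2, data = train, family = binomial)   # champion model

new_data <- data.frame(x1 = rnorm(5), x2 = rnorm(5))        # unseen records
scores <- predict(fit, newdata = new_data, type = "response")
scores   # predicted probabilities used for automated decisions downstream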
MACHINE LEARNING IRL: DEPLOYMENT
• Monitor performance over time
• Try many different champion-challenger modeling scenarios
• The ultimate goal of machine learning is automated decision making
SUPERVISED LEARNING
SUPERVISED LEARNING: SACRIFICE INTERPRETABILITY FOR ACCURACY
[Figure: traditional regression, decision tree, and machine learning fits to hill-and-plateau sample data.]
DOCUMENT CLASSIFICATION: TRAINING A CLASSIFIER IN SAS® ENTERPRISE MINER™
[Screenshot: SAS Enterprise Miner process flow used to train the classifier.]
DOCUMENT CLASSIFICATION: EVALUATING A CLASSIFIER

Multiclass logarithmic loss: $-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{i,j}\log(p_{i,j})$

Random Forest                  200 Highest Document Count   200 SVDs   200 Highest MIC
Logarithmic loss               1.22                         1.55       1.82
OOB misclassification          0.48                         0.51       0.66

Neural Network                 200 Highest Document Count   200 SVDs   200 Highest MIC
Logarithmic loss               0.08                         1.45       0.20
Validation misclassification   0.12                         0.57       0.15
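As a quick, generic illustration of the metric (a sketch, not the deck's own evaluation code), the multiclass logarithmic loss can be computed directly from a one-hot label matrix and a matrix of predicted class probabilities:

# Multiclass logarithmic loss for N cases and M classes.
# y is an N x M one-hot matrix of true labels, p an N x M matrix of predicted probabilities.
multiclass_logloss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)   # clip to avoid log(0)
  -sum(y * log(p)) / nrow(y)
}

# Toy example: 3 cases, 3 classes
y <- diag(3)                                   # true classes 1, 2, 3
p <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.8, 0.1,
              0.2, 0.3, 0.5), nrow = 3, byrow = TRUE)
multiclass_logloss(y, p)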
SUPERVISED LEARNING: LAW OF DIMINISHING RETURNS
[Chart: validation ROC index (y-axis, roughly 0.55 to 0.95) after each modeling step: benchmark, past claims, transform to interval, parameter tuning, ensemble.]
SUPERVISED LEARNING: MODELING APPROACH
1. Establish a benchmark
2. Take small, measurable steps from the benchmark
   • Ensemble models for noisy data
   • Deep learning for pattern recognition
3. Be vigilant against overfitting
   • Cross-validation
   • Regularization
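A minimal sketch of k-fold cross-validation in R (generic code, not tied to any particular SAS procedure), used here as a guard against overfitting when comparing modeling steps:

# 5-fold cross-validation of a logistic regression
set.seed(123)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- ifelse(dat$x1 - dat$x2 + rnorm(200) > 0, 1, 0)

folds <- sample(rep(1:5, length.out = nrow(dat)))
cv_error <- sapply(1:5, function(k) {
  fit  <- glm(y ~ x1 + x2, data = dat[folds != k, ], family = binomial)
  prob <- predict(fit, newdata = dat[folds == k, ], type = "response")
  mean((prob > 0.5) != dat$y[folds == k])      # held-out misclassification
})
mean(cv_error)   # average validation error across folds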
DECISION TREE
[Image. Source: https://flic.kr/p/a4Lo4Q]

RANDOM FOREST
[Image]
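A minimal sketch of the idea in R, using the CRAN randomForest package rather than the SAS tools shown in the deck; printing the fitted object reports the out-of-bag (OOB) misclassification estimate referenced in the evaluation table above.

# Fit a random forest and inspect its OOB error estimate
library(randomForest)
data(iris)
set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)
rf   # prints the OOB error rate and confusion matrix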
DEEP LEARNING
[Diagrams: an MLP neural network (inputs x, hidden units h, output y), an autoencoder neural network (inputs reconstructed as outputs through a hidden layer), and a deeper network assembled from these pieces.]
Many separate, unsupervised, single hidden-layer networks are used to initialize a larger supervised network in a layerwise fashion. The weights from layerwise pre-training can be used as an initialization for training the entire deep network.
UNSUPERVISED LEARNING
UNSUPERVISED LEARNING: FEATURE EXTRACTION
[Diagram: feature extraction transforms the original data into a smaller set of reduced features.]
UNSUPERVISED LEARNING: FEATURE EXTRACTION METHODS

Random projections
• Fastest
• Runs directly on the data matrix
• Creates linear, oblique features
• No one feature is more important than the others
• Not interpretable

Principal component analysis
• Faster
• Requires a covariance or correlation matrix
• Creates linear, orthogonal features
• Limited interpretability

Singular value decomposition
• Faster
• Runs directly on the data matrix
• Creates linear, orthogonal features
• Sparse implementations available
• Limited interpretability
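As a small, generic illustration of extracting orthogonal features (base-R prcomp, standing in for the methods above):

# PCA-based feature extraction on a toy data set
data(iris)
x <- scale(iris[, 1:4])          # standardize the numeric inputs
pca <- prcomp(x)                 # principal component analysis
reduced <- pca$x[, 1:2]          # keep the first two principal components
summary(pca)                     # proportion of variance each component explains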
UNSUPERVISED LEARNING: FEATURE EXTRACTION METHODS

Factorization machines
• Faster
• Suitable for large, sparse data sources
• Creates linear, oblique features
• Accounts for variable interactions
• Limited interpretability

Nonnegative matrix factorization
• Slower
• Suitable for large, sparse, and nonnegative data sources
• Creates linear, oblique features
• Higher interpretability

Autoencoder networks
• Slower
• Runs directly on the data matrix
• Creates highly representative, nonlinear features
• Not interpretable
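A minimal sketch of nonnegative matrix factorization using Lee and Seung's multiplicative updates (plain R written for this illustration, not the SAS implementation):

# Factor a nonnegative matrix V into W %*% H with r nonnegative features
set.seed(7)
V <- matrix(runif(100 * 10), nrow = 100)   # nonnegative data: 100 rows x 10 columns
r <- 3                                     # number of extracted features
W <- matrix(runif(100 * r), nrow = 100)
H <- matrix(runif(r * 10), nrow = r)
for (it in 1:200) {
  H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + 1e-9)
  W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + 1e-9)
}
reduced <- W   # 100 x 3 matrix of nonnegative, oblique features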
UNSUPERVISED LEARNING: CLUSTERING
[Diagram: clustering takes the original data and groups similar observations together.]
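A minimal k-means illustration in R (generic code, not the SAS procedures used later in the deck):

# Group similar observations and compare the clusters to known labels
data(iris)
set.seed(1)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)
table(km$cluster, iris$Species)   # how the found clusters line up with species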
UNSUPERVISED LEARNING EXAMPLE
ESTIMATING k
How many market segments?
Use three similar methods to compare a clustering solution in the training data (Wk) to a clustering solution in a reference distribution (Wk*). The methods differ in the complexity of the reference distribution they use:
• Aligned box criterion (ABC)
• Gap statistic
• Cubic clustering criterion (CCC)
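For reference, the gap statistic makes this comparison explicit (standard definition from Tibshirani, Walther, and Hastie; the slide itself only names the quantities Wk and Wk*):

$$\mathrm{Gap}(k) = E^{*}\!\left[\log W_k^{*}\right] - \log W_k$$

where $W_k$ is the pooled within-cluster dispersion of a $k$-cluster solution in the training data and the expectation is taken over clusterings of data drawn from the reference distribution. The estimated number of clusters is then the smallest $k$ whose gap is within one standard error of the gap at $k+1$.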
ESTIMATING k: REFERENCE DISTRIBUTIONS
[Scatter plots: the sample data and a reference distribution generated over the same range.]
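A minimal sketch of the simplest kind of reference distribution (uniform over the range of each observed feature; generic R, not the aligned box criterion's own reference scheme):

# Generate a uniform reference distribution matching the range of each feature
data(iris)
x <- as.matrix(iris[, 1:4])
reference <- apply(x, 2, function(col) runif(nrow(x), min(col), max(col)))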
ESTIMATING k: REFERENCE DISTRIBUTIONS
Aligned Box Criterion
[Animation frames: the aligned box criterion repeatedly compares the clustering solution in the sample data with clustering solutions in reference distributions for successive values of k.]
ESTIMATING k: REFERENCE DISTRIBUTIONS
[Plots comparing the gap statistic and the aligned box criterion; the annotation reads "clearer maxima."]
ESTIMATING k: CLAIMS PREDICTION CHALLENGE DATA
• Anonymized customer data
• 32 customer and product features
• 13,184,290 customer records
ESTIMATING k: EXECUTING CALCULATIONS
Calculate the …
• CCC using PROC FASTCLUS.
• gap statistic using the R cluster package in the Open Source Integration node in SAS Enterprise Miner.
• ABC using PROC HPCLUS.
ESTIMATING k: EXECUTING CALCULATIONS
R code in SAS Enterprise Miner:

# Load cluster package
library('cluster')

# Set random seed
set.seed(12345)

# Time and execute the gap statistic using k-means clustering.
# &EMR_IMPORT_DATA is a macro reference that the Open Source Integration
# node resolves to the imported data frame before the R code runs.
system.time(
  gskmn <- clusGap(&EMR_IMPORT_DATA, FUN = kmeans, K.max = 20, B = 10)
)

# Output results
gskmn
ESTIMATING k: INTERPRETING RESULTS
[Charts: the cubic clustering criterion, the gap statistic, and the aligned box criterion plotted against the number of clusters for the claims data.]
ESTIMATING k: TIMINGS
[Table: run times for the three methods.]
CLOSING
SAS Data Mining Community
Communities.sas.com/data-mining
CLOSING
Use the scientific method!
http://www.sas.com/en_us/insights/articles/analytics/keeping-the-science-in-datascience.html

Where you can find me:
• SAS Data Mining Community: https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining
• Quora: www.quora.com
• GitHub: github.com/jphall663, github.com/sassoftware