Cliff Notes on Ecological Niche Modeling
with RandomForest (ensembles)
Falk Huettmann
EWHALE lab
University of Alaska
Fairbanks AK 99775
Email [email protected]
Tel. 907 474 7882
Modeling Ecological Niches
[Figure: Geographic Space (Longitude x Latitude) and Ecological Space (Environmental factor a x Environmental factor b), linked via the Sampling Space and the Model Space => Predictions]
A Super Model
LM, GLM, GAM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
=> Ensembles
A starting point… Linear regression
[Figure: scatterplot of Y against X with a fitted line, 'Mean' and SD]
One formula capturing the data:
Response Variable ~ Predictor1
y = a + bx
Common Ground
A Multiple Regression framework
Response Variable ~ Predictor1 + Predictor2 + Predictor3…
Traditionally, we used 1-5 predictors
But: 1 to 1000s of predictors are possible
'One single algorithm' explains the relationship between response and predictors
The derived relationship can be predicted to other locations with known predictors
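As a minimal sketch of this shared formula interface in R (the data frame 'dat' and its column names are hypothetical, simulated here only for illustration):

  # Hypothetical data: one response and three predictors
  set.seed(1)
  dat <- data.frame(
    count  = rpois(100, 3),        # response variable
    temp   = runif(100, 0, 25),    # predictor 1
    precip = runif(100, 0, 300),   # predictor 2
    elev   = runif(100, 0, 1500)   # predictor 3
  )

  # Multiple regression framework: Response ~ Predictor1 + Predictor2 + Predictor3
  fit_lm  <- lm(count ~ temp + precip + elev, data = dat)
  fit_glm <- glm(count ~ temp + precip + elev, family = poisson, data = dat)
  summary(fit_glm)

The same formula notation scales, in principle, from a handful of predictors to hundreds.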
GLM: linear (~unrealistic); 'Mean' and SD => potentially low r²
vs.
CART etc.: non-linear (driven by data); 'Mean'? SD?
CART, TreeNet & RandomForest (there are many other algorithms!)
Our Free Algorithms …
R-Project (randomForest package): http://rweb.stat.umn.edu/R/library/randomForest/html/00Index.html
TreeNet and RandomForest (Salford Systems; free 30-day trial): http://salford-systems.com/products.php
Fortran, C …
Tree/CART - Family
Classification & Regression Tree (CART)
=> Binary recursive partitioning
[Figure: example tree splitting on Temp > 15, then Precip < 100, then Temp < 5, ending in YES/NO terminal nodes]
Binary splits: a widely used concept (multiple splits exist, but are rarely used)
Free of data assumptions! No significances.
Binary split recursive partitioning (the same predictor can re-occur elsewhere as a 'splitter')
Maximizes node homogeneity of variance
Stopping rules for the number of branches, based on optimization/cross-validation
Terminal nodes show means (Regression Tree) or categories (Classification Tree)
[Figure: example Classification Tree with terminal categories A, B, C and example Regression Tree with terminal means 0.3, 0.1, 2.3]
Leo Breiman 1984, and others
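A minimal sketch of a single classification tree in R with rpart ('rpart in R', as referenced below); the data frame and the decision rule used to simulate it are hypothetical, loosely mirroring the Temp/Precip example above:

  library(rpart)

  # Hypothetical presence/absence data with two environmental predictors
  set.seed(1)
  dat <- data.frame(temp = runif(200, 0, 25), precip = runif(200, 0, 300))
  dat$presence <- factor(ifelse(dat$temp > 15 & dat$precip < 100, "YES", "NO"))

  # Binary recursive partitioning; cross-validation (xval) informs the stopping rule
  tree <- rpart(presence ~ temp + precip, data = dat, method = "class",
                control = rpart.control(cp = 0.01, xval = 10))
  printcp(tree)            # cross-validated error by tree size
  plot(tree); text(tree)   # draw the tree with its splitters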
CART Salford (rpart in R)
Nice to interpret (e.g. for small trees, or when following specific decision rules through to the end)
ROC curves for accuracy tests, from withheld test data (ROC optimum)
Importance Values:
  DEM       100.00
  TAIR_AUG   77.58
  PREC_AUG   69.46
  HYDRO      54.59
  POP        47.39
  LDUSE      40.88
e.g. correctly predicted absence approx. 77%
e.g. correctly predicted presence approx. 85%
=> Apply to a dataset for predictions
TreeNet (~a sequence of CARTs): 'boosting'
Trees are added one after another; each new tree explains the remaining variance.
The more nodes … the more detail … the slower.
Many trees make for a 'net of trees', or 'a forest'
=> Leo Breiman + Data Mining
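TreeNet itself is Salford's implementation of stochastic gradient boosting; a rough open-source analogue in R is the gbm package. A minimal sketch under that assumption (data, column names and settings are hypothetical):

  library(gbm)

  # Hypothetical presence/absence data with a weak signal
  set.seed(1)
  dat <- data.frame(temp = runif(500, 0, 25), precip = runif(500, 0, 300))
  dat$presence <- rbinom(500, 1, plogis(0.2 * dat$temp - 0.01 * dat$precip))

  # Boosting: each small tree is fitted to the remaining (residual) variance
  boost <- gbm(presence ~ temp + precip, data = dat,
               distribution = "bernoulli",
               n.trees = 1000,           # more trees: more detail, but slower
               interaction.depth = 3,    # nodes per tree
               shrinkage = 0.01,
               cv.folds = 5)
  best_iter <- gbm.perf(boost, method = "cv")    # optimal number of trees
  summary(boost, n.trees = best_iter)            # predictor ranking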
[Figures: ROC curve (Pct. Class 1 vs. Pct. Population) and Risk vs. Number of Trees]
Importance Values:
  Variable    Score
  LDUSE      100.00
  TAIR_AUG    97.62
  HYDRO       94.35
  DEM         94.01
  PREC_AUG    90.17
  POP         82.54
  HMFPT       81.46
ROC curves for accuracy tests
e.g. correctly predicted absence approx. 97%
e.g. correctly predicted presence approx. 92%
Difficult to interpret, but good graphs
=> Apply to a dataset for predictions
TreeNet: Graphic Output example
Response Curve (Partial Dependence): Bear Occurrence (yes/no) vs. Distance to Lake (m)
(The function above is virtually impossible to fit with linear algorithms => misleading coefficients, e.g. from LMs, GLMs)
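Partial dependence curves like the one described above can be drawn in R as well; a minimal sketch using partialPlot() from the randomForest package (introduced in the following slides), with hypothetical bear data simulated only for illustration:

  library(randomForest)

  # Hypothetical bear presence/absence vs. distance to lake and elevation
  set.seed(1)
  bears <- data.frame(dist_lake = runif(300, 0, 5000), elev = runif(300, 0, 1500))
  bears$presence <- factor(ifelse(bears$dist_lake < 2000, "yes", "no"))

  rf <- randomForest(presence ~ dist_lake + elev, data = bears, ntree = 500)

  # Partial dependence of bear occurrence ('yes') on distance to lake
  partialPlot(rf, pred.data = bears, x.var = "dist_lake",
              which.class = "yes", xlab = "Distance to Lake (m)")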
RandomForest (Breiman 2001, Prasad et al. 2006, Furlanello et al. 2003)
'Boosting & Bagging' algorithms (~Ensemble)
Each tree is grown from a random set of rows (cases) and a random set of columns (predictors, e.g. DEM, Slope, Aspect, Climate, Landcover)
Average final tree from e.g. >2000 trees, done by VOTING
Handles 'noise', interactions and categorical data fine!
Bagging: optimization based on In-Bag and Out-of-Bag samples
In RF no pruning => difficult to overfit (robust)
Difficult to interpret, but good graphs
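A minimal sketch of fitting a Random Forest in R with the randomForest package; the data frame, its column names and the settings shown are hypothetical:

  library(randomForest)

  # Hypothetical presence/absence data with the predictors named above
  set.seed(1)
  dat <- data.frame(
    presence  = factor(sample(c("absent", "present"), 500, replace = TRUE)),
    dem       = runif(500, 0, 1500),
    slope     = runif(500, 0, 45),
    aspect    = runif(500, 0, 360),
    climate   = runif(500, -10, 20),
    landcover = factor(sample(c("forest", "tundra", "wetland", "water", "urban"),
                              500, replace = TRUE))
  )

  # Bagging: every tree sees a bootstrap sample of rows; at each split only
  # 'mtry' randomly chosen predictors are tried; trees are not pruned
  rf <- randomForest(presence ~ ., data = dat,
                     ntree = 2000,        # e.g. >2000 trees, combined by voting
                     mtry = 2,
                     importance = TRUE)

  print(rf)        # out-of-bag (OOB) error estimate
  importance(rf)   # predictor ranking
  varImpPlot(rf)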
RandomForest and GIS: Spatial Modeling
Workflow: GIS overlays of the response and the predictors => table => train & develop model (RandomForest; quantification) => apply model => GIS visualization of predictions
Then inspect the predicted map: 'aaahhhh, makes sense because of…' vs. 'uuhhhh?! No, wait a minute, that's wrong…'
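A minimal sketch of the 'apply model' step, continuing from the randomForest sketch above; the grid table, its file name and its columns are hypothetical, and the predictor columns (including factor levels) must match the training data:

  # Hypothetical grid exported from a GIS: one row per cell, with x/y
  # coordinates plus the same predictor columns used in model training
  grid <- read.csv("predictor_grid.csv")

  # Apply the trained model to every cell: probability of 'present'
  grid$p_present <- predict(rf, newdata = grid, type = "prob")[, "present"]

  # Write the x, y, probability table back out for mapping in the GIS
  write.csv(grid[, c("x", "y", "p_present")], "rf_predictions.csv",
            row.names = FALSE)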
RandomForest: Why so good and usable?
Allows for:
Working multivariate (100s of predictors)
Best possible predictions
Best possible clustering (without a response variable)
Tracking of complex interactions
Predictor ranking
Handling noisy data
Algorithms: randomForest (R, Fortran, Salford), yaImpute (R), party (R), …
Fast & convenient applications
Allows for multiple (!) response variables!
=> Change in the World's Science
What to read, for instance…
http://www.stat.berkeley.edu/~breiman/RandomForests/
Breiman, L. 2001. Statistical modeling: the two cultures. Statistical Science 16(3): 199-231.
Craig, E., and F. Huettmann. 2008. Using "blackbox" algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery (Hsiao-fan Wang, Ed.). IGI Global, Hershey, PA, USA.
Magness, D.R., F. Huettmann, and J.M. Morton. 2008. Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova & A.E. Hassanien (eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Vol. 22, Springer-Verlag, Berlin Heidelberg. 428 pp.
Prasad, A.M., L.R. Iverson, and A. Liaw. 2006. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems 9: 181-199.
(and Hastie & Tibshirani, Furlanello et al. 2003, Elith et al. 2006, etc.)
From now on, simply referred to as …
A Super Model
LM, GLM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
=> Ensembles
Some Super Models: Ensembles
LM, GLM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent…
Find the best model for a given section of your data => the best possible fit & prediction
[Figure: example ensemble for Ivory Gull presence/absence, with different sub-models (LM, polynomial, logistic, RF) fitted across sections of the predictors]
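A minimal sketch of a very simple ensemble in R: several of the above algorithms are fitted to the same presence/absence data and their predicted probabilities are averaged. The data are hypothetical, and the unweighted average is only one of many possible combination rules:

  library(randomForest)

  # Hypothetical presence/absence data
  set.seed(1)
  dat <- data.frame(temp = runif(400, 0, 25), precip = runif(400, 0, 300))
  dat$pres <- rbinom(400, 1, plogis(0.2 * dat$temp - 0.01 * dat$precip))

  # Member models: logistic GLM, polynomial logistic GLM, Random Forest
  m_glm  <- glm(pres ~ temp + precip, family = binomial, data = dat)
  m_poly <- glm(pres ~ poly(temp, 2) + poly(precip, 2), family = binomial, data = dat)
  m_rf   <- randomForest(factor(pres) ~ temp + precip, data = dat, ntree = 500)

  # Ensemble prediction: average the predicted probabilities of presence
  p_ens <- rowMeans(cbind(
    predict(m_glm,  newdata = dat, type = "response"),
    predict(m_poly, newdata = dat, type = "response"),
    predict(m_rf,   newdata = dat, type = "prob")[, "1"]
  ))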
On Greyboxes, Philosophy and Science
Data => Algorithm with a Known Behavior (Data Mining) => Prediction & Accuracy
Such a statistical relationship will be found by either CART, TN, RF or LM, GLM
GLMs as a blackbox!? YES. Just think of software implementations, Max-Likelihood, Model Fitting, AIC and Research Design (sensu Keating & Cherry 1994)
[Figure: Model Performance (0-100%) improves over time: GLM -> ANN -> Boosting, Bagging …]
Parsimony, Inference and Prediction ?!
Sole focus on predictions and their accuracies, whereas…
…R², p-values and traditional inference (variable rankings, AIC) are of lower relevance
Why Parsimony? No real need for optimizing the fit and for parsimony when prediction is the goal
Global accuracy metrics, ROC, AUC, kappa, meta-analysis … (instead of p-values and significance levels or AIC)
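A minimal sketch of such prediction-focused accuracy metrics in R, using the pROC package; the observed and predicted vectors are hypothetical stand-ins for withheld test data:

  library(pROC)

  # Hypothetical withheld test data: observed 0/1 and predicted probabilities
  set.seed(1)
  obs  <- rbinom(200, 1, 0.5)
  pred <- plogis(rnorm(200, mean = ifelse(obs == 1, 1, -1)))

  roc_obj <- roc(obs, pred)   # ROC curve from observed vs. predicted
  auc(roc_obj)                # area under the curve (AUC)
  plot(roc_obj)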