Cliff Notes on Ecological Niche Modeling with RandomForest (Ensembles)

Falk Huettmann, EWHALE lab, University of Alaska Fairbanks, AK 99775
Email: [email protected], Tel. 907 474 7882

Modeling Ecological Niches
[Figure: the workflow moves from geographic space (latitude, longitude) through ecological space (environmental factors a and b) and sampling space into model space => predictions.]

A Super Model
LM, GLM, GAM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent ... => Ensembles

A Starting Point: Linear Regression
- One formula capturing the data: Response Variable ~ Predictor1
- y = a + bx, with a 'mean' and SD

Common Ground: A Multiple Regression Framework
- Response Variable ~ Predictor1 + Predictor2 + Predictor3 ...
- Traditionally we used 1-5 predictors, but 1 to 1000s of predictors are possible.
- 'One single algorithm' explains the relationship between response and predictors.
- The derived relationship can be predicted to other locations with known predictors.

GLM vs. CART etc.
- GLM: linear (~unrealistic), 'mean' and SD => potentially low r2.
- CART etc.: non-linear (driven by the data); 'mean'? SD?
- CART, TreeNet & RandomForest (there are many other algorithms!)

Our Free Algorithms
- R-Project: http://rweb.stat.umn.edu/R/library/randomForest/html/00Index.html
- TreeNet and RandomForest from Salford Systems (free 30-day trial; Fortran, C ...): http://salford-systems.com/products.php

The Tree/CART Family (Leo Breiman 1984, and others)
- Classification & Regression Tree (CART) => binary recursive partitioning, e.g. Temp > 15? Precip < 100? Temp < 5? => YES/NO.
- Binary splits are a widely used concept; multiple splits are possible but rarely used yet.
- Free of data assumptions! No significances.
- In binary-split recursive partitioning the same predictor can re-occur elsewhere as a 'splitter'.
- Splitting maximizes node homogeneity of variance.
- Stopping rules for the number of branches are based on optimization/cross-validation.
- Terminal nodes show means (regression tree, e.g. 0.1, 0.3, 2, 2.3, 3) or categories (classification tree, e.g. A, B, C).

CART (Salford; rpart in R)
- Nice to interpret (e.g. for small trees, or when following specific decision rules through to the end).
- Accuracy is assessed with ROC curves from withheld test data (ROC optimum).
- Example variable importance ranking:
  Variable    Importance
  DEM             100.00
  TAIR_AUG         77.58
  PREC_AUG         69.46
  HYDRO            54.59
  POP              47.39
  LDUSE            40.88
- e.g. correctly predicted absence approx. 77%; correctly predicted presence approx. 85%.
- => Apply the model to a dataset for predictions (a GLM and an rpart sketch follow below).
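To make the 'common ground' multiple regression framework concrete, here is a minimal sketch in R. The data frame `dat`, the response `occ`, and the simulated values are hypothetical placeholders; only the predictor names (DEM, TAIR_AUG, PREC_AUG) are borrowed from the deck's importance tables.

```r
# Minimal sketch of the multiple regression "common ground":
# Response ~ Predictor1 + Predictor2 + Predictor3 ...
# 'dat', 'occ', and all values below are hypothetical.
set.seed(1)
dat0 <- data.frame(
  occ      = rbinom(200, 1, 0.5),   # presence/absence response
  DEM      = runif(200, 0, 1500),   # elevation
  TAIR_AUG = rnorm(200, 10, 4),     # August air temperature
  PREC_AUG = rnorm(200, 80, 20)     # August precipitation
)

# One single algorithm (here a binomial GLM) for the relationship:
m_glm <- glm(occ ~ DEM + TAIR_AUG + PREC_AUG,
             family = binomial, data = dat0)
summary(m_glm)

# The derived relationship can be predicted to other locations
# with known predictor values:
new_sites <- data.frame(DEM = c(200, 900),
                        TAIR_AUG = c(12, 6),
                        PREC_AUG = c(70, 110))
predict(m_glm, newdata = new_sites, type = "response")
```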
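And a hedged sketch of the CART idea itself, using rpart (the R implementation the deck names). It reuses the hypothetical `dat0` from the GLM sketch above; with random data the tree may be tiny, which is fine for illustration.

```r
# Classification tree via binary recursive partitioning.
library(rpart)

# method = "class" grows a classification tree; cross-validation
# (xval) informs the stopping rules for the number of branches.
cart <- rpart(factor(occ) ~ DEM + TAIR_AUG + PREC_AUG,
              data = dat0, method = "class",
              control = rpart.control(cp = 0.01, xval = 10))

print(cart)      # decision rules: each split is a binary question
printcp(cart)    # cross-validated error by tree size

# Apply to a dataset for predictions, then tabulate how often
# presence/absence was predicted correctly:
pred <- predict(cart, newdata = dat0, type = "class")
table(observed = dat0$occ, predicted = pred)
```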
TreeNet (~ A Sequence of CARTs): 'Boosting'
- Trees are added one after another; each new tree explains remaining variance.
- The more nodes, the more detail ... and the slower it gets.
- Many trees make for a 'net of trees', or 'a forest' => Leo Breiman + data mining.
[Figures: an ROC curve and a plot of risk against the number of trees (approx. 10-110 trees).]
- Example variable importance ranking:
  Variable    Importance
  LDUSE           100.00
  TAIR_AUG         97.62
  HYDRO            94.35
  DEM              94.01
  PREC_AUG         90.17
  POP              82.54
  HMFPT            81.46
- ROC curves for accuracy tests: e.g. correctly predicted absence approx. 97%, correctly predicted presence approx. 92%.
- Difficult to interpret, but good graphs. => Apply to a dataset for predictions.

TreeNet: Graphic Output Example (Partial Dependence)
[Figure: response curve of bear occurrence (yes/no) against distance to lake (m).]
- A response curve like this is virtually impossible to fit with linear algorithms => misleading coefficients, e.g. from LMs and GLMs.

RandomForest (Breiman 2001; Prasad et al. 2006; Furlanello et al. 2003)
- 'Boosting & bagging' algorithms (~ an ensemble).
- Each tree is grown on a random set of rows (cases) and a random set of columns (predictors), e.g. DEM, Slope, Aspect, Climate, Landcover.
- The final answer is averaged from e.g. > 2000 trees, done by VOTING.
- Handles 'noise', interactions and categorical data fine!
- Bagging: optimization is based on in-bag and out-of-bag samples.
- In RF there is no pruning => difficult to overfit (robust).
- Difficult to interpret, but good graphs (see the randomForest sketch below).

RandomForest and GIS: Spatial Modeling
- Response + predictors => GIS overlays => one table.
- Train & develop the model (RandomForest, quantification) => apply the model => GIS visualization of predictions.
- The prediction map then faces the expert reaction: "aaahhhh ... makes sense because of ..." or "uuhhhh?! ... no, wait a minute, that's wrong ..." (a prediction-table sketch follows below).
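A hedged randomForest sketch in R, assuming a fresh hypothetical data frame `dat` (the variable `dist_lake` is invented to mirror the bear/distance-to-lake example, not taken from the deck's data). It shows e.g. 2000 trees with voting, the out-of-bag accuracy that bagging provides, predictor ranking, and a partial dependence curve.

```r
library(randomForest)

# Hypothetical training table, as would come out of a GIS overlay.
set.seed(1)
dat <- data.frame(
  occ       = factor(rbinom(300, 1, 0.5)),  # presence/absence as a factor
  DEM       = runif(300, 0, 1500),
  TAIR_AUG  = rnorm(300, 10, 4),
  dist_lake = runif(300, 0, 5000)           # invented: distance to lake (m)
)

# e.g. > 2000 trees; the forest 'votes' on the final class.
rf <- randomForest(occ ~ ., data = dat, ntree = 2000, importance = TRUE)

print(rf)         # includes the out-of-bag (OOB) error estimate from bagging
varImpPlot(rf)    # predictor ranking, as in the importance tables above

# Partial dependence: the response curve for one predictor,
# analogous to the TreeNet bear-occurrence graphic.
partialPlot(rf, pred.data = dat, x.var = "dist_lake", which.class = "1")
```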
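And a sketch of the GIS step: the trained forest is applied to a table of predictor values, one row per grid cell, and the predictions are written back out for map display. The grid, column names, and output file are hypothetical; it reuses `rf` from the sketch above.

```r
# Hypothetical GIS export: one row per grid cell with coordinates
# and the same predictors the model was trained on.
grid <- data.frame(
  x         = rep(1:10, 10),
  y         = rep(1:10, each = 10),
  DEM       = runif(100, 0, 1500),
  TAIR_AUG  = rnorm(100, 10, 4),
  dist_lake = runif(100, 0, 5000)
)

# Apply the trained model: predicted probability of presence per cell.
grid$p_occ <- predict(rf, newdata = grid, type = "prob")[, "1"]

# Write the table back out for GIS visualization of the predictions.
write.csv(grid, "rf_predictions.csv", row.names = FALSE)
```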
RandomForest: Why So Good and Useable?
Allows for:
- Working multivariate (100s of predictors)
- Best possible predictions
- Best possible clustering (without a response variable)
- Tracking of complex interactions
- Predictor ranking
- Handling noisy data
Implementations: randomForest (R, Fortran, Salford), yaImpute (R), party (R), ...
- Fast & convenient applications.
- Allows for multiple (!) response variables! => A change in the world's science.

What to Read, for Instance ...
http://www.stat.berkeley.edu/~breiman/RandomForests/
- Breiman, L. 2001. Statistical modeling: the two cultures. Statistical Science 16(3): 199-231.
- Craig, E., and F. Huettmann. 2008. Using "blackbox" algorithms such as TreeNet and Random Forests for data-mining and for finding meaningful patterns, relationships and outliers in complex ecological data: an overview, an example using golden eagle satellite data and an outlook for a promising future. Chapter IV in Hsiao-fan Wang (ed.), Intelligent Data Analysis: Developing New Methodologies through Pattern Discovery and Recovery. IGI Global, Hershey, PA, USA.
- Magness, D.R., F. Huettmann, and J.M. Morton. 2008. Using Random Forests to provide predicted species distribution maps as a metric for ecological inventory & monitoring programs. Pages 209-229 in T.G. Smolinski, M.G. Milanova, and A.E. Hassanien (eds.), Applications of Computational Intelligence in Biology: Current Trends and Open Problems. Studies in Computational Intelligence, Springer-Verlag, Berlin Heidelberg. 428 pp.
- Prasad, A.M., L.R. Iverson, and A. Liaw. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9: 181-199.
(and Hastie & Tibshirani, Furlanello et al. 2003, Elith et al. 2006, etc.)

From Now On, Simply Referred to as ... A Super Model
LM, GLM, CART, MARS, NN, GARP, TN, RF, GDM, Maxent ... => Ensembles

Some Super Models: Ensembles
- Find the best model for a given section of your data => the best possible fit & prediction.
[Figure: different candidate fits (polynomial LM, log LM, polynomial RF, log RF) applied across sections of Ivory Gull presence/absence data along the predictors.]

On Greyboxes, Philosophy and Science
- Data => algorithm with a known behavior (data mining) => prediction & accuracy.
- Such a statistical relationship will be found by either CART, TN, RF or LM, GLM.
- GLMs as a blackbox!? YES. Just think of software implementations, maximum likelihood, model fitting, AIC and research design (sensu Keating & Cherry 1994).
[Figure: model performance (0-100%) improves over time, from GLM to ANN to boosting and bagging.]

Parsimony, Inference and Prediction?!
- Sole focus on predictions and their accuracies, whereas R2, p-values and traditional inference (variable rankings, AIC) are of lower relevance.
- Why parsimony? There is no real need to optimize the fit, or for parsimony, when prediction is the goal.
- Use global accuracy metrics: ROC, AUC, kappa, meta-analysis ... instead of p-values, significance levels or AIC (an evaluation sketch follows below).
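To make the accuracy-metric point concrete, a minimal sketch in R of AUC on withheld test data, computed with the base-R rank (Mann-Whitney) formulation rather than a dedicated package. The 70/30 split is hypothetical, and it reuses `dat` and `rf` from the earlier sketches; in a real analysis the test rows would be held out before training.

```r
# AUC via the Mann-Whitney rank statistic: the probability that a
# random presence receives a higher score than a random absence.
auc <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical withheld test data (30% of the rows).
set.seed(2)
test_idx <- sample(nrow(dat), 0.3 * nrow(dat))
test     <- dat[test_idx, ]
obs      <- as.integer(as.character(test$occ))

p_rf <- predict(rf, newdata = test, type = "prob")[, "1"]
auc(obs, p_rf)

# Percent correctly predicted presence/absence at a 0.5 threshold,
# the kind of figure quoted on the CART and TreeNet slides:
pred_cls <- ifelse(p_rf > 0.5, 1, 0)
mean(pred_cls[obs == 1] == 1) * 100   # correctly predicted presence
mean(pred_cls[obs == 0] == 0) * 100   # correctly predicted absence
```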
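Finally, a tiny sketch of the ensemble idea under the same assumptions: fit a second, different algorithm (a binomial GLM) on the same hypothetical data, average the two sets of predicted probabilities, and compare AUCs. Simple unweighted averaging is just one of many ways to build an ensemble.

```r
# A second model on the same hypothetical training table.
glm_fit <- glm(occ ~ DEM + TAIR_AUG + dist_lake,
               family = binomial, data = dat)

p_glm <- predict(glm_fit, newdata = test, type = "response")

# Unweighted ensemble: average the predicted probabilities.
p_ens <- (p_glm + p_rf) / 2

# Compare global accuracy (AUC) of the single models and the ensemble.
c(glm = auc(obs, p_glm), rf = auc(obs, p_rf), ensemble = auc(obs, p_ens))
```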