Exploring Data for Regression Modeling
Best Model
Dylan Loudon

Linear Regression Results
Erin Alvey

Who will you trust?
• Field technicians?
• Software programmers?
• Statisticians?
• Instructors?
• GIS technicians?
• Other researchers?
• Yourself?

Regression (Correlation) Modeling
• Creates a model in N-Dimensional “Hyper-Space”
• Defined by:
  – Covariates
  – Response variables
  – Mathematics used to create the model
  – Statistics used to optimize parameters
  – Options for model evaluation
  – Predictor variables

Multiple Linear Regression

Linear Regression: 2 Predictors
Mathworks.com

Non-Linear Regression

Regression Methods
• Continuous Regression:
  – Linear Regression
  – Generalized Linear Models (GLM)
  – Generalized Additive Models (GAMs)
• Categorical Regression (trees):
  – Regression Trees
  – Classification and regression trees (CART)
• Machine Learning:
  – Maximum Entropy (Maxent)
  – NPMR, HEMI, BRTs, etc.

Brown Shrimp Size
• Add graph from work

Terminology
• Plant uses:
  – Measured value and response variable
  – Explanatory variable
• I prefer:
  – Response variable
  – I’ll use “measured value” to identify measured values in field data
  – Covariate: Explanatory variable used to build the model
  – Predictor: Explanatory variable used to predict

Douglas Fir Habitat Model
[Figure: Habitat Quality (0 to 1) plotted against Precipitation (0 to 1000 mm)]

[Workflow diagram, built up over three slides. Boxes: Field or Sample Data; Covariate; Model Selection and Parameter Estimation; Model; Predictor; Prediction; Model Validation]

Douglas-Fir sample data

Lat        Lon          F3  MeanTemp  Precip
40.893634  -121.802272  41  69        1070
40.987702  -122.117088  45  96        1406
40.987702  -122.117088  40  96        1406
40.987702  -122.117088  43  96        1406
40.987702  -122.117088  42  96        1406
40.987702  -122.117088  46  96        1406

[Workflow diagram. Labels: Create the Model; Model “Parameters”; Precip; Extract; Prediction; To Points; Text File; Attributes; To Raster]

X         Y         MeanTemp  Precip  Predict
-123.677  41.61906  71        1548    193.6
-123.344  41.61906  55        1212    150.4
-123.011  41.61906  79        887     187.5667
-122.677  41.61906  68        584     155.4667
-122.344  41.61906  102       513     221.1

Data
• Response Variable
  – From the field data (sample data)
• Covariates
  – From the field or remotely sensed
• Predictors
  – Typically remotely sensed
  – Sample as covariates for training
  – Can be different for predicting to new scenarios

Response Variable
• What is the:
  – Spatial uncertainty?
  – Temporal uncertainty?
  – Measurement uncertainty?
• Will it answer your question?

Covariate Variables
• What is the:
  – Spatial uncertainty?
  – Temporal uncertainty?
  – Measurement uncertainty?
• How well does the collection time of the covariates match the field data?
• Do they co-vary with the phenomena?
• Do the covariates “correlate”?

Types of uncertainty
• Accuracy (bias)
• Precision (repeatability)
• Reliability (consistency of a set of measurements)
• Resolution (fineness of detail)
• Logical consistency
  – Adherence to structural rules, attributes, and relationships
• Completeness

Types of Errors
• Gross errors
  – Transcription
  – Sinks in DEMs
• Random
  – Estimated using probability theory
• Systematic errors
  – “Drift” in instruments
  – Dropped lines in Landsat

Gross Errors
• Lat/Lon:
  – Reversed
  – 0, names, dates, etc.
• Dates:
  – Extended in databases
• Measurements:
  – Inconsistent units
  – Inconsistent protocols
  – What can you expect from a field team?

Occurrences of Polar Bears
From The Global Biodiversity Information Facility (www.gbif.org, 2011)

Systematic Errors
Landsat Scan line Error

Response Variable Qualification Tools
• Maps (various resolutions)
• Examine the data values:
  – How many digits?
  – Repeating patterns, gross errors?
• “Documentation”
• Measurements:
  – Occurrences?
  – Binary: Histogram
  – Categorical: Histogram
  – Continuous: Histogram

What’s the Impact on Models?

Significant Digits
• How many digits to represent 1 meter?
  – Geographic: Lat/Lon?
  – UTM: Eastings/Northings?
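The significant-digits question can be worked through numerically. The following is a minimal Python sketch that is not part of the original slides. It assumes a round figure of about 110 km per degree of latitude, and the function name `ground_resolution_m` is hypothetical. Longitude is scaled by the cosine of the latitude because meridians converge toward the poles.

```python
import math

# Assumption: one degree of latitude is roughly 110 km (a round figure).
METERS_PER_DEGREE = 110_000.0


def ground_resolution_m(decimal_places, latitude_deg=0.0, axis="lat"):
    """Ground distance (meters) represented by the last decimal place of a
    geographic coordinate written with `decimal_places` digits after the point."""
    step_deg = 10.0 ** (-decimal_places)
    if axis == "lon":
        # A degree of longitude shrinks with cos(latitude).
        return step_deg * METERS_PER_DEGREE * math.cos(math.radians(latitude_deg))
    return step_deg * METERS_PER_DEGREE


if __name__ == "__main__":
    # Tabulate the resolution for 0 to 6 decimal places of latitude.
    for places in range(7):
        print(places, "decimal places:", ground_resolution_m(places), "m")
```

With five decimal places the last digit of a latitude corresponds to roughly 1.1 m under this assumption, so coordinates stored with fewer digits cannot locate a sample to within a meter. UTM eastings and northings are already in meters, so no conversion is needed there.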
Significant Digits
• Geographic:
  – 1 digit = 1 degree
  – 1 degree ~ 110 km
  – 0.00001 ~ 1.1 meters
• UTM:
  – 1 digit = 1 meter

Covariate Qualification
• Maps
• Documentation
• Examine the data:
  – How many digits?
    • Integer or floating point?
  – Repeating patterns?
• Histograms

CONUS Annual Precip.

Covariate Uncertainty
[Histogram: Min Temp of Coldest Month. x-axis: Degrees C Times 10 (ticks from -231 to 102); y-axis: Number of Pixels Scaled to 1]

Min Temp: Environment
[Histogram: Min Temp of Coldest Month. x-axis: Degrees C Times 10 (ticks from -230 to 103); y-axis: Number of Occurrences Scaled to 1]

Histograms
hist(Temp, breaks=400)

Covariate Correlation
• Correlation Plots
• Pearson product-moment correlation coefficient
• Spearman’s rho
  – Nonparametric correlation coefficient

Correlation plots

California Correlations

California Predictors

Response vs. Covariates
• For Occurrences:
  – Histogram covariates at occurrences vs. overall covariates
• For Binary Data:
  – Histogram covariates for each value
• For Categorical Data:
  – Histogram covariates for each value
  – Or scatter plots
• For Continuous Data:
  – Scatter plots

Covariate Occurrence Histograms

Precipitation with Douglas-Fir Occurrences

Douglas Fir Model In HEMI 2
Green: Histogram of all of California
Red: Histogram of Douglas-Fir Occurrences

Doug-Fir Height vs. Precip.

Douglas Fir Height

Terrestrial Predictors
• Elevation:
  – Slope
  – Aspect
  – Absolute Aspect
• Distance to:
  – Roads
  – Streams (streamline)
• Climate
  – Precip
  – Temp
• Soil Type
• RS:
  – Landsat
  – MODIS
  – NDVI, etc.
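The Pearson and Spearman coefficients named under Covariate Correlation can be computed with a few lines of standard-library Python. This is an illustrative sketch only, not the workflow used in the slides: the function names are hypothetical, and the MeanTemp and Precip values are the five rows of the prediction table shown earlier, reused here purely as sample numbers. Spearman's rho is computed as the Pearson coefficient of the ranks, with tied values given their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Sample numbers taken from the prediction table earlier in the transcript.
MEAN_TEMP = [71, 55, 79, 68, 102]
PRECIP = [1548, 1212, 887, 584, 513]

if __name__ == "__main__":
    print("Pearson r:   ", pearson(MEAN_TEMP, PRECIP))
    print("Spearman rho:", spearman(MEAN_TEMP, PRECIP))
```

A large absolute value for either coefficient suggests that two covariates carry overlapping information. Comparing the two can also hint at a monotonic but non-linear relationship, which Spearman's rho captures and Pearson's coefficient understates.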
Marine Predictors
• Temp
• DO2
• Salinity
• Depth
• Rugosity (roughness)
• Current (at depths)
• Wind

More Complicated
• Associated species
• Trophic levels
• Temporal
• Cyclical

Predictor Layers
• Means, mins, maxes
• Range of values
• Heterogeneity
• Spatial layers:
  – Distance to…
  – Topography: elevation, slope, aspect

Field Data and Predictors
• As close to field measurements as possible
• Clean and aggregate data as needed
  – Documenting as you go
• Estimate overall uncertainty
• Answer the question:
  – What spatial, temporal, and measurement scales are appropriate to model at given the data?

Temporal Issues
• Divide data into months, seasons, years, decades.
  – Consistent between predictors and response
• Extract predictors as close to sample location and dates as possible
• Use the “best” predictor layers

Additional Slides

Dimensions of uncertainty
• Space
• Time
• Attribute
• Scale
• Relationships

Basic Tools
• Histograms: What is the distribution of occurrences of values (range and shape)?
• Scattergrams: What is the relationship between response and predictor variables, and between predictor variables?
• QQPlots: Are the residuals normally distributed?

Types of Data
• “God does not play dice” – Einstein
• “the end of certainty” – Prigogine, 1977 Nobel Prize
• What remains is:
  – Quantifiable probability with uncertainty

Uncertainty Factors
• Inherent uncertainty in the world
• Limitation of human cognition
• Limitation of measurement
• Uncertainty in processing and analysis
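The QQ-plot check listed under Basic Tools can be sketched without a plotting library by computing the points that such a plot would show. The following Python example is a hedged illustration, not the author's procedure: the function names are hypothetical, the single-predictor least-squares fit is a stand-in for whatever model is actually used, and the MeanTemp and Predict values are the five rows of the prediction table from earlier in the transcript, reused only as sample numbers. The plotting positions (i + 0.5) / n are one common convention among several.

```python
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y, intercept, slope):
    """Observed minus fitted values."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def qq_points(values):
    """Pairs of (standard normal quantile, sorted observed value).

    If the values are roughly normal, these pairs fall near a straight line.
    """
    n = len(values)
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))


# Sample numbers taken from the prediction table earlier in the transcript.
MEAN_TEMP = [71, 55, 79, 68, 102]
PREDICT = [193.6, 150.4, 187.5667, 155.4667, 221.1]

if __name__ == "__main__":
    intercept, slope = fit_line(MEAN_TEMP, PREDICT)
    for quantile, observed in qq_points(residuals(MEAN_TEMP, PREDICT, intercept, slope)):
        print(round(quantile, 3), round(observed, 3))
```

Five points are far too few to judge normality. With a realistic sample, strong curvature or outlying points at the ends of the QQ pairs would indicate residuals that depart from a normal distribution.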