Predictive Habitat Suitability Model for the Endangered Navasota Ladies’-Tresses
Patrick Young
GISC 6389 Masters Project
May 8, 2008

Discussion Topics
• Subject background
  – Introduce plant
  – Describe surveying obstacles
• Project purpose
  – Test model using public data
  – Determine real world applicability value
• Literature review
  – Criteria: discuss suitable habitat
  – Habitat modeling
  – Logistic regression analyses
• Data acquisition
  – NLT data
  – Defining study areas
  – Discuss variable set 1
• Data creation
  – Describe methodology (programs, tools, etc.)
  – Discuss variable set 2
• Statistical input preparation
  – Prepare input table of all dependent and independent variables
• Statistical analysis
  – Run logistic regression in R
  – Discuss changes made to improve model
• Results & interpretation
  – Present R results
  – Present scatter plots & histograms
• Conclusion
  – Recommendation for real world application
  – Future improvements

[Image: Navasota Ladies’-tresses]

Subject Background: Spiranthes parksii, Navasota Ladies’-Tresses (NLT)
• Endangered native East Texas orchid
• Perennial herb, 8-15 inches tall, that produces a single spike of tiny white flowers
• Produces a basal rosette of leaves in early spring that die before the plant flowers from mid-October to mid-November
• Texas Parks and Wildlife, “Navasota Ladies’-tresses (Spiranthes parksii),” Texas Parks and Wildlife, http://www.tpwd.state.tx.us/huntwild/wild/species/navasolt/

Subject Background: Surveying
• Must be surveyed in certain counties to obtain construction permits
• Can only be identified when in bloom, but blooming does not occur every year
• Often immediately eaten by wildlife
• Very similar in appearance to Spiranthes cernua (not endangered)
[Images: parksii, cernua]

Purpose
• To determine whether GIS and statistical-based methods can be used to accurately model NLT habitat at the landscape scale using publicly available datasets
• Results from this study will help determine how GIS and readily available datasets can be best used to aid field surveyors

Literature Review: NLT Habitat
• Pelchat, C., 2005. Spiranthes parksii Correll-Navasota Ladies’ Tresses.
  – Discusses rediscovery, range/habitat, and morphology of NLT
• Lea, W.A., 2005. USFWS Biological Opinion.
  – Written to federal agencies to propose changes to roadway plans to avoid NLT impacts
  – Describes NLT life history, population distribution and status

Literature Review: Habitat Modeling
• Sperduto, M.B., Congalton, R.G., 1996. Predicting Rare Orchid (Small Whorled Pogonia) Habitat Using GIS.
  – Determined if landscape-level characteristics were associated with known sites
  – Utilized a GIS model to identify combinations of important habitat characteristics to locate other existing habitat
  – Utilized both equal weight and chi-square models and found 57% and 78% correct predictability, respectively
• Wu, X.B., Smeins, F.E., 2000. Multiple-scale habitat modeling approach for rare plant conservation.
  – Studied availability and usefulness of available data to model habitat for eight rare plants in southern Texas on three different scales: regional, landscape and site
  – Developed the regional-scale model to predict plant distribution using coarse-scale GIS data, overlapping it with the plant distributions to create areas of likelihood
  – Created the site-scale model for field assessment in a study area. It combined existing literature with field investigation to determine soils, landform and vegetation. It proved the most effective but also the most costly.
  – Created the landscape-scale model by combining existing detailed datasets with ones developed by the researcher. It proved effective for generating spatial distributions of potential and present habitat suitability over large project areas.
• Odom, R.H., Ford, W.M., Edwards, J.W., Stihler, C.W., Menzel, J.M., 2001. Developing a habitat model for the endangered Virginia northern flying squirrel in the Allegheny Mountains of West Virginia.
  – Many of the variables studied (elevation, landform index, surface curvature, slope, aspect and distance to coniferous forest cover) were the same as those used in this project.
  – Statistical approach used non-parametric Wilcoxon tests between sites with known habitat, sites without and random points, as well as logistic regression.

Literature Review: Logistic Regression Modeling
• Manly, B., 2001. Statistics for Environmental Science and Management.
  – Introduction to generalized linear models and the formula used to calculate them
• Osborne, P.E., Alonso, J.C., Bryant, R.G., 2001. Modelling landscape-scale habitat use using GIS and remote sensing: a case study with great bustards.
  – Studied vegetation, terrain characteristics and human disturbance to determine bustard distribution
  – Performed multivariate analyses using forward stepwise logistic regression
  – The study correctly predicted 90.0% of occupied sites and 78.9% of unoccupied sites based on a 50% probability dichotomy
• Everitt, B.S., Hothorn, T., 2006. A Handbook of Statistical Analyses Using R.
  – In-depth discussion of logistic regression
  – Sample issues studied and R code

Literature Review: Suitable Habitat Criteria
• Geology: Tertiary Eocene sands and Quaternary Holocene sandy depositional sites
• Soil: acid loamy sands in upland areas to acid loamy sands and clays in bottomland areas
• Vegetation: well-developed Post Oak Savannah woods, mixed pine/oak woods
• Forest Proximity: near small, natural openings in woodlands, rarely in open areas dominated by grasses
• Drainage: within 600 feet of drainage areas representing natural disturbance
• Elevation: between 200 and 300 feet above sea level
• Slope: flat to gently sloping
• Aspect: no pre-existing documentation
• Precipitation: blooming is dependent on rainfall in April/May and again in August/September
• Temperature: no pre-existing documentation
• References:
  – Pelchat, C., 2005. Spiranthes parksii Correll-Navasota Ladies’ Tresses.
  – Lea, W.A., 2005. USFWS Biological Opinion.

Data Acquisition: Dependent Variable
• NLT Locations
  – Dependent variable
  – GPS points provided by HDR Engineering
  – 670 points
  – Acquired using submeter Trimble GeoXT GPS
  – Collected between 2001 and 2006

Study Areas
• Seven sites were chosen both for their number of points and their density of points
• Boundaries of field surveys were not available, so each study area was assumed to extend 100 feet beyond the extent of the NLT points.

Site | Acreage | NLT Points
1 | 6.71 | 62
2 | 91.77 | 148
3 | 653.71 | 331
4 | 53.4 | 14
5 | 39.46 | 20
6 | 13.57 | 37
7 | 468.36 | 17

Study Areas
[Map: Site 3]
• Dimensions of study areas were chosen to be divisible by the analysis resolution (30 meters)
  – 30 meter resolution was chosen to match the DEM and ASTER data resolution which accounted
• 30 meter grids were generated for each study area using the Hawth’s Tools extension for ArcGIS

Data Acquisition: Independent Variables

Variable | Source | Org. | Scale | Date Collected | Format | Website
Elevation | National Elevation Dataset | USGS | 30 meter | various | Grid | http://seamless.usgs.gov/
Geology | Geologic Atlas of Texas | USGS | 1:250,000 | 1961-1987 | Shapefile | http://www.tnris.state.tx.us/
Soil | Soil Survey Geographic Database | NRCS | 1:24,000 | (not listed) | Shapefile | http://soildatamart.nrcs.usda.gov/
Vegetation | The Vegetation Types of Texas | TPWD | 1:250,000 | 1972-1980 | Shapefile | http://www.tpwd.state.tx.us
Satellite Imagery | Land Processes Distributed Active Archive | USGS | 30 meter | 2006 | HDF-EOS | http://edcdaac.usgs.gov

Data Acquisition: Elevation
Data Acquisition: Geology
Data Acquisition: Vegetation

Data Acquisition: Soils Pre-processing
• There were 23 different soil series types within the seven study areas
  – 23 classes were too many to expect to find significance between any one or two values and the NLT sites
  – These were first dissolved into their soil series and then dissolved into their taxonomic classes
  – Condensed into 11 classes (still high)

Soil Classes

Code | Taxonomic Class | Soil Series
1 | Udic Paleustalfs | Arol, Chazos, Shiro, Singleton
2 | Udertic Paleustalfs | Axtell
3 | Chromic Vertic Albaqualfs | Boonville
4 | Ultic Paleustalfs | Burlewash
5 | Aquic Paleustalfs | Falba
6 | Aquic Udifluvents | Hatliff
7 | Lithic Ustorthents | Koether-Rock
8 | Oxyaquic Vertic Paleustalfs | Lufkin, Mabank, Tabor
9 | Grossarenic Paleustalfs | Padina
10 | Aquic Arenic Paleustalfs | Robco
11 | Aquic Haplustepts | Uhland
12 | Water |

Data Acquisition: Soils

Data Acquisition: Independent Variables Not Included
• Annual Precipitation
• Average Annual Temperature

Data Creation: Slope & Aspect
• Slope
  – Use Spatial Analyst extension to derive from DEM
  – 30 meter output resolution
  – Percentage units
• Aspect
  – Use Spatial Analyst extension to derive from DEM
  – 30 meter output resolution
  – Reclassified into categories

Data Creation: Aspect

Data Creation: Drainage Proximity
• Fill Sinks
• Stream Definition
• Flow Direction
• Flow Accumulation
• Stream Segmentation
• Euclidean Distance

Data Creation: Drainage Proximity

Data Creation: Land Cover
• Reclassified ASTER imagery using ENVI operations: Layerstack, NDVI, Principal Components, Band Ratios & Supervised Maximum Likelihood Classification
• Chose Unsupervised IsoData Classification with 6 classes and 3 iterations
[Images: CIR 3-2-1, IsoData Classification]

Data Creation: Land Cover

Data Creation: Forest Proximity
  – Reclassify ASTER images in ENVI using supervised classification
[Images: Supervised Parallelepiped Reclassification, NDVI-4-3]

Data Creation: Forest Proximity (cont’d)
• Convert the reclassed ENVI output to an ESRI grid
• Convert the raster to polygon
• Delete all non-forest polygons
• Perform Euclidean Distance with an output cell size of 30 meters

Data Creation: Forest Proximity

Statistical Input Preparation
• Framework: Convert the 30 meter study area grid cells to centroid points
• Dependent Variables: Use a copy of the study area grids to assign an NLT presence value to the cells intersecting NLT points
• Independent Variables:
  – Convert the floating point raster grids to integer
    • Multiply slope by 100 to preserve decimals
  – Convert raster to polygon

Statistical Input Preparation (cont’d)
• Code categorical variables (i.e., aspect, soil, geology, vegetation & land cover)
• Combine all variables using the Intersect tool with the centroids as the primary input
• Remove records containing habitat where NLT cannot grow (i.e., soil = water, land cover = cloud cover or water)
• Export to a dbf table and convert to a csv file

ID | NLT | ELEV | SLOPE | ASPECT | DRAIN_PROX | SOIL | GEOL | VEG | LAND_COV | FOREST_PROX
1 | 0 | 87 | 55 | 3 | 849 | 4 | 3 | 3 | 2 | 358
2 | 0 | 87 | 60 | 3 | 819 | 4 | 3 | 3 | 4 | 314
3 | 0 | 87 | 76 | 3 | 801 | 4 | 3 | 3 | 4 | 281
4 | 0 | 87 | 86 | 3 | 795 | 4 | 3 | 3 | 4 | 222
5 | 0 | 88 | 116 | 4 | 801 | 4 | 3 | 3 | 1 | 198
6 | 0 | 88 | 137 | 4 | 819 | 4 | 3 | 3 | 1 | 198
7 | 0 | 89 | 137 | 4 | 849 | 4 | 3 | 3 | 4 | 222
8 | 0 | 89 | 128 | 4 | 889 | 4 | 3 | 3 | 4 | 281
9 | 0 | 90 | 113 | 3 | 937 | 4 | 3 | 3 | 2 | 298

Statistical Analysis
• Logistic regression predicts the probability of an event occurring; in this case, the probability that NLT are found at a given location
• It allows one binary dependent variable and multiple independent variables that do not have to be of the same type
• Through maximum likelihood estimation, logistic regression fits the following formula to estimate the probability Θ:

$$\Theta = \frac{e^{\alpha + \beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_i \chi_i}}{1 + e^{\alpha + \beta_1 \chi_1 + \beta_2 \chi_2 + \dots + \beta_i \chi_i}}$$

α is the constant of the equation and each β is the coefficient of the corresponding predictor variable χ

Statistical Analysis
• The R statistical program was chosen to perform the statistical analysis
• It “is an integrated suite of software facilities for data manipulation, calculation and graphical display”
• Categorical variables were entered as factors so that each value would be considered independently
• Use GLM (generalized linear model)
• Plot residuals (i.e., the difference between the observed values of the response and the fitted values of the response) against fitted values of the entire study area as well as each independent variable
• Create output table of probability values to display in ArcGIS and compare to the known NLT locations
Everitt, B.S., Hothorn, T., A Handbook of Statistical Analyses Using R. (Boca Raton: Chapman & Hall/CRC, 2006), 83-108.

Statistical Analysis
• Changes implemented to improve model:
  – Study areas condensed to remove unsurveyed area
  – Original three sites were broken into seven sites to further decrease unsurveyed area and increase point density
  – Number of soil type classifications decreased from 23 to 11
  – A stepwise regression (removing nonsignificant variables) was tested but did not significantly improve output results

Results: Independent Variables

Site | NLT Points | Elevation | Slope | Aspect | Drainage Proximity | Soil | Geology | Vegetation | Land Cover | Forest Proximity
1 | 11 | 124-128 | 0.08-1.26 | N, NW | 1406-2039 | 9 | Simsboro | Post Oak Woods | cloud | 222-855
2 | 38 | 79-92 | 0.08-2.31 | NE, E, SE | 0-2049 | 4 | Manning | Post Oak Woods | 2, 3 | 0-506
3 | 157 | 66-87 | 0.02-3.80 | none | 0-2150 | 4 | Wellborn | Post Oak Mosaic | 2, 3, 4 | 0-937
4 | 13 | 79-86 | 0.08-1.90 | E, SE, S | 0-979 | 1, 4 | Manning | Post Oak Mosaic | 2, 3, 4 | 0-444
5 | 9 | 72-82 | 0.12-1.29 | E, SE, S | 0-1124 | 1, 4 | Manning | Post Oak Mosaic | 2, 4 | 0-579
6 | 8 | 72-77 | 0.05-1.49 | E, SE | 298-1056 | 4 | Manning | Other | 1, 4 | 222-843
7 | 14 | 60-74 | 0.00-2.71 | SE, S, SW | 0-2982 | 8, 1 | Fluviatile Terrace | Post Oak Mosaic | 1, 2 | 0-1645

[Legends: Aspect, Land Cover]

Results: Independent Variables
• Elevation:
  – Range: 197-417 feet
  – Mean: 256 feet
  – SD: 39 feet
• Slope:
  – Range: 0.05-3.80%
  – Mean: 1%
  – SD: 0.57%
• Aspect:
  – Mode: SE, SW, S, W
• Drainage Proximity:
  – Range: 0-8251 feet
  – Mean: 2953 feet
  – SD: 1870 feet
• Soil:
  – Mode: Aquic Paleustalfs
    • 75% of NLT points
• Geology:
  – Mode: Wellborn Formation
    • 60.8% of NLT points
    • Manning Formation 28.8%
• Vegetation:
  – Mode: Post Oak Mosaic
    • 77% of NLT points
• Land Cover:
  – Three most common in order: dense veg., medium veg. & sparse veg.
• Forest Proximity:
  – Range: 0-3602 feet
  – Mean: 705 feet
  – SD: 725 feet

Results: 7 Sites
• Significant quantitative variables:
  – Elevation with negative relation
  – Drainage proximity with positive relation
• Significant qualitative variables:
  – SE, S & SW aspects
  – SOIL5 (Aquic Udifluvents)
  – GEOL4 (Simsboro Formation)

Statistical Analysis
• Site three was chosen for an additional logistic regression run because it was the largest site, with the largest number of NLT points and the greatest variability in the independent variables
• The vegetation variable was removed because it contained only one value in site three and could not be included

Results: Site 3
• Overall improved level of significance
• Significant quantitative variables:
  – Elevation with negative relation
  – Drainage proximity with positive relation
• Significant qualitative variables:
  – SE, S & SW aspects
  – SOIL6, 8 & 10

Results: Site 3 Plots and Charts

Results: Site 3 Independent Variable Residual Plots

Results: Probability Surface
• Probability should extend from 0 to 1, but here it only extends to 0.33

Results: Probability Surface
• Gray areas depict the top 30% of probability values
• 65% of known NLT sites fall within these areas
• The top 50% of probability values contains 87.6% of known NLT sites

Conclusion
• Most results reinforce pre-existing research (e.g., elevation, slope, forest proximity)
• Some strongly differed (e.g., soil, geology, drainage proximity)
• Statistical output is not significant enough yet for real world modeling
  – Further refinements needed
• Dataset Applicability:
  – DEM, ASTER and the datasets derived from them, as well as soils, are suitable
  – Reserve vegetation and geology for statewide analysis

Future Improvements
• Compare qualitative values to mean instead of suppressed values
• Include all known NLT points and incorporate others gathered by TPWD
• Improve predictability of the drainage proximity variable by lowering the stream threshold value
• Develop better methods of delineating vegetative types, which may be a crucial variable
• Smooth out surfaces and remove isolated pixels using a moving window to calculate local probability means

Literary References
• Everitt, B.S., Hothorn, T., 2006. A Handbook of Statistical Analyses Using R. Boca Raton: Chapman & Hall/CRC, 83-108.
• Lea, W.A., 2005. USFWS Biological Opinion. Austin: United States Department of the Interior.
• Manly, B., 2001. Statistics for Environmental Science and Management. Boca Raton: Chapman & Hall/CRC, 92-95.
• Manning, C., 2007. Logistic regression (with R).
• Odom, R.H., Ford, W.M., Edwards, J.W., Stihler, C.W., Menzel, J.M., 2001. Developing a habitat model for the endangered Virginia northern flying squirrel in the Allegheny Mountains of West Virginia. Biological Conservation 99: 245-252.
• Osborne, P.E., Alonso, J.C., Bryant, R.G., 2001. Modelling landscape-scale habitat use using GIS and remote sensing: a case study with great bustards.
• Pelchat, C., 2005. Spiranthes parksii Correll-Navasota Ladies’ Tresses. The McAllen International Orchid Society Journal 6 (3): 9-15.
• Sperduto, M.B., Congalton, R.G., 1996. Predicting Rare Orchid (Small Whorled Pogonia) Habitat Using GIS. Photogrammetric Engineering & Remote Sensing 62 (11): 1269-1279.
• Texas Parks and Wildlife, “Navasota Ladies’-tresses (Spiranthes parksii),” Texas Parks and Wildlife, http://www.tpwd.state.tx.us/huntwild/wild/species/navasolt/
• Varnakovida, P., Mohamed, A., Witchakool, S. Spatial Pattern Analysis of Settlement Locations Using Logistic Regression.
• Venables, W.N., Smith, D.M. and the R Development Core Team, 1990. An Introduction to R.
• Wu, X.B., Smeins, F.E., 2000. Multiple-scale habitat modeling approach for rare plant conservation. Landscape and Urban Planning 51: 11-28.

Data Source References
• Land Processes Distributed Active Archive Center, “ASTER L1B Registered Radiance at the Sensor,” USGS, http://edcdaac.usgs.gov/aster/ast_l1b.asp
• NRCS, “Soil Data Mart,” USDA, http://soildatamart.nrcs.usda.gov/
• Texas Parks and Wildlife, “The Vegetation Types of Texas,” Texas Parks and Wildlife, http://www.tpwd.state.tx.us/landwater/land/maps/gis/data_downloads/
• TNRIS, “Geologic Atlas of Texas,” University of Texas, Bureau of Economic Geology, http://www.tnris.state.tx.us/default.aspx
• USGS, “DataPool @ LP DAAC,” Land Processes Active Archive Center, http://lpdaac.usgs.gov/datapool/datapool.asp
• USGS, “The National Map Seamless Server,” USGS, http://seamless.usgs.gov/

Questions?
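
Supplementary sketch: the Statistical Analysis and Probability Surface slides describe fitting a logistic regression and then checking what share of known NLT cells falls inside the highest-probability areas. The Python below is a minimal illustration of that idea, not the project's code. The project used R's GLM; here a plain gradient-ascent fit is substituted so the example needs only the standard library. The data are synthetic, the two standardized predictors and their coefficients are invented for illustration, the function names are hypothetical, and "top 30%" is interpreted as the top 30% of cells ranked by predicted probability, which is an assumption about the slide's wording.

```python
import math
import random


def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Estimate alpha and the betas by batch gradient ascent on the
    log-likelihood (a stand-in for the fitting routine in R's glm)."""
    n, k = len(X), len(X[0])
    alpha, betas = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_a, grad_b = 0.0, [0.0] * k
        for row, target in zip(X, y):
            p = sigmoid(alpha + sum(b * x for b, x in zip(betas, row)))
            err = target - p
            grad_a += err
            for j in range(k):
                grad_b[j] += err * row[j]
        alpha += lr * grad_a / n
        betas = [b + lr * g / n for b, g in zip(betas, grad_b)]
    return alpha, betas


def predict(alpha, betas, row):
    """Predicted presence probability for one grid cell."""
    return sigmoid(alpha + sum(b * x for b, x in zip(betas, row)))


def capture_rate(probs, presence, top_fraction):
    """Share of presence cells that fall in the top fraction of cells
    when cells are ranked by predicted probability (assumed reading)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n_top = max(1, int(round(top_fraction * len(probs))))
    total = sum(presence)
    hits = sum(presence[i] for i in order[:n_top])
    return hits / total if total else 0.0


# Synthetic grid cells: standardized elevation and drainage distance.
# The signs (negative for elevation, positive for drainage distance)
# mirror the direction reported on the results slides; the magnitudes
# are invented.
rng = random.Random(0)
X, y = [], []
for _ in range(400):
    elev, drain = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = sigmoid(-1.5 - 1.0 * elev + 0.8 * drain)
    X.append([elev, drain])
    y.append(1 if rng.random() < p_true else 0)

alpha, betas = fit_logistic(X, y)
probs = [predict(alpha, betas, row) for row in X]
rate_30 = capture_rate(probs, y, 0.30)
rate_50 = capture_rate(probs, y, 0.50)
print("coefficients:", alpha, betas)
print("capture in top 30% / 50% of cells:", rate_30, rate_50)
```

Ranking cells rather than applying a fixed 0.5 cutoff suits a case like the one reported here, where the fitted probabilities top out well below 1: the ranking still shows which areas to survey first.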