Predictive Habitat Suitability Model
for the Endangered
Navasota Ladies’-Tresses
Patrick Young
GISC 6389 Masters Project
May 8, 2008
Discussion Topics
• Subject background
– Introduce plant
– Describe surveying obstacles
• Project purpose
– Test model using public data
– Determine real world
applicability value
• Literature review
– Criteria: discuss suitable
habitat
– Habitat modeling
– Logistic regression analyses
• Data acquisition
– NLT data
– Defining study areas
– Discuss variable set 1
• Data creation
– Describe methodology
(programs, tools, etc)
– Discuss variable set 2
• Statistical input preparation
– Prepare input table of all
dependent and independent
variables
• Statistical analysis
– Run logistic regression in R
– Discuss changes made to
improve model
• Results & interpretation
– Present R results
– Present scatter plots &
histograms
• Conclusion
– Recommendation for real world
application
– Future improvements
Navasota Ladies’-tresses
Subject Background: Spiranthes parksii,
Navasota Ladies’-Tresses (NLT)
• Endangered native East Texas orchid
• Perennial herb, 8-15 inches tall that produces a
single spike of tiny white flowers
• Produces a basal rosette of leaves in early
spring that die before the plant flowers from mid-October to mid-November
•Texas Parks and Wildlife, “Navasota Ladies’-tresses (Spiranthes parksii),”
Texas Parks and Wildlife, http://www.tpwd.state.tx.us/huntwild/wild/species/navasolt/
Subject Background: Surveying
• Must be surveyed in certain counties to obtain
construction permits
• Can only be identified when in bloom but blooming
does not occur every year
• Often immediately eaten by wildlife
• Very similar in
appearance to
Spiranthes cernua
(not endangered)
(Photos: Spiranthes parksii and Spiranthes cernua)
Purpose
• To determine whether GIS and
statistical-based methods can be used
to accurately model NLT habitat at the
landscape-scale using publicly available
datasets
• Results from this study will help
determine how GIS and readily
available datasets can be best used to
aid field surveyors
Literature Review: NLT Habitat
• Pelchat, C., 2005. Spiranthes parksii
Correll-Navasota Ladies’ Tresses.
– Discusses rediscovery, range/habitat, and
morphology of NLT
• Lea, W.A., 2005. USFWS Biological
Opinion.
– Written to federal agencies to propose
changes to roadway plans to avoid NLT
impacts
– Describes NLT life history, population
distribution and status
Literature Review: Habitat Modeling
• Sperduto, M.B., Congalton, R.G., 1996. Predicting Rare Orchid (Small Whorled
Pogonia) Habitat Using GIS.
– Determined if landscape-level characteristics were associated with known sites
– Utilized a GIS model to identify combinations of important habitat characteristics to
locate other existing habitat
– Utilized both equal weight and chi-square models and found 57% and 78% correct
predictability, respectively
• Wu, X.B., Smeins, F.E., 2000. Multiple-scale habitat modeling approach for rare
plant conservation.
– Studied availability and usefulness of available data to model habitat for eight rare
plants in southern Texas on three different scales: regional, landscape and site
– Developed the regional-scale model to predict plant distribution using coarse-scale
GIS data, overlapping it with the plant distributions to create areas of likelihood
– Created the site-scale model for field assessment in a study area. It combined
existing literature with field investigation to determine soils, landform and
vegetation. It proved the most effective but also the most costly.
– Created the landscape-scale model by combining existing detailed datasets with
ones developed by the researcher. It proved effective for generating spatial
distributions of potential and present habitat suitability over large project areas.
• Odom, R.H., Ford, W.M., Edwards, J.W., Stihler, C.W., Menzel, J.M., 2001.
Developing a habitat model for the endangered Virginia northern flying
squirrel in the Allegheny Mountains of West Virginia.
– Many of the variables studied (elevation, landform index, surface curvature, slope,
aspect and distance to coniferous forest cover) were the same as mine.
– Statistical approach used non-parametric Wilcoxon tests between sites with known
habitat, sites without and random points as well as logistic regression.
Literature Review: Logistic Regression
Modeling
• Manly, B., 2001. Statistics for Environmental Science and
Management.
– Introduces generalized linear models and the formula used
to fit them
• Osborne, P.E., Alonso, J.C., Bryant, R.G., 2001. Modelling
landscape-scale habitat use using GIS and remote sensing: a
case study with great bustards.
– Studied vegetation, terrain characteristics and human
disturbance to determine bustard distribution
– Performed multivariate analyses using forward stepwise
logistic regression
– The study correctly predicted 90.0% of occupied sites and
78.9% of unoccupied sites based on a 50% probability
dichotomy
• Everitt, B.S., Hothorn, T., 2006. A Handbook of Statistical
Analyses Using R.
– In depth discussion of logistic regression
– Sample issues studied and R code
Literature Review:
Suitable Habitat Criteria
• Geology – Tertiary Eocene sands and Quaternary Holocene sandy
depositional sites
• Soil – acid loamy sands in upland areas to acid loamy sands and
clays in bottomland areas
• Vegetation – well-developed Post Oak Savannah woods, mixed
pine/oak woods
• Forest Proximity – near small, natural openings in woodlands, rarely
in open areas dominated by grasses
• Drainage – within 600 feet of drainage areas representing natural
disturbance
• Elevation – between 200 and 300 feet above sea level
• Slope – flat to gently sloping
• Aspect – no pre-existing documentation
• Precipitation – blooming is dependent on rainfall in April/May and
again in August/September
• Temperature – no pre-existing documentation
References:
– Pelchat, C., 2005. Spiranthes parksii Correll-Navasota Ladies’
Tresses.
– Lea, W.A., 2005. USFWS Biological Opinion.
Data Acquisition: Dependent Variable
• NLT Locations
– Dependent variable
– GPS points provided
by HDR Engineering
– 670 points
– Acquired using
submeter Trimble
GeoXT GPS
– Collected between
2001-2006
Study Areas
• Seven sites were
chosen both for
their number of
points and their
density of points
• Boundaries of field
surveys were not
available, so the
assumption was
made to expand the
study area 100 feet
around the extent of
the NLT points.
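The 100-foot expansion described above can be sketched in a few lines. This is a minimal illustration, assuming a rectangular extent and point coordinates already in feet; the coordinates and function names are hypothetical and are not taken from the project data.

```python
def buffered_extent(points, buffer_dist=100.0):
    """Return (xmin, ymin, xmax, ymax) of the points, grown by buffer_dist on every side."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - buffer_dist, min(ys) - buffer_dist,
            max(xs) + buffer_dist, max(ys) + buffer_dist)

def acreage(extent):
    """Area of a rectangular extent in acres (1 acre = 43,560 square feet)."""
    xmin, ymin, xmax, ymax = extent
    return (xmax - xmin) * (ymax - ymin) / 43560.0

# Hypothetical NLT point coordinates in feet (not the HDR Engineering data).
nlt_points = [(1000.0, 2000.0), (1250.0, 2100.0), (1100.0, 2400.0)]
extent = buffered_extent(nlt_points)
print(extent, round(acreage(extent), 2))
```

A rectangular extent is the simplest reading of "the extent of the NLT points"; a convex hull or per-point buffer would give smaller study areas.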
Site   Acreage   NLT Points
1         6.71           62
2        91.77          148
3       653.71          331
4        53.40           14
5        39.46           20
6        13.57           37
7       468.36           17
Study Areas
Site 3
• Dimensions of study
areas were chosen to be
divisible by the analysis
resolution (30 meters)
– 30 meter resolution was
chosen to match the DEM
and ASTER data
resolution which
accounted
• 30 meter grids were
generated for each study
area using the Hawth’s
Tools extension to
ArcGIS
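As a rough sketch of the grid setup described above, the snippet below grows an extent outward until its width and height are whole multiples of the 30 meter cell size, then lists the cell bounds. The project used Hawth's Tools in ArcGIS; the function names and the sample extent here are assumptions for illustration only.

```python
import math

CELL = 30.0  # analysis resolution in meters

def snap_extent(xmin, ymin, xmax, ymax, cell=CELL):
    """Grow the extent to the upper right so both dimensions are whole multiples of cell."""
    cols = math.ceil((xmax - xmin) / cell)
    rows = math.ceil((ymax - ymin) / cell)
    return xmin, ymin, xmin + cols * cell, ymin + rows * cell, cols, rows

def grid_cells(xmin, ymin, cols, rows, cell=CELL):
    """Return (xmin, ymin, xmax, ymax) bounds for every cell, row by row."""
    return [(xmin + c * cell, ymin + r * cell,
             xmin + (c + 1) * cell, ymin + (r + 1) * cell)
            for r in range(rows) for c in range(cols)]

# Hypothetical study-area extent in meters.
snapped = snap_extent(0.0, 0.0, 100.0, 65.0)
cells = grid_cells(snapped[0], snapped[1], snapped[4], snapped[5])
```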
Data Acquisition: Independent Variables
Variable           Source                                     Org.  Scale      Date Collected  Format     Website
Elevation          National Elevation Dataset                 USGS  30 meter   various         Grid       http://seamless.usgs.gov/
Geology            Geologic Atlas of Texas                    USGS  1:250,000  1961-1987       Shapefile  http://www.tnris.state.tx.us/
Soil               Soil Survey Geographic Database            NRCS  1:24,000   (not given)     Shapefile  http://soildatamart.nrcs.usda.gov/
Vegetation         The Vegetation Types of Texas              TPWD  1:250,000  1972-1980       Shapefile  http://www.tpwd.state.tx.us
Satellite Imagery  Land Processes Distributed Active Archive  USGS  30 meter   2006            HDF-EOS    http://edcdaac.usgs.gov
Data Acquisition: Elevation
Data Acquisition: Geology
Data Acquisition: Vegetation
Data Acquisition: Soils Pre-processing
• There were 23 different soil
series types within the seven
study areas
– 23 classes were too many to
expect any one or two of them
to show a significant relationship
with the NLT sites
– These were first dissolved into
their soil series and then
dissolved into their taxonomic
classes
– Condensed into 11 classes
(still high)
Soil Classes
Code  Taxonomic Class              Soil Series
1     Udic Paleustalfs             Arol, Chazos, Shiro, Singleton
2     Udertic Paleustalfs          Axtell
3     Chromic Vertic Albaqualfs    Boonville
4     Ultic Paleustalfs            Burlewash
5     Aquic Paleustalfs            Falba
6     Aquic Udifluvents            Hatliff
7     Lithic Ustorthents           Koether-Rock
8     Oxyaquic Vertic Paleustalfs  Lufkin, Mabank, Tabor
9     Grossarenic Paleustalfs      Padina
10    Aquic Arenic Paleustalfs     Robco
11    Aquic Haplustepts            Uhland
12    Water
Data Acquisition: Soils
Data Acquisition:
Independent Variables Not Included
Annual Precipitation
Average Annual Temperature
Data Creation: Slope & Aspect
• Slope
– Use Spatial Analyst extension to derive from DEM
– 30 meter output resolution
– Percentage units
• Aspect
– Use Spatial Analyst extension to derive from DEM
– 30 meter output resolution
– Reclassified into categories:
Data Creation: Aspect
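The aspect categories themselves are shown only on the slide image, so the sketch below assumes the usual eight compass classes plus a "none" class for flat cells (the direction names match those used in the results tables). The function name and the 45-degree class boundaries are assumptions, not the project's documented reclassification.

```python
DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def classify_aspect(degrees):
    """Map an aspect in degrees clockwise from north to a compass class.

    Negative values (ArcGIS writes -1 for flat cells) map to 'none'.
    """
    if degrees < 0:
        return "none"
    # Each class spans 45 degrees centred on its direction, hence the 22.5 shift.
    index = int(((degrees % 360) + 22.5) // 45) % 8
    return DIRECTIONS[index]

# Hypothetical aspect values in degrees.
print([classify_aspect(a) for a in (-1, 0, 135, 200, 350)])
```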
Data Creation: Drainage Proximity
Fill Sinks → Flow Direction → Flow Accumulation → Stream Definition →
Stream Segmentation → Euclidean Distance
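The last two ideas in this workflow, defining streams from a flow accumulation threshold and measuring Euclidean distance to them, can be illustrated on a toy grid. This is a brute-force sketch with made-up accumulation values and a hypothetical threshold; it does not reproduce the ArcGIS tools used in the project.

```python
import math

CELL = 30.0  # cell size in meters

def define_streams(flow_accumulation, threshold):
    """Flag cells whose upstream cell count meets the stream threshold."""
    return [[1 if acc >= threshold else 0 for acc in row]
            for row in flow_accumulation]

def euclidean_distance_grid(stream, cell=CELL):
    """Distance from each cell to the nearest stream cell, in map units.

    Brute force over all stream cells, which is fine for small grids.
    """
    targets = [(r, c) for r, row in enumerate(stream)
               for c, flag in enumerate(row) if flag]
    if not targets:
        raise ValueError("no stream cells in grid")
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell
             for c in range(len(row))]
            for r, row in enumerate(stream)]

# Hypothetical flow accumulation counts and threshold.
flow_acc = [[0, 1, 5],
            [0, 2, 9],
            [0, 0, 1]]
stream = define_streams(flow_acc, threshold=5)
dist = euclidean_distance_grid(stream)
```

Lowering the threshold flags more cells as streams, which shortens the distances everywhere on the grid.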
Data Creation: Drainage Proximity
Data Creation: Land Cover
• Reclassified ASTER imagery using ENVI operations: Layer Stack, NDVI,
Principal Components, Band Ratios & Supervised Maximum Likelihood
Classification
• Chose Unsupervised IsoData Classification with 6 classes and 3 iterations
(Images: CIR 3-2-1 composite and IsoData classification)
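Of the operations listed, NDVI is the easiest to show directly. The sketch below computes it per pixel from near-infrared and red values; for ASTER these are bands 3N and 2. The sample reflectance values are invented, and the project ran this step in ENVI rather than in code.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    Returns 0.0 when both bands are zero to avoid dividing by zero.
    """
    total = nir + red
    return (nir - red) / total if total else 0.0

def ndvi_grid(nir_band, red_band):
    """Apply ndvi cell by cell to two equally sized grids."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]

# Hypothetical reflectance values: dense vegetation, bare ground, no data.
nir_band = [[0.50, 0.30, 0.0]]
red_band = [[0.10, 0.30, 0.0]]
result = ndvi_grid(nir_band, red_band)
```

Values near 1 indicate dense green vegetation and values near 0 indicate bare or sparsely vegetated ground.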
Data Creation: Land Cover
Data Creation: Forest Proximity
– Reclassify ASTER images in ENVI using
supervised classification
(Images: NDVI-4-3 composite and supervised parallelepiped reclassification)
Data Creation: Forest Proximity (cont’d)
• Convert the reclassed ENVI output to
an ESRI grid
• Convert the raster to polygon
• Delete all non-forest polygons
• Perform Euclidean Distance with an
output cell size of 30 meters
Data Creation: Forest Proximity
Statistical Input Preparation
• Framework: Convert the 30 meter study
area grid cells to centroid points
• Dependent Variables: Use a copy of the
study area grids to assign an NLT
presence value to the cells intersecting
NLT points
• Independent Variables:
– Convert the floating point raster grids to
integer
• Multiply slope by 100 to preserve decimals
– Convert raster to polygon
Statistical Input Preparation (cont’d)
• Code categorical variables (i.e., aspect, soil,
geology, vegetation & land cover)
• Combine all variables using the Intersect tool
with the centroids as the primary input
• Remove records containing habitat where
NLT cannot grow (i.e., soil = water, land
cover = cloud cover or water)
• Export to a dbf table and convert to a csv file
ID  NLT  ELEV  SLOPE  ASPECT  DRAIN_PROX  SOIL  GEOL  VEG  LAND_COV  FOREST_PROX
1   0    87    55     3       849         4     3     3    2         358
2   0    87    60     3       819         4     3     3    4         314
3   0    87    76     3       801         4     3     3    4         281
4   0    87    86     3       795         4     3     3    4         222
5   0    88    116    4       801         4     3     3    1         198
6   0    88    137    4       819         4     3     3    1         198
7   0    89    137    4       849         4     3     3    4         222
8   0    89    128    4       889         4     3     3    4         281
9   0    90    113    3       937         4     3     3    2         298
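The preparation steps above (scale slope by 100, drop cells where NLT cannot grow, export to csv) can be sketched as follows. The column names come from the input table and soil code 12 is water in the soil class table; the land cover codes for cloud and water are hypothetical, as are the sample cells and function names. The project did this work with ArcGIS tools and a dbf export.

```python
import csv
import io

FIELDS = ["ID", "NLT", "ELEV", "SLOPE", "ASPECT", "DRAIN_PROX",
          "SOIL", "GEOL", "VEG", "LAND_COV", "FOREST_PROX"]
WATER_SOIL = 12               # soil code 12 is water in the soil class table
EXCLUDED_LAND_COVER = {5, 6}  # hypothetical codes for cloud cover and water

def prepare_rows(cells):
    """Drop cells where NLT cannot grow and store slope as an integer (percent x 100)."""
    rows = []
    for cell in cells:
        if cell["SOIL"] == WATER_SOIL or cell["LAND_COV"] in EXCLUDED_LAND_COVER:
            continue
        row = dict(cell)
        row["SLOPE"] = int(round(cell["SLOPE"] * 100))
        rows.append(row)
    return rows

def to_csv(rows):
    """Serialize rows to csv text with the columns in FIELDS order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical cells; the first mirrors row 1 of the table with slope still in percent.
cells = [
    {"ID": 1, "NLT": 0, "ELEV": 87, "SLOPE": 0.55, "ASPECT": 3, "DRAIN_PROX": 849,
     "SOIL": 4, "GEOL": 3, "VEG": 3, "LAND_COV": 2, "FOREST_PROX": 358},
    {"ID": 2, "NLT": 0, "ELEV": 85, "SLOPE": 0.10, "ASPECT": 1, "DRAIN_PROX": 0,
     "SOIL": 12, "GEOL": 3, "VEG": 3, "LAND_COV": 2, "FOREST_PROX": 400},
]
csv_text = to_csv(prepare_rows(cells))
```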
Statistical Analysis
• Logistic regression predicts the probability of
an event occurring, in this case, the probability
of finding NLT at a location
• It allows one binary dependent variable and
multiple independent variables that need not
share a data type (continuous or categorical)
• Through maximum likelihood estimation,
logistic regression fits the following
formula for the probability Θ:
$$\Theta = \frac{e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i}}{1 + e^{\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i}}$$
α is the constant of the equation and each β is the
coefficient of a predictor variable x
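To make the formula concrete, the sketch below evaluates it for a given constant, coefficient list and predictor values. The coefficient values in the example are invented for illustration and are not results from this study; the function name is also an assumption.

```python
import math

def logistic_probability(alpha, betas, xs):
    """Evaluate e^z / (1 + e^z) where z = alpha + sum of beta_i * x_i."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    # Use the algebraically equivalent form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical constant and coefficients for two predictors.
p = logistic_probability(-1.0, [0.5, 0.25], [2.0, 4.0])  # z = 1
print(round(p, 3))
```

Whatever the inputs, the result stays between 0 and 1, which is why the formula suits a presence/absence response.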
Statistical Analysis
• The R statistical program was chosen to perform the
statistical analysis
• It “is an integrated suite of software facilities for data
manipulation, calculation and graphical display”
• Categorical variables were entered as factors so that
each value would be considered independently
• Use GLM (generalized linear model)
• Plot residuals (i.e., the difference between the
observed values of the response and the fitted values
of the response) against fitted values of entire study
area as well as each independent variable
• Create output table of probability values to display in
ArcGIS and compare to the known NLT locations
Everitt, B.S., Hothorn, T., A Handbook of Statistical Analyses Using R.
(Boca Raton: Chapman & Hall/CRC, 2006), 83-108.
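The analysis itself was run with R's GLM; the Python sketch below only illustrates two of the ideas above: entering a categorical variable as a factor (one 0/1 column per level, with the first level as the baseline) and estimating coefficients by maximum likelihood. It uses plain gradient ascent instead of the iteratively reweighted least squares that R uses, and the eight observations are invented.

```python
import math

def dummy_code(value, levels):
    """Treatment coding: one 0/1 column per level except the first (the baseline)."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]

def fit_logistic(X, y, rate=0.5, iterations=2000):
    """Maximum likelihood fit by gradient ascent; returns [intercept, coef_1, ...]."""
    coefs = [0.0] * (len(X[0]) + 1)
    for _ in range(iterations):
        grad = [0.0] * len(coefs)
        for row, target in zip(X, y):
            z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))
            err = target - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, x in enumerate(row, start=1):
                grad[j] += err * x
        # Step along the average gradient of the log-likelihood.
        coefs = [b + rate * g / len(X) for b, g in zip(coefs, grad)]
    return coefs

# Hypothetical data: NLT present in 1 of 4 north-facing and 3 of 4 south-facing cells.
levels = ["N", "S"]
aspects = ["N"] * 4 + ["S"] * 4
presence = [1, 0, 0, 0, 1, 1, 1, 0]
X = [dummy_code(a, levels) for a in aspects]
intercept, coef_s = fit_logistic(X, presence)
```

With a single factor the fit reproduces the observed rates: the intercept is the log-odds for the baseline level N, and the S coefficient is the difference in log-odds between S and N.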
Statistical Analysis
• Changes implemented to improve
model:
– Study areas condensed to remove
unsurveyed area
– Original three sites were broken into seven
sites to further decrease unsurveyed area
and increase point density
– Soil type classifications reduced from 23 to 11
– A stepwise regression (removing non-significant variables) was tested but did not
significantly improve output results
Results: Independent Variables
Site  NLT Points  Elevation  Slope      Aspect     Drainage Proximity  Soil  Geology             Vegetation       Land Cover  Forest Proximity
1     11          124-128    0.08-1.26  N, NW      1406-2039           9     Simsboro            Post Oak Woods   cloud       222-855
2     38          79-92      0.08-2.31  NE, E, SE  0-2049              4     Manning             Post Oak Woods   2, 3        0-506
3     157         66-87      0.02-3.80  none       0-2150              4     Wellborn            Post Oak Mosaic  2, 3, 4     0-937
4     13          79-86      0.08-1.90  E, SE, S   0-979               1, 4  Manning             Post Oak Mosaic  2, 3, 4     0-444
5     9           72-82      0.12-1.29  E, SE, S   0-1124              1, 4  Manning             Post Oak Mosaic  2, 4        0-579
6     8           72-77      0.05-1.49  E, SE      298-1056            4     Manning             Other            1, 4        222-843
7     14          60-74      0.00-2.71  SE, S, SW  0-2982              8, 1  Fluviatile Terrace  Post Oak Mosaic  1, 2        0-1645
(Legends on slide: Aspect, Land Cover)
Results: Independent Variables
• Elevation:
– Range: 197-417 feet
– Mean: 256 feet
– SD: 39 feet
• Slope:
– Range: 0.05-3.80%
– Mean: 1%
– SD: 0.57%
• Aspect:
– Mode: SE, SW, S, W
• Drainage Proximity:
– Range: 0-8251 feet
– Mean: 2953 feet
– SD: 1870 feet
• Soil:
– Mode: Aquic Paleustalfs
• 75% of NLT points
• Geology:
– Mode: Wellborn Formation
• 60.8% of NLT points
• Manning Formation 28.8%
• Vegetation:
– Mode: Post Oak Mosaic
• 77% of NLT points
• Land Cover:
– Three most common in order:
dense veg., medium veg. &
sparse veg.
• Forest Proximity:
– Range: 0-3602 feet
– Mean: 705 feet
– SD: 725 feet
Results: 7 Sites
• Significant quantitative
variables:
– Elevation with negative
relation
– Drainage proximity with
positive relation
• Significant qualitative
variables:
– SE, S & SW aspects
– SOIL5 (Aquic
Udifluvents)
– GEOL4 (Simsboro
Formation)
Statistical Analysis
• Site three was chosen to perform an
additional logistic regression run
because it was the largest site with the
largest number of NLT sites and the
greatest amount of independent
variability
• The vegetation variable was removed
because it only contained one value in
site three and could not be included
Results: Site 3
• Overall improved
level of significance
• Significant
quantitative variables:
– Elevation with
negative relation
– Drainage proximity
with positive relation
• Significant qualitative
variables:
– SE, S & SW aspects
– SOIL6, 8 & 10
Results: Site 3 Plots and Charts
Results: Site 3 Independent Variable
Residual Plots
Results: Probability Surface
• Probability should
extend from 0 to
1, but here it only
extends to 0.33
Results: Probability Surface
• Gray areas depict
the top 30% of
probability values
• 65% of known
NLT sites fall
within these areas
• The top 50% of
probability values
contains 87.6% of
known NLT sites
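A check of this kind can be sketched as below. The slide does not say whether "top 30%" refers to cells ranked by probability or to the upper part of the value range; this sketch assumes ranked cells. The probabilities, presence flags and function name are invented for illustration.

```python
def capture_rate(probabilities, presence, top_fraction):
    """Share of presence cells whose probability ranks in the top fraction of all cells."""
    ranked = sorted(probabilities, reverse=True)
    n_top = max(1, int(round(top_fraction * len(ranked))))
    cutoff = ranked[n_top - 1]
    present = [p for p, flag in zip(probabilities, presence) if flag]
    if not present:
        raise ValueError("no presence cells")
    return sum(p >= cutoff for p in present) / len(present)

# Hypothetical probability surface (flattened) and known-presence flags.
probs = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.33]
known = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
top30 = capture_rate(probs, known, 0.30)
top50 = capture_rate(probs, known, 0.50)
```

Because the cutoff is relative, the check still works when the fitted probabilities never approach 1.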
Conclusion
• Most results reinforce pre-existing research
(e.g., elevation, slope, forest proximity)
• Some strongly differed (e.g., soil, geology,
drainage proximity)
• Statistical output is not significant enough yet
for real world modeling
– Further refinements needed
• Dataset Applicability:
– DEM, ASTER and the datasets derived from them
as well as soils are suitable
– Reserve vegetation and geology for statewide
analysis
Future Improvements
• Compare qualitative (categorical) values to the mean instead
of to the suppressed baseline values
• Include all known NLT points and incorporate
others gathered by TPWD
• Improve predictability of the drainage proximity
variable by lowering the stream threshold value
• Develop better methods of delineating
vegetative types, which may be a very crucial
variable
• Smooth out surfaces and remove isolated
pixels using a moving window to calculate
local probability means
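The moving-window idea in the last bullet can be sketched as a simple mean filter. This is one possible reading, assuming a square window clipped at the grid edges; the window size, function name and sample surface are hypothetical.

```python
def moving_window_mean(grid, radius=1):
    """Replace each cell with the mean of its square neighbourhood, clipped at the edges."""
    rows, cols = len(grid), len(grid[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed

# Hypothetical probability surface with one isolated high pixel.
surface = [[0.0, 0.0, 0.0],
           [0.0, 0.9, 0.0],
           [0.0, 0.0, 0.0]]
smoothed = moving_window_mean(surface)
```

The isolated pixel is spread across its neighbours, so single-cell spikes no longer stand out on the probability map.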
Literary References
• Everitt, B.S., Hothorn, T., A Handbook of Statistical Analyses Using R. (Boca
Raton: Chapman & Hall/CRC, 2006), 83-108.
• Lea, W.A., 2005. USFWS Biological Opinion. Austin: United States Department of
the Interior.
• Manly, B., Statistics for Environmental Science and Management. (Boca Raton:
Chapman & Hall/CRC, 2001), 92-95.
• Manning, C., 2007. Logistic regression (with R).
• Odom, R.H., Ford, W.M., Edwards, J.W., Stihler, C.W., Menzel, J.M., 2001.
Developing a habitat model for the endangered Virginia northern flying squirrel in
the Allegheny Mountains of West Virginia. Biological Conservation 99 (2001): 245-252.
• Osborne, P.E., Alonso, J.C., Bryant, R.G., 2001. Modelling landscape-scale
habitat use using GIS and remote sensing: a case study with great bustards.
• Pelchat, C., 2005. Spiranthes parksii Correll-Navasota Ladies’ Tresses. The
McAllen International Orchid Society Journal 6 (3): 9-15.
• Sperduto, M.B., Congalton, R.G., 1996. Predicting Rare Orchid (Small Whorled
Pogonia) Habitat Using GIS. Photogrammetric Engineering & Remote Sensing 62
(11): 1269-1279.
• Texas Parks and Wildlife, “Navasota Ladies’-tresses (Spiranthes parksii),” Texas
Parks and Wildlife, http://www.tpwd.state.tx.us/huntwild/wild/species/navasolt/
• Varnakovida, P., Mohamed, A., Witchakool, S. Spatial Pattern Analysis of
Settlement Locations Using Logistic Regression.
• Venables, W.N., Smith, D.M. and the R Development Core Team, 1990. An
Introduction to R.
• Wu, X.B., Smeins, F.E., 2000. Multiple-scale habitat modeling approach for rare
plant conservation. Landscape and Urban Planning 51 (2000): 11-28.
Data Source References
• Land Processes Distributed Active Archive Center,
“ASTER L1B Registered Radiance at the Sensor,”
USGS, http://edcdaac.usgs.gov/aster/ast_l1b.asp
• NRCS, “Soil Data Mart,” USDA,
http://soildatamart.nrcs.usda.gov/
• Texas Parks and Wildlife, “The Vegetation Types of
Texas,” Texas Parks and Wildlife,
http://www.tpwd.state.tx.us/landwater/land/maps/gis/data_downloads/
• TNRIS, “Geologic Atlas of Texas”, University of Texas,
Bureau of Economic Geology,
http://www.tnris.state.tx.us/default.aspx
• USGS, “DataPool @ LP DAAC,” Land Processes Active
Archive Center,
http://lpdaac.usgs.gov/datapool/datapool.asp
• USGS, “The National Map Seamless Server,” USGS,
http://seamless.usgs.gov/
Questions?