Overview
• SDMs: what and why?
• Types of models
• Challenges
– Messy data
– Difficult questions
Species distribution models
[Diagram: GIS data (predictor variables) + species data (response variable) → statistical model → predicted distribution]
- Also known as (sometimes with different meanings):
- Ecological Niche Model (ENM)
- Habitat model
- Climate envelope model
- Occupancy model
- Resource Selection Function
- Correlative model
• Geographic Information System
» Raster
» Pixel
» Grain
Austin MP (1999) Ecography, 22, 465-484.
Example: marine fishes, NZ
• Catch data from 17000 marine research trawls for 120 demersal species
• For predicting to maps (conservation planning: marine reserves)
Black oreo dory Allocyttus niger
Response: catch/no catch or kg catch
Predictors (covariates)
Typical fitted functions…
Black oreo dory Allocyttus niger
[Figure: fitted functions for six predictors (AvgDepth, TempResid, SalResid, SSTGrad, SusPartMatter, ChlaCase2); panel contributions range from 3.1 % to 31.6 %]
Typical predictions…
Predicting with SDMs
Geographic (“BIOTOPE”) space
Environmental (“NICHE”) space
Predictions back in geographic space
Explain
• Used for explanation, understanding, hypotheses..
Image by N. Zimmermann
Predicting with SDMs
• Many species (8.7 million?, most still undescribed; Mora et al. 2011, PLoS Biology)
• Threats: clearing, invasives, climate change
• Maps help: Leathwick et al. 2008 Cons Letters
Predicting to new times and places
Invasive species
Climate change
[Figure: 2000 vs 2070; P(Y=1) plotted against temperature]
Types of models
Modelling methods
ONLY use presence data
Envelope: BIOCLIM
Distance‐based: DOMAIN, Euclidean distance..
[Figure: axes temperature and rainfall]
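The envelope idea can be sketched in a few lines. This is a minimal min–max envelope on two variables, not BIOCLIM as implemented in any package (BIOCLIM also works with percentiles of the presence values, which are omitted here). The function names and the temperature/rainfall values are hypothetical.

```python
def fit_envelope(presences):
    """Return (min, max) limits per variable, using presence records only."""
    n_vars = len(presences[0])
    return [
        (min(row[j] for row in presences), max(row[j] for row in presences))
        for j in range(n_vars)
    ]


def in_envelope(site, limits):
    """True if every variable at the site lies inside its limits."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(site, limits))


# Hypothetical presence records: (temperature in deg C, rainfall in mm)
presences = [(12.0, 800), (14.5, 950), (13.2, 700), (15.1, 1100), (12.8, 900)]
limits = fit_envelope(presences)

print(in_envelope((13.5, 850), limits))  # inside the envelope on both axes
print(in_envelope((20.0, 850), limits))  # warmer than any presence record
```

A site is predicted suitable only if it falls inside the limits on every variable, which is why a single out-of-range predictor is enough to exclude it.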
Modelling methods
Presence‐only data + “background”
‐ e.g. GARP, MaxEnt
Modelling methods
Regression‐type methods
GLMs – Generalized Linear Models
GAMs – Generalized Additive Models
MARS – Multivariate Adaptive Regression Splines
Machine‐learning methods
MaxEnt – Maximum Entropy
GARP ‐ Genetic Algorithms for Rule‐based Predictions
SVMs ‐ Support Vector Machines
CART – Classification and Regression Trees
RFs ‐ Random Forests
BRTs ‐ Boosted Regression Trees
ANNs ‐ Artificial Neural Network models
R: raster, dismo..
Challenge #1: messy data
Atlas of Living Australia (ala.org.au)
ala.org.au, GBIF etc
• Data aggregators
• Now: make data available
• Future: updates / corrections / clean data sets – feedback, inform all users
ALA birds
1980-1990
2000-2013
Thanks to Alejandra Morán-Ordóñez for these images
Example biases…
• Globally: wealth, language, location, security
• Geography
GBIF: Spatial variation in number of records per km²
• Proximity to roads, rivers, towns
• Community groups, favourite spots
• In uncleared areas; within political boundaries
• Preferences: rare and unusual sightings
• Survey “design” - stratified sampling
• Detection biases hidden in here too..
Bias
• The problem:
– Model assumes presence data are a random sample
– Presence data are biased
– Background data are by default unbiased
Effect of bias
Virtual species in Ontario, Canada
– prefers mid‐range of all climatic variables
PA model from biased presence/absence data
Presence‐absence model recovers species distribution
PO‐bg model from biased occurrence data
Model recovers sampling bias, not species distribution
Dealing with bias
• “target group background”
– Use sites for a group of species to select background
– Effect of sampling bias cancels out
• make predictors (rasters) representing bias
– Use to weight background points or as predictors
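One way to picture the background weighting: draw background cells with probability proportional to a measure of survey effort, so that the background carries roughly the same bias as the presences. The sketch below assumes effort is approximated by the number of target-group records per cell. The cell names, counts and function name are hypothetical, and real workflows would operate on rasters rather than a short list.

```python
import random


def sample_background(cells, effort, n, seed=0):
    """Draw n background cells with probability proportional to effort."""
    rng = random.Random(seed)
    return rng.choices(cells, weights=effort, k=n)


# Hypothetical cells and survey effort (e.g. target-group records per cell)
cells = ["near_road", "near_town", "remote_a", "remote_b"]
effort = [50, 30, 1, 0]

background = sample_background(cells, effort, n=1000)
counts = {c: background.count(c) for c in cells}
print(counts)  # heavily surveyed cells dominate; unsurveyed cells never appear
```

Because presences and background are then skewed toward the same places, the model has less opportunity to describe survey effort instead of the species.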
Challenge #2:
Difficult applications
Extrapolating
Invasive species
Climate change
[Figure: 2000 vs 2070]
The cane toad
- native range Americas
- introduced to Aus in 1935 to ‘control’ cane beetles
- now a major pest (poisonous to most native fauna, no predators, over 200 million in Australia)
The cane toad – mechanistic model
Kearney, Phillips et al 2008 Ecography 31: 423-434
Predictions from mechanistic model
The cane toad as a model organism
Kearney and Porter 2009 Ecology Letters 12
Mechanistic model predicts occurrence well
• strong inference on physiological limitations
• “gold standard” (?)
Previous SDMs for the cane toad
Available data
Data
• 8 climatic variables
– rainfall, temperature and humidity
– now; future (4.6ºC warmer)
• 1932 presences (weighted…)
• 575 absences or 10000 background samples
Models
• GLM (generalized linear model; x, x²)
• GAM – smooths up to 4df
• BRT (boosted regression trees)
• MaxEnt
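To illustrate what a GLM with linear and quadratic terms can represent, here is a minimal sketch of a logistic regression on one predictor, fitted by plain gradient ascent. It is not the fitting routine used for the cane toad models. The data, learning rate and iteration count are hypothetical choices for illustration.

```python
import math


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    beta = [0.0] * len(X[0])
    n = len(y)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j, v in enumerate(row):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, x):
    """Predicted probability of presence at predictor value x."""
    eta = beta[0] + beta[1] * x + beta[2] * x * x
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical standardized temperatures with presence (1) / absence (0)
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 0.0, 1.0]
ys = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0]

X = [[1.0, x, x * x] for x in xs]  # intercept, linear and quadratic terms
beta = fit_logistic(X, ys)
print(beta)
print(predict(beta, 0.0), predict(beta, 2.0))
```

A negative coefficient on the quadratic term gives a hump-shaped response: suitability peaks at intermediate values and declines toward both extremes.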
Predicting potential in current climates
[Figure: predictions from the mechanistic model and from GAM, GLM, BRT and MaxEnt; panel labels: Background, Weights]
Correlation with mechanistic model:
GAM 0.80
GLM 0.78
BRT 0.64
MaxEnt 0.79
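A comparison of this kind can be computed cell by cell across two prediction maps. The slides do not say which correlation coefficient was used, so the sketch below assumes Pearson's r. The prediction values are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of predictions."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predicted suitability at the same four cells
mechanistic = [0.1, 0.4, 0.7, 0.9]
correlative = [0.2, 0.3, 0.8, 0.7]

r = pearson(mechanistic, correlative)
print(round(r, 2))
```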
Predicting potential: current
[Figure: panels for GAM, GLM and Mechanistic; labels: Background sample, Observed absence]
1. invasives: choice of “absence” matters
2. know when you’re extrapolating
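A simple way to know when a prediction is an extrapolation is to check each prediction site against the range of predictor values in the training data. The sketch below does only that range check, which is cruder than published novelty measures. The variable names and values are hypothetical, apart from the 4.6 degree warming taken from the slides.

```python
def training_ranges(train_rows):
    """Return (min, max) of each predictor over the training data."""
    n_vars = len(train_rows[0])
    return [
        (min(r[j] for r in train_rows), max(r[j] for r in train_rows))
        for j in range(n_vars)
    ]


def novel_variables(site, ranges, names):
    """Names of predictors whose value at the site is outside the training range."""
    return [
        name
        for x, (lo, hi), name in zip(site, ranges, names)
        if x < lo or x > hi
    ]


names = ["temperature", "rainfall"]
# Hypothetical training sites: (temperature in deg C, rainfall in mm)
train = [(18.0, 600), (22.5, 1200), (25.0, 900), (27.0, 1500)]
ranges = training_ranges(train)

current_site = (26.0, 1000)
future_site = (26.0 + 4.6, 1000)  # same site under 4.6 deg C of warming

print(novel_variables(current_site, ranges, names))  # no novel predictors
print(novel_variables(future_site, ranges, names))   # temperature is novel
```

Predictions at flagged sites depend on how each method behaves beyond the data, so they deserve more caution than predictions inside the sampled range.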
Future: 4.6° warmer
[Figure: current and future predictions for GAM, GLM, BRT, MaxEnt and the mechanistic model]
Controlling the fitted functions..
[Figure: mechanistic model compared with BRT and MaxEnt fits]
BRT: 2900 trees; 150 trees (same learning rate & tree depth)
MaxEnt: only hinge features, standard regularization; only hinge features, strong regularization
3. how you fit model matters
GLMs
GAMs
how to extrapolate?
Microsoft
www.eBird.org