Ecology, 94(3), 2013, pp. 660–670
© 2013 by the Ecological Society of America
Stochastic species distributions are driven by organism size
JANNE SOININEN,1,3 JENNI J. KORHONEN,2 AND MISKA LUOTO1
1 Department of Geosciences and Geography, P.O. Box 64, FI-00014 University of Helsinki, Finland
2 Department of Environmental Sciences, P.O. Box 65, FI-00014 University of Helsinki, Finland
Abstract. The strengths of environmental drivers and biotic interactions are expected to
show large variability across organism groups. We tested two ideas related to the degree of
ecological determinism vs. stochasticity using a large data set comprising bacterio-, phyto-,
and zooplankton. We expected that (1) there are predictable, size-driven differences in the
degree to which planktonic taxa respond to different drivers such as water chemistry, biotic
interactions, and climatic variables; and (2) species distribution models show lowest predictive
performance for the smallest taxa due to the stochastic distributions of microbes. Generalized
linear models (GLMs), generalized additive models (GAMs), and generalized boosted
methods (GBMs) were constructed for 84 species to model their occurrence as a function of
eight predictors. Predictive performance was measured as the area under the curve (AUC) of
the receiver–operating characteristic plot and true skill statistic (TSS) using independent
model evaluation data. We found that the model performances were typically remarkably low
for all planktonic groups. The proportion of satisfactory models (AUC > 0.7) was lowest for
bacteria (11.1% of the models), followed by phyto- (24.2%) and zooplankton (38.1%). The
occurrences of taxa within all planktonic groups were related to climatic variables to a certain
degree, but bacteria showed the strongest associations with the climatic variables. Moreover,
zooplankton occurrences were more related to biotic variables than the occurrences of smaller
taxa, while phytoplankton occurrences were more related to water chemistry. We conclude
that the occurrences of planktonic taxa are highly unpredictable and that stochasticity in
occurrences is negatively related to the organism size perhaps due to efficient dispersal and fast
population dynamics among the smallest taxa.
Key words: bacteria; dispersal; generalized additive models; generalized boosted methods; Finland;
lakes; occurrence; plankton; trophic structure.
INTRODUCTION
Biological communities are assembled both by deterministic and stochastic processes (Chase and Leibold
2003). Here, we define deterministic processes as species
interactions and environmental filtering, and stochastic
processes as ecological drift and random dispersal. The
degree of ecological determinism has recently been
shown to vary substantially between small- and large-bodied species (Farjalla et al. 2012). First, Farjalla et al.
(2012) documented that habitat filtering strengthened
with organism body size in aquatic communities. The
reasons for such an outcome may be that (1) larger taxa
have lower plasticity in their fundamental niches and
that (2) larger taxa are more selective in their dispersal
(Farjalla et al. 2012). Second, they showed that the
degree of species associations showed predictable
variation among different-sized taxa; macroinvertebrates and zooplankton tend to exhibit more frequent
exclusions among the species pairs than bacteria,
perhaps indicating stronger competitive interactions
between the species among the larger taxa.
Manuscript received 11 May 2012; revised 10 October 2012; accepted 22 October 2012. Corresponding Editor: J. H. Grabowski.
3 E-mail: janne.soininen@helsinki.fi
Often the interest is not on a whole biotic community,
but on a distribution of a single species. Communities
comprise multiple interacting species, which also tend to
respond independently to the range of abiotic variables.
Even if species interactions are important for the
distribution of species, species’ responses to abiotic
variables have been widely used for explaining and
predicting their distributions. In fact, most species
distribution models are still based solely on contemporary environmental covariates and the importance of
biotic interactions for the species distributions has been
investigated only recently (Araújo and Luoto 2007,
Preston et al. 2008, Gotelli et al. 2010). Biotic
interactions may, however, have a notable influence on
species occupancy even at large spatial scales (Gotelli et
al. 2010) and accounting for them can increase the
predictive power of the models.
As mentioned, the strength of biotic interactions and
habitat association are bound to show large variability
across organisms. We tested two ideas related to the
degree of ecological determinism vs. stochasticity using a
large data set comprising lake bacterio-, phyto-, and
zooplankton (Soininen et al. 2011, Soininen and Luoto
2012).
March 2013
STOCHASTICITY AND ORGANISM SIZE
We expected that there would be predictable, size-driven differences in the degree to which planktonic
communities respond to different drivers. Planktonic
groups encompass a gradient in body size of over four
orders of magnitude, with size increasing in the order:
bacteria (length ~1–10 μm) < phytoplankton (~10–100
μm) < zooplankton (~100–2000 μm). We expected that
bacteria would show only a relatively weak relationship
to environmental variables because of their stochastic
distributions and that they would also be almost
immune to large-scale geographical variables due to
their efficient random dispersal (Fig. 1; Finlay 2002, Van
der Gucht et al. 2007, Soininen et al. 2011, Farjalla et al.
2012).
Phytoplankton, in turn, would show the strongest
relationship to environmental variables among the
planktonic groups. This agrees with an idea that, as
phytoplankton species occupy a low trophic level, they
are under relatively strict environmental control like the
other autotrophic organisms, as they need nutrients and
light for their production (Paszkowski and Tonn 2000,
Soininen et al. 2007). We further expected that
zooplankton would respond most strongly of all planktonic
groups to food web interactions and geographical
variables. We expected the former results because
zooplankton graze on phytoplankton, and are thus
suggested to be more strongly regulated by food web
interactions (i.e., their energy comes directly from the
lower trophic level) than autotrophic organisms that
respond more readily to chemical variables (McQueen et
al. 1989). We expected the latter results because of
zooplankton’s larger size and less likely dispersal across
sites when compared with bacteria or phytoplankton,
which are likely to disperse passively more efficiently
across sites.
We also expected that the species distribution models
would show lowest explanatory and predictive power for
bacteria taxa, while the predictive performance would be
substantially higher for the phytoplankton and highest
for zooplankton. This prediction is based on the idea
that ecological determinism increases with the body size
of the organisms (Beisner et al. 2006, Shurin et al. 2009,
Farjalla et al. 2012), because the smallest taxa disperse
randomly and have a short life cycle, which results in faster
population dynamics (Finlay 2002, Brown et al. 2004).
To our knowledge, these ideas have not been tested
before on the taxon level using large-scale field data
subject to natural variability in the driving forces of
community variation. The novelty of this approach is
due to the fact that it explicitly shows the taxon-specific
responses to driving abiotic and biotic variables, unlike
many multivariate statistics that fail to show how
summary statistics such as community similarity build
up from taxon-specific responses. Furthermore, taxon-specific models are particularly valuable for providing
insights into the main drivers of species distributions
when range-limiting physiological factors for studied
species are poorly known (Heikkinen et al. 2006,
Buisson et al. 2010), which is particularly true for the
small taxa. In addition, taxon-specific species models
may provide more precise and realistic predictions than
those offered by species-assemblage models (Lavergne et
al. 2010), not least because species’ responses to main
environmental drivers are thought to be mainly individualistic (Elith et al. 2006).
FIG. 1. The expected importance of water chemistry,
geographical variables, and biotic variables for the distribution
of bacteria, phytoplankton, and zooplankton taxa. The width
of the arrows reflects the expected relative importance of each
variable group in the species occurrence.
MATERIALS AND METHODS
Study area
Plankton samples were collected from 100 small lakes
in Finland during July in 2008 or 2009. The latitudinal
gradient between the southernmost and northernmost
sites was >700 km. The sites were sampled in five
drainage basins and in 20 lakes per basin. We sampled
60 lakes in three drainage basins in 2008 and 40 lakes in
two drainage basins in 2009. We acknowledge that
between-year variation in environmental conditions may
increase the residual variation in the data that could not
be controlled, likely resulting in lower model performance. However, sampling
of 100 lakes during a single summer was not possible due
to strong seasonality and a substantial increase in within-year variation in the data. Moreover, Shurin et al. (2007)
recently showed that daily zooplankton richness was
linearly related to annual richness in a set of lakes over
several years. We suggest, however, that studying the
influence of such interannual variation on model
performance would be an important task in the future.
The sampled drainage basins were (1) Vantaanjoki,
(2) Karjaanjoki, (3) Kokemäenjoki, (4) Upper Kymijoki, and (5) Koutajoki (Appendix A). These drainage
basins were chosen because they cover a large geographical range and their nutrient concentrations vary from
ultraoligotrophic to highly eutrophic, thus representing
a wide productivity range (Appendix B). We admit that
the sampled lakes were clustered rather than evenly
distributed within our study area. This is mainly due to
practical reasons, i.e., planktonic samples including
bacteria were sampled near biological stations in order
to preserve the samples properly in freezers. The study
lakes were, however, highly representative of the
variability in water chemistry of the landscapes in this
region. We sampled only small lakes and ponds (mean
lake area was 0.05 km²) to ensure that plankton
sampling covered the site as well as possible. Most of
the lakes within the drainage basins were not readily
interconnected to each other via water routes. We lack
detailed data on fish communities in these lakes, but
based on visual detection as well as earlier documentation, we concluded that all lakes harbored fish.
Plankton sampling and sample processing
Plankton samples were collected with a tube sampler
(2.3 L) from three locations in the middle of the lake and
pooled in a plastic bucket, ensuring proper mixing. We
collected the samples in the middle of the lakes with a
boat in order to avoid benthic taxa from the littoral zone
from entering the samples. The samples were collected at
0.5 m below the surface of the water. Our sampling
protocol for bacteria followed Longmuir et al. (2007).
First, 250 mL of water was filtered through a 0.42-μm
pore nitrocellulose filter (diameter 25 mm; Millipore,
Durapore, Billerica, Massachusetts, USA) to remove
larger particles. Bacteria cells were then collected on a
0.22-μm pore nitrocellulose filter, which was frozen
immediately in the field. For phytoplankton, a sample of
0.5 L was fixed immediately with acid Lugol’s iodine
solution in the field. Zooplankton samples (6.15 L in
total) were collected on a 50-μm mesh and preserved
with formaldehyde in the field. We acknowledge that
our methodology detects fewer individuals for zooplankton
than for phytoplankton
because of the relatively limited amount of water filtered
for the samples. However, as the methods are strictly
standardized, we consider them highly suitable for
studies of taxon occurrences across the lakes.
The maximum depth of the lakes was measured in the
field. Lake area was measured using Geographic
Information System (MapInfo Version 9.5; Pitney
Bowes Software 2008). Conductivity was measured in
the field using a conductivity meter (PW 9529; Philips,
Eindhoven, The Netherlands) and samples for water
chemistry analyses were collected simultaneously with
the plankton samples.
In the laboratory, community composition of bacteria
was determined using molecular fingerprinting (terminal
restriction fragment length polymorphisms, T-RFLP;
see Soininen et al. [2011] for details). It is a popular
method for generating a fingerprint of an unknown
microbial community. Although it may underestimate
the true number of bacteria taxa present, our consistent
use of the method allows us to investigate the
distribution patterns of bacteria taxa among the lakes.
The phytoplankton samples were sedimented using an
Utermöhl chamber (PhycoTech, St. Joseph, Michigan,
USA) and counted with a light microscope (magnification 400×). For each sample, 50 fields were counted,
typically detecting 200–500 specimens (individuals or
colonies). For zooplankton, all individuals (typically 50–
200 individuals per sample) were counted at a magnification of 125–400× using an inverted microscope. Both
crustacean zooplankton and rotifers were included in the
counting. For phyto- and zooplankton, most individuals
were identified to species level. However, some of the
taxa (24.1% in phytoplankton and 24.6% in zooplankton) were identified to genus level only. We thus
acknowledge that, as for bacteria, not all taxa within the data sets
represent a species; some represent a genus or
perhaps even a higher taxonomic group. This may mask
some structure in the data that would be observable only if species-level data had been used. We think, however,
that this does not affect our results systematically as
there were no significant differences in the predictive
performance of the distribution models based on
species-level vs. genus-level data (results not shown).
Water samples were analyzed in the laboratory for
chlorophyll a, water color, total nitrogen, and total
phosphorus. For chlorophyll a, water was filtered using
a GF/C Whatman filter and analyzed spectrophotometrically after extraction with ethanol. Water color was
determined using a color comparator and nutrients
using QuikChem 8000 (Lachat, Loveland, Colorado,
USA).
Geographical variables
Multiple linear regression was used to relate mean
July temperatures (period 1971–2000) to latitude,
longitude, and altitude of each study lake, downscaling
climate data from a 10 × 10 km resolution grid (Finnish
Meteorological Institute, Helsinki, Finland; Venalainen
and Heikinheimo 2002) to lake level. For the regression
models, r² = 0.94 and 0.97. Annual precipitation was
downscaled to a 1-km² grid by using kriging interpolation (for details, see van der Linden and Christensen
2003).
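The temperature downscaling described above can be written as an ordinary multiple linear regression. The form below is only a sketch: the coefficient symbols and the error term are notation assumed here, not taken from the original, and the fitted coefficient values are not reported in the text.

```latex
T_{\mathrm{July},\,i} = \beta_0 + \beta_1\,\mathrm{latitude}_i
  + \beta_2\,\mathrm{longitude}_i + \beta_3\,\mathrm{altitude}_i + \varepsilon_i
```

Here $i$ indexes grid cells when the model is fitted, and the fitted equation is then evaluated at the latitude, longitude, and altitude of each study lake.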
To provide productivity estimates for each lake
catchment, normalized difference vegetation index
(NDVI) values were obtained from existing Landsat
ETM databases at a resolution of 30 m (see Soininen and Luoto 2012 for details of
the data processing). NDVI is
the most frequently used parameter for quantifying the
productivity and aboveground biomass of ecosystems
(Tucker 1979). It is based on the strong absorption of
incident radiation by chlorophyll in the red (ETM3), and
the contrasting high reflectance by plant cells in the
near-infrared (ETM4) spectral region. Because it is
based on the normalized ratio of the reflectance in these
two spectral regions, it is an indicator of the greenness of
vegetation canopies, which enables separation of vegetation from other land cover. NDVI measures were
generated from geocorrected Landsat ETM+ satellite
images. The images were acquired in the years 1999–2002
to coincide with the middle of the growing season.
Maximum NDVI values were calculated for each
lake. Maximum NDVI is more related to the highest
potential productivity (Parviainen et al. 2010), and it
showed high correlation with species richness of
phytoplankton and zooplankton in the same study lakes
(Soininen and Luoto 2012). We conducted the NDVI
analyses using three different radii (50 m, 100 m, and
500 m) around each lake. Use of these relatively small
bands around the lakes ensured that there was no
substantial overlap in the catchments even for the lakes
that were located close to each other. However, we
expected that the use of the smallest radius would better
clarify the direct effects, e.g., of leaves on lake water
conditions (input of nutrients and organic material in
the water), while the largest radius reflects the overall
productivity of the neighboring landscape (Soininen and
Luoto 2012).
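As a minimal sketch of the NDVI calculation described above, the normalized ratio of red (ETM3) and near-infrared (ETM4) reflectance can be computed per pixel and the maximum taken within a buffer around a lake. The function names, the zero-signal convention, and the reflectance values are hypothetical and serve only as illustration; the actual image processing is described in Soininen and Luoto (2012).

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and near-infrared reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # convention assumed here for pixels with no signal
    return (nir - red) / total


def max_ndvi(pixels):
    """Maximum NDVI over (red, nir) reflectance pairs inside one buffer."""
    return max(ndvi(red, nir) for red, nir in pixels)


# Hypothetical (red, nir) reflectances for pixels within one lake buffer.
buffer_pixels = [(0.08, 0.40), (0.10, 0.30), (0.20, 0.22)]
lake_ndvi_max = max_ndvi(buffer_pixels)
print(round(lake_ndvi_max, 3))
```

The same function could be applied separately to the pixels within each of the three radii to obtain one maximum NDVI value per lake and radius.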
Biotic variables
For explaining the distribution of bacteria, we used
zooplankton number and phytoplankton biomass (chlorophyll a) as biotic variables. This is because bacteria
can assimilate the other planktonic organisms in the
water obtaining organic material and nutrients and
because other planktonic taxa provide microhabitats for
bacteria (reviewed by Cole 1982). The amount of
planktonic material in the water may thus affect the
distribution of bacterial taxa across the lakes since
bacteria also show among-taxon differences in the rates
they use resources and grow, similar to other planktonic
organisms (Yokokawa and Nagata 2005). For phytoplankton, we used zooplankton number and zooplankton richness as predictors that reflect both the number of
consumers as well as their range in traits (i.e.,
zooplankton richness reflects the number of different
grazers in terms of morphology and body sizes thus
indicating the fill of niche space). These variables were
included since the presence of predators can strongly
influence the abundance, distribution, and range limits
of prey species in a wide range of ecosystems (Estes et al.
2011). Among phytoplankton, e.g., filamentous blue-green algal species may be relatively immune to grazing,
while some single-celled taxa may be more vulnerable to
grazing by zooplankton. We also conducted the models
using only the cladocerans as predators for phytoplankton because they are known to be efficient grazers on a
wide range of prey (Leibold 1989, Bertilsson et al. 2003).
For zooplankton, we used phytoplankton biomass and
phytoplankton richness in order to measure the amount
of food particles available, as well as the range in
morphology and size of the food particles. These
variables were included because the dependence of
animals on plants is often an important determinant of
species distributions at broad spatial extents (reviewed
by Wisz et al. 2012).
Species distribution models
The data comprised presence records of each of 407
taxa from 100 lakes, from which 12 bacteria, 51
phytoplankton, and 21 zooplankton taxa with 10 or
more records were used in the analyses. We made the
assumption that the absence of a record from a lake
corresponded to a true absence of the taxa (Guisan and
Hofer 2003).
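The selection rule above (taxa with 10 or more records are modeled, and a missing record is treated as an absence) can be sketched as a simple filter. The data structure and taxon names below are hypothetical toy values, not the study data.

```python
def select_taxa(presence, min_records=10):
    """Keep taxa recorded in at least `min_records` lakes.

    `presence` maps a taxon name to a list of 0/1 values, one per lake,
    where 0 is treated as a true absence.
    """
    return {taxon: records for taxon, records in presence.items()
            if sum(records) >= min_records}


# Toy presence/absence vectors over 100 lakes (hypothetical taxa).
presence = {
    "taxon_a": [1] * 12 + [0] * 88,
    "taxon_b": [1] * 9 + [0] * 91,
    "taxon_c": [1] * 10 + [0] * 90,
}
modelled = select_taxa(presence)
print(sorted(modelled))
```

In this toy example, taxon_b falls below the cutoff of 10 records and is excluded from modeling.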
Three statistical methods were used to model plankton species distributions: generalized linear models
(GLMs), generalized additive models (GAMs), and
generalized boosted methods (GBMs). These techniques
are widely used to model species distributions and are
capable of modeling nonlinear functions (Elith et al.
2006, Franklin 2009). This allowed a comparison of the
methods’ predictive ability and a quantification of
uncertainties deriving from the choice of the modeling
approach (Heikkinen et al. 2006).
The data set was randomly split into calibration (i.e.,
training; 70% of records) and validation (i.e., testing;
remaining 30% of records) data sets. We included the
following variables known to be highly influential for
the distribution of planktonic taxa in the models
(Soininen et al. 2011): conductivity, water color, total
phosphorus, maximum NDVI, July temperature, mean
annual precipitation, chlorophyll a (for bacteria and
zooplankton), phytoplankton richness (for zooplankton), zooplankton richness (for phytoplankton), and
zooplankton number (for bacteria and phytoplankton).
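A random 70/30 split of this kind can be sketched as follows. The text does not state whether the split was stratified by presence and absence or redrawn for each taxon, so this sketch assumes a single unstratified split of lake indices; the function name and the seed are illustrative only.

```python
import random


def split_records(n_records, calibration_fraction=0.7, seed=0):
    """Randomly split record indices into calibration and validation sets."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_calibration = round(n_records * calibration_fraction)
    return sorted(indices[:n_calibration]), sorted(indices[n_calibration:])


calibration, validation = split_records(100)
print(len(calibration), len(validation))
```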
GLMs are mathematical extensions of linear models
that do not force data into unnatural scales; they allow
for nonlinearity and nonconstant variance (heteroscedasticity) structures in the data. Here, the model
calibration was performed utilizing the statistical package R version 2.13 (R Development Core Team 2011),
with the standard glm function. The possibility of curvilinear relationships between the explanatory and dependent variables was examined by including the quadratic
terms of the predictors in the models (Crawley 2007).
Quadratic terms were always kept in the GLMs
regardless of their statistical significance. Both GLMs
and GAMs were fitted under the assumption of a
binomial distribution of errors and applying a logit link
function.
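Under the description above, a binomial GLM with a logit link and linear plus quadratic terms for each of the eight predictors would take roughly the following form. The symbols are notation assumed here for illustration and do not appear in the original.

```latex
y_i \sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i}
  = \beta_0 + \sum_{j=1}^{8} \left( \beta_j x_{ij} + \gamma_j x_{ij}^{2} \right)
```

Here $y_i$ is the presence (1) or absence (0) of a taxon in lake $i$, $p_i$ is its probability of occurrence, and $x_{ij}$ is the value of predictor $j$ in that lake.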
GAMs are a nonparametric extension of GLMs that
use smoothers to estimate the form of the relationship
between a response variable and a set of predictor
variables (Yee and Mitchell 1991). GAMs were fitted
using the mgcv package (Wood 2006, 2011) in R (R
Development Core Team 2011), which calculates the
extent to which the degrees of smoothness for each
smoother can be lowered during model fitting without
significantly reducing model fit. GAMs were fitted
setting the initial degrees of smoothness for each
univariate term at four to avoid overly complex models
and potential overfitting.
TABLE 1. Mean values of area under the curve (AUC) and true
skill statistic (TSS) for the calibration and evaluation data
sets for three planktonic groups using generalized linear
models (GLM), generalized additive models (GAM), and
generalized boosted methods (GBM).

                     Calibration        Evaluation
Model and group      AUC      TSS       AUC      TSS
GLM
  Bacteria           0.891    0.722     0.616    0.203
  Phytoplankton      0.877    0.674     0.645    0.170
  Zooplankton        0.891    0.686     0.646    0.127
GAM
  Bacteria           0.913    0.760     0.580    0.022
  Phytoplankton      0.900    0.728     0.627    0.157
  Zooplankton        0.899    0.716     0.653    0.149
GBM
  Bacteria           0.929    0.738     0.658    0.128
  Phytoplankton      0.895    0.675     0.648    0.155
  Zooplankton        0.930    0.712     0.693    0.231
GBM is a machine learning method that estimates the
form of the relationship between a response variable and
its predictors without a priori specification of a data
model (De’ath 2007, Elith et al. 2008). This technique
estimates a large number of simple models that are
combined to form a final model optimized for prediction, using cross-validation for model building. We used
R (R Development Core Team 2011) with functions
from the gbm and dismo packages (Ridgeway 2006,
Elith et al. 2008) to fit the boosted regression trees (BRTs), setting interaction depth to
six, learning rate to a maximum of 0.001 (lower for
species for which an inadequate number of trees was calculated), and
bagging fraction to 0.75.
All model parameters were first estimated from the
calibration data, and then used to predict the occurrence
of species in model validation data set. The model
performance (discrimination ability) was determined by
calculating the area under the curve of a receiver
operating characteristic plot (AUC; Fielding and Bell
1997) and the true skill statistic (TSS; Allouche et al.
2006) using the model validation data set. AUC
provides an assessment of the agreement between the
observed presence/absence records over a range of
probability thresholds above which the model predicts
presence (Fielding and Bell 1997). TSS is a synthetic
index that takes into account sensitivity (ability to
identify taxon presence) and specificity (ability to
identify absence), and that is not sensitive to prevalence
(Allouche et al. 2006). In GLMs and GAMs, variable
importance was calculated as each variable’s drop
contribution (i.e., change in deviance associated with
exclusion of a given variable from a model containing all
the other variables). In the GBMs, Friedman’s (2001)
method was used to calculate the relative importance of
predictors, based on the frequency of variable selection
during model building and the reduction in deviance
associated with each variable’s inclusion in models. For
both methods, the variables’ contributions were scaled
to sum to 100, with higher numbers indicating stronger
influence on the response variable. For each model, a
threshold value for the conversion of the continuous
probability values for occurrence into binary predictions
was selected to maximize sensitivity and specificity (Liu
et al. 2005).
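The two evaluation measures and the threshold rule described above can be illustrated with a short sketch. It assumes the rank-based (Mann-Whitney) formulation of AUC, TSS defined as sensitivity + specificity − 1, and a threshold chosen to maximize the sum of sensitivity and specificity; the latter is one reading of the rule in the text. All function names and data values are hypothetical.

```python
def auc(observed, predicted):
    """Rank-based AUC: probability that a presence receives a higher
    predicted value than an absence (ties count one half)."""
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    wins = sum((p > a) + 0.5 * (p == a) for p in presences for a in absences)
    return wins / (len(presences) * len(absences))


def sensitivity_specificity(observed, predicted, threshold):
    """Sensitivity and specificity when values >= threshold count as presence."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p >= threshold)
    fn = sum(1 for o, p in pairs if o == 1 and p < threshold)
    tn = sum(1 for o, p in pairs if o == 0 and p < threshold)
    fp = sum(1 for o, p in pairs if o == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def tss(observed, predicted, threshold):
    """True skill statistic: sensitivity + specificity - 1."""
    sens, spec = sensitivity_specificity(observed, predicted, threshold)
    return sens + spec - 1


def best_threshold(observed, predicted):
    """Candidate threshold (among predicted values) maximizing sens + spec."""
    candidates = sorted(set(predicted))
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(observed, predicted, t)))


# Hypothetical validation data: observed occurrences and predicted probabilities.
observed = [1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
threshold = best_threshold(observed, predicted)
print(round(auc(observed, predicted), 3), threshold,
      round(tss(observed, predicted, threshold), 3))
```

AUC is threshold independent, whereas TSS depends on the chosen threshold, which is why the threshold rule matters for the TSS values only.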
Using the same set of explanatory variables in all
species models improved the comparability of the model
outputs. Additionally, we chose not to use stepwise
multiple regression because of issues including bias in
parameter estimation, inconsistencies among model
selection algorithms, an inherent problem of multiple
hypothesis testing, and an inappropriate focus or
reliance on a single best model (e.g., Whittingham et
al. 2006).
The possible differences among the planktonic groups
in the proportion of taxa of which occurrences could be
at least satisfactorily (AUC > 0.7 and TSS > 0.3;
Heikkinen et al. 2012) predicted by the models were
tested using a generalized linear mixed model (GLMM),
where model type (i.e., GLM, GAM, or GBM) was
entered as a random factor and planktonic group as a
fixed factor. GLMMs were fitted using the lmer function (lme4 package)
in R with a binomial response variable (R Development
Core Team 2011).
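One way to write the mixed model described above is given below. This is a sketch under the assumption of a random intercept for model type; the symbols are notation introduced here, not the authors'.

```latex
s_{gmk} \sim \mathrm{Bernoulli}(\pi_{gm}), \qquad
\operatorname{logit}(\pi_{gm}) = \alpha + \beta_g + u_m, \qquad
u_m \sim \mathcal{N}(0, \sigma_u^{2})
```

Here $s_{gmk}$ indicates whether the model for taxon $k$ in planktonic group $g$, fitted with model type $m$, was at least satisfactory; $\beta_g$ is the fixed effect of planktonic group and $u_m$ the random effect of model type.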
RESULTS
Overall, the AUC and TSS values of the 252
individual models (84 taxa, three modeling techniques)
based on model evaluation data set were on average 0.65
and 0.15, respectively. Model performance was highest
for zooplankton (mean AUC = 0.66, mean TSS = 0.17),
followed by phytoplankton (AUC = 0.64 and TSS =
0.16) and bacteria (AUC = 0.62 and TSS = 0.12). See the
model performance for each species group and modeling
technique based on model calibration and evaluation
data sets in Table 1 and Appendix C.
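As a consistency check, the group means quoted above can be recovered by averaging the evaluation values in Table 1 over the three modeling techniques. This works because each technique was applied to the same taxa within a group. The dictionary layout is an illustrative choice; the numbers are the Table 1 evaluation values.

```python
# Evaluation AUC and TSS from Table 1, in the order GLM, GAM, GBM.
evaluation = {
    "bacteria": {"AUC": [0.616, 0.580, 0.658], "TSS": [0.203, 0.022, 0.128]},
    "phytoplankton": {"AUC": [0.645, 0.627, 0.648], "TSS": [0.170, 0.157, 0.155]},
    "zooplankton": {"AUC": [0.646, 0.653, 0.693], "TSS": [0.127, 0.149, 0.231]},
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


group_means = {
    group: {measure: round(mean(values), 2) for measure, values in stats.items()}
    for group, stats in evaluation.items()
}
for group, stats in group_means.items():
    print(group, stats)
```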
For all planktonic groups, taxon occurrences were,
overall, poorly explained by the model variables (Fig. 2).
For example, most bacteria taxa showed only
low predictability (i.e., AUC < 0.7, TSS < 0.3), whereas
the proportion of high-quality bacteria models was
negligible. According to GLMM, planktonic groups
showed significant differences (AUC, P = 0.008, χ² =
9.49; TSS, P = 0.04, χ² = 6.433) in the proportion of taxa
of which occurrences were at least satisfactorily predicted by the models. The proportion of satisfactory models
based on AUC values was lowest for bacteria (11.1% of
the models), followed by phyto- (24.2%) and zooplankton (38.1%). All between-group differences in the
proportion of satisfactory models based on AUC values
were significant at P < 0.05. Qualitatively similar results
emerged for TSS.
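The performance classes and the share of satisfactory models can be computed from per-model AUC values as sketched below. The class limits follow the text and Fig. 2 (low: AUC < 0.7; intermediate: 0.7–0.9; high: > 0.9), and a model counts as satisfactory when AUC > 0.7. How a value of exactly 0.7 is handled is an assumption of this sketch, and the AUC values are hypothetical.

```python
def auc_class(auc):
    """Assign an AUC value to the low, intermediate, or high class."""
    if auc < 0.7:
        return "low"
    if auc > 0.9:
        return "high"
    return "intermediate"


def percent_satisfactory(auc_values, cutoff=0.7):
    """Percentage of models whose AUC exceeds the cutoff."""
    return 100.0 * sum(1 for a in auc_values if a > cutoff) / len(auc_values)


# Hypothetical evaluation AUC values for four models.
example_aucs = [0.55, 0.62, 0.71, 0.93]
classes = [auc_class(a) for a in example_aucs]
share = percent_satisfactory(example_aucs)
print(classes, share)
```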
FIG. 2. The percentage of taxa showing low (AUC < 0.7, TSS < 0.3), intermediate (AUC 0.7–0.9, TSS 0.3–0.6), and high
(AUC > 0.9, TSS > 0.6) values of the area under the curve (AUC) and the true skill statistic (TSS), indicating predictive
performance for generalized linear models (GLM), generalized additive models (GAM), and generalized boosted methods (GBM)
for each planktonic group.
We found that intercorrelations between physicochemical and geographical variables were relatively
weak (all r < 0.4), and thus, likely did not confound
our main conclusions (Appendix D). According to all
three modeling approaches, the occurrences of bacteria
taxa were best explained both by large-scale factors
related to the productivity of the sites (July temperature
and mean annual precipitation), as well as by algal
biomass (Fig. 3). Overall, however, the responses of bacteria occurrences
to many of the variables were
very weak. The occurrences of phytoplankton taxa were
best explained by water chemistry variables (especially
color and total P; see Appendix E for an example), as
well as by mean annual precipitation. For GLM,
occurrences were also related to zooplankton richness.
When only the cladocerans were used as zooplankton
predators for the phytoplankton, the number of
individuals of cladocerans showed stronger importance
on phytoplankton occurrence than the cladoceran
richness (Appendix C). For GAM, the importance of the
number of cladoceran individuals was significantly
higher than the importance of the number of individuals of the full
zooplankton community for phytoplankton occurrence
(paired t test, P = 0.027). The best explaining variables
for the occurrences of zooplankton taxa included water
color, phytoplankton richness, and algal biomass (see
Appendix F for an example). For GBM, occurrences
were typically also related to phytoplankton richness
and precipitation.
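The variable importances reported above are comparable across models because, as stated in the Methods, each model's contributions were scaled to sum to 100. A minimal sketch of that scaling follows; the raw values are hypothetical drops in deviance, and the function name is illustrative.

```python
def scale_importance(raw):
    """Rescale non-negative raw importances so that they sum to 100."""
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}  # no variable contributed
    return {name: 100 * value / total for name, value in raw.items()}


# Hypothetical change in deviance when each variable is dropped from a model.
raw_drop = {"cond": 1.0, "color": 4.0, "totP": 3.0, "jult": 2.0}
scaled = scale_importance(raw_drop)
print(scaled)
```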
DISCUSSION
The results largely concurred with our initial expectations about the main drivers of taxon occurrences for
the different trophic levels, as well as with the prediction
that the performance of the distribution models would
show a consistent relationship to organism size. We
showed that the stochasticity in the taxon distributions
was largest (i.e., predictive performance of the models
was lowest) among smallest taxa, i.e., for bacteria, as we
initially expected based on an idea that ecological
determinism decreases with decreasing organism size
(Shurin et al. 2009, Farjalla et al. 2012). In general, this
probably underscores the efficient dispersal and fast
population dynamics for the smallest taxa (Finlay 2002,
Brown et al. 2004). We further illustrated that the main
drivers of taxon occurrences showed substantial differences among the planktonic groups; for example, the
occurrences of phytoplankton taxa were related to local
water chemistry as we predicted based on earlier studies
from the same study area (Soininen et al. 2007), while
zooplankton also responded to biotic variables (i.e., to
both the richness of the algae, reflecting the variation in size
and morphology of the food particles, and the
total amount of algal material in the water), as
highlighted in the literature elsewhere (McQueen et al.
1989). Nonetheless, the groups also showed some consistencies in their responses to different drivers given that,
perhaps surprisingly, the occurrences of all planktonic
groups were related to climatic variables to a certain
degree.
FIG. 3. Boxplots of the variable importance for bacteria (n = 12) and phyto- (n = 51) and zooplankton (n = 21) taxa in the
generalized linear models (GLM), generalized additive models (GAM), and generalized boosted methods (GBM). Variable
abbreviations are: cond, conductivity; totP, total phosphorus concentration; ndvimax, maximum normalized difference vegetation
index; jult, July temperature; annp, annual precipitation; chla, chlorophyll a concentration; zoonumber, number of zooplankton
individuals; zoorich, zooplankton richness; and phytorich, phytoplankton richness. The lines in the boxes show medians, box ends
are quartiles, whiskers show 95th percentiles, and dots show outliers. For GLMs and GAMs, variable importance was calculated as
the change in deviance associated with excluding that variable from a model containing all other variables. In the GBMs, variable
importance was based on the frequency of variable selection during model building and reduction in deviance associated with each
variable’s inclusion in models.
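To make the deviance-based importance measure described in the Fig. 3 caption concrete, the following is a minimal Python sketch, not the R code used in the analyses. It fits a binomial GLM by Newton-Raphson to simulated presence/absence data and reports, for each predictor, the increase in residual deviance when that predictor is dropped from the full model. The predictor names (`driver`, `noise`), the effect size, and the sample size are illustrative assumptions.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_deviance(X, y, n_iter=25):
    """Fit a logistic GLM by Newton-Raphson and return its residual deviance."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for r, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, r))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * r[j]
                for k in range(p):
                    hess[j][k] += w * r[j] * r[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    dev = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        mu = 1.0 / (1.0 + math.exp(-eta))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        dev += -2.0 * (yi * math.log(mu) + (1 - yi) * math.log(1.0 - mu))
    return dev


def deviance_importance(X, y, names):
    """Change in deviance when each predictor is excluded from the full model."""
    full = fit_deviance(X, y)
    out = {}
    for j, name in enumerate(names):
        reduced = [[v for k, v in enumerate(r) if k != j] for r in X]
        out[name] = fit_deviance(reduced, y) - full
    return out


# Simulated (hypothetical) data: occurrence depends on "driver" only.
random.seed(1)
X, y = [], []
for _ in range(200):
    driver, noise = random.gauss(0, 1), random.gauss(0, 1)
    prob = 1.0 / (1.0 + math.exp(-1.5 * driver))
    X.append([driver, noise])
    y.append(1 if random.random() < prob else 0)

importance = deviance_importance(X, y, ["driver", "noise"])
for name, value in importance.items():
    print(name, round(value, 2))
```

In this simulated example the predictor that actually drives occurrence yields a far larger deviance change than the unrelated one, which is the kind of contrast the boxplots summarize across taxa.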
We found that the predictive performance of the
taxon-specific models was related to organism size in a
consistent manner, as the number of taxa that showed
intermediate or high predictability increased from
bacteria to zooplankton. This probably reflects the high
passive dispersal ability of the smallest taxa, which therefore
show more or less stochastic distributions across
sites (Finlay 2002), and also their short life cycles,
which result in fast population dynamics (Brown et al.
2004). It is possible that efficiently dispersing taxa may
be temporarily present at sites that are not environmentally favorable for them due to the strong influx of
colonists from other sites. These taxa may thus occupy
both source and sink habitats, perhaps reflected as
overall weak relationships between species composition
and environment, especially in bacteria. Although
bacteria occurrences were related to some of the
estimated environmental variables (such as temperature and precipitation), the overall predictive performance
of the models was low due to the weak relationships with
many of the included variables. This relatively weak species
sorting in bacteria is consistent with some other recent
findings from lakes (Langenheder and Ragnarsson
2007, Östman et al. 2010), but contrasts with the idea
that efficiently dispersing taxa may be able to track
favorable abiotic conditions (Borcard et al. 2004). We
acknowledge that some of the fine-scale responses of
bacteria taxa may have been masked by the
relatively coarse molecular fingerprinting method applied here, which likely lumped many bacteria taxa
together. For example, pyrosequencing of bacteria may reveal many more
taxa than traditional fingerprinting methods,
such as the T-RFLP applied here (Wang et al.
2012). Nonetheless, the main community patterns
revealed by high-throughput sequencing typically resemble those found with more traditional fingerprinting
methods (Wang et al. 2012). We also note that the
predictive performances of the models could have been
higher overall if more local environmental variables had
been included in the models. For example, detailed
measures of dissolved or particulate organic carbon
might have been useful for modeling the distribution of
bacteria. Likewise, including the concentrations of the
inorganic fractions of nutrients in addition to total
concentrations might have increased the predictive
performance of the phytoplankton models. For the sake of
model simplicity and reliable among-taxon comparisons, the inclusion of a relatively small and similar
number of key variables known to be important for each
group of predictors (i.e., local abiotic variables, biotic
variables, and geographical variables) was, however, an
essential step in the analyses.
Compared with recent findings from other
organism groups such as butterflies and plants (Heikkinen et al. 2012), the AUC values of the present distribution
models were exceptionally low. In fact, for virtually all
planktonic taxa, the predictive performance of the models
was either intermediate or low. This
reflects either the high stochasticity of planktonic
distributions in general or the fact that some
important variables were left unmeasured in the field. In
addition, it may well be that the snapshot field
measurements of the local environmental variables did
not always accurately reflect the conditions actually
encountered by planktonic taxa exhibiting fast
population dynamics. For example, Crump et al.
(2003) documented that fast changes in the cell
densities of many transient bacteria taxa were
driven by temporal variation in organic matter
supply. It is therefore possible that, e.g., the use of
environmental data collected over longer periods of time
might provide models with better predictive performance.
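The AUC values discussed above can be read as the probability that a model ranks a randomly chosen presence site above a randomly chosen absence site, with 0.5 corresponding to random ranking. A minimal Python sketch of that pairwise definition follows; the observations and predicted probabilities are hypothetical and are not drawn from the study data.

```python
def auc(observed, predicted):
    """Area under the ROC curve from all presence/absence pairs (ties count 0.5)."""
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical occurrences (1 = present, 0 = absent) at eight sites.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
# Predictions from a model that separates presences from absences perfectly.
well_sorted = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1, 0.4, 0.6]
# Predictions from a model that is barely better than random ranking.
weakly_sorted = [0.6, 0.3, 0.5, 0.4, 0.5, 0.2, 0.6, 0.4]

print(auc(observed, well_sorted))    # 1.0
print(auc(observed, weakly_sorted))  # 0.53125
```

A taxon whose occurrences are only weakly tied to the measured predictors behaves like the second case, with an AUC close to 0.5.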
Even if model performance was relatively low overall,
planktonic groups showed clear responses to some of the
drivers. In the light of earlier literature (e.g., Beisner et
al. 2006, Langenheder and Ragnarsson 2007), the
responses of lake bacterio- and phytoplankton
to variables that reflect large-scale variation in
climate and potential productivity, such as air temperature and precipitation, were somewhat unexpected.
Many earlier studies have suggested that local variables
such as nutrient supply or the amount of organic
material in the water would be the main drivers of
community structure in unicellular planktonic taxa
(Beisner et al. 2006, Langenheder and Ragnarsson
2007). The present finding of the importance of climatic
variables for, e.g., bacteria distributions is, however, in
line with Van der Gucht et al. (2007), who identified
water temperature as one of the main drivers of bacteria
composition in lakes in Belgium, the Netherlands,
and Denmark. We further showed that the distributions
of lake bacteria taxa were related to actual plant
biomass production both within the lakes (algal
biomass) and in the catchment area
(maximum NDVI). While the former finding may reflect
the responses of bacteria to directly usable organic
material in the water, the latter may be related to their
stronger response to landscape-level or climatic variables
than traditionally believed. This also suggests that much
of the organic material available for bacteria possibly
originates from the catchment area. Nonetheless, these
findings add to the evidence that bacteria distributions
are driven jointly by regional and local forces just as the
distributions of any larger organisms are (reviewed by
Martiny et al. 2006). We highlight that the present
findings are novel, as our model statistics were built up
from taxon-specific responses to both environmental
and biotic variables, while many multivariate statistical
approaches commonly applied in similar studies fail to
show in detail the responses of each taxon to such
variables. Taxon-specific species models may provide
more precise and realistic predictions because species’
responses to the main environmental drivers are typically
individualistic (Elith et al. 2006); such individualistic
responses to a range of
environmental and biotic variables were clearly detected in our data (see
Appendices E and F).
We applied a conceptually simple view that classifies
processes as either deterministic or stochastic and
could therefore partition the influence of these processes
on the distribution of species additively, without considering
their interactive component. Although this view facilitated clear conclusions about how organism size
could affect the relative magnitude of stochastic and
deterministic processes in plankton, we note that these
processes may not be mutually exclusive but
can also jointly mediate species
distributions (Coulson et al. 2004). For example, a
stochastic dispersal event is first needed for a planktonic
species to establish a population in a lake. We
emphasize, though, that there likely are systematic
differences in the number of dispersal events among
the planktonic groups because bacteria, for example, are more likely to
disperse to a lake passively via air than
zooplankton, which often need active agents such as
birds or fish for their dispersal.
We thus conclude that the stochasticity of species distributions increases with decreasing organism size in plankton,
as illustrated by the poor predictive performance of the
distribution models for bacteria. Although planktonic
groups showed variable responses to interactions from
other trophic levels as well as to climatic and local
environmental variables, these results highlight the need
to consider not only the local abiotic variables, but also
the biotic interactions and climatic variables explicitly
even for the smallest taxa when predicting their
distributions in a changing environment.
ACKNOWLEDGMENTS
We thank Jyrki Eskelinen, Lucilla Galata, Johanna Karhu,
and Adrien Vetterli for assistance both in the field and in the
laboratory, and Peter C. le Roux for writing the R code utilized
in the analyses. We also thank two anonymous reviewers for
their constructive comments on an earlier draft of the
manuscript.
LITERATURE CITED
Allouche, O., A. Tsoar, and R. Kadmon. 2006. Assessing the
accuracy of species distribution models: prevalence, kappa
and the true skill statistic (TSS). Journal of Applied Ecology
43:1223–1232.
Araújo, M. B., and M. Luoto. 2007. The importance of biotic
interactions for modelling species distributions under climate
change. Global Ecology and Biogeography 16:743–753.
Beisner, B. E., P. R. Peres-Neto, E. S. Lindström, A. Barnett,
and M. L. Longhi. 2006. The role of environmental and
spatial processes in structuring lake communities from
bacteria to fish. Ecology 87:2985–2991.
Bertilsson, S., L.-A. Hansson, W. Graneli, and A. Philibert.
2003. Size-selective predation on pelagic microorganisms in
Arctic freshwaters. Journal of Plankton Research 25:621–631.
Borcard, D., P. Legendre, C. Avois-Jacquet, and H. Tuomisto.
2004. Dissecting the spatial structure of ecological data at
multiple scales. Ecology 85:1826–1832.
Brown, J. H., J. F. Gillooly, A. P. Allen, V. M. Savage, and
G. B. West. 2004. Toward a metabolic theory of ecology.
Ecology 85:1771–1789.
Buisson, L., W. Thuiller, S. Lek, P. Lim, and G. Grenouillet.
2010. Climate change hastens the turnover of stream fish
assemblages. Global Change Biology 14:2232–2248.
Chase, J. M., and M. A. Leibold. 2003. Ecological niches.
Linking classical and contemporary approaches. University
of Chicago Press, Chicago, Illinois, USA.
Cole, J. J. 1982. Interactions between bacteria and algae in
aquatic ecosystems. Annual Review of Ecology and Systematics 13:291–314.
Coulson, T., P. Rohani, and M. Pascual. 2004. Skeletons, noise
and population growth: the end of an old debate? Trends in
Ecology and Evolution 19:359–364.
Crawley, M. J. 2007. The R book. John Wiley and Sons,
Chichester, UK.
Crump, B. C., G. W. Kling, M. Bahr, and J. E. Hobbie. 2003.
Bacterioplankton community shifts in an arctic lake correlate
with seasonal changes in organic matter source. Applied and
Environmental Microbiology 69:2253–2268.
De’ath, G. 2007. Boosted trees for ecological modeling and
prediction. Ecology 88:243–251.
Elith, J., et al. 2006. Novel methods improve prediction of
species’ distributions from occurrence data. Ecography
29:129–151.
Elith, J., J. R. Leathwick, and T. Hastie. 2008. A working guide
to boosted regression trees. Journal of Animal Ecology
77:802–813.
Estes, J. A., et al. 2011. Trophic downgrading of planet earth.
Science 333:301–306.
Farjalla, V. F., D. S. Srivastava, N. A. C. Marino, F. D.
Azevedo, V. Dib, P. M. Lopes, A. S. Rosado, R. L. Bozelli,
and F. A. Esteves. 2012. Ecological determinism increases
with organism size. Ecology 93:1752–1759.
Fielding, A. H., and J. F. Bell. 1997. A review of methods for
the assessment of prediction errors in conservation presence/
absence models. Environmental Conservation 24:38–49.
Finlay, B. J. 2002. Global dispersal of free-living microbial
eukaryote species. Science 296:1061–1063.
Franklin, J. 2009. Mapping species distributions: spatial
inference and prediction. Cambridge University Press, Cambridge, UK.
Friedman, J. H. 2001. Greedy function approximation: A
gradient boosting machine. Annals of Statistics 29:1189–1232.
Gotelli, N. J., G. R. Graves, and C. Rahbek. 2010. Macroecological signals of species interactions in the Danish
avifauna. Proceedings of the National Academy of Sciences
USA 107:5030–5035.
Guisan, A., and U. Hofer. 2003. Predicting reptile distributions
at the mesoscale: relation to climate and topography. Journal
of Biogeography 30:1233–1243.
Heikkinen, R. K., M. Luoto, M. B. Araújo, R. Virkkala, W.
Thuiller, and M. Sykes. 2006. Methods and uncertainties in
bioclimatic envelope modelling under climate change. Progress in Physical Geography 30:751–777.
Heikkinen, R. K., M. Marmion, and M. Luoto. 2012. Does the
interpolation accuracy of species distribution models come at
the expense of transferability? Ecography 35:276–288.
Langenheder, S., and H. Ragnarsson. 2007. The role of
environmental and spatial factors for the composition of
aquatic bacterial communities. Ecology 88:2154–2161.
Lavergne, S., N. Mouquet, W. Thuiller, and O. Ronce. 2010.
Biodiversity and climate change: Integrating evolutionary
and ecological responses of species and communities. Annual
Review of Ecology, Evolution, and Systematics 41:321–350.
Leibold, M. 1989. Resource edibility and the effects of
predators and productivity on the outcome of trophic
interactions. American Naturalist 134:922–949.
Liu, C., P. M. Berry, T. P. Dawson, and R. P. Pearson. 2005.
Selecting thresholds of occurrence in the prediction of species
distributions. Ecography 28:385–393.
Longmuir, A., J. B. Shurin, and J. L. Clasen. 2007. Independent
gradients of producer, consumer, and microbial diversity in
lake plankton. Ecology 88:1663–1674.
Martiny, J. B. H., et al. 2006. Microbial biogeography: putting
microorganisms on the map. Nature Reviews Microbiology
4:102–112.
McQueen, D. J., R. S. Johannes, J. R. Post, T. J. Stewart, and
D. R. S. Lean. 1989. Bottom-up and top-down impacts on
freshwater pelagic community structure. Ecological Monographs 59:289–309.
Östman, Ö., S. Drakare, E. S. Kritzberg, S. Langenheder, J. B.
Logue, and E. S. Lindström. 2010. Regional invariance
among microbial communities. Ecology Letters 13:118–127.
Parviainen, M., M. Luoto, and R. K. Heikkinen. 2010. NDVI-based productivity and heterogeneity as indicators of plant-species richness in boreal landscapes. Boreal Environment
Research 15:301–318.
Paszkowski, C. A., and W. M. Tonn. 2000. Community
concordance between the fish and aquatic birds of lakes in
northern Alberta, Canada: the relative importance of
environmental and biotic factors. Freshwater Biology
43:421–437.
Pitney Bowes Software. 2008. MapInfo Professional. Version
9.5. Pitney Bowes Software, New York, New York, USA.
Preston, K., J. T. Rotenberry, R. A. Redak, and M. F. Allen.
2008. Habitat shifts of endangered species under altered
climate conditions: importance of biotic interactions. Global
Change Biology 14:2501–2515.
R Development Core Team. 2011. R: a language and
environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria.
Ridgeway, G. 2006. gbm: Generalized boosted regression
models. R Foundation for Statistical Computing, Vienna,
Austria. http://cran.r-project.org/web/packages/gbm/.
Shurin, J. B., S. E. Arnott, H. Hillebrand, A. Longmuir, B.
Pinel-Alloul, M. Winder, and N. D. Yan. 2007. Diversity–
stability relationship varies with latitude in zooplankton.
Ecology Letters 10:127–134.
Shurin, J. B., K. Cottenie, and H. Hillebrand. 2009. Spatial
autocorrelation and dispersal limitation in freshwater organisms. Oecologia 159:151–159.
Soininen, J., M. Kokocinski, S. Estlander, J. Kotanen, and J.
Heino. 2007. Neutrality, niches, and determinants of
plankton metacommunity structure across boreal wetland
ponds. Ecoscience 14:146–154.
Soininen, J., J. J. Korhonen, J. Karhu, and A. Vetterli. 2011.
Disentangling the spatial patterns in community composition
of prokaryotic and eukaryotic lake plankton. Limnology and
Oceanography 56:508–520.
Soininen, J., and M. Luoto. 2012. Is catchment productivity a
useful predictor of taxa richness in lake plankton communities? Ecological Applications 22:624–633.
Tucker, C. J. 1979. Red and photographic infrared linear
combinations for monitoring vegetation. Remote Sensing of
Environment 8:127–150.
Van der Gucht, K., et al. 2007. The power of species sorting:
Local factors drive bacterial community composition over a
wide range of spatial scales. Proceedings of the National
Academy of Sciences USA 104:20404–20409.
Van der Linden, S., and J. H. Christensen. 2003. Improved
hydrological modeling for remote regions using a combination of observed and simulated precipitation data. Journal of
Geophysical Research–Atmospheres 108:4072.
Venalainen, A., and M. Heikinheimo. 2002. Meteorological
data for agricultural applications. Physics and Chemistry of
the Earth 27:1045–1050.
Wang, J., J. Soininen, J. He, and J. Shen. 2012. Phylogenetic
clustering increases with elevation for microbes. Environmental Microbiology Reports 4:217–226.
Whittingham, M. J., P. A. Stephens, R. B. Bradbury, and R. P.
Freckleton. 2006. Why do we still use stepwise modelling in
ecology and behaviour? Journal of Animal Ecology 75:1182–
1189.
Wisz, M. S., et al. 2012. The role of biotic interactions in
shaping distributions and realised assemblages of species:
implications for species distribution modelling. Biological
Reviews, in press. http://dx.doi.org/10.1111/j.1469-185X.2012.00235.x
Wood, S. N. 2006. Generalized additive models: an introduction with R. Chapman and Hall/CRC, Boca Raton, Florida,
USA.
Wood, S. N. 2011. Fast stable restricted maximum likelihood
and marginal likelihood estimation of semiparametric
generalized linear models. Journal of the Royal Statistical
Society B 73:3–36.
Yee, T. W., and N. D. Mitchell. 1991. Generalized additive
models in plant ecology. Journal of Vegetation Science
2:587–602.
Yokokawa, T., and T. Nagata. 2005. Growth and grazing
mortality rates of phylogenetic groups of bacterioplankton in
coastal marine environments. Applied and Environmental
Microbiology 71:6799–6807.
SUPPLEMENTAL MATERIAL
Appendix A
Map of the study area in Finland with the studied drainage basins (Ecological Archives E094-058-A1).
Appendix B
Means and ranges for the main environmental variables of lakes for each drainage system (Ecological Archives E094-058-A2).
Appendix C
Modeling results for phytoplankton where only the cladocerans (seven species) were used as zooplankton grazers for
phytoplankton (Ecological Archives E094-058-A3).
Appendix D
The intercorrelations between all physicochemical and geographical variables (Ecological Archives E094-058-A4).
Appendix E
Response shapes (partial dependency plots) for the phytoplankton Staurastrum sp. based on GBM models (Ecological Archives
E094-058-A5).
Appendix F
Response shapes (partial dependency plots) for the zooplankton Daphnia galeata based on GBM models (Ecological Archives
E094-058-A6).