Supplementary On-line material
Multivariate linear regression approach
We modeled the multivariate response of the bacterial community to a
matrix of environmental variables and spatial covariates using the regression
approach and variance partitioning technique proposed by ter Braak (1986),
Legendre and Legendre (1998), Borcard and Legendre (2002) and Borcard
and colleagues (1992; 2004). This technique quantifies the amount of
variation attributable exclusively to the different sets of environmental or
spatial correlates and offers observational evidence on the relative importance
of the different processes that determine community structure (e.g. Cottenie,
2005; but see Smith and Lundholm, 2010 for a critical review).
Before variance partitioning, we de-trended the linear effects of latitude
using a canonical correspondence analysis approach (CCA; Legendre and
Legendre, 1998). Borcard and colleagues (1992) originally proposed the use of
polynomials of several degrees for describing the main spatial trends. A
correspondence analysis basis is better suited to a raw species contingency
table since it preserves the chi-square distance, while residuals from CCA can
be analyzed by redundancy analysis (RDA), which is based on principal
component analysis (Legendre and Legendre, 1998).
Recently, more objective techniques based on extracting the principal
coordinates of neighbor matrices (i.e., geographical distances) have been
demonstrated to be more appropriate (Borcard et al., 2004; Dray et al., 2006).
Eigenvectors are extracted according to a hierarchy that accounts for spatial
patterns at progressively finer scales. Model selection procedures (e.g.
multivariate extension of the AIC criterion) allow selecting the best linear
combination of eigenvectors in terms of maximizing correlation with the data
and minimizing the number of vectors (Dray et al., 2006). The earlier and widely
used approach to eigenvector extraction is known as Principal Coordinate
Analysis of Neighbor Matrices (PCNM; Borcard et al., 2004). Dray and
colleagues (2006) have generalized this method by showing that PCNM is a
special case of the more general class of Moran’s Eigenvector Mapping
(MEM), which also consists of extracting the eigenvectors of a sample
distance matrix. However, in the case of MEM, the distance matrix is obtained
by multiplying a connectivity matrix and a weighting matrix.
In our case, connectivity matrices such as Gabriel and Delaunay graphs
(see Dray et al., 2006 for analytical details) are not trivial to estimate, since
the global scale of our study implies that the geometry of our sampling
coordinates is spherical. Therefore this system of coordinates cannot be
approximated by a flat Euclidean surface. However, the extraction of
eigenvectors such as PCNMs is not problematic if the sample distance matrix
is based on the great circle distance (the shortest path between two points on
a sphere). We thus used this approach. After deriving PCNMs, we followed
Dray et al. (2006) and accordingly selected the set of eigenvectors that best
accounts for autocorrelation. The linear combination of PCNMs was then used
as a predictor of the species table.
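The great circle distance matrix described above can be sketched as follows. This is a minimal illustration, assuming sampling coordinates in decimal degrees and a spherical Earth; the function names and the radius constant are assumptions made for the example and are not taken from the original analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not a value from the source


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine great circle distance between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for near-antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


def distance_matrix(coords):
    """Symmetric matrix of great circle distances for a list of (lat, lon) pairs."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = great_circle_km(*coords[i], *coords[j])
    return d
```

A matrix built this way could then be truncated and submitted to a principal coordinate analysis to obtain the PCNM eigenvectors; that step is not shown here.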
We had two types of spatial variation, one being represented by the
“Continent” effect. This categorical factor includes large, global patterns such
as latitudinal gradients in species distribution and possible biogeographical
effects that are reflected by the relative position of the continents within the
globe. One such effect, for example, could be a possible biogeographical
affinity between South America and Antarctica, which has been
documented for many taxa belonging to different phyla and kingdoms (Chown
and Convey, 2006). This type of spatial correlation depends on historical
biogeography and should be separated from the patterns of spatial autocorrelation
described by PCNMs, which are presumed to measure the effect of dispersal
processes and unmeasured environmental variables (but see Smith and
Lundholm, 2010 for a critical review). Thus, PCNM eigenvectors allow one to
account for patterns at multiple spatial scales, while a variance partitioning
technique based on three matrices (Legendre and Legendre 1998) allows one
to quantify the unique contribution of Climate, Continent and spatial patterns,
i.e. respectively Climate|Continent, PCNM (Climate effect after accounting for
Continent and spatial effects), Continent|Climate, PCNM (Continent effect
after accounting for Climate and spatial effects) and PCNM|Climate, Continent
(spatial effect after accounting for Climate and Continent effects). The
calculations used for quantifying the variance components were based on the
function varpart of the R package vegan (Oksanen et al., 2009), which
executes variance partitioning of a multivariate system with multiple tables.
Results were used for generating Table 1 of the main paper. Figure 3 of the
main paper was based on Continent|Climate, PCNMs.
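The arithmetic behind the unique fractions can be illustrated with a short sketch. It assumes that each unique contribution is the adjusted R-squared of the full model minus that of the model lacking the table of interest. The function names and the numeric values are hypothetical, chosen only for illustration; this is not the varpart implementation and these are not the results of the study.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared, correcting the raw value for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def unique_fractions(adj_r2):
    """Unique contribution of each table: full-model fit minus the fit without it.

    adj_r2 maps a frozenset of table names to the adjusted R-squared of the
    model built from those tables.
    """
    tables = max(adj_r2, key=len)  # the full model is the one using every table
    full = adj_r2[tables]
    return {t: full - adj_r2[tables - {t}] for t in tables}


# Hypothetical adjusted R-squared values, for illustration only.
fits = {
    frozenset({"Climate", "Continent", "PCNM"}): 0.40,
    frozenset({"Continent", "PCNM"}): 0.30,
    frozenset({"Climate", "PCNM"}): 0.25,
    frozenset({"Climate", "Continent"}): 0.35,
}
fractions = unique_fractions(fits)
```

Here fractions["Climate"] plays the role of Climate|Continent, PCNM, and likewise for the other two tables.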
Null Model Analysis
We performed 5000 randomizations of the original data matrix in order to
create null expectations for the C-score, an index which increases with
increasing species segregation (i.e. when species tend to avoid each other).
This approach decreases the frequency of Type I error but offers satisfactory
statistical power (Gotelli, 2000; Gotelli and Entsminger, 2001). Random
matrices had the same number of species and samples as the original matrix.
We then performed pair-wise comparison between the observed and
expected C-scores.
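The C-score and its null expectation can be sketched as below, for a presence/absence matrix with one row per species and one column per sample. The randomization shown (shuffling each species' occurrences across samples) is an assumption made for illustration, since the algorithm is not specified here; the function names are likewise hypothetical.

```python
import random
from itertools import combinations


def c_score(matrix):
    """Mean number of checkerboard units over all species pairs.

    matrix: list of rows, one row per species, one 0/1 entry per sample.
    """
    units = []
    for row_i, row_j in combinations(matrix, 2):
        shared = sum(a and b for a, b in zip(row_i, row_j))
        units.append((sum(row_i) - shared) * (sum(row_j) - shared))
    return sum(units) / len(units)


def null_c_scores(matrix, n_random=5000, seed=0):
    """C-scores of randomized matrices with the same dimensions as the input.

    ASSUMPTION: each species' occurrences are shuffled across samples, which
    keeps species totals fixed but not sample totals. This may differ from the
    randomization algorithm used in the original analysis.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_random):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        scores.append(c_score(shuffled))
    return scores


def upper_tail_p(observed, null_scores):
    """Proportion of null C-scores at least as large as the observed one."""
    return sum(s >= observed for s in null_scores) / len(null_scores)
```

An observed C-score in the upper tail of the null distribution would indicate more segregation than expected by chance.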
Neutral Model
We estimated the neutral diversity (θ) and immigration (I) parameters
with the sampling formula for multiple samples by Etienne (2007) and a
maximum likelihood approach. Local simulated communities can then be
predicted by the neutral model from the metacommunity that corresponds to
the estimates of θ and I. Using indices that quantify ecological distance (e.g.
Jaccard and Bray-Curtis indices), actual and simulated communities can be
compared in terms of the level of dissimilarity, which thus offers a dynamical
null hypothesis. In order to estimate θ and I for our samples and simulate the
neutral local communities, we used the PARI/GP codes given in Etienne
(2007).
Local communities are assumed to be partially isolated from the
metacommunity. These communities are subjected to immigration, in
accordance with Hubbell’s neutral model (2001). The rate of immigration that
was originally called “m” by Hubbell (2001) can be expressed in terms of
number of individuals that are immigrants to the local community. The immigration
term is
$$ m = \frac{I}{I + J - 1} $$
where I is the fundamental dispersal limitation parameter and J is the total community
size. Local diversity increases with immigration and with the gamma diversity of
the metacommunity, which is measured by the diversity parameter θ. We
observed very low values of θ (between 2 and 15) and I (around 1 % of total
community abundance), which suggests local communities are almost
completely isolated.
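The relation between the immigration rate m, the dispersal limitation parameter I and the local community size J can be written as a pair of small functions. This is a sketch of the formula only; the function names are hypothetical and the inverse is obtained by solving the same expression for I.

```python
def immigration_rate(I, J):
    """Immigration probability m = I / (I + J - 1) for local community size J."""
    return I / (I + J - 1)


def dispersal_parameter(m, J):
    """Inverse relation, I = m (J - 1) / (1 - m), valid for 0 <= m < 1."""
    return m * (J - 1) / (1 - m)
```

For example, I = 1 in a local community of J = 100 individuals gives m = 1/100.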
In the neutral theory, metacommunity diversity depends on a speciation
parameter, which is implicitly embodied in θ (Hubbell, 2001; Etienne, 2007).
When sample size is small, the estimate of I is problematic and multiple
likelihood maxima may appear. However, the formula for multiple samples is
robust to this problem (Etienne, 2007). Accordingly, our likelihood surfaces
revealed one peak only. We did not use the likelihood of the model for direct
comparison to other models of species abundance distribution such as the
log-normal (e.g. Volkov et al., 2003). Instead, we used the estimates of θ and
I for creating a null expectation in terms of community dissimilarities (McGill,
2003; Gotelli and McGill, 2006; Dornelas et al., 2006; Etienne, 2007) under the
assumption that local community structure can be approximated by
demographic stochasticity and limited dispersal only.
We compared observed and expected levels of dissimilarity by a
bootstrapped t-test. We tried several dissimilarity indices (e.g. Bray-Curtis;
Gower; Jaccard) proposed by Anderson and colleagues (2006). Since all
tested indices led to the same conclusion, in the main paper we reported
results based on the Jaccard dissimilarity, which is
$$ d_{\mathrm{Jaccard}} = \frac{b + c}{a + b + c} $$
where a is the number of species shared by two communities (B and C), b is
the number of species in B that do not occur in C, and c is the number of
species in C that do not occur in B.
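As a minimal sketch of this index, the function below computes the Jaccard dissimilarity from two species lists, using the quantities a, b and c as defined above. The function name and the species labels in the usage line are hypothetical.

```python
def jaccard_dissimilarity(community_b, community_c):
    """Jaccard dissimilarity (b + c) / (a + b + c) between two species lists."""
    species_b, species_c = set(community_b), set(community_c)
    a = len(species_b & species_c)  # species shared by B and C
    b = len(species_b - species_c)  # species in B that do not occur in C
    c = len(species_c - species_b)  # species in C that do not occur in B
    if a + b + c == 0:
        raise ValueError("both communities are empty")
    return (b + c) / (a + b + c)


# Two shared species and one unique species on each side: (1 + 1) / (2 + 1 + 1).
d = jaccard_dissimilarity({"otu1", "otu2", "otu3"}, {"otu2", "otu3", "otu4"})
```

The index is 0 for communities with identical species lists and 1 for communities with no species in common.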
References
Anderson MJ, Ellingsen KE, McArdle BH. (2006). Multivariate dispersion as a
measure of beta diversity. Ecol Lett 9: 683–693.
Borcard D, Legendre P, Drapeau P. (1992). Partialling out the spatial
component of ecological variation. Ecology 73: 1045–1055.
Borcard D, Legendre P. (2002). All-scale spatial analysis of ecological data by
means of principal coordinates of neighbour matrices. Ecol Model 153: 51–68.
Borcard D, Legendre P, Avois-Jacquet C, Tuomisto H. (2004). Dissecting the
spatial structure of ecological data at multiple scales. Ecology 85: 1826–1832.
Chown SL, Convey P. (2006). Antarctica as a global indicator. In: Bergstrom
DM, Convey P, Huiskes HL. (eds). Biogeography: Trends in Antarctic
terrestrial and limnetic ecosystems. Springer: Dordrecht, pp. 55–70.
Cottenie K. (2005). Integrating environmental and spatial processes in
ecological community dynamics. Ecol Lett 8: 1175–1182.
Dornelas M, Connolly SR, Hughes TP. (2006). Coral reef diversity
refutes the neutral theory of biodiversity. Nature 440: 80–82.
Dray S, Legendre P, Peres-Neto PR. (2006). Spatial modelling: a
comprehensive framework for principal coordinate analysis of neighbour
matrices (PCNM). Ecol Model 196: 483–493.
Etienne RS. (2007). A neutral sampling formula for multiple samples and an
exact test of neutrality. Ecol Lett 10: 608–618.
Gotelli NJ. (2000). Null model analysis of species co-occurrence patterns.
Ecology 81: 2606–2621.
Gotelli NJ, Entsminger GL. (2001). Swap and fill algorithms in null model
analysis: rethinking the Knight's Tour. Oecologia 129: 281–291.
Gotelli NJ, McGill B. (2006). Null Versus Neutral Models: What's The
Difference? Ecography 29: 793–800.
Hubbell SP. (2001). The Unified Neutral Theory of Biodiversity and
Biogeography. Princeton Univ. Press: Princeton.
Legendre P, Legendre L. (1998). Numerical Ecology. Elsevier: Amsterdam.
McGill BJ. (2003). Strong and weak tests of macroecological theory. Oikos
102: 679–685.
Oksanen J, Kindt R, Legendre P, O’Hara RB, Simpson GL, Solymos
P, Stevens MH, Wagner H. (2009). vegan: Community Ecology Package. R
package version 1.15–4. http://CRAN.R-project.org/package=vegan
Smith TW, Lundholm JT. (2010). Variation partitioning as a tool to distinguish
between niche and neutral processes. Ecography 33: 648–655.
ter Braak CJF. (1986). Canonical Correspondence Analysis: a new
eigenvector technique for multivariate direct gradient analysis. Ecology 67:
1167–1179.
Volkov I, Banavar JR, Hubbell SP, Maritan A. (2003). Neutral theory and
relative species abundance in ecology. Nature 424: 1035–1037.