SPATIAL ANALYSIS IMPROVES SPECIES DISTRIBUTION MODELLING DURING RANGE EXPANSION
by Paulo De Marco Jr., José Alexandre Felizola Diniz-Filho and Luis Mauricio Bini

SUPPLEMENTARY METHODS

Creating simulated species distribution data

Simulated species range data are a useful way to compare species distribution modeling techniques, mainly because they allow control over important species-range properties that affect modeling efficiency (Hirzel et al. 2001; Meynard & Quinn 2007). Because actual species distributions are unknown, the performance of modeling methods is difficult to assess, and simulated data allow us to circumvent this difficulty (Austin et al. 2006). We applied this approach to explore the effects of the colonization-extinction mechanisms that could generate non-equilibrium distributions and to evaluate the usefulness of Spatial Eigenvector Mapping (see below) for improving the prediction of species distributions.

Non-equilibrium species distributions can arise under different ecological and evolutionary scenarios. Firstly, they can result from failures to colonize suitable areas, due either to recent environmental changes (e.g. habitat destruction, abrupt climatic shifts or the presence of physical barriers) or to low dispersal capacity. Non-equilibrium between a species’ distribution and the environment may also occur during the initial stages of an invasion. We call this scenario “colonization-lag” non-equilibrium (CNE). In this case, although the distribution is actually determined by the environment (e.g. climatic variables), which generates strong range cohesion, a mismatch between the actual and potential distributions is expected because of the historical time lag. Secondly, complex colonization-extinction dynamics within the species’ potential range, generated by local processes such as biotic interactions or metapopulation dynamics, will appear as random noise in geographical space.
We call this second case the demographic non-equilibrium (DNE) scenario, which is expected to disrupt range cohesion.

All simulations were based on the premise that species distribution is determined by a “suitability” measure, defined as a linear combination of six standardized environmental variables (see below) (figure S1). We used a two-step process to produce the species distribution data.

In the first step, the simplest way to produce a range distribution is to choose a vector of means for the environmental variables and to assume that suitability for the species is multi-normally distributed around this vector. This is similar to the “Gaussian Threshold” used by Meynard & Quinn (2007) to simulate distributions of artificial species. We assumed that suitability is not affected by the correlations among the environmental variables, but only by the elements of the main diagonal of the covariance matrix, which express the variance of each environmental variable. These variances are directly related to the environmental tolerance of the species. Species distributions were restricted to an envelope delimited by two standard deviation units from the mean of each environmental variable. Thus, suitability was a continuous, Gaussian-distributed variable representing how close the environmental vector in a given pixel is to the mean values.

This suitability index was estimated for all 2545 cells covering the Cerrado Biome. It was based on four climatic variables (temperature: annual mean and seasonality; precipitation: annual mean and seasonality) derived from WorldClim (http://www.worldclim.org/), and on two topographic variables (altitude and slope) derived from the Hydro-1K global digital elevation model (http://edcdaac.usgs.gov/gtopo30/hydro/). The Cerrado is a savanna-like vegetation that originally covered 2 million km² in the centre of Brazil (Bridgewater et al. 2004), and this region was used here for computational convenience only.
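As an illustration only, the first step described above could be sketched as follows. The unnormalized Gaussian kernel, the function name suitability and its parameter names are assumptions of this sketch and are not taken from the original simulation code; only the diagonal-variance assumption and the two-standard-deviation envelope come from the text.

```python
import math


def suitability(env, optimum, tolerance, envelope_sd=2.0):
    """Gaussian suitability of one cell (hypothetical helper).

    env       -- environmental values of the cell (e.g. six standardized variables)
    optimum   -- vector of means chosen for the simulated species
    tolerance -- standard deviation per variable (diagonal of the covariance
                 matrix only; correlations among variables are ignored)
    """
    squared_distance = 0.0
    for x, mu, sd in zip(env, optimum, tolerance):
        z = (x - mu) / sd
        # Envelope: cells more than `envelope_sd` standard deviations from
        # the mean on any variable are treated as unsuitable.
        if abs(z) > envelope_sd:
            return 0.0
        squared_distance += z * z
    # Unnormalized Gaussian kernel (an assumption): equals 1 at the optimum
    # and decreases as the cell's environment moves away from it.
    return math.exp(-0.5 * squared_distance)
```

Under these assumptions, a cell whose environment equals the vector of means receives a suitability of 1, and any cell outside the envelope on at least one variable receives 0.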
In the second step, we used cellular automata (CA) modeling to produce a more realistic range expansion process that allows for the dynamics of local colonization and extinction. CA allows one to simulate dispersal in a spatially explicit context, where the spatial variability of suitability, coupled with stochastic colonization and extinction processes, can generate range distributions that are at non-equilibrium with the environment. An initial seed population was introduced in a randomly chosen grid cell among those with higher suitability, and the dynamic process of colonization and extinction was then started based on two simple rules (colonization-lag non-equilibrium, CNE): (a) a species automatically colonizes a cell if any neighbor cell was successfully colonized at time t-1; (b) extinction probability is linearly and negatively related to suitability. A second scenario (demographic non-equilibrium, DNE) was created by adding a suitability-independent persistence probability that increased linearly with the proportion of neighbor cells successfully colonized at time t-1.

Thus, in CNE the non-equilibrium appears only because of the species’ range expansion process, and the distribution is fully determined by environmental drivers at a given time step. CNE is expected to decrease with increasing range expansion and to disappear when a species occupies all the suitable cells. DNE, on the other hand, is continuously generated by stochastic colonization and extinction processes, independently of the range expansion process. Differences between CNE and DNE range development under the same suitability data used in this study, for 100 time cycles, are shown in two AVI files also available as supplementary material.

Figure S1. Spatial variability in suitability across the Cerrado Biome used to constrain species distributions during the process of range expansion. Suitability of each cell (2545 in total) was given by the combination of six environmental variables.
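The colonization and extinction rules of the two scenarios can be illustrated with a minimal cellular automaton sketch. Several details below are assumptions made only for this example and are not stated in the text: a rectangular grid with an eight-cell neighborhood, suitability scaled to [0, 1], an extinction probability of exactly 1 - suitability, a seed cell drawn from the top 10% of cells by suitability, and a DNE persistence probability equal to rescue_slope times the occupied-neighbor fraction. All function and parameter names are hypothetical.

```python
import random

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def occupied_neighbour_fraction(occ, r, c):
    """Proportion of in-grid neighbours of (r, c) occupied at time t-1."""
    total = hits = 0
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(occ) and 0 <= cc < len(occ[0]):
            total += 1
            hits += 1 if occ[rr][cc] else 0
    return hits / total if total else 0.0


def step(occ, suit, rng, scenario="CNE", rescue_slope=1.0):
    """Advance the occupancy grid by one time cycle."""
    new = [[False] * len(row) for row in occ]
    for r, row in enumerate(occ):
        for c, was_occupied in enumerate(row):
            frac = occupied_neighbour_fraction(occ, r, c)
            # Rule (a): a cell is colonized if any neighbour was occupied at t-1.
            if not (was_occupied or frac > 0):
                continue
            # Rule (b): extinction probability falls linearly with suitability.
            dies = rng.random() < 1.0 - suit[r][c]
            # DNE only: suitability-independent persistence that rises
            # linearly with the occupied-neighbour fraction (assumed form).
            if dies and scenario == "DNE":
                dies = rng.random() >= rescue_slope * frac
            new[r][c] = not dies
    return new


def simulate(suit, n_steps, seed=0, scenario="CNE"):
    """Return the list of occupancy grids from the seed cell to n_steps."""
    rng = random.Random(seed)
    cells = sorted(
        ((suit[r][c], r, c) for r in range(len(suit)) for c in range(len(suit[0]))),
        reverse=True,
    )
    # Seed among the most suitable cells (top 10% is an assumption).
    _, r0, c0 = rng.choice(cells[: max(1, len(cells) // 10)])
    occ = [[False] * len(suit[0]) for _ in suit]
    occ[r0][c0] = True
    history = [occ]
    for _ in range(n_steps):
        occ = step(occ, suit, rng, scenario)
        history.append(occ)
    return history
```

Calling simulate(suit, 100, scenario="CNE") and simulate(suit, 100, scenario="DNE") on the same suitability grid would give two range histories that can be compared cycle by cycle, in the spirit of the 100 time cycles described above.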
Modeling method

The geographic distributions of the simulated species (under the different scenarios and at each time cycle) were randomly sampled to obtain 100 occurrence points. These points were then used as the input data for species distribution modeling with Maxent (see below).

Our modeling approach was based on the Maximum Entropy principle (Phillips et al. 2006) as implemented in the program Maxent version 2.3. Maxent is a machine-learning technique that estimates the probability distribution that is closest to uniform (i.e., that has the maximum entropy) under the constraint that the expected value of each environmental variable matches the empirical value observed at the occurrence points. Our primary interest here is in the general process of modeling range expansion and in the use of Spatial Eigenvector Mapping; the choice of this method was therefore based solely on its ease of use. Model fitting in Maxent, as in other species distribution modeling (SDM) applications, involves the estimation of parameters and some optimization criteria. Recommended default values were used for the convergence threshold (10⁻⁵) and the regularization parameter (1) (Phillips & Dudik 2008), except for the number of iterations, which was set to 1000.

Spatial variables derived from Spatial Eigenvector Mapping (see Diniz-Filho & Bini 2005; Griffith & Peres-Neto 2006; Dormann et al. 2007, for recent reviews) were calculated in the software package SAM (Spatial Analysis in Macroecology; Rangel et al. 2006) and added as predictors in the Maxent models. In this method, the eigenvectors of a double-centered, truncated geographic distance matrix are used in SDM as new orthogonal predictors that capture, at different scales, the geometry of the study area. We defined truncation distances based on the intercept of Moran’s I correlograms for the Maxent residuals of the models estimated with environmental predictors only.
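The truncation-distance criterion above can be illustrated with a minimal sketch of a Moran's I correlogram and its zero intercept. It assumes binary weights within each distance class, Euclidean distances between cell coordinates, and linear interpolation between class midpoints to locate the crossing; it is not the SAM implementation, and the function names are hypothetical. The values argument stands for the model residuals at each cell.

```python
import math


def morans_i(coords, values, d_min, d_max):
    """Moran's I for pairs whose distance lies in the class (d_min, d_max]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    cross = 0.0
    n_pairs = 0  # number of ordered pairs with weight 1 in this class
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(coords[i], coords[j]) <= d_max:
                cross += dev[i] * dev[j]
                n_pairs += 1
    if n_pairs == 0 or denom == 0:
        return None  # undefined: empty distance class or constant values
    return (n / n_pairs) * (cross / denom)


def correlogram(coords, values, class_limits):
    """(class midpoint, Moran's I) for each consecutive pair of limits."""
    result = []
    for lo, hi in zip(class_limits, class_limits[1:]):
        i_value = morans_i(coords, values, lo, hi)
        if i_value is not None:
            result.append(((lo + hi) / 2, i_value))
    return result


def zero_intercept(corr):
    """Distance at which the correlogram first crosses from positive to <= 0."""
    for (d0, i0), (d1, i1) in zip(corr, corr[1:]):
        if i0 > 0 >= i1:
            # Linear interpolation between the two class midpoints.
            return d0 + (d1 - d0) * i0 / (i0 - i1)
    return None  # no crossing within the classes examined
```

Under these assumptions, zero_intercept(correlogram(coords, residuals, class_limits)) returns a candidate truncation distance, or None when the residuals show no positive-to-negative crossing over the distance classes supplied.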
We selected the first 5 eigenvectors to include in Maxent modeling, after testing the successive addition of eigenvectors and checking for model stability in the final steps of the simulations.

Model evaluation

Evaluation of species distribution models depends on the choice of a minimum threshold to convert predicted probabilities of occurrence into species’ presence or absence. A ROC (receiver operating characteristic) curve is produced by plotting sensitivity against the complement of specificity for different threshold values (Liu et al. 2005; Manel et al. 2001). The ROC procedure identifies an optimum threshold as the value that maximizes the sum of specificity and sensitivity (Manel et al. 2001), and it is regarded as one of the best methods for threshold determination in SDM (Liu et al. 2005). This procedure relies on the existence of test data that include actual presences and absences, from which a confusion matrix is estimated. When such data are not available, Maxent can address the problem by producing a set of random test points (Phillips et al. 2006). This approach has the advantage of using a large training dataset, and it introduces a proportion of false absences into the process of model evaluation. We used a set of 10000 random test points, which is recommended for the asymptotic properties of the area under the ROC curve (AUC) (Phillips & Dudik 2008).

The presence/absence prediction of Maxent was compared with the “real” distribution for each simulation using the Kappa statistic. Kappa is considered one of the most useful measures for model evaluation and has the advantage of taking both omission and commission errors into account (Liu et al. 2005; Pearson et al. 2006). As a measure of the relative importance of spatial filters for improving model fit, we used the difference in Kappa values (ΔKappa) between the models with both environmental predictors and spatial filters and the models with only environmental predictors.
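The evaluation quantities described above can be sketched as follows. This is a minimal illustration using the standard definition of Cohen's Kappa from a 2 x 2 confusion matrix; the function names and the exhaustive search over observed score values are assumptions of this example, not the procedure implemented in Maxent. The observed data must contain at least one presence and one absence.

```python
def confusion(observed, predicted):
    """Return (tp, fp, fn, tn) from paired presence/absence sequences."""
    tp = fp = fn = tn = 0
    for o, p in zip(observed, predicted):
        if o and p:
            tp += 1
        elif p:
            fp += 1  # commission error
        elif o:
            fn += 1  # omission error
        else:
            tn += 1
    return tp, fp, fn, tn


def kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement corrected for agreement expected by chance."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if chance_agreement == 1:
        return 0.0  # degenerate case: only one class present
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


def best_threshold(observed, scores):
    """Threshold that maximizes sensitivity + specificity."""
    best_sum, best_t = None, None
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(observed, [s >= t for s in scores])
        total = tp / (tp + fn) + tn / (tn + fp)
        if best_sum is None or total > best_sum:
            best_sum, best_t = total, t
    return best_t


def delta_kappa(observed, scores_env_spatial, scores_env):
    """Kappa of the environment + spatial-filter model minus Kappa of the
    environment-only model, each at its own optimum threshold."""
    result = []
    for scores in (scores_env_spatial, scores_env):
        t = best_threshold(observed, scores)
        result.append(kappa(*confusion(observed, [s >= t for s in scores])))
    return result[0] - result[1]
```

A positive value returned by delta_kappa indicates that adding the spatial filters improved agreement with the simulated distribution for that run.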
ΔKappa values were compared between CNE and DNE models, controlling for species range size, using an Analysis of Covariance (ANCOVA) (figure S2). This analytical design was chosen because of the known dependency of Kappa on species prevalence (Allouche et al. 2006).

Figure S2. Relationships between ΔKappa (gain in Kappa values of models that included spatial filters in relation to models with only environmental variables) and the geographical range size, for CNE (filled squares) and DNE (open squares) models.

REFERENCES

Allouche, O., Tsoar, A., and Kadmon, R. 2006 Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. 43, 1223-1232.
Austin, M. P., Belbin, L., Meyers, J. A., Doherty, M. D., and Luoto, M. 2006 Evaluation of statistical models used for predicting plant species distributions: Role of artificial data and theory. Ecological Modelling 199, 197-216.
Bridgewater, S., Ratter, J. A., and Ribeiro, J. F. 2004 Biogeographic patterns, beta diversity and dominance in the cerrado biome of Brazil. Biodiversity and Conservation 13, 2295-2318.
Hirzel, A. H., Helfer, V., and Metral, F. 2001 Assessing habitat-suitability models with a virtual species. Ecological Modelling 145, 111-121.
Liu, C. R., Berry, P. M., Dawson, T. P., and Pearson, R. G. 2005 Selecting thresholds of occurrence in the prediction of species distributions. Ecography 28, 385-393.
Manel, S., Williams, H. C., and Ormerod, S. J. 2001 Evaluating presence-absence models in ecology: the need to account for prevalence. J. Appl. Ecol. 38, 921-931.
Meynard, C. N. and Quinn, J. F. 2007 Predicting species distributions: a critical comparison of the most common statistical models using artificial species. J. Biogeogr.
Pearson, R. G., Thuiller, W., Araujo, M. B., Martinez-Meyer, E., Brotons, L., McClean, C., Miles, L., Segurado, P., Dawson, T. P., and Lees, D. C. 2006 Model based uncertainty in species range prediction. J. Biogeogr. 33, 1704-1708.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. 2006 Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231-259.
Phillips, S. J. and Dudik, M. 2008 Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31, 161-175.
