Ecology, 94(3), 2013, pp. 660–670
© 2013 by the Ecological Society of America

Stochastic species distributions are driven by organism size

JANNE SOININEN,1,3 JENNI J. KORHONEN,2 AND MISKA LUOTO1

1 Department of Geosciences and Geography, P.O. Box 64, FI-00014 University of Helsinki, Finland
2 Department of Environmental Sciences, P.O. Box 65, FI-00014 University of Helsinki, Finland

Abstract. The strengths of environmental drivers and biotic interactions are expected to show large variability across organism groups. We tested two ideas related to the degree of ecological determinism vs. stochasticity using a large data set comprising bacterio-, phyto-, and zooplankton. We expected that (1) there are predictable, size-driven differences in the degree to which planktonic taxa respond to different drivers such as water chemistry, biotic interactions, and climatic variables; and (2) species distribution models show lowest predictive performance for the smallest taxa due to the stochastic distributions of microbes. Generalized linear models (GLMs), generalized additive models (GAMs), and generalized boosted methods (GBMs) were constructed for 84 species to model their occurrence as a function of eight predictors. Predictive performance was measured as the area under the curve (AUC) of the receiver operating characteristic plot and true skill statistic (TSS) using independent model evaluation data. We found that the model performances were typically remarkably low for all planktonic groups. The proportion of satisfactory models (AUC > 0.7) was lowest for bacteria (11.1% of the models), followed by phyto- (24.2%) and zooplankton (38.1%). The occurrences of taxa within all planktonic groups were related to climatic variables to a certain degree, but bacteria showed the strongest associations with the climatic variables.
Moreover, zooplankton occurrences were more related to biotic variables than the occurrences of smaller taxa, while phytoplankton occurrences were more related to water chemistry. We conclude that the occurrences of planktonic taxa are highly unpredictable and that stochasticity in occurrences is negatively related to organism size, perhaps due to efficient dispersal and fast population dynamics among the smallest taxa.

Key words: bacteria; dispersal; generalized additive models; generalized boosted methods; Finland; lakes; occurrence; plankton; trophic structure.

Manuscript received 11 May 2012; revised 10 October 2012; accepted 22 October 2012. Corresponding Editor: J. H. Grabowski.
3 E-mail: janne.soininen@helsinki.fi

INTRODUCTION

Biological communities are assembled by both deterministic and stochastic processes (Chase and Leibold 2003). Here, we define deterministic processes as species interactions and environmental filtering, and stochastic processes as ecological drift and random dispersal. The degree of ecological determinism has recently been shown to vary substantially between small- and large-bodied species (Farjalla et al. 2012). First, Farjalla et al. (2012) documented that habitat filtering strengthened with organism body size in aquatic communities. The reasons for such an outcome may be that (1) larger taxa have lower plasticity in their fundamental niches and that (2) larger taxa are more selective in their dispersal (Farjalla et al. 2012). Second, they showed that the degree of species associations showed predictable variation among different-sized taxa; macroinvertebrates and zooplankton tend to exhibit more frequent exclusions among the species pairs than bacteria, perhaps indicating stronger competitive interactions between the species among the larger taxa.

Often the interest is not in a whole biotic community but in the distribution of a single species.
Communities comprise multiple interacting species, which also tend to respond independently to the range of abiotic variables. Even if species interactions are important for the distribution of species, species' responses to abiotic variables have been widely used for explaining and predicting their distributions. In fact, most species distribution models are still based solely on contemporary environmental covariates and the importance of biotic interactions for the species distributions has been investigated only recently (Araújo and Luoto 2007, Preston et al. 2008, Gotelli et al. 2010). Biotic interactions may, however, have a notable influence on species occupancy even at large spatial scales (Gotelli et al. 2010) and accounting for them can increase the predictive power of the models.

As mentioned, the strength of biotic interactions and habitat association are bound to show large variability across organisms. We tested two ideas related to the degree of ecological determinism vs. stochasticity using a large data set comprising lake bacterio-, phyto-, and zooplankton (Soininen et al. 2011, Soininen and Luoto 2012).

March 2013 STOCHASTICITY AND ORGANISM SIZE

We expected that there would be predictable, size-driven differences in the degree to which planktonic communities respond to different drivers. Planktonic groups encompass a gradient in body size of over four orders of magnitude, with size increasing in the order: bacteria (length ~1–10 μm) < phytoplankton (~10–100 μm) < zooplankton (~100–2000 μm). We expected that bacteria would show only a relatively weak relationship to environmental variables because of their stochastic distributions and that they would also be almost immune to large-scale geographical variables due to their efficient random dispersal (Fig. 1; Finlay 2002, Van der Gucht et al. 2007, Soininen et al. 2011, Farjalla et al. 2012).
Phytoplankton, in turn, would show the strongest relationship to environmental variables among the planktonic groups. This agrees with an idea that, as phytoplankton species occupy a low trophic level, they are under relatively strict environmental control like the other autotrophic organisms, as they need nutrients and light for their production (Paszkowski and Tonn 2000, Soininen et al. 2007). We further expected that zooplankton would respond strongest of all planktonic groups to food web interactions and geographical variables. We expected the former results because zooplankton graze on phytoplankton, and are thus suggested to be more strongly regulated by food web interactions (i.e., their energy comes directly from the lower trophic level) than autotrophic organisms that respond more readily to chemical variables (McQueen et al. 1989). We expected the latter results because of zooplankton’s larger size and less likely dispersal across sites when compared with bacteria or phytoplankton, which are likely to disperse passively more efficiently across sites. We also expected that the species distribution models would show lowest explanatory and predictive power for bacteria taxa, while the predictive performance would be substantially higher for the phytoplankton and highest for zooplankton. This prediction is based on the idea that ecological determinism increases with the body size of the organisms (Beisner et al. 2006, Shurin et al. 2009, Farjalla et al. 2012) due to random dispersal and a short life cycle, resulting in the smallest taxa having faster population dynamics (Finlay 2002, Brown et al. 2004). To our knowledge, these ideas have not been tested before on the taxon level using large-scale field data subject to natural variability in the driving forces of community variation. 
The novelty of this approach is due to the fact that it explicitly shows the taxon-specific responses to driving abiotic and biotic variables, unlike many multivariate statistics that fail to show how summary statistics such as community similarity build up from taxon-specific responses. Furthermore, taxon-specific models are particularly valuable for providing insights into the main drivers of species distributions when range-limiting physiological factors for studied species are poorly known (Heikkinen et al. 2006, Buisson et al. 2010), which is particularly true for the small taxa. In addition, taxon-specific species models may provide more precise and realistic predictions than those offered by species-assemblage models (Lavergne et al. 2010), not least because species' responses to main environmental drivers are thought to be mainly individualistic (Elith et al. 2006).

FIG. 1. The expected importance of water chemistry, geographical variables, and biotic variables for the distribution of bacteria, phytoplankton, and zooplankton taxa. The width of the arrows reflects the expected relative importance of each variable group in the species occurrence.

MATERIALS AND METHODS

Study area

Plankton samples were collected from 100 small lakes in Finland during July in 2008 or 2009. The latitudinal gradient between the southernmost and northernmost sites was >700 km. The sites were sampled in five drainage basins and in 20 lakes per basin. We sampled 60 lakes in three drainage basins in 2008 and 40 lakes in two drainage basins in 2009. We acknowledge that between-year variation in environmental conditions may increase the residual variation in the data that could not be controlled, likely resulting in lower model performance. However, sampling of 100 lakes during a single summer was not possible due to strong seasonality and substantial increase of within-year variation in the data. Moreover, Shurin et al.
(2007) recently showed that daily zooplankton richness was linearly related to annual richness in a set of lakes over several years. We suggest, however, that studying the influence of such interannual variation on model performance would be an important task in the future.

The sampled drainage basins were (1) Vantaanjoki, (2) Karjaanjoki, (3) Kokemäenjoki, (4) Upper Kymijoki, and (5) Koutajoki (Appendix A). These drainage basins were chosen because they cover a large geographical range and their nutrient concentrations vary from ultraoligotrophic to highly eutrophic, thus representing a wide productivity range (Appendix B). We admit that the sampled lakes were clustered rather than evenly distributed within our study area. This is mainly due to practical reasons, i.e., planktonic samples including bacteria were sampled near biological stations in order to preserve the samples properly in freezers. The study lakes were, however, highly representative of the variability in water chemistry of the landscapes in this region. We sampled only small lakes and ponds (mean lake area was 0.05 km²) to ensure that plankton sampling covered the site as well as possible. Most of the lakes within the drainage basins were not readily interconnected via water routes. We lack detailed data on fish communities in these lakes, but based on visual detection as well as earlier documentation, we concluded that all lakes harbored fish.

Plankton sampling and sample processing

Plankton samples were collected with a tube sampler (2.3 L) from three locations in the middle of the lake and pooled in a plastic bucket, ensuring proper mixing. We collected the samples in the middle of the lakes from a boat to prevent benthic taxa from the littoral zone from entering the samples. The samples were collected at 0.5 m below the surface of the water. Our sampling protocol for bacteria followed Longmuir et al. (2007).
First, 250 mL of water was filtered through a 0.42-μm pore nitrocellulose filter (diameter 25 mm; Millipore, Durapore, Billerica, Massachusetts, USA) to remove larger particles. Bacteria cells were then collected on a 0.22-μm pore nitrocellulose filter, which was frozen immediately in the field. For phytoplankton, a sample of 0.5 L was fixed immediately with acid Lugol's iodine solution in the field. Zooplankton samples (6.15 L in total) were collected on a 50-μm mesh and preserved with formaldehyde in the field. We acknowledge that our methodology for zooplankton does not detect as many individuals as it detects for phytoplankton because of the relatively limited amount of water filtered for the samples. However, as the methods are strictly standardized, we think that these are highly suitable for the studies of taxon occurrences across the lakes.

The maximum depth of the lakes was measured in the field. Lake area was measured using Geographic Information System (MapInfo Version 9.5; Pitney Bowes Software 2008). Conductivity was measured in the field using a conductivity meter (PW 9529; Philips, Eindhoven, The Netherlands) and samples for water chemistry analyses were collected simultaneously with the plankton samples.

In the laboratory, community composition of bacteria was determined using molecular fingerprinting (terminal restriction fragment length polymorphisms, T-RFLP; see Soininen et al. [2011] for details). It is a popular method for generating a fingerprint of an unknown microbial community. Although it may underestimate the true number of bacteria taxa present, our consistent use of the method allows us to investigate the distribution patterns of bacteria taxa among the lakes. The phytoplankton samples were sedimented using an Utermöhl chamber (PhycoTech, St. Joseph, Michigan, USA) and counted with a light microscope (magnification 400×).
For each sample, 50 fields were counted, typically detecting 200–500 specimens (individuals or colonies). For zooplankton, all individuals (typically 50–200 individuals per sample) were counted at a magnification of 125–400× using an inverted microscope. Both crustacean zooplankton and rotifers were included in the counting. For phyto- and zooplankton, most individuals were identified to species level. However, some of the taxa (24.1% in phytoplankton and 24.6% in zooplankton) were identified to genus level only. We thus acknowledge that for bacteria within the data sets, not all taxa represent a species, but rather a genus or perhaps even higher taxonomic groups. This may mask some structure in the data that would be observable only if species-level data had been used. We think, however, that this does not affect our results systematically as there were no significant differences in the predictive performance of the distribution models based on species-level vs. genus-level data (results not shown).

Water samples were analyzed in the laboratory for chlorophyll a, water color, total nitrogen, and total phosphorus. For chlorophyll a, water was filtered using a GF/C Whatman filter and analyzed spectrophotometrically after extraction with ethanol. Water color was determined using a color comparator and nutrients using QuikChem 8000 (Lachat, Loveland, Colorado, USA).

Geographical variables

Multiple linear regression was used to relate mean July temperatures (period 1971–2000) to latitude, longitude, and altitude of each study lake, downscaling climate data from a 10 × 10 km resolution grid (Finnish Meteorological Institute, Helsinki, Finland; Venalainen and Heikinheimo 2002) to lake level. For the regression models, r² = 0.94 and 0.97. Annual precipitation was downscaled to a 1-km² grid by using kriging interpolation (for details, see van der Linden and Christensen 2003).
To provide productivity estimates for each lake catchment, normalized difference vegetation index (NDVI) values were obtained from existing Landsat ETM databases at the resolution of 30 m (see Soininen and Luoto 2012 for details of the data processing). NDVI is the most frequently used parameter for quantifying the productivity and aboveground biomass of ecosystems (Tucker 1979). It is based on the strong absorption of incident radiation by chlorophyll in the red (ETM3), and the contrasting high reflectance by plant cells in the near-infrared (ETM4) spectral region. Because it is based on the normalized ratio of the reflectance in these two spectral regions, it is an indicator of the greenness of vegetation canopies, which enables separation of vegetation from other land cover. NDVI measures were generated from geocorrected Landsat ETM+ satellite images. The images were acquired in the years 1999–2002 to coincide with the middle of the growing season. Maximum NDVI values were calculated for each lake. Maximum NDVI is more related to the highest potential productivity (Parviainen et al. 2010), and it showed high correlation with the species richness of phytoplankton and zooplankton in the same study lakes (Soininen and Luoto 2012). We conducted the NDVI analyses using three different radii (50 m, 100 m, and 500 m) around each lake. Use of these relatively small bands around the lakes ensured that there was no substantial overlap in the catchments even for the lakes that were located close to each other. However, we expected that the use of the smallest radius would better clarify the direct effects, e.g., of leaves on lake water conditions (input of nutrients and organic material in the water), while the largest radius reflects the overall productivity of the neighboring landscape (Soininen and Luoto 2012).
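The NDVI calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing chain used in the study: the function names and the reflectance values are hypothetical, and the standard definition NDVI = (NIR - red) / (NIR + red) is assumed for the "normalized ratio" of the two bands.

```python
def ndvi(red, nir):
    """NDVI from red and near-infrared reflectance (standard definition, assumed here)."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


def max_ndvi(pixels):
    """Maximum NDVI over (red, nir) pixel pairs, skipping undefined pixels."""
    values = [v for v in (ndvi(r, n) for r, n in pixels) if v is not None]
    return max(values) if values else None


# Hypothetical reflectances for pixels inside a buffer around one lake
buffer_pixels = [(0.08, 0.40), (0.10, 0.30), (0.05, 0.45), (0.0, 0.0)]
lake_ndvi_max = max_ndvi(buffer_pixels)
```

In this sketch the per-lake maximum would be taken separately for each buffer radius (50 m, 100 m, and 500 m) by passing the pixels that fall inside the corresponding band.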
Biotic variables

For explaining the distribution of bacteria, we used zooplankton number and phytoplankton biomass (chlorophyll a) as biotic variables. This is because bacteria can assimilate the other planktonic organisms in the water obtaining organic material and nutrients and because other planktonic taxa provide microhabitats for bacteria (reviewed by Cole 1982). The amount of planktonic material in the water may thus affect the distribution of bacterial taxa across the lakes since bacteria also show among-taxon differences in the rates they use resources and grow, similar to other planktonic organisms (Yokokawa and Nagata 2005).

For phytoplankton, we used zooplankton number and zooplankton richness as predictors that reflect both the number of consumers as well as their range in traits (i.e., zooplankton richness reflects the number of different grazers in terms of morphology and body sizes thus indicating the fill of niche space). These variables were included since the presence of predators can strongly influence the abundance, distribution, and range limits of prey species in a wide range of ecosystems (Estes et al. 2011). Among phytoplankton, e.g., filamentous blue-green algal species may be relatively immune to grazing, while some single-celled taxa may be more vulnerable to grazing by zooplankton. We also conducted the models using only the cladocerans as predators for phytoplankton because they are known to be efficient grazers on a wide range of prey (Leibold 1989, Bertilsson et al. 2003).

For zooplankton, we used phytoplankton biomass and phytoplankton richness in order to measure the amount of food particles available, as well as the range in morphology and size of the food particles. These variables were included because the dependence of animals on plants is often an important determinant of species distributions at broad spatial extents (reviewed by Wisz et al. 2012).
Species distribution models

The data comprised presence records of each of 407 taxa from 100 lakes, from which 12 bacteria, 51 phytoplankton, and 21 zooplankton taxa with 10 or more records were used in the analyses. We made the assumption that the absence of a record from a lake corresponded to a true absence of the taxa (Guisan and Hofer 2003). Three statistical methods were used to model plankton species distributions: generalized linear models (GLMs), generalized additive models (GAMs), and generalized boosted methods (GBMs). These techniques are widely used to model species distributions and are capable of modeling nonlinear functions (Elith et al. 2006, Franklin 2009). This allowed a comparison of the methods' predictive ability and a quantification of uncertainties deriving from the choice of the modeling approach (Heikkinen et al. 2006). The data set was randomly split into calibration (i.e., training; 70% of records) and validation (i.e., testing; remaining 30% of records) data sets. We included the following variables known to be highly influential for the distribution of planktonic taxa in the models (Soininen et al. 2011): conductivity, water color, total phosphorus, maximum NDVI, July temperature, mean annual precipitation, chlorophyll a (for bacteria and zooplankton), phytoplankton richness (for zooplankton), zooplankton richness (for phytoplankton), and zooplankton number (for bacteria and phytoplankton).

GLMs are mathematical extensions of linear models that do not force data into unnatural scales; they allow for nonlinearity and nonconstant variance (heteroscedasticity) structures in the data. Here, the model calibration was performed utilizing the statistical package R version 2.13 (R Development Core Team 2011), with the standard glm function. The possibility of curvilinear relationships between the explanatory and dependent variables was examined by including the quadratic terms of the predictors in the models (Crawley 2007).
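The random 70/30 split and the form of a logit-link model with quadratic terms can be illustrated with a short Python sketch. It is not the authors' R code: the function names, the fixed seed, and the coefficient values are hypothetical, and the sketch only evaluates a predicted probability for given coefficients rather than fitting them.

```python
import math
import random


def split_records(records, train_fraction=0.7, seed=0):
    """Randomly split records into calibration and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def occurrence_probability(x, intercept, linear, quadratic):
    """Inverse logit of a predictor with linear and quadratic terms per variable."""
    eta = intercept
    for value, b1, b2 in zip(x, linear, quadratic):
        eta += b1 * value + b2 * value ** 2
    # Numerically stable inverse logit
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


lakes = list(range(100))  # stand-ins for the 100 lake records
calibration, validation = split_records(lakes)

# Hypothetical coefficients for a single standardized predictor
p = occurrence_probability([1.0], intercept=0.0, linear=[2.0], quadratic=[-2.0])
```

A negative quadratic coefficient, as in the hypothetical call above, gives a unimodal response along the predictor, which is the kind of curvilinear relationship the quadratic terms are meant to capture.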
Quadratic terms were always kept in the GLMs regardless of their statistical significance. Both GLMs and GAMs were fitted under the assumption of a binomial distribution of errors and applying a logit link function.

GAMs are a nonparametric extension of GLMs that use smoothers to estimate the form of the relationship between a response variable and a set of predictor variables (Yee and Mitchell 1991). GAMs were fitted using the mgcv package (Wood 2006, 2011) in R (R Development Core Team 2011), which calculates the extent to which the degrees of smoothness for each smoother can be lowered during model fitting without significantly reducing model fit. GAMs were fitted setting the initial degrees of smoothness for each univariate term at four to avoid overly complex models and potential overfitting.

TABLE 1. Mean values of area under the curve (AUC) and true skill statistic (TSS) for the calibration and evaluation data sets for three planktonic groups using generalized linear models (GLM), generalized additive models (GAM), and generalized boosted methods (GBM).

Model and group      Calibration AUC   Calibration TSS   Evaluation AUC   Evaluation TSS
GLM
  Bacteria           0.891             0.722             0.616            0.203
  Phytoplankton      0.877             0.674             0.645            0.170
  Zooplankton        0.891             0.686             0.646            0.127
GAM
  Bacteria           0.913             0.760             0.580            0.022
  Phytoplankton      0.900             0.728             0.627            0.157
  Zooplankton        0.899             0.716             0.653            0.149
GBM
  Bacteria           0.929             0.738             0.658            0.128
  Phytoplankton      0.895             0.675             0.648            0.155
  Zooplankton        0.930             0.712             0.693            0.231

GBM is a machine learning method that estimates the form of the relationship between a response variable and its predictors without a priori specification of a data model (De'ath 2007, Elith et al. 2008). This technique estimates a large number of simple models that are combined to form a final model optimized for prediction, using cross-validation for model building. We used R (R Development Core Team 2011) with functions from the gbm and dismo packages (Ridgeway 2006, Elith et al.
2008) to fit the GBMs, setting interaction depth to six, learning rate to a maximum of 0.001 (lower for species where inadequate trees were calculated), and bagging fraction to 0.75.

All model parameters were first estimated from the calibration data, and then used to predict the occurrence of species in the model validation data set. The model performance (discrimination ability) was determined by calculating the area under the curve of a receiver operating characteristic plot (AUC; Fielding and Bell 1997) and the true skill statistic (TSS; Allouche et al. 2006) using the model validation data set. AUC provides an assessment of the agreement between the observed presence/absence records and the model predictions over a range of probability thresholds above which the model predicts presence (Fielding and Bell 1997). TSS is a synthetic index that takes into account sensitivity (ability to identify taxon presence) and specificity (ability to identify absence), and that is not sensitive to prevalence (Allouche et al. 2006).

In GLMs and GAMs, variable importance was calculated as each variable's drop contribution (i.e., change in deviance associated with exclusion of a given variable from a model containing all the other variables). In the GBMs, Friedman's (2001) method was used to calculate the relative importance of predictors, based on the frequency of variable selection during model building and the reduction in deviance associated with each variable's inclusion in models. For both methods, the variables' contributions were scaled to sum to 100, with higher numbers indicating stronger influence on the response variable. For each model, a threshold value for the conversion of the continuous probability values for occurrence into binary predictions was selected to maximize sensitivity and specificity (Liu et al. 2005). Using the same set of explanatory variables in all species models improved the comparability of the model outputs.
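The evaluation statistics described above can be made concrete with a small Python sketch. It is an illustration only, not the authors' implementation: the function names and the toy presence/absence data are hypothetical, AUC is computed in its rank-based form (the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence), TSS is taken as sensitivity + specificity - 1 following Allouche et al. (2006), and "maximize sensitivity and specificity" is read here as maximizing their sum, which is an assumption.

```python
def auc(observed, predicted):
    """Rank-based AUC: share of presence/absence pairs ranked correctly (ties count 0.5)."""
    presences = [p for o, p in zip(observed, predicted) if o == 1]
    absences = [p for o, p in zip(observed, predicted) if o == 0]
    if not presences or not absences:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


def sensitivity_specificity(observed, predicted, threshold):
    """Sensitivity and specificity when predicting presence at p >= threshold."""
    tp = fn = tn = fp = 0
    for o, p in zip(observed, predicted):
        call = 1 if p >= threshold else 0
        if o == 1 and call == 1:
            tp += 1
        elif o == 1:
            fn += 1
        elif call == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def tss(observed, predicted, threshold):
    """True skill statistic: sensitivity + specificity - 1."""
    sens, spec = sensitivity_specificity(observed, predicted, threshold)
    return sens + spec - 1.0


def best_threshold(observed, predicted):
    """Predicted value that maximizes sensitivity + specificity (assumed criterion)."""
    candidates = sorted(set(predicted))
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(observed, predicted, t)))


# Toy validation data (hypothetical): 1 = presence, 0 = absence
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]

model_auc = auc(observed, predicted)
threshold = best_threshold(observed, predicted)
model_tss = tss(observed, predicted, threshold)
```

Because TSS depends on the chosen threshold while AUC does not, the two statistics can rank models differently, which is one reason for reporting both.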
Additionally, we chose not to use stepwise multiple regression because of issues including bias in parameter estimation, inconsistencies among model selection algorithms, an inherent problem of multiple hypothesis testing, and an inappropriate focus or reliance on a single best model (e.g., Whittingham et al. 2006).

The possible differences among the planktonic groups in the proportion of taxa whose occurrences could be at least satisfactorily (AUC > 0.7 and TSS > 0.3; Heikkinen et al. 2012) predicted by the models were tested using a generalized linear mixed model (GLMM), where model type (i.e., GLM, GAM, or GBM) was entered as a random factor and planktonic group as a fixed factor. GLMMs were conducted using the lmer function in R with a binomial response variable (R Development Core Team 2011).

RESULTS

Overall, the AUC and TSS values of the 252 individual models (84 taxa, three modeling techniques) based on the model evaluation data set were on average 0.65 and 0.15, respectively. Model performance was highest for zooplankton (mean AUC = 0.66, mean TSS = 0.17), followed by phytoplankton (AUC = 0.64 and TSS = 0.16) and bacteria (AUC = 0.62 and TSS = 0.12). See the model performance for each species group and modeling technique based on model calibration and evaluation data sets in Table 1 and Appendix C. For all planktonic groups, taxon occurrences were, overall, poorly explained by the model variables (Fig. 2). For example, the great majority of bacteria taxa showed only low predictability (i.e., AUC < 0.7, TSS < 0.3), whereas the proportion of high-quality bacteria models was negligible. According to GLMM, planktonic groups showed significant differences (AUC, P = 0.008, χ² = 9.49; TSS, P = 0.04, χ² = 6.433) in the proportion of taxa whose occurrences were at least satisfactorily predicted by the models. The proportion of satisfactory models based on AUC values was lowest for bacteria (11.1% of the models), followed by phyto- (24.2%) and zooplankton (38.1%).
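The percentages reported above are simple proportions of models exceeding a cutoff. A minimal Python sketch of that bookkeeping follows; the function name and the AUC values are hypothetical and are not taken from the study.

```python
def percent_above(values, cutoff):
    """Percentage of values strictly greater than the cutoff."""
    if not values:
        raise ValueError("no values supplied")
    n_above = sum(1 for v in values if v > cutoff)
    return 100.0 * n_above / len(values)


# Hypothetical evaluation AUC values for five models of one planktonic group
example_auc = [0.75, 0.62, 0.71, 0.90, 0.55]
satisfactory_percent = percent_above(example_auc, cutoff=0.7)
```

The same function could be applied to TSS values with a cutoff of 0.3 to obtain the corresponding TSS-based proportion.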
All between-group differences in the proportion of satisfactory models based on AUC values were significant at P < 0.05. Qualitatively similar results emerged for TSS.

We found that intercorrelations between physicochemical and geographical variables were relatively weak (all r < 0.4), and thus, likely did not confound our main conclusions (Appendix D).

FIG. 2. The percentage of taxa showing low (AUC < 0.7, TSS < 0.3), intermediate (AUC 0.7–0.9, TSS 0.3–0.6), and high (AUC > 0.9, TSS > 0.6) values of the area under the curve (AUC) and the true skill statistic (TSS), indicating predictive performance for generalized linear models (GLM), generalized additive models (GAM), and generalized boosted methods (GBM) for each planktonic group.

According to all three modeling approaches, the occurrences of bacteria taxa were best explained both by large-scale factors related to the productivity of the sites (July temperature and mean annual precipitation), as well as by algal biomass (Fig. 3). The responses of bacteria occurrences towards many of the variables were overall, however, very low. The occurrences of phytoplankton taxa were best explained by water chemistry variables (especially color and total P; see Appendix E for an example), as well as by mean annual precipitation. For GLM, occurrences were also related to zooplankton richness. When only the cladocerans were used as zooplankton predators for the phytoplankton, the number of individuals of cladocerans showed stronger importance on phytoplankton occurrence than the cladoceran richness (Appendix C). For GAM, the importance of number of individuals of cladocerans was significantly higher than the importance of individuals of full zooplankton community on phytoplankton occurrence (paired t test, P = 0.027).
The best explaining variables for the occurrences of zooplankton taxa included water color, phytoplankton richness, and algal biomass (see Appendix F for an example). For GBM, occurrences were typically also related to phytoplankton richness and precipitation.

FIG. 3. Boxplots of the variable importance for bacteria (n = 12) and phyto- (n = 51) and zooplankton (n = 21) taxa in the generalized linear models (GLM), generalized additive models (GAM), and generalized boosted methods (GBM). Variable abbreviations are: cond, conductivity; totP, total phosphorus concentration; ndvimax, maximum normalized difference vegetation index; jult, July temperature; annp, annual precipitation; chla, chlorophyll a concentration; zoonumber, number of zooplankton individuals; zoorich, zooplankton richness; and phytorich, phytoplankton richness. The lines in the boxes show medians, box ends are quartiles, whiskers show 95th percentiles, and dots show outliers. For GLMs and GAMs, variable importance was calculated as the change in deviance associated with excluding that variable from a model containing all other variables. In the GBMs, variable importance was based on the frequency of variable selection during model building and reduction in deviance associated with each variable's inclusion in models.

DISCUSSION

The results largely concurred with our initial expectations about the main drivers of taxon occurrences for the different trophic levels, as well as with the prediction that the performance of the distribution models would show a consistent relationship to organism size. We showed that the stochasticity in the taxon distributions was largest (i.e., predictive performance of the models was lowest) among smallest taxa, i.e., for bacteria, as we initially expected based on an idea that ecological determinism decreases with decreasing organism size (Shurin et al. 2009, Farjalla et al. 2012). In general, this probably underscores the efficient dispersal and fast population dynamics for the smallest taxa (Finlay 2002, Brown et al. 2004). We further illustrated that the main drivers of taxon occurrences showed substantial differences among the planktonic groups; for example, the occurrences of phytoplankton taxa were related to local water chemistry as we predicted based on earlier studies from the same study area (Soininen et al. 2007), while zooplankton also responded to biotic variables (i.e., to both richness of the algae reflecting the variation in size and morphology of the food particles, as well as to the total amount of algal material in the water) as highlighted in the literature elsewhere (McQueen et al. 1989). Nonetheless, groups also showed some consistencies in their responses to different drivers given that, perhaps surprisingly, the occurrences of all planktonic groups were related to climatic variables to a certain degree.

We found that the predictive performance of the taxon-specific models was related to organism size in a consistent manner, as the number of taxa that showed intermediate or high predictability increased from bacteria to zooplankton. This probably reflects the high passive dispersal ability of the smallest taxa, thus showing more or less stochastic distributions across sites (Finlay 2002) and also their short life cycle, resulting in fast population dynamics (Brown et al. 2004). It is possible that efficiently dispersing taxa may be temporarily present at sites that are not environmentally favorable for them due to the strong influx of colonists from other sites. These taxa may thus occupy both source and sink habitats, perhaps reflected as overall weak relationships between species composition and environment especially in bacteria.
Although bacteria occurrences were related to some of the estimated environmental variables (such as temperature and precipitation), overall predictive performance of the models was low due to the weak relationships to many included variables. This relatively weak species sorting in bacteria is consistent with some other recent findings from lakes (Langenheder and Ragnarsson 2007, Östman et al. 2010), but contrasts with the idea that efficiently dispersing taxa may be able to track favorable abiotic conditions (Borcard et al. 2004). We admit here that some of the fine-scale responses of bacteria taxa may have been masked because of the relatively coarse molecular fingerprinting method applied here, which likely lumped many bacteria taxa together. For example, it has been shown that pyrosequencing of bacteria may reveal many more bacteria taxa than traditional fingerprinting methods, such as the T-RFLP that was applied here (Wang et al. 2012). Nonetheless, the main community patterns studied by high-throughput sequencing typically resemble those found with more traditional fingerprinting methods (Wang et al. 2012).

We also note that the predictive performances of the models could have been higher overall if more local environmental variables had been included in the models. For example, detailed measures of dissolved or particulate organic carbon might have been useful for modeling the distribution of bacteria. Likewise, inclusion of concentrations of the inorganic fraction of nutrients in addition to total concentrations might have increased the predictive performance of phytoplankton models. For the sake of model simplicity and reliable among-taxon comparisons, however, the inclusion of a relatively small and similar number of key variables known to be important for each group of predictors (i.e., local abiotic variables, biotic variables, and geographical variables) was an essential step in the analyses.
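The predictive performance discussed here rests on two scores, AUC and TSS, with AUC > 0.7 treated as satisfactory. As a rough illustration only, the sketch below computes both from presence/absence observations and predicted occurrence probabilities, plus the share of models above the cutoff. The 0.5 probability threshold used for TSS and all example values are assumptions made for this illustration, not the settings or data of the study.

```python
def auc(y_true, scores):
    """AUC as the probability that a random presence outscores a random absence.

    Ties between a presence and an absence count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tss(y_true, scores, threshold=0.5):
    """True skill statistic: sensitivity + specificity - 1 at a given threshold."""
    tp = fn = tn = fp = 0
    for t, s in zip(y_true, scores):
        predicted_present = s >= threshold
        if t == 1 and predicted_present:
            tp += 1
        elif t == 1:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def share_satisfactory(auc_values, cutoff=0.7):
    """Proportion of models whose AUC exceeds the cutoff."""
    return sum(a > cutoff for a in auc_values) / len(auc_values)


# Made-up evaluation data for one hypothetical taxon: 1 = present, 0 = absent.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(auc(observed, predicted))   # 0.9375
print(tss(observed, predicted))   # 0.5
print(share_satisfactory([0.55, 0.72, 0.81, 0.64]))  # 0.5
```

AUC is threshold-free, whereas TSS depends on how probabilities are converted to presences and absences, so the two scores can rank the same models differently.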
Compared with recent findings from other organism groups such as butterflies and plants (Heikkinen et al. 2012), the AUC values of the present distribution models were exceptionally low. In fact, for virtually all planktonic taxa, predictive performance of the models was either intermediate or low, with few exceptions. This mirrors either the high stochasticity of planktonic distributions in general or the fact that some of the important variables were left unmeasured in the field. In addition, it may well be that the snapshot field measurements of the local environmental variables did not always accurately reflect the conditions actually encountered by planktonic taxa exhibiting fast population dynamics. For example, Crump et al. (2003) documented that the fast changes in cell densities for many of the transient bacteria taxa were driven by the temporal variation of organic matter supply. It is therefore possible that, e.g., the use of environmental data collected over longer periods of time might provide models with better predictive performance.

Even if model performance was relatively low overall, planktonic groups showed clear responses to some of the drivers. In the light of earlier literature (e.g., Beisner et al. 2006, Langenheder and Ragnarsson 2007), the responses of lake bacterio- and phytoplankton that we found to variables that reflect the large-scale variation in climate and potential productivity, such as air temperature and precipitation, were somewhat unexpected. Many earlier studies have suggested that local variables such as nutrient supply or the amount of organic material in the water would be the main drivers of community structure in unicellular planktonic taxa (Beisner et al. 2006, Langenheder and Ragnarsson 2007). The present finding of the importance of climatic variables, e.g., for bacteria distribution is, however, in line with Van der Gucht et al.
(2007), who identified water temperature as one of the main drivers of bacteria composition in lakes in Belgium, the Netherlands, and Denmark. We further showed that the distributions of lake bacteria taxa were related to actual plant biomass production both within the lakes (algal biomass) and in the catchment area (maximum NDVI). While the former finding may reflect the responses of bacteria to directly usable organic material in the water, the latter may be related to their stronger response to landscape-level or climatic variables than traditionally believed. This also suggests that much of the organic material available for bacteria possibly originates from the catchment area. Nonetheless, these findings add to the evidence that bacteria distributions are driven jointly by regional and local forces just as the distributions of any larger organisms are (reviewed by Martiny et al. 2006).

We highlight that the present findings are novel, as our model statistics were built up from taxon-specific responses to both environmental and biotic variables, while many multivariate statistical approaches commonly applied in similar studies fail to show in detail the responses of each taxon to such variables. Taxon-specific species models may provide more precise and realistic predictions because species’ responses to main environmental drivers are typically individualistic (Elith et al. 2006); these individualistic responses were clearly detected towards a range of environmental and biotic variables in our data (see Appendices E and F).

We applied a conceptually simple view that classifies processes as either deterministic or stochastic and therefore could partition the influence of these processes on the distribution of species additively, not considering their interactive component.
Although this view facilitated clear conclusions about how organism size could affect the relative magnitude of stochastic and deterministic processes in plankton, we note that these processes may not be mutually exclusive and can also jointly mediate species distributions (Coulson et al. 2004). For example, a stochastic dispersal event is first needed for a planktonic species to establish a population in a lake. We emphasize, though, that there likely are systematic differences in the number of dispersal events among the planktonic groups as, e.g., bacteria are more likely to disperse to a lake passively via air than, for example, zooplankton, which often need active agents such as birds or fish for their dispersal.

We thus conclude that the stochasticity of species distributions increases with decreasing organism size in plankton, as illustrated by the poor predictive performance of the distribution models for bacteria. Although planktonic groups showed variable responses to interactions from other trophic levels as well as to climatic and local environmental variables, these results highlight the need to consider not only the local abiotic variables, but also the biotic interactions and climatic variables explicitly even for the smallest taxa when predicting their distributions in a changing environment.

ACKNOWLEDGMENTS

We thank Jyrki Eskelinen, Lucilla Galata, Johanna Karhu, and Adrien Vetterli for assistance both in the field and in the laboratory, and Peter C. le Roux for writing the R code utilized in the analyses. We also thank two anonymous reviewers for their constructive comments on an earlier draft of the manuscript.

LITERATURE CITED

Allouche, O., A. Tsoar, and R. Kadmon. 2006. Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43:1223–1232.
Araújo, M. B., and M. Luoto. 2007.
The importance of biotic interactions for modelling species distributions under climate change. Global Ecology and Biogeography 16:743–753.
Beisner, B. E., P. R. Peres-Neto, E. S. Lindström, A. Barnett, and M. L. Longhi. 2006. The role of environmental and spatial processes in structuring lake communities from bacteria to fish. Ecology 87:2985–2991.
Bertilsson, S., L.-A. Hansson, W. Graneli, and A. Philibert. 2003. Size-selective predation on pelagic microorganisms in Arctic freshwaters. Journal of Plankton Research 25:621–631.
Borcard, D., P. Legendre, C. Avois-Jacquet, and H. Tuomisto. 2004. Dissecting the spatial structure of ecological data at multiple scales. Ecology 85:1826–1832.
Brown, J. H., J. F. Gillooly, A. P. Allen, Van M. Savage, and G. B. West. 2004. Toward a metabolic theory of ecology. Ecology 85:1771–1789.
Buisson, L., W. Thuiller, S. Lek, P. Lim, and G. Grenouillet. 2010. Climate change hastens the turnover of stream fish assemblages. Global Change Biology 14:2232–2248.
Chase, J. M., and M. A. Leibold. 2003. Ecological niches. Linking classical and contemporary approaches. University of Chicago Press, Chicago, Illinois, USA.
Cole, J. J. 1982. Interactions between bacteria and algae in aquatic ecosystems. Annual Review of Ecology and Systematics 13:291–314.
Coulson, T., P. Rohani, and M. Pascual. 2004. Skeletons, noise and population growth: the end of an old debate? Trends in Ecology and Evolution 19:359–364.
Crawley, M. J. 2007. The R book. John Wiley and Sons, Chichester, UK.
Crump, B. C., G. W. Kling, M. Bahr, and J. E. Hobbie. 2003. Bacterioplankton community shifts in an arctic lake correlate with seasonal changes in organic matter source. Applied and Environmental Microbiology 69:2253–2268.
De’ath, G. 2007. Boosted trees for ecological modeling and prediction. Ecology 88:243–251.
Elith, J., et al. 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29:129–151.
Elith, J., J.
R. Leathwick, and T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802–813.
Estes, J. A., et al. 2011. Trophic downgrading of planet earth. Science 333:301–306.
Farjalla, V. F., D. S. Srivastava, N. A. C. Marino, F. D. Azevedo, V. Dib, P. M. Lopes, A. S. Rosado, R. L. Bozelli, and F. A. Esteves. 2012. Ecological determinism increases with organism size. Ecology 93:1752–1759.
Fielding, A. H., and J. F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38–49.
Finlay, B. J. 2002. Global dispersal of free-living microbial eukaryote species. Science 296:1061–1063.
Franklin, J. 2009. Mapping species distributions: spatial inference and prediction. Cambridge University Press, Cambridge, UK.
Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics 29:1189–1232.
Gotelli, N. J., G. R. Graves, and C. Rahbek. 2010. Macroecological signals of species interactions in the Danish avifauna. Proceedings of the National Academy of Sciences USA 107:5030–5035.
Guisan, A., and U. Hofer. 2003. Predicting reptile distributions at the mesoscale: relation to climate and topography. Journal of Biogeography 30:1233–1243.
Heikkinen, R. K., M. Luoto, M. B. Araújo, R. Virkkala, W. Thuiller, and M. Sykes. 2006. Methods and uncertainties in bioclimatic envelope modelling under climate change. Progress in Physical Geography 30:751–777.
Heikkinen, R. K., M. Marmion, and M. Luoto. 2012. Does the interpolation accuracy of species distribution models come at the expense of transferability? Ecography 35:276–288.
Langenheder, S., and H. Ragnarsson. 2007. The role of environmental and spatial factors for the composition of aquatic bacterial communities. Ecology 88:2154–2161.
Lavergne, S., N. Mouquet, W. Thuiller, and O. Ronce. 2010.
Biodiversity and climate change: Integrating evolutionary and ecological responses of species and communities. Annual Review of Ecology, Evolution, and Systematics 41:321–350.
Leibold, M. 1989. Resource edibility and the effects of predators and productivity on the outcome of trophic interactions. American Naturalist 134:922–949.
Liu, C., P. M. Berry, T. P. Dawson, and R. P. Pearson. 2005. Selecting thresholds of occurrence in the prediction of species distributions. Ecography 28:385–393.
Longmuir, A., J. B. Shurin, and J. L. Clasen. 2007. Independent gradients of producer, consumer, and microbial diversity in lake plankton. Ecology 88:1663–1674.
Martiny, J. B. H., et al. 2006. Microbial biogeography: putting microorganisms on the map. Nature Reviews Microbiology 4:102–112.
McQueen, D. J., R. S. Johannes, J. R. Post, T. J. Stewart, and D. R. S. Lean. 1989. Bottom-up and top-down impacts on freshwater pelagic community structure. Ecological Monographs 59:289–309.
Östman, Ö., S. Drakare, E. S. Kritzberg, S. Langenheder, J. B. Logue, and E. S. Lindström. 2010. Regional invariance among microbial communities. Ecology Letters 13:118–127.
Parviainen, M., M. Luoto, and R. K. Heikkinen. 2010. NDVI-based productivity and heterogeneity as indicators of plant-species richness in boreal landscapes. Boreal Environment Research 15:301–318.
Paszkowski, C. A., and W. M. Tonn. 2000. Community concordance between the fish and aquatic birds of lakes in northern Alberta, Canada: the relative importance of environmental and biotic factors. Freshwater Biology 43:421–437.
Pitney Bowes Software. 2008. MapInfo Professional. Version 9.5. Pitney Bowes Software, New York, New York, USA.
Preston, K., J. T. Rotenberry, R. A. Redak, and M. F. Allen. 2008. Habitat shifts of endangered species under altered climate conditions: importance of biotic interactions. Global Change Biology 14:2501–2515.
R Development Core Team. 2011. R: a language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria.
Ridgeway, G. 2006. gbm: Generalized boosted regression models. R Foundation for Statistical Computing, Vienna, Austria. http://cran.r-project.org/web/packages/gbm/.
Shurin, J. B., S. E. Arnott, H. Hillebrand, A. Longmuir, B. Pinel-Alloul, M. Winder, and N. D. Yan. 2007. Diversity–stability relationship varies with latitude in zooplankton. Ecology Letters 10:127–134.
Shurin, J. B., K. Cottenie, and H. Hillebrand. 2009. Spatial autocorrelation and dispersal limitation in freshwater organisms. Oecologia 159:151–159.
Soininen, J., M. Kokocinski, S. Estlander, J. Kotanen, and J. Heino. 2007. Neutrality, niches, and determinants of plankton metacommunity structure across boreal wetland ponds. Ecoscience 14:146–154.
Soininen, J., J. J. Korhonen, J. Karhu, and A. Vetterli. 2011. Disentangling the spatial patterns in community composition of prokaryotic and eukaryotic lake plankton. Limnology and Oceanography 56:508–520.
Soininen, J., and M. Luoto. 2012. Is catchment productivity a useful predictor of taxa richness in lake plankton communities? Ecological Applications 22:624–633.
Tucker, C. J. 1979. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment 8:127–150.
Van der Gucht, K., et al. 2007. The power of species sorting: Local factors drive bacterial community composition over a wide range of spatial scales. Proceedings of the National Academy of Sciences USA 104:20404–20409.
Van der Linden, S., and J. H. Christensen. 2003. Improved hydrological modeling for remote regions using a combination of observed and simulated precipitation data. Journal of Geophysical Research–Atmospheres 108:4072.
Venalainen, A., and M. Heikinheimo. 2002. Meteorological data for agricultural applications. Physics and Chemistry of the Earth 27:1045–1050.
Wang, J., J. Soininen, J. He, and J. Shen. 2012. Phylogenetic clustering increases with elevation for microbes.
Environmental Microbiology Reports 4:217–226.
Whittingham, M. J., P. A. Stephens, R. B. Bradbury, and R. P. Freckleton. 2006. Why do we still use stepwise modelling in ecology and behaviour? Journal of Animal Ecology 75:1182–1189.
Wisz, M. S., et al. 2012. The role of biotic interactions in shaping distributions and realised assemblages of species: implications for species distribution modelling. Biological Reviews, in press. http://dx.doi.org/10.1111/j.1469-185X.2012.00235.x
Wood, S. N. 2006. Generalized additive models: an introduction with R. Chapman and Hall/CRC, Boca Raton, Florida, USA.
Wood, S. N. 2011. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society B 73:3–36.
Yee, T. W., and N. D. Mitchell. 1991. Generalized additive models in plant ecology. Journal of Vegetation Science 2:587–602.
Yokokawa, T., and T. Nagata. 2005. Growth and grazing mortality rates of phylogenetic groups of bacterioplankton in coastal marine environments. Applied and Environmental Microbiology 71:6799–6807.

SUPPLEMENTAL MATERIAL

Appendix A Map of the study area in Finland with the studied drainage basins (Ecological Archives E094-058-A1).
Appendix B Means and ranges for the main environmental variables of lakes for each drainage system (Ecological Archives E094-058-A2).
Appendix C Modeling results for phytoplankton where only the cladocerans (seven species) were used as zooplankton grazers for phytoplankton (Ecological Archives E094-058-A3).
Appendix D The intercorrelations between all physicochemical and geographical variables (Ecological Archives E094-058-A4).
Appendix E Response shapes (partial dependency plots) for the phytoplankton Staurastrum sp. based on GBM models (Ecological Archives E094-058-A5).
Appendix F Response shapes (partial dependency plots) for the zooplankton Daphnia galeata based on GBM models (Ecological Archives E094-058-A6).