Supplementary text
A. Defining Biogeographic Regions
We delimited geographic zones depicted in Fig. 3 based on commonly
recognized major biogeographic regions: Neotropics (South and Central America,
including Mexico, which were grouped together based on a preliminary correlation in
species distribution between these two regions detected with ITS data), Nearctic (North
America, without Mexico), Afrotropics, West Palearctic, East Palearctic, Oriental
(represented by a single specimen from Vietnam) and Australasia. We considered the
Pacific Northwest of North America as a separate region because of its high degree of
endemism for both the mycobiont (exclusive occurrence of three Peltigera species: P.
pacifica, P. neopolydactyla 5, and P. neopolydactyla 6) and two Nostoc phylogroups
(XIb, XVII).
Based on preliminary geographical distributions of mycobionts and
cyanobionts, we noticed that taxon compositions were more similar among the regions
delimited above within the arcto-boreal biome, and among those within the temperate
biome, than between these two biomes. For
example, Nostoc phylogroups IV, VIIa, XIa, XIII, as well as identical haplotypes of
P. neopolydactyla 1, 2, 4, P. scabrosa 1, 2, 3 and P. occidentalis were found in the
arcto-boreal zone crossing three continents (North America, Asia and Europe). We
therefore decided to further split the Nearctic, West Palearctic and East Palearctic into
their temperate and arcto-boreal elements, when comparing mycobiont species and
cyanobiont phylogroup compositions among these regions using UniFrac
(Supplementary Fig. S5) and in Fig. 5. Because the three arcto-boreal regions grouped
together, we decided to treat them as a Holarctic arcto-boreal region, and to keep the
temperate regions divided into Temperate Nearctic, Temperate West Palearctic and
Temperate East Palearctic (Fig. 5).
B. Comparison of loci for phylogenetic inferences within the genus Peltigera and its
Nostoc partner.
At the genus and section levels, EFT2.1 was the most difficult locus to
amplify, with a 70–75% success rate, compared to 90–100% for the ITS, LSU, RPB1
and rbcLX. β-tubulin was intermediate, with a success rate of 75–85%,
owing to clade-specific amplification problems. For Matrices 1 and 2, LSU was the
locus with the lowest proportion of variable characters and the lowest contribution to
species delimitation within section Polydactylon. The ITS was the most variable
marker at the section level, but this locus was too variable (positional homology was
ambiguous for most parts of the ITS1 and 2) to be included in the phylogenetic
analysis of Peltigera as a whole (i.e., excluded from Matrix 1). RPB1 was the most
variable gene at the genus level (Matrix 1), but was the least variable protein-coding
gene within section Polydactylon (Matrix 2). For the latter, β-tubulin provided the
greatest number of variable characters among the three protein-coding genes. It
resolved the highest number of internodes with the highest level of internodal support,
compared to all remaining loci (based on single locus phylogenetic analyses; trees not
shown), even though it contained fewer variable characters than the ITS (131 vs. 150;
Table 1). The ITS included the largest proportion of ambiguous sites that had to be
excluded from the analyses (44% excluded at the section level, Matrix 2), followed by
β-tubulin (28% excluded at the genus level, Matrix 1; 19% excluded at the section
level, Matrix 2), and the remaining three loci (11–17% at the genus level, Matrix 1;
and 6–13% at the section level, Matrix 2).
C. Structurama: Sensitivity Analysis on Priors
If the priors were entirely driving the results, the number of
populations/species obtained would correspond, depending on the hyperprior chosen,
either to the fixed expected number of populations or to the mean of the gamma
distribution (which, with a scale of 1, equals the shape parameter), because when
the data contain no information, the posterior distribution equals the prior
distribution. If the priors had no impact on the results, runs with different priors
would eventually converge on the same number of populations regardless of the
priors (Bayes 1763). Our results are intermediate between these two extremes.
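The parenthetical remark about the gamma mean can be checked numerically. The following is a minimal sketch, not part of the Structurama workflow: the function names are hypothetical, and it only illustrates that a gamma distribution with scale 1 has a mean equal to its shape parameter.

```python
import random
import statistics


def gamma_prior_mean(shape, scale=1.0):
    """Analytical mean of a gamma distribution: shape * scale."""
    return shape * scale


def sampled_prior_mean(shape, scale=1.0, draws=20_000, seed=0):
    """Monte Carlo estimate of the same mean (hypothetical helper)."""
    rng = random.Random(seed)
    # random.gammavariate takes (alpha=shape, beta=scale).
    return statistics.fmean(rng.gammavariate(shape, scale) for _ in range(draws))


if __name__ == "__main__":
    for shape in (1, 20, 50):
        print(shape, gamma_prior_mean(shape), round(sampled_prior_mean(shape), 2))
```

With a scale of 1, the analytical mean is simply the shape value, so a prior with shape 1 centers on very few populations and a prior with shape 50 centers on many.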
In the dolichorhizoid clade, for instance, we obtained 10 species with a gamma
shape of 1, which means that even though the prior favors a low number of
species, the data nevertheless support 10 species (Supplementary Fig. S1). With a
gamma shape of 50, we obtained 26 species, which means that even though the prior
favors a high number of species, the data limited the number of putative
species to 26. When using gamma shape parameters centered around 20,
approximately 20 species were recognized, suggesting that the priors and the data
were in agreement.
We assumed that the correct assignment of species is somewhere in the range
where the obtained number of species coincides with the gamma shape prior value
(Supplementary Fig. S1). Overall, our results using this criterion are congruent with
the monophyly, geography and morphology of these circumscribed putative species in
section Polydactylon (e.g., Supplementary Fig. S1c). We selected gamma shape
parameter values around 20 and conducted longer runs (20 million generations) for
these values. Increasing the number of generations from 1 to 20 million did not
significantly influence the number of estimated populations (difference of 1 or 2
populations) and did not improve the likelihood values of the estimates or the
convergence of the results for different prior values. For the dolichorhizoid clade,
the gamma shape values that give coherent results appear to be between 19 and 22
(Supplementary Fig. S1).
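The selection criterion described above (keep the gamma shape values for which the estimated number of species coincides with the prior value) can be sketched as a simple filter. This is an illustrative sketch only: the function name and the tolerance of 2 are assumptions, and the input uses the three dolichorhizoid values mentioned in the text, with "approximately 20" rounded to 20.

```python
def coherent_shapes(species_by_shape, tolerance=2):
    """Return the gamma shape values whose estimated species count lies
    within `tolerance` of the shape value itself.

    `tolerance` is a hypothetical threshold, not a value from the study.
    """
    return sorted(
        shape
        for shape, n_species in species_by_shape.items()
        if abs(n_species - shape) <= tolerance
    )


if __name__ == "__main__":
    # shape -> estimated number of species (dolichorhizoid clade, from the text;
    # the entry for shape 20 is a rounding of "approximately 20").
    results = {1: 10, 20: 20, 50: 26}
    print(coherent_shapes(results))  # → [20]
```

Applied to a full grid of shape values, a filter of this kind would return the interval in which prior and data agree.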
For the scabrosoid clade, the results were slightly different, because gamma
shape values from 1 to 10 always converged on eight species. This is a good
indication that the data strongly support this number of species (Supplementary Fig.
S2c). However, the assignment of individuals to the eight species varied. For gamma
shape values between 1 and 4, the results were inconsistent, leading to paraphyletic
and highly heterogeneous species. For gamma values ranging from 5 to 10, the results
were consistent, delimiting monophyletic groups that were phenotypically and
geographically more homogeneous. We thus consider the latter results as optimal.
With higher gamma values, single specimens were assigned to new species that we
could not justify phenotypically, suggesting that this was an artifactual result driven
by misspecified priors.
For the polydactyloid clade, as soon as the gamma shape value was over 7, we
started to see several additional splits, resulting in many singleton species
(Supplementary Fig. S2b). We attribute this to a substantial amount of
missing data for this clade and to high allelic diversity relative to the number of
individuals sampled. This is also why, for this clade, we included the 5.8S region in
the analyses; unlike in the other two clades, this region showed a relatively
high level of allelic diversity within the polydactyloid clade. Therefore, we had to opt
for low gamma shape values to keep the results biologically relevant. However, even
with a gamma shape value of 3, species delimitations included cases where P. nana 1
was divided into three species, and P. sp. 10 was grouped with the North American
clade of P. polydactylon (Fig. 3). With a gamma shape of 2, P1652 was grouped with
P. sp. 8 rather than with its close relative P450.