Supplementary text

A. Defining Biogeographic Regions

We delimited the geographic zones depicted in Fig. 3 based on commonly recognized major biogeographic regions: Neotropics (South and Central America, including Mexico, were grouped together based on a preliminary correlation in species distribution between these two regions detected with ITS data), Nearctic (North America, without Mexico), Afrotropics, West Palearctic, East Palearctic, Oriental (represented by a single specimen from Vietnam) and Australasia. We considered the Pacific Northwest of North America as a separate region because of its high degree of endemism for both the mycobiont (exclusive occurrence of three Peltigera species: P. pacifica, P. neopolydactyla 5, and P. neopolydactyla 6) and the cyanobiont (two Nostoc phylogroups: XIb and XVII). Based on preliminary geographical distributions of mycobionts and cyanobionts, we noticed that taxon compositions within the arcto-boreal and within the temperate portions of the regions delimited above were more similar than between these two biomes. For example, Nostoc phylogroups IV, VIIa, XIa, XIII, as well as identical haplotypes of P. neopolydactyla 1, 2, 4, P. scabrosa 1, 2, 3 and P. occidentalis were found in the arcto-boreal zone crossing three continents (North America, Asia and Europe). We therefore decided to further split the Nearctic, West Palearctic and East Palearctic into their temperate and arcto-boreal elements when comparing mycobiont species and cyanobiont phylogroup compositions among these regions using UniFrac (Supplementary Fig. S5) and in Fig. 5. Because the three arcto-boreal regions grouped together, we decided to treat them as a single Holarctic arcto-boreal region, and to keep the temperate regions divided into Temperate Nearctic, Temperate West Palearctic and Temperate East Palearctic (Fig. 5).

B. Comparison of loci for phylogenetic inferences within the genus Peltigera and its Nostoc partner
At the genus and section levels, EFT2.1 was the most difficult locus to amplify, with a 70–75% success rate, compared to 90–100% for the ITS, LSU, RPB1 and rbcLX. β-tubulin was intermediate, with a success rate of 75–85%, due to clade-specific amplification problems. For Matrices 1 and 2, LSU was the locus with the lowest proportion of variable characters and the lowest contribution to species delimitation within section Polydactylon. The ITS was the most variable marker at the section level, but this locus was too variable (positional homology was ambiguous for most parts of the ITS1 and ITS2) to be included in the phylogenetic analysis of Peltigera as a whole (i.e., it was excluded from Matrix 1). RPB1 was the most variable gene at the genus level (Matrix 1), but was the least variable protein-coding gene within section Polydactylon (Matrix 2). For the latter, β-tubulin delivered the greatest number of variable characters among the three protein-coding genes. It resolved the highest number of internodes with the highest level of internodal support, compared to all remaining loci (based on single-locus phylogenetic analyses; trees not shown), even though it contained fewer variable characters than the ITS (131 vs. 150; Table 1). The ITS included the largest proportion of ambiguous sites that had to be excluded from the analyses (44% excluded at the section level, Matrix 2), followed by β-tubulin (28% excluded at the genus level, Matrix 1; 19% excluded at the section level, Matrix 2), and the remaining three loci (11–17% at the genus level, Matrix 1; and 6–13% at the section level, Matrix 2).

C. Structurama: Sensitivity Analysis on Priors

If priors were totally driving the results, the posterior distribution would equal the prior distribution, because the data would carry no information. Depending on the hyperprior chosen, we would then obtain a number of populations/species corresponding either to the fixed expected number of populations, or to the mean of the gamma distribution (which, with a scale of 1, is the value of the shape parameter). If priors had no impact on the results, runs with different priors would eventually converge on the same results and we would obtain the same number of populations regardless of the priors (Bayes 1763). Our results are intermediate between these two extremes. In the dolichorhizoid clade, for instance, we obtained 10 species with a gamma shape of 1, which means that even when the prior tends to reduce the number of species, the data nevertheless support 10 species (Supplementary Fig. S1). With a gamma shape of 50, we obtained 26 species, which means that even when the prior tends to increase the number of species, the data limited the number of putative species to 26. When using gamma shape parameters centered around 20, approximately 20 species were recognized, suggesting that the priors and the data were in agreement. We assumed that the correct assignment of species is somewhere in the range where the obtained number of species coincides with the gamma shape prior value (Supplementary Fig. S1). Overall, our results using this criterion are congruent with the monophyly, geography and morphology of these circumscribed putative species in section Polydactylon (e.g., Supplementary Fig. S1c). We selected gamma shape parameter values around 20 and conducted longer runs (20 million generations) for these values.
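The selection criterion described above (retain the gamma shape values for which the recovered number of species coincides with the prior mean) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure run in Structurama: the function names and the `tolerance` parameter are hypothetical, and the input contains only the three dolichorhizoid values quoted in the text, with the entry for a shape of 20 being approximate.

```python
# Gamma shape -> number of species recovered (dolichorhizoid clade).
# Only the values quoted in the text are used; the entry for shape 20 is
# approximate ("approximately 20 species").
species_by_shape = {1: 10, 20: 20, 50: 26}


def gamma_prior_mean(shape, scale=1.0):
    """Mean of a gamma distribution, i.e. shape * scale."""
    return shape * scale


def prior_data_gap(counts, scale=1.0):
    """Return {shape: recovered species - prior mean} for each run."""
    return {s: k - gamma_prior_mean(s, scale) for s, k in counts.items()}


def best_matching_shapes(counts, tolerance=1.0, scale=1.0):
    """Shapes whose recovered species count is within `tolerance`
    (a hypothetical threshold) of the prior mean."""
    gaps = prior_data_gap(counts, scale)
    return sorted(s for s, g in gaps.items() if abs(g) <= tolerance)


gaps = prior_data_gap(species_by_shape)
selected = best_matching_shapes(species_by_shape)
print(gaps)
print(selected)
```

A positive gap indicates that the data support more species than the prior mean, a negative gap that the data restrict the number of species below the prior mean, and a gap near zero corresponds to the range where priors and data agree.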
Increasing the number of generations from 1 to 20 million did not significantly influence the number of estimated populations (difference of 1 or 2 populations), and improved neither the likelihood values of the estimates nor the convergence of the results for different prior values. For the dolichorhizoid clade, the gamma shape values that give coherent results appear to lie between 19 and 22 (Supplementary Fig. S1).

For the scabrosoid clade, the results were slightly different, because gamma shape values from 1 to 10 always converged on eight species. This is a good indication that the data strongly support this number of species (Supplementary Fig. S2c). However, the assignment of individuals to the eight species varied. For gamma shape values between 1 and 4, the results were inconsistent, leading to paraphyletic and highly heterogeneous species. For gamma shape values ranging from 5 to 10, the results were consistent, delimiting monophyletic groups that were phenotypically and geographically more homogeneous. We thus consider the latter results as optimal. With higher gamma shape values, single specimens were assigned to new species that we could not justify phenotypically, suggesting that this was an artifactual result driven by misspecified priors.

For the polydactyloid clade, as soon as the gamma shape value exceeded 7, we started to see several additional splits, resulting in many singleton species (Supplementary Fig. S2b). We think that this is caused by a substantial amount of missing data for this clade, and by a high diversity of alleles relative to the number of individuals sampled. This is also the reason why we included the 5.8S region in the analyses of this clade: unlike in the other two clades, this region showed a relatively high level of allelic diversity within the polydactyloid clade. Therefore, we had to opt for low gamma shape values to keep the results biologically relevant.
However, even with a gamma shape value of 3, species delimitations included cases where P. nana 1 was divided into three species, and P. sp. 10 was grouped with the North American clade of P. polydactylon (Fig. 3). With a gamma shape of 2, P1652 was grouped with P. sp. 8 rather than with its close relative P450.
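The stability argument used for the scabrosoid clade (the same number of species recovered across a range of gamma shape values) can be illustrated with a small helper that finds the longest run of tested shape values yielding an identical species count. This is a sketch under assumptions: the function name is hypothetical, and the input assumes that every integer shape from 1 to 10 was tested, which the text does not state explicitly. It checks only the number of species, not the assignment of individuals, which, as noted above, also varied.

```python
def longest_plateau(counts_by_shape):
    """Return (first_shape, last_shape, count) for the longest run of
    consecutive tested shape values giving the same species count,
    or None if no runs were supplied."""
    shapes = sorted(counts_by_shape)
    best = None  # (run length, (first_shape, last_shape, count))
    start = 0
    for i in range(1, len(shapes) + 1):
        run_ended = (
            i == len(shapes)
            or counts_by_shape[shapes[i]] != counts_by_shape[shapes[start]]
        )
        if run_ended:
            run = (shapes[start], shapes[i - 1], counts_by_shape[shapes[start]])
            if best is None or i - start > best[0]:
                best = (i - start, run)
            start = i
    return best[1] if best else None


# Scabrosoid clade: shapes 1 to 10 converged on eight species
# (integer grid assumed for illustration).
scabrosoid = {shape: 8 for shape in range(1, 11)}
plateau = longest_plateau(scabrosoid)
print(plateau)
```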