Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Molecular Ecology Resources (2014) doi: 10.1111/1755-0998.12277 OPINION Challenges to barcoding an entire flora TAMMY L. ELLIOTT and T . J O N A T H A N D A V I E S Department of Biology, McGill University, 1205 Docteur Penfield Avenue, Montreal, Quebec H3A 1B1 Canada Abstract DNA barcodes are species-specific genetic markers that allow taxonomic identification of biological samples. The promise of DNA barcoding as a rapid molecular tool for conducting biodiversity inventories has catalysed renewed efforts to document and catalogue the diversity of life, parallel to the large-scale sampling conducted by Victorian naturalists. The unique contribution of DNA barcode data is in its ability to identify biotic material that would be impossible to classify using traditional taxonomic keys. However, the utility of DNA barcoding relies upon the construction of accurate barcode libraries that provide a reference database to match to unidentified samples. Whilst there has been much debate in the literature over the choice and efficacy of barcode markers, there has been little consideration of the practicalities of generating comprehensive barcode reference libraries for species-rich floras. Here, we discuss several challenges to the generation of such libraries and present a case study from a regional biodiversity hotspot in southern Quebec. We suggest that the key challenges include (i) collection of specimens for rare or ephemeral species, (ii) limited access to taxonomic expertise necessary for reliable identification of reference specimens and (iii) molecular challenges in amplifying and matching barcode data. To be most effective, we recommend that sampling must be both flexible and opportunistic and conducted across the entire growing season by expert taxonomists. We emphasize that the success of the global barcoding initiative will depend upon the close collaboration of taxonomists, plant collectors, and molecular biologists. Keywords: barcode library, biodiversity survey, hotspot, incomplete sampling, regional flora, taxonomic impediment Received 5 September 2013; revision received 12 April 2014; accepted 20 April 2014 Introduction For centuries, plant hunters have travelled the world in search of specimens. Early collectors, such as Henry Compton (1632–1713), Hans Sloane (1660–1753), Carl Linnaeus (1707–1778), and the wave of Victorian naturalists that succeeded them, systematically documented the diversity of the natural world and populated the world’s herbaria with their collections. In recent decades, planthunting expeditions have increasingly shifted focus to collecting specimens for use in medicinal and other scientific studies (Hepper 1989). Following the launch of the Consortium for the Barcode of Life (CBOL) in May 2004, the International Barcode of Life Project (iBOL) was established to create a barcode library for all eukaryotic life (Schindel & Miller 2005; Frezal & Leblois 2008; Vernooy et al. 2010). This new effort more closely parallels the large-scale sampling conducted by the Victorian naturalists, but in addition to submitting plant species to herbaria, collectors now also submit genetic data generCorrespondence: Tammy L. Elliott, E-mail: [email protected] © 2014 John Wiley & Sons Ltd Fax: 1-514-398-5069; ated using new molecular tools to digital repositories such as The Barcode of Life Data System (BOLD; www. barcodinglife.org) (Ratnasingham & Hebert 2007). The iBOL initiative aims to produce biological inventories in the form of DNA barcode libraries that will provide a reference database to aid in the identification of unidentified or hard to identify taxa (Ratnasingham & Hebert 2007; Jinbo et al. 2011). Although there were initial fears that DNA barcoding might usurp the role of traditional taxonomists (e.g. Ebach & Holdrege 2005), it is now realized that successful barcoding efforts rely on strong collaborations between ‘barcoders’ and taxonomists (Frezal & Leblois 2008). To create a barcode library, a site’s flora must first be barcoded. These libraries can then be used as a cheap and quick method for biodiversity assessments (Krishnamurthy & Francis 2012). Comprehensive, accurate barcode libraries also allow for the identification of fragmentary samples or morphologically indistinct life stages (e.g. shoots, eggs or larvae), and the detection of species in environments when more comprehensive sampling is not practical, such as in biosecurity (Armstrong & Ball 2005) or for early detection of invasive 2 T. L. ELLIOTT and T.J. DAVIES species (Dejean et al. 2012). The wealth of molecular sequence data that is available in such libraries has provided researchers with the opportunity to generate large, well-resolved regional phylogenies, allowing for more accurate quantification of community structure and species interactions (e.g. Kress et al. 2009). In addition, barcode reference libraries have been used to identify ingredients in herbal medicines (e.g. Li et al. 2011b; Han et al. 2012; Kool et al. 2012) and teas (e.g. Stoeckle et al. 2011) as well as examine the belowground spatial structure of plant communities (e.g. Kesanakurti et al. 2011). The exact DNA sequences to be used as barcode markers in plants have been a source of heated debate. Key features for an effective DNA barcode include universality (loci that can be amplified across taxa with common primers), quality (loci that most likely produce bidirectional sequences with few or no ambiguous base calls), discrimination (loci that distinguish the highest number of species) and cost-effectiveness (Hollingsworth et al. 2009). The combination of rbcLa (c. 599 base pairs at 50 end of gene) and matK (c. 846 base pairs at centre of gene) is now widely considered as the DNA barcode for land plants (Fazekas et al. 2008; Hollingsworth et al. 2011), although other markers (see Hollingsworth et al. 2011), such as the internal transcribed spacer (ITS), have also been proposed (Li et al. 2011a), and others have been used to discriminate species in some plant clades (e.g. Roy et al. 2010). Recent work has explored the efficacy of these markers in discriminating different species in regional floras (e.g. Burgess et al. 2011) and the resolution they provide in reconstructing phylogenetic trees of regional floras (e.g. Kress et al. 2009). As barcoding efforts have expanded, a major challenge has been to generate DNA barcode libraries for complete floras. However, to date, there have been few published studies on such efforts. We surveyed ISI Web of Science (WoS; accessed 28 March 2013) based on the following criteria: barcoding; plants; NOT alga; NOT fungi; NOT insects. This search returned 353 matching papers, only five of which focused on barcoding regional floras (Fig. 1), including all the vascular plants of Mt. Valerio, Italy (Bruni et al. 2012), Churchill, Canada (Kuzmina et al. 2012), and the Koffler Scientific Reserve, Canada (Burgess et al. 2011), as well as the angiosperms and conifers of Wales, United Kingdom (de Vere et al. 2012). Other notable studies, from more diverse regions, have focused on certain plant groups, for example, the woody trees, shrubs and palms on Barro Colorado Island, Panama (Kress et al. 2009). In part, the paucity of published studies reflects the relative youthfulness of the field; we therefore additionally surveyed contributors to the WG1.2 Land Plants iBOL Working Group. Ten of the 17 respondents described projects focused on barcoding regional floras at taxonomic scales of angiosperms or Barcoding methods (15%) Barcoding utility (12%) Barcoding regional flora (2%) Review/ ideas (13%) Species discrimination (53%) Sequence selection (5%) Fig. 1 Web of Science search results of published papers on barcoding plants (n = 353). Papers were classified as ‘Barcoding regional flora’ if they focused on the taxonomic scale of angiosperms or higher within a single region. ‘Species discrimination’ papers focused on determining which gene sequences best differentiated plant species within various genera or families, whereas papers on ‘Sequence selection’ examined the broader question of selecting the best barcode sequences for land plants. ‘Barcoding methods’ papers either introduced novel methods that could be used in future barcoding research or presented methods to improve existing protocols. ‘Barcoding utility’ papers provided ideas or examples of new plant barcoding applications. ‘Review/idea’ papers were literature reviews or viewpoints on plant barcoding research. In addition, a request of information from contributors to the BOLD Land Plants campaign yielded 10 unpublished projects working on barcoding regional floras. higher (Table S1, Supporting information). This estimate likely represents only a fraction of ongoing projects as the release of information by BOLD contributors was voluntary, and it is possible that there are other barcoding studies that do not use BOLD. These large barcoding efforts pose many significant practical challenges, some of which have been largely ignored in the recent literature, including (i) difficulties in collecting specimens for rare or ephemeral species; (ii) paucity of or limited access to sufficient taxonomic expertise and resources to obtain reliable, up-to-date identifications of samples used to build the barcode reference library; and (iii) molecular challenges of barcoding (e.g. difficulties in amplification of samples spanning a wide taxonomic spectrum and distinguishing between closely related species). Even though many of these challenges are not novel, the first two have tended to be overlooked in barcoding literature in comparison with the emphasis that has been placed on the third challenge – © 2014 John Wiley & Sons Ltd BARCODING REGIONAL FLORAS 3 molecular methods and the performance of barcodes for species identification (e.g. Lahaye et al. 2008; Burgess et al. 2011). Here, we attempt to redress this balance and focus on the first two challenges. Specifically, we discuss the practicalities of generating DNA barcode libraries for complete floras. We illustrate these challenges using a case study of the flora of the Gault Nature Reserve on Mont Saint Hilaire (MSH), a biosphere reserve in the Monteregie region of southern Quebec. Barcoding a complete flora – Mont St. Hilaire as a case study In 2012, we established an initiative to DNA barcode the entire flora of the Gault Nature Reserve, a small reserve approximately 30 km from the major urban centre of Montreal, Quebec, Canada. The flora of the Gault reserve is well-documented with a nearly complete list of vascular plants available for the site (see Maycock 1961; Table S2, Supporting information) as well as the presence of herbarium specimens for a majority of the reserve’s species in either the McGill University Herbarium (MTMG) or the Marie-Victorin Herbarium (MT). The reserve is situated on Mont St. Hilaire, one of the eight Monteregian Hills of the St. Laurence Lowlands in southern Quebec (Maycock 1961). The old growth forest of the reserve is dominated by the deciduous tree species Acer saccharum, Fagus grandifolia and Quercus rubra (Gilbert & Lechowicz 2004). Protection of this 1000 ha reserve goes back until the 1600s, and it is now recognized as a UNESCO Biosphere Reserve (Maycock 1961; Francis 2004; White et al. 2011). The reserve contains approximately 650 vascular plant species (Maycock 1961; Table S2, Supporting information), several of which are considered rare. Barcoding this flora provides an illustration of the more general (and generally more extreme) challenges to barcoding complete floras. The reserve is relatively small and easily accessible, and although the area is a regional diversity hotspot, species richness is relatively low compared with, for example, tropical floras of similar areal extent (Myers 1988). Over approximately 384 and 235 sampling hours during the 2012 and 2013 respective field seasons, 582 different terrestrial vascular species were collected with sampling focused on collecting one individual per species, except in cases where there were uncertainties in field identifications for which multiple samples were gathered. We found 111 species not previously reported in the reserve and suspect that changing distributions in response to climate change (Parmesan 2006; Kelly & Goulden 2008), exotic species moving in with disturbance (Gilbert & Lechowicz 2005) and differing taxonomic skill sets of the various surveyors (Oredsoon 2000; Chen et al. 2009) might all contribute to explaining this increase in number. We also think that the previously © 2014 John Wiley & Sons Ltd documented species that we did not find might have been extirpated because of the impacts of climate change and increased deer browsing (C^ ote et al. 2004; Gilbert & Lechowicz 2005). In addition, we identified several discrepancies in the list used to guide collection efforts, leading to an inflated species total from past years (Fig. 2). Of the 787 vascular species previously recorded, 143 were removed from the list because they were synonyms and 21 species names were duplicated due to typographical errors. Thus, our sample represents an impressive 84% of the revised species list, contributing 133 rbcL and 126 matK new sequences for species not previously represented through the BOLD public data portal as of December 2013. Sampling in future years will target the remaining 16%, but it is impossible to verify how many of the species on this list were originally misidentified as only a handful have matching voucher specimens (e.g. Maycock 1961) that we can cross-check. The flora of Mont St. Hilaire is likely much smaller than that targeted by regional barcoding efforts in tropical or larger areas; nonetheless, our efforts identified a number of key challenges to creating a comprehensive DNA barcode library based on sampling one individual per vascular plant species that will apply more generally (and likely to a greater degree). Many of these challenges arise from the time constraints imposed by the shortterm nature of most research projects, which are often determined by funding cycles and the demands of Typographical errors (2%) Synonyms (17%) New species to list (12%) Not collected (16%) Collected (52%) Fig. 2 Results of 2012 and 2013 sampling efforts of the vascular plant species on the Gault Nature Reserve, Quebec, based on a pre-existing species list of the reserve (Gault Nature Reserve, unpublished data). The original species list contained several ‘typographical errors’ and ‘synonyms’. ‘New species to list’ were taxa not included in the original 2012 list. Of the remaining taxa on the list, ‘collected’ species were those that were collected during the 2012 and 2013 field seasons, whereas ‘not collected’ were species that were not found during the two field seasons. 4 T. L. ELLIOTT and T.J. DAVIES thoroughly sampling all of the species within a site’s boundaries during short growing seasons. These challenges parallel all such efforts aimed at conducting comprehensive biodiversity surveys (Palmer et al. 2002; Chen et al. 2009), but generating a barcode library imposes additional demands. Simply creating complete regional floristic inventories, even for relatively small floras such as Mont St. Hilaire, is challenging and will considerable effort, which is a substantial impediment, given current costs and budgeting constraints (Pautasso 2012). In addition, rare species will take more time to find, whilst common species can be quick and easy to sample (Preston 1948). For example, we collected a high number of species in a relatively short period of time during the 2012 field season because sampling targeted accessible areas with high species numbers (Fig. 3). However, this opportunistic sampling strategy reduced the amount of time spent searching for rarer species, which can be more difficult to find, illustrating diminishing returns to the amount of effort invested. During the 2013 field season, more targeted sampling based on previous location reports for rarer species was generally aimed at more remote locations within the reserve; however, this strategy required much larger effort but yielded fewer new species as these sites were often species poor (Fig. 4). Low population numbers of rare species can also present a challenge as it is not ethical to collect when sampling might impact population viability; thus, workers must spend valuable time searching for alternative sources of plant material. Differences in phenology can also present major challenges and limit the utility of short, intensive sampling efforts or ‘bioblitzes’ (a term originally used by Susan Ruby of U.S. National Park Service in 1996; http://www. biodiversityonline.ca/BioBlitz/intro.htm), as the collection of some plant species must be conducted during a brief window of time during which traits important for species identification are expressed as fruits and/or flowers (Miller & Nyberg 1995; Leponce et al. 2010). For example, the number of plant species collected during the field season at Mont St. Hilaire fluctuated as high numbers of spring ephemerals were collected whilst flowering early in the growing season, which made them easiest to identify, but species numbers dropped off as the summer progressed (Fig. 4). Combining phenological differences with the logistics of getting to remote locations presents an even larger challenge to comprehensive sampling. Within Mont St. Hilaire, several rare species grow in difficult-to-access areas (Maycock 1961), including several locations that were impossible to access due to dangerous physical obstacles such as steep cliffs. In a few cases, several time-consuming return visits to these remote areas were required because of differences in the phenology of rare species found at these sites. Assuming sufficient time and funds exist for adequate collection, there remains a major challenge in correctly identifying species. First, even experts might overlook species if the material that is present at a site is not in an identifiable form for at least part of the year (see above). 3.75 km 4.05 km Fig. 3 Location of sampling points for plant specimens from Mont St. Hilaire, Quebec, during the 2012 and 2013 field seasons. The reserve boundaries have been extended to include the point in the top right-hand side of the figure. Sampling locations for plants collected during the 2012 field season are indicated by stars, whereas plants collected during the 2013 field season are indicated by circles. Number of unique species collected 160 140 2012 120 2013 100 80 60 40 20 0 April May June July August September October Month Fig. 4 Number of terrestrial plant species collected per month during the 2012 and 2013 field seasons at Mont St. Hilaire, Quebec (note: collectors were not available to go to the field in July 2013). © 2014 John Wiley & Sons Ltd BARCODING REGIONAL FLORAS 5 Second, species within genera that are the product of rapid radiations can represent suites of morphologically similar taxa that are difficult to distinguish either in the field or herbarium. In cases of taxonomic uncertainty, more time must then be spent collecting extra specimens of these taxa so that species that closely resemble each other are not missed, in addition to the extra time required to collect, process and send off material to taxonomic experts. Collection of specimens at the Gault Nature Reserve was conducted by experienced botanists; nonetheless, several genera such as the Carex, Dichanthelium, Crataegus, Salix and Amelanchier were problematic to distinguish in the field because of the presence of hybrids, agamic complexes and difficult-to-distinguish species characteristics. Correct identification of such specimens required reference to multiple resources and referral to specialist taxonomists in some cases, as has also been the case in other regional barcoding studies (e.g. Kuzmina et al. 2012). In general, the decline in taxonomists and limited access to herbarium resources and updated taxonomic treatments present a major challenge to accurate species identification – the taxonomic impediment (Hoagland 1996; Rodman & Cody 2003). Although these challenges might be familiar to most biodiversity surveys, the unique approach of DNA barcoding presents additional challenges related to species identification. Well-constructed barcode libraries allow end-users to identify plant material to species or genus based on tissue fragments (e.g. a single leaf, root or stem) that would be impossible to identify using traditional taxonomic keys. However, it is not possible for end-users to detect errors in the original species identification because barcode matches are based only on sequence data and not morphology, although digital images and herbarium specimens can be revisited in the future to verify identifications. The correct identification of reference specimens is therefore paramount (Collins & Cruickshank 2013). Once plants have been collected, voucher specimens must be accessioned into a herbarium as required by BOLD. Mounting, labelling and accessioning our 680 specimens were an additional challenge to barcoding. Perhaps most surprisingly, time and cost estimates based on our experiences at the Gault Nature Reserve (Table S3, Supporting information) suggest that over one-third of the time required to complete floristic barcoding studies should be allocated to data organization and processing voucher specimens. Further, our experience conducting botanical surveys near Schefferville, in northern Quebec, suggests these same two steps might still require a similar proportional time commitment when preparing barcode libraries for more remote areas (Table S3, Supporting information). Paid and dedicated staff would be necessary to process samples for larger collection efforts. © 2014 John Wiley & Sons Ltd Sequencing success is not guaranteed once specimens have been collected and vouchered. The choice of DNA barcodes has been a trade-off between factors such as universality (i.e. ability to amplify all species) and resolution (i.e. the ability to differentiate between species) (Hollingsworth et al. 2009). In plants, rbcL has proven relatively straightforward to amplify for most clades but provides only limited resolution, particularly among closely related taxa (Hollingsworth et al. 2009). In contrast, matK offers greater resolution, but the proposed primer sets perform poorly in nonangiosperm plant groups such as the gymnosperms and ferns (Hollingsworth et al. 2011; Li et al. 2011c), although alternative primer combinations are now available (Li et al. 2011c). We do not discuss further the details of marker choice here as the selection of DNA barcode markers has already been debated extensively in the literature (e.g. Kress et al. 2005; Rubinoff et al. 2006; Kress & Erickson 2007; Pennisi 2007; Hollingsworth et al. 2011); however, additional challenges can arise postsequencing. Once barcode sequences are available, additional complexities must be considered before using barcode reference libraries to assign identifications to unknown plant material (e.g. Kesanakurti et al. 2011; Li et al. 2011b). First, the genetic distance differentiating species must be decided upon if cryptic species are suspected (Meyer & Paulay 2005; Collins & Cruickshank 2013). The answer to this is likely taxon and gene specific, and there is probably no correct answer in some circumstances (Hollingsworth et al. 2011), although sequences can be simply grouped with the closest matching barcode if a library is assumed ‘complete’ (Hajibabaei et al. 2007; Kesanakurti et al. 2011). Whilst species resolution is generally higher in regionally focused studies compared with those focused on particular taxonomic groups (e.g. Le Clerc-Blain et al. 2010; Roy et al. 2010; Burgess et al. 2011), even within the flora of Mont St. Hilaire, we suspect the identification of several genera such as Salix and Carex might be problematic, as these genera have previously been reported to be difficult to distinguish using DNA barcodes (Starr et al. 2009; von Cr€ autlein et al. 2011), and multiple species within each genus are found in the regional flora. A second (and related) question is how to decide the appropriate sampling density to get estimates of within-species variation in sequences (Meyer & Paulay 2005; Bergsten et al. 2012). One approach would be to maximize the geographical distance between samples (Bergsten et al. 2012). However, as the number of individuals required would likely vary by species and perhaps barcode region, there again is probably no single answer. The effort required for the implementation of this strategy would in any case likely be prohibitive for sampling species-rich floras as many individuals per species would have to be sampled to get an appropriate esti- 6 T. L. ELLIOTT and T.J. DAVIES mate. Finally, users of barcode data must decide how to interpret results if different genes give conflicting answers (Li et al. 2011a). Common barcoding genes in plants are perhaps unlikely to give conflicting answers because they are from the chloroplast (Soltis et al. 1998); nonetheless, neutral evolution might still result in apparent conflict and chloroplast vs. nuclear genes might still show incongruence (Rieseberg & Soltis 1991). Possible solutions include adding sequence data and/or individuals or to explore neutral vs. non-neutral mutations separately. We are not aware of any review of this issue as it applies to barcoding, although it has been widely discussed in the broader phylogenetics literature (Pamilo & Nei 1988; Degnan & Rosenberg 2009). These are all important questions that must be addressed in the barcoding literature. Consequences of incomplete sampling Future conservation and research programmes will have to take into consideration the reality that sampling efforts will probably not yield comprehensive floristic inventories, and thus, barcode libraries will often be incomplete. Of concern, the taxa most likely to be missed are rare species (e.g. orchids on MSH), which might lack barcodes and could be of highest conservation priority. Limited sampling or misidentification of closely related species could also impact efforts to generate phylogenetic trees using barcode data. It is possible to generate phylogenetic hypotheses from partial or gappy molecular matrices, at least for those taxa with some barcode data (Wiens 2003; Wiens & Morrill 2011); however, missing sequence data might increase phylogenetic uncertainty, resulting in unresolved nodes (i.e. polytomies). The consequences of this lack of phylogenetic resolution will depend on the questions to which the phylogenetic hypotheses are applied. For example, testing fine-scale patterns in species co-occurrence would require more well-resolved phylogenetic trees at the species level (e.g. Slingsby & Verboom 2006; Araya et al. 2012). In addition, metrics of phylogenetic community structure differ in their sensitivity to phylogenetic resolution (Swenson 2009). Species-level resolution might be more important for capturing interspecific interactions, such as competition and facilitation. Considerations such as these are important for determining whether it is worth investing the extra time and resources to retrieve complete barcode sequences for all species. Lessons learned: suggestions for future studies Based on our experiences, we suggest several ways forward to more effectively plan and execute regional barcoding efforts. First, herbaria, species inventories, local experts and the Global Biodiversity Information Facility (GBIF; www.gbif.org) can all provide possible locations for species that are hard to find, taking into account possible errors in identification and synonyms. Other suggestions are to use online resources such as Project BudBurst (www.budburst.org) and review herbarium specimens to design field sampling around times when plants are in flowering or fruiting stage. The current year’s climatic conditions will have to be considered when interpreting the data available through such sources, and the resolution of GBIF location data might not help in the location of rare species. A targeted sampling approach could focus sampling on areas where rare species have been observed in the past, but to be effective, this method requires precise GPS co-ordinates or detailed, accurate information on herbarium labels for previously observed species. This strategy is of course only applicable when targets are known and their populations are not transient. Second, methods such as accumulation and rarefaction curves can be used to provide estimates of potential gains in species numbers provided by additional sampling, although these approaches do not give an absolute answer to the number of species in the area (Gotelli & Colwell 2010). Third, we suggest to be prepared to sample more than one field season by having a sampling plan that focuses on collecting the most common species the first year and targets rare species during subsequent field seasons. Follow-up grants can be used to support continued sampling in subsequent years, but diminishing returns of species numbers relative to the amount of time invested suggest that continued full-time sampling would yield only a few substantial new results. We therefore recommend to maintain continuous sampling at a site by establishing strong long-term collaborations with other researchers at the site who might also benefit from the publically available barcode data. One alternative to reduce the time and costs spent supporting field work is to sequence DNA material from herbarium specimens, although DNA from such material can be degraded, inhibiting amplification of barcode markers. Nonetheless, herbarium specimens are useful for sampling very small populations of rare species that can be documented but not sampled because of their conservation status. Whilst herbaria should be part of the global barcoding initiative, they cannot, however, replace field sampling, especially where the flora for a region is incompletely described. Critically, field collection allows for the discovery of new species and can document shifting species distributions with climate change that would be missed by reviewing herbarium specimens for a region. During our study, 12% of the species on our final list were new records for Mont St. Hilaire (Fig. 2) and would have been omitted if we had relied only on herbarium specimens. © 2014 John Wiley & Sons Ltd BARCODING REGIONAL FLORAS 7 Limited access to taxonomic expertise and resources can also hamper regional barcoding efforts; therefore, it is essential to collaborate with taxonomists as well as work with parataxonomists and herbarium resources when possible. Ideally, strong collaborations with taxonomists should be made before gathering data. Taxonomists are a key component to producing regional barcode libraries and can train the field collectors and notify them of problematic species before the field season and as such should be compensated appropriately for their time and effort (Tripp & Hoagland 2013). Parataxonomists can provide an additional resource and may help encourage local participation in regional barcoding efforts (Janzen 2004; Abadie et al. 2008; Janzen & Hallwachs 2011), but they are not appropriate for all study systems. We suggest using herbaria as much as possible as they provide many resources crucial in species identification; however, extra time and money might be required for workers in areas without local herbaria. Finally, further taxonomic confusion could be avoided by having data repositories such as BOLD recommend common databases and identification resources for attributing names to species. Comprehensively sampling entire areas to create barcode libraries will require large-scale field work incurring major financial costs. Funding such scientific endeavours is a socio-political issue that society needs to address and is largely beyond the scope of this paper. One possibility is to incorporate regional barcoding efforts into current research projects; however, the shortterm nature of grant funding cycles and the considerable time commitment required to compile barcode libraries may provide a disincentive. The unfortunate reality is that adequately sampling large areas, such as northern Quebec, will be expensive and might not be possible under current funding structures. As recent funding cutbacks have already ended the era of free barcode sequencing funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and other major granting agencies through BOLD, we might thus predict a decrease in the volume of barcode data being produced in the near future. This decline will be a double blow as we will not only see a decrease in the rate of data acquisition, but also because the value of existing barcode data increases as the database grows. We have shown that even for a relatively small, easily accessible and well-sampled site with a flora of only ~650 vascular plant species, comprehensive taxonomic sampling to create a barcode library is a formidable challenge. We have identified a number of specific challenges to barcoding a regional flora that are sometimes overlooked in the literature, and we suggest some possible ways forward. In addition, we emphasize some of the particular benefits that may be gained from efforts to © 2014 John Wiley & Sons Ltd barcode all of the plant species at a site, such as documenting changes in floristic composition and contributing valuable new specimens to herbarium collections. However, regardless of the effort put into collecting specimens in the field and advances in sequencing performance, the information provided by DNA barcodes will not be valuable to science unless the original specimens are correctly identified, and/or voucher specimens are available such that identifications can be revised later. The iBOL initiative aims to provide comprehensive DNA barcode libraries that can be used in biodiversity research, conservation and forensics, by allowing endusers to identify biological material through matching barcode sequences. In many cases, this material might be fragmentary or in various developmental stages that are difficult or impossible to identify using traditional taxonomic keys (Jinbo et al. 2011). Correct identification of barcoded specimens relies on resources produced by expert taxonomists, such as updated taxonomic keys, identification services and correctly identified sequences on GenBank. Thus, the success of the iBOL initiative will rely upon professionally trained taxonomists working together with plant collectors and molecular biologists to generate accurate barcode libraries. Acknowledgements We would like to thank Nadia Cavallin and the staff at the Gault Nature Reserve for their help with sampling. In addition, we thank the African Centre for DNA Barcoding (ACDB) and the Quebec Centre for Biodiversity Science (QCBS) for supporting this work. Gratitude is also extended to the respondents of our survey of land plant barcoding contributors to BOLD. Funding for T.L.E. was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). References Abadie JC, Andrade C, Machon N, Porcher E (2008) On the use of parataxonomy in biodiversity monitoring: a case study on wild flora. Biodiversity and Conservation, 17, 3485–3500. Araya YN, Silvertown J, Gowing DJ, McConway KJ, Linder HP, Midgley G (2012) Do niche-structured plant communities exhibit phylogenetic conservatism? A test case in an endemic clade. Journal of Ecology, 100, 1434–1439. Armstrong KF, Ball SL (2005) DNA barcodes for biodiversity: invasive species identification. Philosophical Transactions of the Royal Society B: Biological Sciences, 360, 1813–1823. Bergsten J, Bilton DT, Fujisawa T et al. (2012) The effect of geographical scale of sampling on DNA barcoding. Systematic Biology, 61, 851–869. Bruni I, De Mattia F, Martellos S et al. (2012) DNA barcoding as an effective tool in improving a digital plant identification system: a case study for the area of Mt. Valerio, Trieste (NE Italy). PLoS One, 7, e43256. Burgess KS, Fazekas AJ, Kesanakurti PR et al. (2011) Discriminating plant species in a local temperate flora using the rbcL+ matK DNA barcode. Methods in Ecology and Evolution, 2, 333–340. Chen G, Kery M, Zhang J, Ma K (2009) Factors affecting detection probability in plant distribution studies. Journal of Ecology, 97, 1383–1389. 8 T. L. ELLIOTT and T.J. DAVIES Collins RA, Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources, 13, 969–975. C^ ote SD, Rooney TP, Tremblay JP, Dussault C, Waller DM (2004) Ecological impacts of deer overabundance. Annual Review of Ecology, Evolution, and Systematics, 35, 113–147. von Cr€ autlein M, Korpelainen H, Pietil€ainen M, Rikkinen J (2011) DNA barcoding: a tool for improved taxon identification and detection of species diversity. Biodiversity and Conservation, 20, 373–389. Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution, 24, 332–340. Dejean T, Valentini A, Miquel C, Taberlet P, Bellemain E, Miaud C (2012) Improved detection of an alient invasive species through environmental DNA barcoding: the example of the American bullfrog Lithobates catesbeianus. Journal of Applied Ecology, 49, 953–959. Ebach MC, Holdrege C (2005) DNA barcoding is no substitute for taxonomy. Nature, 434, 697. Fazekas AJ, Burgess KS, Kesanakurti PR et al. (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One, 3, e2802. Francis G (2004) Biosphere reserves in Canada: ideals and some experience. Environments: A Journal of Interdisciplinary Studies, 32, 1–26. Frezal L, Leblois R (2008) Four years of DNA barcoding: current advances and prospects. Infection, Genetics and Evolution, 8, 727–736. Gilbert B, Lechowicz MJ (2004) Neutrality, niches, and dispersal in a temperate forest understory. Proceedings of the National Academy of Sciences of the United States of America, 101, 7651–7656. Gilbert B, Lechowicz MJ (2005) Invasibility and abiotic gradients: the positive correlation between native and exotic plant diversity. Ecology, 86, 1848–1855. Gotelli NJ, Colwell RK (2010) Estimating species richness. In: Biological Diversity: Frontiers in Measurement and Assessment (eds Magurran AE, McGill BJ), pp. 39–54. Oxford University Press, Oxford. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics, 23, 167–172. Han J, Shi L, Chen X, Lin Y (2012) Comparison of four DNA barcodes in identifying certain medicinal plants of Lamiaceae. Journal of Systematics and Evolution, 50, 227–234. Hepper FN (1989) Plant Hunting for Kew. HMSO Publications Centre, London. Hoagland K (1996) The taxonomic impediment and the convention on biodiversity. Association of Systematics Collections Newsletter, 24, 61–62. Hollingsworth PM, Forrest LL, Spouge JL et al. (2009) A DNA barcode for land plants. Proceedings of the National Academy of Sciences, 106, 12794–12797. Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS One, 6, e19254. Janzen DH (2004) Setting up tropical biodiversity for conservation through non-damaging use: participation by parataxonomists. Journal of Applied Ecology, 41, 181–187. Janzen DH, Hallwachs W (2011) Joining inventory by parataxonomists with DNA barcoding of a large complex tropical conserved wildland in northwestern Costa Rica. PLoS One, 6, e18123. Jinbo U, Kato T, Ito M (2011) Current progress in DNA barcoding and future implications for entomology. Entomological Science, 14, 107–124. Kelly AE, Goulden ML (2008) Rapid shifts in plant distribution with recent climate change. Proceedings of the National Academy of Sciences, 105, 11823–11826. Kesanakurti PR, Fazekas AJ, Burgess KS et al. (2011) Spatial patterns of plant diversity below-ground as revealed by DNA barcoding. Molecular Ecology, 20, 1289–1302. et al. (2012) Molecular identification of Kool A, de Boer HJ, Kr€ uger A commercialized medicinal plants in Southern Morocco. PLoS One, 7, e39459. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One, 2, e508. Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences of the United States of America, 102, 8369–8374. Kress WJ, Erickson DL, Jones FA et al. (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the National Academy of Sciences, 106, 18621–18626. Krishnamurthy PK, Francis RA (2012) A critical review on the utility of DNA barcoding in biodiversity conservation. Biodiversity and Conservation, 21, 1901–1919. Kuzmina ML, Johnson KL, Barron HR, Hebert PDN (2012) Identification of the vascular plants of Churchill, Manitoba, using a DNA barcode library. BMC Ecology, 12, 25–35. Lahaye R, Van Der Bank M, Bogarin D et al. (2008) DNA barcoding the floras of biodiversity hotspots. Proceedings of the National Academy of Sciences, 105, 2923–2928. Le Clerc-Blain J, Starr JR, Bull RD, Saarela JM (2010) A regional approach to plant DNA barcoding provides high species resolution of sedges (Carex and Kobresia, Cyperaceae) in the Canadian Arctic Archipelago. Molecular Ecology Resources, 10, 69–91. Leponce M, Meyer C, Hauser C et al. (2010) Collecting plant genetic diversity: technical guidelines. In: Manual on Field Recording Techniques and Protocols for All Taxa Biodiversity Inventories. ABC Taxa (eds Eymann J, Degreef J, Hauser C et al.), pp. 19–49. Belgium Development Cooperation, Brussels, Belgium. Li DZ, Gao LM, Li HT et al. (2011a) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National Academy of Sciences, 108, 19641–19646. Li M, Cao H, But PPH, Shaw PC (2011b) Identification of herbal medicinal materials using DNA barcodes. Journal of Systematics and Evolution, 49, 271–283. Li Y, Gao LM, Poudel RC, Li DZ, Forrest A (2011c) High universality of matK primers for barcoding gymnosperms. Journal of Systematics and Evolution, 49, 169–175. Maycock PF (1961) Botanical studies on Mont St. Hilaire, Rouville County, Quebec. I – General description of the area and a floristic survey. Canadian Journal of Botany, 39, 1293–1325. Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biology, 3, e422. Miller AG, Nyberg JA (1995) Collecting herbarium vouchers. In: Collecting Plant Genetic Diversity: Technical Guidelines (eds Guarino L, Rao VR, Reid R), pp. 561–573. CAB International, Wallingford. Myers N (1988) Threatened biotas: “hot spots” in tropical forests. The Environmentalist, 8, 187–208. Oredsoon A (2000) Choice of surveyor is vital to the reliability of floristic change studies. Watsonia, 23, 287–292. Palmer MW, Earls PG, Hoagland BW, White PS, Wohlgenmuth T (2002) Quantitative tools for perfecting species lists. Environmetrics, 13, 121– 137. Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Molecular Biology and Evolution, 5, 568–583. Parmesan C (2006) Ecological and evolutionary responses to recent climate change. Annual Review of Ecology, Evolution, and Systematics, 37, 637–669. Pautasso M (2012) Challenges in the conservation and sustainable use of genetic resources. Biology Letters, 8, 321–323. Pennisi E (2007) Wanted: a barcode for plants. Science, 318, 190–191. Preston FW (1948) The commonness, and rarity, of species. Ecology, 29, 254–283. Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular Ecology Notes, 7, 355–364. Rieseberg LH, Soltis D (1991) Phylogenetic consequences of cytoplasmic gene flow in plants. Evolutionary Trends in Plants, 5, 65–84. Rodman JE, Cody JH (2003) The taxonomic impediment overcome: NSF’s Partnerships for Enhancing Expertise in Taxonomy (PEET) as a model. Systematic Biology, 52, 428–435. © 2014 John Wiley & Sons Ltd BARCODING REGIONAL FLORAS 9 Roy S, Tyagi A, Shukla V et al. (2010) Universal plant DNA barcode loci may not work in complex groups: a case study with Indian Berberis species. PLoS One, 5, e13674. Rubinoff D, Cameron S, Will K (2006) Are plant DNA barcodes a search for the Holy Grail? Trends in Ecology & Evolution, 21, 1–2. Schindel D, Miller SE (2005) DNA barcoding a useful tool for taxonomists. Nature, 435, 17. Slingsby JA, Verboom GA (2006) Phylogenetic relatedness limits cooccurrence at fine spatial scales: evidence from the schoenoid sedges (Cyperaceae: Schoeneae) of the Cape Floristic Region, South Africa. The American Naturalist, 168, 14–27. Soltis D, Soltis PS, Doyle JJ (1998) Molecular Systematics of Plants II: DNA Sequencing. Klewer Academic Publishers, Norwell, MA. Starr JR, Naczi RFC, Chouinard BN (2009) Plant DNA barcodes and species resolution in sedges (Carex, Cyperaceae). Molecular Ecology Resources, 9, 151–163. Stoeckle MY, Gamble CC, Kirpekar R et al. (2011) Commercial teas highlight plant DNA barcode identification successes and obstacles. Scientific Reports, 1, 1–7. Swenson NG (2009) Phylogenetic resolution and quantifying the phylogenetic diversity and dispersion of communities. PLoS One, 4, e4390. Tripp EA, Hoagland KE (2013) Typifying an era in biology through synthesis of biodiversity information: achievements and impediments. Taxon, 62, 899–911. de Vere N, Rich T, Ford C et al. (2012) DNA barcoding the native flowering plants and conifers of Wales. PLoS One, 7, 1–e37945. Vernooy R, Haribabu E, Muller M et al. (2010) Barcoding life to conserve biological diversity: beyond the taxonomic imperative. PLoS Biology, 8, e1000417. White PT, McGill B, Lechowicz M (2011) Human-disturbance and caterpillars in managed forest fragments. Biodiversity and Conservation, 20, 1745–1762. © 2014 John Wiley & Sons Ltd Wiens JJ (2003) Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology, 52, 528–538. Wiens JJ, Morrill MC (2011) Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. Systematic Biology, 60, 719–731. T.L.E. and T.J.D. conceived the idea and wrote the manuscript; T.L.E. gathered and analysed the data. Data Accessibility Sequences and accompanying metadata are stored on BOLD (process ID’s: VEMSH001-VEMSH662; MSTH00112-MSTH061-12). Supporting Information Additional Supporting Information may be found in the online version of this article: Table S1 Contributors to the WG1.2 Land Plants iBOL Working Group. Table S2 Updated terrestrial vascular plant species list for the Gault Reserve at Mont St. Hilaire, Quebec. Table S3 Time and cost estimates for two regional barcoding efforts in Quebec, Canada.