Download Challenges to barcoding an entire flora

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Community fingerprinting wikipedia , lookup

DNA barcoding wikipedia , lookup

Transcript
Molecular Ecology Resources (2014)
doi: 10.1111/1755-0998.12277
OPINION
Challenges to barcoding an entire flora
TAMMY L. ELLIOTT and T . J O N A T H A N D A V I E S
Department of Biology, McGill University, 1205 Docteur Penfield Avenue, Montreal, Quebec H3A 1B1 Canada
Abstract
DNA barcodes are species-specific genetic markers that allow taxonomic identification of biological samples. The
promise of DNA barcoding as a rapid molecular tool for conducting biodiversity inventories has catalysed renewed
efforts to document and catalogue the diversity of life, parallel to the large-scale sampling conducted by Victorian
naturalists. The unique contribution of DNA barcode data is in its ability to identify biotic material that would be
impossible to classify using traditional taxonomic keys. However, the utility of DNA barcoding relies upon the construction of accurate barcode libraries that provide a reference database to match to unidentified samples. Whilst
there has been much debate in the literature over the choice and efficacy of barcode markers, there has been little
consideration of the practicalities of generating comprehensive barcode reference libraries for species-rich floras.
Here, we discuss several challenges to the generation of such libraries and present a case study from a regional biodiversity hotspot in southern Quebec. We suggest that the key challenges include (i) collection of specimens for rare or
ephemeral species, (ii) limited access to taxonomic expertise necessary for reliable identification of reference specimens and (iii) molecular challenges in amplifying and matching barcode data. To be most effective, we recommend
that sampling must be both flexible and opportunistic and conducted across the entire growing season by expert taxonomists. We emphasize that the success of the global barcoding initiative will depend upon the close collaboration
of taxonomists, plant collectors, and molecular biologists.
Keywords: barcode library, biodiversity survey, hotspot, incomplete sampling, regional flora, taxonomic impediment
Received 5 September 2013; revision received 12 April 2014; accepted 20 April 2014
Introduction
For centuries, plant hunters have travelled the world in
search of specimens. Early collectors, such as Henry
Compton (1632–1713), Hans Sloane (1660–1753), Carl
Linnaeus (1707–1778), and the wave of Victorian naturalists that succeeded them, systematically documented the
diversity of the natural world and populated the world’s
herbaria with their collections. In recent decades, planthunting expeditions have increasingly shifted focus to
collecting specimens for use in medicinal and other scientific studies (Hepper 1989). Following the launch of
the Consortium for the Barcode of Life (CBOL) in May
2004, the International Barcode of Life Project (iBOL) was
established to create a barcode library for all eukaryotic
life (Schindel & Miller 2005; Frezal & Leblois 2008; Vernooy et al. 2010). This new effort more closely parallels
the large-scale sampling conducted by the Victorian naturalists, but in addition to submitting plant species to
herbaria, collectors now also submit genetic data generCorrespondence: Tammy L. Elliott,
E-mail: [email protected]
© 2014 John Wiley & Sons Ltd
Fax:
1-514-398-5069;
ated using new molecular tools to digital repositories
such as The Barcode of Life Data System (BOLD; www.
barcodinglife.org) (Ratnasingham & Hebert 2007).
The iBOL initiative aims to produce biological inventories in the form of DNA barcode libraries that will provide a reference database to aid in the identification of
unidentified or hard to identify taxa (Ratnasingham &
Hebert 2007; Jinbo et al. 2011). Although there were initial fears that DNA barcoding might usurp the role of
traditional taxonomists (e.g. Ebach & Holdrege 2005), it
is now realized that successful barcoding efforts rely on
strong collaborations between ‘barcoders’ and taxonomists (Frezal & Leblois 2008). To create a barcode library,
a site’s flora must first be barcoded. These libraries can
then be used as a cheap and quick method for biodiversity assessments (Krishnamurthy & Francis 2012). Comprehensive, accurate barcode libraries also allow for the
identification of fragmentary samples or morphologically
indistinct life stages (e.g. shoots, eggs or larvae), and the
detection of species in environments when more comprehensive sampling is not practical, such as in biosecurity
(Armstrong & Ball 2005) or for early detection of invasive
2 T. L. ELLIOTT and T.J. DAVIES
species (Dejean et al. 2012). The wealth of molecular
sequence data that is available in such libraries has provided researchers with the opportunity to generate large,
well-resolved regional phylogenies, allowing for more
accurate quantification of community structure and species interactions (e.g. Kress et al. 2009). In addition, barcode reference libraries have been used to identify
ingredients in herbal medicines (e.g. Li et al. 2011b; Han
et al. 2012; Kool et al. 2012) and teas (e.g. Stoeckle et al.
2011) as well as examine the belowground spatial structure of plant communities (e.g. Kesanakurti et al. 2011).
The exact DNA sequences to be used as barcode
markers in plants have been a source of heated debate.
Key features for an effective DNA barcode include universality (loci that can be amplified across taxa with common primers), quality (loci that most likely produce
bidirectional sequences with few or no ambiguous base
calls), discrimination (loci that distinguish the highest
number of species) and cost-effectiveness (Hollingsworth
et al. 2009). The combination of rbcLa (c. 599 base pairs at
50 end of gene) and matK (c. 846 base pairs at centre of
gene) is now widely considered as the DNA barcode for
land plants (Fazekas et al. 2008; Hollingsworth et al.
2011), although other markers (see Hollingsworth et al.
2011), such as the internal transcribed spacer (ITS), have
also been proposed (Li et al. 2011a), and others have been
used to discriminate species in some plant clades (e.g.
Roy et al. 2010). Recent work has explored the efficacy of
these markers in discriminating different species in
regional floras (e.g. Burgess et al. 2011) and the resolution they provide in reconstructing phylogenetic trees of
regional floras (e.g. Kress et al. 2009).
As barcoding efforts have expanded, a major challenge has been to generate DNA barcode libraries for
complete floras. However, to date, there have been few
published studies on such efforts. We surveyed ISI Web
of Science (WoS; accessed 28 March 2013) based on the
following criteria: barcoding; plants; NOT alga; NOT
fungi; NOT insects. This search returned 353 matching
papers, only five of which focused on barcoding regional
floras (Fig. 1), including all the vascular plants of Mt.
Valerio, Italy (Bruni et al. 2012), Churchill, Canada (Kuzmina et al. 2012), and the Koffler Scientific Reserve, Canada (Burgess et al. 2011), as well as the angiosperms and
conifers of Wales, United Kingdom (de Vere et al. 2012).
Other notable studies, from more diverse regions, have
focused on certain plant groups, for example, the woody
trees, shrubs and palms on Barro Colorado Island, Panama (Kress et al. 2009). In part, the paucity of published
studies reflects the relative youthfulness of the field; we
therefore additionally surveyed contributors to the
WG1.2 Land Plants iBOL Working Group. Ten of the 17
respondents described projects focused on barcoding
regional floras at taxonomic scales of angiosperms or
Barcoding methods (15%)
Barcoding
utility (12%)
Barcoding
regional
flora (2%)
Review/
ideas (13%)
Species
discrimination (53%)
Sequence selection (5%)
Fig. 1 Web of Science search results of published papers on barcoding plants (n = 353). Papers were classified as ‘Barcoding
regional flora’ if they focused on the taxonomic scale of angiosperms or higher within a single region. ‘Species discrimination’
papers focused on determining which gene sequences best differentiated plant species within various genera or families,
whereas papers on ‘Sequence selection’ examined the broader
question of selecting the best barcode sequences for land plants.
‘Barcoding methods’ papers either introduced novel methods
that could be used in future barcoding research or presented
methods to improve existing protocols. ‘Barcoding utility’
papers provided ideas or examples of new plant barcoding
applications. ‘Review/idea’ papers were literature reviews or
viewpoints on plant barcoding research. In addition, a request
of information from contributors to the BOLD Land Plants campaign yielded 10 unpublished projects working on barcoding
regional floras.
higher (Table S1, Supporting information). This estimate
likely represents only a fraction of ongoing projects as
the release of information by BOLD contributors was voluntary, and it is possible that there are other barcoding
studies that do not use BOLD.
These large barcoding efforts pose many significant
practical challenges, some of which have been largely
ignored in the recent literature, including (i) difficulties
in collecting specimens for rare or ephemeral species; (ii)
paucity of or limited access to sufficient taxonomic
expertise and resources to obtain reliable, up-to-date
identifications of samples used to build the barcode reference library; and (iii) molecular challenges of barcoding (e.g. difficulties in amplification of samples spanning
a wide taxonomic spectrum and distinguishing between
closely related species). Even though many of these challenges are not novel, the first two have tended to be overlooked in barcoding literature in comparison with the
emphasis that has been placed on the third challenge –
© 2014 John Wiley & Sons Ltd
BARCODING REGIONAL FLORAS 3
molecular methods and the performance of barcodes for
species identification (e.g. Lahaye et al. 2008; Burgess
et al. 2011). Here, we attempt to redress this balance and
focus on the first two challenges. Specifically, we discuss
the practicalities of generating DNA barcode libraries for
complete floras. We illustrate these challenges using a
case study of the flora of the Gault Nature Reserve on
Mont Saint Hilaire (MSH), a biosphere reserve in the
Monteregie region of southern Quebec.
Barcoding a complete flora – Mont St. Hilaire as a
case study
In 2012, we established an initiative to DNA barcode the
entire flora of the Gault Nature Reserve, a small reserve
approximately 30 km from the major urban centre of
Montreal, Quebec, Canada. The flora of the Gault reserve
is well-documented with a nearly complete list of vascular plants available for the site (see Maycock 1961; Table
S2, Supporting information) as well as the presence of
herbarium specimens for a majority of the reserve’s species in either the McGill University Herbarium (MTMG)
or the Marie-Victorin Herbarium (MT). The reserve is situated on Mont St. Hilaire, one of the eight Monteregian
Hills of the St. Laurence Lowlands in southern Quebec
(Maycock 1961). The old growth forest of the reserve is
dominated by the deciduous tree species Acer saccharum,
Fagus grandifolia and Quercus rubra (Gilbert & Lechowicz
2004). Protection of this 1000 ha reserve goes back until
the 1600s, and it is now recognized as a UNESCO Biosphere Reserve (Maycock 1961; Francis 2004; White et al.
2011). The reserve contains approximately 650 vascular
plant species (Maycock 1961; Table S2, Supporting information), several of which are considered rare. Barcoding
this flora provides an illustration of the more general
(and generally more extreme) challenges to barcoding
complete floras. The reserve is relatively small and easily
accessible, and although the area is a regional diversity hotspot, species richness is relatively low compared with, for
example, tropical floras of similar areal extent (Myers 1988).
Over approximately 384 and 235 sampling hours during the 2012 and 2013 respective field seasons, 582 different terrestrial vascular species were collected with
sampling focused on collecting one individual per species, except in cases where there were uncertainties in
field identifications for which multiple samples were
gathered. We found 111 species not previously reported
in the reserve and suspect that changing distributions in
response to climate change (Parmesan 2006; Kelly &
Goulden 2008), exotic species moving in with disturbance (Gilbert & Lechowicz 2005) and differing taxonomic skill sets of the various surveyors (Oredsoon 2000;
Chen et al. 2009) might all contribute to explaining this
increase in number. We also think that the previously
© 2014 John Wiley & Sons Ltd
documented species that we did not find might have
been extirpated because of the impacts of climate change
and increased deer browsing (C^
ote et al. 2004; Gilbert &
Lechowicz 2005). In addition, we identified several discrepancies in the list used to guide collection efforts,
leading to an inflated species total from past years
(Fig. 2). Of the 787 vascular species previously recorded,
143 were removed from the list because they were synonyms and 21 species names were duplicated due to
typographical errors. Thus, our sample represents an
impressive 84% of the revised species list, contributing
133 rbcL and 126 matK new sequences for species not previously represented through the BOLD public data portal as of December 2013. Sampling in future years will
target the remaining 16%, but it is impossible to verify
how many of the species on this list were originally misidentified as only a handful have matching voucher specimens (e.g. Maycock 1961) that we can cross-check.
The flora of Mont St. Hilaire is likely much smaller
than that targeted by regional barcoding efforts in tropical or larger areas; nonetheless, our efforts identified a
number of key challenges to creating a comprehensive
DNA barcode library based on sampling one individual
per vascular plant species that will apply more generally
(and likely to a greater degree). Many of these challenges
arise from the time constraints imposed by the shortterm nature of most research projects, which are often
determined by funding cycles and the demands of
Typographical errors (2%)
Synonyms
(17%)
New species to
list (12%)
Not
collected
(16%)
Collected
(52%)
Fig. 2 Results of 2012 and 2013 sampling efforts of the vascular
plant species on the Gault Nature Reserve, Quebec, based on a
pre-existing species list of the reserve (Gault Nature Reserve,
unpublished data). The original species list contained several
‘typographical errors’ and ‘synonyms’. ‘New species to list’
were taxa not included in the original 2012 list. Of the remaining
taxa on the list, ‘collected’ species were those that were collected
during the 2012 and 2013 field seasons, whereas ‘not collected’
were species that were not found during the two field seasons.
4 T. L. ELLIOTT and T.J. DAVIES
thoroughly sampling all of the species within a site’s
boundaries during short growing seasons. These challenges parallel all such efforts aimed at conducting comprehensive biodiversity surveys (Palmer et al. 2002; Chen
et al. 2009), but generating a barcode library imposes
additional demands.
Simply creating complete regional floristic inventories, even for relatively small floras such as Mont St.
Hilaire, is challenging and will considerable effort, which
is a substantial impediment, given current costs and
budgeting constraints (Pautasso 2012). In addition, rare
species will take more time to find, whilst common species can be quick and easy to sample (Preston 1948). For
example, we collected a high number of species in a relatively short period of time during the 2012 field season
because sampling targeted accessible areas with high
species numbers (Fig. 3). However, this opportunistic
sampling strategy reduced the amount of time spent
searching for rarer species, which can be more difficult
to find, illustrating diminishing returns to the amount of
effort invested. During the 2013 field season, more targeted sampling based on previous location reports for
rarer species was generally aimed at more remote locations within the reserve; however, this strategy required
much larger effort but yielded fewer new species as these
sites were often species poor (Fig. 4). Low population
numbers of rare species can also present a challenge as it
is not ethical to collect when sampling might impact
population viability; thus, workers must spend valuable
time searching for alternative sources of plant material.
Differences in phenology can also present major challenges and limit the utility of short, intensive sampling
efforts or ‘bioblitzes’ (a term originally used by Susan
Ruby of U.S. National Park Service in 1996; http://www.
biodiversityonline.ca/BioBlitz/intro.htm), as the collection of some plant species must be conducted during a
brief window of time during which traits important for
species identification are expressed as fruits and/or flowers (Miller & Nyberg 1995; Leponce et al. 2010). For
example, the number of plant species collected during
the field season at Mont St. Hilaire fluctuated as high
numbers of spring ephemerals were collected whilst
flowering early in the growing season, which made them
easiest to identify, but species numbers dropped off as
the summer progressed (Fig. 4). Combining phenological
differences with the logistics of getting to remote locations presents an even larger challenge to comprehensive
sampling. Within Mont St. Hilaire, several rare species
grow in difficult-to-access areas (Maycock 1961), including several locations that were impossible to access due
to dangerous physical obstacles such as steep cliffs. In a
few cases, several time-consuming return visits to these
remote areas were required because of differences in the
phenology of rare species found at these sites.
Assuming sufficient time and funds exist for adequate
collection, there remains a major challenge in correctly
identifying species. First, even experts might overlook
species if the material that is present at a site is not in an
identifiable form for at least part of the year (see above).
3.75 km
4.05 km
Fig. 3 Location of sampling points for plant specimens from
Mont St. Hilaire, Quebec, during the 2012 and 2013 field seasons. The reserve boundaries have been extended to include the
point in the top right-hand side of the figure. Sampling locations
for plants collected during the 2012 field season are indicated by
stars, whereas plants collected during the 2013 field season are
indicated by circles.
Number of unique species collected
160
140
2012
120
2013
100
80
60
40
20
0
April
May
June
July
August September October
Month
Fig. 4 Number of terrestrial plant species collected per month
during the 2012 and 2013 field seasons at Mont St. Hilaire, Quebec
(note: collectors were not available to go to the field in July 2013).
© 2014 John Wiley & Sons Ltd
BARCODING REGIONAL FLORAS 5
Second, species within genera that are the product of
rapid radiations can represent suites of morphologically
similar taxa that are difficult to distinguish either in the
field or herbarium. In cases of taxonomic uncertainty,
more time must then be spent collecting extra specimens
of these taxa so that species that closely resemble each
other are not missed, in addition to the extra time
required to collect, process and send off material to taxonomic experts. Collection of specimens at the Gault Nature Reserve was conducted by experienced botanists;
nonetheless, several genera such as the Carex, Dichanthelium, Crataegus, Salix and Amelanchier were problematic
to distinguish in the field because of the presence of
hybrids, agamic complexes and difficult-to-distinguish
species characteristics. Correct identification of such
specimens required reference to multiple resources and
referral to specialist taxonomists in some cases, as has
also been the case in other regional barcoding studies
(e.g. Kuzmina et al. 2012). In general, the decline in taxonomists and limited access to herbarium resources and
updated taxonomic treatments present a major challenge
to accurate species identification – the taxonomic impediment (Hoagland 1996; Rodman & Cody 2003). Although
these challenges might be familiar to most biodiversity
surveys, the unique approach of DNA barcoding presents additional challenges related to species identification. Well-constructed barcode libraries allow end-users
to identify plant material to species or genus based on
tissue fragments (e.g. a single leaf, root or stem) that
would be impossible to identify using traditional taxonomic keys. However, it is not possible for end-users to
detect errors in the original species identification because
barcode matches are based only on sequence data and
not morphology, although digital images and herbarium
specimens can be revisited in the future to verify identifications. The correct identification of reference specimens
is therefore paramount (Collins & Cruickshank 2013).
Once plants have been collected, voucher specimens
must be accessioned into a herbarium as required by
BOLD. Mounting, labelling and accessioning our 680
specimens were an additional challenge to barcoding.
Perhaps most surprisingly, time and cost estimates based
on our experiences at the Gault Nature Reserve (Table
S3, Supporting information) suggest that over one-third
of the time required to complete floristic barcoding studies should be allocated to data organization and processing voucher specimens. Further, our experience
conducting botanical surveys near Schefferville, in northern Quebec, suggests these same two steps might still
require a similar proportional time commitment when
preparing barcode libraries for more remote areas (Table
S3, Supporting information). Paid and dedicated staff
would be necessary to process samples for larger collection efforts.
© 2014 John Wiley & Sons Ltd
Sequencing success is not guaranteed once specimens
have been collected and vouchered. The choice of DNA
barcodes has been a trade-off between factors such as
universality (i.e. ability to amplify all species) and resolution (i.e. the ability to differentiate between species) (Hollingsworth et al. 2009). In plants, rbcL has proven
relatively straightforward to amplify for most clades but
provides only limited resolution, particularly among closely related taxa (Hollingsworth et al. 2009). In contrast,
matK offers greater resolution, but the proposed primer
sets perform poorly in nonangiosperm plant groups such
as the gymnosperms and ferns (Hollingsworth et al.
2011; Li et al. 2011c), although alternative primer combinations are now available (Li et al. 2011c). We do not discuss further the details of marker choice here as the
selection of DNA barcode markers has already been
debated extensively in the literature (e.g. Kress et al.
2005; Rubinoff et al. 2006; Kress & Erickson 2007; Pennisi
2007; Hollingsworth et al. 2011); however, additional
challenges can arise postsequencing.
Once barcode sequences are available, additional
complexities must be considered before using barcode
reference libraries to assign identifications to unknown
plant material (e.g. Kesanakurti et al. 2011; Li et al.
2011b). First, the genetic distance differentiating species
must be decided upon if cryptic species are suspected
(Meyer & Paulay 2005; Collins & Cruickshank 2013). The
answer to this is likely taxon and gene specific, and there
is probably no correct answer in some circumstances
(Hollingsworth et al. 2011), although sequences can be
simply grouped with the closest matching barcode if a
library is assumed ‘complete’ (Hajibabaei et al. 2007;
Kesanakurti et al. 2011). Whilst species resolution is generally higher in regionally focused studies compared
with those focused on particular taxonomic groups (e.g.
Le Clerc-Blain et al. 2010; Roy et al. 2010; Burgess et al.
2011), even within the flora of Mont St. Hilaire, we suspect the identification of several genera such as Salix and
Carex might be problematic, as these genera have previously been reported to be difficult to distinguish using
DNA barcodes (Starr et al. 2009; von Cr€
autlein et al.
2011), and multiple species within each genus are found
in the regional flora. A second (and related) question is
how to decide the appropriate sampling density to get
estimates of within-species variation in sequences (Meyer
& Paulay 2005; Bergsten et al. 2012). One approach
would be to maximize the geographical distance between
samples (Bergsten et al. 2012). However, as the number
of individuals required would likely vary by species and
perhaps barcode region, there again is probably no single
answer. The effort required for the implementation of
this strategy would in any case likely be prohibitive for
sampling species-rich floras as many individuals per species would have to be sampled to get an appropriate esti-
6 T. L. ELLIOTT and T.J. DAVIES
mate. Finally, users of barcode data must decide how to
interpret results if different genes give conflicting
answers (Li et al. 2011a). Common barcoding genes in
plants are perhaps unlikely to give conflicting answers
because they are from the chloroplast (Soltis et al. 1998);
nonetheless, neutral evolution might still result in apparent conflict and chloroplast vs. nuclear genes might still
show incongruence (Rieseberg & Soltis 1991). Possible
solutions include adding sequence data and/or individuals or to explore neutral vs. non-neutral mutations separately. We are not aware of any review of this issue as it
applies to barcoding, although it has been widely discussed in the broader phylogenetics literature (Pamilo &
Nei 1988; Degnan & Rosenberg 2009). These are all
important questions that must be addressed in the barcoding literature.
Consequences of incomplete sampling
Future conservation and research programmes will have
to take into consideration the reality that sampling
efforts will probably not yield comprehensive floristic
inventories, and thus, barcode libraries will often be
incomplete. Of concern, the taxa most likely to be missed
are rare species (e.g. orchids on MSH), which might lack
barcodes and could be of highest conservation priority.
Limited sampling or misidentification of closely related
species could also impact efforts to generate phylogenetic trees using barcode data. It is possible to generate
phylogenetic hypotheses from partial or gappy molecular matrices, at least for those taxa with some barcode
data (Wiens 2003; Wiens & Morrill 2011); however, missing sequence data might increase phylogenetic uncertainty, resulting in unresolved nodes (i.e. polytomies).
The consequences of this lack of phylogenetic resolution
will depend on the questions to which the phylogenetic
hypotheses are applied. For example, testing fine-scale
patterns in species co-occurrence would require more
well-resolved phylogenetic trees at the species level (e.g.
Slingsby & Verboom 2006; Araya et al. 2012). In addition,
metrics of phylogenetic community structure differ in
their sensitivity to phylogenetic resolution (Swenson
2009). Species-level resolution might be more important
for capturing interspecific interactions, such as competition and facilitation. Considerations such as these are
important for determining whether it is worth investing
the extra time and resources to retrieve complete barcode
sequences for all species.
Lessons learned: suggestions for future studies
Based on our experiences, we suggest several ways forward to more effectively plan and execute regional barcoding efforts. First, herbaria, species inventories, local
experts and the Global Biodiversity Information Facility
(GBIF; www.gbif.org) can all provide possible locations
for species that are hard to find, taking into account possible errors in identification and synonyms. Other suggestions are to use online resources such as Project
BudBurst (www.budburst.org) and review herbarium
specimens to design field sampling around times when
plants are in flowering or fruiting stage. The current
year’s climatic conditions will have to be considered
when interpreting the data available through such
sources, and the resolution of GBIF location data might
not help in the location of rare species. A targeted sampling approach could focus sampling on areas where rare
species have been observed in the past, but to be effective, this method requires precise GPS co-ordinates or
detailed, accurate information on herbarium labels for
previously observed species. This strategy is of course
only applicable when targets are known and their populations are not transient. Second, methods such as accumulation and rarefaction curves can be used to provide
estimates of potential gains in species numbers provided
by additional sampling, although these approaches do
not give an absolute answer to the number of species in
the area (Gotelli & Colwell 2010). Third, we suggest to be
prepared to sample more than one field season by having
a sampling plan that focuses on collecting the most common species the first year and targets rare species during
subsequent field seasons. Follow-up grants can be used
to support continued sampling in subsequent years, but
diminishing returns of species numbers relative to the
amount of time invested suggest that continued full-time
sampling would yield only a few substantial new results.
We therefore recommend to maintain continuous sampling at a site by establishing strong long-term collaborations with other researchers at the site who might also
benefit from the publically available barcode data.
One alternative to reduce the time and costs spent
supporting field work is to sequence DNA material from
herbarium specimens, although DNA from such material
can be degraded, inhibiting amplification of barcode
markers. Nonetheless, herbarium specimens are useful
for sampling very small populations of rare species that
can be documented but not sampled because of their
conservation status. Whilst herbaria should be part of
the global barcoding initiative, they cannot, however,
replace field sampling, especially where the flora for a
region is incompletely described. Critically, field collection allows for the discovery of new species and can document shifting species distributions with climate change
that would be missed by reviewing herbarium specimens for a region. During our study, 12% of the species
on our final list were new records for Mont St. Hilaire
(Fig. 2) and would have been omitted if we had relied
only on herbarium specimens.
© 2014 John Wiley & Sons Ltd
BARCODING REGIONAL FLORAS 7
Limited access to taxonomic expertise and resources
can also hamper regional barcoding efforts; therefore, it
is essential to collaborate with taxonomists as well as
work with parataxonomists and herbarium resources
when possible. Ideally, strong collaborations with taxonomists should be made before gathering data. Taxonomists are a key component to producing regional
barcode libraries and can train the field collectors and
notify them of problematic species before the field season
and as such should be compensated appropriately for
their time and effort (Tripp & Hoagland 2013). Parataxonomists can provide an additional resource and may
help encourage local participation in regional barcoding
efforts (Janzen 2004; Abadie et al. 2008; Janzen & Hallwachs 2011), but they are not appropriate for all study systems. We suggest using herbaria as much as possible as
they provide many resources crucial in species identification; however, extra time and money might be required
for workers in areas without local herbaria. Finally, further taxonomic confusion could be avoided by having
data repositories such as BOLD recommend common databases and identification resources for attributing names
to species.
Comprehensively sampling entire areas to create barcode libraries will require large-scale field work incurring major financial costs. Funding such scientific
endeavours is a socio-political issue that society needs to
address and is largely beyond the scope of this paper.
One possibility is to incorporate regional barcoding
efforts into current research projects; however, the shortterm nature of grant funding cycles and the considerable
time commitment required to compile barcode libraries
may provide a disincentive. The unfortunate reality is
that adequately sampling large areas, such as northern
Quebec, will be expensive and might not be possible
under current funding structures. As recent funding cutbacks have already ended the era of free barcode
sequencing funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and other
major granting agencies through BOLD, we might thus
predict a decrease in the volume of barcode data being
produced in the near future. This decline will be a double blow as we will not only see a decrease in the rate of
data acquisition, but also because the value of existing
barcode data increases as the database grows.
We have shown that even for a relatively small, easily
accessible and well-sampled site with a flora of only ~650
vascular plant species, comprehensive taxonomic sampling to create a barcode library is a formidable
challenge. We have identified a number of specific challenges to barcoding a regional flora that are sometimes
overlooked in the literature, and we suggest some possible ways forward. In addition, we emphasize some of
the particular benefits that may be gained from efforts to
© 2014 John Wiley & Sons Ltd
barcode all of the plant species at a site, such as documenting changes in floristic composition and contributing valuable new specimens to herbarium collections.
However, regardless of the effort put into collecting
specimens in the field and advances in sequencing performance, the information provided by DNA barcodes
will not be valuable to science unless the original specimens are correctly identified, and/or voucher specimens
are available such that identifications can be revised
later. The iBOL initiative aims to provide comprehensive
DNA barcode libraries that can be used in biodiversity
research, conservation and forensics, by allowing endusers to identify biological material through matching
barcode sequences. In many cases, this material might be
fragmentary or in various developmental stages that are
difficult or impossible to identify using traditional taxonomic keys (Jinbo et al. 2011). Correct identification of
barcoded specimens relies on resources produced by
expert taxonomists, such as updated taxonomic keys,
identification services and correctly identified sequences
on GenBank. Thus, the success of the iBOL initiative will
rely upon professionally trained taxonomists working
together with plant collectors and molecular biologists to
generate accurate barcode libraries.
Acknowledgements
We would like to thank Nadia Cavallin and the staff at the Gault
Nature Reserve for their help with sampling. In addition, we
thank the African Centre for DNA Barcoding (ACDB) and the
Quebec Centre for Biodiversity Science (QCBS) for supporting
this work. Gratitude is also extended to the respondents of our
survey of land plant barcoding contributors to BOLD. Funding
for T.L.E. was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
Abadie JC, Andrade C, Machon N, Porcher E (2008) On the use of parataxonomy in biodiversity monitoring: a case study on wild flora. Biodiversity and Conservation, 17, 3485–3500.
Araya YN, Silvertown J, Gowing DJ, McConway KJ, Linder HP, Midgley
G (2012) Do niche-structured plant communities exhibit phylogenetic
conservatism? A test case in an endemic clade. Journal of Ecology, 100,
1434–1439.
Armstrong KF, Ball SL (2005) DNA barcodes for biodiversity: invasive
species identification. Philosophical Transactions of the Royal Society B:
Biological Sciences, 360, 1813–1823.
Bergsten J, Bilton DT, Fujisawa T et al. (2012) The effect of geographical
scale of sampling on DNA barcoding. Systematic Biology, 61, 851–869.
Bruni I, De Mattia F, Martellos S et al. (2012) DNA barcoding as an effective tool in improving a digital plant identification system: a case study
for the area of Mt. Valerio, Trieste (NE Italy). PLoS One, 7, e43256.
Burgess KS, Fazekas AJ, Kesanakurti PR et al. (2011) Discriminating plant
species in a local temperate flora using the rbcL+ matK DNA barcode.
Methods in Ecology and Evolution, 2, 333–340.
Chen G, Kery M, Zhang J, Ma K (2009) Factors affecting detection probability in plant distribution studies. Journal of Ecology, 97, 1383–1389.
8 T. L. ELLIOTT and T.J. DAVIES
Collins RA, Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources, 13, 969–975.
C^
ote SD, Rooney TP, Tremblay JP, Dussault C, Waller DM (2004) Ecological impacts of deer overabundance. Annual Review of Ecology, Evolution,
and Systematics, 35, 113–147.
von Cr€
autlein M, Korpelainen H, Pietil€ainen M, Rikkinen J (2011) DNA
barcoding: a tool for improved taxon identification and detection of
species diversity. Biodiversity and Conservation, 20, 373–389.
Degnan JH, Rosenberg NA (2009) Gene tree discordance, phylogenetic
inference and the multispecies coalescent. Trends in Ecology & Evolution, 24, 332–340.
Dejean T, Valentini A, Miquel C, Taberlet P, Bellemain E, Miaud C (2012)
Improved detection of an alient invasive species through environmental DNA barcoding: the example of the American bullfrog Lithobates
catesbeianus. Journal of Applied Ecology, 49, 953–959.
Ebach MC, Holdrege C (2005) DNA barcoding is no substitute for taxonomy. Nature, 434, 697.
Fazekas AJ, Burgess KS, Kesanakurti PR et al. (2008) Multiple multilocus
DNA barcodes from the plastid genome discriminate plant species
equally well. PLoS One, 3, e2802.
Francis G (2004) Biosphere reserves in Canada: ideals and some experience. Environments: A Journal of Interdisciplinary Studies, 32, 1–26.
Frezal L, Leblois R (2008) Four years of DNA barcoding: current
advances and prospects. Infection, Genetics and Evolution, 8, 727–736.
Gilbert B, Lechowicz MJ (2004) Neutrality, niches, and dispersal in a temperate forest understory. Proceedings of the National Academy of Sciences
of the United States of America, 101, 7651–7656.
Gilbert B, Lechowicz MJ (2005) Invasibility and abiotic gradients: the
positive correlation between native and exotic plant diversity. Ecology,
86, 1848–1855.
Gotelli NJ, Colwell RK (2010) Estimating species richness. In: Biological
Diversity: Frontiers in Measurement and Assessment (eds Magurran AE,
McGill BJ), pp. 39–54. Oxford University Press, Oxford.
Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics, 23, 167–172.
Han J, Shi L, Chen X, Lin Y (2012) Comparison of four DNA barcodes in
identifying certain medicinal plants of Lamiaceae. Journal of Systematics
and Evolution, 50, 227–234.
Hepper FN (1989) Plant Hunting for Kew. HMSO Publications Centre,
London.
Hoagland K (1996) The taxonomic impediment and the convention on
biodiversity. Association of Systematics Collections Newsletter, 24, 61–62.
Hollingsworth PM, Forrest LL, Spouge JL et al. (2009) A DNA barcode
for land plants. Proceedings of the National Academy of Sciences, 106,
12794–12797.
Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a
plant DNA barcode. PLoS One, 6, e19254.
Janzen DH (2004) Setting up tropical biodiversity for conservation
through non-damaging use: participation by parataxonomists. Journal
of Applied Ecology, 41, 181–187.
Janzen DH, Hallwachs W (2011) Joining inventory by parataxonomists
with DNA barcoding of a large complex tropical conserved wildland
in northwestern Costa Rica. PLoS One, 6, e18123.
Jinbo U, Kato T, Ito M (2011) Current progress in DNA barcoding and
future implications for entomology. Entomological Science, 14, 107–124.
Kelly AE, Goulden ML (2008) Rapid shifts in plant distribution with
recent climate change. Proceedings of the National Academy of Sciences,
105, 11823–11826.
Kesanakurti PR, Fazekas AJ, Burgess KS et al. (2011) Spatial patterns of
plant diversity below-ground as revealed by DNA barcoding. Molecular Ecology, 20, 1289–1302.
et al. (2012) Molecular identification of
Kool A, de Boer HJ, Kr€
uger A
commercialized medicinal plants in Southern Morocco. PLoS One, 7,
e39459.
Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land
plants: the coding rbcL gene complements the non-coding trnH-psbA
spacer region. PLoS One, 2, e508.
Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use
of DNA barcodes to identify flowering plants. Proceedings of the
National Academy of Sciences of the United States of America, 102,
8369–8374.
Kress WJ, Erickson DL, Jones FA et al. (2009) Plant DNA barcodes and a
community phylogeny of a tropical forest dynamics plot in Panama.
Proceedings of the National Academy of Sciences, 106, 18621–18626.
Krishnamurthy PK, Francis RA (2012) A critical review on the utility of
DNA barcoding in biodiversity conservation. Biodiversity and Conservation, 21, 1901–1919.
Kuzmina ML, Johnson KL, Barron HR, Hebert PDN (2012) Identification
of the vascular plants of Churchill, Manitoba, using a DNA barcode
library. BMC Ecology, 12, 25–35.
Lahaye R, Van Der Bank M, Bogarin D et al. (2008) DNA barcoding the
floras of biodiversity hotspots. Proceedings of the National Academy of
Sciences, 105, 2923–2928.
Le Clerc-Blain J, Starr JR, Bull RD, Saarela JM (2010) A regional approach
to plant DNA barcoding provides high species resolution of sedges
(Carex and Kobresia, Cyperaceae) in the Canadian Arctic Archipelago.
Molecular Ecology Resources, 10, 69–91.
Leponce M, Meyer C, Hauser C et al. (2010) Collecting plant genetic
diversity: technical guidelines. In: Manual on Field Recording Techniques
and Protocols for All Taxa Biodiversity Inventories. ABC Taxa (eds Eymann
J, Degreef J, Hauser C et al.), pp. 19–49. Belgium Development Cooperation, Brussels, Belgium.
Li DZ, Gao LM, Li HT et al. (2011a) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proceedings of the National
Academy of Sciences, 108, 19641–19646.
Li M, Cao H, But PPH, Shaw PC (2011b) Identification of herbal medicinal materials using DNA barcodes. Journal of Systematics and Evolution,
49, 271–283.
Li Y, Gao LM, Poudel RC, Li DZ, Forrest A (2011c) High universality of
matK primers for barcoding gymnosperms. Journal of Systematics and
Evolution, 49, 169–175.
Maycock PF (1961) Botanical studies on Mont St. Hilaire, Rouville
County, Quebec. I – General description of the area and a floristic survey. Canadian Journal of Botany, 39, 1293–1325.
Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biology, 3, e422.
Miller AG, Nyberg JA (1995) Collecting herbarium vouchers. In: Collecting Plant Genetic Diversity: Technical Guidelines (eds Guarino L, Rao VR,
Reid R), pp. 561–573. CAB International, Wallingford.
Myers N (1988) Threatened biotas: “hot spots” in tropical forests. The
Environmentalist, 8, 187–208.
Oredsoon A (2000) Choice of surveyor is vital to the reliability of floristic
change studies. Watsonia, 23, 287–292.
Palmer MW, Earls PG, Hoagland BW, White PS, Wohlgenmuth T (2002)
Quantitative tools for perfecting species lists. Environmetrics, 13, 121–
137.
Pamilo P, Nei M (1988) Relationships between gene trees and species
trees. Molecular Biology and Evolution, 5, 568–583.
Parmesan C (2006) Ecological and evolutionary responses to recent climate change. Annual Review of Ecology, Evolution, and Systematics, 37,
637–669.
Pautasso M (2012) Challenges in the conservation and sustainable use of
genetic resources. Biology Letters, 8, 321–323.
Pennisi E (2007) Wanted: a barcode for plants. Science, 318, 190–191.
Preston FW (1948) The commonness, and rarity, of species. Ecology, 29,
254–283.
Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data
System (http://www.barcodinglife.org). Molecular Ecology Notes, 7,
355–364.
Rieseberg LH, Soltis D (1991) Phylogenetic consequences of cytoplasmic
gene flow in plants. Evolutionary Trends in Plants, 5, 65–84.
Rodman JE, Cody JH (2003) The taxonomic impediment overcome: NSF’s
Partnerships for Enhancing Expertise in Taxonomy (PEET) as a model.
Systematic Biology, 52, 428–435.
© 2014 John Wiley & Sons Ltd
BARCODING REGIONAL FLORAS 9
Roy S, Tyagi A, Shukla V et al. (2010) Universal plant DNA barcode loci
may not work in complex groups: a case study with Indian Berberis
species. PLoS One, 5, e13674.
Rubinoff D, Cameron S, Will K (2006) Are plant DNA barcodes a search
for the Holy Grail? Trends in Ecology & Evolution, 21, 1–2.
Schindel D, Miller SE (2005) DNA barcoding a useful tool for taxonomists. Nature, 435, 17.
Slingsby JA, Verboom GA (2006) Phylogenetic relatedness limits cooccurrence at fine spatial scales: evidence from the schoenoid sedges
(Cyperaceae: Schoeneae) of the Cape Floristic Region, South Africa.
The American Naturalist, 168, 14–27.
Soltis D, Soltis PS, Doyle JJ (1998) Molecular Systematics of Plants II: DNA
Sequencing. Klewer Academic Publishers, Norwell, MA.
Starr JR, Naczi RFC, Chouinard BN (2009) Plant DNA barcodes and species resolution in sedges (Carex, Cyperaceae). Molecular Ecology
Resources, 9, 151–163.
Stoeckle MY, Gamble CC, Kirpekar R et al. (2011) Commercial teas highlight plant DNA barcode identification successes and obstacles. Scientific Reports, 1, 1–7.
Swenson NG (2009) Phylogenetic resolution and quantifying the phylogenetic diversity and dispersion of communities. PLoS One, 4,
e4390.
Tripp EA, Hoagland KE (2013) Typifying an era in biology through synthesis of biodiversity information: achievements and impediments.
Taxon, 62, 899–911.
de Vere N, Rich T, Ford C et al. (2012) DNA barcoding the native flowering plants and conifers of Wales. PLoS One, 7, 1–e37945.
Vernooy R, Haribabu E, Muller M et al. (2010) Barcoding life to conserve
biological diversity: beyond the taxonomic imperative. PLoS Biology, 8,
e1000417.
White PT, McGill B, Lechowicz M (2011) Human-disturbance and caterpillars in managed forest fragments. Biodiversity and Conservation, 20,
1745–1762.
© 2014 John Wiley & Sons Ltd
Wiens JJ (2003) Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology, 52, 528–538.
Wiens JJ, Morrill MC (2011) Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. Systematic Biology,
60, 719–731.
T.L.E. and T.J.D. conceived the idea and wrote the manuscript; T.L.E. gathered and analysed the data.
Data Accessibility
Sequences and accompanying metadata are stored on
BOLD (process ID’s: VEMSH001-VEMSH662; MSTH00112-MSTH061-12).
Supporting Information
Additional Supporting Information may be found in the online
version of this article:
Table S1 Contributors to the WG1.2 Land Plants iBOL Working
Group.
Table S2 Updated terrestrial vascular plant species list for the
Gault Reserve at Mont St. Hilaire, Quebec.
Table S3 Time and cost estimates for two regional barcoding
efforts in Quebec, Canada.