Molecular-aided identification of woody plants in a tropical forest of

Gonzalez et al.
Supporting Information S1. Additional information on sequence clustering methods

Non-parametric clustering

To cluster DNA sequences, the first method we used was a non-parametric coalescent-based
approach (Pons et al. 2006). It assumes that intraspecific and interspecific gene
genealogies have different statistical properties and that they may be modeled differently
(the former by a neutral coalescent, the latter by a Yule model; see also Nielsen & Matz
2006). It detects species clusters in the tree, whose limits correspond to the evolutionary
boundaries among species (Pons et al. 2006). In our case, the non-parametric clustering
worked well only for the rpoC1 marker, for which we typically had several samples per
species. For rpoC1, 209 clusters were obtained, close to the real value of 198 taxa.
According to existing proposals for the construction of DNA barcoding reference databases,
each species should have at least three representatives. With such sampling, this algorithm
may yield more consistent results than the ones obtained here. However, for tropical plants,
obtaining three representatives per species represents a formidable logistical challenge.
We also tried the algorithm of Munch et al. (2008), but had trouble at the compilation stage.
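The contrast between the two branching models can be illustrated with the waiting times between successive branching events. The sketch below is a simplified illustration and not the procedure of Pons et al. (2006): it assumes that, with n lineages, waiting times are exponential with rate lam * n under a Yule model and lam * n * (n - 1) / 2 under a neutral coalescent. The function names, the scaling of lam and the toy numbers are all hypothetical.

```python
import math

def yule_rate(n, lam):
    """Total branching rate with n lineages under a pure-birth (Yule) model."""
    return lam * n

def coalescent_rate(n, lam):
    """Total coalescence rate with n lineages under a neutral coalescent.
    Here lam stands for the per-pair coalescence rate (assumed scaling)."""
    return lam * n * (n - 1) / 2

def log_likelihood(waiting_times, lineage_counts, rate_fn, lam):
    """Log-likelihood of exponential waiting times, one per interval,
    given the number of lineages present during each interval."""
    total = 0.0
    for t, n in zip(waiting_times, lineage_counts):
        rate = rate_fn(n, lam)
        total += math.log(rate) - rate * t
    return total

# Toy data (hypothetical): waiting times while 2, 3 and 4 lineages exist.
times = [0.5, 0.3, 0.2]
counts = [2, 3, 4]
ll_yule = log_likelihood(times, counts, yule_rate, 1.0)
ll_coal = log_likelihood(times, counts, coalescent_rate, 1.0)
print(f"Yule: {ll_yule:.3f}  coalescent: {ll_coal:.3f}")
```

Because the coalescent rate grows with the number of lineage pairs rather than with the number of lineages, the two models predict different patterns of branching times, which is the property the clustering method exploits.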

Alignment-based parametric clustering

In addition to TaxonDNA (see Main Text), we also tested DOTUR, a popular distance-based
agglomerative clustering algorithm initially developed for delimiting microbial OTUs
based on 16S rDNA sequences (Schloss & Handelsman 2005). Comparing the accuracy of
assignment into MOTUs, we found that DOTUR performed poorly for all the markers on which
it could be run. In addition, DOTUR could not be run on the most variable markers
(psbA-trnH and ITS). The rate of assignment errors (either incorrectly lumping two
species or splitting one species) was very high with this method. We believe that this is
because DOTUR cannot handle sequence distance matrices that include high pairwise
distances. For these reasons, we do not recommend the use of DOTUR in routine DNA
barcoding projects.
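The two kinds of assignment error mentioned above (lumping and splitting) can be counted once every sample has both a reference species name and a cluster label. The following is a minimal sketch under our own assumptions, not the scoring code used in the study: the function names, the rule that "lumped" takes precedence over "split", and the toy data are hypothetical.

```python
from collections import defaultdict

def assignment_report(species_of, cluster_of):
    """Classify each species as 'correct', 'split' or 'lumped'.

    species_of: sample id -> species name (reference taxonomy)
    cluster_of: sample id -> cluster (MOTU) label from any clustering tool
    """
    clusters_per_species = defaultdict(set)
    species_per_cluster = defaultdict(set)
    for sample, species in species_of.items():
        cluster = cluster_of[sample]
        clusters_per_species[species].add(cluster)
        species_per_cluster[cluster].add(species)

    report = {}
    for species, clusters in clusters_per_species.items():
        if any(len(species_per_cluster[c]) > 1 for c in clusters):
            report[species] = "lumped"   # shares a cluster with another species
        elif len(clusters) > 1:
            report[species] = "split"    # spread over several clusters
        else:
            report[species] = "correct"  # exactly one cluster, no other species
    return report

def correct_rate(report):
    """Fraction of species whose samples form one exclusive cluster."""
    return sum(1 for status in report.values() if status == "correct") / len(report)

# Toy data (hypothetical sample ids, species names and cluster labels).
species_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C", "s6": "D"}
cluster_of = {"s1": 1, "s2": 1, "s3": 2, "s4": 3, "s5": 4, "s6": 4}
report = assignment_report(species_of, cluster_of)
print(report, correct_rate(report))
```

In this toy case species A is recovered correctly, B is split over two clusters, and C and D are lumped together, so one species in four is correctly assigned.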

Alignment-free parametric clustering

Alignment-free algorithms were tested using the blastclust software, which clusters
unaligned sequences using a single-linkage clustering algorithm based on megablast
similarity scores (part of the blast package version 2.2.20 downloaded from
ftp://ftp.ncbi.nih.gov/blast/executables/release/). The blastclust algorithm is similar
to previously developed software (Parkinson et al. 2002; Blaxter et al. 2005). It showed
a good clustering performance (see Main Text). For both parametric approaches
(alignment-based and alignment-free), we assumed that threshold sequence divergences
range from 0.001 to 0.05. Table S3 provides a comparison between TaxonDNA and blastclust.
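Single-linkage clustering at a fixed divergence threshold can be sketched in a few lines. The example below is an illustration only and does not reproduce blastclust: a small hand-written table of pairwise divergences stands in for megablast similarity scores, and the function names, sample ids and divergence values are hypothetical. Only the threshold range (0.001 to 0.05) comes from the text.

```python
def single_linkage(items, distance, threshold):
    """Group items so that any pair at most `threshold` apart shares a cluster.
    Clusters are merged transitively (single linkage) with a union-find."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if distance(a, b) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(c) for c in clusters.values())

# Toy pairwise divergences (hypothetical values, not real barcode data).
toy = {
    ("s1", "s2"): 0.002, ("s2", "s3"): 0.008, ("s1", "s3"): 0.012,
    ("s1", "s4"): 0.040, ("s2", "s4"): 0.045, ("s3", "s4"): 0.060,
}

def toy_distance(a, b):
    return toy[(a, b)] if (a, b) in toy else toy[(b, a)]

samples = ["s1", "s2", "s3", "s4"]
# Number of clusters obtained at several thresholds within the tested range.
counts = {t: len(single_linkage(samples, toy_distance, t))
          for t in (0.001, 0.005, 0.01, 0.05)}
print(counts)
```

At a threshold of 0.01, s1 and s3 end up in the same cluster even though their divergence is 0.012, because both are within the threshold of s2. This chaining behaviour is characteristic of single linkage and is one reason the number of clusters depends strongly on the chosen threshold.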

In addition to blastclust, we also tested FastGroupII, a program used for clustering 16S
rDNA sequences (Yu et al. 2006, sequence match option) and freely available online
(http://biome.sdsu.edu/fastgroup/). FastGroupII usually performed slightly worse than
blastclust (mean rate of correct assignment of 62% versus 65.5%). In addition, the
pairwise matching algorithm used by Yu et al. (2006) is unclear. Consequently, we do not
recommend the use of FastGroupII in routine DNA barcoding projects.

Blaxter, M. et al. 2005. Defining operational taxonomic units using DNA barcode data.
Phil. Trans. R. Soc. B 360, 1935-1943.
Munch, K., Boomsma, W., Willerslev, E. & Nielsen, R. 2008. Fast phylogenetic DNA barcoding.
Phil. Trans. R. Soc. B 363, 3997-4002.
Nielsen, R. & Matz, M. V. 2006. Statistical approaches for DNA barcoding. Syst. Biol. 55, 162-169.
Parkinson, J., Guiliano, D. & Blaxter, M. 2002. Making sense of EST sequences by CLOBBing them.
BMC Bioinformatics 3, 31.
Pons, J. et al. 2006. Sequence-based species delimitation for the DNA taxonomy of undescribed insects.
Syst. Biol. 55, 595-609.
Schloss, P. D. & Handelsman, J. 2005. Introducing DOTUR, a computer program for defining operational
taxonomic units and estimating species richness. Appl. Envir. Microbiol. 71, 1501-1506.
Yu, Y., Breitbart, M., McNairnie, P. & Rohwer, F. 2006. FastGroupII: a web-based bioinformatics platform
for analyses of large 16S rDNA libraries. BMC Bioinformatics 7, 57-xxx.