Download Causes, consequences and solutions of

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Quantitative trait locus wikipedia , lookup

Gene nomenclature wikipedia , lookup

Public health genomics wikipedia , lookup

History of genetic engineering wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Genomic imprinting wikipedia , lookup

Ridge (biology) wikipedia , lookup

Gene desert wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Minimal genome wikipedia , lookup

Biology and consumer behaviour wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Genomics wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Viral phylodynamics wikipedia , lookup

Genome (book) wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Gene wikipedia , lookup

Multiple sequence alignment wikipedia , lookup

Pathogenomics wikipedia , lookup

Koinophilia wikipedia , lookup

Genome editing wikipedia , lookup

RNA-Seq wikipedia , lookup

Genome evolution wikipedia , lookup

Gene expression profiling wikipedia , lookup

Helitron (biology) wikipedia , lookup

Designer baby wikipedia , lookup

Gene expression programming wikipedia , lookup

Quantitative comparative linguistics wikipedia , lookup

Metagenomics wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Maximum parsimony (phylogenetics) wikipedia , lookup

Microevolution wikipedia , lookup

Computational phylogenetics wikipedia , lookup

Transcript
B RIEFINGS IN BIOINF ORMATICS . VOL 16. NO 3. 536^548
Advance Access published on 27 May 2014
doi:10.1093/bib/bbu015
Causes, consequences and solutions
of phylogenetic incongruence
Anup Som
Submitted: 24th January 2014; Received (in revised form) : 5th April 2014
Abstract
Phylogenetic analysis is used to recover the evolutionary history of species, genes or proteins. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species
all share a common history through their ancestry. Moreover, it is important because of its wide applications that
include understanding genome organization, epidemiological investigations, predicting protein functions, and deciding the genes to be analyzed in comparative studies. Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true evolutionary relationships of
the species or genes analyzed. One of the most notable difficulties is the widespread occurrence of incongruence
among methods and also among individual genes or different genomic regions. Presence of widespread incongruence
inhibits successful revealing of evolutionary relationships and applications of phylogenetic analysis. In this article, I
concisely review the effect of various factors that cause incongruence in molecular phylogenies, the advances in
the field that resolved some factors, and explore unresolved factors that cause incongruence along with possible
ways for tackling them.
Keywords: molecular phylogeny; phylogenetic incongruence; phylogenomics; systematic error; heterotachy; phylogenetic
network
INTRODUCTION
Recovering true historical relationships between any
groups of species, genes or proteins is one of the
principal goals of evolutionary research. Knowledge
of phylogeny tells us about the pattern of evolutionary relationships, revealing the historical pattern of
speciation and divergence and enabling us to classify
life according to evolutionary scheme. Moreover, it
is important because of its wide applications
throughout biology such as in studies of human diseases: for epidemiological investigations [1–3], for
identifying and characterizing newly discovered
pathogens [4, 5] and for identifying and tracking
natural reservoirs of zoonotic diseases [6]. In addition, phylogenetic analysis has been widely used to
test the a priori hypothesis of epidemiological clustering in suspected transmission chains of HIV [7].
Phylogenetic analysis of molecular sequences is also
one of the principal interpretive tools for
understanding the organization and evolution of
genes and genomes [8]. However, the success of revealing evolutionary relationships and applications of
phylogenetic analysis greatly depends on the quality
of the inferred phylogeny.
Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that
create uncertainty with respect to the true historical
relationships of the organisms or genes analyzed [9].
One of the most notable difficulties is the widespread
occurrence of incongruence (see glossary) among
methods [10, 11] and also among individual genes
or different genomic regions [12–15], and the existence of pervasive incongruences impedes achievement of the primary goals of evolutionary research.
Evolutionary biologists expected that the advent of
large volumes of data with sophisticated computational tools would resolve the problem.
Surprisingly, the advent of molecular and genomic
Corresponding author. Anup Som, Center of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad, Allahabad
211002, India. Tel.: þ91-532-2460027; Fax: þ91-532-2461009; E-mail: [email protected]
Anup Som is an assistant professor at the Center of Bioinformatics, Institute of Interdisciplinary Studies, University of Allahabad,
India—a public university and an institution of national importance. His research interests include evolutionary bioinformatics and
systems biology.
ß The Author 2014. Published by Oxford University Press. For Permissions, please email: [email protected]
Phylogenetic incongruence
data has increased the variety of classifications rather
than reducing the problem [10, 16]. Thus, despite
the rise of phylogenomics [17], many important
nodes remain unresolved [18]. In reality, large
amounts of data with available efficient algorithms
have made immense progress, but still many important nodes remain unresolved [15, 19]. This issue
raises a critical question: why important nodes
remain unresolved, and how do we overcome this
problem. Molecular data are continuously accumulating at a rapid pace, analytical methods are refined
and new ones are developed but time has not
stopped. Incongruence is still a problem that scientists are trying to tackle.
In this review, I emphasize the various factors that
cause incongruence in molecular phylogenies, the
advances in the field that resolved some factors,
and explore unresolved factors that cause incongruence along with possible ways of tackling them.
Reviewing the causes, consequences and solutions
of phylogenetic incongruence is a difficult task, and
may be too big to be addressed in a single article.
However, some of these causes, consequences and
solutions have already been reviewed separately in
the different literatures [11, 19–21], but no such
effort to combine all the possible causes and their
consequences has been concisely discussed in a
single article.
PRESENCE OF INCONGRUENCE: A
SURVEY
The widespread occurrences of phylogenetic incongruence among methods and also among genes are
well-known [10, 11]. Although there is no doubt
about the existence of statistically significant incongruence among different tree reconstruction methods, but statistically significant incongruence among
genes has been questioned [10]. However, it has
been accepted for a long time that gene trees could
be incongruent with each other because of the genetic drift in natural populations, gene duplication,
horizontal gene transfer or lineage sorting [10, 22].
In an investigation, Rokas et al. [13] has shown that
the single-gene phylogenies reveal extensive incongruence when a particular tree reconstruction
method is used. They analyzed 106 genes of eight
yeast species and showed that there are several
strongly supported (defined here as a bootstrap
value >70%) alternative phylogenies. Analyses of
106 genes resulted in >20 alternative maximum
537
likelihood (ML) or maximum parsimony (MP)
trees. In a similar study, Kopp and True [12] also
reported that single-gene datasets of Drosophila melanogaster species group produce several strongly supported conflicting clades, which were also
consistently seen in combined analyses. Lack of resolution in combined data analyses stems from conflicting phylogenetic signal between individual genes
either because of real differences in their evolutionary history or because of different statistical biases
[25]. Many studies have also reported that there are
several strongly supported trees among different
genes [23–25]. On the other hand, Jeffroy et al.
[10] have analyzed the same 106 genes assembled
by Rokas et al. [13] from 14 yeast species and
shown that although individual genes recover different trees, they are not statistically significant (i.e. no
incongruence among the 106 individual genes), and
stated ‘all single gene trees are different because of
the predominant effect of stochastic error’. This
result contradicted other results and also stands in
sharp contrast because gene trees are different not
only due to statistical biases, but also due to real differences in their evolutionary history [26, 27].
Despite evidence of incongruence in recent studies, a further investigation has been conducted using
nucleotide sequence data published in the scientific
literatures (Table 1). These multigene datasets were
used for testing different evolutionary hypotheses,
and therefore quality of data had been verified.
Reanalysis of the multigene datasets reveal that the
phylogenetic incongruence is present in both cases
(i.e. among methods and also among genes) even
when robust reconstruction methods (i.e. ML and
Bayesian method) are used. It is admissible that the
incongruence among methods is very common and
statistically robust (Figure 1), which agreed with
other observations. Figure 1 shows the subtrees of
six-taxon phylogenies that are based on the concatenation of four genes of Pryer et al. [28] data. Trees
were inferred using neighbor-joining (NJ) and ML
methods under the generalized time reversible
(GTR)
model
with
rate
heterogeneity
(a ¼ 0.2652). Pryer etal. [28] data contained an alignment of 5072 nucleotide characters based on the
concatenation of four genes from 35 taxa. In
Figure 1, instead of 35 taxon trees, two six-taxa subtrees have been shown where incongruence is significant. The ML tree shows that Pinus and Gnetum
are monophyletic with 78% bootstrap value, whereas
the NJ tree shows Pinus and Gnetum
538
Som
Table 1: Information about the datasets used in this study, including source paper, number of taxa, sequence
length, genes, organism and evolutionary model
Dataset
Reference
Taxa
Length
Gene(s)
Organism
Evolution model
Pryer
Springer
Murphy
Teeling
Yoder
Madsen
James
[28]
[29]
[30]
[31]
[32]
[33]
[34]
35
42
66
29
31
30
214
5072
19397
9779
7187
4454
6518
6436
atpB þ rbcL þ rps4 þ SSUrRNA
19 nuclear & three mtRNA genesa
15 nuclear & three mtDNA genesa
A2AB þ BRCA1 þRAG1 þRAG2
Cytochrome b, ND2, IRBP
A2AB þ IRBP þ vWF þ rRNA
18S þ 28S þ 5.8S þ EF1a þRPB1 þ RPB2
Seed plant
Placental mammal
Placental mammal
Rhinolophoid microbat
Malagasy carnivore
Placental mammal
Fungus
GTR þG
F84 þ G
K2P, HKY þG
GTR þG þ I
GTR þG þ I
HKY85
GTR þ I þ G
Note: aGenes name can be found in the corresponding reference article.
A NJ-bootstrap tree
Cycas
61
95
Ginkgo
Pinus
91
Austrobaileya
100
Chloranthus
Gnetum
0.02
B ML-bootstrap tree
78
Pinus
Gnetum
70
Ginkgo
66
them are lacking statistical support. In Figure 2, phylogenies obtained from a single gene dataset of
Springer et al. [29] data. Gene trees were obtained
from ATP7A (690 bp, 42 taxa and rate of heterogeneity a ¼ 0.751) and mt-RNA (1647 bp, 42 taxa and
a ¼ 0.751) genes by using neighbor-joining–maximum composite likelihood (NJ-MCL) method of
phylogenetic inference [36–38]. Gene ATP7A
shows Elephant and Hyrax are monophyletic with
65% bootstrap support whereas mt-RNA tree shows
that Elephant and Sirenian are monophyletic with
81% bootstrap support.
Cycas
Chloranthu
100
Austrobaileya
Figure 1: Incongruence among phylogenetic methods
(Pinus and Gnetum case). Phylogenies based on the
combined dataset (concatenation of four genes) containing alignments of 5072 nucleotide characters from
35 taxa [28]. Trees were inferred using (A) NJ and (B)
ML methods under GTR model with rate heterogeneity
(a ¼ 0.2652). The bootstrap supports of each tree are
indicated to the left of the corresponding node; 1000
bootstrap supports for NJ method using MEGA4 [38]
and 500 bootstrap supports for ML method using
PAUP [35] were performed. The ML tree shows that
Pinus and Gnetum are monophyletic with 78% bootstrap support, whereas the NJ tree shows that Pinus
and Gnetum are paraphyletic. This result shows the
statistically significant incongruence among methods.
[Note: In this figure, instead of 35 taxa trees, two sixtaxa subtrees are shown where incongruence was
significant]
are paraphyletic. This result shows a distinct example
of statistically significant incongruence among methods. On the other hand, topological differences
among genes are also widespread, but most of
FACTORS THAT CAUSE
PHYLOGENETIC INCONGRUENCE
Analytical factors
The causes of negative impact of various factors are
analytical and biological. Analytical factors affecting
phylogenetic reconstruction include: (i) the choice of
optimality criterion [39], (ii) taxon sampling [40, 41]
and (iii) specific assumptions in the modeling of
sequence evolution [42]. The existing probabilitybased optimality criteria (i.e. ML and Bayesian
method) and substitution models are statistically
robust, but they have limitations to represent reality
[43–45]. It might be possible that small variation
from reality makes the problem. Models of sequence
evolution are designed to capture the most essential
aspects of processes shaping sequence evolution.
However, these models rely on assumptions that
render them sensitive to a number of influences.
For example, extreme variability in rates of evolution
among lineages has been shown to lead to longbranch attraction [46]. Complex models of sequence
evolution may better approximate the evolution of
the sequences, and therefore might be expected to
Phylogenetic incongruence
A NJ-MCL tree for ATP7A gene
Subtree
Sirenian
89
Hyrax
65
Elephant
B NJ-MCL tree for mt-RNA gene
Subtree
Hyrax
539
phylogenetic estimation. In phylogenetic analyses,
the inclusion or exclusion of certain taxa may influence the accuracy of phylogenetic estimations.
Considerable bodies of work have addressed the
effect of taxon sampling on phylogenetic reconstruction, especially in relation to density of sampling, and
have stated that an adequate increase of taxon sampling strongly improves the phylogenetic estimation
[40, 52–55]. Heath et al. [56] performed a comprehensive study on taxon sampling that included an
exploration of the effect and strategies of taxon
sampling and concluded that there are many benefits
of dense taxon sampling and advised evolutionary
biologists to appreciate taxon sampling when interpreting phylogenetic analysis results to explain any
unexpected patterns. For further details about the
issue of taxon sampling and phylogenetic analysis
readers are advised to go through the comprehensive
review article by Nabhan and Sarkar [21].
Biological factors
95
Elephant
81
Sirenian
Figure 2: Incongruence among genes. Phylogenies obtained from single gene datasets of Springer et. al.
[29]. Gene trees were obtained from (A) ATP7A
(690 bp, 42 taxa and rate heterogeneity a ¼ 0.751) and
(B) mt-RNA (1647 bp, 42 taxa and a ¼ 0.751) genes by
using NJ method of phylogenetic inference under MCL
model with rate heterogeneity. The bootstrap supports
of each tree are indicated to the left of the corresponding node; 1000 replicates for NJ-MCL using MEGA4
[38] were performed. The ATP7A gene shows that
Elephant and Hyrax are monophyletic with 65% bootstrap support, whereas mt-RNA tree shows that
Elephant and Sirenian are monophyletic with 81% bootstrap support.
give more accurate results. On the other hand, more
complex models require the estimation of more parameters, each of which is subjected to some errors.
Several studies reported that the simple models
sometimes perform better than the complex models
[47–49], and this finding highlighted the complexity
of phylogeny reconstruction and demands for more
theoretical work on statistical models [50, 51].
Appropriate and extensive taxon sampling is one
of the most important determinants of accurate
Biological factors cause incongruence between two
phylogenies because of (i) violations of the orthology
due to three major mechanisms, namely, lineage
sorting, hidden paralogy and horizontal gene transfer
[57, 58]; (ii) stochastic error or character sampling
bias related to the length of the genes; and (iii) systematic error due to the presence of a nonphylogenetic signal in the data [10].
A growing corpus of studies has revealed conflicting evolutionary histories among genes [11, 59]. The
conflicting evolutionary histories among gene trees
could mainly be due to coalescent stochasticity [11,
22, 60] when incomplete sorting of the ancestral
polymorphism occurs during successive speciation
events, leading to gene genealogies that differ from
the species phylogeny (Figure 3). This phenomenon,
called ‘incomplete lineage sorting’, has been detected
in several different taxa [24, 60], including hominids
[61]. A recent genome-wide analysis of hominid primates indicated that, as a consequence of incomplete
lineage sorting, roughly 30% of our genome support
the [chimpanzee, (human, gorilla)] or [human,
(chimpanzee, gorilla)] branching order, i.e. topologies different from the true [gorilla, (human, chimpanzee)] species tree [60, 62]. A very different reason
why gene trees can be truly incongruent is gene duplications and extinctions. If a dataset includes paralogous copies, then the true phylogeny will partly
reflect the duplication history of the gene that is independent of species divergence history. The third
540
Som
Gene Tree
Ancestral
Polymorphism
A
B
C
D
Incomplete
Lineage Sorting
Polymorphisms maintained
between speciation events
A
B
C
D
Figure 3: An illustration of incomplete lineage sorting
problem. The history of a gene (colored lines) is drawn
in the context of a species tree (bars). New lineages
arising from new polymorphisms in the gene are
drawn in different colors. Consequently, the gene tree
topology is (((AB)C)D), whereas the species tree topology is (((A(BC))D). A colour version of this figure is
available at BIB online: http://bib.oxfordjournals.org.
mechanism is horizontal gene transfer. If genetic
exchanges occur between species, then the phylogeny of individual genes will be influenced by the
number and nature of transfers they have undergone.
It is well-known that the horizontal gene transfers
are an important source of conflicts between gene
trees in bacteria [63, 64]. Quantifying the amount of
phylogenetic conflict caused by horizontal gene
transfer versus other factors would appear worthwhile for a correct interpretation of bacterial phylogenomic data: how frequently, and under which
conditions, should we invoke horizontal gene transfer? Horizontal gene transfer is expected to be very
rare or absent in animals. Thus, phylogenetic agreement in animals is much more common that disagreement, indicating that horizontal gene transfer
is not widespread enough to erase the vertical
signal [11, 15]. Beside the three major mechanisms,
hybridization, recombination and natural selection,
are cited as potential causes of gene tree incongruence [11]. Figure 4 summarizes the negative factors
that cause incongruence in species and genes trees.
alignment rather than on the method of reconstruction [65–67]. One of the earlier studies to see the
effects of nucleotide sequences alignment on phylogeny, Morrison and Ellis [68] did a case study of
18s rDNAs of Apicomplexa and concluded that
many of the literature disagreements concerning
the phylogeny of the Apicomplexa are probably
based on differences in sequence alignment strategies
rather than differences in data or tree-building methods [68].
Over the decade, continuous efforts have been
made to improve the quality of multiple sequence
alignments, and as a result, several improved algorithms of sequence alignment have been developed
such as SATCHMO [69], RASCAL [70], MUSCLE
[71], MAVID [72] and CLUSTAL OMEGA [73].
Different alignment algorithms claimed high alignment accuracy, but a recent study finds the pitfalls of
the existing algorithms and raised questions about the
quality of multiple alignments and their effects on
subsequent analyses [74, 75]. Multiple alignments
are typically produced by feeding sequences into
the different alignment programs from left to right.
Landan and Graur [74] used existing alignment programs and instructed them to read the sequences
both from left to right and also from right to left,
and found that the two resulting alignments can be
quite different and consequently produce different
trees even though both alignments start with identical information and were processed by identical algorithms. Therefore, one of the potential sources of
incongruence in phylogeny is due to artifacts in multiple sequence alignments. So what are the possible
solutions for resolving incongruence cause by alignment artifacts? The problems are hard and so far no
perfect remedy exists. However, a good approach is
choosing highly conserved, recently diverged and
long sequences where the effect of alignments in
both directions seems to be relatively minor.
Furthermore, for alignments that are not too short,
removal of problematic regions (i.e. eliminate regions that cannot be aligned with confidence) leads
to better trees [76, 77].
INCONGRUENCE DUE TO
SEQUENCE ALIGNMENTS
The reconstruction of phylogenetic history is predicated on being able to accurately establish hypotheses
of character homology, and mistaken hypotheses of
homology are a primary source of error in evolutionary studies. Phylogenies are often thought to be
more dependent upon the specifics of the sequence
INCONGRUENCE CAUSED BY
HETEROTACHY
Heterotachy refers to the variation in the evolutionary rate of a given site (i.e. nucleotide/amino acid) of
a gene or protein through time. It has been convincingly demonstrated that the evolutionary rate of a
Phylogenetic incongruence
541
Factors causes phylogenetic incongruence
Horizontal gene transfer
Analytical factors
Biological factors
Optimality criterion
Substitution model
Taxon sampling
Incomplete lineage sorting
Violation of orthology
Stochastic error
Systematic error
Gene duplication
Hybridization
Recombination
Compositional heterogeneity
Rate heterogeneity
Heterotachy
Figure 4: A summarization of the negative factors that cause incongruence in species and genes trees.
100
MP
90
Bootstrap confidence level (%)
given site is not always constant throughout time.
Functional constraints on sites in a gene sequence
often change through time, causing shifts in site-specific evolutionary rates, a phenomenon called heterotachy [78]. This phenomenon has been recently
confirmed as an important process of sequence evolution [79], and can lead to phylogenetic reconstruction artifacts [80]. Unlike other types of biases,
heterotachy does not leave any evident footprints
in sequences, and therefore leads to artifacts that
are particularly difficult to detect [81, 82].
Recently, several simulation studies have been
conducted to compare the relative performance of
the phylogenetic inference methods under the heterotachous conditions, and studies have shown that
ML outperforms MP, but that in the majority of
cases, both methods failed to recover the correct
topology under the heterotachous conditions [51,
80, 83–86]. Recent controversies over the relative
performances of the ML and MP methods can not
address the important issue of how to reconstruct a
correct phylogeny when the concatenations suffer
from heterotachy. Particularly, adopting a genomescale approach increases the chance of heterotachy
because genes and genomes are unlikely to
evolve in a similar way as we concatenate sequence
data from species that are extremely distantly
related or have large differences in their environmental habitat, biological composition and life
history traits.
So, what are the implications of these findings?
Are all previously accepted multigene trees in need
of reinspection? To find an acceptable answer, a
simulation study was conducted to measure the performance of the phylogenetic inference methods
when different level of heterotachy is present in
NJ
80
ML
70
BI
60
50
40
30
20
10
0
Level of heterotachy (H)
Weak heterotachy
Strong heterotachy
Figure 5: A histogram plot shows the performance of
the phylogenetic methods with the variation of heterotachous signal (see text). The simulation strategy of
Kolaczkowski and Thornton [80] was used to introduce
heterotachy in the data. A colour version of this figure
is available at BIB online: http://bib.oxfordjournals.org.
the data. The simulation strategy of Kolaczkowski
and Thornton [80] was used to introduce heterotachy in the data. Traditional (i.e. ML, MP and NJ)
and Bayesian inference (BI) methods were considered for this simulation under Hasegawa,
Kishino and Yano (HKY) model of sequence evolution. The results in Figure 5 show that when the
level of heterotachy is very strong, all methods failed
to reconstruct true tree; when the level of heterotachy is very weak, all methods are very efficient; and
when level of heterotachy is intermediate, ML and
BI are very efficient (BI slightly more efficient than
ML), whereas the MP and NJ methods perform very
poorly. Simulation experiments show that the
542
Som
Box 1: Glossary
Molecular phylogenetics: the study of evolutionary relationships among organisms or genes by a combination of
molecular biology and statistical techniques
Phylogenetic incongruence: two (or more) phylogenetic trees are said to be incongruent when they exhibit
conflicting branching orders (i.e. topologies) and can not be superimposed.This implies that at least one node (also
known as a bipartition) present in one tree is not found in the other(s), where it is replaced by alternative groupings
of taxa.
Phylogenetic signals: tendency for related taxa to resemble each other more than they resemble species drawn
at random from the tree. The strength of the phylogenetic signal is proportional to the number of substitutions
occurring along the branch.
Phylogenomics: reconstruction of phylogenies using a large number of genes or genomics regions.Two fundamentally different approaches are used for reconstructing phylogenies from multiple datasets. In one, the phylogenetic
reconstruction is done after the gene sequences are concatenated head-to-tail to form a super-gene alignmentç
called ‘supermatrix’ approach. In the other, phylogenies are inferred separately for each gene and the resulting
gene trees are used to generate a majority rule consensus phylogenyçcalled ‘supertree’ approach.The size of homologous sequence datasets has increased dramatically in recent years, and many of these datasets now involve few
hundred to several thousand species or genes [16,18,108].
Model of sequence evolution: a statistical description of the process of substitution in nucleotide or amino acid
sequences. Complex models better approximate the evolutionary process but at the expense of more parameters
and computational time. As parameter-rich models require more data to behave properly, they have become
really useful with the advent of phylogenomic datasets.
Gene tree: a phylogenetic tree that shows the evolutionary relationships between a group of genes.
Species tree: a phylogenetic tree that shows the evolutionary relationships between a group of species.
Paralogy: two genes are said to be paralogous when they derive from a gene duplication event.
Horizontal gene transfer: the transfer of genetic material from one organism to another one that is not its
offspring.
Incomplete lineage sorting: the failure of two or more lineages in a population to coalesce, leading to the possibility that at least one of the lineages first coalesces with a lineage from a less closely related population.
Ancestral polymorphism: the variants that arose by mutation prior to the speciation event that generated the
species in which they segregate.
Coalescent: the process of joining ancestral lineages when the genealogical relationships of a random sample of sequences from a modern population are traced back.
Heterotachy: refers to the fact that the evolutionary rate of a given nucleotide/amino acid position varies
throughout time.
Long-branch attraction: the tendency of species at the ends of long branches in a phylogenetic tree to be
grouped artificially close to each other. Long branches can result from a fast evolutionary rate and/or a long time
span. In particular, the branch of the outgroup is, in essence, long and often artifactually attracts fast-evolving
lineages.
Stochastic error: the error in phylogenetic estimation caused by the finite length of the sequences used in the inference. As the size of the sequences (n) increases, the magnitude of the error decreases (stochastic error / p1ffiffin.)
Systematic error: the error in phylogenetic estimation that is due to the failure of the reconstruction method to
account fully for the properties of the data.
Natural selection: the process in nature by which, according to Darwin’s theory of evolution, only the organisms
best adapted to their environment tend to survive and produce more offspring while those less adapted tend to
be eliminated.
Hybridization (DNA): the process of combining two complementary single-stranded DNA or RNA molecules
and allowing them to form a single double-stranded molecule through base pairing.
Recombination: the exchange of a segment of DNA between two homologous chromosomes during meiosis leading to a novel combination of genetic material in the offspring.
Monophyletic: monophyletic taxa include all the species that are derived from a single common ancestor.
Polyphyletic: taxon is composed of unrelated organisms descended from more than one ancestor.
Optimality criteria: an objective function that is used to evaluate a given tree.The tree that maximizes or minimizes the function is chosen as the best estimate of phylogeny.
Phylogenetic networks: a phylogenetic network is any graph used to visualize evolutionary relationships (either
abstractly or explicitly) between nucleotide sequences, genes, chromosomes, genomes or species. They are used
when reticulate events such as hybridization, horizontal gene transfer, recombination or gene duplication and loss
are believed to be involved.
Phylogenetic incongruence
situation is not so dire because when we are reconstructing a large-scale phylogeny by concatenating
several genes, there is a fair chance of occurring heterotachy. However, for a set of conserved homologous genes, level of heterotachy is expected to be
low or intermediate, and under such situations ML
and BI are preferred methods to avoid incongruence
caused by heterotachous signal. Recently, two
models of sequence evolution have been proposed
for dealing with heterotachy. Pagel and Meade [87]
proposed a general likelihood-based covarion ‘mixture-model’ for inferring phylogenetic tree from
multigene data that suffer from heterotachy.
Another model is the site-heterogeneous mixture
model, named CAT, was developed for dealing
with heterotachy [88]. However, it has been claimed
that these models accommodate cases in which different sites in the alignment evolve with different
rates, but evidently these models of evolution
could not detect or overcome heterotachy [30].
Although probabilistic methods (i.e. BI and ML)
are very efficient up to certain levels of heterotachous signal. The existence of heterotachy compels
us to improve the phylogenetic reconstruction methods. So far, there is no such way to accommodate
heterotachous signal in the methods of phylogenetics
inferences. Moreover, it is well-known that existence of heterotachy dramatically affects phylogenetic
accuracy [80]. Therefore, phylogenetic incongruence
might be because of heterotachy, and phylogenetic
inference methods should be improved so that methods can successfully accommodate within-site rate
variation (i.e. heterotachous signals), and successful
modeling of within-site rate variation will help to
resolve many unresolved nodes.
RESOLVING INCONGRUENCE
USING PHYLOGENOMICS
APPROACH
To resolve the effect of analytical and biological factors, evolutionary biologists had tried to reconstruct a
true (or at least a fair) evolutionary history by analysis
of concatenated datasets [13, 16, 29, 89–93].
Concatenated datasets offer two major advantages.
First, our confidence in the inferred relationships is
greatly improved when they are supported by several
independent datasets. Second, each individual dataset
may not contain sufficient information to resolve all
relationships; therefore, combining the data may increase the amount of phylogenetic signal and provide
543
the necessary resolution. But a problem occurs when
different genes support conflicting phylogenies.
Perfect agreement among the gene-specific phylogenies is rare, and therefore phylogenetic analyses of
different sets of concatenated genes often do not
converge on the same tree [26, 27]. There is growing
evidence that trees built from concatenated sequences have to be treated with caution [14, 27,
93, 94]. Some studies have yielded results at odds
with widely accepted phylogenies [95].
Over the past two decades, tremendous progress
in this field has resulted in resolving various analytical
and biological factors that cause incongruence in
phylogenies. Improvement of phylogenetic inference
methods, probabilistic models of sequence evolution
and genome-scale approach (i.e. phylogenetic reconstruction is done after the gene sequences are concatenated head-to-tail to form a super gene
alignment) have resolved a significant amount of incongruence, but still there are large number of
incongruences among methods and also among
genes [10–15]. Adopting phylogenomics approach
resolved two biological factors. Those are violation
of orthology assumption and stochastic error. The
nonorthologous comparisons are gene-specific and
are buffered in a multigene analysis; and stochastic
error is related to gene lengths, that is, naturally
vanishes when more and more genes are concatenated. On the other hand, systematic errors (i.e.
nonhistorical signals where reality differs from simplified models) are not expected to disappear with
the addition of data. Moreover, longer sequences
exacerbate the potential for biases (systematic error)
to be positively misleading. Systematic errors result
from nonphylogenetic signals being present in the
data; those are compositional signal due to heterogeneity of nucleotide compositions among genes or
species, rate-heterogeneity signal (i.e. rate variation
across lineages) and also heterotachous signal due to
within-site rate variation [19]. The bias causing systematic errors creates an erroneous signal that could
dominate the true phylogenetic signal causing the
tree reconstruction method to be inconsistent and
lead to an incorrect, but highly supported tree [96].
APPROACHES TO OVERCOME
SYSTEMATIC ERRORS
Phylogenetic inference from nucleotides or amino
acids can be misled by both sampling (stochastic)
error and systematic error. Systematic errors are due
544
Som
to heterogeneity of nucleotide compositions, rate variation across lineages and also within-site rate variation.
Heterogeneity of nucleotide compositions causes incorrect phylogenetic inference because unrelated
clades with similar nucleotide frequencies (due to convergence rather than shared ancestral frequencies) will
have greater similarity and may group together in a
phylogenetic analysis, sometimes with strong statistical
support. Several methods have been developed to account for the problem of compositional heterogeneity
among sequences, including distance approaches
[97–99], parsimony approaches [100] and maximumlikelihood approaches [101]. Recently, the RYcoding strategy (coding the nucleotides as purines
and pyrimidines) was recognized a way to deal with
compositional signal [96]. Indeed, the G þ C content
can be extremely variable among homologous
sequences from various organisms, whereas the frequency of purines is remarkably homogeneous [102].
This constitutes a method of choice to avoid incongruence resulting from compositional bias. A recent
research has reported that GC-rich genes induced a
higher amount of conflict among gene trees and
performed worse than AT-rich genes in retrieving
well-supported, consensual nodes on the placental
tree [103]. This GC effect is mainly as a consequence
of genome-wide variations in recombination rate.
Indeed, recombination is known to drive GC-content
evolution through GC-biased gene conversion and
might be problematic for phylogenetic reconstruction,
for instance, in an incomplete lineage sorting context.
Beside compositional heterogeneity, rate variation
across lineages has been well-studied and successfully
modeled by the use of the gamma distribution [104,
105]. The mentioned implementations improved
the reconstruction of the phylogenies, especially
when very divergent species or genes were
included and long-branch attraction artifacts were
widespread.
As part of the continuous effort to resolve systematic errors, recently, several complementary
approaches have been applied, namely, (i) variation
in taxon sampling, (ii) removal of fast-evolving species, genes or sites [106–108] and (iii) allowing a
more efficient detection of multiple substitutions.
In addition to the mentioned complementary
approaches, amino acid coding into functional categories has potential to overcome some types of
systematic errors in genome-scale datasets [30].
Although recent studies indicated that reconstructing phylogeny from alignments of
concatenated genes greatly reduces the stochastic
error, the potential for systematic error still remains,
heightening the need for reliable methods to analyze
multigene datasets. Recently, a new methodology,
called ‘phylogenetic networks’, has been adopted to
deal with incongruences of gene trees [109, 110].
Phylogenetic networks provide an alternative to
phylogenetic trees when analyzing datasets whose
evolution involves significant amounts of reticulate
events such as horizontal gene transfer, hybridization
or recombination [111, 112]. However, it has been
claimed that the phylogenetic networks approach has
potential to visualize contradictory evidence for species phylogenies and incorporate additional information [113], but evidently this approach needs more
theoretical work to establish an efficient network
algorithm and simultaneously need comprehensive
tests using large-scale data to assess the accuracy of
this approach. These further analyses will find out the
potential applications and pitfalls of phylogenetic
networks approach.
CONCLUSIONS, IMPLICATIONS
AND FUTURE RESEARCH
Although recovering a highly resolved tree of life is
yet to be achieved, wealth of genomic data with
probabilistic tree reconstruction methods and
complex models of sequence evolution have taken
phylogenetic analysis to a new height. Furthermore,
cautionary application of several contemporary
approaches such as variation of taxon sampling,
removal of fast evolving species (genes or sites), removal of ambiguously aligned regions or use of
amino acid sequences instead of nucleotides resolved
several uncertainties. For further improvement, an
accurate probabilistic modeling of heterotachy (i.e.
within-site rate variation) is needed. In the future,
it should be examined whether the phylogenomics
approach itself can be fixed in case of heterotachy.
To achieve this goal, it would be necessary to define
a model of sequence evolution that considers heterotachy and to implement it in phylogenetic reconstruction methods. Secondly, accurate establishment
of character homology (i.e. multiple sequence alignment) is an essential requirement. Evolutionary
biologists should pay great attention to improve the
quality of multiple sequence alignments because
recent findings have demonstrated that the existing
alignment algorithms fail to accurately establish
character homology and mistaken hypotheses of
Phylogenetic incongruence
homology are a primary source of error in phylogenetic analysis.
Despite several known factors, there are possibly
other hidden factors that may cause incongruence,
which prevent the achievement of Darwin’s dream
of having ‘ . . . fairly true genealogical trees for each
great kingdom of Nature . . . ’. Evolutionary biologists should explore the possibility of hidden negative
factors and find out their best possible solutions.
6.
7.
8.
9.
Key Points
Resolving phylogenetic incongruence is not an easy problem;
particularly, the problem becomes more complicated when attempts of resolving one negative factor may introduce a new
negative factor that causes incongruence.
Adopting phylogenomics approach resolved few negative factors
that caused phylogenetic incongruence, but simultaneously
increased the chance of systematic error; particularly,‘heterotachous signal’ that is not expected to disappear with the addition
of data.
Toward achieving the goal of recovering a highly resolved tree of
life, accurate establishment of character homology (i.e. multiple
sequence alignment) is an essential requirement.
Beside the known negative factors, there are possibly other
hidden factors that may cause incongruence; therefore, evolutionary biologists should explore the possibility of hidden negative factors and find out their solutions.
10.
11.
12.
13.
14.
15.
FUNDING
16.
The work was supported by the Center of
Bioinformatics, University of Allahabad.
17.
Acknowledgements
I thank Dr. Sudhir Kumar of Arizona State University and Dr.
Dan Graur of University of Houston for useful discussions. I also
thank the anonymous reviewers for their constructive
comments.
18.
19.
20.
References
1.
2.
3.
4.
5.
Ou CY, Ciesielski CA, Myers G, et al. Molecular epidemiology of HIV transmission in a dental practice. Science 1992;
256:1165–71.
de Oliveira T, Pybus OG, Rambaut A, et al. Molecular
epidemiology: HIV-1 and HCV sequences from Libyan
outbreak. Nature 2006;444:836–7.
Wertheim JO, Fourment M, Kosakovsky Pond SL.
Inconsistencies in estimating the age of HIV-1 subtypes
due to heterotachy. Mol Biol Evol 2012;29:451–6.
Nichol ST, Spiropoulou CF, Morzunov S, et al. Genetic
identification of a hantavirus associated with an outbreak
of acute respiratory illness. Science 1993;262:914–17.
Hjelle B, Jenison S, Torrez-Martinez N, et al. A novel
hantavirus associated with an outbreak of fatal respiratory
disease in the southwestern United States: evolutionary
relationships to known hantaviruses. J Virol 1994;68:592–6.
21.
22.
23.
24.
25.
26.
545
Smith JS. New aspects of rabies with emphasis on epidemiology, diagnosis, and prevention of the disease in the United
States. Clin Microbiol Rev 1996;9:166–76.
Scaduto DI, Brown JM, Haaland WC, et al. Source identification in two criminal cases using phylogenetic analysis of
HIV-1 DNA sequences. Proc Natl Acad Sci USA 2010;107:
21242–7.
Bowers JE, Chapman BA, Rong J, et al. Unravelling angiosperm genome evolution by phylogenetic analysis of
chromosomal duplication events. Nature 2003;422:433–8.
Huelsenbeck JP, Rannala B, Masly JP. Accommodating
phylogenetics uncertainty in evolutionary studies. Science
2000;288:2349–50.
Jeffroy O, Brinkmann H, Delsuc F, et al. Phylogenomics:
the beginning of incongruence? Trends Genet 2006;22:
225–31.
Degnan JH, Rosenberg NA. Gene tree discordance phylogenetic inference and the multispecies coalescent. Trends Ecol
Evol 2009;24:332–40.
Kopp A, True JR. Phylogeny of the oriental Drosophila
melanogaster species group: a multilocus reconstruction.
Syst Biol 2002;51:786–805.
Rokas A, Williams BL, King N, et al. Genome-scale
approaches to resolving incongruence in molecular phylogenies. Nature 2003;425:798–804.
Soltis DE, Albert VA, Savolinen V, et al. Genome-scale data
angiosperm relationships and ‘ending incongruence’: a cautionary tale in phylogenetics. Trends Plant Sci 2004;9:477–83.
Galtier N, Daubin V. Dealing with incongruence in phylogenomic analyses. PhilosTrans R Soc Lond B Biol Sci 2008;
363:4023–9.
Ciccarelli FD, Doerks T, Von Mering C, et al. Toward
automatic reconstruction of a highly resolved tree of life.
Science 2006;311:1283–7.
Eisen JA. Phylogenomics intersection of evolution and genomics. Science 2003;300:1706–7.
Nishihara H, Okada N, Hasegawa M. Rooting the eutherian tree: the power and pitfalls of phylogenomics. Genome
Biol 2007;8:R199.
Delsuc F, Brinkmann H, Philippe H. Phylogenomics
and the reconstruction of the tree of life. Nat Rev 2005;6:
361–75.
de Queiroz A, Gatesy J. The supermatrix approach to systematics. Trends Ecol Evol 2007;22:34–41.
Nabhan AR, Sarkar IN. The impact of taxon sampling on
phylogenetic inference: a review of two decades of controversy. Brief Bioinf 2011;13:122–34.
Pamilo P, Nei M. Relationships between gene trees and
species trees. Mol Biol Evol 1988;5:568–83.
Ko WY, David RM, Akashi H. Molecular phylogeny of the
Drosophila melanogaster species subgroup. J Mol Evol 2003;
57:562–73.
Pollard DA, Iyer VN, Moses AM, et al. Widespread discordance of gene trees with species tree in Drosophila: evidence
for incomplete lineage sorting. PLoS Genet 2006;2:e173.
Wong A, Jensen JD, Pool JE, et al. Phylogenetic incongruence in the Drosophila melanogaster species group. Mol
Phylogenet Evol 2007;43:1138–50.
Philippe H, Derelle R, Lopez P, et al. Phylogenomics revives traditional views on deep animal relationships. Curr
Biol 2009;19:706–12.
546
Som
27. Philippe H, Brinkmann H, Lavrov DV, et al. Resolving
difficult phylogenetic questions: why more sequences are
not enough. PLoS Biol 2011;9:e1000602.
28. Pryer KM, Schneider H, Smith AR, et al. Horsetails and
ferns are a monophyletic group and the closest living relatives to seed plants. Nature 2001;409:618–22.
29. Springer MS, Murphy WJ, Eizirik E, et al. Placental
mammal diversification and the Cretaceous–Tertiary
boundary. PNAS 2003;100:1056–61.
30. Rodrı́guez-Ezpeleta N, Brinkmann H, Roure B, et al.
Detecting and overcoming systematic errors in genomescale phylogenies. Syst Biol 2007;56:389–99.
31. Teeling EC, Madsen O, Van Den Bussche RA, et al.
Microbat paraphyly and the convergent evolution of a
key innovation in Old World rhinolophoid microbats.
Proc Natl Acad Sci USA 2002;99:1431–6.
32. Yoder AD, Burns MM, Zehr S, et al. Single origin of
Malagasy carnivora from an African ancestor. Nature 2003;
421:734–7.
33. Madsen O, Scally M, Douady CJ, et al. Parallel adaptive
raditions in two major clades of placental mammals.
Nature 2001;409:610–14.
34. James TY, Kauff F, Conrad L, etal. Reconstructing the early
evolution of Fungi using a six-gene phylogeny. Nature
2006;443:818–22.
35. Swofford DL. PAUP*: phylogenetic analysis using parsimony
*and other methods Version 4. Sunderland, MA: Sinauer
Associates, 2002.
36. Tamura K, Nei M, Kumar S. Prospects for inferring very
large phylogenies by using the Neighbor-Joining method.
Proc Natl Acad Sci USA 2004;101:11030–5.
37. Som A. ML or NJ-MCL? A comparison between two
robust phylogenetic methods. Comput Biol Chem 2009;33:
373–8.
38. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular
Evolutionary Genetics Analysis (MEGA) software version
4.0. Mol Biol Evol 2007;24:1596–9.
39. Huelsenbeck JP. Performance of phylogenetic methods in
simulation. Syst Biol 1995;44:17–48.
40. Zwickl DJ, Hillis DM. Increased taxon sampling
greatly reduces phylogenetic error. Syst Biol 2002;51:
588–98.
41. Rokas A, Carroll SB. More genes or more taxa: the
relative contribution of gene number and taxon
number to phylogenetic accuracy. Mol Biol Evol
2005;22:1337–44.
42. Sullivan J, Joyce P. Model selection in Phylogenetics. Annu
Rev Ecol Evol Syst 2005;36:445–66.
43. Holder M, Lewis PO. Phylogeny estimation: traditional and
Bayesian approaches. Nat Rev Genet 2003;4:275–84.
44. Felsenstein J. Inferring Phylogenies. Sunderland, MA: Sinauer
Associates, 2004.
45. Yang Z, Rannala B. Molecular phylogenetics: principles
and practice. Nat Rev Gent 2012;13:304–14.
46. Felsenstein J. Cases in which parsimony or compatibility
methods will be positively misleading. Syst Zool 1978;27:
401–10.
47. Piontkivska H. Efficiencies of maximum likelihood
methods of phylogenetics inferences when different
substitution models are used. Mol Phylogenet Evol 2004;31:
865–73.
48. Som A. A new approach for estimating the efficiencies of
the nucleotide substitution models. Theory Biosci 2007;125:
133–45.
49. Yang Z. How often do wrong models produce better phylogenies. Mol Biol Evol 1997;17:1251–8.
50. Som A. Theoretical foundation to estimate the relative efficiencies of the JCþgamma model and the JC model in
obtaining the correct phylogenetic tree. Gene 2006;385:
103–10.
51. Som A, Fuellen G. The effect of heterotachy in multigene
analysis using the neighbor joining method. Mol Phylogenet
Evol 2009;52: 846–51.
52. Hillis DM. Taxonomic sampling phylogenetic accuracy and
investigator bias. Syst Biol 1998;47:3–8.
53. Hedtke S, Townsend T, Hillis D. Resolution of phylogenetic conflict in large data sets by increased taxon sampling.
Syst Biol 2006;55:522.
54. Poe S. The effect of taxonomic sampling on accuracy of
phylogenetic estimation: test case of a known phylogeny.
Mol Biol Evol 1998;15:1086–90.
55. Pick KS, Philippe H, Schreiber F, et al. Improved phylogenomic taxon sampling noticeably affects nonbilaterian relationships. Mol Biol Evol 2010;27:1983–7.
56. Heath T, Hedtke S, Hillis D. Taxon sampling and
the accuracy of phylogenetic analyses. J Syst Evol 2008;46:
239–57.
57. Maddison WP. Gene trees in species trees. SystBiol 1997;46:
523–36.
58. Wendel JF, Doyle JJ. Phylogenetic Incongruence: window
into genome history and molecular evolution. In: Soltis DE,
Soltis PS, Doyle JJ (eds). Molecular Systematics of Plants II:
DNA Sequencing. Netherlands: Kluwer academic publishers,
1998;265–96.
59. Degnan JH, Rosenberg NA. Discordance of species trees
with their most likely gene trees. PLoSGenet 2006;2:762–8.
60. Hobolth A, Dutheil J, Hawks J, et al. Incomplete lineage
sorting patterns among human chimpanzee and orangutan
suggest recent orangutan speciation and widespread selection. Genome Res 2011;21:349–56.
61. Patterson N, Richter DJ, Gnerre S, et al. Genetic evidence
for complex speciation of humans and chimpanzees. Nature
2006;441:1103–8.
62. Hobolth A, Christensen O F, Mailund T, Schierup M.
Genomic relationships and speciation times of human chimpanzee and gorilla inferred from a coalescent hidden
Markov model. PLoS Genet 2007;3:e7.
63. Philippe H, Douady CJ. Horizontal gene transfer and
phylogenetics. Curr Opin Microbiol 2003;6:498–505.
64. Boto L. Horizontal gene transfer in evolution: facts and
challenges. Proc Biol Sci 2010;277:819–27.
65. Morrison DA. Multiple sequence alignment for phylogenetic purposes. Aust Syst Bot 2006;19:479–539.
66. Ogden TH, Rosenberg MS. Multiple sequence alignment accuracy and phylogenetic inference. Syst Biol 2006;
55:314–28.
67. Höhl M, Ragan MA. Is multiple-sequence alignment
required for accurate inference of phylogeny? Syst Biol
2007;56:206–21.
68. Morrison DA, Ellis JT. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18s rDNAs
of apicomplexa. Mol Biol Evol 1997;14:428–41.
Phylogenetic incongruence
69. Edgar RC, Sjolander K. SATCHMO: sequence alignment
and tree construction using hidden Markov models.
Bioinformatics 2003;19:1404–11.
70. Thompson JD, Thierry JC, Poch O. RASCAL: rapid scanning and correction of multiple sequence alignments.
Bioinformatics 2003;19:1155–61.
71. Edgar RC. MUSCLE: multiple sequence alignment with
high accuracy and high throughput. Nucleic Acids Res 2004;
32:1792–97.
72. Bray N, Pachter L. MAVID: constrained ancestral
alignment of multiple sequences. Genome Res 2004;14:
693–9.
73. Sievers F, Wilm A, Dineen DG, et al. Fast scalable generation of high-quality protein multiple sequence alignments
using Clustal Omega. Mol Syst Biol 2011;7:539.
74. Landan G, Graur D. Heads or tails: a simple reliability check
for multiple sequence alignments. Mol Biol Evol 2007;24:
1380–3.
75. Notredame C. Recent evolutions of multiple sequence
alignment algorithms. PLoS Comput Biol 2007;3:e123.
76. Wiens JJ. Missing data and the design of phylogenetic analyses. J Biomed Inform 2006;39:34–42.
77. Talavera G, Castresana J. Improvement of phylogenies
after removing divergent and ambiguously aligned
blocks from protein sequence alignments. Syst Biol 2007;
56:564–77.
78. Philippe H, Lopez P. On the conservation of protein sequences in evolution. Trends Biochem Sci 2001;26:414–16.
79. Lopez P, Casane D, Philippe H. Heterotachy an important
process of protein evolution. Mol Biol Evol 2002;19:1–7.
80. Kolaczkowski B, Thornton JW. Performance of maximum
parsimony and likelihood phylogenetics when evolution is
heterogeneous. Nature 2004;431:980–4.
81. Inagaki Y, Susko E, Fast NM, et al. Covarion shifts cause a
long-branch attraction artifact that unites Microsporidia and
Archaebacteria in EF-1a phylogenies. Mol Biol Evol 2004;21:
1340–9.
82. Baele G, Raes J, Van de , Peer Y, et al. An improved statistical method for detecting heterotachy in nucleotide
sequences. Mol Biol Evol 2006;23:1397–405.
83. Gadagkar SR, Kumar S. Maximum likelihood outperforms
maximum parsimony even when evolutionary rates are heterotachous. Mol Biol Evol 2005;22:2139–41.
84. Gaucher EA, Miyamoto MM. A call for likelihood phylogenetics even when the process of sequence evolution is
heterogeneous. Mol Phylogenet Evol 2005;37:928–31.
85. Philippe H, Zhou Y, Brinkmann H, et al. Heterotachy and
long-branch attraction in phylogenetics. BMC Evol Biol
2005;5:50.
86. Spencer M, Susko E, Roger AJ. Likelihood parsimony and heterogeneous evolution. Mol Biol Evol 2005;22:
1161–4.
87. Pagel M, Meade A. A phylogenetic mixture model
for detecting pattern-heterogeneity in gene sequence or
character-state data. Syst Biol 2004;53:571–81.
88. Lartillot N, Philippe H. A Bayesian mixture model for
across-site heterogeneities in the amino-acid replacement
process. Mol Biol Evol 2004;21:1095–109.
89. Soltis PS, Soltis DE, Chase MW. Angiosperm phylogeny
inferred from multiple genes as a tool for comparative biology. Nature 1999;402:402–4.
547
90. Murphy WJ, Eizirik E, Johnson WE, et al. Molecular
phylogenetics and the origins of placental mammals.
Nature 2001;409:614–18.
91. Sanderson MJ, Driskell AC, Ree RH, etal. Obtaining maximal concatenated phylogenetic data sets from large
sequence databases. Mol Biol Evol 2003;20:1036–42.
92. Bapteste E, Brinkmann H, Lee JA, et al. The analysis of 100
genes supports the grouping of three highly divergent
amoebae: dictyostelium Entamoeba and Mastigamoeba.
Proc Natl Acad Sci USA 2002;99:1414–19.
93. Bapteste E, Susko E, Leigh J, et al. Alternative methods for
concatenation of core genes indicate a lack of resolution in
deep nodes of the prokaryotic phylogeny. Mol Biol Evol
2008;25:83–91.
94. Som A. Genome-scale approach and the performance of
phylogenetic methods. J Phylogen Evol Biol 2013;1:116.
95. Naylor GJ, Brown WM. Amphioxus mitochondrial
DNA chordate phylogeny and the limits of inference
based on comparisons of sequences. Syst Biol 1998;47:
61–76.
96. Phillips MJ, Delsuc F, Penny D. Genome-scale phylogeny
and the detection of systematic biases. Mol Biol Evol 2004;
21:1455–8.
97. Lake JA. Reconstructing evolutionary trees from DNA and
protein sequences: paralinear distances. Proc Natl Acad Sci
USA 1994;91:1455–9.
98. Galtier N, Gouy M. Inferring phylogenies from DNA sequences of unequal base compositions. Proc Natl Acad Sci
USA 1995;92:11317–21.
99. Gu X, Li W-H. Estimation of evolutionary distances under
stationary and nonstationary models of nucleotide substitution. Proc Natl Acad Sci USA 1998;95:5899–905.
100. Steel MA, Lockhart PJ, Penny D. A frequencydependent significance test for parsimony. Mol Phyl Evol
1995;4:64–71.
101. Galtier N, Gouy M. Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous
model of DNA sequence evolution for phylogenetic analysis. Mol Biol Evol 1998;15:871–9.
102. Woese CR, Achenbach L, Rouviere P, Mandelco L.
Archaeal phylogeny: reexamination of the phylogenetic
position of Archaeoglobus fulgidus in light of certain
composition-induced artifacts. Syst Appl Microbiol 1991;14:
364–71.
103. Romiguier J, Ranwez V, Delsuc F, et al. Less is more in
mammalian phylogenomics: AT-rich genes minimize tree
conflicts and unravel the root of placental mammals. Mol
Biol Evol 2013;30:2134–44.
104. Yang Z. Maximum likelihood estimation of phylogeny
from DNA sequences when substitution rates differ over
sites. Mol Biol Evol 1993;10:1396–401.
105. Yang Z. Maximum-likelihood models for combined
analyses of multiple sequence data. J Mol Evol 1996;42:
587–96.
106. Hirt RP, Logsdon JM, Healy B, et al. Microsporidia are
related to fungi: evidence from the largest subunit of
RNA polymerase II and other proteins. Proc Natl Acad Sci
USA 1999;96:580–5.
107. Brinkmann H, Giezen M, ZhouY , et al. An empirical
assessment of long-branch attraction artefacts in deep
eukaryotic phylogenomics. Syst Biol 2005;54:743–57.
548
Som
108. Burleigh JG, Bansal MS, Eulenstein O, et al. Genome-scale
phylogenetics: inferring the plant tree of life from 18,896
gene trees. Syst Biol 2011;60:117–25.
109. Holland B, Moulton V. Consensus networks: a method for
visualising incompatibilities in collections of trees. In:
Benson G, Page R (eds). Algorithms in Bioinformatics.
Berlin: Springer-Verlag, 2003;165–76.
110. Huson DH, Bryant D. Application of phylogenetic
networks in evolutionary studies. Mol Biol Evol 2006;23:
254–67.
111. Huson DH, Scornavacca C. A survey of combinatorial
methods for phylogenetic networks. Genome Biol Evol
2011;3:23–35.
112. Ferreri M, Qu W, Han B. Phylogenetic networks: a tool
to display character conflict and demographic history.
AfrJ Biotechnol 2011;10:12799–803.
113. Holland BR, Huber KT, Moulton V, etal. Using consensus
networks to visualize contradictory evidence for species
phylogeny. Mol Biol Evol 2004;21:1459–61.