Phylogenetic Network and Physicochemical Properties of Nonsynonymous
Mutations in the Protein-Coding Genes of Human Mitochondrial DNA
Jukka S. Moilanen and Kari Majamaa
Biocenter and Department of Neurology, University of Oulu, Oulu, Finland
Theories on molecular evolution predict that phylogenetically recent nonsynonymous mutations should contain more
non-neutral amino acid replacements than ancient mutations. We analyzed 840 complete coding-region human
mitochondrial DNA (mtDNA) sequences for nonsynonymous mutations and evaluated the mutations in terms of the
physicochemical properties of the amino acids involved. We identified 465 distinct missense and 6 nonsense mutations.
Of the amino acid replacements, 48% changed polarity, 26% size, 8% charge, 32% aliphaticity, 13% aromaticity, and 44%
hydropathy. The reduced-median networks of the amino acid changes revealed relatively few differences between the
major continent-specific haplogroups, but high variation and markedly starlike phylogenies within the haplogroups. Some
56% of the mutations were private, and 25% were homoplasic. Nonconservative changes were more common than
expected among the private mutations but less common among the homoplasic mutations. The asymptotic maximum of
the number of nonsynonymous mutations in European mtDNA was estimated to be 1,081. The results suggested that
amino acid replacements in the periphery of phylogenetic networks are more deleterious than those in the central parts,
indicating that purifying selection prevents the fixation of some alleles.
Introduction
The human mitochondrial genome (mtDNA) has
genes coding for 2 rRNAs, 22 tRNAs, and 13 subunits
of the respiratory chain complexes (MTND1, MTND2,
MTND3, MTND4, MTND4L, MTND5, MTND6, MTATP6,
MTATP8, MTCO1, MTCO2, MTCO3, and MTCYB). The
protein-coding genes occupy 68% of the genome, and
therefore a random nucleotide substitution has a high probability of being nonsynonymous and of leading to amino
acid replacement. The neutral (Kimura 1968) and the
nearly neutral (Ohta 1992) theories of molecular evolution
predict that a certain proportion of nonsynonymous mutations will be neutral in effect, whereas the rest will be
more or less deleterious. Several studies have demonstrated
an excess of nonsynonymous mutations within species as
compared with variation between species (Nachman et al.
1996; Rand and Kann 1996; Hasegawa, Cao, and Yang
1998; Nachman 1998; Fry 1999), and this finding has
been interpreted as suggesting selection against mildly
deleterious mutations, which prevents their fixation.
Furthermore, direct measurements of the intergenerational
substitution rate in human mtDNA have yielded rates
higher than the estimates derived from phylogenetic analyses, suggesting that a significant fraction of mutations is
removed by selection (Parsons et al. 1997).
The effects of nonsynonymous mutations depend
both on the position of the amino acid replacement in the
protein sequence and on the physicochemical properties of
the amino acids involved. The genetic code appears to
have evolved toward minimizing changes in physicochemical properties, which also affect the rate of nonsynonymous substitutions (Xia and Li 1998), suggesting that
amino acid replacements resulting in a dissimilar amino
acid are generally more deleterious than replacements resulting in an amino acid with similar properties. If the
hypothesis of selection against mildly deleterious mutations is correct, phylogenetically recent mutations should
contain more deleterious mutations and more dissimilar
amino acid replacements than the older ones.
On the one hand, there are many examples of pathogenic single-nucleotide mutations in mtDNA. In addition,
there is evidence that certain combinations of otherwise
harmless polymorphisms in mitochondrial lineages may be
associated with susceptibility to complex diseases (Wallace,
Brown, and Lott 1999; Chinnery et al. 2000; Ruiz-Pesini
et al. 2000), or with successful aging (De Benedictis et al.
1999). Their effect is most likely due to changes in the amino
acid sequences of the protein-coding genes. On the other
hand, several studies have failed to make the distinction
between a pathogenic mutation and a haplotype-associated
neutral polymorphism (Herrnstadt et al. 2002a). For these
reasons, knowledge of the nature and phylogenetic relationships of amino acid haplotypes in the human mitochondrial
genome is also important in clinical practice.
Although the number of complete mtDNA sequences
available has grown exponentially (Finnilä et al. 2000;
Ingman et al. 2000; Elson et al. 2001; Finnilä, Lehtonen,
and Majamaa 2001; Maca-Meyer et al. 2001; Herrnstadt
et al. 2002a), marking the start of mitochondrial population
genomics (Hedges 2000), the functional consequences of
the numerous variations in these sequences have not yet
received much attention. We report here on the characterization of the nonsynonymous mutations in 840 complete
human mitochondrial coding region sequences in terms of
their physicochemical properties, and on the construction
of a phylogenetic network for the amino acid sequences of
all 13 protein-coding genes. Furthermore, the physicochemical properties of the amino acid replacements were
compared according to their positions in the network to
assess the hypothesis of selection against mildly deleterious replacements.
Key words: human mitochondrial DNA, molecular evolution, population genetics, amino acid substitution, phylogenetics, neutral theory.
Materials and Methods
Alignment of mtDNA Sequences
Mol. Biol. Evol. 20(8):1195–1210. 2003
DOI: 10.1093/molbev/msg121
© 2003 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
Human mtDNA sequences were obtained from public sources (table 1). Historical or current reference sequences (G34, G35, G95) were excluded from the analyses. Sequences G90–G92 were also excluded because they vary only at positions 3243, 3423, 3426, and 11447 and are otherwise identical to G95, including its errors. The CCL2 HeLa sequence (M104) was excluded because of an unusually high rate of divergence (Herrnstadt et al. 2002c). The remaining 840 complete coding region sequences were compared with the MITOMAP reference sequence using the diffseq utility of the EMBOSS software package (Rice, Longden, and Bleasby 2000). The sequences were aligned with the reference sequence, and the nucleotide data of all sequences were stored, together with position information, in a relational SQL database. A comparison of the stored sequences with the original ones by means of the diffseq utility did not reveal any errors. The nucleotide sequences of the protein-coding genes were extracted according to the MITOMAP mtDNA function locations, and noncoding sections between the genes were ignored. The SQL query language and the programming language Perl were used for sequence alignment and subsequence extraction.

Table 1
Available Complete Mitochondrial DNA Coding Region Sequences
Id(a) | Identifier(b) | Population | Notes
Refseq | mitomapRCRS | – | Reference sequence in this study(c)
F1–F192 | – | Finnish | Population (Finnilä, Lehtonen and Majamaa 2001)
G1–G33 | GenBank AF382013.1–AF381981.1 | Diverse | Population (Maca-Meyer et al. 2001)
G34 | GenBank NC_001807.4 | African | Latest GenBank reference sequence(d)
G35 | GenBank J01415.1 | – | Historical reference sequence
G36 | GenBank AB055387.1 | Japanese | Cardiomyopathy patient (Shin et al. 2000)
G37–G89 | GenBank AF347015.1–AF346963.1 | Diverse | Population (Ingman et al. 2000)
G90–G92 | GenBank E27671.1–E27669.1 | Japanese | Mitochondrial diabetes(e)
G93 | GenBank D38112.1 | African | Population (Horai et al. 1995)
G94 | GenBank X93334.1 | Swedish | Population (Arnason, Xu and Gullberg 1996)
G95 | GenBank V00662.1 | – | Historical reference sequence (Anderson et al. 1981)
M1–M560 | MtDNA1–mtDNA560 | Diverse from UK, USA | Population and patients(f)
(a) Sequence identifiers used in this study.
(b) Sequence identifiers in public files.
(c) The MITOMAP reference sequence, a modified version of the 2001 Revised CRS (Andrews et al. 1999; available at http://www.mitomap.org).
(d) Identical to G37; has 41 differences relative to mitomapRCRS.
(e) Sequences with 3243A>G, 3423T>G, and 3426A>G but otherwise similar to G95.
(f) Includes patients with type 2 diabetes and neurodegenerative disorders (Herrnstadt et al. 2002a; available at http://www.mitokor.com/science/560mtdnas.php). Transitions at nucleotide position 5262 were added to two MitoKor sequences according to the published erratum (Herrnstadt et al. 2002b). M104 is from the CCL2 HeLa cell culture (Herrnstadt et al. 2002c).
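The storage, comparison, and extraction steps of the alignment procedure can be pictured with a minimal sketch. The study used EMBOSS diffseq, an SQL database, and Perl, and it does not describe its schema. The sketch below therefore assumes a hypothetical one-row-per-position table in Python's built-in sqlite3 module, and it assumes ungapped sequences already aligned to equal length. The toy sequences are illustrative, not real mtDNA.

```python
import sqlite3

# Hypothetical schema: one row per aligned nucleotide position of each sequence.
SCHEMA = """CREATE TABLE bases (
    seq_id TEXT NOT NULL,
    pos    INTEGER NOT NULL,   -- 1-based position in reference coordinates
    base   TEXT NOT NULL,
    PRIMARY KEY (seq_id, pos)
)"""


def store_sequence(conn: sqlite3.Connection, seq_id: str, sequence: str) -> None:
    """Store an aligned sequence one base per row (indels are not handled)."""
    conn.executemany(
        "INSERT INTO bases (seq_id, pos, base) VALUES (?, ?, ?)",
        [(seq_id, i + 1, base) for i, base in enumerate(sequence)],
    )


def differences(conn: sqlite3.Connection, ref_id: str, query_id: str):
    """List (position, reference base, query base) wherever the two differ."""
    return conn.execute(
        """SELECT r.pos, r.base, q.base
           FROM bases AS r JOIN bases AS q ON r.pos = q.pos
           WHERE r.seq_id = ? AND q.seq_id = ? AND r.base <> q.base
           ORDER BY r.pos""",
        (ref_id, query_id),
    ).fetchall()


def extract_region(conn: sqlite3.Connection, seq_id: str, start: int, end: int) -> str:
    """Return the subsequence between 1-based inclusive coordinates (e.g. a gene)."""
    rows = conn.execute(
        "SELECT base FROM bases WHERE seq_id = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (seq_id, start, end),
    ).fetchall()
    return "".join(base for (base,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_sequence(conn, "refseq", "ATGCCCAAAT")  # toy sequences
store_sequence(conn, "s1", "ATGCTCAAGT")
print(differences(conn, "refseq", "s1"))  # [(5, 'C', 'T'), (9, 'A', 'G')]
print(extract_region(conn, "s1", 1, 6))   # ATGCTC
```

In a real pipeline, the gene coordinates passed to `extract_region` would come from an annotation such as the MITOMAP function locations mentioned above.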
Identification of Nonsynonymous Mutations
Amino acid translations of protein-coding genes were
obtained by the methods provided by the Bio::PrimarySeqI
interface of Bioperl (available at http://www.bioperl.org),
and nonsynonymous changes were subsequently identified. The neighboring changes of each mtDNA variant
were examined in order to identify multiple nucleotide
substitutions within a single codon. Observed nonsense
mutations were verified manually from the original DNA
sequences. Comparison of the amino acid translations of
325 mtDNA mutations with those found in MITOMAP led
to the identification of six discrepancies, whereupon manual examination of the sequences indicated that all were
errors in MITOMAP.
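The codon-level comparison described above can be sketched as follows. The study used Bioperl's Bio::PrimarySeqI translation. This stand-in is a Python sketch that hard-codes NCBI translation table 2, the vertebrate mitochondrial code. Because whole codons are compared, two substitutions within one codon are evaluated together. Several simplifications are assumptions of the sketch rather than the authors' procedure: the first codon is translated literally instead of as an initiator methionine, only unambiguous A/C/G/T bases are supported, and incomplete terminal codons are skipped.

```python
from itertools import product

# NCBI translation table 2 (vertebrate mitochondrial), codons enumerated in TCAG order.
# Differences from the standard code: TGA = W, ATA = M, AGA/AGG = stop.
BASES = "TCAG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), VERT_MITO)
}


def classify_codon_changes(ref_gene: str, var_gene: str):
    """Compare two in-frame gene sequences codon by codon.

    Returns (codon number, reference aa, variant aa, kind) for every codon
    that differs; kind is synonymous, missense, nonsense, or stop-loss.
    """
    changes = []
    for i in range(0, min(len(ref_gene), len(var_gene)) - 2, 3):
        ref_codon, var_codon = ref_gene[i:i + 3], var_gene[i:i + 3]
        if ref_codon == var_codon:
            continue
        ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
        if ref_aa == var_aa:
            kind = "synonymous"
        elif var_aa == "*":
            kind = "nonsense"
        elif ref_aa == "*":
            kind = "stop-loss"  # e.g. a stop codon mutated to a sense codon
        else:
            kind = "missense"
        changes.append((i // 3 + 1, ref_aa, var_aa, kind))
    return changes


print(classify_codon_changes("ATGGCCACT", "ATGACCACC"))
# [(2, 'A', 'T', 'missense'), (3, 'T', 'T', 'synonymous')]
```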
Construction of Reduced-Median Networks
Six reduced-median networks (Bandelt et al. 1995)
were constructed from all the nonsynonymous mtDNA
mutations in the 840 sequences to infer the protein-level
phylogeny in African, Asian, and European haplogroup
clusters. All the coding region variation (Finnilä, Lehtonen, and Majamaa 2001; Herrnstadt et al. 2002a) was used
to assign each sequence to one of the six networks. Sequence assignment was verified by comparing the identified
mutations against those displayed in the published networks. Because the actual content of the Finnish and the
corrected MitoKor sequences was used, the comparison
led to the identification of an unpublished error in the
haplogroup H skeleton network (Herrnstadt et al. 2002a),
in which two sequences had been marked with the wrong
identifiers (45 and 530), which belonged to two haplogroup J sequences. Furthermore, we found that the transition at nucleotide position 14097 in two sequences (F162
and F163) was incorrectly shown as 14096 (Finnilä,
Lehtonen, and Majamaa 2001). Ten GenBank sequences
could not be unambiguously assigned to any of the major
African, Asian, or European haplogroups and were included in the Asian haplogroup cluster network, since they
were of Asian or Pacific origin. The sequences were
converted to a binary data matrix by considering transitions and transversions as distinct entities (Bandelt et al.
1995). Reduced median networks were constructed from
the binary data using Network 2.1 (available at http://
fluxus-engineering.com). All binary characters were
weighted equally, including transitions and transversions,
and the default reduction threshold r = 2 (Bandelt,
Macaulay, and Richards 2000) was used in the analysis.
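The input to Network is a binary character matrix, and median networks are built from majority-rule median vectors of triplets of sequences. The sketch below shows only these two ingredients. It does not reproduce Network 2.1, and the reduction step governed by the threshold r is not implemented. The mutation labels are hypothetical. A transition and a transversion at the same site carry different labels (for example "3308T>C" and "3308T>A"), so they automatically become separate characters.

```python
def binary_matrix(haplotypes):
    """Turn {sequence id: set of mutation labels} into a 0/1 character matrix.

    Each distinct label becomes one binary character, so transitions and
    transversions at the same site are treated as distinct characters.
    """
    characters = sorted(set().union(*haplotypes.values()))
    matrix = {
        name: [1 if c in muts else 0 for c in characters]
        for name, muts in haplotypes.items()
    }
    return characters, matrix


def median_vector(x, y, z):
    """Majority-rule median of three binary vectors (building block of median networks)."""
    return [1 if a + b + c >= 2 else 0 for a, b, c in zip(x, y, z)]


# Toy haplotypes with hypothetical labels; their pairwise differences form a reticulation.
chars, m = binary_matrix({
    "h1": {"mutA", "mutC"},
    "h2": {"mutA", "mutB"},
    "h3": {"mutB", "mutC"},
})
print(chars)                                     # ['mutA', 'mutB', 'mutC']
print(median_vector(m["h1"], m["h2"], m["h3"]))  # [1, 1, 1]
```

Here the median [1, 1, 1] is not one of the observed haplotypes. It would enter the network as an inferred median node, and the reduction rule decides whether such nodes are retained.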
Characterization of Amino Acid Replacements
The amino acids involved in the nonsynonymous mutations were characterized in terms of six physicochemical properties relevant to protein evolution (Xia and Li 1998), namely polarity, size, isoelectric point, aliphatic and aromatic nature, and hydropathy. We defined the following classes:
- polar: polarity ≥ 8.6 (Grantham 1974);
- small: side-chain molecular volume ≤ 61 Å³ (Grantham 1974);
- positively charged: isoelectric point ≥ 7.59;
- negatively charged: isoelectric point ≤ 3.22 (Alff-Steinberger 1969);
- aliphatic: aliphatic side chains (I, L, and V);
- aromatic: aromatic rings (F, H, W, and Y);
- hydrophilic: negative Kyte-Doolittle hydropathy index (Kyte and Doolittle 1982);
- hydrophobic: positive Kyte-Doolittle hydropathy index.
Amino acid replacements were then assigned to categories according to changes in these physicochemical properties. Furthermore, each replacement was defined as conservative or nonconservative according to the BLOSUM62 matrix used for sequence comparisons (Henikoff and Henikoff 1992). Nonconservative replacements were those with a negative value in the matrix (Cargill et al. 1999).
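The following Python sketch shows how these definitions combine into per-replacement categories. The Grantham polarity and volume values and the Kyte-Doolittle indices are transcribed from the cited sources and should be checked against them before reuse. The inclusive ≥/≤ comparisons are an assumption, consistent with each threshold equaling the value of a boundary residue (T for polarity and volume, H and E for isoelectric point). The BLOSUM62 dictionary is a deliberately tiny excerpt for illustration, not the full matrix.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}
# Grantham (1974) polarity and side-chain volume.
POLARITY = {"A": 8.1, "R": 10.5, "N": 11.6, "D": 13.0, "C": 5.5, "Q": 10.5,
            "E": 12.3, "G": 9.0, "H": 10.4, "I": 5.2, "L": 4.9, "K": 11.3,
            "M": 5.7, "F": 5.2, "P": 8.0, "S": 9.2, "T": 8.6, "W": 5.4,
            "Y": 6.2, "V": 5.9}
VOLUME = {"A": 31, "R": 124, "N": 56, "D": 54, "C": 55, "Q": 85, "E": 83,
          "G": 3, "H": 96, "I": 111, "L": 111, "K": 119, "M": 105, "F": 132,
          "P": 32.5, "S": 32, "T": 61, "W": 170, "Y": 136, "V": 84}
POSITIVE, NEGATIVE = set("HKR"), set("DE")  # pI >= 7.59 and pI <= 3.22
ALIPHATIC, AROMATIC = set("ILV"), set("FHWY")

# Illustrative excerpt of BLOSUM62 (symmetric; keys are sorted pairs).
BLOSUM62_EXCERPT = {("A", "T"): 0, ("I", "V"): 3, ("I", "T"): -1,
                    ("F", "L"): 0, ("P", "S"): -1, ("L", "Q"): -2}


def charge(aa: str) -> str:
    return "positive" if aa in POSITIVE else "negative" if aa in NEGATIVE else "neutral"


# Each classifier maps an amino acid to its class for one property.
CLASSIFIERS = {
    "polarity": lambda aa: POLARITY[aa] >= 8.6,
    "size": lambda aa: VOLUME[aa] <= 61,
    "charge": charge,
    "aliphaticity": lambda aa: aa in ALIPHATIC,
    "aromaticity": lambda aa: aa in AROMATIC,
    "hydropathy": lambda aa: KD[aa] > 0,
}


def property_changes(ref_aa: str, alt_aa: str) -> set:
    """Names of the properties whose class differs between the two amino acids."""
    return {name for name, f in CLASSIFIERS.items() if f(ref_aa) != f(alt_aa)}


def is_nonconservative(ref_aa: str, alt_aa: str) -> bool:
    """Negative BLOSUM62 score; raises KeyError for pairs outside the excerpt."""
    return BLOSUM62_EXCERPT[tuple(sorted((ref_aa, alt_aa)))] < 0


print(property_changes("A", "T"), is_nonconservative("A", "T"))
```

For the A-T replacement this reports changes in polarity and hydropathy (set order may vary) and a conservative BLOSUM62 score of 0.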
The distribution of mutations within genes was
assessed by identifying hydrophobic and hydrophilic regions of genes. These regions were defined by comparing
the average hydropathy of each 19-amino acid segment to
the mean of all segments for the respective gene. The
average Kyte-Doolittle hydropathy index of 19 neighboring amino acids was calculated for each amino acid
position according to the MITOMAP reference sequence
and by reference to the pepinfo utility of the EMBOSS
package. We used this segment size because it has been
shown to be a good value for identifying transmembrane
regions (Kyte and Doolittle 1982).
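The window procedure can be sketched in Python as below. It assumes that "the mean of all segments" is the mean of all defined window averages for the gene, and it assigns exact ties to the hydrophilic class. Both choices are assumptions of this sketch. Positions within nine residues of either end have no full 19-residue window and are left undefined, which is why such replacements cannot be classified.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def window_hydropathy(protein: str, window: int = 19) -> List[Optional[float]]:
    """Mean hydropathy of the window centred on each residue (None near the ends)."""
    half = window // 2
    values = [KD[aa] for aa in protein]
    averages: List[Optional[float]] = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            averages.append(None)  # no complete window around this position
        else:
            averages.append(sum(values[i - half:i + half + 1]) / window)
    return averages


def label_regions(protein: str, window: int = 19) -> List[Optional[str]]:
    """Label each position relative to the mean of all defined window averages."""
    averages = window_hydropathy(protein, window)
    defined = [a for a in averages if a is not None]
    gene_mean = sum(defined) / len(defined)
    return [
        None if a is None else ("hydrophobic" if a > gene_mean else "hydrophilic")
        for a in averages
    ]


# Toy protein: a hydrophobic stretch followed by a hydrophilic one (window of 3 for brevity).
labels = label_regions("I" * 10 + "R" * 10, window=3)
print(labels[1], labels[18])  # hydrophobic hydrophilic
```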
Contingency Table Analysis
The nonsynonymous mutations were counted as
differences relative to the reference sequence and without
correcting for multiple hits; that is, each mutation was
counted once regardless of the number of its occurrences
in the networks. This approach results in an underestimate
of the true number of mutations that have occurred during
human mtDNA evolution, but despite this disadvantage,
the method was used here to avoid the confounding effect
of the expected high degree of homoplasy. Private
mutations, occupying the peripheral tips of the phylogeny,
were inferred from alleles that were present in only one
sequence, whereas homoplasic mutations were inferred
from the presence of a mutation in more than one lineage in the
networks. Since each mutation was counted only once, it
was possible to classify each amino acid replacement at
a given sequence position unambiguously as private/
nonprivate and homoplasic/nonhomoplasic. Alternatively,
it could have been possible to infer each occurrence of
a homoplasic mutation from the phylogeny and to count
the occurrences separately, but with this approach the
frequencies of mutation categories among homoplasic
mutations would have been inflated by the subset of
mutations that were highly homoplasic, and would also
have depended on the method and parameters used in the
phylogenetic reconstruction.
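The once-per-mutation classification can be sketched in a few lines of Python. Whether a mutation is private follows directly from the number of sequences that carry it. Homoplasy, by contrast, must be read off the network. Here it is supplied as a hypothetical input giving the number of independent branches on which each mutation appears, and mutations absent from that input are treated as arising once.

```python
from collections import Counter


def classify_mutations(haplotypes, independent_origins):
    """Classify each distinct mutation once as private and/or homoplasic.

    haplotypes: {sequence id: set of mutation labels}
    independent_origins: {mutation label: number of separate network branches
        carrying it} (hypothetical input derived from the phylogeny)
    """
    carriers = Counter(m for muts in haplotypes.values() for m in muts)
    return {
        m: {"private": n == 1, "homoplasic": independent_origins.get(m, 1) > 1}
        for m, n in carriers.items()
    }


example = classify_mutations(
    {"s1": {"m1", "m2"}, "s2": {"m2"}, "s3": {"m2", "m3"}},
    independent_origins={"m2": 2},
)
print(example["m1"])  # {'private': True, 'homoplasic': False}
```

Each mutation appears exactly once in the result however many sequences carry it, mirroring the counting rule above.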
The frequencies of the mutation categories among
private amino acid replacements, homoplasic replacements, and replacements in hydrophobic regions were
compared with those among the remaining ones using Fisher's exact test as implemented in R 1.4.1 (Ihaka and Gentleman 1996; available at http://cran.r-project.org/), which computes the exact value of P and the conditional maximum likelihood estimate of the odds ratio. This test was used because small cell frequencies were expected, and the two-tailed test was used because no particular direction of differences was assumed a priori. Sample estimates of the odds ratio were similar to the reported conditional maximum likelihood estimates, differing only in the second to fourth decimal places. The inflated type I error rate due to multiple comparisons was assessed by obtaining the adjusted significance level ($\alpha_c$) from $1 - (1 - \alpha_c)^n = 0.05$, where $n$ is the number of comparisons and 0.05 is the significance level corresponding to the 95% confidence limit.
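For readers without R, the two-sided test and the adjusted level can be sketched in pure Python. The P value conditions on the table margins and sums the hypergeometric probabilities of all tables no more probable than the observed one. The small relative tolerance mirrors the approach R's fisher.test takes to ties. The adjusted level solves the equation above for $\alpha_c$. The conditional maximum likelihood odds ratio that R reports is not computed here. The sketch is illustrative, not the authors' code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(p for p in map(prob, support) if p <= observed * (1 + 1e-7)))


def sidak_alpha(n_comparisons: int, alpha: float = 0.05) -> float:
    """Solve 1 - (1 - alpha_c) ** n = alpha for the per-comparison level alpha_c."""
    return 1 - (1 - alpha) ** (1 / n_comparisons)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
print(round(sidak_alpha(10), 5))                     # 0.00512
```

With, say, 10 comparisons, a difference would need P < 0.0051 to remain significant at the family-wise 0.05 level.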
Rate of Detection of New Mutations in European
Sequences
An estimate for the cumulative rate of discovery of
new nonsynonymous mutations in the 647 European sequences was derived by taking 500 random permutations
and examining the sequences contained in each consecutively, calculating for each sequence the cumulative sum of
mutations that had not occurred in the previous sequences.
The sequences were sampled without replacement. The
arithmetic mean of the cumulative sums of the 500
permutations was plotted, and statistical models having an
asymptotic maximum were fitted to this mean curve by the
nonlinear least squares method to provide an estimate of
the total number of nonsynonymous mutations in European mtDNA and to predict the number of sequences
required for identifying most of the mutations.
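The permutation procedure can be sketched in Python as follows. Each permutation orders the sequences at random without replacement and records how many distinct mutations have been seen after each sequence. The curve is the mean over permutations. The seed, the toy data, and the function name are illustrative assumptions; the asymptotic model would then be fitted to the returned curve.

```python
import random
from typing import List, Set


def accumulation_curve(haplotypes: List[Set[str]], permutations: int = 500,
                       seed: int = 1) -> List[float]:
    """Mean number of distinct mutations observed after k sequences, k = 1..N."""
    rng = random.Random(seed)
    totals = [0.0] * len(haplotypes)
    for _ in range(permutations):
        seen: Set[str] = set()
        # rng.sample with the full length gives a random ordering without replacement.
        for k, muts in enumerate(rng.sample(haplotypes, len(haplotypes))):
            seen |= muts  # only mutations not met in earlier sequences add to the count
            totals[k] += len(seen)
    return [t / permutations for t in totals]


toy = [{"a"}, {"a", "b"}, {"c"}, {"a", "d"}, set()]
curve = accumulation_curve(toy)
print(curve[-1])  # 4.0: every ordering ends with all four distinct mutations
```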
Results
Mutations in the Protein-Coding Genes of mtDNA
A total of 988 synonymous, 465 nonsynonymous missense, and 6 nonsense mutations were identified in the protein-coding genes of 840 complete coding region mtDNA sequences, when the mutations were counted as differences relative to the reference sequence. One-third (32%) of all mutations were nonsynonymous (table 2). MTATP6 and MTATP8 had the highest proportion of nonsynonymous mutations among all mutations (52.5% and 51.3%, respectively), whereas MTND4L, MTCO2, and MTND3 had the lowest (19.2%, 22.2%, and 22.2%, respectively). Several sequences were detected in which two mutations co-occurred in one codon. These included:
- 4769A>G and 4767A>G in sequences M175, M222, M385, and M409;
- 8574C>T and 8572G>A in M533;
- 8703C>T and 8701A>G in G42;
- 10400C>T and 10398A>G in 53 sequences;
- 14767T>C and 14766C>T in M455.
Each of these pairs consisted of a nonsynonymous and a synonymous mutation. Furthermore, each amino acid replacement resulted from a specific nucleotide substitution, as we found no instances where two different nonsynonymous mutations had caused an identical amino acid change. The most common amino acid replacement was an A-T change in either direction, followed in decreasing frequency by I-V, I-T, and F-L (fig. 1).

Table 2
Synonymous and Nonsynonymous Mutations in the 840 mtDNA Sequences, by Genes
Gene | Length(a) | Synonymous(b) | Nonsynonymous(c) | Total
MTND1 | 956 | 78 | 38 | 116
MTND2 | 1,042 | 91 | 36 | 127
MTCO1 | 1,542 | 122 | 40 | 162
MTCO2 | 684 | 63 | 18 | 81
MTATP8 | 207 | 19 | 20 | 39
MTATP6 | 681 | 57 | 63 | 120
MTCO3 | 784 | 67 | 32 | 99
MTND3 | 346 | 35 | 10 | 45
MTND4L | 297 | 21 | 5 | 26
MTND4 | 1,378 | 124 | 38 | 162
MTND5 | 1,812 | 158 | 75 | 233
MTND6 | 525 | 54 | 29 | 83
MTCYB | 1,141 | 105 | 70 | 175
Total | 11,341 | 988 | 471 | 1,459
NOTE.—MTND, NADH dehydrogenase; MTCO, cytochrome c oxidase; MTATP, ATP synthase; MTCYB, cytochrome b.
(a) Gene length in nucleotides.
(b) Number of synonymous mutations.
(c) Number of nonsynonymous mutations. Mutations were counted as differences relative to the reference sequence and without correcting for multiple hits.
Sums over all genes do not equal the totals due to overlapping regions between MTATP6 and MTATP8, MTATP6 and MTCO3, and MTND4 and MTND4L.
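The percentages quoted in the text follow directly from the counts in table 2. A short Python check recomputes them. It also shows that the per-gene sums exceed the totals, as the table note states for the overlapping genes.

```python
# (synonymous, nonsynonymous) counts per gene, transcribed from table 2.
TABLE2 = {
    "MTND1": (78, 38), "MTND2": (91, 36), "MTCO1": (122, 40),
    "MTCO2": (63, 18), "MTATP8": (19, 20), "MTATP6": (57, 63),
    "MTCO3": (67, 32), "MTND3": (35, 10), "MTND4L": (21, 5),
    "MTND4": (124, 38), "MTND5": (158, 75), "MTND6": (54, 29),
    "MTCYB": (105, 70),
}
TOTAL_SYN, TOTAL_NONSYN = 988, 471


def percent_nonsynonymous(syn: int, nonsyn: int) -> float:
    """Nonsynonymous share of all mutations, in percent, rounded to one decimal."""
    return round(100 * nonsyn / (syn + nonsyn), 1)


for gene, counts in sorted(TABLE2.items(), key=lambda kv: -percent_nonsynonymous(*kv[1])):
    print(f"{gene:7s} {percent_nonsynonymous(*counts):5.1f}%")

print(percent_nonsynonymous(TOTAL_SYN, TOTAL_NONSYN))  # 32.3 (471 of 1,459)
# Per-gene sums (994 and 474) exceed the totals (988 and 471) because of the
# overlapping gene regions noted under table 2.
print(sum(s for s, _ in TABLE2.values()), sum(n for _, n in TABLE2.values()))
```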
Nonsense Mutations
Two mutations in the initiator codon of the MTND1 gene were identified. 3308T>C was present in 10 sequences (G20, G66, M158, M165, M192, M215, M293, M379, M386, and M514). This mutation has been identified in the chimpanzee (Arnason, Xu, and Gullberg 1996), but in humans it was originally reported in a patient with bilateral striatal necrosis and MELAS (Campos et al. 1997). However, experimental data have suggested that 3308T>C does not affect the synthesis of the MTND1 polypeptide and that any methionine codon close to the 5′ end of a mitochondrial mRNA may serve as a translation initiator (Fernandez-Moreno et al. 2000). Our phylogenetic analysis indicated that the mutation represents a polymorphism in the African haplogroup L1b, as suggested previously (Rocha et al. 1999). It was also present in another branch of the African network, however, indicating that it has arisen more than once. 3308T>A, resulting in a codon for lysine, was found in the sequence M170, which also harbored 3312dupC, a single-nucleotide duplication in the second codon. The sequence of the first three codons was therefore AAACCCCATG instead of ATACCCATG. A third initiator codon mutation was observed in MTND5, where 12338T>C (M339) led to a codon for threonine. Methionine occupies position 3 in MTND1 and MTND5 and probably serves as a translation initiator in the presence of [3308T>A; 3312dupC] and 12338T>C.

FIG. 1.—Matrix of amino acid replacements for the 840 mtDNA sequences. The area of each circle is proportional to the frequency of distinct replacements between the respective amino acids. For reference, the number of replacements between T and A is 98, and that between K and N is 1.

FIG. 2.—Collapsed network of the continent-specific major haplogroup clusters. The central haplotype from each cluster is shown. Each dashed rectangle indicates the figure containing the expanded network for the respective cluster. The mutations are shown as amino acid changes relative to the MITOMAP reference sequence (refseq). Outgroup, sequence G37. +, a homoplasic mutation.
FIG. 3.—Reduced median network of nonsynonymous mutations in Asian haplogroup clusters. The mutations are shown as amino acid changes relative to the MITOMAP reference sequence (refseq). Outgroup, sequence G37. Squares, links to the networks of other haplogroup clusters. @, a back mutation; +, a homoplasic mutation; CM, cardiomyopathy. The weights of all the characters in the analysis were equal. Some branch lengths have been distorted to increase legibility. Sequence identifiers are shown inside the nodes. F, Finnish sequences; M, MitoKor sequences. The origin of each GenBank sequence, denoted with the letter G, is given next to the sequence identifier. PNG, Papua-New Guinea.
Eight sequences (F45, F46, F47, F48, F49, G46, M6, and M426) harbored 7444G>A in the stop codon of MTCO1, leading to the translation of KQK. This change has been suggested to increase the penetrance of primary mutations in Leber's hereditary optic neuropathy (Brown et al. 1995). A single-nucleotide deletion, 6577delG, in the middle of MTCO1 in G36 led to G225E. It then caused premature termination of translation after a shifted reading frame of 28 amino acids (EETPFYTNTYSDFSVTLKFMFLSYQASE). The sequence G36 also harbored 12192G>A, which has been reported to be associated with cardiomyopathy (Shin et al. 2000; MIM 590040), although this variant is a polymorphism in the Finnish population (Finnilä, Lehtonen, and Majamaa 2001). Assuming that the frameshift mutation 6577delG (G225fsX28) is not an error in the published sequence, the mutation might provide an alternative explanation for cardiomyopathy in G36.

FIG. 4.—Reduced median network of nonsynonymous mutations in the African haplogroup cluster. See the legend of figure 3 for explanation of symbols.
Reduced-Median Networks of Nonsynonymous Mutations
Reduced-median networks of Asian and African haplogroups and the European haplogroup clusters IWX, KU, JT, and HV were constructed using information on all nonsynonymous mutations in the 840 sequences, with the African sequence G37 as an outgroup. The African, Asian, and European major haplogroups were found to be closely related in their amino acid sequences (fig. 2). The center of the Asian network consisted of a reticulation formed by MTATP6:A20T, MTCYB:S172N, and MTATP6:T59A (fig. 3). The central node of haplogroup L (fig. 4) and the common root of haplogroups D, E, and M were found to belong to this reticulation and had an identical amino acid sequence. Only two amino acid changes separated haplogroups C and Z from L. The central nodes of the European haplogroup clusters IWX (fig. 5) and KU (fig. 6) and of the Asian haplogroup B1 had an identical amino acid sequence, which was separated from haplogroup L by MTATP6:T59A and MTND3:T114A. Additional amino acid replacements separated the other European haplogroup clusters JT (fig. 7) and HV (fig. 8) and the Asian haplogroups A and B2 from this node. The major haplogroups in all the ethnic groups were clearly discernible. Amino acid sequences formed highly starlike phylogenies with major center nodes in all the haplogroup clusters. Thirteen of the 20 amino acid replacements that distinguished the major haplogroups were homoplasic (fig. 2), and 18 of the 20 were conservative. MTND4:P140S and MTCYB:T7I were both homoplasic and nonconservative.

FIG. 5.—Reduced median network of nonsynonymous mutations in the European haplogroup cluster IWX. See the legend of figure 3 for explanation of symbols.
Characterization of Amino Acid Replacements
Almost half of the amino acid replacements (48%) involved a change in polarity, and hydropathy was changed in 44% of the replacements. Only four replacements (MTND1:L289Q, MTATP6:V21E, MTND5:Q546L, and MTND5:L555Q) involved changes between the seven most hydrophilic amino acids and the seven most hydrophobic ones. Such a change was defined as one from a Kyte-Doolittle index of −1.7 or lower to +1.7 or higher, or vice versa. Changes in polarity and hydropathy were followed in decreasing frequency by changes in aliphaticity (32%), size (26%), aromaticity (13%), and charge (8.3%). Of the amino acid replacements, 133 (28%) were nonconservative according to the BLOSUM62 matrix (table 3).

The distribution of amino acid replacements among the 13 protein-coding genes suggested that the mutations were not distributed randomly across or between genes (fig. 9). The mutations were quite evenly distributed in MTATP6, MTATP8, MTCO3, MTND3, MTND4L, and MTND6. Each of the remaining genes, by contrast, had at least one region that appeared relatively conserved compared with the other regions of the gene. Apparent mutational hotspots, or nonconstrained regions, were identified in both hydrophobic and hydrophilic regions. In MTND6, an excess of the amino acid replacements were private (22/29, 76%; P = 0.03, Fisher's exact test), but no comparable deviations from the expected proportion of 56% were identified among the other genes.

FIG. 6.—Reduced median network of nonsynonymous mutations in the European haplogroup cluster KU. See the legend of figure 3 for explanation of symbols.

FIG. 7.—Reduced median network of nonsynonymous mutations in the European haplogroup cluster JT. See the legend of figure 3 for explanation of symbols.
Contingency Table Analysis
Of the replacements, 261 (56%) were private, whereas 207 replacements (44%) were present in more than one sequence. Nonconservative changes were more common among the private replacements than among the nonprivate ones (P = 0.005, Fisher's exact test). Changes in size, charge, aliphaticity, and aromaticity were also more common among the private replacements than among the nonprivate ones, but these differences were not significant (table 4).

Of the 468 amino acid replacements, 116 (25%) were homoplasic, indicating that they had arisen multiple times during human evolution. Nonconservative changes were less common than expected among the homoplasic replacements (P = 0.002). A change from an aliphatic to a nonaliphatic amino acid or vice versa occurred in 25 homoplasic replacements (22%) and in 126 (36%) of the non-homoplasic ones (P = 0.004). An aromatic amino acid was replaced by a nonaromatic one or vice versa in 8 homoplasic replacements (7%) and in 51 (14%) of the non-homoplasic ones (P = 0.04). Replacements between small and large amino acids were also less common in the homoplasic group (P = 0.04). The other types of changes did not differ in frequency between the homoplasic and non-homoplasic replacements (table 4).

FIG. 8.—Reduced median network of nonsynonymous mutations in the European haplogroup cluster HV. See the legend of figure 3 for explanation of symbols. Inset, additional nodes with private amino acid changes that connect only to the center of the network ("HV").

Table 3
Properties of the 468 Amino Acid Replacements Detected in the 840 mtDNA Sequences
Category of change | N(a) | Direction of change(b) | N(a)
Polarity | 224 (.48) | Polar → nonpolar | 104 (.22)
 | | Nonpolar → polar | 120 (.26)
Size | 123 (.26) | Small → large | 53 (.11)
 | | Large → small | 70 (.15)
Hydropathy | 207 (.44) | Hydrophobic → hydrophilic | 112 (.24)
 | | Hydrophilic → hydrophobic | 95 (.20)
Charge | 39 (.08) | Neutral → positive | 10 (.02)
 | | Neutral → negative | 13 (.03)
 | | Positive → neutral | 8 (.02)
 | | Positive → negative | 0 (0)
 | | Negative → neutral | 6 (.01)
 | | Negative → positive | 2 (.004)
Aliphaticity | 151 (.32) | Aliphatic → nonaliphatic | 82 (.18)
 | | Nonaliphatic → aliphatic | 69 (.15)
Aromaticity | 59 (.13) | Aromatic → nonaromatic | 36 (.08)
 | | Nonaromatic → aromatic | 23 (.05)
Nonconservative(c) | 133 (.28) | |
Private(d) | 261 (.56) | |
Homoplasic(e) | 116 (.25) | |
Hydrophobic location(f) | 239 (.51) | |
Hydrophilic location(g) | 192 (.41) | |
(a) Number of mutations in the category. The proportion of the total number of mutations is shown in parentheses.
(b) Direction is shown relative to the reference sequence.
(c) Mutation with a negative value in the BLOSUM62 matrix.
(d) Mutation observed in only one sequence.
(e) Mutation observed in ≥2 lineages.
(f) Average hydropathy index of 19 neighboring amino acids is higher than the mean for the respective gene.
(g) Average hydropathy index is lower than the mean of the respective gene.
The mean hydropathy indices were 1.006 for
MTATP6, –0.401 for MTATP8, 0.725 for MTCO1, 0.432
for MTCO2, 0.411 for MTCO3, 0.673 for MTCYB, 0.662
for MTND1, 0.596 for MTND2, 1.075 for MTND3, 0.705
for MTND4, 1.376 for MTND4L, 0.563 for MTND5, and
1.036 for MTND6. The average hydropathy calculated for 19 neighboring amino acids was not defined for 37 amino acid replacements that were near either end of the subunit. Of the remaining 431 replacements, 239 (55%) were among the 1,843 positions located in regions that were more hydrophobic than the mean, whereas 192 (45%) were among the 1,712 positions located in the hydrophilic regions. Compared with replacements in hydrophilic regions, those in hydrophobic regions altered the amino acid charge less often and were more often conservative. Replacements between aliphatic and nonaliphatic amino acids, however, were more frequent in hydrophobic regions than in hydrophilic regions (table 4). Amino acid content also differed between the two kinds of region. In the reference sequence, 103/381 (27%) of the charged amino acids (D, E, H, K, R) and 697/1,065 (65%) of the aliphatic amino acids (I, L, V) were located in hydrophobic regions of genes.
Rate of Detection of New Mutations in 647 European
Sequences
Because private replacements were common among
the 840 sequences, we set out to estimate the total number
of nonsynonymous mutations that may be present in the
population. The rate of detection of new mutations was
calculated from 500 permutations of the 647 European
sequences harboring 301 distinct nonsynonymous mutations. The Weibull growth curve provided the best fit with
the mean of the cumulative sums (fig. 10). The asymptotic
maximum of the number of nonsynonymous mutations in
European mtDNA was estimated to be 1,081 (standard
error 7.3). The 301 mutations detected in 647 European
sequences therefore encompass approximately 28% of all
nonsynonymous mutations that may be present in
European populations. Assuming that mutation identification continues to follow the estimated model, 12,200
sequences will be required to identify 90% of the 1,081
mutations and 18,100 sequences to identify 95%. Similar
predictions for non-European sequences were not feasible
because of the small number of Asian and African sequences available.
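The permutation procedure behind the discovery curve can be sketched as follows. Each sequence is represented by the set of nonsynonymous mutations it carries, an assumed data layout. For every random ordering, the sketch counts how many distinct mutations have been seen after 1, 2, ..., n sequences and then averages the counts across orderings. The source reports 500 permutations of 647 sequences; the function name and toy data are hypothetical.

```python
import random


def discovery_curve(samples, n_perm=500, seed=0):
    """Mean cumulative number of distinct mutations versus sequences examined.

    samples: list of sets, one set of mutation identifiers per sequence.
    For each random ordering of the sequences, record how many distinct
    mutations have been seen after each sequence, then average over orderings.
    """
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    order = list(range(n))
    for _ in range(n_perm):
        rng.shuffle(order)
        seen = set()
        for k, idx in enumerate(order):
            seen |= samples[idx]
            totals[k] += len(seen)
    return [t / n_perm for t in totals]


# Hypothetical toy data: four sequences and the mutations each carries.
toy = [{"m1", "m2"}, {"m1"}, {"m3"}, set()]
curve = discovery_curve(toy, n_perm=500)
print(curve[-1])  # 3.0, since every ordering ends with all distinct mutations seen
```

Only the early part of the curve depends on sequence order. The last point always equals the total number of distinct mutations, which is why an extrapolating model such as the Weibull curve is needed to estimate the unseen remainder.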
Discussion
We found 1,459 distinct mutations in the protein-coding genes of 840 complete human mtDNA coding-region sequences, when the mutations were counted as
differences relative to the reference sequence. One-third of
the mutations were nonsynonymous. The frequency of
changes in the physicochemical properties of the respective amino acids was high, suggesting that such changes
are quite common in human mtDNA and that evaluation
of the pathogenicity of an amino acid replacement should
not rely solely on these structural considerations.
FIG. 9. Distribution of amino acid replacements and hydropathic regions in the 13 mtDNA-encoded proteins. The x-axis shows the amino acid
position, and the y-axis shows a common scale for hydropathy and amino acid dissimilarity. Curve, the average Kyte-Doolittle hydropathy index for 19
neighboring amino acids; positive values indicate hydrophobic regions. ×, private replacement; +, homoplasic replacement; ○, other replacement.
Negative values for amino acid replacements indicate nonconservative changes and positive values indicate conservative changes according to the
BLOSUM62 matrix. Histogram, the number of distinct amino acid changes within a window of 50 amino acid positions, plotted at the median position
of the window. One unit on the y-axis scale corresponds to 10 amino acid changes.
The differences between the frequencies of the
particular types of changes are inherent consequences of
differences in the frequencies of individual amino acid
replacements (fig. 1), which in turn depend on several
factors, including sequence composition (Naylor, Collins,
and Brown 1995), variable substitution rates and selective
constraints among sites and substitutions (Xia 1998;
Tourasse and Li 2000; McClellan and McCracken 2001),
and the tendency of the genetic code to prefer substitutions
between similar amino acids over dissimilar ones (Haig
and Hurst 1991). The mitochondrial genome differs from
nuclear genes in several properties, including amino acid
composition (Naylor, Collins, and Brown 1995) and
genetic code (Barrell, Bankier, and Drouin 1979; Knight,
Landweber, and Yarus 2001). The proportion of nonconservative amino acid replacements out of all replacements (28.4%) was nevertheless not appreciably different
from that in 106 nuclear genes (Cargill et al. 1999), where
36% were nonconservative (odds ratio 1.4, 95% confidence interval 0.95–2.03, P = 0.07; Fisher’s exact test).
Table 4
Comparisons of Categories of the 468 Amino Acid Replacements

Private (N = 261) vs. nonprivate (N = 207) replacements
Category of Change (a)   Private (b)   Nonprivate (b)   OR (c)   95% CI (d)   P Value (e)
Polarity                 122 (.47)     102 (.49)        0.90     0.62–1.32    0.64
Size                      74 (.28)      49 (.24)        1.28     0.82–1.99    0.29
Hydropathy               110 (.42)      97 (.47)        0.83     0.56–1.21    0.35
Charge                    25 (.10)      14 (.07)        1.46     0.71–3.13    0.31
Aliphaticity              93 (.36)      58 (.28)        1.42     0.94–2.16    0.09
Aromaticity               37 (.14)      22 (.11)        1.39     0.77–2.56    0.27
Nonconservative           88 (.34)      45 (.22)        1.83     1.18–2.85    0.005*
Hydrophobic location     141 (.54)      98 (.47)        1.24     0.83–1.86    0.28

Homoplasic (N = 116) vs. nonhomoplasic (N = 352) replacements
Category of Change (a)   Homoplasic (b)   Nonhomoplasic (b)   OR (c)   95% CI (d)   P Value (e)
Polarity                  61 (.53)        163 (.46)           1.29     0.83–2.00    0.28
Size                      22 (.19)        101 (.29)           0.58     0.33–1.00    0.04*
Hydropathy                57 (.49)        150 (.43)           1.30     0.83–2.03    0.24
Charge                    11 (.09)         28 (.08)           1.21     0.53–2.62    0.57
Aliphaticity              25 (.22)        126 (.36)           0.49     0.29–0.82    0.004*
Aromaticity                8 (.07)         51 (.14)           0.44     0.17–0.97    0.04*
Nonconservative           20 (.17)        113 (.32)           0.44     0.25–0.76    0.002**
Hydrophobic location      51 (.44)        188 (.53)           0.73     0.46–1.17    0.17

Hydrophobic location (N = 239) vs. hydrophilic location (N = 192)
Category of Change (a)   Hydrophobic (b)   Hydrophilic (b)   OR (c)   95% CI (d)   P Value (e)
Polarity                 117 (.49)          90 (.47)         1.09     0.73–1.62    0.70
Size                      64 (.27)          50 (.26)         1.04     0.66–1.64    0.91
Hydropathy               110 (.46)          84 (.44)         1.10     0.73–1.64    0.70
Charge                     8 (.03)          26 (.14)         0.22     0.08–0.52    0.0001**
Aliphaticity              94 (.39)          46 (.24)         2.05     1.32–3.21    0.0009**
Aromaticity               29 (.12)          27 (.14)         0.84     0.46–1.55    0.57
Nonconservative           56 (.23)          64 (.33)         0.61     0.39–0.96    0.02*

(a) See the footnote to table 3 for explanation of categories.
(b) Number of amino acid replacements of the respective type. Proportions are shown in parentheses.
(c) Odds ratio.
(d) 95% confidence interval for the odds ratio.
(e) P value for the null hypothesis that the OR is 1 (Fisher’s exact test).
* P < 0.05. ** P < 0.00223, which corresponds to the 95% significance level adjusted for multiple comparisons.
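The footnote to Table 4 gives an adjusted threshold of 0.00223 without deriving it. That value is reproduced by a Šidák correction, 1 − (1 − 0.05)^(1/m), with m = 23, the number of P values in the table (8 + 8 + 7). This is our inference, not the authors' stated method. A plain Bonferroni correction over the same 23 tests would give a slightly stricter 0.00217. The function names below are hypothetical.

```python
def sidak_threshold(alpha, m):
    """Per-test level keeping the family-wise error rate at alpha over m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)


def bonferroni_threshold(alpha, m):
    """Per-test level from the (more conservative) Bonferroni inequality."""
    return alpha / m


m = 8 + 8 + 7  # P values reported in the three comparisons of Table 4
print(round(sidak_threshold(0.05, m), 5))       # 0.00223
print(round(bonferroni_threshold(0.05, m), 5))  # 0.00217
```

Both corrections give essentially the same result at this scale. Under either threshold, the three P values marked ** in Table 4 (0.002, 0.0001, and 0.0009) remain significant, and 0.004 does not.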
The reduced-median networks of the nonsynonymous
mutations provided a comprehensive description of the
intraspecies protein-level phylogeny in humans. The
phylogenetic signal of synonymous mutations was lost,
because only the nonsynonymous mutations were considered, but the various haplogroups were still discernible.
Disregarding synonymous mutations may even improve
the accuracy of a phylogenetic network (Naylor and
Brown 1997). Many branches in the full networks (Finnilä,
Lehtonen, and Majamaa 2001; Herrnstadt et al. 2002a)
contain at least one nonsynonymous mutation, and these
branches were also clearly resolved in the present networks.
Exceptions to this pattern included the root of haplogroups
H and V, which was a single node, because all the
nucleotide differences between these haplogroups were
synonymous. Furthermore, the central nodes of several
major haplogroups (U2 and B1; L and the root of D, E, and
M) had identical amino acid sequences.
The major haplogroups were found to be closely
related in their amino acid sequences, with relatively few
replacements separating their center nodes, but the
variation within haplogroups was high, resulting in starlike
phylogenies. More than half of the observed amino acid
changes were present in only one sequence, giving rise to
rare amino acid haplotypes. This finding is analogous to
earlier observations of an excess of nonsynonymous mutations within species, as compared with variation between
species (Nachman et al. 1996; Rand and Kann 1996;
Hasegawa, Cao, and Yang 1998; Nachman 1998; Fry
1999). This is usually assumed to result from purifying
selection against slightly deleterious alleles, which prevents their fixation. Such mildly deleterious mutations
should reside in the periphery of phylogenetic networks.
This hypothesis was supported by the present comparison
of private replacements and nonprivate ones, which
revealed that nonconservative changes are more frequent
among the private replacements.
The frequency of homoplasic mutations in human
mtDNA has been found to be high (Finnilä, Lehtonen, and
Majamaa 2001; Herrnstadt et al. 2002a). We found here
that homoplasy among nonsynonymous mutations is also
common, as one-fourth of all amino acid replacements
were homoplasic. Interestingly, the homoplasic replacements were less often nonconservative and less often involved changes between
small and large, aliphatic and nonaliphatic, or aromatic and nonaromatic
amino acids. This observation suggests that physicochemical properties determine, at least in part, whether amino
acid replacements are removed by selection or whether
they persist long enough to be observed in separate
lineages in the phylogeny, that is, whether they become
homoplasic. Homoplasic replacements are therefore not
confined exclusively to nonconstrained amino acid positions. Most ancient amino acid replacements distinguishing
the major haplogroups were also observed in other parts of
the phylogeny, and all but two were conservative,
which is consistent with their neutrality.
FIG. 10. Identification of new nonsynonymous mutations in 647
European sequences. The sequence order (Index) was permuted 500 times, and
the cumulative sum of mutations not observed in previous sequences was obtained for
each permutation. Solid curve, the mean of the 500
cumulative sum curves. The largest and smallest numbers of nonsynonymous
mutations observed at the corresponding index in any permutation are
shown above and below the mean curve. Dashed curve, the Weibull
growth curve y = a − b exp[−exp(d) x^e] fitted to the mean curve by
nonlinear least squares using all 647 data points. The fitted
curve, with parameters a = 1080.84 (SE 7.3), b = 1080.31 (SE 7.3), d =
−5.435 (SE 0.0033), and e = 0.6664 (SE 0.00075), is superimposed almost
perfectly on the mean curve (residual sum of squares = 16.86). a indicates
the asymptotic maximum of the Weibull growth curve.
Our findings support the assumption that amino acid
replacements resulting in dissimilar amino acid properties
are generally more deleterious than replacements resulting
in similar properties. However, the effects of nonsynonymous mutations depend also on the position of the amino
acid replacement in the protein sequence. Nonsynonymous
mutations were found to occupy both hydrophobic and
hydrophilic regions of genes, when the regions were defined according to the average hydropathy for the
respective gene. Mutations in hydrophobic regions involved fewer changes in charge and more changes in
aliphaticity than expected and were less often nonconservative than mutations in hydrophilic regions, but such
differences are confounded by the differences in the amino
acid composition of the respective regions. Even if it is
accepted that the hydrophobic regions may be generally
more conserved than hydrophilic regions (Naylor, Collins,
and Brown 1995), the distribution of amino acid changes
among genes (fig. 9) suggested that not all hydrophobic
regions are alike. Several amino acid replacements were
identified in the fifth, eleventh, and twelfth hydrophobic
domains of MTCO1, for example, but none were identified
in the seventh or eighth.
Although it may eventually be possible to determine
the degree and nature of the constraints on each region,
and perhaps even on each position in mtDNA, the distribution of nonsynonymous mutations along the genes is
still relatively sparse, suggesting that even larger numbers
of sequences and polymorphisms will be required for
detailed identification and characterization of functionally
constrained and nonconstrained regions in human mtDNA.
The cumulative rate of detection of new nonsynonymous
mutations in European sequences was found to follow the
Weibull growth curve model, and the estimated parameters
suggest that approximately 19 times the current number of European mtDNA
sequences will be required to identify 90% of the nonsynonymous mutations that may be present in European
populations.
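The sample-size projections can be checked with a short calculation that uses the fitted parameters from the Fig. 10 caption. Setting y = f·a in y = a − b exp[−exp(d) x^e] and solving for x gives x = {ln[b / (a(1 − f))] / exp(d)}^(1/e). This closed-form inversion is our own algebra, not a procedure described by the authors, and the function names are hypothetical.

```python
import math

# Fitted Weibull growth curve y = a - b * exp(-exp(d) * x**e) (parameters from Fig. 10).
A, B, D, E = 1080.84, 1080.31, -5.435, 0.6664


def expected_mutations(x):
    """Expected number of distinct nonsynonymous mutations after x sequences."""
    return A - B * math.exp(-math.exp(D) * x ** E)


def sequences_needed(fraction):
    """Invert the curve: sequences needed to detect fraction * A mutations."""
    undiscovered = A * (1 - fraction)  # mutations still expected to be missing
    return (math.log(B / undiscovered) / math.exp(D)) ** (1 / E)


print(round(expected_mutations(647)))     # 301
print(round(sequences_needed(0.90), -2))  # 12200.0
print(round(sequences_needed(0.95), -2))  # 18100.0
```

The fitted curve returns about 301 mutations at 647 sequences, matching the observed count. It also reproduces the projected 12,200 and 18,100 sequences, and 12,200 / 647 gives the roughly 19-fold increase mentioned above. These figures rely on the assumption that future sampling continues to follow the same curve.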
In conclusion, the results of this descriptive analysis
of 471 nonsynonymous mutations showed that nonconservative changes were more common among private replacements and nonhomoplasic replacements than among
nonprivate and homoplasic ones, and that a similar trend
was evident in certain physicochemical characteristics of
replacements, suggesting a role for selection against these
in the evolution of the protein-coding genes of mtDNA.
Selection presumably varies between genes, functional
domains, and sites, however, and even more sequences
will be required for reliable mapping of constrained and
nonconstrained regions. Assessment of the pathogenicity
of an amino acid change should not rely on any single
structural consideration, because changes in physicochemical properties such as hydropathy, size, charge, and
polarity are common in human mtDNA-encoded proteins.
The entire mtDNA genome should be screened to
exclude other mutations when a particular variant is
suspected of being pathogenic, and a population-genetic
approach should be adopted to recognize neutral variants
that are present in populations. The reduced-median networks and the tabulation of physicochemical properties of
amino acid changes presented here should therefore also
have practical applications.
Supplementary Material
The complete table of nonsynonymous mutations in
the 840 sequences, their amino acid translations, and their
physicochemical properties is provided as online Supplementary Material. Links to updated versions of the table
may appear at http://cc.oulu.fi/~jukkamoi/mtres/.
Acknowledgments
This work was supported by grants from the Sigrid
Juselius Foundation, the Maud Kuistila Memorial Foundation, and the Research Council for Health, Academy of
Finland.
Literature Cited
Alff-Steinberger, C. 1969. The genetic code and error transmission. Proc. Natl. Acad. Sci. USA 64:584–591.
Anderson, S., A. T. Bankier, B. G. Barrell et al. (14 co-authors).
1981. Sequence and organization of the human mitochondrial
genome. Nature 290:457–465.
Andrews, R. M., I. Kubacka, P. F. Chinnery, R. N. Lightowlers,
D. M. Turnbull, and N. Howell. 1999. Reanalysis and revision
of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23:147.
Arnason, U., X. Xu, and A. Gullberg. 1996. Comparison between
the complete mitochondrial DNA sequences of Homo and the
common chimpanzee based on nonchimeric sequences. J. Mol.
Evol. 42:145–152.
Bandelt, H. J., P. Forster, B. C. Sykes, and M. B. Richards. 1995.
Mitochondrial portraits of human populations using median
networks. Genetics 141:743–753.
Bandelt, H. J., V. Macaulay, and M. Richards. 2000. Median
networks: speedy construction and greedy reduction, one
simulation, and two case studies from human mtDNA. Mol.
Phylogenet. Evol. 16:8–28.
Barrell, B. G., A. T. Bankier, and J. Drouin. 1979. A different
genetic code in human mitochondria. Nature 282:189–194.
Brown, M. D., A. Torroni, C. L. Reckord, and D. C. Wallace.
1995. Phylogenetic analysis of Leber’s hereditary optic
neuropathy mitochondrial DNA’s indicates multiple independent occurrences of the common mutations. Hum. Mutat. 6:
311–325.
Campos, Y., M. A. Martin, J. C. Rubio, M. C. Gutierrez del
Olmo, A. Cabello, and J. Arenas. 1997. Bilateral striatal
necrosis and MELAS associated with a new T3308C mutation
in the mitochondrial ND1 gene. Biochem. Biophys. Res.
Commun. 238:323–325.
Cargill, M., D. Altshuler, J. Ireland et al. (17 co-authors). 1999.
Characterization of single-nucleotide polymorphisms in
coding regions of human genes. Nat. Genet. 22:231–238.
Chinnery, P. F., G. A. Taylor, N. Howell, R. M. Andrews, C. M.
Morris, R. W. Taylor, I. G. McKeith, R. H. Perry, J. A.
Edwardson, and D. M. Turnbull. 2000. Mitochondrial DNA
haplogroups and susceptibility to AD and dementia with
Lewy bodies. Neurology 55:302–304.
De Benedictis, G., G. Rose, G. Carrieri et al. (13 co-authors).
1999. Mitochondrial DNA inherited variants are associated
with successful aging and longevity in humans. FASEB J.
13:1532–1536.
Elson, J. L., R. M. Andrews, P. F. Chinnery, R. N. Lightowlers,
D. M. Turnbull, and N. Howell. 2001. Analysis of European mtDNAs for recombination. Am. J. Hum. Genet. 68:
145–153.
Fernandez-Moreno, M. A., B. Bornstein, Y. Campos, J. Arenas,
and R. Garesse. 2000. The pathogenic role of point mutations
affecting the translational initiation codon of mitochondrial
genes. Mol. Genet. Metab. 70:238–240.
Finnilä, S., I. E. Hassinen, L. Ala-Kokko, and K. Majamaa. 2000.
Phylogenetic network of the mtDNA haplogroup U in
Northern Finland based on sequence analysis of the complete
coding region by conformation-sensitive gel electrophoresis.
Am. J. Hum. Genet. 66:1017–1026.
Finnilä, S., M. S. Lehtonen, and K. Majamaa. 2001. Phylogenetic
network for European mtDNA. Am. J. Hum. Genet. 68:1475–
1484.
Fry, A. J. 1999. Mildly deleterious mutations in avian mitochondrial DNA: evidence from neutrality tests. Evolution 53:
1617–1620.
Grantham, R. 1974. Amino acid difference formula to help explain protein evolution. Science 185:862–864.
Haig, D., and L. D. Hurst. 1991. A quantitative measure of
error minimization in the genetic code. J. Mol. Evol. 33:412–
417.
Hasegawa, M., Y. Cao, and Z. Yang. 1998. Preponderance of
slightly deleterious polymorphisms in mitochondrial DNA:
nonsynonymous/synonymous rate ratio is much higher within
species than between species. Mol. Biol. Evol. 15:1499–1505.
Hedges, S. B. 2000. A start for population genomics. Nature
408:652–653.
Henikoff, S., and J. G. Henikoff. 1992. Amino acid substitution
matrices from protein blocks. Proc. Natl. Acad. Sci. USA
89:10915–10919.
Herrnstadt, C., J. L. Elson, E. Fahy et al. (11 co-authors). 2002a.
Reduced-median-network analysis of complete mitochondrial
DNA coding-region sequences for the major African, Asian,
and European haplogroups. Am. J. Hum. Genet. 70:1152–
1171.
———. 2002b. Reduced-median-network analysis of complete
mitochondrial DNA coding-region sequences for the major
African, Asian, and European haplogroups [erratum]. Am. J.
Hum. Genet. 71:448–449.
Herrnstadt, C., G. Preston, R. Andrews, P. Chinnery, R. N.
Lightowlers, D. M. Turnbull, I. Kubacka, and N. Howell.
2002c. A high frequency of mtDNA polymorphisms in HeLa
cell sublines. Mutat. Res. 501:19–28.
Horai, S., K. Hayasaka, R. Kondo, K. Tsugane, and N. Takahata.
1995. Recent African origin of modern humans revealed by
complete sequences of hominoid mitochondrial DNAs. Proc.
Natl. Acad. Sci. USA 92:532–536.
Ihaka, R., and R. Gentleman. 1996. R: a language for data
analysis and graphics. J. Comp. Graph. Stat. 5:299–314.
Ingman, M., H. Kaessmann, S. Pääbo, and U. Gyllensten. 2000.
Mitochondrial genome variation and the origin of modern
humans. Nature 408:708–713.
Kimura, M. 1968. Evolutionary rate at the molecular level.
Nature 217:624–626.
Knight, R. D., L. F. Landweber, and M. Yarus. 2001. How
mitochondria redefine the code. J. Mol. Evol. 53:299–313.
Kyte, J., and R. F. Doolittle. 1982. A simple method for
displaying the hydropathic character of a protein. J. Mol. Biol.
157:105–132.
Maca-Meyer, N., A. M. Gonzáles, J. M. Larruga, C. Flores, and
V. M. Cabrera. 2001. Major genomic mitochondrial lineages
delineate early human expansions. BMC Genetics 2:13.
McClellan, D. A., and K. G. McCracken. 2001. Estimating the
influence of selection on the variable amino acid sites of the
cytochrome b protein functional domains. Mol. Biol. Evol.
18:917–925.
Nachman, M. W. 1998. Deleterious mutations in animal
mitochondrial DNA. Genetica 102–103:61–69.
Nachman, M. W., W. M. Brown, M. Stoneking, and C. F.
Aquadro. 1996. Nonneutral mitochondrial DNA variation in
humans and chimpanzees. Genetics 142:953–963.
Naylor, G. J., and W. M. Brown. 1997. Structural biology and
phylogenetic estimation. Nature 388:527–528.
Naylor, G. J., T. M. Collins, and W. M. Brown. 1995.
Hydrophobicity and phylogeny. Nature 373:565–566.
Ohta, T. 1992. The nearly neutral theory of molecular evolution.
Annu. Rev. Ecol. Syst. 23:263–286.
Parsons, T. J., D. S. Muniec, K. Sullivan et al. (11 co-authors).
1997. A high observed substitution rate in the human
mitochondrial DNA control region. Nat. Genet. 15:
363–368.
Rand, D. M., and L. M. Kann. 1996. Excess amino acid polymorphism in mitochondrial DNA: contrasts among genes
from Drosophila, mice, and humans. Mol. Biol. Evol. 13:735–
748.
Rice, P., I. Longden, and A. Bleasby. 2000. EMBOSS: the
European molecular biology open software suite. Trends
Genet. 16:276–277.
Rocha, H., C. Flores, Y. Campos, J. Arenas, L. Vilarinho, F. M.
Santorelli, and A. Torroni. 1999. About the “pathological”
role of the mtDNA T3308C mutation… Am. J. Hum. Genet.
65:1457–1459.
Ruiz-Pesini, E., A. C. Lapena, C. Diez-Sanchez et al. (11 co-authors). 2000. Human mtDNA haplogroups associated with
high or reduced spermatozoa motility. Am. J. Hum. Genet.
67:682–696.
Shin, W. S., M. Tanaka, J. Suzuki, C. Hemmi, and T. Toyo-oka.
2000. A novel homoplasmic mutation in mtDNA with a single
evolutionary origin as a risk factor for cardiomyopathy. Am. J.
Hum. Genet. 67:1617–1620.
Tourasse, N. J., and W. H. Li. 2000. Selective constraints, amino
acid composition, and the rate of protein evolution. Mol. Biol.
Evol. 17:656–664.
Wallace, D. C., M. D. Brown, and M. T. Lott. 1999. Mitochondrial DNA variation in human evolution and disease.
Gene 238:211–230.
Xia, X. 1998. The rate heterogeneity of nonsynonymous substitutions in mammalian mitochondrial genes. Mol. Biol. Evol.
15:336–344.
Xia, X., and W. H. Li. 1998. What amino acid properties affect
protein evolution? J. Mol. Evol. 47:557–564.
Wolfgang Stephan, Associate Editor
Accepted March 10, 2003