A Thesis
Presented To
The Faculty of Graduate Studies
The University of Guelph
in partial fulfilment of requirements
for the degree of
Master of Science
August, 1997
© Richard J. Barrette, 1997
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street
Ottawa ON K1A 0N4
The author has granted a non-exclusive licence allowing the
National Library of Canada to
reproduce, loan, distribute or sell
copies of this thesis in microform,
paper or electronic formats.
The author retains ownership of the
copyright in this thesis. Neither the
thesis nor substantial extracts from it
may be printed or otherwise
reproduced without the author's
permission.
Richard J. Barrette
University of Guelph, 1997
Professor T.J. Crease
An investigation of the mitochondrial DNA molecule of the pea aphid,
Acyrthosiphon pisum, was undertaken using PCR amplification, Southern blotting and
DNA sequence analysis. The results showed that pea aphid genome content and
organization is very similar to that of Drosophila yakuba, Anopheles quadrimaculatus and
Apis mellifera. Codon usage is highly biased in favour of codons rich in adenine and
thymine. Pea aphid mtDNA varies in size from 16.8 kb to 18.1 kb. These size fluctuations
occur in two regions of the molecule: the A+T-rich region, and in the vicinity of the ND4
and ND5 genes. The most probable mechanism for the A+T-rich region size variation
involves a 123 nucleotide tandem repeat. Analysis of the A+T-rich region also revealed the
presence of a thymine-rich hairpin-loop structure that is analogous to the mammalian
putative light-strand origin of replication.
I would like to take this opportunity to thank the individuals (Dr. Teri Crease, Dr.
Paul Hebert, Dr. Sara Via, Dr. Bob Foottit and Dr. Bob Forster) for supplying aphid
material, laboratory space and equipment, and the intellectual guidance needed to complete
this research project. I would also like to thank Dr. Elizabeth Boulding for her insightful
comments on the thesis. Mr. Eric Maw must also be thanked profusely for his continuous
help in both computer wizardry and in culturing bulk quantities of pea aphids used in
Most importantly, I must thank my family, friends and my wife P a e l l a, for their
patience and support throughout my education and career in science.
A special thanks must be said to Dr. R.T. M'Closkey and especially Dr. Paul
Hebert for taking a chance and giving this 3rd year undergraduate student the initial
opportunity at summer employment in science.
Acknowledgements
Table of contents
List of tables
List of figures
INTRODUCTION
Genome content and organization
Length variation and heteroplasmy
The pea aphid system
MATERIALS AND METHODS
Pea aphid sampling and mtDNA purification
PCR amplification and Gene Clean purification
Cloning
Sequencing
Sequence alignment and analysis
RESULTS
A+T richness and codon use
Cyt b and ND1 intergenic region
Transfer RNA genes
A+T rich region
DISCUSSION
Genome organization
Genetic code and codon usage
A+T rich region
Length variation and heteroplasmy
APPENDICES
Appendix 1a: The relationship between the number of cut sites in A. pisum mtDNA
and the A+T content of recognition sequences for f i restriction enzymes
Appendix 1b: Frequency distribution of length variants for region 1 and region 2 in
thirty-five clones of A. pisum
Appendix 1c: Distribution of size class co-occurrences in the 13 clones of A. pisum
from alfalfa with heteroplasmy at region 2
Appendix 1d: Analysis of the temporal stability of length variants for regions 1 & 2
in four clones of A. pisum from alfalfa analyzed in 1990 and again in 1992 after
approximately 30 generations of parthenogenetic reproduction
Appendix 1e: Restriction map and gene order in A. pisum
Appendix 2: Single-letter amino acid code designation used in codon usage Table 8
Appendix 3a: DNA sequence alignment used in the comparative analysis
of the 12S rRNA partial sequence for the four insects
Appendix 3b: DNA sequence alignment used in the comparative analysis
of the 16S rRNA partial sequence for the four insects
Appendix 3c: DNA sequence alignment used in the comparative analysis
of the COI gene sequence for the four insects
Appendix 3d: DNA sequence alignment used in the comparative analysis
of the ND4 gene sequence for the four insects
Appendix 4a: Possible mechanisms of recombination generating length variation in
cricket mtDNA
Appendix 4b: Replication slippage model for the creation of tandem duplications
Table 1.
Sequence and amplification conditions for the PCR primers used in both
DNA sequencing and cloning of pea aphid mtDNA
Table 2.
Comparison of DNA sequence and amino acid sequence for cytochrome
oxidase subunit I for Acyrthosiphon pisum and three other insects
Table 3.
Comparison of DNA sequence and amino acid sequence of cytochrome b
gene for Acyrthosiphon pisum and three other insects
Table 4.
Comparison of DNA sequence and amino acid sequence of NADH
dehydrogenase subunit I for Acyrthosiphon pisum and three other insects
Table 5.
Comparison of DNA sequence and amino acid sequence of NADH
dehydrogenase subunit IV for Acyrthosiphon pisum and three other insects
Table 6.
Comparison of DNA sequence of the 12S ribosomal gene for Acyrthosiphon
pisum and three other insects
Table 7.
Comparison of the DNA sequence of the 16S ribosomal gene from
Acyrthosiphon pisum and three other insects
Table 8.
Codon usage table of the four partially sequenced protein coding genes of
Acyrthosiphon pisum
Table 9.
Base composition of the codons used in the four partially sequenced protein
coding genes for A. pisum
Table 10.
Comparison of the length and base composition of the A+T rich region for
A. pisum and three other insects
Figure 1.
Comparison of the protein and DNA sequence of the 3' ends of the Cyt b
gene, the ND1 gene and the intergenic space between these two coding
sequences
Figure 2.
Predicted secondary structures of the four tRNA genes identified in the pea
aphid
Figure 3.
DNA sequence and location of notable features of both the large and small
size class of cloned PCR products from pea aphid mtDNA
Figure 4.
Predicted secondary structure of the primary hairpin-loop identified in the
A+T rich region of pea aphid
Figure 5.
Alignment of the repetitive units from small and large size variants
Figure 6.
Predicted secondary structures for the repetitive units
Mitochondria play a central role in energy metabolism by generating ATP in the cell
through oxidative phosphorylation. Oxidative phosphorylation is a complex biochemical
mechanism that involves the conversion of potential energy from electron gradients into
chemical energy. Each mitochondrion contains its own genetic code. However, the
mitochondria do not encode all of the proteins needed for organellar function. Sixty
nuclear-encoded gene products are transported into the mitochondrion (Heddi et al., 1994)
where they interact with those polypeptides coded by the mitochondrial genome and
assemble into the five oxidative phosphorylation complexes: I to IV of the electron
transport chain and complex V, the ATP synthase complex (Chomyn & Attardi, 1987).
Genome content and organization
With the single exception of species in the cnidarian genus Hydra, where the
mitochondrial genome occurs as two unique linear molecules (Warrior & Gall, 1985), all
metazoan animal mitochondrial DNA (mtDNA) exists as covalently closed circular duplex
molecules present in high copy number (10^3 to 10^4 mtDNA molecules per somatic cell)
(Clayton, 1982; Brown, 1985). All animal cells examined to date maintain a significant
proportion of their mtDNA in either the form of catenanes, in which monomer circles are
intermeshed like links in a chain, or as simple head-to-tail unicircular dimers (Clayton,
1982). Evidence for recombination between molecules has been lacking (discussed in
Brown, 1985; Moritz et al., 1987). However, the minicircle end-products of mtDNA
recombination have now been detected in the mitochondrial genome of the phytonematode
Meloidogyne javanica (Lunt and Hyman, 1997). Although it was originally believed that
mtDNA was strictly maternally inherited (Dawid & Blackler, 1972; Hutchison et al., 1974;
Reilly & Thomas, 1980), recent studies on the marine mussel Mytilus (Fisher and
Skibinski, 1990; Hoeh et al., 1991) and mouse (Gyllensten et al., 1991) have
demonstrated some paternal involvement in mtDNA inheritance.
An underlying feature of the mitochondrial genome is its extremely compact and
highly "economic" genetic organization (Attardi, 1985). Apart from the putative regulatory
region(s), the mitochondrial genome is saturated with sequences encoding discrete gene
products. Coding sequences are either directly abutting or separated by few intergenic
sequences, and in some situations they overlap. Each metazoan mtDNA contains the genes
for the structural RNAs of the mitochondrion's own protein translation machinery (2
ribosomal RNAs [rRNA], 22 transfer RNAs [tRNA]), and 13 proteins. These proteins are
all components of enzyme complexes associated with the inner mitochondrial membrane:
cytochrome b (Cyt b), subunits I-III of cytochrome c oxidase (COI-III), subunits 6 and 8
of the F0 ATPase complex (ATPase6 and ATPase8), and subunits 1-6 and 4L of the
respiratory chain NADH dehydrogenase (ND1-6 and 4L) (Chomyn and Attardi, 1987). Of
the metazoan mitochondrial genomes sequenced to date there are only a few examples of
mtDNAs that are lacking the complete complement of coding genes. The mtDNAs of the
nematodes Caenorhabditis elegans and Ascaris suum (Okiomoto et al., 1992) and the
marine mussel Mytilus edulis (Hoffman and Brown, 1992) no longer contain the ATPase 8
structural gene. The sea anemone Metridium senile mtDNA codes for only 2 tRNAs of the
22 tRNAs required for decoding the mitochondrial genetic code: one for tryptophan and
the other for f-methionine. This represents the only known example in which genes coding
for tRNAs have been exported from the mitochondrial genome (Wolstenholme, 1992).
Intragenic sequences are absent in all other animal mtDNA. However, M. senile is again
distinct as its COI and ND5 genes contain group I introns. The COI intron is postulated to
encode an RNA splicase. Moreover, the ND5 gene intron contains the only copies of the
ND1 and ND3 genes (Wolstenholme, 1992).
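
The gene complement listed above can be tallied as a quick consistency check. The following is a minimal sketch: the dictionary layout and variable names are assumptions introduced here for illustration, while the counts (2 rRNAs, 22 tRNAs, 13 proteins) are those given in the text.

```python
# Typical metazoan mtDNA gene content as listed in the text.
# The grouping and the variable names are illustrative, not from the thesis.
protein_genes = {
    "cytochrome b": ["Cyt b"],
    "cytochrome c oxidase": ["COI", "COII", "COIII"],
    "ATPase": ["ATPase6", "ATPase8"],
    "NADH dehydrogenase": ["ND1", "ND2", "ND3", "ND4", "ND4L", "ND5", "ND6"],
}
rna_genes = {"rRNA": 2, "tRNA": 22}

# Count protein-coding genes across all complexes, then add the RNA genes.
n_proteins = sum(len(genes) for genes in protein_genes.values())
n_total = n_proteins + sum(rna_genes.values())

print(f"protein-coding genes: {n_proteins}")
print(f"total genes: {n_total}")
```

Removing ATPase8 from the list gives the reduced protein complement described for the nematode and Mytilus edulis genomes.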
Despite the complete transcription of both strands, the distributions of the structural
genes in vertebrates, some invertebrates and insects are considerably different. The
distribution of coding regions of vertebrates and various invertebrates, including sea
urchins, is highly asymmetrical: the ND6 gene and a few tRNAs are coded on the light strand
while the balance of the genes are all encoded on the heavy strand. The two strands,
designated as heavy (H) and light (L), were named according to physicochemical
experiments that measured the different buoyant densities of each strand in alkaline CsCl
density gradients based upon their base composition (Aloni and Attardi, 1971). A more
balanced gene distribution between the two strands is found in the insect genomes
sequenced thus far.
The component protein, rRNA and tRNA genes have identical gene arrangements
among such diverse vertebrates as fish, amphibians and mammals (Wolstenholme, 1992).
A similar arrangement is found in birds except that the segments coding for Cyt b,
tRNApro and tRNAthr and the ND6 and tRNAglu genes have been transposed relative to
each other (Desjardins and Morais, 1990; 1991). In contrast to the vertebrates,
invertebrates have undergone significant rearrangements in mtDNA gene order. Multiple
inversions and translocations involving numerous loci are evident when the mtDNA
genomes are compared between insects, sea urchins and nematodes (Clary and
Wolstenholme, 1985; Crozier and Crozier, 1993; Jacobs et al., 1988; Cantatore et al.,
1989; Okiomoto et al., 1992). Furthermore, among insects, the location and orientation of
protein and rRNA genes and the putative control region are the same. However, significant
variation has been observed in the positioning and orientation of certain tRNAs (Clary and
Wolstenholme, 1985; Crozier and Crozier, 1993; Mitchell et al., 1993; Beard et al., 1993).
The tRNA genes for all metazoan mtDNAs are interspersed throughout the mRNA and
rRNA coding sequences. The tRNA punctuation model (Ojala et al., 1980; 1981) proposes
that tRNAs serve as recognition sites for processing of polycistronic mRNA by
RNase-P-like enzymes that cleave the transcript at the junctions between the mRNAs,
rRNAs and the tRNAs.
The presence of one major noncoding segment in the mitochondrial genome is a
feature common to all metazoa. In vertebrates, except birds, this region is located between
tRNApro and tRNAphe and varies considerably in size (879 bp in mouse; 1122 bp in
human and 2134 bp in frog, Xenopus laevis) (Wolstenholme, 1992). In mammals and
amphibia, this sequence has been shown to include the signals necessary for both the
initiation of transcription and replication (Montoya et al., 1982; 1983; Clayton, 1984) and
it has therefore been designated the control region. Although the exact events governing
mitochondrial replication and transcription are not known, comparative analyses of the
control region from various mammalian species have identified regions of sequence
consensus and possible functional importance (Clayton, 1991a; 1991b). These include the
conserved sequence blocks (CSB) (Walberg and Clayton, 1981; Brown et al., 1986;
Dunon-Bluteau and Brun, 1987; Saccone et al., 1987; Saccone et al., 1991), the
termination associated sequences (TAS) (Doda et al., 1981; Mackay et al., 1986), the light
strand promoter (LSP) and the heavy strand promoter (HSP) (Chang and Clayton, 1984;
1985), the binding sites for mitochondrial transcription factor (MTF) (Fisher et al., 1987),
mitochondrial single stranded binding protein (mtSSB-protein) (Mignotte et al., 1985) and
the origin of heavy strand replication (OH) (Clayton, 1982).
In insects, the single noncoding region is referred to as the "A+T-rich region"
because it is composed of 90% to 96% deoxyadenylate and thymidylate residues (Fauron
and Wolstenholme, 1976). This region, which has been shown to contain the origin of
DNA replication (Goddard and Wolstenholme, 1978; 1980), is situated between the
tRNAile gene and the 5' end of the small subunit ribosomal (12S) gene. However, no
apparent signals for initiation of transcription or replication comparable to those of
vertebrates have been detected. In the mtDNA genomes of insects thus far sequenced there
is considerable variation in both length and nucleotide sequence for this region (4601 bp in
Drosophila melanogaster, 1077 bp in Drosophila yakuba, 826 bp in Apis mellifera, 520 bp
in Anopheles gambiae and 625 bp in Anopheles quadrimaculatus) (Lewis et al., 1994; Clary
and Wolstenholme, 1985; Crozier and Crozier, 1993; Beard et al., 1993; Mitchell et al.,
1993). Few conserved sequence motifs are found within this region.
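
The 90% to 96% figure quoted above is a simple base-composition measure. A minimal sketch of how such a value could be computed is given below; the function name `at_fraction` and the 20-nt example fragment are hypothetical and were invented for illustration, not taken from any sequence in this work.

```python
def at_fraction(seq: str) -> float:
    """Return the fraction of A and T among the unambiguous bases in seq."""
    # Ignore ambiguity codes (e.g. N) so they do not dilute the estimate.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    at_count = sum(1 for base in counted if base in "AT")
    return at_count / len(counted)


# Hypothetical 20-nt fragment, invented for illustration only.
example = "ATTTAATATTAAGTATTTAC"
print(f"A+T fraction: {at_fraction(example):.2f}")
```

Ambiguous bases are excluded from the denominator here; counting them instead would lower the reported A+T content, so the choice should be stated whenever values are compared between studies.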
Replication and transcription in both vertebrates and invertebrates are believed to be
strictly correlated with one another. In vertebrates, studies have demonstrated that priming
for both H-strand replication and polycistronic transcription of the L-strand start from the
LSP and are therefore indistinguishable from one another (Clayton, 1984; Attardi, 1985;
Cantatore and Saccone, 1987). Clayton (1991a) has proposed that the concentration of
mitochondrial transcription factor is decisive in regulating the two processes. Low
concentrations of mtTF1 would activate LSP leading to the transcription of the genes
encoded on the L-strand (i.e., 8 tRNAs and ND6 in mammals). Elevation of the
concentration of mtTF1 in the mitochondrial matrix would then activate HSP resulting in
transcription of the 14 tRNAs and the 12 protein encoded genes of the H-strand. Although
it is unknown what factor(s) signal the commencement of DNA replication versus
transcription, RNA replication priming starts from the LSP, proceeds downstream, and
provided sufficient levels of RNase MRP are present, terminates near the OH at the
RNA/DNA transition site (CSB 1 in mouse; Tullo et al., 1994). A three stranded DNA
structure is formed between a short nascent DNA (H) strand formed by displacement
synthesis, the parental H-strand and the complementary L-strand. In this short strand,
called the D-loop or 7S DNA, synthesis starts from the OH and in mammals stops
approximately 600 to 700 nt downstream at the TAS sites, and is repeatedly synthesized
and degraded (Wolstenholme, 1992). However, at some point, synthesis of the nascent
H-strand proceeds in a unidirectional manner until the daughter strand is completely
synthesized. The L-strand origin of replication (OL) consists of another noncoding
sequence located in mammals and amphibians in a cluster of tRNA genes between tRNAasn
and tRNAcys and is approximately two thirds of the mtDNA genome away from the OH.
This sequence is similar to other replication origins in that it can form a thermodynamically
stable hairpin loop structure. It consists of a GC-rich stem of variable length ranging in
size from 9 bp in frog to 12 bp in mouse and a T-rich loop of 12 to 19 nt (Clayton, 1982;
Wong et al., 1983). Once H-strand synthesis is at least 67% complete and the OL is
exposed as a single-stranded template, initiation of L-strand synthesis can begin. The
synthesis of the RNA primer has been shown to begin in the T-rich stretch of the loop
structure and the transition of RNA to DNA synthesis occurs at the base of the hairpin
structure (Wong and Clayton, 1985). Further experiments have shown that a
pentanucleotide sequence 5'-CGGCC-3' present at the base of the stem that overlaps with
a few base pairs of the 5' end of the tRNAcys gene is necessary for efficient replication
(Hixson and Brown, 1986). The replication of both strands proceeds asynchronously
resulting in the segregation of two distinct daughter molecules (alpha and beta). Once the
beta daughter molecule is fully synthesized, alpha and beta molecules are converted to
closed circles. Synthesis of full length daughter strands requires approximately one hour
and the entire cycle requires two hours (Clayton, 1982).
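
The stem-and-loop geometry described for the OL (a stem of 9 to 12 bp and a loop of 12 to 19 nt) can be turned into a simple sequence search. The sketch below is a deliberate simplification and not a method used in this work: it reports only perfectly Watson-Crick paired stems, ignores mismatches and thermodynamic stability, and the function names and the test sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_hairpins(seq: str, stem_len: int = 9, min_loop: int = 12, max_loop: int = 19):
    """Return (start, stem, loop) tuples for perfect stem-loops in seq."""
    seq = seq.upper()
    hits = []
    last_start = len(seq) - (2 * stem_len + min_loop)
    for start in range(last_start + 1):
        stem = seq[start:start + stem_len]
        if any(base not in COMPLEMENT for base in stem):
            continue  # skip windows containing ambiguous bases
        partner = reverse_complement(stem)
        # Try every allowed loop length between the two stem arms.
        for loop_len in range(min_loop, max_loop + 1):
            right = start + stem_len + loop_len
            if right + stem_len > len(seq):
                break
            if seq[right:right + stem_len] == partner:
                hits.append((start, stem, seq[start + stem_len:right]))
    return hits


# Hypothetical sequence built for illustration: GC-rich stem, 12-nt T loop.
stem = "GCCGGCGGC"
example = "AA" + stem + "T" * 12 + reverse_complement(stem) + "AA"
print(find_hairpins(example))
```

The default arguments mirror the ranges quoted in the text; a realistic search would also tolerate stem mismatches and rank candidates by predicted folding energy.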
Overall, the mitochondrial putative control region of invertebrates remains poorly
studied relative to vertebrates. Most research defining the mechanics of mitochondrial
replication and transcription has involved mammalian and amphibian systems. The data
available from the limited number of invertebrate mtDNA genomes that have been entirely
sequenced show a lack of sequence conservation in the regulatory region. It is believed that
this may reflect a difference in the mechanisms of DNA replication and/or transcriptional
initiation and its regulation among invertebrate groups (Lewis et al., 1994).
Length variation and heteroplasmy
Attardi (1985) originally described the animal mitochondrial genome as an example
of "an extremely economical unit". This generalization was based on the notion that the
mitochondrial genome has been under intense selection for small size and invariable
structure. Insertion and deletion events were thought to be rare, the relative gene order was
thought to be conserved, and most mutational changes seemed to involve base
substitutions at either silent sites or in noncoding regions (Brown, 1985). However, as the
number of characterized metazoan mitochondrial genomes has expanded, it has become
quite evident that Attardi's (1985) traditional view of "extreme economy" is in need of
Initial studies at the population level revealed little intraspecific and interspecific
variation in mtDNA genome size. Intra-individual heterogeneity (i.e., heteroplasmy) was
rarely observed (Robberson et al., 1974; Potter et al., 1975; Ojala and Attardi, 1977; Coote
et al., 1979; Brown and Des Rosiers, 1983). Therefore, it was generally believed that there
was a rapid sorting-out of the transient heteroplasmic states that must occur following
mutational events (Upholt and Dawid, 1977; Takahata and Maruyama, 1981; Birky et al.,
1983). However, as the number of population level studies expanded, heteroplasmy has
been found to be much more common with five types having been recognized. (1)
Nucleotide site heteroplasmy, although quite rare, has been identified in cow (Hauswirth
and Laipis, 1985), human (Greenberg et al., 1983) and fruit fly (Hale and Singh, 1986).
(2) Heteroplasmy may also involve a variable number of nucleotides in homopolymer
stretches as detected in cow (Hauswirth and Laipis, 1985) and rat (Brown and Des Rosiers,
1983), or (3) large deletions in coding sequences as found in mouse (Boursot et al.,
1987). The last two classes, (4) and (5), encompass either continuous length variation up
to 1 kb in size or discrete length variation involving variable copy number of tandem
repeat(s).
These and numerous other studies have unveiled a "fluid" nature associated with
the mitochondrial genome (Rand, 1994). Metazoan mtDNA is now known to vary in size
by as much as threefold, from 13.8 kb in the nematode Caenorhabditis elegans (Okiomoto
et al., 1992) to 41 kb in the scallop, Placopecten magellanicus (Gjetvaj et al., 1992).
Although the large size variation among animal mtDNAs is not typically attributable to
duplication/deletion events, there are a few cases of these mechanisms. Duplications of
coding regions have been detected in Cnemidophorus lizards (1.5 to 8.0 kb) and
Heteronotia geckos (1.2 to 10.4 kb) where they comprise a portion of the control region
and adjacent rRNA, tRNA and protein genes (Moritz and Brown, 1986; 1987; Moritz,
1991; Zevering et al., 1991). In addition, a single sequence containing the 12S rRNA, 16S
rRNA, ND1 and ND2 genes of the newt, Triturus cristatus, has been amplified (Wallis,
1987). A tandem duplication involving a region similar to the 3' end of the COI gene and
the tRNAleu gene has been located at the junction of the COI gene and COII gene of the
honeybee, A. mellifera (Cornuet et al., 1991). A 3.0 kb tandem duplication has also been
detected in the nematode Romanomermis culicivorax. Each repeating unit was found to
contain six open reading frames (ORF), two encoding ND3 and ND6 and a third ORF sharing
similarity to the cytochrome P-450 gene (Azevedo and Hyman, 1993). However, unlike
the examples listed above where considerable substitutions, short duplications, and
deletions occurred in the duplicated regions, substitution levels in the nematode repeating
units were less than 0.01% (Hyman and Azevedo, 1996).
Deletions involving coding regions, such as those found in mouse (Boursot et al.,
1987), have also been detected in humans. For example, Kearns-Sayre syndrome, a
human neuromuscular disease, is the direct result of dysfunctional mitochondria. The
encephalomyopathy involves deletions of 1.3 kb to 7.7 kb that can encompass part of the
COII, ATPase 6 & 8, NADH 4, 4L, 5 & 6, the N-terminus of the Cyt b gene and 5 tRNA
genes. The remainder of the COII and Cyt b genes are fused (Holt et al., 1988;
Shoulbridge et al., 1990; Heddi et al., 1994).
MtDNA size variation due to substantial duplication or deletion of coding regions
is much less common than size variation and heteroplasmy due to variation in copy number
of tandemly repeated sequences in the control region. In vertebrates, repeat motifs
generally occur at either one of two locations. The first is the region associated with the
TAS site(s) for the three-stranded D-loop structure and the second is located between the
OH and the LSP (often within or adjacent to the CSBs) (Gemmell et al., 1996; Fumagalli et
al., 1996). Hoelzel and coworkers (1994) partitioned these two zones into five distinct
repetitive sequence regions (RS1 to RS5; refer to Figure 1 of Hoelzel et al., 1994). The
TAS zone repeats (RS1 & RS2) and the CSB2 and CSB3 zone repeats (RS4 & RS5) involve
repeat motifs whose size varies in multiples of 40 nt (40, 80 or 160 nt), resulting in
total length changes of 80 to 650 nt. Up to 50% of the cells are heteroplasmic for the RS1
and RS2 variants. These repetitive sequences have been described in several vertebrates
including the white sturgeon (Buroker et al., 1990), Atlantic cod (Arnason and Rand,
1993), evening bat (Wilkinson and Chapman, 1991), Japanese monkeys (Hayasaka et al.,
1991) and rabbit (Mignotte et al., 1990). Repeat motifs situated between CSB1 and CSB2
(RS3) are distinctive from the other four RS zones in that the repeat motifs are
considerably smaller in size (6 to 38 nt) and the frequency of heteroplasmy is very high,
although the total size of repeat arrays parallels that of the other RS zones. RS3 repeats are
found in pig (Ghivizzani et al., 1993), harbour seal (Arnason and Johnsson, 1992),
elephant seals (Hoelzel et al., 1993) and 18 carnivore species (Hoelzel et al., 1994).
Shrews (Crocidura russula and Sorex araneus) are unique relative to other mammals in
that they simultaneously possess tandem repeats in two locations (RS1 or RS2, 78 nt long,
and RS3, 12 nt in C. russula and 14 nt in S. araneus), unlike other mammals where they
occur at only one of the five RS locations (Fumagalli et al., 1996).
In invertebrates, tandemly repeated sequences such as those detected in vertebrates
do occur within or immediately adjacent to the A+T rich region. The A+T rich regions of
three closely related species of bark weevil (Curculionidae: Pissodes) were not only found
to be dramatically enlarged (9 to 13 kb) but also to be flanked by variable numbers of a
tandemly repeated unit of approximately 200 nt. The repeat region varied in size from 0.8
to 2.0 kb (Boyce et al., 1989). The size of the mtDNA genome for these three species
ranged from 24 to 30 kb in P. terminalis, from 25 to 34 kb in P. nemorensis, and from 28
to 36 kb in P. strobi. Remarkably, all 219 bark weevils that were sampled were found to
be heteroplasmic and exhibited anywhere from two to five distinct size classes of mtDNA
that differed by as much as 7.5 kb. Rand and Harrison (1989) documented the presence of
length variation in two species of crickets, Gryllus firmus and Gryllus pennsylvanicus.
Nucleotide sequence analysis revealed a 206 nt repeat that is bounded by a G+C-rich 14 nt
sequence with dyad symmetry and is present in one to seven copies. Sixty percent of G.
firmus and 45% of G. pennsylvanicus were heteroplasmic for two or three haplotypes.
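
Discrete length variation of this kind follows directly from the copy number of the repeat. As a rough worked example, the sketch below uses the cricket figures quoted above (a 206 nt unit in one to seven copies); the function name is hypothetical, and the calculation assumes identical repeat units and ignores the 14 nt G+C-rich boundary sequence.

```python
def repeat_array_lengths(unit_nt: int, min_copies: int, max_copies: int) -> dict:
    """Map each copy number to the total length of the repeat array in nt."""
    return {n: n * unit_nt for n in range(min_copies, max_copies + 1)}


# Cricket values quoted in the text: 206 nt unit, one to seven copies.
arrays = repeat_array_lengths(206, 1, 7)
for copies, length in arrays.items():
    print(f"{copies} copies: {length} nt")

# Largest expected size difference between two molecules from this repeat alone.
max_difference = arrays[7] - arrays[1]
print(f"maximum difference: {max_difference} nt")
```

Under these assumptions the repeat array alone could account for a size difference of roughly 1.2 kb between molecules carrying one and seven copies.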
Undoubtedly, the most notable example of large-scale mtDNA size variation is that
found among scallops (Bivalvia: Pectinidae). Of the seven species studied thus far, only
the bay scallop, Argopecten irradians, apparently lacks repetitive sequences and shows a
typical size-invariant 16.2 kb mtDNA molecule (Gjetvaj et al., 1992). The remaining six
species all possess relatively large mtDNAs accompanied by a broad spectrum of size
classes: the giant scallop, Pecten maximus (19.9 to 26.3 kb); the rock scallop, Crassadoma
gigantea (22.8 to 24.8 kb); the Queen scallop, Aequipecten opercularis (21 to 28.2 kb); the
spiny scallop, Chlamys hastata (23.9 to 27.2 kb); the Iceland scallop, Chlamys islandica
(22.2 to 25 kb); and the deep-sea scallop, Placopecten magellanicus (31 to 41 kb) (Snyder et
al., 1987; La Roche et al., 1990; Gjetvaj et al., 1992). Restriction enzyme analysis has
revealed the presence of three separate regions associated with size variation in the mtDNA
molecule of the Iceland scallop. Size variation in region one is the result of changes in copy
number (one to three) of a 1.2 kb tandemly repeated sequence. Variation in region two is
associated with deletions of 100 to 250 nt, and discrete size variation due to the insertion of
a single 1.4 kb sequence is found in region three (Gjetvaj et al., 1992). The largest metazoan
mtDNA identified is that of the deep-sea scallop at 41 kb. Again, restriction enzyme
analysis characterized three regions responsible for size variation. Size variation in region
one is due to fluctuations in the copy number (two to eight) of a 1.45 kb repeated sequence,
whereas a continuum of size variation caused by multiple insertions or deletions of an
approximately 100 nt sequence typifies region two. In region three, the variation occurs in
250 nt increments. As in the Iceland scallop, these three regions appear to vary
independently of one another (Gjetvaj et al., 1992). Heteroplasmy was not detected in
either the bay scallop or the giant scallop. However, ten percent of spiny scallops were
heteroplasmic for size variants. In one individual, three size classes were present in about
equal frequency and differed by increments of approximately 600 nt. In the Queen scallop,
a multiplicity of molecules whose size differs by 100 nt increments is generated by Ava-I
restriction digestion. Fifteen percent of the Iceland scallops were heteroplasmic, primarily
for region one and to a lesser extent for regions two and three. One individual with a single
size class of mtDNA was found to be heteroplasmic for the presence/absence of a Bam HI
restriction site. In the deep-sea scallop, most individuals were heteroplasmic for the 100 nt
increment located in region two. Ten to twenty percent of the individuals were
heteroplasmic for the 1.45 kb repeat sequence characterized for region one.
The pea aphid system
Aphids (Aphidoidea) are an extremely complex and diverse group of insects
consisting of approximately 4401 species placed in 493 currently accepted genera
(Blackman and Eastop, 1994). Aphids are believed to have evolved some 300 million
years ago (Heie, 1981; 1987). As with other members of the insect order Hemiptera,
aphids have slender, piercing and sucking mouthparts that form a hollow tube with which
they suck juices from plants. The life cycles of aphids are very diverse and include both
parthenogenetic and sexual generations, elaborate polyphenisms, and obligate shifting
between unrelated host-plant taxa (Moran, 1992). Although some aphids have lost the
sexual phase of the life cycle, most alternate one or more parthenogenetic generations with
a single annual sexual generation that produces the overwintering eggs. In the more
primitive families, Adelgidae and Phylloxeridae, both sexual and parthenogenetic females
are egg producers (oviparous). However, in the Aphididae, parthenogenetic females always
give birth to live young (viviparous), with the embryos of granddaughters developing
within the embryonic daughters of a given female.
The pea aphid, Acyrthosiphon pisum (Harris), is a polyphagous pest of legumes
that invaded North America within the last century (Johnson, 1899). Pea aphids are
cyclical parthenogens. In cold climates the sexual forms appear in the fall and lay
overwintering eggs (Lamb and Painting, 1972). In spring, the eggs hatch into fundatrices
and establish lineages that reproduce apomictically for ten to twelve generations.
Environmental factors trigger the generation of the sexual biotypes that produce the
overwintering eggs, thus completing the cycle (Via, 1991). In the dairy producing areas
near Lansing, New York, pea aphids are found in similar abundance on their two major
hosts, alfalfa, Medicago sativa, and red clover, Trifolium pratense (Via, 1991).
Studies of mtDNA diversity in aphid species are quite limited, with only three species,
Schizaphis graminum (Powers et al., 1989), Rhopalosiphum padi (Martinez et al., 1992)
and A. pisum (Barrette et al., 1994), having been examined so far. Sequence diversity
within Schizaphis and Rhopalosiphum was found to be very low, with length variation in
R. padi arising from differences in the copy number of a 100 bp tandemly repeated
sequence. MtDNA analysis of thirty-five clonal lines of A. pisum originally collected from
clover fields in New York revealed little sequence diversity. A survey of variation using
eighteen restriction enzymes that cut pea aphid mtDNA at least once revealed only two
polymorphic restriction sites (Barrette et al., 1994). However, this survey uncovered
several notable features associated with A. pisum mtDNA that warranted further
investigation. For example, examination of the relationship between restriction site
frequency and the A+T content of recognition sequences for fifty of the sixty restriction
enzymes used to digest pea aphid mtDNA showed that enzymes with an A+T content of
less than 40% cut far less frequently than those with higher A+T content (Table 2; Barrette
et al., 1994). The A+T contents of insect mtDNA genomes are elevated, with the highest
recorded value (84.9%) detected in bees (Crozier and Crozier, 1993). Restriction site
frequency in pea aphid is suggestive that its A+T content may be at least as high as that of
While restriction site polymorphisms were rare, length variation and heteroplasmy
were common in the pea aphid clones sampled, with mitochondrial genome sizes ranging
from 16.8 to 18.1 kb. Two regions of length heterogeneity were detected. Region one
(R1) contained three variants differing in size by multiples of about 120 nt, while six size
classes varying in size by multiples of about 210 nt were detected in region two (R2). The
largest size class of R1 was the most common, while the intermediate size class of R2
occurred most frequently (Appendix 1b: Barrette et al., 1994). All clones appeared to be
homoplasmic at R1, but 13 of 35 clones were heteroplasmic at R2. One clone had four size
classes of mtDNA, but other clones contained just two size classes that usually differed by
single-unit size shifts, but occasionally by two-unit or three-unit size shifts (Table 5 in
Barrette et al., 1994). A further 16 clones were screened for their R1 and R2 Taq I
restriction enzyme digest patterns. Two clones were found to be heteroplasmic for R1.
Temporal stability of length variants was assessed by isolating mtDNA from four
heteroplasmic clonal lineages after approximately 30 generations of parthenogenetic
reproduction. Although length variation was unchanged for R1, the ratio of length variants
in R2 changed with time even though the complement of size classes remained the same
(Table 6 in Barrette et al., 1994).
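The size figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes that the size classes within each region are consecutive whole-unit steps and that R1 and R2 vary independently, neither of which is stated outright in the text. Under those assumptions the two regions together account for about 1.29 kb of length variation, close to the 1.3 kb spread between the reported 16.8 and 18.1 kb genome sizes.

```python
# Values quoted in the text. Treating each region's size classes as
# consecutive whole-unit steps is an assumption made for this sketch.
R1_UNIT_NT, R1_CLASSES = 120, 3
R2_UNIT_NT, R2_CLASSES = 210, 6
GENOME_MIN_KB, GENOME_MAX_KB = 16.8, 18.1


def max_span_nt(unit_nt: int, n_classes: int) -> int:
    """Size difference between the smallest and largest class of one region."""
    return unit_nt * (n_classes - 1)


def combined_span_kb() -> float:
    """Largest size difference expected if R1 and R2 vary independently."""
    total_nt = (max_span_nt(R1_UNIT_NT, R1_CLASSES)
                + max_span_nt(R2_UNIT_NT, R2_CLASSES))
    return total_nt / 1000


if __name__ == "__main__":
    observed_kb = GENOME_MAX_KB - GENOME_MIN_KB
    print(f"expected span: {combined_span_kb():.2f} kb")
    print(f"observed span: {observed_kb:.2f} kb")
```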
Restriction enzyme mapping localized R1 and R2 to opposite sides of the molecule.
Preliminary analysis of gene order suggested that the overall gene order of A. pisum is
very similar to that of D. yakuba, with R1 occurring within or adjacent to the A+T rich
region and R2 occurring somewhere within the area spanning the ND3 gene to the ND5
gene (Appendix 1e: Barrette et al., 1994). The purpose of this study is to continue
characterizing the pea aphid mtDNA genome. Is the R1 length variation due to repetitive
elements within the A+T rich region? Does this region contain any of the transcription and
replication initiation/termination signals described in other metazoan mitochondrial
genomes? Lastly, are codon usage in the gene coding sequences and A+T nucleotide
bias similar to those of other insects?
Pea aphid sampling and mtDNA purification
Laboratory cultures of A. pisum were established from collections made in alfalfa
fields near Lansing, New York (Barrette et al., 1994). Mitochondrial DNA was extracted
and purified as reported previously in Barrette et al. (1994). Briefly, 700 to 1500 mg (wet
weight) of fresh colony material was homogenized in sucrose grinding buffer and then
subjected to differential centrifugation to separate cellular debris from the mitochondria. To
ensure sufficient removal of cellular contaminants, mitochondria were pelleted through a
sucrose step gradient. Mitochondria were lysed with sodium dodecyl sulfate (SDS) and the
mtDNA was purified by cesium chloride density gradient ultracentrifugation.
PCR amplification and Gene Clean purification
PCR amplifications were performed in 50 µl reaction volumes using two units of
Taq polymerase (Boehringer) in 1X Boehringer PCR buffer, 20 nmol of each dNTP and
10 pmol of each primer. Reactions were overlaid with light mineral oil to prevent
evaporation. Primer combinations, magnesium chloride concentrations and annealing
temperatures for amplification of the regions of the mtDNA used for cloning and
sequencing are shown in Table 1. All PCR amplifications were performed using the
following program in an Amplitron II Thermocycler (Thermolyne): 27 cycles of denaturation
at 94°C for 1 minute, annealing (Table 1) for 2 minutes and primer extension at 72°C for 2
minutes. Amplification was continued for another 8 cycles in which the extension phase of
each cycle was lengthened by 20 seconds.
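As a rough planning aid, the hold times of the program described above can be summed. This is a minimal sketch, not the thermocycler's own program: ramp times between temperatures are ignored, and the wording leaves open whether the 20 second extension is cumulative (each of the last 8 cycles 20 seconds longer than the one before) or a fixed addition, so both readings are computed. The constant and function names are illustrative.

```python
# Hold times from the text, in seconds.
DENATURE_S = 60      # 94 C for 1 minute
ANNEAL_S = 120       # annealing for 2 minutes (temperature per Table 1)
EXTEND_S = 120       # 72 C for 2 minutes
MAIN_CYCLES = 27
EXTRA_CYCLES = 8
EXTENSION_STEP_S = 20


def program_seconds(cumulative: bool = True) -> int:
    """Total hold time of the PCR program, ignoring temperature ramps.

    cumulative=True assumes the extension grows by 20 s in every extra
    cycle (20, 40, ... 160 s added); cumulative=False assumes a fixed
    20 s is added to each of the extra cycles.
    """
    base_cycle = DENATURE_S + ANNEAL_S + EXTEND_S
    total = MAIN_CYCLES * base_cycle
    for k in range(1, EXTRA_CYCLES + 1):
        added = EXTENSION_STEP_S * (k if cumulative else 1)
        total += base_cycle + added
    return total


if __name__ == "__main__":
    for mode in (True, False):
        minutes = program_seconds(mode) / 60
        print(f"cumulative={mode}: {minutes:.1f} min of hold time")
```

Under the cumulative reading the hold time comes to 187 minutes; under the fixed reading it is a little under 178 minutes.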
Gene Clean (BIO 101; Vista, California) purification of the amplification products
was performed in accordance with the manufacturer's general specifications. Mineral oil was
removed from the PCR reactions prior to addition of 120 µl sodium iodide solution and 5
µl glass milk suspension. To ensure maximum recovery of the purified PCR product,
Table 1: Sequence and amplification conditions for the PCR primers used in both DNA sequencing and cloning of pea aphid mtDNA
PCR product
1 sequencing
PCR primers
M ~ Position
in D. yakuba
5' cgcctgtttaacwaaacüt 3'
1 6s-4c (#20)
5' ccggittgaactcügatcatgt 3'
16s-5 (#2 1)
5' acatgüattggagctcgaccagi 3'
5' ggtacattacctcgglttcgltatgal3'
1 1523
1 1867
Cyt b-1 (#6)*
ND I - 2 (M)*
5' gtaggüggagctgctütüttag3'
848 1
ND4-5 (#II)*
5' gcttüttcatcggttgctca 3'
ND4-6c (#12)*
1 PW-f-1
cy; b -NDI
A+T region
*primers previously described in Barrette et al., 1994
binding of DNA to glass milk was carried out by incubating and agitating the mix for a
minimum of 20 minutes on a rocker platform. Following a 15 second pulse spin at 14000
RPM, the glass milk pellet was washed twice in 500 µl New Wash solution, resuspended
and agitated on a rocker platform for 15 minutes each. DNA was eluted from the glass milk
by two 15 minute incubations at 50°C, each in 7 µl of ultrapure water.
The A+T rich region of a single heteroplasmic clone (R1-1,3) was PCR amplified
using primers #18 and #19 (Table 1) and cleaned as described above. To ensure that the
purified PCR product had blunt ends, 14 µl of it were incubated for 30 minutes at 37°C
with 10 units of Klenow enzyme (BRL) and 10 nmol of dTTP in 1X Klenow buffer in a final
reaction volume of 20 µl. The reaction was terminated by denaturation of the Klenow
enzyme at 80°C for 10 minutes. One hundred microlitres of water were added to the sample,
which was then extracted in an equal volume of phenol/chloroform/isoamyl alcohol (PCI)
25:24:1, and then in an equal volume of chloroform/isoamyl alcohol (CI) 24:1. The DNA
was precipitated by the addition of 10 µl of 5 M NaCl and 300 µl of absolute ethanol, followed
by incubation overnight at -20°C. The DNA was pelleted in 1.5 ml Eppendorf centrifuge
tubes by centrifugation at 14000 RPM at 4°C for 20 minutes using an Eppendorf 5415C
benchtop centrifuge, washed three times with 70% ethanol and solubilized in 20 µl of 1 mM
TE. Ligation of the blunt-ended A+T region PCR product to the plasmid vector, pCR-Script SK(+), was simplified by using the pCR-Script Cloning System (Stratagene) in
accordance with the manufacturer's specifications. Reactants were added in the following
order: 1 µl (10 ng) pCR-Script SK(+) vector, 1 µl 10X ligation buffer (Stratagene), 0.5 µl
10 mM rATP, 3 µl (400 ng) target DNA, 5 units Srf I enzyme, 1 µl T4 DNA ligase and
3.5 µl water. The sample was mixed gently and then incubated at room temperature for one
hour. The reaction was stopped by heating at 65°C for 10 minutes and then stored on ice
until transformation. The Epicurian Coli XL1-Blue MRF' Kan supercompetent cells
supplied with the pCR-Script kit were thawed on ice, after which 35 µl were transferred to a
15 ml polypropylene Falcon tube containing 0.6 µl of 1.44 M β-mercaptoethanol, gently
mixed and incubated for 10 minutes. Two microlitres of the ligation reaction were added to
the cells and the mixture was incubated on ice for 30 minutes, heat-shocked in a 42°C water
bath for 45 seconds and then transferred to ice for 2 minutes. Four hundred and fifty
microlitres of SOC medium (0.5% w/v yeast extract; 2% w/v tryptone; 10 mM NaCl; 2.5
mM KCl; 10 mM MgCl2; 20 mM MgSO4; 20 mM glucose) were added to the Falcon tube,
mixed, and then incubated for one hour at 37°C in a shaking incubator. Twenty-five, 50,
100 and 200 microlitres of the transformation mix were plated onto Luria Broth (LB) 1%
agar plates containing 100 µg/ml X-gal (Boehringer), 40 µg/ml IPTG (Boehringer) and 50
µg/ml ampicillin (Sigma) and allowed to incubate overnight at 37°C. White transformants
were triple streaked onto a fresh LB plate and grown overnight at 37°C. Forty-eight white
colonies were screened by liberating DNA from single colonies with cracking buffer
preparation (0.1 N NaOH, 10 mM EDTA pH 8.0, 10% glycerol and 1% SDS) and
electrophoresing it in a 1% agarose gel in TBE (89 mM Tris-borate; 89 mM boric acid; 2 mM
EDTA) buffer. Two size classes of recombinant plasmid were detected and selected for
growth in 100 ml of liquid LB overnight at 37°C. To isolate plasmid DNA, bacteria were
pelleted by centrifugation at 6000 RPM for 20 minutes. The pellet was resuspended in 5 ml
of solution I (50 mM glucose, 25 mM Tris-HCl pH 7.5, 10 mM EDTA pH 8.0), and lysed
for 20 minutes on ice by addition of 10 ml of solution II (1% SDS, 0.2 N NaOH).
Bacterial chromosomal and cellular material were precipitated by addition of 7.5 ml of
solution III (7.5 M ammonium acetate) and incubated on ice for 20 minutes. Debris was
pelleted by a 20 minute centrifugation at 12500 RPM using an RC5 superspeed centrifuge
(Sorvall) and an SS-34 rotor (Sorvall). The supernatant containing DNA was precipitated
by the addition of 15 ml isopropanol and incubation at -20°C for 30 minutes. The DNA
was pelleted by centrifugation at 12500 RPM for 20 minutes, resolubilized in 5 ml of 2.5 M
ammonium acetate, incubated on ice for 30 minutes, recentrifuged at 12500 RPM for 20
minutes, washed twice in 70% ethanol, and then solubilized in 400 µl of 1 mM TE. RNA
was removed by RNase treatment (3 µg) at 37°C for 30 minutes followed by organic
extraction and ethanol precipitation as described above. The plasmid DNA was solubilized
in 100 µl H2O and quantified against calf thymus DNA (Sigma) using a DyNA Quant 200
(Hoefer) fluorometer.
Progressive unidirectional deleted subclones of the original cloned fragment
containing the A+T rich region were generated using the Erase-A-Base kit (Promega) in
accordance with the manufacturer's specifications. To generate deletions, 10 µg of plasmid
DNA were digested in a 50 µl reaction containing 1X One-for-all digest buffer
(Pharmacia) and 20 units each of Kpn-I (BRL) and Cla-I (BRL). Reverse deletions were
generated by sequentially digesting 10 µg of plasmid DNA in a 50 µl reaction containing
1X Buffer 1 (NEB) and 20 units Sac-I (NEB). After heat inactivating this enzyme, the
DNA was digested in a final volume of 140 µl in 1X Buffer 3 (NEB) containing 40 units
of Eag-I (NEB) enzyme. All digests were subsequently brought up to 400 µl with water,
extracted twice with PCI, once with CI, and then ethanol precipitated overnight at -20°C by
the addition of 25 µl 5 M NaCl and 1 ml absolute ethanol. Pelleted DNA was washed three
times in 70% ethanol and then solubilized by 37°C incubation in 36 µl H2O and 4 µl 10X
Exo III buffer. A total of 14 tubes (time points), each containing 7.5 µl of S1 digestion mix
(103 µl H2O + 16 µl 7.4X S1 buffer + 4 µl S1 enzyme), were prepared and stored on ice
until use. Exonuclease III reactions were commenced by the addition of 3 µl Exonuclease
III enzyme to the plasmid DNA preheated to 37°C. At 30 second intervals, 2.5 µl aliquots
were transferred to each of the 14 tubes on ice. These tubes were then incubated at 37°C for
30 minutes, 1 µl of stop buffer was added, the samples were heat shocked at 70°C for 10
minutes, and then returned to ice. To evaluate the quality of the deletion reactions, 2 µl of
each time point were electrophoresed on a 1% agarose gel. Each time point was ethanol
precipitated, washed 3X in 70% ethanol and then solubilized in 15 µl H2O. Four
microlitres of 5X ligase buffer (BRL) and 10 units T4 ligase (BRL) were added to each
time point, which was then incubated overnight at 4°C. These samples were used to
transform E. coli strain DH5 alpha F', followed by plating on ampicillin/X-gal/IPTG LB
plates. Deletion subclones differing in size by 200 bp increments spanning the entire A+T
rich region were selected.
Deletion subclones were grown overnight at 37°C in 10 ml LB containing
ampicillin. Plasmid DNA was isolated and purified using the alkaline isolation procedure
described earlier. However, an additional purification step was performed. The final
plasmid DNA pellets were solubilized in 64 µl H2O + 16 µl of 4 M NaCl, reprecipitated by
the addition of 13% PEG 8000 and incubated on ice for 20 minutes. The DNA was pelleted
by centrifugation at 14000 RPM at 4°C on a 5415C benchtop Eppendorf centrifuge and
then carefully washed three times in 70% ethanol followed by solubilization in 30 µl H2O.
The Kpn-I/Cla-I deletion subclones were sequenced by the Sanger dideoxy method
using the T7 Pharmacia sequencing kit as recommended by the manufacturer. Two
micrograms of plasmid DNA were alkaline denatured and annealed to the M13 forward
primer and then sequenced using the short-read termination mix and α-35S-thio-dATP.
Sequencing products were electrophoresed on 4% Sequagel XR (National Diagnostics)
acrylamide gels in 1X TBE at 50 Watts constant power in a BRL S2 sequencing apparatus.
Gels were vacuum dried and then exposed to Fuji NiF-Rx blue X-ray film for 16 hours
before development of the autoradiographic image.
Sequencing of the Sac-I/Eag-I deletions (using the M13 reverse primer) was
performed at the University of Guelph's automated DNA sequencing facility. DNA
templates were sequenced using the Taq FS dye-termination kit (Perkin-Elmer) and then
analyzed on an Applied Biosystems 377 automated sequencer (Perkin-Elmer).
PCR products purified with the Gene Clean kit were sequenced using the T7
(Pharmacia) kit. Twelve microlitres of template and 2 µl of primer (10 pmol/µl) were
denatured by a 5 minute incubation in a boiling water bath and then chilled immediately on
ice for 15 minutes. Primer-annealed templates were then sequenced according to the
manufacturer's specifications using short-read mix and α-35S-thio-dATP. Sequence
products were electrophoresed and visualized as previously described.
Sequence alignment and analysis
Autoradiographs obtained by standard sequencing techniques were analyzed using
the DNASTAR digitizer software. Full composite sequences were generated using the
MegAlign program (DNASTAR; Madison, Wisconsin). ABI autosequencing output files
(electropherograms) were exported to MacVector (IBI) and composite sequences were
generated using Sequencher software (Gene Codes Corp., Ann Arbor, Michigan).
Published sequences were downloaded from GenBank and comparative analyses
performed using DNASTAR software. Further sequence analysis was achieved using
DNAMAN (Lynnon BioSoft, Vaudreuil, Quebec, version 2.51) software.
DNA secondary structures within the A+T rich region sequences were determined
using the minimum energy approach (Zuker and Stiegler, 1981; Jaeger, Turner and
Zuker, 1989a; 1989b) using the program MULFOLD (version 2.0) and visualized using
the program LoopDloop (Gilbert, 1990).
A+T richness and codon use
Initial restriction endonuclease screening of pea aphid mtDNA revealed that digest
frequency was dependent upon the A+T content of the recognition sequence of the enzyme,
with the highest restriction site frequency occurring with those endonucleases whose
recognition sequences are 80% to 100% A+T (Appendix 1a; Barrette et
al., 1994). To confirm the A+T richness of A. pisum mtDNA, four protein genes and the
two ribosomal genes were partially sequenced from PCR products amplified using the
primer pairs listed in Table 1. Pea aphid sequences were compared to the same genes from
three other insects: the fruit fly Drosophila yakuba (X03240), the mosquito Anopheles
quadrimaculatus (L04272), and the organism with the highest A+T content measured to
date for metazoan mtDNA, the honeybee, Apis mellifera (L06178). Refer to appendix 3
(a, b, c, d) and figure 1 for the alignment of the various sequences used in this comparison.
The A+T content in three of the four protein genes was higher in A. pisum than in any of the
other three insects, with 74.1% A+T for COI (Table 2), 90.3% for Cyt b (Table 3) and
92.7% for ND1 (Table 4). Only the ND4 gene sequence of A. mellifera was slightly
higher, at 79.0% versus 77.2% for A. pisum (Table 5). The 12S and 16S ribosomal genes
of A. pisum also showed elevated A+T levels, with only the 16S rRNA sequence of D.
yakuba having a higher value than A. pisum (Tables 6 & 7). The protein encoding DNA
sequences were translated using the Drosophila mitochondrial genetic code. Comparison of
nucleotide and amino acid sequences of the four protein genes revealed a spectrum of
interspecific similarities, with the COI gene showing the greatest level of sequence similarity
and the ND1 gene exhibiting the least (Tables 2, 3, 4 & 5). Furthermore, although the
levels of DNA sequence similarity between A. pisum and the three other insects were
similar for each of the four protein-encoding genes, the corresponding amino acid
sequences showed fairly low similarity values for all but COI. NADH dehydrogenase
Table 3. Comparison of DNA sequence and amino acid sequence of cytochrome b gene for Acyrthosiphon pisum and three other insects
D. yakuba   A. quadrimaculatus   A. mellifera
Number of amino acids (A.A.)
Percent sequence similarity with A. pisum
Percent A.A. similarity with A. pisum
Table 4. Comparison of DNA sequence and amino acid sequence of NADH dehydrogenase subunit 1 for Acyrthosiphon pisum and three other insects
Number of amino acids (A.A.)
Percent sequence similarity with A. pisum
Percent A.A. similarity with A. pisum
A. pisum
A. quadrimaculatus
A. mellifera
31.2
Table 6. Comparison of DNA sequence of the 12S ribosomal gene for Acyrthosiphon pisum and three other insects
Length of sequence (nt)
Percent sequence similarity with A. pisum
D. yakuba   A. quadrimaculatus   A. mellifera
351
45.0
*sequence length of 12S gene based on positioning of the putative 5' end
Table 7. Comparison of the DNA sequence of the 16S ribosomal gene from Acyrthosiphon pisum and three other insects
Length of sequence (nt)
Percent sequence similarity with A. pisum
210
D. yakuba   A. quadrimaculatus   A. mellifera
subunit 1 (Table 4) best illustrates this point. The percent sequence similarity values for
ND1 differed by only 6.5% between A. quadrimaculatus and A. mellifera (65.6% and
59.1%, respectively), yet this difference corresponded to a more than twofold difference in
the amino acid sequence similarity values (40.6% and 19.4%, respectively). DNA
sequence comparisons of the 5' end of the 12S rRNA gene showed lower similarities, with
values varying from 44.8% for honeybee to 46% for the mosquito (Table 6). Comparisons
of the 16S rRNA gene revealed higher values, ranging from 61.4% to 66.2% for honeybee
and fruit fly, respectively (Table 7).
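The two quantities compared throughout this section, A+T content and percent sequence similarity, can be sketched as follows. This is an illustration only: the function names are hypothetical, and similarity is computed here as simple percent identity over aligned, gap-free columns, which is an assumption and may differ from the scoring used by the DNASTAR software.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped, which is an
    assumption of this sketch rather than a documented convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(at_content("AATTTAGC"))
    print(percent_identity("ATTA-GC", "ATTTAGC"))
```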
Codon usage in A. pisum mtDNA is highly biased against codons rich in G or C.
Among the 159 amino acids translated from the four partially sequenced protein genes, the
codons most frequently used were those rich in A and T (Table 8). Nearly half of the 159
amino acids could be accounted for by five codons, each containing only A or T in the third
position. The leucine codon, TTA, was present 21 times, ATT coding for isoleucine was
present 20 times, phenylalanine (TTT) was present 16 times, methionine (ATA) was
present 10 times and lysine (AAA) was present nine times (Table 8). Only one of the 22
leucine residues was encoded by a G- or C-containing codon, TTG (Table 8). The
same bias holds for phenylalanine, isoleucine and methionine, with the AAG codon being
completely absent from the nine lysine residues identified in the four gene
sequences. Furthermore, no codons composed only of G and C were detected. For
example, the amino acid glycine is coded for by the four codons GGT, GGC, GGA and
GGG. Of the eleven glycine residues identified, four were represented by GGT, seven by
GGA, and none by either GGC or GGG.
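A codon usage tally of the kind summarised in Table 8 can be sketched in a few lines. The helper names below are illustrative, and the toy sequence is not data from this study. The last lines simply re-add the five codon counts quoted above (21 + 20 + 16 + 10 + 9 = 76 of 159, about 48%), which is consistent with the "nearly half" stated in the text.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def share(counts: Counter, codons) -> float:
    """Fraction of all counted codons made up by the listed codons."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons counted")
    return sum(counts[c] for c in codons) / total


# Counts quoted in the text for the five most frequent codons.
REPORTED_TOP_FIVE = {"TTA": 21, "ATT": 20, "TTT": 16, "ATA": 10, "AAA": 9}
REPORTED_TOTAL = 159
top_five_share = sum(REPORTED_TOP_FIVE.values()) / REPORTED_TOTAL

if __name__ == "__main__":
    toy = codon_counts("TTAATTTTAGGAAAA")  # toy sequence for illustration
    print(toy.most_common(3))
    print(f"top five codons: {top_five_share:.1%} of reported residues")
```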
A+T content was found to differ markedly among the three codon positions (Table 9).
The A+T content at the third position, at which most but not all substitutions are silent, is
95.7%. The second codon position is the least biased at 67.7% A+T, with T being the most
frequently used (46.6%). G and C are present in nearly equal abundance. Eighty percent of
the first position nucleotides are either A or T and G is used nearly W e times more often
Table 8. Codon usage table of the four partially sequenced protein coding genes of A. pisum
Frequency of AA
*Refer to appendix 2 for translation of the standard one-letter code used to specify the 22 amino acids within this table
Table 9. Base composition of the codons used in the four partially sequenced protein coding genes for A. pisum
41.0
Codon Position
than is C at this position. The ratio of the GC-rich amino acid codons of proline, alanine,
arginine and glycine to the AT-rich codons of phenylalanine, isoleucine, methionine,
tyrosine, asparagine and lysine has been used to examine the relationship between base
composition of a codon family and amino acid occurrence (Crozier and Crozier, 1993).
This value is 0.297 for A. pisum, which is approximately 10% lower than the ratio of
0.328 calculated for the same sequences in A. mellifera, and considerably lower than the
values determined for D. yakuba and A. quadrimaculatus (0.438 and 0.508, respectively).
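Both measures discussed in this paragraph, A+T content by codon position and the ratio of GC-rich to AT-rich amino acids, are straightforward to compute from a coding sequence and its translation. The sketch below uses hypothetical function names and toy inputs; it is not the procedure used to build Table 9. The final lines check the stated comparison of the two ratios: 0.297 is about 9.5% below 0.328, in line with "approximately 10% lower".

```python
GC_RICH = set("PARG")      # proline, alanine, arginine, glycine
AT_RICH = set("FIMYNK")    # Phe, Ile, Met, Tyr, Asn, Lys


def at_by_codon_position(cds: str) -> list:
    """A+T percentage at codon positions 1, 2 and 3 of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    result = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        result.append(100.0 * sum(b in "AT" for b in bases) / len(bases))
    return result


def gc_to_at_amino_acid_ratio(protein: str) -> float:
    """Ratio of GC-rich (P, A, R, G) to AT-rich (F, I, M, Y, N, K) residues."""
    protein = protein.upper()
    gc = sum(aa in GC_RICH for aa in protein)
    at = sum(aa in AT_RICH for aa in protein)
    if at == 0:
        raise ValueError("no AT-rich residues present")
    return gc / at


# Relative difference between the two ratios quoted in the text.
relative_drop = (0.328 - 0.297) / 0.328

if __name__ == "__main__":
    print(at_by_codon_position("ATGATCAAA"))       # toy sequence
    print(gc_to_at_amino_acid_ratio("PAFIMK"))     # toy peptide
    print(f"A. pisum ratio is {relative_drop:.1%} below the honeybee ratio")
```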
Cyt b and ND1 intergenic region
In the other insects sequenced to date, the genes for Cyt b and ND1 are encoded on
opposite strands of the mtDNA so that their 3' ends are adjacent to one another. The
tRNAser gene is located between these two genes (Clary and Wolstenholme, 1985; Crozier
and Crozier, 1993; Mitchell et al., 1993). This region in A. pisum is illustrated in figure 1.
Complete stop codons (TAA) are used for both the Cyt b and ND1 genes and the 3' ends of
both genes are flanked by several intergenic noncoding nt. In A. pisum, a total of eight nt
separate the 3' end of the Cyt b gene from the beginning of the tRNAser gene, whereas only
two and six nt separate this region for A. quadrimaculatus and D. yakuba, respectively. By
contrast, the honeybee has 45 nt spanning this region as well as an additional 15 nt
encoding five amino acids at the 3' end of the Cyt b gene. The 3' end of the ND1 gene for
D. yakuba overlaps the tRNAser gene by 15 nt (five amino acids) followed by a complete
TAA stop codon. In the pea aphid, the mosquito and the honeybee, there are several
intergenic nt separating the TAA stop codon of ND1 and the 3' end of tRNAser. The 10 nt
sequence of A. pisum is identical to 10 of the 19 intergenic nt identified in A.
quadrimaculatus. Again, the honeybee has the longest noncoding region with 34 intergenic
Figure 1. The A. pisum DNA and protein coding sequences for the 3' ends of both the
Cytochrome b and ND1 genes, the tRNAser gene, and the intergenic spacers abutting the
tRNAser gene. The insects used for the comparison are abbreviated as follows: D. yak.
(Drosophila yakuba); A. mell. (Apis mellifera); and A. quad. (Anopheles
quadrimaculatus). Single bold asterisks (*) indicate the beginning and ending of the
tRNAser gene with the TGA anticodon sequence highlighted in bold. The triple asterisks
(***) denote the stop TAA codon for the Cyt b and ND1 genes. Arrows (->) indicate
the 5' to 3' orientation of the gene. Single letter codes designate the amino acids coded for
by the DNA sequences of Cyt b and ND1 of the four insects.
The location and arrangement of the four A. pisum tRNA genes sequenced were
determined on the basis of sequence similarity with published tRNA gene sequences.
These sequences were then confirmed by their predicted folding potential into the
characteristic cloverleaf structures (Fig 2). All four tRNAs are similar to the inferred
configurations proposed for other insects and concordant with the general features
associated with the amino-acyl arm, the anticodon arm, the DHU arm and the TψC arm.
Non-standard base pairings occur in the secondary structures of several tRNAs including
t R N A h. G-U pairs are known to occur in RNA molecules and are quite common in
rRNA. One T-T basepair mismatch is present in the predicted configuration of A. pisum
tRNAgln. The location and orientation of the four tRNAs in pea aphid mtDNA are identical
to those of D. yakuba and A. quadrimaculatus. The three tRNAs, tRNAfmet, tRNAgln and
tRNAile, are located between the A+T rich region and the 5' end of the ND2 gene and are
all transcribed in the same direction as those of D. yakuba and A. quadrimaculatus. Hence,
only tRNAgln is transcribed on the secondary strand whereas the other three tRNAs are
transcribed in the same direction on the primary strand. Intergenic spacers, as seen with
tRNAser (Figure 1), are also present between two of the three other tRNAs. Eight
nucleotides separate tRNAfmet from tRNAgln. However, the 3' end of tRNAgln overlaps
the 3' end of tRNAile by three nucleotides (Figure 3). The total length of the tRNA genes also
differs slightly among the four insects. The A. pisum tRNAser (65 nt) and tRNAile (64
nt) genes are the shortest and the tRNAgln gene is only three nucleotides longer (66 nt)
than that of the honeybee. Although the A+T content of the pea aphid tRNAser gene is the
highest (91%) among the four insects, the A+T content of the other three A. pisum tRNA
genes is considerably lower than that of the other insects.
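The distinction drawn above between standard pairs, G-U pairs and outright mismatches (such as the T-T pair in tRNAgln) can be made explicit with a small classifier. This is a sketch with hypothetical names, working on the gene (DNA) sequence so that G-T stands in for the G-U pair of the RNA; it only labels the pairs of a stem that has already been proposed and does not predict the cloverleaf fold itself.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}   # corresponds to G-U in the RNA


def classify_pair(x: str, y: str) -> str:
    """Label one proposed base pair, using DNA letters."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def classify_stem(five_prime_arm: str, three_prime_arm: str) -> list:
    """Label every pair of a stem.

    Both arms are written 5' to 3', so the first base of the 5' arm pairs
    with the last base of the 3' arm.
    """
    if len(five_prime_arm) != len(three_prime_arm):
        raise ValueError("stem arms must have the same length")
    return [classify_pair(a, b)
            for a, b in zip(five_prime_arm, reversed(three_prime_arm))]


if __name__ == "__main__":
    # Toy stem for illustration, not taken from Figure 2.
    print(classify_stem("GATG", "TTTC"))
```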
Figure 2. Predicted secondary structures for the four tRNA genes of the pea aphid:
tRNA ser(UCA), tRNA gln, tRNA ile, tRNA fmet.
Figure 3. The DNA sequence of the large size class of cloned PCR products from A.
pisum. Arrows (→) indicate the 5' to 3' orientation of the gene. A single asterisk
designates the beginning and/or ending of the tRNA genes and the 12S rRNA gene.
The underlined 5 nt sequence highlighted by the number symbol (#) indicates the
beginning of the sequence of the smaller size class DNA molecule. Nucleotide
substitutions in the small cloned PCR fragment are indicated above the sequence.
Downward arrows (↓) indicate the sites of nucleotide insertion or deletion in the smaller
fragment. The eight nt sequence underlined and superscripted by asterisks indicates the
short repetitive element postulated to be the progenitor of the longer 123 nt A+T rich
region repeat.
[Figure 3 labels: Hairpin loop 2; Hairpin loop 3]
A+T rich region
The A+T rich region was cloned and sequenced for two DNA fragments of
different sizes generated from the amplification of a pea aphid clone that was
heteroplasmic in region 1 using primers #18 and #19 (Table 1). The larger of the two
PCR products was a total of 1398 nt in length. The smaller fragment was only 1111 nt
long and lacked a region of 287 nt immediately adjacent to the tRNAmet-lc primer (Table
1) that includes three tRNA genes coding for methionine, glutamine and isoleucine, a
hairpin loop and an additional 40 nt (Figure 3). Measurement of restriction fragments
suggested that the size of the putative repeat sequence was about 120 nt (Barrette et al.,
1994). Hence, this smaller PCR fragment is nearly 50 nt shorter than that anticipated for
an R1-1 size variant that lacks two repeat units relative to the long variant. The sequence
alignment between the long fragment (R1-3) and the smaller fragment begins at nt 303 of
R1-3 in a pentanucleotide sequence (5' CATCG 3') that is a perfect match for the 3' end
of the tRNAmet-lc primer. This suggests that the shorter fragment was generated by
mispriming of the tRNAmet-lc primer. Comparison of the sequences of the two PCR
fragments revealed a total of five transitions (three T<->C and two A<->G). Four of
them occurred in the A+T rich region and one occurred in the 12S rRNA gene. There were
also three insertion/deletion events (indels) involving a single nucleotide and two
involving dinucleotide sequences.
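The comparison of the two fragments can be illustrated with a short Python sketch. It is not the procedure used in this study: the function name classify_differences and the toy alignment are hypothetical, and the two sequences are assumed to be already aligned, with "-" marking gaps. Purine-to-purine and pyrimidine-to-pyrimidine changes are counted as transitions, all other mismatches as transversions, and each gapped column as one indel position.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_differences(seq_a, seq_b):
    """Count transitions, transversions and gapped columns in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transitions": 0, "transversions": 0, "gap_columns": 0}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue  # identical column (including gap against gap)
        if x == "-" or y == "-":
            counts["gap_columns"] += 1
        elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            counts["transitions"] += 1  # A<->G or C<->T
        else:
            counts["transversions"] += 1
    return counts

# Hypothetical toy alignment, not the pea aphid sequence.
toy_long = "ATTACGTTA-AT"
toy_short = "ATCACATTATAT"
result = classify_differences(toy_long, toy_short)
print(result)
```

In the toy alignment the T/C and G/A columns are transitions and the single gapped column is one indel position. A dinucleotide indel would contribute two gapped columns, so gapped columns would need to be merged into runs to count indel events as the text does.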
Based on the tentative positioning of the 5' end of the 12S rRNA gene, the A+T
rich region of the R1-3 A. pisum clone was determined to be 848 nt in length. The A+T
content of this region is surprisingly lower (88.3%, Table 10) than that of the other three
insects. Sequences similar to those detected in the control regions of mammalian systems
were absent in the A+T rich region of A. pisum. No conserved sequence blocks (CSB 1-3),
termination associated sequences (TAS), or transcriptional promoters similar to the
LSP and HSP were detected. However, a total of three hairpin loop structures were
found, and the one that is immediately adjacent to the tRNAile gene yields a structure
analogous to the stem and loop structure associated with the L-strand origin of replication
of frog, human and mouse (Figure 4). This structure consists of a 26-nt imperfect stem and
a 24-nt terminal loop that includes a hexanucleotide sequence of T's. This loop structure
also contains the pentanucleotide sequence CCAAT identified as the "CAAT box"
transcription factor CTF/NF-1 binding site. Two open reading frames (ORFs) were identified
within the A+T rich region: one 240 nt sequence (80 amino acids) is located between nt
205 and 448 on the primary strand (Figure 3) and the other 243 nt sequence (81 amino
acids) is located on the complementary strand between nt 343 and 589. BLAST search
analysis of the GenBank database revealed no significant similarity between the putative
translated sequences of these two ORFs and any published protein sequences.
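How ORFs can be located on both strands is sketched below in Python. This is an illustration only, not the software used in this study. The names find_orfs and reverse_complement and the toy sequence are hypothetical. The stop codons are taken to be TAA and TAG only, because TGA is read as tryptophan in the insect mitochondrial code; treating both ATG and ATA as possible start codons is an assumption made for the example.

```python
STOPS = {"TAA", "TAG"}    # TGA is read as tryptophan, so it is not a stop here
STARTS = {"ATG", "ATA"}   # assumption: either methionine codon may open an ORF

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, n_codons) for each ORF.

    Coordinates are 1-based on the strand that was scanned; `end` includes
    the stop codon and `n_codons` excludes it.
    """
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in STARTS:
                    start = i
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        orfs.append((strand, start + 1, i + 3, n_codons))
                    start = None
    return orfs

# Hypothetical toy sequence: ATG AAA TGA TTT TAA in frame 3 of the "+" strand.
toy = "CCATGAAATGATTTTAACC"
toy_orfs = find_orfs(toy)
print(toy_orfs)
```

The toy ORF reads through the internal TGA codon. Under the standard code the same frame would stop after two codons, which is why the choice of genetic code matters when ORFs are searched for in mtDNA.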
The A+T rich region of A. pisum was found to contain one primary repetitive
element located adjacent to the 5' end of the 12S rRNA gene (Figure 3). The repeat, similar
in size to that expected from the restriction site data (Barrette et al., 1994), is 123 nt in
length. The R1-3 size variant contains two complete repeat copies and one truncated repeat
(the furthest from the 12S rRNA gene) in which 35 nt have been deleted from the 5' end
(Figure 5). One of the five transitions identified earlier between the two cloned PCR
fragments occurs at the 3' end of the middle repeat. Moreover, a short poly-T stretch
within the repetitive element was found to be five nt long in the r3 repeat (adjacent to the
12S rRNA gene) and six nt in the middle repeat, r2. This poly-T stretch is the location at
which the 35 nt deletion begins in the first repeat. RNA folding analysis of the repetitive
element revealed that a single repetitive element can be folded into a thermodynamically
stable secondary structure with a ΔG° of -8.3 kcal/mol (Figure 6a). The secondary structure
of two elements has a ΔG° of -19.7 kcal/mol (Figure 6b). The only other repetitive
sequence detected in the A+T rich region was a small octanucleotide sequence, 5' TTAAAAAT
3', present in two copies immediately adjacent to the last repeat and also present
within each repetitive unit.
Figure 4. Predicted secondary structure of the primary hairpin loop identified in the
A+T rich region of pea aphid.
Figure 5. Alignment of the three repetitive units identified in the larger size class A+T rich region clone of A. pisum. The delta symbol (Δ)
indicates the location of site differences between the repeating units. The hatched symbol (-) represents an insertion/deletion site and the array
of asterisks (*) depicts the truncated zone of the r1 repeat. Superscripts L and S refer to the large and small size class clones, respectively.
Figure 6. Predicted secondary structures for the repetitive sequence identified in the A+T
rich region of pea aphid: (a) one repeating element (ΔG° = -8.3 kcal/mol) and (b) two
repeating elements (ΔG° = -19.7 kcal/mol).
Comparison of fruit fly, honeybee and mosquito reveals that the protein and
ribosomal RNA gene order is identical among the three insects (Clary and Wolstenholme,
1985; Mitchell et al., 1993; Crozier and Crozier, 1993) and it is therefore highly probable
that the order is the same in A. pisum. Thus far, a total of four pea aphid protein coding
genes (COI, ND4, ND1 and Cyt b) have been partially sequenced and their transcriptional
orientation is identical to that of D. yakuba, A. quadrimaculatus and A. mellifera (Clary and
Wolstenholme, 1985; Mitchell et al., 1993; Crozier and Crozier, 1993). However,
sequence analysis of these four protein encoding genes has detected none of the restriction
sites anticipated on the basis of the restriction and gene order map (Appendix 1e).
Since these partial sequences represent only a fraction of the total gene sequence, and given
that the gene order map may be skewed in its alignment relative to the restriction map
(Barrette et al., 1994), it is not unusual that the restriction sites were not detected.
The tRNA genes are considerably more labile in their positioning than are the other
mtDNA genes. Relative to D. yakuba, eleven tRNAs of A. mellifera and three tRNAs of A.
quadrimaculatus are located in different positions within the mtDNA genome (Crozier and
Crozier, 1993; Mitchell et al., 1993). Hence, description of tRNA positioning in the pea
aphid genome will be limited to the four tRNAs that have been identified by sequence and
secondary structure analysis. The three tRNAs flanking the A+T rich region of pea aphid
(tRNAfmet, tRNAgln and tRNAile) are identical in their location and orientation to those of
D. yakuba and A. quadrimaculatus. Eight nucleotides separate tRNAfmet and tRNAgln in both
A. pisum and D. yakuba, whereas in A. quadrimaculatus these two tRNAs are directly
adjacent to one another. In D. yakuba and A. quadrimaculatus, tRNAile and tRNAgln are
separated by 31 nt and one nt, respectively. However, in A. pisum as with A. gambiae, a
close relative of A. quadrimaculatus, the 3' ends of the tRNAgln and tRNAile genes
overlap by three nt (Clary and Wolstenholme, 1985; Beard et al., 1993; Mitchell et al.,
1993). In honeybee, three additional tRNAs (tRNAglu, tRNAserAGN and tRNAala) occur in
this region and the orientation of some of these genes relative to D. yakuba (Clary and
Wolstenholme, 1985) has changed. These six tRNA genes are separated by either a single
nt or by as many as 49 nt and all are transcribed in the same direction (Crozier and Crozier,
1993). The location and orientation of the tRNAserUCN gene (between the 3' ends of Cyt b
and ND1) is identical in D. yakuba, A. quadrimaculatus, A. mellifera and A. pisum mtDNAs
(Clary and Wolstenholme, 1985; Mitchell et al., 1993; Crozier and Crozier, 1993).
However, with the exception of D. yakuba, the tRNAser gene is separated from the adjacent
genes by as few as two to as many as 45 nt.
As found in other metazoan mitochondrial genomes (Wolstenholme, 1992), the
tRNA genes of A. pisum will likely be dispersed among the various protein coding and
rRNA genes. This distribution is consistent with the tRNA punctuation model proposed by
Ojala et al. (1980, 1981), who suggested that the secondary structure of the tRNA genes
serves as a recognition site for processing the polycistronic mRNA. In situations where the
genes are not separated by tRNAs, a recognition mechanism involving secondary structure
is believed to function as the cleavage site for RNase P-like enzymes. For example, the
junctions between the ATPase 6 and COIII genes (no intergenic nt), between the ND4L and
ND4 genes (separated by 33 nt), and between the ND6 and Cyt b genes (separated by 10
nt) of the crustacean Artemia franciscana are capable of forming hairpin loop
configurations (Valverde et al., 1994). Similar hairpin structures have been reported at the
ATPase 6/COIII and ND4L/ND4 junctions of D. yakuba mtDNA (de Bruijn, 1983; Clary
and Wolstenholme, 1985).
Overlapping coding sequences have been detected in a number of metazoan mtDNA
genomes. Some overlaps occur between the 3' ends of two genes that are encoded on
opposite strands of the molecule. There are also cases of overlap that involve genes
encoded on the same strand. For example, the ATPase 8 gene overlaps with the ATPase 6
gene by between two and 46 nt in vertebrate and higher invertebrate mtDNAs. Similarly,
the ND4L gene overlaps the ND4 gene by between four and seven nt in vertebrate and
some higher invertebrate mtDNAs (Wolstenholme, 1992). Ojala et al. (1981) successfully
isolated mature, bicistronic transcripts of the ATPase 8 and ATPase 6 genes, and of the
ND4L and ND4 genes from HeLa mitochondria. Subsequent protein isolation from bovine
mitochondria recovered fully functional ATPase 6 and 8 proteins that corresponded to the
size and sequence predicted from the overlapping coding regions (Fearnley and Walker,
1986). Therefore, translation of the ATPase 6 protein must involve ribosome binding and
translation initiation within the ATPase 8 coding region (Wolstenholme, 1992). Although
sequencing of pea aphid protein and tRNA coding regions was limited, it is most probable
that pea aphid mtDNA is similar to other insect mtDNA genomes with overlaps occurring
between the ATPase 8 and 6 genes and possibly the ND4L and ND4 genes. However, the 15
nt overlap between the 3' end of the ND1 gene and the tRNAser gene identified in D.
yakuba is absent in A. pisum, A. mellifera and A. quadrimaculatus.
Pea aphid mtDNA shows a compact organization similar to that observed in other
animal mtDNAs. The number of intergenic nt in D. yakuba, although four times that of A.
quadrimaculatus (183 nt and 46 nt, respectively) and considerably less than the 811
detected in A. mellifera, most likely parallels that of the pea aphid. Few intergenic
nucleotides were detected between any of the three tRNA genes found adjacent to the A+T
rich region or at the ND1 and Cyt b junction (Figure 1).
Genetic code and codon usage
The Drosophila mitochondrial genetic code differs only slightly from the standard
genetic code. The codons AGG and AGA specify the amino acid serine in the mitochondrial
genetic code, rather than arginine. ATA specifies methionine instead of isoleucine and TGA
codes for tryptophan rather than acting as a termination codon (Wolstenholme, 1992). However,
until the pea aphid COI, ND4, ND1 and Cyt b proteins are isolated, sequenced and
compared to their corresponding DNA sequences, it will not be possible to address the
issue of coding differences when the Drosophila genetic code is used to predict codon
assignment in A. pisum. In A. mellifera, all but two tRNAs use the same sequence in the
anticodon region as reported in D. yakuba (Crozier and Crozier, 1993). The anticodon
sequence for tRNAlys is CTT in D. yakuba and TTT in A. mellifera, as in Xenopus (Roe et
al., 1985), Gallus (Desjardins and Morais, 1990) and Caenorhabditis (Okiomoto et al.,
1992). GCT is the anticodon sequence for tRNAserAGN in D. yakuba, Xenopus and Gallus
compared to TCT in Apis and Caenorhabditis.
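The effect of these four reassignments on a predicted protein sequence can be shown with a minimal Python sketch. It encodes only the differences listed in the text (AGA and AGG to serine, ATA to methionine, TGA to tryptophan) on top of the standard code. The names STANDARD, MITO and translate and the toy sequence are hypothetical, and the sketch is not claimed to reproduce the analysis software used here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa
            for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Only the reassignments named in the text are applied.
MITO = dict(STANDARD, AGA="S", AGG="S", ATA="M", TGA="W")

def translate(seq, table):
    """Translate complete codons from position 1; trailing bases are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table[seq[i:i + 3]] for i in range(0, usable, 3))

# Hypothetical toy sequence containing each reassigned codon once.
toy = "ATAAGATGAAGG"
print(translate(toy, STANDARD))  # standard reading
print(translate(toy, MITO))      # mitochondrial reading
```

The same twelve nucleotides read as Ile-Arg-stop-Arg under the standard code and as Met-Ser-Trp-Ser under the mitochondrial code, which is why codon assignments predicted for A. pisum depend on the code that is assumed.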
The A+T content of vertebrate mtDNA ranges from 56% to 64% (Gadaleta et al.,
1989) with the base composition being unequally distributed between the two strands. The
mtDNA of insects, including that of pea aphid, is considerably higher in A+T content than
that of other invertebrates and vertebrates. A+T levels range from 84.9% in honeybee
(Crozier and Crozier, 1993) to 77.4% in mosquito (Mitchell et al., 1993) with coding
regions and A+T content being more equally distributed between the two strands. Although
only a limited portion of the pea aphid mitochondrial genome was sequenced, it is quite
evident that its A+T content is similar to that of the honeybee, whose A+T content
represents the highest recorded level for metazoan mtDNA (Tables 2, 3, 4, 5 and 6).
The codons most frequently used are those heavily biased in A+T content, and even
more so those ending with A or T nucleotides. For example, phenylalanine is encoded
by the codons TTT and TTC. In the frog, Xenopus laevis, TTT and TTC occur 125 and
105 times, respectively (Roe et al., 1985), whereas in Drosophila they occur 313 and 17
times (Clary and Wolstenholme, 1985), and 355 and 26 times in Apis (Crozier and
Crozier, 1993). Moreover, excluding termination codons, the percentage of codons ending
in A or T is 93.8% in Drosophila and 95.2% in Apis. Although the sample size is
considerably smaller, the use of TTT and TTC phenylalanine codons in A. pisum is also
heavily skewed: TTT occurs 16 times and TTC occurs only twice. Of the 159 pea aphid
codons screened, 95.7% ended with either A or T, and 56% of the codons consisted
entirely of A and/or T (Table 8). Furthermore, when the relationship between the base
composition of a codon family and the occurrence of the corresponding amino acid is
examined within D. yakuba, A. quadrimaculatus, A. mellifera and A. pisum, there is a
clear bias in favour of amino acids encoded by A+T rich codons (Crozier and Crozier,
1993). The ratio of 0.297 in A. pisum is similar in magnitude to that of A. mellifera, yet
both ratios are considerably smaller than those of either D. yakuba or A. quadrimaculatus.
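The two summary statistics used above, the fraction of codons ending in A or T and the fraction composed entirely of A and/or T, can be computed as in the following Python sketch. The function name codon_at_bias and the toy coding sequence are hypothetical. The sketch assumes an in-frame sequence and, unlike the text, does not exclude termination codons. The last line restates the TTT share among the 18 pea aphid phenylalanine codons reported above.

```python
from collections import Counter

def codon_at_bias(cds):
    """Return (codon usage, fraction ending in A/T, fraction made only of A/T)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    usage = Counter(codons)
    n = len(codons)
    ends_at = sum(c[2] in "AT" for c in codons) / n
    all_at = sum(set(c) <= set("AT") for c in codons) / n
    return usage, ends_at, all_at

# Hypothetical toy sequence: TTT TTC ATT GGA AAA TTA
toy_cds = "TTTTTCATTGGAAAATTA"
usage, ends_at, all_at = codon_at_bias(toy_cds)
print(usage["TTT"], usage["TTC"], round(ends_at, 3), round(all_at, 3))

# Counts taken from the text: TTT occurs 16 times and TTC twice in A. pisum.
ttt_share = 16 / (16 + 2)
print(round(ttt_share, 3))
```

In the toy sequence five of the six codons end in A or T and four consist only of A and T. The reported pea aphid counts give TTT a share of about 89% of the phenylalanine codons.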
The "A+T pressure" model (Jukes and Bhushan, 1986; Jermiin et al., 1994)
proposes that there is an evolutionary tendency toward directional substitution patterns that
result in the accumulation of A and T nucleotides within protein coding regions, and in
particular within synonymous codon sites. Clary and Wolstenholme (1985) postulated
continuous selection for A+T nucleotides at all sites of Drosophila mtDNA where it is
compatible with function. However, the enzymes involved in both transcription and
replication of organisms with A+T rich mtDNA are no more efficient at processing A+T
rich DNA than G+C rich DNA (Lewis et al., 1994). Studies of mtDNA polymerase show
no preferences for dATP or dTTP substrates (Lewis et al., 1994). Furthermore, replication
of mtDNA in both invertebrates and vertebrates is highly accurate with in vitro error rates
of one in every one million bases replicated (Kunkel, 1985; Wernette et al., 1988; Kunkel
and Mosbaugh, 1989). On the other hand, fidelity of DNA polymerase activity is also
known to be affected by in vitro reaction conditions, and in particular by nucleotide pool
biases (Wernette et al., 1988; Olson and Kaguni, 1992). Unfortunately, little is known
about nucleotide pools or their regulation in vivo (Lewis et al., 1994).
Two additional explanations for A+T richness have been proposed by Crozier and
Crozier (1993) and both involve the conversion of GC pairs to AT pairs via alkylation.
O6-methylguanine, generated by the alkylation of guanine, often mispairs with thymine,
leading to the replacement of GC pairs with AT pairs during mtDNA replication (Watson et
al., 1987). It has been demonstrated that alkylating agents capable of producing
O6-methylguanine are endogenously produced in prokaryotic cells (Reebeck and Samson,
1991) and in large amounts in some insect mitochondria (Beard et al., 1993). The second
hypothesis proposes that the mitochondria are relatively inefficient at importing the DNA
repair methyltransferases needed to combat mutation resulting from methylation (Crozier
and Crozier, 1993).
A+T rich region
The A. pisum mtDNA molecule sequenced in this study contains an A+T rich region
of 848 nt which is slightly larger than that of A. mellifera (826 nt) (Crozier and Crozier,
1993) but smaller than that of D. yakuba (1077 nt) (Clary and Wolstenholme, 1985).
However, a 123 nt repetitive element is present within this region in pea aphid and the
clone sequenced in the present study is the largest size class variant (i.e., R1-3; Barrette et
al., 1994). Thus, the expected size of the A+T rich region is 602 bp, 725 bp or 848
bp depending on the number of repeats present. Furthermore, since the 5' domain of the
12S rRNA gene exhibits considerable length and nucleotide sequence divergence among
various metazoan animals (Wolstenholme, 1992; Neefs et al., 1993; Lewis et al., 1995),
determination of the 5' end of the A+T rich region based upon sequence alignment is not
very reliable. Total length is therefore based on the "tentative" placement of the 5' end of
the 12S rRNA gene (Figure 3). A more accurate method to determine the 5' end of the 12S
rRNA gene would involve primer extension mapping, or sequence and secondary structure
analysis of the entire 12S rRNA gene.
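The three expected sizes follow from subtracting whole 123 nt repeat units from the 848 nt region of the largest variant. The Python sketch below restates that arithmetic. The function name expected_sizes is hypothetical, and the assumption that each smaller size class lacks exactly one more complete repeat is the working hypothesis of the text, not a confirmed result.

```python
REPEAT_LENGTH = 123   # nt, length of the repetitive element
LARGEST_REGION = 848  # nt, A+T rich region of the sequenced R1-3 variant

def expected_sizes(largest=LARGEST_REGION, repeat=REPEAT_LENGTH, classes=3):
    """Sizes of the A+T rich region if each class differs by one repeat unit."""
    return sorted(largest - repeat * missing for missing in range(classes))

sizes = expected_sizes()
print(sizes)
```

The result, 602, 725 and 848 bp, matches the values quoted in the text for the three size classes.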
The pea aphid A+T rich region is 88% A+T, which is nearly 10% less than that of A.
mellifera and 6% less than that of A. quadrimaculatus. Two long ORFs encoding 80 and 81
amino acids were located in this region and comparative analysis of protein sequences from
GenBank revealed no substantial full length sequence similarities. However, nine of the
first 14 amino acids of one of the ORFs matched the large subunit of the mRNA capping
enzyme of the variola virus. It is possible that this coding sequence represents an ancestral
gene relic that has since been integrated into the nuclear genome (Brown, 1985).
Comparative analysis of the A+T rich region of A. pisum revealed no apparent
signals for initiation of transcription and replication comparable to those detected in
mammalian systems. Sequences analogous to the conserved sequence blocks are absent in
A. pisum as in all other invertebrates. In vertebrates, a total of three CSBs have been
identified. CSB-1 and CSB-2 are strongly conserved throughout mammals. However, the
third CSB is absent in the platypus, bovine and cetacean control regions (Anderson et al.,
1982; Southern et al., 1988; Hoelzel et al., 1991; Dillon and Wright, 1993; Gemmell et al.,
1996) and therefore its functional importance is questionable. Three primary roles
have been proposed for the CSBs: (1) involvement in the transition from RNA to DNA
synthesis in mtDNA (Chang and Clayton, 1985), (2) relief of supercoiling during H-strand
synthesis (Low et al., 1987), and (3) substrate recognition by RNA processing enzymes
(Chang and Clayton, 1987; Low et al., 1988; Coté and Ruiz-Carrillo, 1993). Thus far,
three endoribonucleases including RNase MRP have been identified that show substrate
specificity for CSB-2 and will cleave in vitro either DNA or RNA at this point (Chang and
Clayton, 1987; Low et al., 1988; Coté and Ruiz-Carrillo, 1993). It has also been
demonstrated that CSB-2 is a binding site for mtSSB identified by Mignotte et al. (1985)
and together with CSB-1 is a binding site for mitochondrial transcription factor MTF
(Gemmell et al., 1996).
Neither termination associated sequences (Doda et al., 1981) nor the light and heavy
strand transcriptional promoters (Chang and Clayton, 1984; 1985) are present in the pea
aphid A+T rich region. The TAS, although moderately conserved, have been identified in a
large number of taxa and are thought to signal the end of D-loop synthesis (Doda et al.,
1981; Mackay et al., 1986; Dunon-Bluteau et al., 1987). In humans, a short region
associated with the LSP and HSP is essential for promoter function (Hixson and Clayton,
1985). Efficient transcription requires the presence of a trans-acting protein termed mtTF1
(Parisi and Clayton, 1991; Fisher et al., 1992), which has been shown to bind 10 to 40 nt
upstream of the transcriptional start sites at each promoter where it wraps and bends the
duplex DNA (Fisher et al., 1992). Mitochondrial RNA polymerase binds at the
transcriptional start site where it is presumed to interact either directly or indirectly with
mtTF1 and begins transcribing the mtDNA encoded genes. Transcription in the mammalian
system is terminated at a site located at the 3' end of the rRNA region directly adjacent to
the tRNAleu gene. It is believed that termination is accomplished through the binding of
protein(s) at a 13 nt sequence located at the 5' end of the tRNAleu gene by a simple
footprinting event (Kruse et al., 1989; Hess et al., 1991). Thus, any further transcription
by mtRNA polymerase is prevented by blockage with a termination protein (Clayton, 1992).
The only structural element detected in the A+T rich region of pea aphid mtDNA that
is similar between both vertebrates and invertebrates is a hairpin structure containing a
T-rich loop. Hairpin structural configurations are of interest because replication of circular
DNA molecules obtained from a variety of sources has been shown to be initiated within or
adjacent to these regions (Schaller, 1978; Tijan, 1978). Replication of the light strand (i.e.,
equivalent to the invertebrate second strand) in mouse, human and frog is initiated within a
short sequence located at the junction of the tRNAcys and tRNAasn genes (Martens and
Clayton, 1979; Tapper and Clayton, 1981; Clayton, 1982; Wong et al., 1983). This short
sequence, capable of forming a highly stable stem and T-rich loop structure, is conserved
among all vertebrates except chicken (Chang et al., 1985; Roe et al., 1985; Desjardins and
Morais, 1990; Clayton, 1992). DNA replication at the OL of vertebrates is known to begin
with the synthesis of an RNA primer starting within a run of T's (11 in mouse and seven in
human) of the T-rich loop (i.e., template strand) and continuing to the RNA/DNA transition
point located within a pentanucleotide sequence (5' CGGCC 3') at the base of the stem
(Wong and Clayton, 1985; Hixson and Clayton, 1985). Although this secondary structure
is absent in chicken and quail, the 5' CGGCC 3' target sequence for RNase H and/or
transition from RNA to DNA synthesis is present in the amino acid acceptor stem of the
tRNAcys gene, suggesting that an alternate secondary structural conformation may be
responsible for the OL of lagomorphs (Desjardins and Morais, 1991). A similar stem and
loop motif is located at the intergenic region between the ND4 and COI genes of two
nematodes (Okiomoto et al., 1992). In C. elegans, the stem and loop structure is 109 nt
long with a loop containing a run of 4 T's whereas in A. suum the structure is 117 nt with
a run of 6 T's within the loop. A stem and T-rich loop structure exhibiting considerable
resemblance to the vertebrate OL has been found within the A+T rich region of D. yakuba,
D. virilis and D. teissieri (Clary and Wolstenholme, 1987; Monnerot et al., 1990). Electron
microscopy studies have mapped the origin of second strand synthesis to an area of the
Drosophila A+T rich region found to contain this hairpin structure (Goddard and
Wolstenholme, 1978; 1980). Furthermore, since the 5'-3' sequences of the stem and loop
structure would be the product of first strand synthesis from this structure proceeding
towards the 12S rRNA gene, it has been suggested that this structure in Drosophila is the
functional equivalent of the vertebrate OL-containing sequence (Clary and Wolstenholme,
1987; Wolstenholme, 1992). In A. pisum, the hairpin structure is located immediately
adjacent to the tRNAile gene whereas in Drosophila, the structure is some 250 nt
downstream from the tRNAile gene. Although the pentanucleotide transition sequence 5'
CGGCC 3' located at the base of the stem of the mammalian OL is absent in A. pisum and
Drosophila, some other nucleotide sequence may function as the transition site. It is
therefore very plausible that this structure is the origin of second strand synthesis in pea
aphid mtDNA. Analysis of secondary structure in the mtDNA genome of A.
quadrimaculatus and A. mellifera has not revealed the presence of the putative second
strand origin stem and T-rich loop structure (Mitchell et al., 1993; Crozier and Crozier,
1993).
The loop portion of A. pisum's putative second strand origin also contains a
pentanucleotide sequence (CCAAT) identified as the "CAAT" box (Efstradiadis et al.,
1980). This sequence has been shown to be important in transcriptional promotion of
mammalian alpha and beta globin genes (Dierks et al., 1983). Subsequent research has
identified CCAAT transcriptional factors (CTF) that bind DNA in a sequence specific
manner and facilitate either the initial binding of RNA polymerase to a promoter, or the
isomerization of the RNA polymerase/promoter complex into an open configuration
(McKnight and Tijan, 1986). However, this pentanucleotide sequence is absent in both D.
yakuba and A. quadrimaculatus mtDNA, but is found once in the A+T rich region of A.
mellifera, although it is not associated with any secondary structure. Moreover, since this
"CCAAT" sequence has not been documented in any other mitochondrial genome, it is
possible that its occurrence in pea aphid and honeybee is a coincidence. Further analysis is
necessary to establish any functional importance of this sequence.
Two other hairpin and loop structures were found downstream of the putative
second strand origin and upstream of the commencement of the 123 nt repetitive sequence
in pea aphid (Figure 3). However, the function of these secondary structures is currently
unknown. Among vertebrate OHs, a secondary structure is found only in humans (Attardi
et al., 1978; Chang et al., 1985). It is therefore conceivable that one of these other hairpin
loops could represent the origin of first strand synthesis in A. pisum. In Drosophila, an
approximately 300 nt conserved sequence element adjacent to the tRNAile gene is involved
in a site specific protein-DNA interaction. This area was shown to be protected from
crosslinking with 4,5',8-trimethylpsoralen in both D. melanogaster (Potter et al., 1980)
and D. virilis (Pardue et al., 1984) and overlaps the origin of DNA replication in the
three species in which it has been mapped by electron microscopy studies (Wolstenholme et
al., 1983). In A. gambiae, a total of seven stem and loop motifs are present in the A+T rich
region. However, no functional relevance has yet been ascribed to any of these structures
(Beard et al., 1993). Nevertheless, in spite of some similarities to other documented origins
of mitochondrial replication, the functional relevance of these hairpin loop motifs in A.
pisum remains tentative until experimental studies confirm their role in replication.
Another sequence motif present in the A+T rich region of the pea aphid that may
have functional relevance is a 49 nt sequence consisting mostly of polypyrimidine runs. It
begins approximately 25 nt downstream from the putative second strand replication origin.
A run of T's, although only 10 nt in length, may be analogous to the poly-T runs identified
in Drosophila (Clary and Wolstenholme, 1987; Monforte et al., 1993; Lewis et al., 1994).
Members of the genus Drosophila have two poly-T tracts at the ends of the A+T rich
region. One poly-T sequence of 19 to 25 nt flanks the 300 nt conserved element and the
other, of 13 to 17 nt, is located on the complementary strand adjacent to the 12S rRNA
gene. The position and orientation of the two conserved poly-T segments present in
Drosophila suggest a role in promoting both transcription and replication (Lewis et al.,
1994). In both nuclear and viral systems, thymidylate homopolymers have been shown to
function as core elements in DNA replication, and as transcriptional activators (Campbell,
1986; Delucia et al., 1986).
Lastly, the area spanning nt 265 to nt 349 of the A. pisum A+T rich region (Figure
3) has a rather elevated G+C content of 40%. This lowers the overall A+T content of
the pea aphid A+T rich region to levels below the values detected in A.
mellifera (96%), A. quadrimaculatus (94%) or D. yakuba (92.8%). Given the extreme A+T
bias observed in A. pisum gene coding regions, it would be highly unlikely that this region
of elevated G+C content is devoid of the "A+T pressure". Therefore, in all likelihood,
some type of functionality (transcriptional promotion?) is associated with this area,
restricting its susceptibility to increased A+T content.
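A local rise in G+C content of this kind is usually found by scanning the sequence in windows. The Python sketch below shows one way to do so. The function names gc_content and windowed_gc, the window and step sizes, and the toy sequence are all hypothetical choices for illustration and are not the parameters used to obtain the 40% value above.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def windowed_gc(seq, window, step):
    """Return (1-based start, G+C fraction) for each full window."""
    return [(i + 1, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]

# Hypothetical toy sequence: an A+T only background with one G+C island.
toy = "ATATATATGCGCATATATAT"
profile = windowed_gc(toy, window=4, step=4)
print(profile)
```

In the toy sequence only the window starting at position 9 has a non-zero G+C fraction. Applied to a real sequence, a smaller step than the window length would give a smoother profile.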
Length variation and heteroplasmy
Attardi's (1985) assertion that the animal mitochondrial genome is "an extremely
economical unit" is a broad generalization that, although based on the limited number of
mtDNA genomes available at the time, is in many facets still accurate. Intense selection for
small size and invariable structure has been a primary driving force throughout mtDNA
evolution (Attardi, 1985; Brown, 1985). The mtDNA of many organisms contains only the
full complement of protein, ribosomal and tRNA coding sequences with few superfluous
noncoding sequences. Coding regions are often directly adjacent to one another, separated
by a few intergenic bases, or in some circumstances, overlapping. However, the number of
fully sequenced mtDNA genomes has greatly increased over the last decade. Numerous
examples of mtDNA genome size variation and heteroplasmy have now been characterized,
making it necessary to revise some of Attardi's original views. The pea aphid is but another
example of the fluidity in mtDNA genome size and heteroplasmy.
Two length variable regions have been identified in the pea aphid mtDNA genome.
Three length variants (1, 2, 3) differing in size by multiples of approximately 120 bp were
found in region one (Barrette et al., 1994). Nearly 80% of the pea aphids screened
contained the largest R1-3 size class (Appendix 1b) and none of the clones were found to
be heteroplasmic. An additional sixteen A. pisum clonal lineages were screened for their
TaqI restriction enzyme pattern (data not shown). One lineage was heteroplasmic at region
one (R1-1 and R1-3) and was subsequently used as template for PCR amplification. DNA
sequence analysis revealed that region one is located within the A+T rich region of the
genome and that two potential mechanisms could be responsible for the size fluctuation. The
first mechanism involves the truncation of three tRNA genes (methionine, glutamine and
isoleucine) and the hairpin loop structure identified in this report as the putative origin of
second strand mtDNA replication. Although the absence of tRNA genes is documented in
the sea anemone M. senile (Wolstenholme, 1992), it is highly unlikely that this is a viable
explanation for pea aphid size variation. Since region 1 comprises three size categories, it
would require the deletion of two tRNAs to account for each size class shift. Different size
classes would therefore have a distinctive complement of tRNA genes. Moreover, the total
size change between the large and small PCR products is nearly 50 bp greater than the
expected 240 bp based on restriction enzyme analysis. Lastly, the small PCR product
begins its sequence alignment with the larger PCR product at a pentanucleotide locale that is
a perfect match for the 3' end of the tRNAmet-lc PCR primer. Therefore, in all likelihood,
the smaller fragment is a PCR mispriming artifact.
The second mechanism is the more plausible explanation for the observed size
variation detected within the pea aphid A+T rich region. This mechanism involves
differences in the copy number of a 123 nt tandem repeat identified downstream from the
putative second strand origin of replication. This locale contains one complete copy of a
123 nt sequence, a second copy of 122 nt in which a T has been deleted from a poly-T
stretch plus a third copy (furthest from the 12S rRNA gene) that is truncated by 35 nt at its
5' end. All available data suggest that length heterogeneity is due to variation in the number
of this repeat sequence, but this cannot be confirmed until the other two (R1-1 & R1-2) size
classes of mtDNA molecules are cloned and sequenced. This has been an extremely
difficult process. Numerous cloning attempts using both PCR-mediated routes or direct
cloning of restriction digested ultrapurified mtDNA have thus far been unsuccessful (data
not shown). An alternative method to cloning and sequencing involves Southern probing.
An oligonucleotide, complementary to the 123 nt repeat, could be 32P end-labelled and then
used for Southern probing of slot blots containing the various pea aphid mtDNA size
classes. Scanning densitometry measures of band intensity would quantify the number of
repeating units present in each size category. Nevertheless, the most logical explanation for
length variation within the A+T rich region is variation in the number of 123 nt repeat
units.
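The arithmetic behind this interpretation can be sketched in a few lines of Python. The copy lengths are those reported for the sequenced variant; the idea that size classes differ only by whole 123 nt copies is the hypothesis being illustrated, not a confirmed result, and all names are illustrative.

```python
# Lengths (nt) of the three repeat copies described for the sequenced variant.
FULL_COPY = 123                    # complete copy
SHORT_COPY = FULL_COPY - 1         # copy lacking one T from a poly-T stretch
TRUNCATED_COPY = FULL_COPY - 35    # copy truncated by 35 nt at its 5' end

ARRAY_LENGTH = FULL_COPY + SHORT_COPY + TRUNCATED_COPY


def expected_shift(copies_changed, unit=FULL_COPY):
    """Length change (nt) if size classes differ only in full-copy number (assumption)."""
    return copies_changed * unit


if __name__ == "__main__":
    print("truncated copy:", TRUNCATED_COPY, "nt")
    print("array length:", ARRAY_LENGTH, "nt")
    print("two-copy shift:", expected_shift(2), "nt")
```

Under this assumption a two-copy difference amounts to 246 nt, close to the roughly 240 bp separating the largest and smallest size classes in the restriction analysis.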
The presence of tandemly repeated motifs in the noncoding region of metazoan
mtDNA is common and has been documented in numerous invertebrates including sea
scallops (Snyder et al., 1987; La Roche et al., 1990; Gjetvaj et al., 1992), cricket (Rand
and Harrison, 1989), bark weevils (Boyce et al., 1989), honeybee (Cornuet et al., 1991),
fruit fly (Monforte et al., 1993; Pissios and Scouras, 1993; Lewis et al., 1994) and
nematodes (Okimoto et al., 1991). Two size variable regions have been identified in the
mtDNA molecule of the bird cherry oat aphid R. padi (Martinez-Torres et al., 1996). A
113 bp tandemly repeated unit present in one to eight copies has been detected within the
A+T rich region. However, unlike the pea aphid, 25% of the clones surveyed were
heteroplasmic for this region. A second length variable locale differing by increments of
about 100 bp is localized to an area close to the ND5 gene. Five size classes have thus far
been detected with only 12% of the 148 surveyed R. padi clones heteroplasmic for this
second locale (Martinez-Torres et al., 1996).
Several molecular mechanisms have been proposed to account for length variation
in mtDNA: (1) transposition (Rand and Harrison, 1989); (2) intra- and intermolecular
recombination (Rand and Harrison, 1989); (3) slippage replication (Streisinger et al., 1966;
Efstratiadis et al., 1980; Levinson and Gutman, 1987; Buroker et al., 1990; Hayasaka et
al., 1991; Wilkinson and Chapman, 1991; Arnason and Rand, 1992). It has been
documented that DNA can be transferred from the nuclear genome to the mitochondria by a
protein (Vestweber and Schatz, 1989), which suggests the possibility of transposition in
mitochondria. The presence of a small tandem repeat flanking a repetitive element (e.g.,
scallops; La Roche et al., 1990) is also suggestive of a transposable element. However, the
absence of an open reading frame of substantial length and of short inverted repeats at the
ends of the repetitive sequence does not support the hypothesis that it is a transferable element
(Calos and Miller, 1980). Furthermore, transposition mechanisms cannot generate
insertions/deletions nor can they explain the fact that adjacent tandem repeats are more
similar to one another than external repeats. Overall, evidence for transposable elements in
animal mtDNA is lacking (Brown, 1985; Moritz et al., 1987).
Rand and Harrison (1989) proposed a mitochondrial recombination model
involving a double molecule intermediate as a possible mechanism for the generation of a
variable number of tandem repeats (VNTR) in cricket mtDNA. Recombination between
repeats in the same molecule would cause the formation of loops containing one or more
repeats. Excision of the loop would create a mtDNA molecule that is shorter than the
original. Recombination between repeats on different molecules could produce daughter
molecules that were either shorter or longer than the parent molecules (Appendix 4a).
Although several gene rearrangements due to sequence inversion and translocation in
metazoan mtDNA have occurred (Wolstenholme and Clary, 1985; Moritz et al., 1987;
Jacobs et al., 1988), recombination had yet to be documented in animal cell mitochondria
(Clayton, 1982; Brown, 1985; Hayashi et al., 1985; Moritz et al., 1987; Birky, 1991).
Hence, models such as those of Rand and Harrison (1989) postulating the generation of
length variation via recombination were believed to be unlikely. However, the minicircle
by-products of recombination have recently been identified in the phytonematode, M.
javanica (Lunt and Hyman, 1997), suggesting that recombination mechanisms may indeed
be involved in the generation of tandem repetitive regions.
The currently favored mechanism for generating repetitive sequence polymorphism
in both plant and animal mtDNA involves the occurrence of insertion and deletion events
via strand slippage during DNA replication (Streisinger et al., 1966; Efstratiadis et al.,
1980; Tautz et al., 1986; Levinson and Gutman, 1987; Buroker et al., 1990; Hayasaka et
al., 1991; Wilkinson and Chapman, 1991; Arnason and Rand, 1992). New and longer
repeat motifs can be generated from the mispairing of short simple contiguous or
noncontiguous di-, tetra-, and hexanucleotide repeat motifs (Levinson and Gutman, 1987)
followed by polymerase- and endonuclease-mediated repair. In the plant Oenothera, a 29
bp duplication present in the chloroplast DNA (cpDNA) was most likely caused by a single
slippage event involving a 6 bp short direct repeat (5' GAAATA 3') present at the 5' end of
the duplication and adjacent to its 3' end (Wolfson et al., 1991; refer to Appendix 4b). Since
the overall in vivo DNA replication rates of mtDNA and cpDNA polymerases are extremely
slow, it has been suggested that much of their time in DNA synthesis is consumed by
pauses (Kornberg, 1980). Hence, when the replication complex pauses while passing
through the region destined to be duplicated, there is a localized melting of the newly
synthesized daughter strand from the template strand. If the sequence GAAATA on the
nascent daughter strand reanneals with the complement of the preceding GAAATA repeat
on the template strand, continuation of the replication complex from the upstream repeat
will create a duplication of the repeat sequence. Realignment of the daughter and template
strands is believed to be stabilized by the formation of a hairpin loop structure whose loop
contains the duplication. The stem portion of this hairpin structure is believed to be formed
between the GAAATA repeat and a 6 nt complementary sequence located just upstream of
the duplication (refer to figure 4, Wolfson et al., 1991). Several length mutation "hot
spots" have been detected at many dispersed locations throughout the Oenothera cpDNA
genome. All of these sites share the trait of multiple direct repeats suggesting that
replication slippage has been of general importance in its evolution (Blasko et al., 1988).
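The slippage event described for Oenothera can be illustrated with a minimal Python sketch. It is a toy model, not the published analysis: sequences are written in the sense of the region being copied (base complementarity is ignored), the flanking bases and the 23 nt spacer are made up, and only the GAAATA repeat and the 29 bp duplication size come from the text.

```python
def slippage_duplication(template, repeat):
    """Return the daughter sequence after one slippage event between two direct repeats.

    Toy model: the nascent strand is represented by the stretch of template it
    has copied so far, so complementarity is not modelled.
    """
    first = template.find(repeat)
    second = template.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("template needs two copies of the direct repeat")
    paused_at = second + len(repeat)   # synthesis pauses after copying the second repeat
    resume_at = first + len(repeat)    # nascent repeat re-pairs with the upstream repeat
    # The stretch between resume_at and paused_at is copied a second time.
    return template[:paused_at] + template[resume_at:]


if __name__ == "__main__":
    repeat = "GAAATA"
    spacer = "CTTGCCATGGTCAACGTCCTAGC"  # hypothetical 23 nt spacer
    template = "ACGT" + repeat + spacer + repeat + "TTGCA"
    daughter = slippage_duplication(template, repeat)
    print(len(daughter) - len(template), "nt gained")
```

With a 6 nt repeat and a 23 nt spacer the daughter strand gains 29 nt and carries three copies of the repeat, which matches the size of the duplication attributed to a single slippage event.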
The illegitimate elongation model (Buroker et al., 1990) was proposed to explain
the generation of the 82 nt RS 1 (i.e., those adjacent to tRNAPro TAS sequences; Hoelzel et
al., 1994) repeat in white sturgeon. Since each repeat was found to contain a TAS
sequence, an individual with multiple copies of the repeat would have several different
lengths of D-loop strands. Length mutations would originate through replication slippage
brought about by a competitive equilibrium between the H-strand and the D-loop strand for
base pairing with the L-strand. Misalignment of the repeat region could easily occur if the
D-loop was partially displaced by the H-strand and then reinvaded at a different repeat
locale. Since one or more single-stranded repeat units can form thermodynamically stable
hairpin loop structures, the likelihood of misalignment is dramatically increased by the
shortening of the length of either the D-loop strand, H-strand or L-strand. If a misaligned
D-loop strand is extended into a nascent H-strand during replication, gains or losses of a
repeat unit would occur. The heteroduplex molecule, with one or more repeat units
unpaired and stabilized by the internal base pairing capability of the repeats, would be
resolved at the next round of replication (Buroker et al., 1990).
Although the competitive displacement model is applicable to repeat sequence units
adjacent to the TAS sequences as found in cod (Johansen et al., 1990), evening bat
(Wilkinson and Chapman, 1991), frog (Roe et al., 1985) and sturgeon (Buroker et al.,
1990), it cannot be generalized for the many other documented cases of length variable
arrays located between the OH and the LSP (RS 3, 4 & 5; Hoelzel et al., 1994). For
example, the repeat motifs of the platypus are located between the major transcriptional
promoters and the OH of the control region (i.e., CSB-2 and the tRNAPhe gene) (Gemmell
et al., 1996). In platypus and other similar cases, the length variation is believed to result
from misalignment errors in the initial RNA priming event that precedes DNA replication.
Misalignment, facilitated by the propensity of repeat units to form stable secondary
structures, could result from the looping out of repeat units. Looping out of repeats in the
template strand would decrease the length of the RNA primer, while looping out of repeats
in the primer strand would result in an increase in the length of the RNA primer. Increases
or decreases in the RNA primer length would be accompanied by a subsequent change in
the overall length of the molecule when replication proceeds to completion (Gemmell et al.,
1996).
Whether the length variants are generated by illegitimate termination (Buroker et al.,
1990) or by illegitimate priming (Ghivizzani et al., 1993; Gemmell et al., 1996), it is quite
evident that misalignment of the strands is promoted by the propensity of repeat sequences
to form thermodynamically stable structures with free energies of formation (ΔG°) that
decrease linearly with repeat number. The secondary structure free energy profiles of the 81
nt repeat within evening bats (Wilkinson and Chapman, 1991) have a ΔG° of -25.4
kcal/mol for 2 repeats, -40.9 for 3 repeats, -51.2 for 4 repeats and -64.4 for 5 direct
repeats.
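The roughly linear trend in these free energies can be checked with an ordinary least-squares fit. The sketch below is illustrative only: it uses the four values quoted above from Wilkinson and Chapman (1991), and the function name is an assumption.

```python
REPEATS = [2, 3, 4, 5]
DELTA_G = [-25.4, -40.9, -51.2, -64.4]  # kcal/mol, values quoted in the text


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    slope, intercept = least_squares_line(REPEATS, DELTA_G)
    print(f"slope = {slope:.2f} kcal/mol per repeat, intercept = {intercept:.2f} kcal/mol")
```

For these four points the fitted slope is about -12.7 kcal/mol per additional repeat, so each added copy stabilizes the predicted structure by a similar amount.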
Analyses of nucleotide sequence polymorphisms within repetitive arrays have often
revealed a pattern of concerted evolution (Rand, 1994). Homogeneity of repeat sequences
within individuals but heterogeneity between individuals or different species is indicative of
this evolutionary process (see Solignac et al., 1986; Wilkinson and Chapman, 1991;
Broughton and Dowling, 1994; Rand, 1994; Stewart and Baker, 1994a, 1994b; Yang et
al., 1994; Fumagalli et al., 1996). Restriction site analyses from both the D. melanogaster
groups (Solignac et al., 1986) and the D. obscura groups (Monforte et al., 1993) have
revealed that the conserved and variable halves of the A+T rich region evolve in concert as
a result of duplication and deletion events involving the two types of repeated elements
(Lewis et al., 1994). New polymorphisms are generated by either point mutation or
insertions/deletions, then swept through the contiguous array of repeats by repeated cycles
of expansion and contraction of the region leading eventually to the homogenization of
repeats (Fumagalli et al., 1996).
The "edge effect" (Rand, 1994) in concerted evolution has been defined to describe
the situation in which the repeats at the ends of a tandem array are more deviant from the
consensus repeat sequence than are repeats in the middle of the array. Moreover, repeats at
the 3' end of the array not only show higher divergence than repeats at the 5' end, but most
often possess a single "imperfect" copy or a highly divergent copy of the repetitive element
at the 3' end of the array that rarely undergoes duplication/deletion events (Fumagalli et al.,
1996). This pattern of variation has been observed in the mtDNA of elephant seals (Hoelzel
et al., 1993), rabbit (Mignotte et al., 1990), several species of carnivores (Hoelzel et al.,
1994) and shrews (Fumagalli et al., 1996).
The orientation of the pea aphid tandem array is such that the truncated repeat is at
the end of the array, distal to the 12S rRNA gene. The three R1 size classes identified
(Barrette et al., 1994) differ only by increments of approximately 120 nt with no size class
differing by 88 nt (truncated repeat). Thus, this truncated repeat may be the 3' terminal
repeat, suggesting that the region 1 repetitive array is the result of a replication slippage-like
mechanism originating during replication of the second strand. Although the two full
version repeats differ by a single A↔G transition and a single T insertion/deletion, it
will not be possible to address the issue of concerted evolution until a number of pea aphid
clonal lineages are sequenced for the A+T rich region. This sequence information will also
be valuable in assessing the likelihood that a replication slippage-like process is the primary
mechanism by which this repetitive array was generated.
Further evidence favoring replication slippage as the mechanism for generating the
size variation in pea aphid mtDNA is obtained from the secondary structure analysis of the
123 bp repetitive element (Figure 6 a,b). The formation of the secondary structures,
although not a prerequisite of slippage-like mechanisms, enhances the rate at which this
process occurs in both plasmid and bacteriophage genomes (Pierce et al., 1991; Trinh and
Sinden, 1991). The predicted secondary structure of the single 123 nt repeat is
thermodynamically quite stable and stability is more than doubled with two repeating
elements. Since the polymerization rate of mtDNA polymerase is extremely slow (~270
nt/minute) and DNA synthesis is delayed by numerous pauses (Kornberg, 1980), there
could be sufficient time for the repeat(s) to fold into a secondary conformation. If the
misalignment occurred in the nascent strand, slippage would result in duplication whereas a
deletion would result if the template strand repeat(s) folded. Two octanucleotide repeat
sequences, separated by 8 nt, are adjacent to the r3 repeat nearest the 12S rRNA gene
(Figure 3). This same octanucleotide, although absent from the remainder of the A+T rich
region, is present one time in each of the three repeats and could have served as anchor
points for slippage misalignment processes. Levinson and Gutman (1987) suggest that new
and longer repeat motifs can be generated from the mispairing of short contiguous or
noncontiguous di-, tetra-, hexanucleotide repeat elements. Therefore, it is plausible that this 5'
TTAAAAAT 3' sequence could have been the original starting point from which larger
repeating units evolved until some threshold or optimum was reached, possibly based on
secondary structure, and the pea aphid 123 nt repeating unit was generated.
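Locating candidate anchor points of this kind is a simple motif search. The following Python sketch is illustrative: the function names and the toy sequence are made up, and only the octanucleotide TTAAAAAT and the 8 nt spacing come from the text.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) motif occurrence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def gaps_between(positions, motif_length):
    """Return the number of bases separating consecutive motif occurrences."""
    return [b - (a + motif_length) for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    motif = "TTAAAAAT"
    toy = "GC" + motif + "CGCGCGCG" + motif + "AA"  # hypothetical sequence
    hits = find_motif(toy, motif)
    print(hits, gaps_between(hits, len(motif)))
```

Applied to the real A+T rich region sequence, the same search would report where the octanucleotide falls relative to each repeat copy.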
Lastly, support for slippage-based models is provided by the fact that propagation
of some cloned tandemly repeated sequences in Escherichia coli has generated both losses
and gains of units in the tandem array (Ghivizzani et al., 1993). Madsen and colleagues
(1993) analyzed a repeat domain present in the mitochondrial genome of pigs. This
domain, located at the 5' end of the D-loop, is composed of 14 to 29 copies of a 10 bp
self-complementary tandemly repeated sequence. Initial characterization of a heteroplasmic
individual revealed that most variants differed in length by two (20 bp) repeating units.
However, after propagation of the recombinant plasmids containing the repetitive domain in
E. coli, new variants differing in length by multiples of one (10 bp) repeat unit were
generated. To test the hypothesis that the variants were induced by slippage, Madsen et al.
(1993) performed a series of in vitro primer extension experiments on both single and
double stranded templates containing the repeat domain. They were able to correlate repeats
generated in vitro with those seen in both the mitochondria and bacteria, thus providing
strong evidence that replication slippage is responsible for a major class of mammalian
mutations (Madsen et al., 1993).
Restriction site analysis of pea aphid mtDNA revealed a second region of length
heterogeneity (Barrette et al., 1994). This region 2 length variation, localized to an area
spanning the ND3 and ND5 genes, was found to contain six size class variants
(1, 2, 3, 4, 5, 6) differing by increments of 210 bp. Numerous attempts to PCR amplify
region 2 for cloning experiments using ND4 primers [#10 & #12] in combination with
COIII primers [#5 & #3] (Table 8; Barrette et al., 1994) were unsuccessful. Barrette et al.
(1994) suggested that this may be due to the inversion of the ND4 gene. Although this
possibility cannot be fully dismissed until this region is completely sequenced, partial
sequencing of a small portion of the ND4 gene amplified with primers #12 & #11 has
suggested it is in the same orientation as the ND4 gene of D. yakuba. Moreover, none of
the fully sequenced insect mtDNA genomes exhibit protein-coding gene order changes. All
modifications with respect to gene order have been due to reshuffling a small number of
tRNA genes (Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Mitchell et al.,
1993; Beard et al., 1993).
Numerous attempts to amplify region 2 using the Expand™ PCR system
(Boehringer) and various primer combinations were also unsuccessful. It is possible that
this failure is the result of intense secondary structure(s), possibly associated with the
repeating units, that prevent the Taq polymerase enzyme from proceeding unimpeded along
the pea aphid DNA template.
An attempt was also made to clone fragments of pea aphid mtDNA generated with
restriction enzymes. Plasmid recombinants containing other regions of the pea aphid
mtDNA molecule were obtained in these experiments. However, no inserts containing
region two were recovered. Some of the plasmid recombinants containing inserts of the
size expected from restriction site mapping were sequenced and found to be extremely A+T
rich, with numerous poly-A and poly-T runs (data not shown). A search of the GenBank DNA
database revealed no sequences with substantial similarity to the ones obtained from the
pea aphid clones. It is possible that some feature(s) of region two of the pea aphid mtDNA
molecule, such as the secondary structures suggested above, destabilizes the fragment of
interest, rendering it unstable in recombinant plasmids and thus making it difficult, if not
impossible, to clone.
The most plausible explanation for size variation within region two is the presence
of a tandemly repeated element of approximately 210 nt in length. The largest size class
(R2-6) will likely contain a minimum of six copies of the approximately 210 nt repeat motif
with each successively smaller size class containing one less repetitive unit. Each individual
unit will likely be capable of forming a stable self-annealed secondary structure, with
increases in the length of the tandem array leading to proportionally larger increases in the
thermodynamic stability of the array.
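The predicted relationship between size class and repeat content can be written out as a short Python sketch. It encodes only the hypothesis stated above (a repeat of roughly 210 nt, with class R2-k carrying at least k copies); the names are illustrative and nothing here has been confirmed by sequencing.

```python
REPEAT_NT = 210             # approximate repeat length proposed for region two
SIZE_CLASSES = range(1, 7)  # R2-1 through R2-6


def minimum_copies(size_class):
    """Minimum repeat copies predicted for a size class (hypothesis: R2-k holds k copies)."""
    return size_class


def extra_length(size_class, repeat_nt=REPEAT_NT):
    """Predicted length (nt) of a size class relative to the smallest class, R2-1."""
    return (minimum_copies(size_class) - 1) * repeat_nt


if __name__ == "__main__":
    for k in SIZE_CLASSES:
        print(f"R2-{k}: >= {minimum_copies(k)} copies, +{extra_length(k)} nt vs R2-1")
```

Under this hypothesis the largest and smallest region two variants would differ by about 1050 nt.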
The distribution of mtDNA size variation is governed by three primary evolutionary
forces (mutation, selection and genetic drift) and can be viewed as an equilibrium between
the forces that generate the variation (slippage or recombination) and the forces that reduce
it (selection and drift) (Rand, 1993). The observed patterns of A. pisum length variation
and heteroplasmy are considerably different between region one and region two (Barrette et
al., 1994). Virtually all pea aphid clonal lineages were homoplasmic for region one while
13 of 35 clones were heteroplasmic at region two. Temporal stability of region one and
region two was assessed by isolating mtDNA from four heteroplasmic clones after a two
year interval (i.e., approximately 30 generations) of laboratory culture. Each of these
clones maintained their phenotype at region one and also sustained a stable pattern of
heteroplasmy at region two (Appendix 1d; Barrette et al., 1994). However, scanning
densitometry measures of band intensity revealed a substantial change in the relative
proportion of region two variants within one clonal lineage.
The primary events involving the generation of length variation take place in the
female germ line and involve mtDNA replication and the partitioning of mitochondria at
cytokinesis (Solignac et al., 1987). If at the time of mtDNA replication an error generates a
size change in repeat copy number, a mixed or heteroplasmic population of size variants
will have been produced. Heteroplasmy is an obligatory transitory stage that is not
indefinitely maintained and can eventually lead to a homoplasmic cell line that contains the
DNA mutation. The levels of heteroplasmy for length variants can be explained (Birky et
al., 1983) by mutation rates (insertions/deletions) relative to the mean time required to
eliminate the so-generated diversity through vegetative segregation in the germ line. The
effective population size of mitochondria through the germ line is likely to be species-specific
since it is affected by the number of cell divisions per generation and the number of
mitochondria transmitted to daughter cells (Rand and Harrison, 1986). In D. melanogaster,
development of the germ line begins with the individualization of 10 to 18 pole cells with
two to three divisions giving rise to forty to sixty pole cells. It is believed that
approximately eight of these cells migrate to the presumptive gonads where they begin a
second round of three or four divisions leading to the stem cells of the ovarioles. The
number of stem cells varies from one to five per ovariole. Hence, there are from seven to
nine random samplings of mitochondria from the original egg to the first offspring
(Solignac et al., 1984, 1987) and therefore numerous opportunities to sort out any
mutations that have occurred within the parent by stochastic partitioning.
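The effect of repeated random sampling on heteroplasmy can be explored with a small simulation. The Python sketch below is a simplified binomial sampling model, not a model taken from the cited studies: the number of mitochondria drawn at each sampling, the starting frequency and the function names are all assumptions chosen for illustration.

```python
import random


def segregate(freq, n_sampled, n_samplings, rng):
    """Follow a variant's frequency through successive random samplings of mitochondria."""
    for _ in range(n_samplings):
        carriers = sum(1 for _ in range(n_sampled) if rng.random() < freq)
        freq = carriers / n_sampled
    return freq


def fraction_homoplasmic(start_freq, n_sampled, n_samplings, n_lineages, seed=0):
    """Fraction of simulated lineages that end up fixed for one variant or the other."""
    rng = random.Random(seed)
    finals = [segregate(start_freq, n_sampled, n_samplings, rng) for _ in range(n_lineages)]
    return sum(1 for f in finals if f in (0.0, 1.0)) / n_lineages


if __name__ == "__main__":
    # Hypothetical numbers: 20 mitochondria per sampling, 8 samplings per generation.
    print(fraction_homoplasmic(start_freq=0.5, n_sampled=20, n_samplings=8, n_lineages=1000))
```

Once a lineage reaches a frequency of 0 or 1 it stays there, which is the sense in which stochastic partitioning sorts heteroplasmic lineages into homoplasmic ones.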
MtDNA size variants are also not usually viewed as neutral (Hale and Singh, 1986;
Rand and Harrison, 1986; MacRae and Anderson, 1988; Wallace, 1989) and it has often
been argued that selection acts to keep the molecule compact as has been reported in D.
mauritiana (Solignac et al., 1984) and in the cricket G. firmus (Rand and Harrison, 1986).
Studies of the inheritance of heteroplasmic mtDNA from mother to offspring suggest that in
animals, smaller mtDNAs can have an advantage in transmission over longer mtDNAs
(Solignac et al., 1984, 1987; Rand and Harrison, 1986). Wild-type strains of D.
melanogaster exhibit a skewed distribution pattern of length variants in a manner
suggesting selection for smaller mtDNAs (Hale and Singh, 1986, 1991). Observations
such as these suggest that selection for small size has been an important force in the
evolution of the mitochondrial genome (Wallace, 1982; Attardi, 1985; Rand and Harrison,
1986, 1989). In animal mtDNA replication, the synthesis of a 16 kb daughter strand takes
about one hour to accomplish (Clayton, 1982). This represents a polymerization rate of
approximately 270 nucleotides per minute. In a heteroplasmic cell, a mitochondrial genome
that is a few hundred nucleotides shorter than another mtDNA could have a temporal
advantage in a "race for replication" (Rand, 1993). Given that mtDNA molecules can be
selected at random throughout the cell cycle (Bogenhagen and Clayton, 1977), this could
effectively lead to an increase and eventual fixation of smaller mtDNAs in the cell (Rand,
1993). Although the larger mtDNA may be at a temporal disadvantage in the replication
race with smaller mtDNA, selective advantages for larger mitochondrial genomes could
come from additional repeated sequences that provide "attractive" conformations for more
efficient binding of mtDNA polymerase (Rand, 1993). This could lead to an advantage in
the initiation of replication that could, in turn, outweigh the disadvantage associated with
polymerizing additional nucleotides.
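The size of the temporal advantage in this "race for replication" follows directly from the figures quoted above. The Python sketch below is a back-of-the-envelope calculation that assumes a constant polymerization rate with no pauses; the function name and the example length differences (one 123 nt or one 210 nt repeat) are illustrative.

```python
GENOME_NT = 16_000       # approximate daughter strand length (Clayton, 1982)
REPLICATION_MIN = 60     # approximate time to synthesize it, in minutes

RATE_NT_PER_MIN = GENOME_NT / REPLICATION_MIN  # roughly 270 nt per minute


def head_start_seconds(length_difference_nt, rate_nt_per_min=RATE_NT_PER_MIN):
    """Time saved (s) by a genome that is shorter by the given number of nucleotides."""
    return length_difference_nt / rate_nt_per_min * 60


if __name__ == "__main__":
    print(f"rate: {RATE_NT_PER_MIN:.0f} nt/min")
    for diff in (123, 210):
        print(f"{diff} nt shorter: {head_start_seconds(diff):.1f} s head start")
```

At this rate a genome lacking one 210 nt repeat would finish about 47 seconds sooner, a small margin compared with the hour needed to replicate the whole molecule.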
One explanation for the observed pattern of pea aphid mtDNA size heteroplasmy in
region two and its virtual absence in region one, coupled with the observed temporal
changes detected in region two, is that region two is considerably more mutationally active
than is region one. The mutation rate for indels has been estimated at 10^-4 for crickets
(Rand and Harrison, 1986) and 10^-2 for bats (Wilkinson and Chapman, 1991). It would
be reasonable to assume that the mutation rate of pea aphid region two must be somewhere
in this range while the rate of region one would be considerably lower than the 10^-4
estimated for crickets. Until the molecular basis for region two length variation is
characterized through DNA sequence analysis, one can only postulate that some feature,
possibly one involving the conformational state of the repetitive element, enables region
two repeat(s) to fold much more quickly or effectively than the region one repeats.
Alternatively, replication could be interrupted somewhere within the region two domain
leaving the template and nascent strands single stranded and the repeated sequences with
ample time to fold on themselves, thus providing the opportunity for the occurrence of
slippage events. Another explanation could involve the intensity of selection acting on the
length variants between the two regions. As discussed earlier, Rand (1993) has suggested
that additional repeat sequences may provide a more appealing conformational binding site
for the mtDNA polymerase enzyme. This conformational "attractiveness" could be
sufficient to outcompete the ill effects associated with increased mtDNA length in the race
for replication. This could possibly explain the observation that the most prevalent size
class for both region one and two in A. pisum mtDNA is not the smallest size variant but
the largest and median, respectively.
This study has begun to characterize the intricacies of another insect mtDNA
genome. Although the amount of pea aphid mtDNA sequence data is limited, it is evident
that this genome is comparable in nature to the other fully sequenced insect mitochondrial
genomes. It appears to be very similar to Drosophila, Apis and Anopheles in not only gene
organization, but also A+T nucleotide bias. It is quite possible that the A+T content of the
pea aphid mtDNA genome will parallel or surpass that of the highest A+T content
measured to date, the honeybee. The A+T rich region is the proposed control centre for
both transcription and replication events of insects. As detected in Drosophila, a T-rich
loop and hairpin structure, analogous to the origin of light strand replication of
mammals, is present in pea aphid. However, unlike the extensive studies characterizing the
mechanics of vertebrate control region replication and transcription, little work of this
nature has thus far been pursued within insects. Comparative analysis of various insect
A+T rich regions has shown little to no sequence similarity to vertebrates. It has therefore
been suggested that the initiation signals and quite possibly the overall mechanics of
replicational and transcriptional events of insects, including those of A. pisum, are
different from those of vertebrates. Two length variable regions are present in pea aphid
mtDNA. Region one has been sequenced and is likely the result of a 123 nt tandem repeat.
However, this will only be confirmed once a number of region one size class variants have
been sequenced. Region two has thus far eluded all attempts at cloning and subsequent
sequencing.
Given the short generation time of pea aphids and the ease with which pea aphids
can be cultured, obtaining sufficient quantities of mtDNA is relatively simple. Therefore,
the pea aphid can be quite useful as an insect model system for characterizing the molecular
mechanics involved in not only transcription and replication, but also those involved in
generating length variation.
Aloni, Y., and G. Attardi. 1971. Symmetrical in vivo transcription of mitochondrial
DNA in HeLa cells. Proc. Natl. Acad. Sci. USA 68: 1957-1961.
Anderson, S., M.H.L. DeBruijn, A.R. Coulson, I.C. Eperon, F. Sanger,
and I.G. Young. 1982. Complete sequence of bovine mitochondrial DNA. J. Mol.
Biol. 156: 683-717.
Arnason, U., and E. Johnsson. 1992. The complete mitochondrial DNA sequence
of the Harbour Seal, Phoca vitulina. J. Mol. Evol. 34: 493-505.
Arnason, E., and D.M. Rand. 1992. Heteroplasmy of short tandem repeats in
mitochondrial DNA of Atlantic cod, Gadus morhua. Genetics 132: 211-220.
Attardi, G. 1985. Animal mitochondrial DNA: an extreme example of genetic economy.
Int. Rev. Cytol. 93: 93-145.
Attardi, G., S.T. Crews, J. Nishigushi, D.K. Ojala, and J.W. Posakony.
1978. Nucleotide sequence of a fragment of HeLa-cell mitochondrial DNA containing the
precisely localized origin of replication. Cold Spring Harbor Symp. Quant. Biol. 43: 179-192.
Azevedo, J.L.B., and B.C. Hyman. 1993. Molecular characterization of lengthy
mitochondrial DNA duplications from the parasitic nematode Romanomermis culicivorax.
Genetics 133: 933-942.
Barrette, R.J., T.J. Crease, P.D.N. Hebert, and S. Via. 1994. Mitochondrial
DNA diversity in the pea aphid Acyrthosiphon pisum. Genome 37: 858-865.
Beard, C.B., D.M. Hamm, and F.H. Collins. 1993. The mitochondrial genome
of the mosquito Anopheles gambiae: DNA sequence, genome organization, and
comparisons with mitochondrial sequences of other insects. Insect Mol. Biol. 2: 103-114.
Birky, C.W. 1991. Evolution and population genetics of organelle genes: mechanisms
and models. Pp. 112-134 in R.K. Selander, A.G. Clark, and T.S. Whittam, eds.
Evolution at the molecular level. Sinauer, Sunderland, Mass.
Birky, C.W., T. Maruyama, and P. Fuerst. 1983. An approach to population and
evolutionary genetic theory for genes in mitochondria and chloroplast, and some results.
Genetics 103: 513-527.
Blackman, R.L., and V.S. Eastop. 1994. Aphids on the world's trees: an
identification and information guide. CAB International, University Press, Cambridge.
Blasko, K., S.A. Kaplan, K.G. Higgins, R. Wolfson, and B.B. Sears.
1988. Variation in copy number of a 24-base pair tandem repeat in the chloroplast DNA of
Oenothera hookeri strain Johansen. Curr. Genet. 14: 287-292.
Bogenhagen, D., and D.A. Clayton. 1977. Mouse L cell mitochondrial DNA
molecules are selected randomly for replication throughout the cell cycle. Cell 11: 719-727.
Boursot, P., H. Yonekawa, and F. Bonhomme. 1987. Heteroplasmy in mice
with deletion of a large coding region of mitochondrial DNA. Mol. Biol. Evol. 4: 46-55.
Boyce, T.M., M.E. Zwick, and C.F. Aquadro. 1989. Mitochondrial DNA in the
bark weevils: size structure and heteroplasmy. Genetics 123: 825-836.
Broughton, R.E., and T.E. Dowling. 1994. Length variation in the mitochondrial
DNA of the minnow Cyprinella spiloptera. Genetics 138: 179-180.
Brown, G.G., and L.J. Des Rosiers. 1983. Rat mitochondrial DNA
polymorphisms: sequence analysis of a hypervariable site for insertions or deletions.
Nucleic Acids Res. 11: 6699-6708.
Brown, G.G., G. Gadaleta, G. Pepe, C. Saccone, and E. Sbisà. 1986.
Structural conservation and variation in the D-loop containing region of vertebrate
mitochondrial DNA. J. Mol. Biol. 192: 503-511.
Brown, W.M. 1985. The mitochondrial genome of animals. In: MacIntyre R.J. (ed)
Molecular evolutionary genetics. Plenum, New York, 95-130.
Buroker, N.E., J.R. Brown, T.A. Gilbert, P.J. O'Hara, A.T. Beckenbach,
W.K. Thomas, and M.J. Smith. 1990. Length heteroplasmy of sturgeon
mitochondrial DNA: an illegitimate elongation model. Genetics 124: 157-163.
Calos, M.P., and J.H. Miller. 1980. Transposable elements. Cell 20: 579-595.
Campbell, J.L. 1986. Eukaryotic DNA replication. Ann. Rev. Biochem. 55: 733-771.
Cantatore, P., M. Roberti, G. Rainaldi, M.N. Gadaleta, and C. Saccone.
1989. The complete nucleotide sequence, gene organization, and genetic code of the
mitochondrial genome of Paracentrotus lividus. J. Biol. Chem. 264: 10965-10975.
Cantatore, P., and C. Saccone. 1987. Organization, structure and evolution of
mammalian mitochondrial genes. Int. Rev. Cytol. 108: 149-207.
Chang, D.D., and D.A. Clayton. 1984. Precise identification of individual
promoters for transcription of each strand of human mitochondrial DNA. Cell 36: 635-643.
Chang, D.D., and D.A. Clayton. 1985. Priming of human mitochondrial DNA
replication occurs at the light strand promoter. Proc. Natl. Acad. Sci. USA 82: 351-355.
Chang, D.D., and D.A. Clayton. 1987. A mammalian mitochondrial RNA
processing activity contains nucleus-encoded RNA. Science 235: 1178-1184.
Chang, D.D., W.W. Hauswirth, and D.A. Clayton. 1985. Replication priming
and transcription initiate from precisely the same site in mouse mitochondrial DNA. EMBO
J. 4: 1559-1567.
Chomyn, A., and G. Attardi. 1987. Mitochondrial gene products. Curr. Top.
Bioenerg. 15: 295-329.
Clary, D.O., and D.R. Wolstenholme. 1985. The mitochondrial molecule of
Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol.
Evol. 22: 252-271.
Clary, D.O., and D.R. Wolstenholme. 1987. Drosophila mitochondrial DNA:
conserved sequences in the A+T-rich region and supporting evidence for a secondary
structure model of the small ribosomal RNA. J. Mol. Evol. 25: 116-125.
Clayton, D.A. 1982. Replication of animal mitochondrial DNA. Cell 28: 693-705.
Clayton, D.A. 1984. Transcription of the mammalian mitochondrial genome. Ann. Rev.
Biochem. 53: 573-594.
Clayton, D.A. 1991a. Replication and transcription of vertebrate mitochondrial DNA.
Ann. Rev. Cell. Biol. 7: 453-478.
Clayton, D.A. 1991b. Nuclear gadgets in mitochondrial DNA replication and
transcription. Trends Biochem. Sci. 16: 107-111.
Clayton, D.A. 1992. Transcription and replication of animal mitochondrial DNAs. Int.
Rev. Cytol. 141: 217-232.
Coote, J.L., G. Szabados, and T.S. Work. 1979. The heterogeneity of
mitochondrial DNA in different tissues from the same animal. FEBS Lett. 99: 255-260.
Cornuet, J.M., L. Garnery, and M. Solignac. 1991. Putative origin and function
of the intergenic region between COI and COII of Apis mellifera L. mitochondrial DNA.
Genetics 133: 393-403.
Coté, J., and A.C. Ruiz-Carrillo. 1993. Primers for mitochondrial DNA replication
generated by endonuclease G. Science 261: 765-769.
Crozier, R.H., and Y.C. Crozier. 1993. The mitochondrial genome of the honeybee
Apis mellifera: complete sequence and genome organization. Genetics 133: 97-117.
Dawid, I.B., and A.W. Blackler. 1972. Maternal and cytoplasmic inheritance of
mitochondrial DNA in Xenopus. Dev. Biol. 29: 152-161.
de Bruijn, M.H.L. 1983. Drosophila melanogaster mitochondrial DNA, a novel
organization and genetic code. Nature 304: 234-241.
Delucia, A.L., D. Sumitra, K. Partin, and P. Tegtmeyer. 1986. Functional
interactions of the simian virus 40 core origin of replication with flanking regulatory
sequences. J. Virol. 57: 138-144.
Desjardins, P., and R. Morais. 1990. Sequence and gene organization of the
chicken mitochondrial genome. J. Mol. Biol. 212: 599-634.
Desjardins, P., and R. Morais. 1991. Nucleotide sequence and evolution of coding
and noncoding regions of a quail mitochondrial genome. J. Mol. Evol. 32: 153-161.
Dierks, P., A. van Ooyen, M.D. Cochran, C. Dobkin, J. Reiser, and C.
Weissmann. 1983. Three regions upstream from the cap site are required for efficient
and accurate transcription of the rabbit beta-globin gene in mouse 3T6 cells. Cell 32: 695-706.
Dillon, M.C., and J.M. Wright. 1993. Nucleotide-sequence of the D-loop region of
the sperm whale (Physeter macrocephalus) mitochondrial genome. Mol. Biol. Evol. 10:
Doda, J.A., C.T. Wright, and D.A. Clayton. 1981. Elongation of displacement-loop
strands in human and mouse mitochondrial DNA is arrested near specific template
sequences. Proc. Natl. Acad. Sci. USA 78: 6116-6120.
Dunon-Bluteau, D.C., and G.M. Brun. 1987. Mapping at the nucleotide level of
Xenopus laevis mitochondrial D-loop H strand: structural features of the 3' region.
Biochem Int. 14: 643-657.
Efstratiadis, A., J.W. Posakony, T. Maniatis, R.M. Lawn, C. O'Connell,
R.A. Spritz, J.K. De Riel, B.G. Forget, S.W. Weissman, J.L. Slightom,
A.E. Blechl, O. Smithies, F.E. Baralle, C.C. Shoulders, and N.J.
Proudfoot. 1980. The structure and evolution of the human beta-globin gene family. Cell
21: 653-668.
Fauron, C.M.R., and D.R. Wolstenholme. 1976. Structural heterogeneity of
mitochondrial DNA molecules within the genus Drosophila. Proc. Natl. Acad. Sci. USA
73: 3623-3627.
Fearnley, I.M., and J.E. Walker. 1986. Two overlapping genes in bovine
mitochondrial DNA encode membrane components of ATP synthase. EMBO J. 5: 2003-2008.
Fisher, C., and D.O.F. Skibinski. 1990. Sex-biased mitochondrial DNA
heteroplasmy in the marine mussel Mytilus. Proc. R. Soc. Lond. 242: 149-156.
Fisher, R.P., T. Lisowsky, M.A. Parisi, and D.A. Clayton. 1992. DNA
wrapping and bending by a mitochondrial high mobility group-like transcriptional activator
protein. J. Biol. Chem. 267: 3358-3367.
Fisher, R.P., J.N. Topper, and D.A. Clayton. 1987. Promoter selection in
human mitochondria involves binding of a transcription factor to orientation-independent
upstream regulatory elements. Cell 50: 247-258.
Fumagalli, L., P. Taberlet, L. Favre, and J. Hausser. 1996. Origin and
evolution of homologous repeated sequences in the mitochondrial DNA control region of
shrews. Mol. Biol. Evol. 13: 31-46.
Gadaleta, G., G. Pepe, G. DeCandia, C. Quagliariello, E. Sbisa, and C.
Saccone. 1989. The complete nucleotide sequence of the Rattus norvegicus mitochondrial
genome: cryptic signals revealed by comparative analysis between vertebrates. J. Mol.
Evol. 28: 497-516.
Gemmell, N.J., P.S. Western, J.M. Watson, and J.A. Marshall Graves.
1996. Evolution of the mammalian mitochondrial control region: comparisons of the
control region sequences between monotreme and therian mammals. Mol. Biol. Evol. 13:
Ghivizzani, S.C., S.L.D. Mackay, C.S. Madsen, P.J. Laipis, and W.W.
Hauswirth. 1993. Transcribed heteroplasmic repeated sequences in the porcine
mitochondrial DNA D-loop region. J. Mol. Evol. 37: 36-47.
Gilbert, D.G. 1990. LoopViewer, a Macintosh program for visualizing RNA secondary
structure. Published electronically on the Internet, available via anonymous ftp to
Gjetvaj, B., D.I. Cook, and E. Zouros. 1992. Repeated sequences and large-scale
size variation of mitochondrial DNA: a common feature among scallops (Bivalvia:
Pectinidae). Mol. Biol. Evol. 9: 106-124.
Goddard, J.M., and D.R. Wolstenholme. 1978. Origin and direction of replication
in mitochondrial DNA molecules from Drosophila melanogaster. Proc. Natl. Acad. Sci.
USA 76: 3886-3890.
Goddard, J.M., and D.R. Wolstenholme. 1980. Origin and direction of replication
in mitochondrial DNA molecules from the genus Drosophila. Nucleic Acids Res. 8: 741-757.
Greenberg, B.D., J.E. Newbold, and A. Sugino. 1983. Intraspecific nucleotide
sequence variability surrounding the origin of replication in human mitochondrial DNA.
Gene 21: 33-49.
Gyllensten, U., D. Wharton, A. Josefsson, and A.C. Wilson. 1991. Paternal
inheritance of mitochondrial DNA in mice. Nature 352: 255-257.
Hale, L.R., and R.S. Singh. 1986. Extensive variation and heteroplasmy in size of
mitochondrial DNA among geographic populations of Drosophila melanogaster. Proc.
Natl. Acad. Sci. USA 83: 8813-8817.
Hale, L.R., and R.S. Singh. 1991. A comprehensive study of genetic variation in
natural populations of Drosophila melanogaster. IV. Mitochondrial DNA variation and the
role of history vs selection in the genetic structure of geographic populations. Genetics
129: 103-117.
Hauswirth, W.W., and P.J. Laipis. 1985. Transmission genetics of mammalian
mitochondria: a molecular model and experimental evidence. Pp. 49-59 in Quagliariello E.,
Slater E.C., F. Palmieri, C. Saccone, and A.M. Kroon (eds) Achievements and
perspectives of mitochondrial research, Vol II: Biogenesis. Elsevier Science Publishers,
New York.
Hauswirth, W.W., M.J. Van de Walle, P.J. Laipis, and P.D. Olivo. 1984.
Heterogeneous mitochondrial D-loop sequences in bovine tissue. Cell 37: 1001-1007.
Hayasaka, K., T. Ishida, and S. Horai. 1991. Heteroplasmy and polymorphism in
the major noncoding region of mitochondrial DNA in Japanese monkey: association with
tandemly repeated sequences. Mol. Biol. Evol. 8: 399-415.
Hayashi, J.I., Y. Tagashira, and M.C. Yoshida. 1985. Absence of extensive
recombination between inter- and intra-species mitochondrial DNA in mammalian cells.
Exp. Cell. Res. 160: 387-395.
Heddi, A., P. Lestienne, D.C. Wallace, and G. Stepien. 1994. Steady state
levels of mitochondrial and nuclear oxidative phosphorylation transcripts in Kearns-Sayre
syndrome. Biochim. Biophys. Acta 1226: 206-212.
Heie, O.E. 1981. Morphology and phylogeny of some Mesozoic aphids (Insecta:
Hemiptera). Entomol. Scand. 15: 401-415.
Heie, O.E. 1987. Palaeontology and phylogeny. Pp. 367-391 in Minks, A.K., and P.
Harrewijn eds. 1987. Aphids, their biology, natural enemies, and control. World crop
pests. Vol. 2A, Amsterdam; Elsevier.
Hess, J.F., M.A. Parisi, J.L. Bennett, and D.A. Clayton. 1991. Impairment
of mitochondrial transcription termination by a point mutation associated with the MELAS
subgroup of mitochondrial encephalomyopathies. Nature 351: 236-239.
Hixson, J.E., and W.M. Brown. 1986. A comparison of the small ribosomal RNA
genes from the mitochondrial DNA of the great apes and humans: sequence structure,
evolution, and phylogenetic implications. Mol. Biol. Evol. 3: 1-18.
Hixson, J.E., and D.A. Clayton. 1985. Initiation of transcription from each of the
two human mitochondrial promoters requires unique nucleotides at the transcriptional start
sites. Proc. Natl. Acad. Sci. USA 82: 2660-2664.
Hoeh, W.R., K.A. Blakley, and W.M. Brown. 1991. Heteroplasmy suggests
limited biparental inheritance of Mytilus mitochondrial DNA. Science 251: 1488-1490.
Hoelzel, A.R., J.M. Hancock, and G.A. Dover. 1991. Evolution of the cetacean
mitochondrial D-loop region. Mol. Biol. Evol. 8: 475-493.
Hoelzel, A.R., J.M. Hancock, and G.A. Dover. 1993. Generation of VNTRs
and heteroplasmy by sequence turnover in the mitochondrial control region of two elephant
seals. J. Mol. Evol. 37: 190-197.
Hoelzel, A.R., J.V. Lopez, G.A. Dover, and S.J. O'Brien. 1994. Rapid
evolution of a heteroplasmic repetitive sequence in the mitochondrial DNA control region
of carnivores. J. Mol. Evol. 39: 191-199.
Hoffman, R.J., and W.M. Brown. 1992. A novel mitochondrial genome
organization from the blue mussel Mytilus edulis. Genetics 131: 397-412.
Holt, I.J., A.E. Harding, and J.A. Morgan Hughes. 1988. Deletions of muscle
mitochondrial DNA in patients with mitochondrial myopathies. Nature 331: 717-719.
Hutchison, C.A., III, J.E. Newbold, S.S. Potter, and M.H. Edgell. 1974.
Maternal inheritance of mammalian mitochondrial DNA. Nature (London) 251: 536-538.
Hyman, B.C., and J.L. Beck Azevedo. 1996. Similar evolutionary patterning
among repeated and single copy nematode mitochondrial genes. Mol. Biol. Evol. 13: 221-232.
Hyman, B.C., and T.M. Slater. 1990. Recent appearance and molecular
characterization of mitochondrial DNA deletions within a defined nematode pedigree.
Genetics 124: 845-853.
Jacobs, B.T., D.J. Elliot, J.B. Math, and A. Farquharson. 1988. Nucleotide
sequence and gene organization of sea urchin mitochondrial DNA. J. Mol. Biol. 202: 185-217.
Jaeger, J.A., D.H. Turner, and M. Zuker. 1989a. Improved predictions of
secondary structures for RNA. Proc. Natl. Acad. Sci. USA 86: 7706-7710.
Jaeger, J.A., D.H. Turner, and M. Zuker. 1989b. Predicting optimal and
suboptimal secondary structure for RNA. In "Molecular evolution: computer analysis of
protein and nucleic acid sequences", R.F. Doolittle ed. Methods in Enzymology 183: 281-306.
Jermiin, L.S., D. Graur, R.M. Lowe, and R.H. Crozier. 1994. Analysis of
directional mutation pressure and nucleotide content in mitochondrial cytochrome b genes.
J. Mol. Evol. 39: 160-173.
Johansen, S., P.H. Guddal, and T. Johansen. 1990. Organization of the
mitochondrial genome of Atlantic cod, Gadus morhua. Nucleic Acids Res. 18: 411-419.
Johnson, W.G. 1899. The pea louse, a new and important economic species of the
genus Nectarophora. Sci. Am. 81: 325-326.
Jukes, T.H., and V. Bhushan. 1986. Silent nucleotide substitution and G+C content
of some mitochondrial and bacterial genes. J. Mol. Evol. 24: 39-44.
Kornberg, A. 1980. DNA replication. W.H. Freeman, San Francisco.
Kruse, B., N. Narasimhan, and G. Attardi. 1989. Termination of transcription in
human mitochondria: identification and purification of a DNA binding protein factor that
promotes termination. Cell 58: 391-397.
Kunkel, T.A. 1985. The mutational specificity of DNA polymerase-alpha and gamma
during in vitro DNA synthesis. J. Biol. Chem. 260: 12866-12874.
Kunkel, T.A., and D.W. Mosbaugh. 1989. Exonucleolytic proofreading by a
mammalian DNA polymerase gamma. Biochemistry 28: 988-995.
Lamb, R.J., and P.J. Pointing. 1972. Sexual morph determination in the aphid,
Acyrthosiphon pisum. J. Insect Physiol. 18: 2029-2042.
La Roche, J., M. Snyder, D.I. Cook, K. Fuller, and E. Zouros. 1990.
Molecular characterization of a repeat element causing large-scale size variation in the
mitochondrial DNA of the deep-sea scallop Placopecten magellanicus. Mol. Biol. Evol. 7:
Levinson, G., and C.A. Gutman. 1987. Slipped-strand mispairing: a major
mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203-221.
Lewis, D.L., C.L. Farr, A.L. Farquhar, and L.S. Kaguni. 1994. Sequence,
organization, and evolution of the A+T region of Drosophila melanogaster mitochondrial
DNA. Mol. Biol. Evol. 11: 523-538.
Lewis, D.L., C.L. Farr, and L.S. Kaguni. 1995. Drosophila melanogaster
mitochondrial DNA: completion of the nucleotide sequence and evolutionary comparisons.
Insect Mol. Biol. 4: 263-278.
Low, R.L., J.M. Buzan, and C.L. Couper. 1987. The preference of the
mitochondrial endonuclease for a conserved sequence block in mitochondrial DNA is
conserved during mammalian evolution. Nucleic Acids Res. 14: 6427-6445.
Low, R.L., O.W. Cummings, and T.C. King. 1988. The bovine mitochondrial
endonuclease prefers a conserved sequence in the displacement loop region of
mitochondrial DNA. J. Biol. Chem. 262: 16146-16170.
Lunt, D.A., and B.C. Hyman. 1997. Animal mitochondrial DNA recombination.
Nature 387: 247-247.
Mackay, S.L.D., P.D. Olivo, P.J. Laipis, and W.W. Hauswirth. 1986.
Template-directed arrest of mammalian mitochondrial DNA synthesis. Mol. Cell. Biol. 6:
MacRae, A.F., and W.W. Anderson. 1988. Evidence for non-neutrality of
mitochondrial DNA haplotypes in Drosophila pseudoobscura. Genetics 120: 485-494.
Madsen, C.S., S.C. Ghivizzani, and W.W. Hauswirth. 1993. In vivo and in
vitro evidence for slipped-strand mispairing in mammalian mitochondria. Proc. Natl.
Acad. Sci. USA 90: 7671-7675.
Martens, P.A., and D.A. Clayton. 1979. Mechanisms of mitochondrial DNA
replication in mouse L-cells: localization and sequence of the light strand origin of
replication. J. Mol. Biol. 135: 327-351.
Martinez, D., A. Moya, A. Latorre, and A. Fereres. 1992. Mitochondrial DNA
variation in Rhopalosiphum padi (Homoptera: Aphididae) populations from four Spanish
localities. Ann. Entomol. Soc. America 85: 241-246.
Martinez-Torres, D., J.C. Simon, A. Fereres, and A. Moya. 1996. Genetic
variation in natural populations of the aphid Rhopalosiphum padi as revealed by maternally
inherited markers. Mol. Ecol. 5: 659-670.
McKnight, S., and R. Tjian. 1986. Transcriptional selectivity of viral genes in
mammalian cells. Cell 46: 795-805.
Mignotte, E., M. Barat, and J.C. Mounolou. 1985. Characterization of a
mitochondrial protein binding to single-stranded DNA. Nucleic Acids Res. 13: 1703-1716.
Mignotte, E., M. Gueride, A.M. Champagne, and J.C. Mounolou. 1990.
Direct repeats in the non-coding region of rabbit mitochondrial DNA. Eur. J. Biochem.
194: 561-571.
Mitchell, S.E., A.F. Cockburn, and J.A. Seawright. 1993. The mitochondrial
genome of Anopheles quadrimaculatus species A: complete nucleotide sequence and gene
organization. Genome 36: 1058-1073.
Monforte, A., E. Barrio, and A. Latorre. 1993. Characterization of the length
polymorphism in the A+T-rich region of the Drosophila obscura group species. J. Mol.
Evol. 36: 214-223.
Monnerot, M., M. Solignac, and D.R. Wolstenholme. 1990. Discrepancy in
divergence of the mitochondrial and nuclear genomes of Drosophila teissieri and
Drosophila yakuba. J. Mol. Evol. 30: 500-508.
Montoya, J., T. Christianson, D. Levens, M. Rabinowitz, and G. Attardi.
1982. Identification of initiation sites for heavy strand and light strand transcription in
human mitochondrial DNA. Proc. Natl. Acad. Sci. USA 79: 7195-7199.
Montoya, J., G. Gains, and G. Attardi. 1983. The pattern of transcription of the
human mitochondrial rRNA genes reveals two overlapping transcription units. Cell 34:
Moran, N.A. 1992. The evolution of aphid life cycles. Ann. Rev. Ento. 37: 321-348.
Moritz, C. 1991. Evolutionary dynamics of mitochondrial DNA duplications in
parthenogenetic geckos, Heteronotia binoei. Genetics 129: 221-230.
Moritz, C., and W.M. Brown. 1986. Tandem duplications of D-loop and ribosomal
RNA sequences in lizard mitochondrial DNA. Science 233: 1425-1427.
Moritz, C., and W.M. Brown. 1987. Tandem duplications in animal mitochondrial
DNAs: variation in incidence and gene content. Proc. Natl. Acad. Sci. USA 84: 7183-7187.
Moritz, C., T.E. Dowling, and W.M. Brown. 1987. Evolution of animal
mitochondrial DNA: relevance for population biology and systematics. Ann. Rev. Ecol.
Syst. 18: 269-292.
Neefs, J.M., Y. Pui, D.R. Van de Peer, S. Chapelle, and R. De Wachter.
1993. Compilation of small ribosomal subunit RNA structures. Nucleic Acids Res. 21:
Ojala, D., and G. Attardi. 1977. A detailed physical map of HeLa cell mitochondrial
DNA and its alignment with the positions of known genetic markers. Plasmid 1: 78-105.
Ojala, D., C. Merkel, R. Gelfand, and G. Attardi. 1980. The tRNA genes
punctuate the reading of genetic information in human mitochondrial DNA. Cell 22: 393-403.
Ojala, D., J. Montoya, and G. Attardi. 1981. tRNA punctuation model of RNA
processing in human mitochondria. Nature 290: 470-474.
Okimoto, R., H.M. Chamberlin, J.L. Macfarlane, and D.R.
Wolstenholme. 1991. Repeated sequence sets in mitochondrial DNA molecules of root
knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for
host-race identification. Nucleic Acids Res. 19: 1619-1626.
Okimoto, R., J.L. Macfarlane, D.O. Clary, and D.R. Wolstenholme. 1992.
The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum.
Genetics 130: 471-498.
Olson, M.W., and L.S. Kaguni. 1992. 3'-5' exonuclease in Drosophila
mitochondrial DNA polymerase: substrate specificity and functional coordination of
nucleotide polymerization and mispair hydrolysis. J. Biol. Chem. 267: 23136-23142.
Pardue, M.L., J.M. Fostel, and T.R. Cech. 1984. DNA-protein interactions in
the Drosophila virilis mitochondrial chromosome. Nucleic Acids Res. 12: 1991-1999.
Parisi, M.A., and D.A. Clayton. 1991. Similarity of human mitochondrial
transcription factor 1 to high mobility group proteins. Science 252: 965-969.
Pierce, J.C., D. Kong, and W. Masker. 1991. The effect of the length of direct
repeats and the presence of palindromes on deletion between directly repeated DNA
sequences in bacteriophage T7. Nucleic Acids Res. 19: 3901-3905.
Pissios, P., and Z.G. Scouras. 1993. Mitochondrial DNA evolution in the
montium-species subgroup of Drosophila. Mol. Biol. Evol. 10: 375-382.
Potter, D.A., J.M. Fostel, M. Berninger, M.L. Pardue, and T.R. Cech.
1980. DNA-protein interactions in the Drosophila melanogaster mitochondrial genome as
deduced from trimethylpsoralen crosslinking patterns. Proc. Natl. Acad. Sci. USA 77:
4118-4122.
Potter, S.S., J.E. Newbold, C.A. Hutchison III, and M.H. Edgell. 1975.
Specific cleavage analysis of mammalian mitochondrial DNA. Proc. Natl. Acad. Sci. USA
72: 4496-4500.
Powers, T.O., S.G. Jensen, S.D. Kindler, C.J. Stryker, and L.J. Sandall.
1989. Mitochondrial DNA divergence among greenbug (Homoptera: Aphididae) biotypes.
Ann. Entomol. Soc. 82: 298-302.
Rand, D.M. 1993. Endotherms, ectotherms, and mitochondrial genome-size variation. J.
Mol. Evol. 37: 281-295.
Rand, D.M. 1994. Concerted evolution and RAPing in mitochondrial VNTRs and the
molecular geography of cricket populations. Pp. 227-245 in B. Schierwater, B. Streit,
G.P. Wagner, and R. DeSalle, eds. Molecular ecology and evolution: approaches and
applications. Birkhauser Verlag, Basel.
Rand, D.M., and R.G. Harrison. 1986. Mitochondrial DNA transmission genetics
in crickets. Genetics 114: 955-970.
Rand, D.M., and R.G. Harrison. 1989. Molecular population genetics of mtDNA
size variation in crickets. Genetics 121: 551-569.
Reebeck, G.W., and L. Samson. 1991. Increased spontaneous mutation and
alkylation sensitivity of Escherichia coli strains lacking the ogt O6-methylguanine DNA
repair methyltransferase. J. Bacteriol. 173: 2068-2076.
Reilly, J.G. and G.A. Jr Thomas. 1980. Length polymorphism, restriction site
variation and maternal inheritance of mitochondrial DNA of Drosophila melanogaster.
Plasmid 3: 109-115.
Robberson, D.L., D.A. Clayton, and J.F. Morrow. 1974. Cleavage of
replicating forms of mitochondrial DNA by EcoRI nuclease. Proc. Natl. Acad. Sci. USA
71: 4447-4451.
Roe, B.A., D.P. Ma, R.K. Wilson, and J.F. Wong. 1985. The complete
nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260:
Saccone, C.G., M. Attimonelli, and E. Sbisà. 1987. Structural elements highly
preserved during the evolution of the D-loop containing region in vertebrate mitochondrial
DNA. J. Mol. Evol. 26: 205-211.
Saccone, C.G., G. Pesole, and E. Sbisà. 1991. The main regulatory region of
mammalian mitochondrial DNA: structure-function model of evolutionary pattern. J. Mol.
Evol. 33: 83-91.
Schaller, H. 1978. The intergenic region and the origin of filamentous phage DNA
replication. Cold Spring Harbor Symp. Quant. Biol. 43: 401-408.
Shoubridge, E.A., G. Karpati, and K.E.M. Hastings. 1990. Deletion mutants
are functionally dominant over wild-type mitochondrial genomes in skeletal muscle fibre
segments in mitochondrial disease. Cell 62: 43-49.
Snyder, M., A.R. Fraser, J. La Roche, K.E. Gartner-Kepkay, and E.
Zouros. 1987. Atypical mitochondrial DNA from the deep-sea scallop Placopecten
magellanicus. Proc. Natl. Acad. Sci. USA 84: 7595-7599.
Solignac, M., J. Genermont, M. Monnerot, and J.C. Mounolou. 1984.
Genetics of mitochondria in Drosophila: inheritance in heteroplasmic strains of D.
mauritiana. Mol. Gen. Genet. 197: 183-188.
Solignac, M., J. Genermont, M. Monnerot, and J.C. Mounolou. 1987.
Drosophila mitochondrial genetics: Evolution of heteroplasmy through germ line cell
division. Genetics 117: 687-696.
Solignac, M., M. Monnerot, and J.C. Mounolou. 1986. Concerted evolution of
sequence repeats in Drosophila mitochondrial DNA. J. Mol. Evol. 24: 53-60.
Southern, S.O., P.J. Southern, and A.E. Dizon. 1988. Molecular
characterization of a cloned dolphin mitochondrial genome. J. Mol. Evol. 28: 32-42.
Stewart, D.T., and A.J. Baker. 1994a. Patterns of sequence variation in the
mitochondrial D-loop region of shrews. Mol. Biol. Evol. 11: 9-21.
Stewart, D.T., and A.J. Baker. 1994b. Evolution of mtDNA D-loop sequences and
their use in phylogenetic studies of shrews in the subgenus Otisorex (Sorex: Soricidae:
Insectivora). Mol. Phylogenet. Evol. 3: 38-46.
Streisinger, G., Y. Okada, L. Emrich, J. Newton, A. Tsugita, E.
Terzaghi, and M. Inouye. 1966. Frameshift mutations and the genetic code. Cold
Spring Harbor Symp. Quant. Biol. 31: 77-84.
Takahata, N., and T. Maruyama. 1981. A mathematical model of extra-nuclear
genes and the genetic variability maintained in a finite population. Genet. Res. 37: 291-302.
Tapper, D.P., and D.A. Clayton. 1981. Mechanism of replication of human
mitochondrial DNA. Location of the 5' ends of nascent daughter strands. J. Biol. Chem.
256: 5109-5115.
Tautz, D., M. Trick, and G.A. Dover. 1986. Cryptic simplicity in DNA is a major
source of genetic variation. Nature 322: 652-656.
Tjian, R. 1978. Protein-DNA interactions at the origin of replication of simian virus 40
DNA replication. Cold Spring Harbor Symp. Quant. Biol. 43: 655-662.
Trinh, T.K., and R.R. Sinden. 1991. Preferential DNA secondary structure
mutagenesis in the lagging strand of replication in E. coli. Nature 352: 544-547.
Tullo, A., W. Rossmanith, E.M. Imre, E. Sbisa, C. Saccone, and R.
Karwan. 1994. RNase mitochondrial RNA processing cleaves RNA from the rat
mitochondrial displacement loop at the origin of heavy-strand replication. Eur. J.
Biochem. 227: 657-662.
Upholt, W.B., and I.B. Dawid. 1977. Mapping of mitochondrial DNA of individual
sheep and goats: rapid evolution in the D-loop region. Cell 11: 571-583.
Valverde, J.R., B. Batuecas, C. Moratilla, R. Marco, and R. Garesse.
1994. The complete mitochondrial DNA sequence of the crustacean Artemia franciscana.
J. Mol. Evol. 39: 400-408.
Vestweber, D., and G. Schatz. 1989. DNA-protein conjugates can enter
mitochondria via the protein import pathway. Nature 338: 170-172.
Via, S. 1991. Specialized host plant performance of pea aphid clones is not altered by
experience. Ecology 72: 1420-1427.
Walberg, W.M., and D.A. Clayton. 1981. Sequence and properties of human KB
cell and the mouse L cell D-loop regions of mitochondrial DNA. Nucleic Acids Res. 9:
5411-5421.
Wallace, D.C. 1982. Structure and evolution of organelle genomes. Microbiol. Rev. 46:
Wallace, D.C. 1989. Mitochondrial DNA mutations and neuromuscular disease. Trends
Genet. 5: 9-13.
Wallis, G.P. 1987. Mitochondrial DNA insertion polymorphism and germ line
heteroplasmy in the Triturus cristatus complex. Heredity 58: 229-238.
Warrior, R., and J. Gall. 1985. The mitochondrial DNA of Hydra attenuata and
Hydra littoralis consists of 2 linear molecules. Arch. Sci. Geneva 38: 439-445.
Watson, J.D., N.H. Hopkins, J.W. Roberts, J.A. Steitz, and A.M.
Weiner. 1987. Pp. 346-347 in Molecular biology of the gene, Ed. 4, vol 1.
Benjamin/Cummings, Menlo Park, California.
Wernette, C.M., M.C. Conway, and L.S. Kaguni. 1988. Mitochondrial
polymerase from Drosophila melanogaster embryos: kinetics, processivity, and fidelity of
DNA polymerization. Biochemistry 27: 6046-6054.
Wilkinson, G.S. and A.M. Chapman. 1991. Length and sequence variation in
evening bat D-loop mtDNA. Genetics 128: 607-617.
Wolfson, R., K.G. Higgins, and B.B. Sears. 1991. Evidence of replication
slippage in the evolution of Oenothera chloroplast DNA. Mol. Biol. Evol. 8: 709-720.
Wolstenholme, D.R. 1992. Animal mitochondrial DNA: structure and evolution. Int.
Rev. Cytol. 141: 173-216.
Wolstenholme, D.R., and D.O. Clary. 1985. Sequence evolution of Drosophila
mitochondrial DNA. Genetics 109: 725-744.
Wolstenholme, D.R., J.M. Goddard, and M.R. Fauron. 1979. Replication of
Drosophila mitochondrial DNA. Pp. 131-148 in Y. Becker, ed. Replication of viral and
cellular genomes. Martinus Nijhoff, Boston.
Wolstenholme, D.R., J.M. Goddard, and M.R. Fauron. 1983. Replication of
Drosophila mitochondrial DNA. Pp. 131-148 in Y. Becker, ed. Replication of viral and
cellular genomes. Martinus Nijhoff, Boston.
Wong, J.F.H., D.P. Ma, R.K. Wilson, and B.A. Roe. 1983. DNA sequence of
Xenopus laevis mitochondrial heavy and light strand replication origins and flanking tRNA
genes. Nucleic Acids Res. 11: 4977-4995.
Wong, T.W., and D.A. Clayton. 1985. In vitro replication of human mitochondrial
DNA: accurate initiation at the origin of light-strand synthesis. Cell 42: 951-958.
Yang, Y.J., Y.S. Lin, J.L. Wu, and C.F. Hui. 1994. Variation in mitochondrial
DNA and population structure of the Taipei treefrog Rhacophorus taipeianus in Taiwan.
Mol. Ecol. 3: 219-228.
Zevering, C.E., C. Moritz, A. Heideman, and R.A. Sturm. 1991. Parallel
origins of duplication and the formation of pseudogenes in mitochondrial DNA from
parthenogenetic lizards (Heteronotia binoei: Gekkonidae). J. Mol. Evol. 33: 431-441.
Zuker, M., and P. Stiegler. 1981. Optimal computer folding of large RNA
sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133-148.
Appendix 1d: Analysis of the temporal stability of length variants for regions 1 & 2 in four clones of A. pisum from alfalfa analyzed in
1990 and again in 1992 after approximately 30 generations of parthenogenetic reproduction (Table 6, Barrette et al., 1994)
Region 1 size class
Region 2 size class (relative
2 (0.45)
2 (0.31)
3 (0.55)
3 (0.69)
4 (0.60)
4 (0.16)
5 (0.22)
5 (0.65)
6 (0.18)
6 (0.19)
1 (0.47)
4 (0.53)
1 (0.46)
2 (0.57)
3 (0.43)
4 (0.54)
2 (0.52)
3 (0.48)
Note: Clone E85 was only analyzed with the enzyme Th11 and is not included in the other analysis. Length variant 6 in region 2 was
observed only in this clone.
Appendix 1e. Restriction map and gene order in A. pisum. The internal gene order
map is that of D. yakuba (Clary and Wolstenholme 1985). Genes for tRNA are
hatched. Protein encoding genes are denoted COI, COII, and COIII for the genes
encoding subunits 1, 2, and 3 of cytochrome c oxidase, Cyt b for the cytochrome b
gene, and ND1-6 and ND4L for the genes encoding subunits 1 to 6 and 4L of the
NADH dehydrogenase system. The AT-rich region is denoted "A + T." The genes
encoding the small and large subunits of ribosomal RNA are denoted srRNA and
lrRNA, respectively. The position of the PCR primers that have been used
successfully on A. pisum and their direction of extension is indicated on the gene
order map by the solid flags. The primer numbers correspond to those in Table 8.
The hatched bars on the restriction map represent regions of hybridization between
total A. pisum mtDNA and the DIG-labelled PCR fragments amplified from
A. pisum. The dotted lines connect the ends of the bars to the primers used to
generate the probe fragments. Length variable regions (regions 1 and 2) are
indicated on the external A. pisum restriction site map by open bars. The solid bars
on the restriction map represent regions of the A. pisum mtDNA that we have been
unable to amplify with any available primers. Letter codes for the restriction sites
mapped in A. pisum are as follows: Ac, AccI; Av, AvaI; Bc, BclI; Bg, BglII; Ec,
EcoRI; Hi, HindIII; Ps, PstI; Rs, RsaI; Ss, SstI; Ta, TaqI; Xb, XbaI.
(Reproduced from Barrette et al., 1994)
Appendix 2: Single-letter amino acid code designation used in codon usage Table 8.
A - Alanine
C - Cysteine
D - Aspartic acid
E - Glutamic acid
F - Phenylalanine
G - Glycine
H - Histidine
I - Isoleucine
K - Lysine
L - Leucine
M - Methionine
N - Asparagine
P - Proline
Q - Glutamine
R - Arginine
S - Serine
T - Threonine
V - Valine
W - Tryptophan
Y - Tyrosine
Appendix 3a. DNA sequence alignment used in the comparative analysis of the 12S
rRNA partial gene sequence for the four insects. The abbreviations are as follows: A.
mell, A. mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.
Appendix 3b. DNA sequence alignment used in the comparative analysis of the 16S rRNA
partial gene sequence for the four insects. The abbreviations are as follows: A. mell, A.
mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.
Appendix 3c. DNA sequence alignment used in the comparative analysis of the COI gene
sequence for the four insects. The abbreviations are as follows: A. mell, A. mellifera; A.
quad, A. quadrimaculatus; D. yak, D. yakuba.
Appendix 3d. DNA sequence alignment used in the comparative analysis of the ND4
gene sequence for the four insects. The abbreviations are as follows: A. mell, A.
mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.
Appendix 4a. Possible mechanisms of recombination generating length variation in cricket
mtDNA. Bold lines represent repeated DNA. Numbers or letters serve as landmarks with
which to identify ends of different repeats. A bold line running perpendicular to repeats
indicates the site of recombination. A, Intramolecular recombination; B, intermolecular
recombination. These are meant to serve as examples; other intermediates and products
could be drawn. Figure and legend reproduced from Rand and Harrison, 1989.
[Figure not reproduced. The 6-bp repeat shown in capital letters is CTTTAT.]
Appendix 4b. Replication slippage model for the creation of tandem duplications. The exact
29-bp duplication in plastome 1 of Oenothera could have arisen by the following process:
replication through the region destined for duplication (A), pausing and misalignment of
newly synthesized strand, mediated by 6-bp repeat shown in capital letters (B), replication
continuing and producing the duplicated sequence (C), and realignment of daughter strand,
stabilized by 6-bp complementary sequence (D). Figure and legend reproduced from
Wolfson et al. (1991).
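
The slippage steps in this legend can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the model as drawn by the original authors: the function name slippage_duplication, the flanking bases, and the 23-bp spacer are hypothetical, the sequence is not the Oenothera plastome sequence, and both strands are written in the same sense so that copying is plain string copying rather than complementary base pairing. Step (D), realignment of the daughter strand, is not modelled.

```python
def slippage_duplication(template: str, repeat: str) -> str:
    """Return the daughter sequence after one backward slippage event.

    Simplification (assumption): template and nascent strand are written in
    the same sense, so replication is modelled as string copying.
    """
    first = template.find(repeat)
    second = template.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("template needs two non-overlapping copies of the repeat")

    # (A) Replicate through the region, up to the end of the second repeat copy.
    pause = second + len(repeat)
    nascent = template[:pause]

    # (B) The end of the nascent strand (a copy of the repeat) slips back and
    #     pairs with the first repeat copy on the template.
    resume = first + len(repeat)

    # (C) Synthesis continues from the position after the first repeat copy,
    #     so the stretch between the two repeat copies is copied a second time.
    return nascent + template[resume:]


# Hypothetical toy template: two copies of a 6-bp repeat (capitals) whose
# start positions lie 29 bases apart; all lowercase bases are made up.
REPEAT = "CTTTAT"
SPACER = "gcat" * 5 + "gca"  # 23 made-up bases
template = "aacg" + REPEAT + SPACER + REPEAT + "ttga"

daughter = slippage_duplication(template, REPEAT)
print(template)
print(daughter)
print(len(daughter) - len(template))
```

In this toy case the duplicated unit is the spacer plus the second repeat copy, so the daughter strand is 29 bases longer than the template and carries the unit twice in tandem. The length of the duplication equals the distance between the start positions of the two repeat copies.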