CHARACTERIZATION OF THE MITOCHONDRIAL DNA MOLECULE OF PEA APHID, ACYRTHOSIPHON PISUM

A Thesis Presented to The Faculty of Graduate Studies of The University of Guelph

by RICHARD J. BARRETTE

In partial fulfilment of requirements for the degree of Master of Science

August, 1997

© Richard J. Barrette, 1997

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Richard J. Barrette, University of Guelph, 1997. Advisor: Professor T.J. Crease

An investigation of the mitochondrial DNA molecule of the pea aphid, Acyrthosiphon pisum, was undertaken using PCR amplification, Southern blotting and DNA sequence analysis. The results showed that pea aphid genome content and organization is very similar to that of Drosophila yakuba, Anopheles quadrimaculatus and Apis mellifera. Codon usage is highly biased in favour of codons rich in adenine and thymine. Pea aphid mtDNA varies in size from 16.8 kb to 18.1 kb. These size fluctuations occur in two regions of the molecule: the A+T-rich region, and in the vicinity of the ND4 and ND5 genes. The most probable mechanism for the A+T-rich region size variation involves a 123 nucleotide tandem repeat. Analysis of the A+T-rich region also revealed the presence of a thymine-rich hairpin-loop structure that is analogous to the mammalian putative light-strand origin of replication.

ACKNOWLEDGEMENTS

I would like to take this opportunity to thank the individuals (Dr. Teri Crease, Dr. Paul Hebert, Dr. Sara Via, Dr. Bob Foottit and Dr. Bob Forster) for supplying aphid material, laboratory space and equipment, and the intellectual guidance needed to complete this research project. I would also like to thank Dr. Elizabeth Boulding for her insightful comments on the thesis. Mr. Eric Maw must also be thanked profusely for his continuous help in both computer wizardry and in culturing bulk quantities of pea aphids used in cloning. Most importantly, I must thank my family, friends and my wife P a e l l a, for their patience and support throughout my education and career in science. A special thanks must be said to Dr. R.T. M'Closkey and especially Dr. Paul Hebert for taking a chance and giving this 3rd year undergraduate student the initial opportunity at summer employment in science.

TABLE OF CONTENTS

Acknowledgements
Table of contents
List of tables
List of figures
INTRODUCTION
    Genome content and organization
    Length variation and heteroplasmy
    The pea aphid system
MATERIALS AND METHODS
    Pea aphid sampling and mtDNA purification
    PCR amplification and Gene Clean purification
    Cloning
    Sequencing
    Sequence alignment and analysis
RESULTS
    A+T richness and codon use
    Cyt b and ND1 intergenic region
    Transfer RNA genes
    A+T rich region
DISCUSSION
    Genome organization
    Genetic code and codon usage
    A+T rich region
    Length variation and heteroplasmy
REFERENCES
APPENDICES
    Appendix 1a: The relationship between the number of cut sites in A. pisum mtDNA and the A+T content of recognition sequences for f i restriction enzymes
    Appendix 1b: Frequency distribution of length variants for region 1 and region 2 in thirty-five clones of A. pisum
    Appendix 1c: Distribution of size class co-occurrences in the 13 clones of A. pisum from alfalfa with heteroplasmy at region 2
    Appendix 1d: Analysis of the temporal stability of length variants for regions 1 & 2 in four clones of A. pisum from alfalfa analyzed in 1990 and again in 1992 after approximately 30 generations of parthenogenetic reproduction
    Appendix 1e: Restriction map and gene order in A. pisum
    Appendix 2: Single-letter amino acid code designation used in codon usage Table 8
    Appendix 3a: DNA sequence alignment used in the comparative analysis of the 12S rRNA partial sequence for the four insects
    Appendix 3b: DNA sequence alignment used in the comparative analysis of the 16S rRNA partial sequence for the four insects
    Appendix 3c: DNA sequence alignment used in the comparative analysis of the COI gene sequence for the four insects
    Appendix 3d: DNA sequence alignment used in the comparative analysis of the ND4 gene sequence for the four insects
    Appendix 4a: Possible mechanisms of recombination generating length variation in cricket mtDNA
    Appendix 4b: Replication slippage model for the creation of tandem duplications

LIST OF TABLES

Table 1. Sequence and amplification conditions for the PCR primers used in both DNA sequencing and cloning of pea aphid mtDNA
Table 2. Comparison of DNA sequence and amino acid sequence for cytochrome oxidase subunit I for Acyrthosiphon pisum and three other insects
Table 3. Comparison of DNA sequence and amino acid sequence of cytochrome b gene for Acyrthosiphon pisum and three other insects
Table 4. Comparison of DNA sequence and amino acid sequence of NADH dehydrogenase subunit 1 for Acyrthosiphon pisum and three other insects
Table 5. Comparison of DNA sequence and amino acid sequence of NADH dehydrogenase subunit IV for Acyrthosiphon pisum and three other insects
Table 6. Comparison of DNA sequence of the 12S ribosomal gene for Acyrthosiphon pisum and three other insects
Table 7. Comparison of the DNA sequence of the 16S ribosomal gene from Acyrthosiphon pisum and three other insects
Table 8. Codon usage table of the four partially sequenced protein coding genes of Acyrthosiphon pisum
Table 9. Base composition of the codons used in the four partially sequenced protein coding genes for A. pisum
Table 10. Comparison of the length and base composition of the A+T rich region for A. pisum and three other insects

LIST OF FIGURES

Figure 1. Comparison of the protein and DNA sequence of the 3' ends of the Cyt b gene, the ND1 gene and the intergenic space between these two coding sequences
Figure 2. Predicted secondary structures of the four tRNA genes identified in the pea aphid
Figure 3. DNA sequence and location of notable features of both the large and small size class of cloned PCR products from pea aphid mtDNA
Figure 4. Predicted secondary structure of the primary hairpin-loop identified in the A+T rich region of pea aphid
Figure 5. Alignment of the repetitive units from small and large size variants
Figure 6. Predicted secondary structures for the repetitive units

INTRODUCTION

Mitochondria play a central role in energy metabolism by generating ATP in the cell through oxidative phosphorylation. Oxidative phosphorylation is a complex biochemical mechanism that involves the conversion of potential energy from electron gradients into chemical energy. Each mitochondrion contains its own genetic code. However, the mitochondria do not encode all of the proteins needed for organellar function. Sixty nuclear-encoded gene products are transported into the mitochondrion (Heddi et al., 1994) where they interact with those polypeptides coded by the mitochondrial genome and assemble into the five oxidative phosphorylation complexes: I to IV of the electron transport chain and complex V of the ATP synthase complex (Chomyn & Attardi, 1987).

Genome content and organization

With the single exception of species in the cnidarian genus Hydra, where the mitochondrial genome occurs as two unique linear molecules (Warrior & Gall, 1985), all metazoan animal mitochondrial DNA (mtDNA) exists as covalently closed circular duplex molecules present in high copy number (10^3 to 10^4 mtDNA molecules per somatic cell) (Clayton, 1982; Brown, 1985).
All animal cells examined to date maintain a significant proportion of their mtDNA in either the form of catenanes, in which monomer circles are intermeshed like links in a chain, or as simple head to tail unicircular dimers (Clayton, 1982). Evidence for recombination between molecules has been lacking (discussed in Brown, 1985; Moritz et al., 1987). However, the minicircle end-products of mtDNA recombination have now been detected in the mitochondrial genome of the phytonematode, Meloidogyne javanica (Lunt and Hyman, 1997). Although it was originally believed that mtDNA was strictly maternally inherited (Dawid & Blackler, 1972; Hutchison et al., 1974; Reilly & Thomas, 1980), recent studies on the marine mussel Mytilus (Fisher and Skibinski, 1990; Hoeh et al., 1991) and mouse (Gyllensten et al., 1991) have demonstrated some paternal involvement in mtDNA inheritance.

An underlying feature of the mitochondrial genome is its extremely compact and highly "economic" genetic organization (Attardi, 1985). Apart from the putative regulatory region(s), the mitochondrial genome is saturated with sequences encoding discrete gene products. Coding sequences are either directly abutting or separated by few intergenic sequences, and in some situations they overlap. Each metazoan mtDNA contains the genes for the structural RNAs of the mitochondrion's own protein translation machinery (2 ribosomal RNAs [rRNA], 22 transfer RNAs [tRNA]), and 13 proteins. These proteins are all components of enzyme complexes associated with the inner mitochondrial membrane: cytochrome b (Cyt b), subunits I-III of cytochrome c oxidase (COI-III), subunits 6 and 8 of the F0 ATPase complex (ATPase6 and ATPase8), and subunits 1-6 and 4L of the respiratory chain NADH dehydrogenase (ND1-6 and 4L) (Chomyn and Attardi, 1987). Of the metazoan mitochondrial genomes sequenced to date there are only a few examples of mtDNAs that are lacking the complete complement of coding genes.
The mtDNAs of the nematodes Caenorhabditis elegans and Ascaris suum (Okimoto et al., 1992) and the marine mussel Mytilus edulis (Hoffman and Brown, 1992) no longer contain the ATPase 8 structural gene. The sea anemone Metridium senile mtDNA codes for only 2 of the 22 tRNAs required for decoding the mitochondrial genetic code: one for tryptophan and the other for f-methionine. This represents the only known example in which genes coding for tRNAs have been exported from the mitochondrial genome (Wolstenholme, 1992). Intragenic sequences are absent in all other animal mtDNA. However, M. senile is again distinct as its COI and ND5 genes contain group I introns. The COI intron is postulated to encode an RNA splicase. Moreover, the ND5 gene intron contains the only copies of the ND1 and ND3 genes (Wolstenholme, 1992).

Despite the complete transcription of both strands, the distributions of the structural genes in vertebrates, some invertebrates and insects are considerably different. The distribution of coding regions of vertebrates and various invertebrates, including sea urchins, is highly asymmetrical: the ND6 gene and a few tRNAs are coded on the light strand while the balance of the genes are all encoded on the heavy strand. The two strands, designated as heavy (H) and light (L), were named according to physicochemical experiments that measured the different buoyant densities of each strand in alkaline CsCl density gradients based upon their base composition (Aloni and Attardi, 1971). A more balanced gene distribution between the two strands is found in the insect genomes sequenced thus far.

The component protein, rRNA and tRNA genes have identical gene arrangements among such diverse vertebrates as fish, amphibians and mammals (Wolstenholme, 1992).
A similar arrangement is found in birds except that the segments coding for Cyt b, tRNApro and tRNAthr and the ND6 and tRNAglu genes have been transposed relative to each other (Desjardins and Morais, 1990; 1991). In contrast to the vertebrates, invertebrates have undergone significant rearrangements in mtDNA gene order. Multiple inversions and translocations involving numerous loci are evident when the mtDNA genomes are compared between insects, sea urchins and nematodes (Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Jacobs et al., 1988; Cantatore et al., 1989; Okimoto et al., 1992). Furthermore, among insects, the location and orientation of protein and rRNA genes and the putative control region are the same. However, significant variation has been observed in the positioning and orientation of certain tRNAs (Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Mitchell et al., 1993; Beard et al., 1993).

The tRNA genes for all metazoan mtDNAs are interspersed throughout the mRNA and rRNA coding sequences. The tRNA punctuation model (Ojala et al., 1980; 1981) proposes that tRNAs serve as recognition sites for processing of polycistronic mRNA by RNase P-like enzymes that cleave the transcript at the junctions between the mRNAs, rRNAs and the tRNAs.

The presence of one major noncoding segment in the mitochondrial genome is a feature common to all metazoa. In vertebrates, except birds, this region is located between tRNApro and tRNAphe and varies considerably in size (879 bp in mouse; 1122 bp in human and 2134 bp in frog, Xenopus laevis) (Wolstenholme, 1992). In mammals and amphibia, this sequence has been shown to include the signals necessary for both the initiation of transcription and replication (Montoya et al., 1982; 1983; Clayton, 1984) and it has therefore been designated the control region.
Although the exact events governing mitochondrial replication and transcription are not known, comparative analyses of the control region from various mammalian species have identified regions of sequence consensus and possible functional importance (Clayton, 1991a; 1991b). These include the conserved sequence blocks (CSB) (Walberg and Clayton, 1981; Brown et al., 1986; Dunon-Bluteau and Brun, 1987; Saccone et al., 1987; Saccone et al., 1991), the termination associated sequences (TAS) (Doda et al., 1981; Mackay et al., 1986), the light strand promoter (LSP) and the heavy strand promoter (HSP) (Chang and Clayton, 1984; 1985), the binding sites for mitochondrial transcription factor (MTF) (Fisher et al., 1987), mitochondrial single stranded binding protein (mtSSB-protein) (Mignotte et al., 1985) and the origin of heavy strand replication (OH) (Clayton, 1982).

In insects, the single noncoding region is referred to as the "A+T-rich region" because it is composed of 90% to 96% deoxyadenylate and thymidylate residues (Fauron and Wolstenholme, 1976). This region, which has been shown to contain the origin of DNA replication (Goddard and Wolstenholme, 1978; 1980), is situated between the tRNAile gene and the 5' end of the small subunit ribosomal (12S) gene. However, no apparent signals for initiation of transcription or replication comparable to those of vertebrates have been detected. In the mtDNA genomes of insects thus far sequenced there is considerable variation in both length and nucleotide sequence for this region (4601 bp in Drosophila melanogaster, 1077 bp in Drosophila yakuba, 826 bp in Apis mellifera, 520 bp in Anopheles gambiae and 625 bp in Anopheles quadrimaculatus) (Lewis et al., 1994; Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Beard et al., 1993; Mitchell et al., 1993). Few conserved sequence motifs are found within this region.
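
The A+T percentages quoted above for the insect noncoding region are simple base-composition fractions. As a minimal sketch only (it is not the analysis procedure used in this thesis), the Python below computes the A+T fraction of a sequence and scans a sliding window for stretches at or above a chosen threshold. The function names, the window size and the toy sequence are illustrative assumptions; the 0.90 threshold merely echoes the lower bound cited for insect A+T-rich regions.

```python
from typing import List, Tuple


def at_fraction(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


def at_rich_windows(seq: str, window: int = 50,
                    threshold: float = 0.90) -> List[Tuple[int, float]]:
    """Return (start, A+T fraction) for every window at or above threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Toy sequence (hypothetical, not pea aphid data):
# a G+C-only flank of 10 nt followed by an A+T-only run of 30 nt.
toy = "GCGCGGCCGC" + "ATTATAATTA" * 3

print(at_fraction(toy))
print(at_rich_windows(toy, window=10, threshold=0.9))
```

On real data the window size would need to be chosen relative to the expected length of the region (several hundred base pairs in the insects listed above), and ambiguous bases are simply ignored here, which is a simplifying assumption.
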
Replication and transcription in both vertebrates and invertebrates are believed to be strictly correlated with one another. In vertebrates, studies have demonstrated that priming for both H-strand replication and polycistronic transcription of the L-strand start from the LSP and are therefore indistinguishable from one another (Clayton, 1984; Attardi, 1985; Cantatore and Saccone, 1987). Clayton (1991a) has proposed that the concentration of mitochondrial transcription factor is decisive in regulating the two processes. Low concentrations of mtTF1 would activate LSP, leading to the transcription of the genes encoded on the L-strand (i.e., 8 tRNAs and ND6 in mammals). Elevation of the concentration of mtTF1 in the mitochondrial matrix would then activate HSP, resulting in transcription of the 14 tRNAs and the 12 protein coding genes of the H-strand. Although it is unknown what factor(s) signal the commencement of DNA replication versus transcription, RNA replication priming starts from the LSP, proceeds downstream and, provided sufficient levels of RNase MRP are present, terminates near the OH at the RNA/DNA transition site (CSB 1 in mouse; Tullo et al., 1994). A three stranded DNA structure is formed between a short nascent DNA (H) strand formed by displacement synthesis, the parental H-strand and the complementary L-strand. Synthesis of this short strand, called the D-loop or 7S DNA, starts from the OH and in mammals stops approximately 600 to 700 nt downstream at the TAS sites; the strand is repeatedly synthesized and degraded (Wolstenholme, 1992). However, at some point, synthesis of the nascent H-strand proceeds in a unidirectional manner until the daughter strand is completely synthesized.

The L-strand origin of replication (OL) consists of another noncoding sequence located in mammals and amphibians in a cluster of tRNA genes between tRNAasn and tRNAcys and is approximately two thirds of the mtDNA genome away from the OH.
This sequence is similar to other replication origins in that it can form a thermodynamically stable hairpin loop structure. It consists of a GC-rich stem of variable length ranging in size from 9 bp in frog to 12 bp in mouse and a T-rich loop of 12 to 19 nt (Clayton, 1982; Wong et al., 1983). Once H-strand synthesis is at least 67% complete and the OL is exposed as a single-stranded template, initiation of L-strand synthesis can begin. The synthesis of the RNA primer has been shown to begin in the T-rich stretch of the loop structure and the transition of RNA to DNA synthesis occurs at the base of the hairpin structure (Wong and Clayton, 1985). Further experiments have shown that a pentanucleotide sequence 5'-CGGCC-3' present at the base of the stem that overlaps with a few basepairs of the 5' end of the tRNAcys gene is necessary for efficient replication (Hixson and Brown, 1986). The replication of both strands proceeds asynchronously, resulting in the segregation of two distinct daughter molecules (alpha and beta). Once the beta daughter molecule is fully synthesized, alpha and beta molecules are converted to closed circles. Synthesis of full length daughter strands requires approximately one hour and the entire cycle requires two hours (Clayton, 1982).

Overall, the mitochondrial putative control region of invertebrates remains poorly studied relative to vertebrates. Most research defining the mechanics of mitochondrial replication and transcription has involved mammalian and amphibian systems. The data available from the limited number of invertebrate mtDNA genomes that have been entirely sequenced show a lack of sequence conservation in the regulatory region. It is believed that this may reflect a difference in the mechanisms of DNA replication and/or transcriptional initiation and its regulation among invertebrate groups (Lewis et al., 1994).
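
The stem-loop geometry described earlier in this section for the vertebrate L-strand origin (a stem of 9 to 12 bp enclosing a loop of 12 to 19 nt) can be turned into a naive sequence scan. The Python below is a sketch under stated assumptions, not the structure-prediction method used in this thesis: it requires a perfectly base-paired stem, ignores G-U pairing and thermodynamic stability, and does not test whether the stem is GC-rich or the loop T-rich. The function names and the toy sequence are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq: str, stem_len: int = 9, min_loop: int = 12,
                  max_loop: int = 19) -> List[Tuple[int, int, str, str]]:
    """Find perfect stem-loops.

    Returns (start, loop length, 5' stem arm, loop sequence) for every
    position where a stem_len-mer is followed, after a loop of
    min_loop..max_loop nt, by its exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        left = seq[i:i + stem_len]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop          # start of the 3' stem arm
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, loop, left, seq[i + stem_len:j]))
    return hits


# Toy sequence (hypothetical): a GC-rich 9 bp stem around a 12 nt T-only loop.
stem = "GGCCGCGAC"
toy = "AA" + stem + "T" * 12 + reverse_complement(stem) + "AA"
for hit in find_hairpins(toy):
    print(hit)
```

A practical search would normally tolerate mismatches in the stem and rank candidates by predicted folding energy; exact matching is used here only to keep the example short.
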
Length variation and heteroplasmy

Attardi (1985) originally described the animal mitochondrial genome as an example of "an extremely economical unit". This generalization was based on the notion that the mitochondrial genome has been under intense selection for small size and invariable structure. Insertion and deletion events were thought to be rare, the relative gene order was thought to be conserved, and most mutational changes seemed to involve base substitutions at either silent sites or in noncoding regions (Brown, 1985). However, as the number of characterized metazoan mitochondrial genomes has expanded, it has become quite evident that Attardi's (1985) traditional view of "extreme economy" is in need of revision.

Initial studies at the population level revealed little intraspecific and interspecific variation in mtDNA genome size. Intra-individual heterogeneity (i.e., heteroplasmy) was rarely observed (Robberson et al., 1974; Potter et al., 1975; Ojala and Attardi, 1977; Coote et al., 1979; Brown and Des Rosiers, 1983). Therefore, it was generally believed that there was a rapid sorting-out of the transient heteroplasmic states that must occur following mutational events (Upholt and Dawid, 1977; Takahata and Maruyama, 1981; Birky et al., 1983). However, as the number of population level studies expanded, heteroplasmy has been found to be much more common, with five types having been recognized. (1) Nucleotide site heteroplasmy, although quite rare, has been identified in cow (Hauswirth and Laipis, 1985), human (Greenberg et al., 1983) and fruit fly (Hale and Singh, 1986). (2) Heteroplasmy may also involve variable number of nucleotides in homopolymer stretches as detected in cow (Hauswirth and Laipis, 1985) and rat (Brown and Des Rosiers, 1983), or (3) large deletions in coding sequences as found in mouse (Boursot et al., 1987).
The last two classes, (4) and (5), encompass either continuous length variation up to 1 kb in size or discrete length variation involving variable copy number of tandem repeat(s). These and numerous other studies have unveiled a "fluid" nature associated with the mitochondrial genome (Rand, 1994).

Metazoan mtDNA is now known to vary in size by as much as threefold, from 13.8 kb in the nematode Caenorhabditis elegans (Okimoto et al., 1992) to 41 kb in the scallop, Placopecten magellanicus (Gjetvaj et al., 1992). Although the large size variation among animal mtDNAs is not typically attributable to duplication/deletion events, there are a few cases of these mechanisms. Duplications of coding regions have been detected in Cnemidophorus lizards (1.2 to 8.0 kb) and Heteronotia geckos (1.5 to 10.4 kb), where they comprise a portion of the control region and adjacent rRNA, tRNA and protein genes (Moritz and Brown, 1986; 1987; Moritz, 1991; Zevering et al., 1991). In addition, a single sequence containing the 12S rRNA, 16S rRNA, ND1 and ND2 genes of the newt, Triturus cristatus, has been amplified (Wallis, 1987). A tandem duplication involving a region similar to the 3' end of the COI gene and the tRNAleu gene has been located at the junction of the COI and COII genes of the honeybee, A. mellifera (Cornuet et al., 1991). A 3.0 kb tandem duplication has also been detected in the nematode Romanomermis culicivorax. Each repeating unit was found to contain six open reading frames (ORF), two encoding ND3 and ND6 and a third ORF sharing similarity to the cytochrome P-450 gene (Azevedo and Hyman, 1993). However, unlike the examples listed above where considerable substitutions, short duplications, and deletions occurred in the duplicated regions, substitution levels in the nematode repeating units were less than 0.01% (Hyman and Azevedo, 1996).

Deletions involving coding regions, such as those found in mouse (Boursot et al., 1987), have also been detected in humans.
For example, Kearns-Sayre syndrome, a human neuromuscular disease, is the direct result of dysfunctional mitochondria. The encephalomyopathy involves deletions of 1.3 kb to 7.7 kb that can encompass part of the COII, ATPase 6 & 8, NADH 4, 4L, 5 & 6 genes, the N-terminus of the Cyt b gene and 5 tRNA genes. The remainder of the COII and Cyt b genes are fused (Holt et al., 1988; Shoubridge et al., 1990; Heddi et al., 1994).

MtDNA size variation due to substantial duplication or deletion of coding regions is much less common than size variation and heteroplasmy due to variation in copy number of tandemly repeated sequences in the control region. In vertebrates, repeat motifs generally occur at either one of two locations. The first is the region associated with the TAS site(s) for the three-stranded D-loop structure and the second is located between the OH and the LSP (often within or adjacent to the CSBs) (Gemmell et al., 1996; Fumagalli et al., 1996). Hoelzel and coworkers (1994) partitioned these two zones into five distinct repetitive sequence regions (RS1 to RS5; refer to Figure 1 of Hoelzel et al., 1994). The TAS zone repeats (RS1 & RS2) and the CSB2 and CSB3 zone repeats (RS4 & RS5) involve repeat motifs whose size varies in multiples of 40 nt (40, 80 or 160 nt), resulting in total length changes of 80 to 650 nt. Up to 50% of the cells are heteroplasmic for the RS1 and RS2 variants. These repetitive sequences have been described in several vertebrates including the white sturgeon (Buroker et al., 1990), Atlantic cod (Arnason and Rand, 1993), evening bat (Wilkinson and Chapman, 1991), Japanese monkeys (Hayasaka et al., 1991) and rabbit (Mignotte et al., 1990). Repeat motifs situated between CSB1 and CSB2 (RS3) are distinctive from the other four RS zones in that the repeat motifs are considerably smaller in size (6 to 38 nt) and the frequency of heteroplasmy is very high, although the total size of repeat arrays parallels that of the other RS zones.
RS3 repeats are found in pig (Ghivizzani et al., 1993), harbour seal (Arnason and Johnsson, 1992), elephant seals (Hoelzel et al., 1993) and 18 carnivore species (Hoelzel et al., 1994). Shrews (Crocidura russula and Sorex araneus) are unique relative to other mammals in that they simultaneously possess tandem repeats in two locations (RS1 or RS2, 78 nt long, and RS3, 12 nt in C. russula and 14 nt in S. araneus), unlike other mammals where they occur at only one of the five RS locations (Fumagalli et al., 1996).

In invertebrates, tandemly repeated sequences such as those detected in vertebrates do occur within or immediately adjacent to the A+T rich region. The A+T rich regions of three closely related species of bark weevil (Curculionidae: Pissodes) were not only found to be dramatically enlarged (9 to 13 kb) but also to be flanked by variable numbers of a tandemly repeated unit of approximately 200 nt. The repeat region varied in size from 0.8 to 2.0 kb (Boyce et al., 1989). The size of the mtDNA genome for these three species ranged from 24 to 30 kb in P. terminalis, from 25 to 34 kb in P. nemorensis, and from 28 to 36 kb in P. strobi. Remarkably, all 219 bark weevils that were sampled were found to be heteroplasmic and exhibited anywhere from two to five distinct size classes of mtDNA that differed by as much as 7.5 kb.

Rand and Harrison (1989) documented the presence of length variation in two species of crickets, Gryllus firmus and Gryllus pennsylvanicus. Nucleotide sequence analysis revealed a 206 nt repeat that is bounded by a G+C-rich 14 nt sequence with dyad symmetry and is present in one to seven copies. Sixty percent of G. firmus and 45% of G. pennsylvanicus were heteroplasmic for two or three haplotypes, respectively.

Undoubtedly, the most notable example of large scale mtDNA size variation is that found among scallops (Bivalvia: Pectinidae).
Of the seven species studied thus far, only the bay scallop, Argopecten irradians, apparently lacks repetitive sequences and shows a typical size-invariant 16.2 kb mtDNA molecule (Gjetvaj et al., 1992). The remaining six species all possess relatively large mtDNAs accompanied by a broad spectrum of size classes: the giant scallop, Pecten maximus (19.9 to 26.3 kb); the rock scallop, Crassadoma gigantea (22.8 to 24.8 kb); the Queen scallop, Aequipecten opercularis (21 to 28.2 kb); the spiny scallop, Chlamys hastata (23.9 to 27.2 kb); the Iceland scallop, Chlamys islandica (22.2 to 25 kb) and the deep-sea scallop, Placopecten magellanicus (31 to 41 kb) (Snyder et al., 1987; La Roche et al., 1990; Gjetvaj et al., 1992).

Restriction enzyme analysis has revealed the presence of three separate regions associated with size variation in the mtDNA molecule of the Iceland scallop. Size variation in region one is the result of changes in copy number (one to three) of a 1.2 kb tandemly repeated sequence. Variation in region two is associated with deletions of 100 to 250 nt, and discrete size variation due to the insertion of a single 1.4 kb sequence is found in region three (Gjetvaj et al., 1992). The largest metazoan mtDNA identified is that of the deep-sea scallop at 41 kb. Again, restriction enzyme analysis characterized three regions responsible for size variation. Size variation in region one is due to fluctuations in the copy number (two to eight) of a 1.45 kb repeated sequence, whereas a continuum of size variation caused by multiple insertions or deletions of an approximately 100 nt sequence typifies region two. In region three, the variation occurs in 250 nt increments. As in the Iceland scallop, these three regions appear to vary independently from one another (Gjetvaj et al., 1992). Heteroplasmy was not detected in either the bay scallop or the giant scallop. However, ten percent of spiny scallops were heteroplasmic for size variants.
In one individual, three size classes were present in about equal frequency and differed by increments of approximately 600 nt. In the Queen scallop, a multiplicity of molecules whose size differs by 100 nt increments is generated by Ava-I restriction digestion. Fifteen percent of the Iceland scallops were heteroplasmic, primarily for region one and to a lesser extent for regions two and three. One individual with a single size class of mtDNA was found to be heteroplasmic for the presence/absence of a Bam HI restriction site. In the deep-sea scallop, most individuals were heteroplasmic for the 100 nt increment located in region two. Ten to twenty percent of the individuals were heteroplasmic for the 1.45 kb repeat sequence characterized for region one.

The pea aphid system

Aphids (Aphidoidea) are an extremely complex and diverse group of insects consisting of approximately 4401 species placed in 493 currently accepted genera (Blackman and Eastop, 1994). Aphids are believed to have evolved some 300 million years ago (Heie, 1981; 1987). As with other members of the insect order Hemiptera, aphids have slender, piercing and sucking mouthparts that form a hollow tube with which they suck juices from plants. The life cycles of aphids are very diverse and include both parthenogenetic and sexual generations, elaborate polyphenisms, and obligate shifting between unrelated host-plant taxa (Moran, 1992). Although some aphids have lost the sexual phase of the life cycle, most alternate one or more parthenogenetic generations with a single annual sexual generation that produces the overwintering eggs. In the more primitive families, Adelgidae and Phylloxeridae, both sexual and parthenogenetic females are egg producers (oviparous). However, in the Aphididae, parthenogenetic females always give birth to live young (viviparous), with the embryos of granddaughters developing within the embryonic daughters of a given female.
The pea aphid, Acyrthosiphon pisum (Harris), is a polyphagous pest of legumes that invaded North America within the last century (Johnson, 1899). Pea aphids are cyclical parthenogens. In cold climates the sexual forms appear in the fall and lay overwintering eggs (Lamb and Pointing, 1972). In spring, the eggs hatch into fundatrices and establish lineages that reproduce apomictically for ten to twelve generations. Environmental factors trigger the generation of the sexual biotypes that produce the overwintering eggs, thus completing the cycle (Via, 1991). In the dairy producing areas near Lansing, New York, pea aphids are found in similar abundance on their two major hosts, alfalfa, Medicago sativa, and red clover, Trifolium pratense (Via, 1991).

Studies of mtDNA diversity in aphid species are quite limited, with only three species, Schizaphis graminum (Powers et al., 1989), Rhopalosiphum padi (Martinez et al., 1992) and A. pisum (Barrette et al., 1994), having been examined so far. Sequence diversity within Schizaphis and Rhopalosiphum was found to be very low, with length variation in R. padi arising from differences in the copy number of a 100 bp tandemly repeated sequence. MtDNA analysis of thirty-five clonal lines of A. pisum originally collected from clover fields in New York revealed little sequence diversity. A survey of variation using eighteen restriction enzymes that cut pea aphid mtDNA at least once revealed only two polymorphic restriction sites (Barrette et al., 1994). However, this survey uncovered several notable features associated with A. pisum mtDNA that warranted further investigation. For example, examination of the relationship between restriction site frequency and the A+T content of recognition sequences for fifty of the sixty restriction enzymes used to digest pea aphid mtDNA showed that enzymes whose recognition sequences have an A+T content of less than 40% cut far less frequently than those with higher A+T content (Table 2; Barrette et al., 1994).
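The link between recognition-sequence composition and cutting frequency can be illustrated with a simple expectation. The sketch below is not the analysis of Barrette et al. (1994); it assumes a random sequence with independent positions in which A = T and G = C. The function names, the 17 kb genome length and the 85% A+T value are illustrative assumptions, and the three enzymes (DraI, EcoRI, SmaI) are chosen only because their recognition sequences span 100%, 67% and 0% A+T.

```python
def at_fraction(site: str) -> float:
    """Return the fraction of A or T bases in a recognition sequence."""
    site = site.upper()
    return sum(base in "AT" for base in site) / len(site)


def expected_sites(site: str, genome_length: int, at_content: float) -> float:
    """Expected occurrences of `site` in a random sequence.

    Assumes independent positions with A = T = at_content / 2 and
    G = C = (1 - at_content) / 2, and ignores circularity of the molecule.
    """
    p_at = at_content / 2
    p_gc = (1 - at_content) / 2
    probability = 1.0
    for base in site.upper():
        probability *= p_at if base in "AT" else p_gc
    return genome_length * probability


if __name__ == "__main__":
    # Illustrative values only (assumptions): 17 kb genome, 85% A+T.
    examples = [("DraI", "TTTAAA"), ("EcoRI", "GAATTC"), ("SmaI", "CCCGGG")]
    for name, site in examples:
        print(name, site,
              f"A+T={at_fraction(site):.2f}",
              f"expected sites={expected_sites(site, 17000, 0.85):.3f}")
```

Under these assumptions a six-base site made only of A and T is expected roughly a hundred times in the genome, while a site made only of G and C is expected far less than once, which is the qualitative pattern described above.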
The A+T contents of insect mtDNA genomes are elevated, with the highest recorded value (84.9%) detected in bees (Crozier and Crozier, 1993). Restriction site frequencies in pea aphid suggest that its A+T content may be at least as high as that of Apis.

While restriction site polymorphisms were rare, length variation and heteroplasmy were common in the pea aphid clones sampled, with mitochondrial genome sizes ranging from 16.8 to 18.1 kb. Two regions of length heterogeneity were detected. Region one (R1) contained three variants differing in size by multiples of about 120 nt, while six size classes varying in size by multiples of about 210 nt were detected in region two (R2). The largest size class of R1 was the most common, while the intermediate size class of R2 occurred most frequently (Appendix 1b: Barrette et al., 1994). All clones appeared to be homoplasmic at R1, but 13 of 35 clones were heteroplasmic at R2. One clone had four size classes of mtDNA, but other clones contained just two size classes that usually differed by single-unit size shifts, and occasionally by two-unit or three-unit size shifts (Table 5 in Barrette et al., 1994). A further 16 clones were screened for their R1 and R2 Taq I restriction enzyme digest patterns. Two clones were found to be heteroplasmic for R1. Temporal stability of length variants was assessed by isolating mtDNA from four heteroplasmic clonal lineages after approximately 30 generations of parthenogenetic reproduction. Although length variation was unchanged for R1, the ratio of length variants in R2 changed with time even though the complement of size classes remained the same (Table 6 in Barrette et al., 1994). Restriction enzyme mapping localized R1 and R2 to opposite sides of the molecule. Preliminary analysis of gene order suggested that the overall gene order of A. pisum is very similar to that of D.
yakuba, with R1 occurring within or adjacent to the A+T rich region and R2 occurring somewhere within the area spanning the ND3 gene to the ND5 gene (Appendix 1e: Barrette et al., 1994).

The purpose of this study is to continue characterizing the pea aphid mtDNA genome. Is the R1 length variation due to repetitive elements within the A+T rich region? Does this region contain any of the transcription and replication initiation/termination signals described in other metazoan mitochondrial genomes? Lastly, are codon usage in the gene coding sequences and the A+T nucleotide bias similar to those of other insects?

Pea aphid sampling and mtDNA purification

Laboratory cultures of A. pisum were established from collections made in alfalfa fields near Lansing, New York (Barrette et al., 1994). Mitochondrial DNA was extracted and purified as reported previously in Barrette et al. (1994). Briefly, 700 to 1500 mg (wet weight) of fresh colony material was homogenized in sucrose grinding buffer and then subjected to differential centrifugation to separate cellular debris from the mitochondria. To ensure sufficient removal of cellular contaminants, mitochondria were pelleted through a sucrose step gradient. Mitochondria were lysed with sodium dodecyl sulfate (SDS) and the mtDNA was purified by cesium chloride density gradient ultracentrifugation.

PCR amplification and Gene Clean purification

PCR amplifications were performed in 50 µl reaction volumes using two units of Taq polymerase (Boehringer) in 1X Boehringer PCR buffer, 20 nmol of each dNTP and 10 pmol of each primer. Reactions were overlaid with light mineral oil to prevent evaporation. Primer combinations, magnesium chloride concentrations and annealing temperatures for amplification of the regions of the mtDNA used for cloning and sequencing are shown in Table 1.
All PCR amplifications were performed using the following program in an Amplitron II Thermocycler (Thermolyne): 27 cycles of denaturation at 94°C for 1 minute, annealing (Table 1) for 2 minutes and primer extension at 72°C for 2 minutes. Amplification was continued for another 8 cycles in which the extension phase of each cycle was lengthened by 20 seconds. Gene Clean (BIO 101; Vista, California) purification of the amplification products was performed in accordance with the manufacturer's general specifications. Mineral oil was removed from the PCR reactions prior to addition of 120 µl sodium iodide solution and 5 µl glass milk suspension. To ensure maximum recovery of the purified PCR product, binding of DNA to glass milk was carried out by incubating and agitating the mix for a minimum of 20 minutes on a rocker platform. Following a 15 second pulse spin at 14000 RPM, the glass milk pellet was washed twice in 500 µl New Wash solution, resuspended and agitated on a rocker platform for 15 minutes each. DNA was eluted from the glass milk by two 15 minute incubations at 50°C, each in 7 µl of ultrapure water.

Table 1: Sequence and amplification conditions for the PCR primers used in both DNA sequencing and cloning of pea aphid mtDNA
Column headings: PCR product; PCR primers; primer sequence; position in D. yakuba; annealing; MgCl2
PCR products: 16S sequencing; Cyt b - ND1 sequencing; sequencing; A+T region cloning
Primers, sequences and positions: 5' cgcctgtttaacwaaacüt 3' 13417; 16S-4c (#20) 5' ccggittgaactcügatcatgt 3' 12866; 16S-5 (#21) 5' acatgüattggagctcgaccagi 3'; 5' ggtacattacctcgglttcgltatgal 3' 11523 11867; Cyt b-1 (#6)*; ND1-2 (M)*; 5' gtaggüggagctgctütüttag 3' 8481; ND4-5 (#11)* 5' gcttüttcatcggttgctca 3' 8737; ND4-6c (#12)*; PW-f-1
Annealing: 53, 47, 45; MgCl2: 1 mM, 3 mM, 2.5 mM
*primers previously described in Barrette et al., 1994

Cloning

The A+T rich region of a single heteroplasmic clone (R1-1,3) was PCR amplified using primers #18 and #19 (Table 1) and cleaned as described above.
To ensure that the purified PCR product had blunt ends, 14 µl of it were incubated for 30 minutes at 37°C with 10 units of Klenow enzyme (BRL) and 10 nmol of dTTP in 1X Klenow buffer in a final reaction volume of 20 µl. The reaction was terminated by denaturation of the Klenow enzyme at 80°C for 10 minutes. One hundred microlitres of water were added to the sample, which was then extracted in an equal volume of phenol/chloroform/isoamyl alcohol (PCI) 25:24:1, and then in an equal volume of chloroform/isoamyl alcohol (CI) 24:1. The DNA was precipitated by the addition of 10 µl of 5M NaCl and 300 µl of absolute ethanol followed by incubation overnight at -20°C. The DNA was pelleted in 1.5 ml Eppendorf centrifuge tubes by centrifugation at 14000 RPM at 4°C for 20 minutes using an Eppendorf 5415C benchtop centrifuge, washed three times with 70% ethanol and solubilized in 20 µl of 1 mM TE.

Ligation of the blunt-ended A+T region PCR product to the plasmid vector, pCR-Script SK+, was simplified by using the pCR-Script Cloning System (Stratagene) in accordance with the manufacturer's specifications. Reactants were added in the following order: 1 µl (10 ng) pCR-Script SK+ vector, 1 µl 10X ligation buffer (Stratagene), 0.5 µl 10 mM rATP, 3 µl (400 ng) target DNA, 5 units [email protected] enzyme, 1 µl T4 DNA ligase and 3.5 µl water. The sample was mixed gently and then incubated at room temperature for one hour. The reaction was stopped by heating at 65°C for 10 minutes and then stored on ice until transformation. The Epicurian Coli XL1-Blue MRF' Kan supercompetent cells supplied with the pCR-Script kit were thawed on ice, after which 35 µl were transferred to a 15 ml polypropylene Falcon tube containing 0.6 µl of 1.44 M β-mercaptoethanol, gently mixed and incubated for 10 minutes.
Two microlitres of the ligation reaction were added to the cells and the mixture was incubated on ice for 30 minutes, heat-shocked in a 42°C water bath for 45 seconds and then transferred to ice for 2 minutes. Four hundred and fifty microlitres of SOC medium (0.5% w/v yeast extract; 2% w/v tryptone; 10 mM NaCl; 2.5 mM KCl; 10 mM MgCl2; 20 mM MgSO4; 20 mM glucose) were added to the Falcon tube, mixed, and then incubated for one hour at 37°C in a shaking incubator. Twenty-five, 50, 100 and 200 microlitres of the transformation mix were plated onto Luria Broth (LB) 1% agar plates containing 100 µg/ml X-gal (Boehringer), 40 µg/ml IPTG (Boehringer) and 50 µg/ml ampicillin (Sigma) and allowed to incubate overnight at 37°C. White transformants were triple streaked onto a fresh LB plate and grown overnight at 37°C. Forty-eight white colonies were screened by liberating DNA from single colonies with cracking buffer (0.1N NaOH, 10 mM EDTA pH 8.0, 10% glycerol and 1% SDS) and electrophoresing it in a 1% agarose gel in TBE (89 mM Tris-borate; 89 mM boric acid; 2 mM EDTA) buffer. Two size classes of recombinant plasmid were detected and selected for growth in 100 ml of liquid LB overnight at 37°C.

To isolate plasmid DNA, bacteria were pelleted by centrifugation at 6000 RPM for 20 minutes. The pellet was resuspended in 5 ml of Solution I (50 mM glucose, 25 mM Tris-HCl pH 7.5, 10 mM EDTA pH 8.0), and lysed for 20 minutes on ice by addition of 10 ml of Solution II (1% SDS, 0.2N NaOH). Bacterial chromosomal and cellular material were precipitated by addition of 7.5 ml of Solution III (7.5 M ammonium acetate) and incubated on ice for 20 minutes. Debris was pelleted by a 20 minute centrifugation at 12500 RPM using an RC5 superspeed centrifuge (Sorvall) and an SS-34 rotor (Sorvall). The DNA in the supernatant was precipitated by the addition of 15 ml isopropanol and incubation at -20°C for 30 minutes.
The DNA was pelleted by centrifugation at 12500 RPM for 20 minutes, resolubilized in 5 ml of 2.5 M ammonium acetate, incubated on ice for 30 minutes, recentrifuged at 12500 RPM for 20 minutes, washed twice in 70% ethanol, and then solubilized in 400 µl of 1 mM TE. RNA was removed by RNase treatment (3 µg) at 37°C for 30 minutes followed by organic extraction and ethanol precipitation as described above. The plasmid DNA was solubilized in 100 µl H2O and quantified against calf thymus DNA (Sigma) using a DyNA Quant 200 (Hoefer) fluorometer.

Progressive unidirectional deletion subclones of the original cloned fragment containing the A+T rich region were generated using the Erase-A-Base kit (Promega) in accordance with the manufacturer's specifications. To generate deletions, 10 µg of plasmid DNA were digested in a 50 µl reaction containing 1X One-for-all digest buffer (Pharmacia) and 20 units each of Kpn-I (BRL) and Cla-I (BRL). Reverse deletions were generated by sequentially digesting 10 µg of plasmid DNA in a 50 µl reaction containing 1X Buffer 1 (NEB) and 20 units Sac-I (NEB). After heat inactivating this enzyme, the DNA was digested in a final volume of 140 µl in 1X Buffer 3 (NEB) containing 40 units of Eag-I (NEB) enzyme. All digests were subsequently brought up to 400 µl with water, extracted twice with PCI, once with CI, and then ethanol precipitated overnight at -20°C by the addition of 25 µl 5M NaCl and 1 ml absolute ethanol. Pelleted DNA was washed three times in 70% ethanol and then solubilized by 37°C incubation in 36 µl H2O and 4 µl 10X Exo III buffer. A total of 14 tubes (time points), each containing 7.5 µl of S1 digestion mix (103 µl H2O + 16 µl 7.4X S1 buffer + 4 µl S1 enzyme), were prepared and stored on ice until use. Exonuclease III reactions were commenced by the addition of 3 µl Exonuclease III enzyme to the plasmid DNA preheated to 37°C. At 30 second intervals, 2.5 µl aliquots were transferred to each of the 14 tubes on ice.
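The volumes in this time course can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid using the quantities stated above; the helper names are hypothetical, and it assumes that the first aliquot is taken 30 seconds after the enzyme is added, which the text does not state explicitly.

```python
def aliquot_times(n_points: int = 14, interval_s: int = 30) -> list[int]:
    """Sampling times in seconds, assuming the first aliquot at one interval."""
    return [interval_s * (i + 1) for i in range(n_points)]


def leftover(total_ul: float, per_tube_ul: float, n_tubes: int) -> float:
    """Volume remaining after dispensing per_tube_ul into n_tubes."""
    return total_ul - per_tube_ul * n_tubes


if __name__ == "__main__":
    times = aliquot_times()
    print("last aliquot at", times[-1], "s")            # 14 x 30 s

    exo_reaction_ul = 36 + 4 + 3                         # water + 10X buffer + enzyme
    print("Exo III reaction left over:", leftover(exo_reaction_ul, 2.5, 14), "ul")

    s1_mix_ul = 103 + 16 + 4                             # water + 7.4X buffer + enzyme
    print("S1 mix left over:", leftover(s1_mix_ul, 7.5, 14), "ul")

    # Working strength of the S1 buffer in the mix (about 1X).
    print("S1 buffer strength:", round(7.4 * 16 / s1_mix_ul, 2), "X")
```

Both mixes cover the 14 time points with a small surplus, and the 7.4X S1 buffer ends up at roughly 1X in the digestion mix.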
These tubes were then incubated at 37°C for 30 minutes, 1 µl of stop buffer was added, and the samples were heated at 70°C for 10 minutes and then returned to ice. To evaluate the quality of the deletion reactions, 2 µl of each time point were electrophoresed on a 1% agarose gel. Each time point was ethanol precipitated, washed three times in 70% ethanol and then solubilized in 15 µl H2O. Four microlitres of 5X ligase buffer (BRL) and 10 units T4 ligase (BRL) were added to each time point, which was then incubated overnight at 4°C. These samples were used to transform E. coli strain DH5 alpha F', followed by plating on ampicillin/X-gal/IPTG LB plates. Deletion subclones differing in size by 200 bp increments and spanning the entire A+T rich region were selected.

Sequencing

Deletion subclones were grown overnight at 37°C in 10 ml LB containing ampicillin. Plasmid DNA was isolated and purified using the alkaline isolation procedure described earlier. However, an additional purification step was performed. The final plasmid DNA pellets were solubilized in 64 µl H2O + 16 µl of 4M NaCl, reprecipitated by the addition of 13% PEG 8000 and incubated on ice for 20 minutes. The DNA was pelleted by centrifugation at 14000 RPM at 4°C in a 54132 benchtop Eppendorf centrifuge and then carefully washed three times in 70% ethanol, followed by solubilization in 30 µl H2O.

The Kpn-I/Cla-I deletion subclones were sequenced by the Sanger dideoxy method using the T7 Pharmacia sequencing kit as recommended by the manufacturer. Two micrograms of plasmid DNA were alkaline denatured and annealed to the M13 forward primer and then sequenced using the short-read termination mix and α-35S-thio-dATP. Sequencing products were electrophoresed on 4% Sequagel XR (National Diagnostics) acrylamide gels in 1X TBE at 50 Watts constant power in a BRL S2 sequencing apparatus. Gels were vacuum dried and then exposed to Fuji NiF-Rx blue X-ray film for 16 hours before development of the autoradiographic image.
Sequencing of the Sac-I/Eag-I deletions (using the M13 reverse primer) was performed at the University of Guelph's automated DNA sequencing facility. DNA templates were sequenced using the Taq FS dye-termination kit (Perkin Elmer) and then analyzed on an Applied Biosystems 377 automated sequencer (Perkin-Elmer). PCR products purified with the Gene Clean kit were sequenced using the T7 (Pharmacia) kit. Twelve microlitres of template and 2 µl of primer (10 pmol/µl) were denatured by a 5 minute incubation in a boiling water bath and then chilled immediately on ice for 15 minutes. Primer-annealed templates were then sequenced according to the manufacturer's specifications using short-read mix and α-35S-thio-dATP. Sequence products were electrophoresed and visualized as previously described.

Sequence alignment and analysis

Autoradiographs obtained by standard sequencing techniques were analyzed using the DNASTAR digitizer software. Full composite sequences were generated using the MegAlign program (DNASTAR; Madison, Wisconsin). ABI autosequencing output files (electropherograms) were exported to MacVector (IBI) and composite sequences were generated using Sequencher software (Gene Codes Corp, Ann Arbor, Michigan). Published sequences were downloaded from GenBank and comparative analyses performed using DNASTAR software. Further sequence analysis was achieved using DNAMAN (Lynnon Biosoft, Vaudreuil, Quebec, version 2.51) software. DNA secondary structures within the A+T rich region sequences were determined by the minimum energy approach (Zuker and Stiegler, 1981; Jaeger, Turner and Zuker, 1989a; 1989b) using the program MULFOLD (version 2.0) and visualized using the program LoopDloop (Gilbert, 1990).
RESULTS

A+T richness and codon use

Initial restriction endonuclease screening of pea aphid mtDNA revealed that digest frequency was dependent upon the A+T content of the recognition sequence of the enzyme, with the highest restriction site frequency occurring with those endonucleases whose recognition sequences are 80% to 100% A+T (Appendix 1a; Barrette et al., 1994). To confirm the A+T richness of A. pisum mtDNA, four protein genes and the two ribosomal genes were partially sequenced from PCR products amplified using the primer pairs listed in Table 1. Pea aphid sequences were compared to the same genes from three other insects: the fruit fly Drosophila yakuba (X03240), the mosquito Anopheles quadrimaculatus (L04272), and the organism with the highest A+T content measured to date for metazoan mtDNA, the honeybee, Apis mellifera (L06178). Refer to Appendix 3 (a, b, c, d) and Figure 1 for the alignment of the various sequences used in this comparison. The A+T content of three of the four protein genes was higher in A. pisum than in any of the other three insects, with 74.1% A+T for COI (Table 2), 90.3% for Cyt b (Table 3) and 92.7% for ND1 (Table 4). Only the ND4 gene sequence of A. mellifera was slightly higher, at 79.0% versus 77.2% for A. pisum (Table 5). The 12S and 16S ribosomal genes of A. pisum also showed elevated A+T levels, with only the 16S rRNA sequence of D. yakuba having a higher value than A. pisum (Tables 6 & 7).

The protein encoding DNA sequences were translated using the Drosophila mitochondrial genetic code. Comparison of nucleotide and amino acid sequences of the four protein genes revealed a spectrum of interspecific similarities, with the COI gene showing the greatest level of sequence similarity and the ND1 gene exhibiting the least (Tables 2, 3, 4 & 5). Furthermore, although the levels of DNA sequence similarity between A.
pisum and the three other insects were similar for each of the four protein-encoding genes, the corresponding amino acid sequences showed fairly low similarity values for all but COI. NADH dehydrogenase subunit 1 (Table 4) best illustrates this point. The percent sequence similarity values for ND1 differed by only 6.5% between A. quadrimaculatus and A. mellifera (65.6% and 59.1%, respectively), yet this difference corresponded to a more than twofold difference in the amino acid sequence similarity values (40.6% and 19.4%, respectively).

Table 3. Comparison of DNA sequence and amino acid sequence of the cytochrome b gene for Acyrthosiphon pisum and three other insects

                                             A. pisum   D. yakuba   A. quadrimaculatus   A. mellifera
Number of amino acids (A.A.)                 23         23          23                   28
Percent sequence similarity with A. pisum    -          69.4        70.4                 73.6
Percent A.A. similarity with A. pisum        -          45.8        56.5                 58.3

Table 4. Comparison of DNA sequence and amino acid sequence of NADH dehydrogenase subunit 1 for Acyrthosiphon pisum and three other insects

                                             A. pisum   D. yakuba   A. quadrimaculatus   A. mellifera
Number of amino acids (A.A.)                 31         42          33                   30
Percent sequence similarity with A. pisum    -          66.7        65.6                 59.1
Percent A.A. similarity with A. pisum        -          31.2        40.6                 19.4

Table 6. Comparison of DNA sequence of the 12S ribosomal gene for Acyrthosiphon pisum and three other insects

                                             A. pisum   D. yakuba   A. quadrimaculatus   A. mellifera
Length of sequence (nt)                      376*       351         352                  359
Percent sequence similarity with A. pisum    -          45.0        46.0                 44.8
*sequence length of 12S gene based on positioning of the putative 5' end

Table 7. Comparison of the DNA sequence of the 16S ribosomal gene from Acyrthosiphon pisum and three other insects

                                             A. pisum   D. yakuba   A. quadrimaculatus   A. mellifera
Length of sequence (nt)                      210        209         209                  210
Percent sequence similarity with A. pisum    -          66.2        65.2                 61.4
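The contrast between nucleotide-level and amino-acid-level similarity can be reproduced on a toy example. The sketch below is not the DNASTAR or DNAMAN procedure used in this study: it treats "similarity" as simple percent identity over pre-aligned sequences, uses NCBI translation table 5 (invertebrate mitochondrial) as a stand-in for the Drosophila mitochondrial code, and the two nine-base sequences are invented for illustration.

```python
BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial); codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with T, C, A, G at each position.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions in two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented sequences: one third-position change per codon.
    toy_a, toy_b = "TTTATTAAA", "TTAATAAAT"
    print(translate(toy_a), translate(toy_b))
    print(round(percent_identity(toy_a, toy_b), 1))                        # nucleotides
    print(round(percent_identity(translate(toy_a), translate(toy_b)), 1))  # amino acids
```

In this toy case two thirds of the nucleotides match but none of the amino acids do, which shows how a modest drop in DNA similarity can translate into a much larger drop in amino acid similarity.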
DNA sequence comparisons of the 5' end of the 12S rRNA gene showed lower similarities, with values varying from 44.8% for honeybee to 46% for the mosquito (Table 6). Comparisons of the 16S rRNA gene revealed higher values, ranging from 61.4% to 66.2% for honeybee and fruit fly, respectively (Table 7).

Codon usage in A. pisum mtDNA is highly biased against codons rich in G or C. Among the 159 amino acids translated from the four partially sequenced protein genes, the codons most frequently used were those rich in A and T (Table 8). Nearly half of the 159 amino acids could be accounted for by five codons, each containing only A or T in the third position. The leucine codon, TTA, was present 21 times, ATT coding for isoleucine was present 20 times, phenylalanine (TTT) was present 16 times, methionine (ATA) was present 10 times and lysine (AAA) was present nine times (Table 8). Only one of the 22 leucine residues was coded by a G- or C-containing codon, TTG (Table 8). The same bias holds for phenylalanine, isoleucine and methionine, and the AAG codon is completely absent from the nine lysine residues identified in the four gene sequences. Furthermore, no codons composed only of G and C were detected. For example, the amino acid glycine is coded for by the four codons GGT, GGC, GGA and GGG. Of the eleven glycine residues identified, four were represented by GGT, seven by GGA, and none by either GGC or GGG.

A+T content was found to differ markedly among the three codon positions (Table 9). The A+T content at the third position, at which most but not all substitutions are silent, is 95.7%. The second codon position is the least biased at 67.7% A+T, with T being the most frequently used base (46.6%). G and C are present in nearly equal abundance at this position. Eighty percent of the first position nucleotides are either A or T, and G is used nearly three times more often than is C at this position.

Table 8. Codon usage table of the four partially sequenced protein coding genes of A. pisum*

Codon  Freq.    Codon  A.A.    Codon  A.A.    Codon
TTT    16       TCT    S       TAT    Y       TGT
TTC    2        TCC    S       TAC    Y       TGC
TTA    21       TCA    S       TAA    stop    TGA
TTG    1        TCG    S       TAG    stop    TGG
CTT    0        CCT    P       CAT    H       CGT
CTC    0        CCC    P       CAC    H       CGC
CTA    0        CCA    P       CAA    Q       CGA
CTG    0        CCG    P       CAG    Q       CGG
ATT    20       ACT    T       AAT    N       AGT
ATC    1        ACC    T       AAC    N       AGC
ATA    10       ACA    T       AAA    K       AGA
ATG    1        ACG    T       AAG    K       AGG
GTT    2        GCT    A       GAT    D       GGT
GTC    0        GCC    A       GAC    D       GGC
GTA    1        GCA    A       GAA    E       GGA
GTG    0        GCG    A       GAG    E       GGG
*Refer to Appendix 2 for translation of the standard one-letter code used to specify the 22 amino acids within this table

Table 9. Base composition of the codons used in the four partially sequenced protein coding genes for A. pisum

Codon position    A       T       G       C
1                 38.5    41.0    14.9    5.6
2                 21.1    46.6    16.2    16.3
3                 44.1    51.6    1.2     3.1

The ratio of the GC-rich amino acid codons of proline, alanine, arginine and glycine to the AT-rich codons of phenylalanine, isoleucine, methionine, tyrosine, asparagine and lysine has been used to examine the relationship between base composition of a codon family and amino acid occurrence (Crozier and Crozier, 1993). This value is 0.297 for A. pisum, which is approximately 10% lower than the ratio of 0.328 calculated for the same sequences in A. mellifera, and considerably lower than the values determined for D. yakuba and A. quadrimaculatus (0.438 and 0.508, respectively).

Cyt b and ND1 intergenic region

In the other insects sequenced to date, the genes for Cyt b and ND1 are encoded on opposite strands of the mtDNA so that their 3' ends are adjacent to one another. The tRNAser gene is located between these two genes (Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Mitchell et al., 1993). This region in A. pisum is illustrated in Figure 1. Complete stop codons (TAA) are used for both the Cyt b and ND1 genes and the 3' ends of both genes are flanked by several intergenic noncoding nt. In A.
pisum, a total of eight nt separate the 3' end of the Cyt b gene from the beginning of the tRNAser gene, whereas only two and six nt separate this region for A. quadrimaculatus and D. yakuba, respectively. By contrast, the honeybee has 45 nt spanning this region as well as an additional 15 nt encoding five amino acids at the 3' end of the Cyt b gene. The 3' end of the ND1 gene for D. yakuba overlaps the tRNAser gene by 15 nt (five amino acids) followed by a complete TAA stop codon. In the pea aphid, the mosquito and the honeybee, there are several intergenic nt separating the TAA stop codon of ND1 and the 3' end of tRNAser. The 10 nt sequence of A. pisum is identical to 10 of the 19 intergenic nt identified in A. quadrimaculatus. Again, the honeybee has the longest noncoding region with 34 intergenic nt.

Figure 1. The A. pisum DNA and protein coding sequences for the 3' ends of both the Cytochrome b and ND1 genes, the tRNAser gene, and the intergenic spacers abutting the tRNAser gene. The insects used for the comparison are abbreviated as follows: D. yak. (Drosophila yakuba); A. mell. (Apis mellifera); and A. quad. (Anopheles quadrimaculatus). Single bold asterisks (*) indicate the beginning and ending of the tRNAser gene with the TGA anticodon sequence highlighted in bold. The triple asterisks (***) denote the TAA stop codon for the Cyt b and ND1 genes. Arrows (->) indicate the 5' to 3' orientation of the gene. Single letter codes designate the amino acids coded for by the DNA sequences of Cyt b and ND1 of the four insects.

Transfer RNA genes

The location and arrangement of the four A. pisum tRNA genes sequenced were determined on the basis of sequence similarity with published tRNA gene sequences. These sequences were then confirmed by their predicted folding potential into the characteristic cloverleaf structures (Figure 2). All four tRNAs are similar to the inferred configurations proposed for other insects and concordant with the general features associated with the amino-acyl arm, the anticodon arm, the DHU arm and the TψC arm. Non-standard base pairings occur in the secondary structures of several tRNAs including t R N A h. G-U pairs are known to occur in RNA molecules and are quite common in rRNA. One T-T base pair mismatch is present in the predicted configuration of A. pisum tRNAgln. The location and orientation of the four tRNAs in pea aphid mtDNA is identical to that of D. yakuba and A. quadrimaculatus. The three tRNAs, tRNAfmet, tRNAgln and tRNAile, are located between the A+T rich region and the 5' end of the ND2 gene, and each is transcribed in the same direction as in D. yakuba and A. quadrimaculatus.
Hence, only tRNAgln is transcribed on the secondary strand whereas the other three tRNAs are transcribed in the same direction on the primary strand. Intergenic spacers, as seen with tRNAser (Figure 1), are also present between two of the three other tRNAs. Eight nucleotides separate tRNAfmet from tRNAgln. However, the 3' end of tRNAgln overlaps the 3' end of tRNAile by three nucleotides (Figure 3).

The total length of the tRNA genes also differs slightly between the four insects. The A. pisum tRNAser (65 nt) and tRNAile (64 nt) genes are the shortest and the tRNAgln gene is only three nucleotides longer (66 nt) than that of the honeybee. Although the A+T content of the pea aphid tRNAser gene is the highest (91%) among the four insects, the A+T content of the other three A. pisum tRNA genes is considerably lower than that of the other insects.

Figure 2. Predicted secondary structures for the four tRNA genes of the pea aphid: tRNAser(UCA), tRNAgln, tRNAile, tRNAfmet.

Figure 3. The DNA sequence of the large size class of cloned PCR products from A. pisum. Arrows (->) indicate the 5' to 3' orientation of the gene. A single asterisk (*) designates the beginning and/or ending of the tRNA genes and the 12S rRNA gene. The underlined 5 nt sequence highlighted by the number symbol (#) indicates the beginning of the sequence of the smaller size class DNA molecule. Nucleotide substitutions in the small cloned PCR fragment are indicated above the sequence. Downward arrows indicate the sites of nucleotide insertion or deletion in the smaller fragment.
The eight nt sequence underlined and superscripted by asterisks indicates the short repetitive element postulated to be the progenitor of the longer 123 nt A+T rich region repeat.

[Figure 3: annotated sequence listing of the large cloned PCR product, including hairpin loops 2 and 3.]

A+T rich region

The A+T rich region was cloned and sequenced for two DNA fragments of different sizes generated from the amplification of a pea aphid clone that was heteroplasmic in region 1 using primers #18 and #19 (Table 1). The larger of the two PCR products was a total of 1398 nt in length. The smaller fragment was only 1111 nt long and lacked a region of 287 nt immediately adjacent to the tRNAmet-lc primer (Table 1) that includes three tRNA genes coding for methionine, glutamine and isoleucine, a hairpin loop and an additional 40 nt (Figure 3). Measurement of restriction fragments suggested that the size of the putative repeat sequence was about 120 nt (Barrette et al., 1994). Hence, this smaller PCR fragment is nearly 50 nt shorter than that anticipated for an R1-1 size variant that lacks two repeat units relative to the long variant. The sequence alignment between the long fragment (R1-3) and the smaller fragment begins at nt 303 of R1-3 in a pentanucleotide sequence (5' CATCG 3') that is a perfect match for the 3' end of the tRNAmet-lc primer.
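The size comparison above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and was not part of the original analysis. The function names, the toy primer and the toy template are hypothetical; only the fragment lengths, the approximate repeat size and the 5' CATCG 3' pentanucleotide are taken from the text.

```python
# Illustrative sketch only; names and toy sequences are assumptions.

LONG_FRAGMENT_NT = 1398    # larger PCR product (from the text)
SHORT_FRAGMENT_NT = 1111   # smaller PCR product (from the text)
REPEAT_ESTIMATE_NT = 120   # approximate repeat size from restriction fragments


def size_discrepancy(long_nt, short_nt, repeat_nt, repeats_missing):
    """Return (observed, expected, excess) length differences in nt."""
    observed = long_nt - short_nt
    expected = repeat_nt * repeats_missing
    return observed, expected, observed - expected


def primer_tail_sites(template, primer, tail=5):
    """Return 0-based positions where the last `tail` bases of the primer
    occur in the template (exact matches on the given strand only)."""
    seed = primer[-tail:].upper()
    template = template.upper()
    hits = []
    start = template.find(seed)
    while start != -1:
        hits.append(start)
        start = template.find(seed, start + 1)
    return hits


observed, expected, excess = size_discrepancy(
    LONG_FRAGMENT_NT, SHORT_FRAGMENT_NT, REPEAT_ESTIMATE_NT, repeats_missing=2
)
print(observed, expected, excess)

# Hypothetical primer and template; only the CATCG tail comes from the text.
toy_primer = "GGGCATCG"
toy_template = "AACATCGTTCATCG"
print(primer_tail_sites(toy_template, toy_primer))
```

With the reported values, the observed difference is 287 nt against an expected 240 nt for two missing repeats, an excess of 47 nt, which corresponds to the "nearly 50 nt" noted above. A scan of this kind only lists candidate sites where the primer's 3' end could anneal; it does not by itself show that mispriming occurred.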
This suggests that the shorter fragment was generated by mispriming of the tRNAmet-lc primer.

Comparison of the sequences of the two PCR fragments revealed a total of five transitions (three T<->C and two A<->G). Four of them occurred in the A+T rich region and one occurred in the 12S rRNA gene. There were also three insertion/deletion events (indels) involving a single nucleotide and two involving dinucleotide sequences.

Based on the tentative positioning of the 5' end of the 12S rRNA gene, the A+T rich region of the R1-3 A. pisum clone was determined to be 848 nt in length. The A+T content of this region is surprisingly lower (88.3%, Table 10) than that of the other three insects. Sequences similar to those detected in the control regions of mammalian systems were absent in the A+T rich region of A. pisum. No conserved sequence blocks (CSB 1-3), termination associated sequences (TAS), or transcriptional promoters similar to the LSP and HSP were detected. However, a total of three hairpin loop structures were found, and the one that is immediately adjacent to the tRNAile gene yields a structure analogous to the stem and loop structure associated with the L-strand origin of replication of frog, human and mouse (Figure 4). This structure consists of a 26-nt imperfect stem and a 24-nt terminal loop that includes a hexanucleotide sequence of T's. This loop structure also contains the pentanucleotide sequence CCAAT identified as the "CAAT box" transcription factor CTF/NF1 binding site.

Two open reading frames (ORFs) were identified within the A+T rich region. One 240 nt sequence (80 amino acids) is located between nt 205 and 448 on the primary strand (Figure 3) and the other 243 nt sequence (81 amino acids) is located on the complementary strand between nt 343 and 589. BLAST search analysis of the GenBank database revealed no significant similarity between the putative translated sequences of these two ORFs and any published protein sequences.

The A+T rich region of A.
pisum was found to contain one primary repetitive element located adjacent to the 5' end of the 12S rRNA gene (Figure 3). The repeat, similar in size to that expected from the restriction site data (Barrette et al., 1994), is 123 nt in length. The R1-3 size variant contains two complete repeat copies and one truncated repeat (the furthest from the 12S rRNA gene) in which 35 nt have been deleted from the 5' end (Figure 5). One of the five transitions identified earlier between the two cloned PCR fragments occurs at the 3' end of the middle repeat. Moreover, a short poly-T stretch within the repetitive element was found to be five nt long in the r3 repeat (adjacent to the 12S rRNA gene) and six nt in the middle repeat, r2. This poly-T stretch is the location at which the 35 nt deletion begins in the first repeat.

RNA folding analysis of the repetitive element revealed that a single repetitive element can be folded into a thermodynamically stable secondary structure with a ΔG° of -8.3 kcal/mol (Figure 6a). The secondary structure of two elements has a ΔG° of -19.7 kcal/mol (Figure 6b). The only other repetitive sequence detected in the A+T rich region was a small octanucleotide sequence, 5' TTAAAAAT 3', present in two copies immediately adjacent to the last repeat and also present within each repetitive unit.

Figure 4. Predicted secondary structure of primary hairpin loop identified in the A+T-rich region of pea aphid.
[Figure 5: alignment of repeat rows r1L, r1S, r2L, r2S, r3L and r3S.]

Figure 5. Alignment of the three repetitive units identified in the larger size class A+T rich region clone of A. pisum. Delta symbol (Δ) indicates location of site differences between the repeating units. The hatched symbol (-) represents an insertion/deletion site and the array of asterisks (*) depicts the truncated zone of the r1 repeat. Superscripts L and S refer to the large and small size class clones, respectively.

Figure 6. Predicted secondary structures for the repetitive sequence identified in the A+T rich region of pea aphid: (a) one repeating element (ΔG° = -8.3 kcal/mol) and (b) two repeating elements (ΔG° = -19.7 kcal/mol).

DISCUSSION

Comparison of fruit fly, honeybee and mosquito reveals that the protein and ribosomal RNA gene order is identical among the three insects (Clary and Wolstenholme, 1985; Mitchell et al., 1993; Crozier and Crozier, 1993) and it is therefore very likely to be the same in A. pisum. Thus far, a total of four pea aphid protein coding genes (COI, ND4, ND1 and Cyt b) have been partially sequenced and their transcriptional orientation is identical to that of D. yakuba, A. quadrimaculatus and A. mellifera (Clary and Wolstenholme, 1985; Mitchell et al., 1993; Crozier and Crozier, 1993).
However, sequence analysis of these four protein encoding genes has detected none of the restriction sites anticipated on the basis of the restriction and gene order map (Appendix 1e). Since these partial sequences represent only a fraction of the total gene sequence, and given that the gene order map may be skewed in its alignment relative to the restriction map (Barrette et al., 1994), it is not surprising that the restriction sites were not detected.

The tRNA genes are considerably more labile in their positioning than are the other mtDNA genes. Relative to D. yakuba, eleven tRNAs of A. mellifera and three tRNAs of A. quadrimaculatus are located in different positions within the mtDNA genome (Crozier and Crozier, 1993; Mitchell et al., 1993). Hence, description of tRNA positioning in the pea aphid genome will be limited to the four tRNAs that have been identified by sequence and secondary structure analysis. The three tRNAs flanking the A+T rich region of pea aphid (tRNAfmet, tRNAgln and tRNAile) are identical in their location and orientation to those of D. yakuba and A. quadrimaculatus. Eight nucleotides separate tRNAfmet and tRNAgln in both A. pisum and D. yakuba, whereas in A. quadrimaculatus these two tRNAs are directly adjacent to one another. In D. yakuba and A. quadrimaculatus, tRNAile and tRNAgln are separated by 31 nt and one nt, respectively. However, in A. pisum, as in A. gambiae, a close relative of A. quadrimaculatus, the 3' ends of the tRNAgln and tRNAile genes overlap by three nt (Clary and Wolstenholme, 1985; Beard et al., 1993; Mitchell et al., 1993). In honeybee, three additional tRNAs (tRNAglu, tRNAserAGN and tRNAala) occur in this region and the orientation of some of these genes relative to D. yakuba (Clary and Wolstenholme, 1985) has changed. These six tRNA genes are separated by as little as a single nt or by as many as 49 nt and all are transcribed in the same direction (Crozier and Crozier, 1993).
The location and orientation of the tRNAserUCN gene (between the 3' ends of Cyt b and ND1) are identical in D. yakuba, A. quadrimaculatus, A. mellifera and A. pisum mtDNAs (Clary and Wolstenholme, 1985; Mitchell et al., 1993; Crozier and Crozier, 1993). However, with the exception of D. yakuba, the tRNAser gene is separated from its neighbouring genes by as few as two to as many as 45 nt.

As found in other metazoan mitochondrial genomes (Wolstenholme, 1992), the tRNA genes of A. pisum will likely be dispersed throughout the various protein coding and rRNA genes. This distribution is consistent with the tRNA punctuation model proposed by Ojala et al. (1980, 1981), who suggested that the secondary structure of the tRNA genes serves as a recognition site for processing the polycistronic mRNA. In situations where the genes are not separated by tRNAs, a recognition mechanism involving secondary structure is believed to function as the cleavage site for RNase P-like enzymes. For example, the junctions between the ATPase 6 and COIII genes (no intergenic nt), between the ND4L and ND4 genes (separated by 33 nt), and between the ND6 and Cyt b genes (separated by 10 nt) of the crustacean Artemia franciscana are capable of forming hairpin loop configurations (Valverde et al., 1994). Similar hairpin structures have been reported at the ATPase 6/COIII and ND4L/ND4 junctions of D. yakuba mtDNA (de Bruijn, 1983; Clary and Wolstenholme, 1985).

Overlapping coding sequences have been detected in a number of metazoan mtDNA genomes. Some overlaps occur between the 3' ends of two genes that are encoded on opposite strands of the molecule. There are also cases of overlap that involve genes encoded on the same strand. For example, the ATPase 8 gene overlaps with the ATPase 6 gene by between two and 46 nt in vertebrate and higher invertebrate mtDNAs. Similarly, the ND4L gene overlaps the ND4 gene by between four and seven nt in vertebrate and some higher invertebrate mtDNAs (Wolstenholme, 1992). Ojala et al.
(1981) successfully isolated mature, bicistronic transcripts of the ATPase 8 and ATPase 6 genes, and of the ND4L and ND4 genes from HeLa mitochondria. Subsequent protein isolation from bovine mitochondria recovered fully functional ATPase 6 and 8 proteins that corresponded to the size and sequence predicted from the overlapping coding regions (Fearnley and Walker, 1986). Therefore, translation of the ATPase 6 protein must involve ribosome binding and translation initiation within the ATPase 8 coding region (Wolstenholme, 1992). Although sequencing of pea aphid protein and tRNA coding regions was limited, it is most probable that pea aphid mtDNA is similar to other insect mtDNA genomes, with overlaps occurring between the ATPase 8 and 6 genes and possibly the ND4L and ND4 genes. However, the 15 nt overlap between the 3' end of the ND1 gene and the tRNAser gene identified in D. yakuba is absent in A. pisum, A. mellifera and A. quadrimaculatus.

Pea aphid mtDNA shows a compact organization similar to that observed in other animal mtDNAs. The number of intergenic nt in D. yakuba, although four times that of A. quadrimaculatus (183 nt and 46 nt, respectively) and considerably less than the 811 detected in A. mellifera, most likely parallels that of the pea aphid. Few intergenic nucleotides were detected between any of the three tRNA genes found adjacent to the A+T rich region or at the ND1 and Cyt b junction (Figure 1).

Genetic code and codon usage

The Drosophila genetic code differs only slightly from the standard genetic code. The codons AGG and AGA specify the amino acid serine in the mitochondrial genetic code, rather than arginine. ATA specifies methionine instead of isoleucine and TGA codes for tryptophan rather than acting as a termination codon (Wolstenholme, 1992).
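The effect of these reassignments on a predicted protein sequence can be illustrated with a short Python sketch. It is not part of the original analysis: it builds the standard code table, applies only the four codon changes listed above, and translates a made-up sequence under both tables. The function name and the test sequence are hypothetical.

```python
# Illustrative sketch only; the helper name and test sequence are assumptions.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Differences described in the text for the Drosophila mitochondrial code.
MITO_CODE = dict(STANDARD_CODE)
MITO_CODE.update({"AGA": "S", "AGG": "S", "ATA": "M", "TGA": "W"})


def translate(dna, code):
    """Translate complete codons in frame 1; stop codons are shown as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


toy_sequence = "ATATGAAGATAA"  # made-up example: ATA TGA AGA TAA
print(translate(toy_sequence, STANDARD_CODE))
print(translate(toy_sequence, MITO_CODE))
```

For the toy sequence the standard table gives I*R* while the modified table gives MWS*, so a reading frame that appears interrupted under the standard code can be open under the mitochondrial code.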
However, until the pea aphid COI, ND4, ND1 and Cyt b proteins are isolated, sequenced and compared to their corresponding DNA sequences, it will not be possible to address the issue of coding differences when the Drosophila genetic code is used to predict codon assignment in A. pisum.

In A. mellifera, all but two tRNAs use the same sequence in the anticodon region as reported in D. yakuba (Crozier and Crozier, 1993). The anticodon sequence for tRNAlys is CTT in D. yakuba and TTT in A. mellifera, as in Xenopus (Roe et al., 1985), Gallus (Desjardins and Morais, 1990) and Caenorhabditis (Okiomoto et al., 1992). GCT is the anticodon sequence for tRNAserAGN in D. yakuba, Xenopus and Gallus, compared to TCT in Apis and Caenorhabditis.

The A+T content of vertebrate mtDNA ranges from 56% to 64% (Gadaleta et al., 1989), with the base composition being unequally distributed between the two strands. The mtDNA of insects, including that of pea aphid, is considerably higher in A+T content than that of other invertebrates and vertebrates. A+T levels range from 84.9% in honeybee (Crozier and Crozier, 1993) to 77.4% in mosquito (Mitchell et al., 1993), with coding regions and A+T content being more equally distributed between the two strands. Although only a limited portion of the pea aphid mitochondrial genome was sequenced, it is quite evident that its A+T content is similar to that of the honeybee, whose A+T content represents the highest recorded level for metazoan mtDNA (Tables 2, 3, 4, 5 and 6).

The codons most frequently used are those heavily biased in A+T content, and even more so those ending with A or T nucleotides. For example, phenylalanine is encoded by the codons TTT and TTC. In the frog, Xenopus laevis, TTT and TTC occur 125 and 105 times, respectively (Roe et al., 1985), whereas in Drosophila they occur 313 and 17 times (Clary and Wolstenholme, 1985), and 355 and 26 times in Apis (Crozier and Crozier, 1993).
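The phenylalanine codon counts quoted above can be turned into proportions, and the same kind of tally can be made for any coding sequence. The Python sketch below is illustrative only and not taken from the thesis; the function names and the toy coding sequence are hypothetical, and TAA and TAG are treated as the only stop codons on the assumption that TGA encodes tryptophan, as described earlier.

```python
# Illustrative sketch only; helper names and the toy sequence are assumptions.
from collections import Counter


def codon_counts(cds):
    """Count complete codons in frame 1 of a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def fraction_ending_in_at(counts, stops=("TAA", "TAG")):
    """Fraction of sense codons whose third position is A or T."""
    sense = {codon: n for codon, n in counts.items() if codon not in stops}
    total = sum(sense.values())
    if total == 0:
        return 0.0
    return sum(n for codon, n in sense.items() if codon[-1] in "AT") / total


def ttt_share(ttt, ttc):
    """Proportion of phenylalanine codons that are TTT."""
    return ttt / (ttt + ttc)


# TTT and TTC counts as quoted in the text.
reported = {"Xenopus": (125, 105), "Drosophila": (313, 17), "Apis": (355, 26)}
for name, (ttt, ttc) in reported.items():
    print(name, round(ttt_share(ttt, ttc), 3))

toy_cds = "TTTTTCTTTAAATAA"  # made-up example: TTT TTC TTT AAA TAA
print(fraction_ending_in_at(codon_counts(toy_cds)))
```

On the quoted counts, TTT accounts for roughly 54% of phenylalanine codons in Xenopus but about 95% in Drosophila and 93% in Apis, which is the skew the text describes.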
Moreover, excluding termination codons, the percentage of codons ending in A or T is 93.8% in Drosophila and 95.2% in Apis. Although the sample size is considerably smaller, the use of TTT and TTC phenylalanine codons in A. pisum is also heavily skewed: TTT occurs 16 times and TTC occurs only twice. Of the 159 pea aphid codons screened, 95.7% ended with either A or T, and 56% of the codons consisted entirely of A and/or T (Table 8). Furthermore, when the relationship between the base composition of a codon family and the occurrence of the corresponding amino acid is examined within D. yakuba, A. quadrimaculatus, A. mellifera and A. pisum, there is a clear bias in favour of amino acids encoded by A+T rich codons (Crozier and Crozier, 1993). The ratio of 0.297 in A. pisum is similar in magnitude to that of A. mellifera, yet both ratios are considerably smaller than those of either D. yakuba or A. quadrimaculatus.

The "A+T pressure" model (Jukes and Bhushan, 1986; Jermiin et al., 1994) proposes that there is an evolutionary tendency toward directional substitution patterns that result in the accumulation of A and T nucleotides within protein coding regions, and in particular within synonymous codon sites. Clary and Wolstenholme (1985) postulated continuous selection for A+T nucleotides at all sites of Drosophila mtDNA where it is compatible with function. However, the enzymes involved in both transcription and replication of organisms with A+T rich mtDNA are no more efficient at processing A+T rich DNA than G+C rich DNA (Lewis et al., 1994). Studies of mtDNA polymerase show no preference for dATP or dTTP substrates (Lewis et al., 1994). Furthermore, replication of mtDNA in both invertebrates and vertebrates is highly accurate, with in vitro error rates of one in every one million bases replicated (Kunkel, 1985; Wernette et al., 1988; Kunkel and Mosbaugh, 1989).
On the other hand, fidelity of DNA polymerase activity is also known to be affected by in vitro reaction conditions, and in particular by nucleotide pool biases (Wernette et al., 1988; Olson and Kaguni, 1992). Unfortunately, little is known about nucleotide pools or their regulation in vivo (Lewis et al., 1994).

Two additional explanations for A+T richness have been proposed by Crozier and Crozier (1993) and both involve the conversion of GC pairs to AT pairs via alkylation. O6-methylguanine, generated by the alkylation of guanine, often mispairs with thymine, leading to the replacement of GC pairs with AT pairs during mtDNA replication (Watson et al., 1987). It has been demonstrated that alkylating agents capable of producing O6-methylguanine are endogenously produced in prokaryotic cells (Reebeck and Samson, 1991) and in large amounts in some insect mitochondria (Beard et al., 1993). The second hypothesis proposes that the mitochondria are relatively inefficient at importing the DNA repair methyltransferases needed to combat mutation resulting from methylation (Crozier and Crozier, 1993).

A+T rich region

The A. pisum mtDNA molecule sequenced in this study contains an A+T rich region of 848 nt, which is slightly larger than that of A. mellifera (826 nt) (Crozier and Crozier, 1993) but smaller than that of D. yakuba (1077 nt) (Clary and Wolstenholme, 1985). However, a 123 nt repetitive element is present within this region in pea aphid and the clone sequenced in the present study is the largest size class variant (i.e., R1-3; Barrette et al., 1994). Thus, the expected size of the A+T rich region is 602 bp, 725 bp or 848 bp, depending on the number of repeats present.
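The three expected sizes follow from subtracting whole repeat units from the largest variant. A minimal Python sketch of that arithmetic is given below; it assumes that each smaller size class differs from the next by exactly one 123 nt unit, and the function name is hypothetical.

```python
# Illustrative sketch only; assumes variants differ by whole 123 nt repeats.

def expected_region_sizes(largest_nt, repeat_nt, size_classes):
    """Return region sizes, smallest first, for each size class."""
    return [
        largest_nt - repeat_nt * missing
        for missing in range(size_classes - 1, -1, -1)
    ]


sizes = expected_region_sizes(largest_nt=848, repeat_nt=123, size_classes=3)
print(sizes)
```

The result, 602, 725 and 848, matches the values stated above. The truncated first repeat and the single-nucleotide length difference between copies are ignored in this simplification.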
Furthermore, since the 5' domain of the 12S rRNA gene exhibits considerable length and nucleotide sequence divergence among various metazoan animals (Wolstenholme, 1992; Neefs et al., 1993; Lewis et al., 1995), determination of the 5' end of the A+T rich region based upon sequence alignment is not very reliable. Total length is therefore based on the "tentative" placement of the 5' end of the 12S rRNA gene (Figure 3). A more accurate method to determine the 5' end of the 12S rRNA gene would involve primer extension mapping, or sequence and secondary structure analysis of the entire 12S rRNA gene.

The pea aphid A+T rich region is 88% A+T, which is nearly 10% less than that of A. mellifera and 6% less than that of A. quadrimaculatus. Two long ORFs encoding 80 and 81 amino acids were located in this region and comparative analysis of protein sequences from GenBank revealed no substantial full length sequence similarities. However, nine of the first 14 amino acids of one of the ORFs matched the large subunit of the mRNA capping enzyme of the variola virus. It is possible that this coding sequence represents an ancestral gene relic that has since been integrated into the nuclear genome (Brown, 1985).

Comparative analysis of the A+T rich region of A. pisum revealed no apparent signals for initiation of transcription and replication comparable to those detected in mammalian systems. Sequences analogous to the conserved sequence blocks are absent in A. pisum, as in all other invertebrates. In vertebrates, a total of three CSBs have been identified. CSB-1 and CSB-2 are strongly conserved throughout mammals. However, the third CSB is absent in the platypus, bovine and cetacean control regions (Anderson et al., 1982; Southern et al., 1988; Hoelzel et al., 1991; Dillon and Wright, 1993; Gemmell et al., 1996) and therefore its functional importance is questionable.
There are three primary roles that have been proposed for the CSBs: (1) involvement in the transition of RNA to DNA synthesis in mtDNA (Chang and Clayton, 1985), (2) relief of supercoiling during H-strand synthesis (Low et al., 1987), and (3) substrate recognition by RNA processing enzymes (Chang and Clayton, 1987; Low et al., 1988; Coté and Ruiz-Carrillo, 1993). Thus far, three endoribonucleases, including RNase MRP, have been identified that show substrate specificity for CSB-2 and will cleave either DNA or RNA at this point in vitro (Chang and Clayton, 1987; Low et al., 1988; Coté and Ruiz-Carrillo, 1993). It has also been demonstrated that CSB-2 is a binding site for the mtSSB identified by Mignotte et al. (1985) and, together with CSB-1, is a binding site for the mitochondrial transcription factor MTF (Gemmell et al., 1996).

Neither termination associated sequences (Doda et al., 1981) nor the light and heavy strand transcriptional promoters (Chang and Clayton, 1984; 1985) are present in the pea aphid A+T rich region. The TAS, although moderately conserved, have been identified in a large number of taxa and are thought to signal the end of D-loop synthesis (Doda et al., 1981; Mackay et al., 1986; Dunon-Bluteau et al., 1987). In humans, a short region associated with the LSP and HSP is essential for promoter function (Hixson and Clayton, 1985). Efficient transcription requires the presence of a trans-acting protein termed mtTF1 (Parisi and Clayton, 1991; Fisher et al., 1992), which has been shown to bind 10 to 40 nt upstream of the transcriptional start sites at each promoter, where it wraps and bends the duplex DNA (Fisher et al., 1992). Mitochondrial RNA polymerase binds at the transcriptional start site, where it is presumed to interact either directly or indirectly with mtTF1, and begins transcribing the mtDNA encoded genes. Transcription in the mammalian system is terminated at a site located at the 3' end of the rRNA region directly adjacent to the tRNAleu gene.
It is believed that termination is accomplished through the binding of protein(s) at a 13 nt sequence located at the 5' end of the tRNAleu gene by a simple footprinting event (Kruse et al., 1989; Hess et al., 1991). Thus, any further transcription by mtRNA polymerase is prevented by blockage with a termination protein (Clayton, 1992).

The only structural element detected in the A+T rich region of pea aphid mtDNA that is similar between both vertebrates and invertebrates is a hairpin structure containing a T-rich loop. Hairpin structural configurations are of interest because replication of circular DNA molecules obtained from a variety of sources has been shown to be initiated within or adjacent to these regions (Schaller, 1978; Tijan, 1978). Replication of the light strand (i.e., equivalent to the invertebrate second strand) in mouse, human and frog is initiated within a short sequence located at the junction of the tRNAcys and tRNAasn genes (Martens and Clayton, 1979; Tapper and Clayton, 1981; Clayton, 1982; Wong et al., 1983). This short sequence, capable of forming a highly stable stem and T-rich loop structure, is conserved among all vertebrates except chicken (Chang et al., 1985; Roe et al., 1985; Desjardins and Morais, 1990; Clayton, 1992). DNA replication at the OL of vertebrates is known to begin with the synthesis of an RNA primer starting within a run of T's (11 in mouse and seven in human) of the T-rich loop (i.e., template strand) and continuing to the RNA/DNA transition point located within a pentanucleotide sequence (5' CGGCC 3') at the base of the stem (Wong and Clayton, 1985; Hixson and Clayton, 1985). Although this secondary structure is absent in chicken and quail, the 5' CGGCC 3' target sequence for RNase H and/or transition from RNA to DNA synthesis is present in the amino acid acceptor stem of the tRNAcys gene, suggesting that an alternate secondary structural conformation may be responsible for the OL of galliforms (Desjardins and Morais, 1991).
A similar stem and loop motif is located at the intergenic region between the ND4 and COI genes of two nematodes (Okiomoto et al., 1992). In C. elegans, the stem and loop structure is 109 nt long with a loop containing a run of 4 T's, whereas in A. suum the structure is 117 nt with a run of 6 T's within the loop.

A stem and T-rich loop structure exhibiting considerable resemblance to the vertebrate OL has been found within the A+T rich region of D. yakuba, D. virilis and D. teissieri (Clary and Wolstenholme, 1987; Monnerot et al., 1990). Electron microscopy studies have mapped the origin of second strand synthesis to an area of the Drosophila A+T rich region found to contain this hairpin structure (Goddard and Wolstenholme, 1978; 1980). Furthermore, since the 5'-3' sequences of the stem and loop structure would be the product of first strand synthesis from this structure proceeding towards the 12S rRNA gene, it has been suggested that this structure in Drosophila is the functional equivalent of the vertebrate OL-containing sequence (Clary and Wolstenholme, 1987; Wolstenholme, 1992). In A. pisum, the hairpin structure is located immediately adjacent to the tRNAile gene, whereas in Drosophila the structure is some 250 nt downstream from the tRNAile gene. Although the pentanucleotide transition sequence 5' CGGCC 3' located at the base of the stem of the mammalian OL is absent in A. pisum and Drosophila, some other nucleotide sequence may function as the transition site. It is therefore very plausible that this structure is the origin of second strand synthesis in pea aphid mtDNA. Analysis of secondary structure in the mtDNA genomes of A. quadrimaculatus and A. mellifera has not revealed the presence of the putative second strand origin stem and T-rich loop structure (Mitchell et al., 1993; Crozier and Crozier, 1993).

The loop portion of A.
pisum's putative second strand origin also contains a pentanucleotide sequence (CCAAT) identified as the "CAAT" box (Efstradiadis et al., 1980). This sequence has been shown to be important in transcriptional promotion of mammalian alpha and beta globin genes (Dierks et al., 1983). Subsequent research has identified CCAAT transcriptional factors (CTF) that bind DNA in a sequence specific manner and facilitate either the initial binding of RNA polymerase to a promoter, or the isomerization of the RNA polymerase/promoter complex into an open configuration (McKnight and Tijan, 1986). However, this pentanucleotide sequence is absent in both D. yakuba and A. quadrimaculatus mtDNA, but is found once in the A+T rich region of A. mellifera, although it is not associated with any secondary structure. Moreover, since this "CCAAT" sequence has not been documented in any other mitochondrial genome, it is possible that its occurrence in pea aphid and honeybee is a coincidence. Further analysis is necessary to establish any functional importance of this sequence.

Two other hairpin and loop structures were found downstream of the putative second strand origin and upstream of the commencement of the 123 nt repetitive sequence in pea aphid (Figure 3). However, the function of these secondary structures is currently unknown. Among vertebrate OH's, a secondary structure is found only in humans (Attardi et al., 1978; Chang et al., 1985). It is therefore conceivable that one of these other hairpin loops could represent the origin of first strand synthesis in A. pisum.

In Drosophila, an approximately 300 nt conserved sequence element adjacent to the tRNAile gene is involved in a site specific protein-DNA interaction. This area was shown to be protected from crosslinking with 4,5',8-trimethylpsoralen in both D. melanogaster (Potter et al., 1980) and in D.
virilis (Pardue et al., 1984) and overlaps the origin of DNA replication in the three species in which it has been mapped by electron microscopy studies (Wolstenholme et al., 1983). In A. gambiae, a total of seven stem and loop motifs are present in the A+T rich region. However, no functional relevance has yet been ascribed to any of these structures (Beard et al., 1993). Nevertheless, in spite of some similarities to other documented origins of mitochondrial replication, the functional relevance of these hairpin loop motifs in A. pisum remains tentative until experimental studies confirm their role in replication.

Another sequence motif present in the A+T rich region of the pea aphid that may have functional relevance is a 49 nt sequence consisting mostly of polypyrimidine runs. It begins approximately 25 nt downstream from the putative second strand replication origin. A run of T's, although only 10 nt in length, may be analogous to the poly-T runs identified in Drosophila (Clary and Wolstenholme, 1987; Monforte et al., 1993; Lewis et al., 1994). Members of the genus Drosophila have two poly-T tracts at the ends of the A+T rich region. One poly-T sequence of 19 to 25 nt flanks the 300 nt conserved element and the other poly-T sequence of 13 to 17 nt is located on the complementary strand adjacent to the 12S rRNA gene. The position and orientation of the two conserved poly-T segments present in Drosophila suggest a role in promoting both transcription and replication (Lewis et al., 1994). In both nuclear and viral systems, thymidylate homopolymers have been shown to function as core elements in DNA replication, and as transcriptional activators (Campbell, 1986; Delucia et al., 1986).

Lastly, the area spanning nt 265 to nt 349 of the A. pisum A+T rich region (Figure 3) has a rather elevated G+C content of 40%. This lowers the overall A+T content of the pea aphid A+T rich region to levels below the values detected in A. mellifera (96%), A.
quadrimaculatus (94%) or D. yakuba (92.8%). Given the extreme A+T bias observed in A. pisum gene coding regions, it would be highly unlikely that this region of elevated G+C content is devoid of the "A+T pressure". Therefore, some type of functionality (transcriptional promotion?) is likely associated with this area, restricting its susceptibility to increased A+T content.

Length variation and heteroplasmy

Attardi's (1985) assertion that the animal mitochondrial genome is "an extremely economical unit" is a broad generalization that, although based on the limited number of mtDNA genomes available at the time, is in many facets still accurate. Intense selection for small size and invariable structure has been a primary driving force throughout mtDNA evolution (Attardi, 1985; Brown, 1985). The mtDNA of many organisms contains only the full complement of protein, ribosomal and tRNA coding sequences with few superfluous noncoding sequences. Coding regions are often directly adjacent to one another, separated by a few intergenic bases, or in some circumstances, overlapping. However, the number of fully sequenced mtDNA genomes has greatly increased over the last decade. Numerous examples of mtDNA genome size variation and heteroplasmy have now been characterized, making it necessary to revise some of Attardi's original views. The pea aphid is but another example of the fluidity in mtDNA genome size and heteroplasmy. Two length variable regions have been identified in the pea aphid mtDNA genome. Three length variants (1, 2, 3) differing in size by multiples of approximately 120 bp were found in region one (Barrette et al., 1994). Nearly 80% of the pea aphids screened contained the largest R1-3 size class (Appendix 1b) and none of the clones were found to be heteroplasmic. An additional sixteen A. pisum clonal lineages were screened for their TaqI restriction enzyme pattern (data not shown).
One lineage was heteroplasmic at region one (R1-1 & R1-3) and was subsequently used as template for PCR amplification. DNA sequence analysis revealed that region one is located within the A+T rich region of the genome and that two potential mechanisms could account for the size fluctuation. The first mechanism involves the truncation of three tRNA genes (methionine, glutamine and isoleucine) and the hairpin loop structure identified in this report as the putative origin of second strand mtDNA replication. Although the absence of tRNA genes is documented in the sea anemone M. senile (Wolstenholme, 1992), it is highly unlikely that this is a viable explanation for pea aphid size variation. Since region 1 comprises three size categories, it would require the deletion of two tRNAs to account for each size class shift. Different size classes would therefore have a distinctive complement of tRNA genes. Moreover, the total size change between the large and small PCR products is nearly 50 bp greater than the expected 240 bp based on restriction enzyme analysis. Lastly, the small PCR product begins its sequence alignment with the larger PCR product at a pentanucleotide locale that is a perfect match for the 3' end of the tRNAmet-lc PCR primer. Therefore, in all likelihood, the smaller fragment is a PCR mispriming artifact. The second mechanism is the more plausible explanation for the observed size variation detected within the pea aphid A+T rich region. This mechanism involves differences in the copy number of a 123 nt tandem repeat identified downstream from the putative second strand origin of replication. This locale contains one complete copy of a 123 nt sequence, a second copy of 122 nt in which a T has been deleted from a poly-T stretch, plus a third copy (furthest from the 12S rRNA gene) that is truncated by 35 nt at its 5' end.
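The arithmetic behind the tandem array described above can be checked with a short sketch. The three copy lengths come from the text; the variable and function names are illustrative assumptions, and the hypothetical size shifts simply assume that whole 123 nt copies are gained or lost. No actual sequence data are used.

```python
# Lengths (nt) of the three repeat copies described in the text.
FULL_COPY = 123
COPY_MISSING_T = FULL_COPY - 1    # one T deleted from a poly-T stretch
TRUNCATED_COPY = FULL_COPY - 35   # truncated by 35 nt at its 5' end


def array_length(copies):
    """Total length of a tandem array, given the length of each copy."""
    return sum(copies)


sequenced_array = [FULL_COPY, COPY_MISSING_T, TRUNCATED_COPY]
total = array_length(sequenced_array)

# Hypothetical size shifts if one or two full copies were lost (assumption).
one_copy_shift = FULL_COPY
two_copy_shift = 2 * FULL_COPY

print(TRUNCATED_COPY, total, one_copy_shift, two_copy_shift)
```

Under this assumption, a loss of two full copies gives a shift close to the approximately 240 bp difference expected between the largest and smallest size classes from restriction enzyme analysis.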
All available data suggest that length heterogeneity is due to variation in the number of this repeat sequence, but this cannot be confirmed until the other two (R1-1 & R1-2) size classes of mtDNA molecules are cloned and sequenced. This has been an extremely difficult process. Numerous cloning attempts using either PCR-mediated routes or direct cloning of restriction digested ultrapurified mtDNA have thus far been unsuccessful (data not shown). An alternative method to cloning and sequencing involves Southern probing. The thirty-four base oligonucleotide (5' AGTTGAATTTTTAATTAATTTATATATAAATATC 3'), complementary to the 123 nt repeat, could be 32P end-labelled and then used for Southern probing of slot blots containing the various pea aphid mtDNA size classes. Scanning densitometry measures of band intensity would quantify the number of repeating units present in each size category. Nevertheless, the most logical explanation for length variation within the A+T rich region is variation in the number of 123 nt repeat motifs. The presence of tandemly repeated motifs in the noncoding region of metazoan mtDNA is common and has been documented in numerous invertebrates including sea scallops (Snyder et al., 1987; La Roche et al., 1990; Gjetvaj et al., 1992), cricket (Rand and Harrison, 1989), bark weevils (Boyce et al., 1989), honeybee (Cornuet et al., 1991), fruit fly (Monforte et al., 1993; Pissios and Scouras, 1993; Lewis et al., 1994) and nematodes (Okimoto et al., 1991). Two size variable regions have been identified in the mtDNA molecule of the bird cherry-oat aphid R. padi (Martinez-Torres et al., 1996). A 113 bp tandemly repeated unit present in one to eight copies has been detected within the A+T rich region. However, unlike the pea aphid, 25% of the clones surveyed were heteroplasmic for this region. A second length variable locale differing by increments of about 100 bp is localized to an area close to the ND5 gene.
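As a quick check on the thirty-four base oligonucleotide proposed above for Southern probing, the following sketch computes its length, its A+T fraction and the strand it would pair with. Only the probe sequence is taken from the text; the helper names are assumptions made for illustration.

```python
# Probe sequence as given in the text, written 5' to 3'.
PROBE = "AGTTGAATTTTTAATTAATTTATATATAAATATC"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    return (seq.count("A") + seq.count("T")) / len(seq)


target = reverse_complement(PROBE)
print(len(PROBE), round(at_fraction(PROBE), 3))
print(target)
```

The probe is itself strongly A+T biased, in keeping with the base composition of the region it is meant to detect.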
Five size classes have thus far been detected with only 12% of the 148 surveyed R. padi clones heteroplasmic for this second locale (Martinez-Torres et al., 1996). Several molecular mechanisms have been proposed to account for length variation in mtDNA: (1) transposition (Rand and Harrison, 1989); (2) intra- and intermolecular recombination (Rand and Harrison, 1989); and (3) slippage replication (Streisinger et al., 1966; Efstradiadis et al., 1980; Levinson and Gutman, 1987; Buroker et al., 1990; Hayasaka et al., 1991; Wilkinson and Chapman, 1991; Arnason and Rand, 1992). It has been documented that DNA can be transferred from the nuclear genome to the mitochondria by a protein (Vestweber and Schatz, 1989), which suggests the possibility of transposition in mitochondria. The presence of a small tandem repeat flanking a repetitive element (e.g., scallops; La Roche et al., 1990) is also suggestive of a transposable element. However, the absence of an open reading frame of substantial length and of short inverted repeats at the ends of the repetitive sequence does not support the hypothesis that it is a transferable element (Calos and Miller, 1980). Furthermore, transposition mechanisms cannot generate insertions/deletions nor can they explain the fact that adjacent tandem repeats are more similar to one another than external repeats. Overall, evidence for transposable elements in animal mtDNA is lacking (Brown, 1985; Moritz et al., 1987). Rand and Harrison (1989) proposed a mitochondrial recombination model involving a double molecule intermediate as a possible mechanism for the generation of a variable number of tandem repeats (VNTR) in cricket mtDNA. Recombination between repeats in the same molecule would cause the formation of loops containing one or more repeats. Excision of the loop would create a mtDNA molecule that is shorter than the original.
Recombination between repeats on different molecules could produce daughter molecules that are either shorter or longer than the parent molecules (Appendix 4a). Although several gene rearrangements due to sequence inversion and translocation have occurred in metazoan mtDNA (Wolstenholme and Clary, 1985; Moritz et al., 1987; Jacobs et al., 1988), recombination had yet to be documented in animal cell mitochondria (Clayton, 1982; Brown, 1985; Hayashi et al., 1985; Moritz et al., 1987; Birky, 1991). Hence, models such as those of Rand and Harrison (1989) postulating the generation of length variation via recombination were believed to be unlikely. However, the minicircle by-products of recombination have recently been identified in the phytonematode M. javanica (Lunt and Hyman, 1997), suggesting that recombination mechanisms may indeed be involved in the generation of tandem repetitive regions. The currently favored mechanism for generating repetitive sequence polymorphism in both plant and animal mtDNA involves the occurrence of insertion and deletion events via strand slippage during DNA replication (Streisinger et al., 1966; Efstradiadis et al., 1980; Tautz et al., 1986; Levinson and Gutman, 1987; Buroker et al., 1990; Hayasaka et al., 1991; Wilkinson and Chapman, 1991; Arnason and Rand, 1992). New and longer repeat motifs can be generated from the mispairing of short simple contiguous or noncontiguous di-, tetra- and hexanucleotide repeat motifs (Levinson and Gutman, 1987) followed by polymerase- and endonuclease-mediated repair. In the plant Oenothera, a 29 bp duplication present in the chloroplast DNA (cpDNA) was most likely caused by a single slippage event involving a 6 bp short direct repeat (5' GAAATA 3') present at the 5' end of the duplication and adjacent to its 3' end (Wolfson et al., 1991; refer to Appendix 4b).
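The way a single backward slip between two direct repeats produces a duplication can be illustrated with a toy string model. This is a minimal sketch, not the analysis of Wolfson et al. (1991): only the GAAATA repeat and the 29 bp size of the duplication come from the text, while the spacer, the flanking bases and the function name are hypothetical.

```python
def slippage_duplication(template, repeat):
    """Toy model of one backward slip between two direct repeats.

    The daughter strand is written in the same sense as the template.
    """
    first = template.find(repeat)
    second = template.find(repeat, first + len(repeat))
    if first == -1 or second == -1:
        return template  # no pair of direct repeats: no slip in this model
    # Synthesis has copied the template through the second repeat.
    nascent = template[: second + len(repeat)]
    # The 3' end of the nascent strand re-pairs at the first repeat,
    # so synthesis resumes just after the first repeat.
    resume = first + len(repeat)
    return nascent + template[resume:]


REPEAT = "GAAATA"                 # 6 bp direct repeat, from the text
SPACER = "TC" * 11 + "T"          # hypothetical 23 nt spacer (assumption)
template = "CCC" + REPEAT + SPACER + REPEAT + "GGG"
daughter = slippage_duplication(template, REPEAT)

print(len(daughter) - len(template))   # size of the duplicated unit
```

With a 23 nt spacer the duplicated unit (spacer plus one repeat) is 29 nt long, and the daughter strand carries three copies of the direct repeat instead of two.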
Since the overall in vivo DNA replication rates of mtDNA and cpDNA polymerases are extremely slow, it has been suggested that much of their time in DNA synthesis is consumed by pauses (Kornberg, 1980). Hence, when the replication complex pauses while passing through the region destined to be duplicated, there is a localized melting of the newly synthesized daughter strand from the template strand. If the sequence GAAATA on the nascent daughter strand reanneals with the complement of the preceding GAAATA repeat on the template strand, continuation of the replication complex from the upstream repeat will create a duplication of the repeat sequence. Realignment of the daughter and template strands is believed to be stabilized by the formation of a hairpin loop structure whose loop contains the duplication. The stem portion of this hairpin structure is believed to be formed between the GAAATA repeat and a 6 nt complementary sequence located just upstream of the duplication (refer to figure 4, Wolfson et al., 1991). Several length mutation "hot spots" have been detected at many dispersed locations throughout the Oenothera cpDNA genome. All of these sites share the trait of multiple direct repeats, suggesting that replication slippage has been of general importance in its evolution (Blasko et al., 1988). The illegitimate elongation model (Buroker et al., 1990) was proposed to explain the generation of the 82 nt RS 1 (i.e., those adjacent to tRNApro TAS sequences; Hoelzel et al., 1994) repeat in white sturgeon. Since each repeat was found to contain a TAS sequence, an individual with multiple copies of the repeat would have several different lengths of D-loop strands. Length mutations would originate through replication slippage brought about by a competitive equilibrium between the H-strand and the D-loop strand for base pairing with the L-strand.
Misalignment of the repeat region could easily occur if the D-loop was partially displaced by the H-strand and then reinvaded at a different repeat locale. Since one or more single-stranded repeat units can form thermodynamically stable hairpin loop structures, the likelihood of misalignment is dramatically increased by the shortening of the length of either the D-loop strand, H-strand or L-strand. If a misaligned D-loop strand is extended into a nascent H-strand during replication, gains or losses of a repeat unit would occur. The heteroduplex molecule, with one or more repeat units unpaired and stabilized by the internal base pairing capability of the repeats, would be resolved at the next round of replication (Buroker et al., 1990). Although the competitive displacement model is applicable to repeat sequence units adjacent to the TAS sequences as found in cod (Johansen et al., 1990), evening bat (Wilkinson and Chapman, 1991), frog (Roe et al., 1985) and sturgeon (Buroker et al., 1990), it cannot be generalized to the many other documented cases of length variable arrays located between the OH and the LSP (RS 3, 4 & 5; Hoelzel et al., 1994). For example, the repeat motifs of the platypus are located between the major transcriptional promoters and the OH of the control region (i.e., CSB-2 and the tRNAphe gene) (Gemmell et al., 1996). In platypus and other similar cases, the length variation is believed to result from misalignment errors in the initial RNA priming event that precedes DNA replication. Misalignment, facilitated by the propensity of repeat units to form stable secondary structures, could result from the looping out of repeat units. Looping out of repeats in the template strand would decrease the length of the RNA primer, while looping out of repeats in the primer strand would result in an increase in the length of the RNA primer.
Increases or decreases in the RNA primer length would be accompanied by a subsequent change in the overall length of the molecule when replication proceeds to completion (Gemmell et al., 1996). Whether the length variants are generated by illegitimate termination (Buroker et al., 1990) or by illegitimate priming (Ghivizzani et al., 1993; Gemmell et al., 1996), it is quite evident that misalignment of the strands is promoted by the propensity of repeat sequences to form thermodynamically stable structures with free energies of formation (ΔG°) that decrease linearly with repeat number. The secondary structure free energy profiles of the 81 nt repeat within evening bats (Wilkinson and Chapman, 1991) have a ΔG° of -25.4 kcal/mol for 2 repeats, -40.9 for 3 repeats, -51.2 for 4 repeats and -64.4 for 5 direct repeats. Analyses of nucleotide sequence polymorphisms within repetitive arrays have often revealed a pattern of concerted evolution (Rand, 1994). Homogeneity of repeat sequences within individuals but heterogeneity between individuals or different species is indicative of this evolutionary process (see Solignac et al., 1986; Wilkinson and Chapman, 1991; Broughton and Dowling, 1994; Rand, 1994; Stewart and Baker, 1994a, 1994b; Yang et al., 1994; Fumagalli et al., 1996). Restriction site analyses from both the D. melanogaster groups (Solignac et al., 1986) and the D. obscura groups (Monforte et al., 1993) have revealed that the conserved and variable halves of the A+T rich region evolve in concert as a result of duplication and deletion events involving the two types of repeated elements (Lewis et al., 1994). New polymorphisms are generated by either point mutation or insertions/deletions, then swept through the contiguous array of repeats by repeated cycles of expansion and contraction of the region, leading eventually to the homogenization of repeats (Fumagalli et al., 1996).
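The roughly linear decrease in free energy with repeat number quoted above for the evening bat repeat can be examined with an ordinary least-squares fit. The four ΔG° values are those cited from Wilkinson and Chapman (1991); the fitting function and its name are assumptions added for illustration, not part of the original analysis.

```python
# Free energies (kcal/mol) quoted in the text for 2 to 5 direct repeats.
repeats = [2, 3, 4, 5]
delta_g = [-25.4, -40.9, -51.2, -64.4]


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(repeats, delta_g)
print(round(slope, 2), round(intercept, 2))
```

On these four points the fitted slope is close to -12.7 kcal/mol per additional repeat, with an intercept near zero, which is consistent with the linear trend described in the text.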
The "edge effect" (Rand, 1994) in concerted evolution has been defined to describe the situation in which the repeats at the ends of a tandem array are more deviant from the consensus repeat sequence than are repeats in the middle of the array. Moreover, repeats at the 3' end of the array not only show higher divergence than repeats at the 5' end, but most often possess a single "imperfect" copy or a highly divergent copy of the repetitive element at the 3' end of the array that rarely undergoes duplication/deletion events (Fumagalli et al., 1996). This pattern of variation has been observed in the mtDNA of elephant seals (Hoelzel et al., 1993), rabbit (Mignotte et al., 1990), several species of carnivores (Hoelzel et al., 1994) and shrews (Fumagalli et al., 1996). The orientation of the pea aphid tandem array is such that the truncated repeat is at the end of the array, distal to the 12S rRNA gene. The three R1 size classes identified (Barrette et al., 1994) differ only by increments of approximately 120 nt with no size class differing by 88 nt (truncated repeat). Thus, this truncated repeat may be the 3' terminal repeat, suggesting that the region 1 repetitive array is the result of a replication slippage-like mechanism originating during replication of the second strand. Although the two full version repeats differ by a single A<->G transition and a single T insertion/deletion, it will not be possible to address the issue of concerted evolution until a number of pea aphid clonal lineages are sequenced for the A+T rich region. This sequence information will also be valuable in assessing the likelihood that a replication slippage-like process is the primary mechanism by which this repetitive array was generated. Further evidence favoring replication slippage as the mechanism for generating the size variation in pea aphid mtDNA is obtained from the secondary structure analysis of the 123 bp repetitive element (Figure 6 a,b).
The formation of secondary structures, although not a prerequisite of slippage-like mechanisms, enhances the rate at which this process occurs in both plasmid and bacteriophage genomes (Pierce et al., 1991; Trinh and Sinden, 1991). The predicted secondary structure of the single 123 nt repeat is thermodynamically quite stable and stability is more than doubled with two repeating elements. Since the polymerization rate of mtDNA polymerase is extremely slow (~270 nt/minute) and DNA synthesis is delayed by numerous pauses (Kornberg, 1980), there could be sufficient time for the repeat(s) to fold into a secondary conformation. If the misalignment occurred in the nascent strand, slippage would result in duplication whereas a deletion would result if the template strand repeat(s) folded. Two octanucleotide repeat sequences, separated by 8 nt, are adjacent to the r3 repeat nearest the 12S rRNA gene (Figure 3). This same octanucleotide, although absent from the remainder of the A+T rich region, is present once in each of the three repeats and could have served as an anchor point for slippage misalignment processes. Levinson and Gutman (1987) suggest that new and longer repeat motifs can be generated from the mispairing of short contiguous or noncontiguous di-, tetra- and hexanucleotide repeat elements. Therefore, it is plausible that this 5' TTAAAAAT 3' sequence could have been the original starting point from which larger repeating units evolved until some threshold or optimum was reached, possibly based on secondary structure, and the pea aphid 123 nt repeating unit was generated. Lastly, support for slippage-based models is provided by the fact that propagation of some cloned tandemly repeated sequences in Escherichia coli has generated both losses and gains of units in the tandem array (Ghivizzani et al., 1993). Madsen and colleagues (1993) analyzed a repeat domain present in the mitochondrial genome of pigs.
This domain, located at the 5' end of the D-loop, is composed of 14 to 29 copies of a 10 bp self-complementary tandemly repeated sequence. Initial characterization of a heteroplasmic individual revealed that most variants differed in length by two (20 bp) repeating units. However, after propagation of the recombinant plasmids containing the repetitive domain in E. coli, new variants differing in length by multiples of one (10 bp) repeat unit were generated. To test the hypothesis that the variants were induced by slippage, Madsen et al. (1993) performed a series of in vitro primer extension experiments on both single and double stranded templates containing the repeat domain. They were able to correlate repeats generated in vitro with those seen in both the mitochondria and bacteria, thus providing strong evidence that replication slippage is responsible for a major class of mammalian mutations (Madsen et al., 1993). Restriction site analysis of pea aphid mtDNA revealed a second region of length heterogeneity (Barrette et al., 1994). This region 2 length variation, localized to an area spanning the ND3 and ND5 genes, was found to contain six size class variants (1, 2, 3, 4, 5, 6) differing by increments of 210 bp. Numerous attempts to PCR amplify region 2 for cloning experiments using ND4 primers [#10 & #12] in combination with COIII primers [#5 & #3] (Table 8; Barrette et al., 1994) were unsuccessful. Barrette et al. (1994) suggested that this may be due to the inversion of the ND4 gene. Although this possibility cannot be fully dismissed until this region is completely sequenced, partial sequencing of a small portion of the ND4 gene amplified with primers #12 & #11 has suggested it is in the same orientation as the ND4 gene of D. yakuba. Moreover, none of the fully sequenced insect mtDNA genomes exhibit protein-coding gene order changes.
All modifications with respect to gene order have been due to reshuffling a small number of tRNA genes (Clary and Wolstenholme, 1985; Crozier and Crozier, 1993; Mitchell et al., 1993; Beard et al., 1993). Numerous attempts to amplify region 2 using the Expand™ PCR system (Boehringer) and various primer combinations were also unsuccessful. It is possible that this failure is the result of intense secondary structure(s), possibly associated with the repeating units, that prevent the Taq polymerase enzyme from proceeding unimpeded along the pea aphid DNA template. An attempt was also made to clone fragments of pea aphid mtDNA generated with restriction enzymes. Plasmid recombinants containing other regions of the pea aphid mtDNA molecule were obtained in these experiments. However, no inserts containing region two were recovered. Some of the plasmid recombinants containing inserts of the size expected from restriction site mapping were sequenced and found to be extremely A+T rich, with numerous poly-A and poly-T runs (data not shown). A search of the GenBank DNA database revealed no sequences with substantial similarity to the ones obtained from the pea aphid clones. It is possible that some feature(s) of region two of the pea aphid mtDNA molecule, such as the secondary structures suggested above, destabilize the fragment of interest in recombinant plasmids, thus making it difficult, if not impossible, to clone. The most plausible explanation for size variation within region two is the presence of a tandemly repeated element of approximately 210 nt in length. The largest size class (R2-6) will likely contain a minimum of six copies of the approximately 210 nt repeat motif with each successively smaller size class containing one less repetitive unit.
Each individual unit will likely be capable of forming a stable self-annealed secondary structure, with increases in the length of the tandem array leading to proportionally larger increases in the thermodynamic stability of the array. The distribution of mtDNA size variation is governed by three primary evolutionary forces (mutation, selection and genetic drift) and can be viewed as an equilibrium between the forces that generate the variation (slippage or recombination) and the forces that reduce it (selection and drift) (Rand, 1993). The observed patterns of A. pisum length variation and heteroplasmy are considerably different between region one and region two (Barrette et al., 1994). Virtually all pea aphid clonal lineages were homoplasmic for region one while 13 of 35 clones were heteroplasmic at region two. Temporal stability of region one and region two was assessed by isolating mtDNA from four heteroplasmic clones after a two year interval (i.e., approximately 30 generations) of laboratory culture. Each of these clones maintained its phenotype at region one and also sustained a stable pattern of heteroplasmy at region two (Appendix 1d; Barrette et al., 1994). However, scanning densitometry measures of band intensity revealed a substantial change in the relative proportion of region two variants within one clonal lineage. The primary events involving the generation of length variation take place in the female germ line and involve mtDNA replication and the partitioning of mitochondria at cytokinesis (Solignac et al., 1987). If at the time of mtDNA replication an error generates a change in repeat copy number, a mixed or heteroplasmic population of size variants will have been produced. Heteroplasmy is an obligatory transitory stage that is not indefinitely maintained and can eventually lead to a homoplasmic cell line that contains the DNA mutation.
The levels of heteroplasmy for length variants can be explained (Birky et al., 1983) by mutation rates (insertions/deletions) relative to the mean time required to eliminate the so-generated diversity through vegetative segregation in the germ line. The effective population size of mitochondria through the germ line is likely to be species-specific since it is affected by the number of cell divisions per generation and the number of mitochondria transmitted to daughter cells (Rand and Harrison, 1986). In D. melanogaster, development of the germ line begins with the individualization of 10 to 18 pole cells with two to three divisions giving rise to forty to sixty pole cells. It is believed that approximately eight of these cells migrate to the presumptive gonads where they begin a second round of three or four divisions leading to the stem cells of the ovarioles. The number of stem cells varies from one to five per ovariole. Hence, there are from seven to nine random samplings of mitochondria from the original egg to the first offspring (Solignac et al., 1984, 1987) and therefore numerous opportunities to sort out any mutations that have occurred within the parent by stochastic partitioning. MtDNA size variants are also not usually viewed as neutral (Hale and Singh, 1986; Rand and Harrison, 1986; MacRae and Anderson, 1988; Wallace, 1989) and it has often been argued that selection acts to keep the molecule compact, as has been reported in D. mauritiana (Solignac et al., 1984) and in the cricket G. firmus (Rand and Harrison, 1986). Studies of the inheritance of heteroplasmic mtDNA from mother to offspring suggest that in animals, smaller mtDNAs can have an advantage in transmission over longer mtDNAs (Solignac et al., 1984, 1987; Rand and Harrison, 1986). Wild-type strains of D. melanogaster exhibit a skewed distribution pattern of length variants in a manner suggesting selection for smaller mtDNAs (Hale and Singh, 1986, 1991).
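The effect of the repeated random samplings of mitochondria through the germ line, described above, can be sketched as a simple neutral drift simulation. Only the figure of seven to nine samplings per generation comes from the text; the number of mitochondria sampled each time, the starting frequency, the absence of selection and all function names are assumptions chosen for illustration.

```python
import random


def transmit(freq, n_mito, n_samplings, rng):
    """Follow the frequency of one length variant through successive
    random samplings of n_mito mitochondria (neutral drift only)."""
    for _ in range(n_samplings):
        carriers = sum(1 for _ in range(n_mito) if rng.random() < freq)
        freq = carriers / n_mito
    return freq


def fraction_homoplasmic(freq, n_mito, n_samplings, trials, seed=0):
    """Share of simulated lineages that end up fixed for either variant."""
    rng = random.Random(seed)
    outcomes = [transmit(freq, n_mito, n_samplings, rng) for _ in range(trials)]
    return sum(1 for f in outcomes if f in (0.0, 1.0)) / trials


# Hypothetical parameters: 20 mitochondria per sampling, 8 samplings,
# a variant starting at 10% within the egg.
print(fraction_homoplasmic(0.1, n_mito=20, n_samplings=8, trials=1000))
```

Smaller sampling sizes or more samplings per generation return lineages to homoplasmy more quickly in this sketch, which is the sense in which vegetative segregation removes newly generated length diversity.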
Observations such as these suggest that selection for small size has been an important force in the evolution of the mitochondrial genome (Wallace, 1982; Attardi, 1985; Rand and Harrison, 1986, 1989). In animal mtDNA replication, the synthesis of a 16 kb daughter strand takes about one hour to accomplish (Clayton, 1982). This represents a polymerization rate of approximately 270 nucleotides per minute. In a heteroplasmic cell, a mitochondrial genome that is a few hundred nucleotides shorter than another mtDNA could have a temporal advantage in a "race for replication" (Rand, 1993). Given that mtDNA molecules can be selected at random for replication throughout the cell cycle (Bogenhagen and Clayton, 1977), this could effectively lead to an increase and eventual fixation of smaller mtDNAs in the cell (Rand, 1993). Although the larger mtDNA may be at a temporal disadvantage in the replication race with smaller mtDNA, selective advantages for larger mitochondrial genomes could come from additional repeated sequences that provide "attractive" conformations for more efficient binding of mtDNA polymerase (Rand, 1993). This could lead to an advantage in the initiation of replication that could, in turn, outweigh the disadvantage associated with polymerizing additional nucleotides. One explanation for the observed pattern of pea aphid mtDNA size heteroplasmy in region two and its virtual absence in region one, coupled with the observed temporal changes detected in region two, is that region two is considerably more mutationally active than is region one. The mutation rate for indels has been estimated at 10^-4 for crickets (Rand and Harrison, 1986) and 10^-2 for bats (Wilkinson and Chapman, 1991). It would be reasonable to assume that the mutation rate of pea aphid region two must be somewhere in this range while the rate of region one would be considerably lower than the 10^-4 estimated for crickets.
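The polymerization rate and the size of the temporal advantage in the "race for replication" discussed above follow from simple arithmetic, sketched here. The 16 kb genome and one hour replication time are from the text; the length differences used in the examples (two 123 nt copies in region one, five 210 nt copies in region two) are assumptions based on the repeat sizes proposed in this discussion, and pauses during synthesis are ignored.

```python
GENOME_NT = 16_000        # 16 kb daughter strand, from the text
REPLICATION_MIN = 60      # about one hour, from the text

rate = GENOME_NT / REPLICATION_MIN   # nucleotides per minute


def head_start_minutes(length_difference_nt, rate_nt_per_min=rate):
    """Time saved by the shorter molecule at a constant polymerization rate."""
    return length_difference_nt / rate_nt_per_min


region_one_gap = 2 * 123   # assumed: largest vs smallest region one class
region_two_gap = 5 * 210   # assumed: largest vs smallest region two class

print(round(rate))
print(round(head_start_minutes(region_one_gap), 2))
print(round(head_start_minutes(region_two_gap), 2))
```

At this rate the shorter molecule gains on the order of one to a few minutes per round of replication, a small fraction of the roughly one hour needed to copy the whole genome.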
Until the molecular basis for region two length variation is characterized through DNA sequence analysis, one can only postulate that some feature, possibly one involving the conformational state of the repetitive element, enables region two repeat(s) to fold much more quickly or effectively than the region one repeats. Alternatively, replication could be interrupted somewhere within the region two domain, leaving the template and nascent strands single stranded and the repeated sequences with ample time to fold on themselves, thus providing the opportunity for the occurrence of slippage events. Another explanation could involve the intensity of selection acting on the length variants between the two regions. As discussed earlier, Rand (1993) has suggested that additional repeat sequences may provide a more appealing conformational binding site for the mtDNA polymerase enzyme. This conformational "attractiveness" could be sufficient to outcompete the ill effects associated with increased mtDNA length in the race for replication. This could possibly explain the observation that the most prevalent size class for both region one and two in A. pisum mtDNA is not the smallest size variant but the largest and median, respectively.

This study has begun to characterize the intricacies of another insect mtDNA genome. Although the amount of pea aphid mtDNA sequence data is limited, it is evident that this genome is comparable in nature to the other fully sequenced insect mitochondrial genomes. It appears to be very similar to Drosophila, Apis and Anopheles in not only gene organization, but also A+T nucleotide bias. It is quite possible that the A+T content of the pea aphid mtDNA genome will parallel or surpass that of the highest A+T content measured to date, the honeybee. The A+T rich region is the proposed control centre for both transcription and replication events of insects.
As detected in Drosophila, a T-rich loop and hairpin structure, analogous to the origin of light strand replication of mammals, is present in pea aphid. However, unlike the extensive studies characterizing the mechanics of vertebrate control region replication and transcription, little work of this nature has thus far been pursued within insects. Comparative analysis of various insect A+T rich regions has shown little to no sequence similarity to vertebrates. It has therefore been suggested that the initiation signals and quite possibly the overall mechanics of replicational and transcriptional events of insects, including those of A. pisum, are different from those of vertebrates. Two length variable regions are present in pea aphid mtDNA. Region one has been sequenced and its length variation is likely the result of variation in the copy number of a 123 nt tandem repeat. However, this will only be confirmed once a number of region one size class variants have been sequenced. Region two has thus far eluded all attempts at cloning and subsequent characterization. Given the short generation time of pea aphids and the ease with which pea aphids can be cultured, obtaining sufficient quantities of mtDNA is relatively simple. Therefore, the pea aphid can be quite useful as an insect model system for characterizing the molecular mechanics involved in not only transcription and replication, but also those involved in generating length variation.

REFERENCES

Aloni, Y., and G. Attardi. 1971. Symmetrical in vivo transcription of mitochondrial DNA in HeLa cells. Proc. Natl. Acad. Sci. USA 68: 1957-1961.
Anderson, S., M.H.L. DeBruijn, A.R. Coulson, I.C. Eperon, F. Sanger, and I.G. Young. 1982. Complete sequence of bovine mitochondrial DNA. J. Mol. Biol. 156: 683-717.
Arnason, U., and E. Johnsson. 1992. The complete mitochondrial DNA sequence of the Harbour Seal, Phoca vitulina. J. Mol. Evol. 34: 493-505.
Arnason, E., and D.M. Rand. 1992. Heteroplasmy of short tandem repeats in mitochondrial DNA of Atlantic cod, Gadus morhua.
Genetics 132: 211-220.
Attardi, G. 1985. Animal mitochondrial DNA: an extreme example of genetic economy. Int. Rev. Cytol. 93: 93-145.
Attardi, G., S.T. Crews, J. Nishigushi, D.K. Ojala, and J.W. Posakony. 1978. Nucleotide sequence of a fragment of HeLa-cell mitochondrial DNA containing the precisely localized origin of replication. Cold Spring Harbor Symp. Quant. Biol. 43: 179-192.
Azevedo, J.L.B., and B.C. Hyman. 1993. Molecular characterization of lengthy mitochondrial DNA duplications from the parasitic nematode Romanomermis culicivorax. Genetics 133: 933-942.
Barrette, R.J., T.J. Crease, P.D.N. Hebert, and S. Via. 1994. Mitochondrial DNA diversity in the pea aphid Acyrthosiphon pisum. Genome 37: 858-865.
Beard, C.B., D.M. Hamm, and F.H. Collins. 1993. The mitochondrial genome of the mosquito Anopheles gambiae: DNA sequence, genome organization, and comparisons with mitochondrial sequences of other insects. Insect Mol. Biol. 2: 103-114.
Birky, C.W. 1991. Evolution and population genetics of organelle genes: mechanisms and models. Pp. 112-134 in R.K. Selander, A.G. Clark, and T.S. Whittam, eds. Evolution at the molecular level. Sinauer, Sunderland, Mass.
Birky, C.W., T. Maruyama, and P. Fuerst. 1983. An approach to population and evolutionary genetic theory for genes in mitochondria and chloroplasts, and some results. Genetics 103: 513-527.
Blackman, R.L., and V.S. Eastop. 1994. Aphids on the world's trees: an identification and information guide. CAB International, University Press, Cambridge.
Blasko, K., S.A. Kaplan, K.G. Higgins, R. Wolfson, and B.B. Sears. 1988. Variation in copy number of a 24-base pair tandem repeat in the chloroplast DNA of Oenothera hookeri strain Johansen. Curr. Genet. 14: 287-292.
Bogenhagen, D., and D.A. Clayton. 1977. Mouse L cell mitochondrial DNA molecules are selected randomly for replication throughout the cell cycle. Cell 11: 719-727.
Boursot, P., H. Yonekawa, and F. Bonhomme. 1987.
Heteroplasmy in mice with deletion of a large coding region of mitochondrial DNA. Mol. Biol. Evol. 4: 46-55.
Boyce, T.M., M.E. Zwick, and C.F. Aquadro. 1989. Mitochondrial DNA in the bark weevils: size structure and heteroplasmy. Genetics 123: 825-836.
Broughton, R.E., and T.E. Dowling. 1994. Length variation in the mitochondrial DNA of the minnow Cyprinella spiloptera. Genetics 138: 179-180.
Brown, G.G., and L.J. Des Rosiers. 1983. Rat mitochondrial DNA polymorphisms: sequence analysis of a hypervariable site for insertions or deletions. Nucleic Acids Res. 11: 6699-6708.
Brown, G.G., G. Gadaleta, G. Pepe, C. Saccone, and E. Sbisà. 1986. Structural conservation and variation in the D-loop containing region of vertebrate mitochondrial DNA. J. Mol. Biol. 192: 503-511.
Brown, W.M. 1985. The mitochondrial genome of animals. Pp. 95-130 in R.J. MacIntyre, ed. Molecular evolutionary genetics. Plenum, New York.
Buroker, N.E., J.R. Brown, T.A. Gilbert, P.J. O'Hara, A.T. Beckenbach, W.K. Thomas, and M.J. Smith. 1990. Length heteroplasmy of sturgeon mitochondrial DNA: an illegitimate elongation model. Genetics 124: 157-163.
Calos, M.P., and J.H. Miller. 1980. Transposable elements. Cell 20: 579-595.
Campbell, J.L. 1986. Eukaryotic DNA replication. Ann. Rev. Biochem. 55: 733-771.
Cantatore, P., M. Roberti, G. Rainaldi, M.N. Gadaleta, and C. Saccone. 1989. The complete nucleotide sequence, gene organization, and genetic code of the mitochondrial genome of Paracentrotus lividus. J. Biol. Chem. 264: 10965-10975.
Cantatore, P., and C. Saccone. 1987. Organization, structure and evolution of mammalian mitochondrial genes. Int. Rev. Cytol. 108: 149-207.
Chang, D.D., and D.A. Clayton. 1984. Precise identification of individual promoters for transcription of each strand of human mitochondrial DNA. Cell 36: 635-643.
Chang, D.D., and D.A. Clayton. 1985. Priming of human mitochondrial DNA replication occurs at the light strand promoter. Proc. Natl. Acad. Sci. USA 82: 351-355.
Chang, D.D., and D.A. Clayton. 1987. A mammalian mitochondrial RNA processing activity contains nucleus-encoded RNA. Science 235: 1178-1184.
Chang, D.D., W.W. Hauswirth, and D.A. Clayton. 1985. Replication priming and transcription initiate from precisely the same site in mouse mitochondrial DNA. EMBO J. 4: 1559-1567.
Chomyn, A., and G. Attardi. 1987. Mitochondrial gene products. Curr. Top. Bioenerg. 15: 295-329.
Clary, D.O., and D.R. Wolstenholme. 1985. The mitochondrial molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 22: 252-271.
Clary, D.O., and D.R. Wolstenholme. 1987. Drosophila mitochondrial DNA: conserved sequences in the A+T-rich region and supporting evidence for a secondary structure model of the small ribosomal RNA. J. Mol. Evol. 25: 116-125.
Clayton, D.A. 1982. Replication of animal mitochondrial DNA. Cell 28: 693-705.
Clayton, D.A. 1984. Transcription of the mammalian mitochondrial genome. Ann. Rev. Biochem. 53: 573-594.
Clayton, D.A. 1991a. Replication and transcription of vertebrate mitochondrial DNA. Ann. Rev. Cell. Biol. 7: 453-478.
Clayton, D.A. 1991b. Nuclear gadgets in mitochondrial DNA replication and transcription. Trends Biol. Sci. 16: 107-111.
Clayton, D.A. 1992. Transcription and replication of animal mitochondrial DNAs. Int. Rev. Cytol. 141: 217-232.
Coote, J.L., G. Szabados, and T.S. Work. 1979. The heterogeneity of mitochondrial DNA in different tissues from the same animal. FEBS Lett. 99: 255-260.
Cornuet, J.M., L. Garnery, and M. Solignac. 1991. Putative origin and function of the intergenic region between COI and COII of Apis mellifera L. mitochondrial DNA. Genetics 133: 393-403.
Coté, J., and A.C. Ruiz-Carrillo. 1993. Primers for mitochondrial DNA replication generated by endonuclease G. Science 261: 765-769.
Crozier, R.H., and Y.C. Crozier. 1993. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133: 97-117.
Dawid, I.B., and A.W. Blackler. 1972. Maternal and cytoplasmic inheritance of mitochondrial DNA in Xenopus. Dev. Biol. 29: 152-161.
de Bruijn, M.H.L. 1983. Drosophila melanogaster mitochondrial DNA, a novel organization and genetic code. Nature 304: 234-241.
Delucia, A.L., D. Sumitra, K. Partin, and P. Tegtmeyer. 1986. Functional interactions of the simian virus 40 core origin of replication with flanking regulatory sequences. J. Virol. 57: 138-144.
Desjardins, P., and R. Morais. 1990. Sequence and gene organization of the chicken mitochondrial genome. J. Mol. Biol. 212: 599-634.
Desjardins, P., and R. Morais. 1991. Nucleotide sequence and evolution of coding and noncoding regions of a quail mitochondrial genome. J. Mol. Evol. 32: 153-161.
Dierks, P., A. van Ooyen, M.D. Cochran, C. Dobkin, J. Reiser, and C. Weissmann. 1983. Three regions upstream from the cap site are required for efficient and accurate transcription of the rabbit beta-globin gene in mouse 3T6 cells. Cell 32: 695-706.
Dillon, M.C., and J.M. Wright. 1993. Nucleotide sequence of the D-loop region of the sperm whale (Physeter macrocephalus) mitochondrial genome. Mol. Biol. Evol. 10: 296-305.
Doda, J.A., C.T. Wright, and D.A. Clayton. 1981. Elongation of displacement-loop strands in human and mouse mitochondrial DNA is arrested near specific template sequences. Proc. Natl. Acad. Sci. USA 78: 6116-6120.
Dunon-Bluteau, D.C., and G.M. Brun. 1987. Mapping at the nucleotide level of Xenopus laevis mitochondrial D-loop H strand: structural features of the 3' region. Biochem. Int. 14: 643-657.
Efstratiadis, A., J.W. Posakony, T. Maniatis, R.M. Lawn, C. O'Connell, R.A. Spritz, J.K. De Riel, B.G. Forget, S.W. Weissman, J.L. Slightom, A.E. Blechl, O. Smithies, F.E. Baralle, C.C. Shoulders, and N.J. Proudfoot. 1980. The structure and evolution of the human beta-globin gene family. Cell 21: 653-668.
Fauron, C.M.R., and D.R. Wolstenholme. 1976.
Structural heterogeneity of mitochondrial DNA molecules within the genus Drosophila. Proc. Natl. Acad. Sci. USA 73: 3623-3627.
Fearnley, I.M., and J.E. Walker. 1986. Two overlapping genes in bovine mitochondrial DNA encode membrane components of ATP synthase. EMBO J. 5: 2003-2008.
Fisher, C., and D.O.F. Skibinski. 1990. Sex-biased mitochondrial DNA heteroplasmy in the marine mussel Mytilus. Proc. R. Soc. Lond. 242: 149-156.
Fisher, R.P., T. Lisowsky, M.A. Parisi, and D.A. Clayton. 1992. DNA wrapping and bending by a mitochondrial high mobility group-like transcriptional activator protein. J. Biol. Chem. 267: 3358-3367.
Fisher, R.P., J.N. Topper, and D.A. Clayton. 1987. Promoter selection in human mitochondria involves binding of a transcription factor to orientation-independent upstream regulatory elements. Cell 50: 247-258.
Fumagalli, L., P. Taberlet, L. Favre, and J. Hausser. 1996. Origin and evolution of homologous repeated sequences in the mitochondrial DNA control region of shrews. Mol. Biol. Evol. 13: 31-46.
Gadaleta, G., G. Pepe, G. DeCandia, C. Quagliariello, E. Sbisà, and C. Saccone. 1989. The complete nucleotide sequence of the Rattus norvegicus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J. Mol. Evol. 28: 497-516.
Gemmell, N.J., P.S. Western, J.M. Watson, and J.A. Marshall Graves. 1996. Evolution of the mammalian mitochondrial control region: comparisons of the control region sequences between monotreme and therian mammals. Mol. Biol. Evol. 13: 798-808.
Ghivizzani, S.C., S.L.D. Mackay, C.S. Madsen, P.J. Laipis, and W.W. Hauswirth. 1993. Transcribed heteroplasmic repeated sequences in the porcine mitochondrial DNA D-loop region. J. Mol. Evol. 37: 36-47.
Gilbert, D.G. 1990. LoopViewer, a Macintosh program for visualizing RNA secondary structure. Published electronically on the Internet, available via anonymous ftp to ftp.bio.indiana.edu.
Gjetvaj, B., D.I. Cook, and E. Zouros. 1992.
Repeated sequences and large-scale size variation of mitochondrial DNA: a common feature among scallops (Bivalvia: Pectinidae). Mol. Biol. Evol. 9: 106-124.
Goddard, J.M., and D.R. Wolstenholme. 1978. Origin and direction of replication in mitochondrial DNA molecules from Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 76: 3886-3890.
Goddard, J.M., and D.R. Wolstenholme. 1980. Origin and direction of replication in mitochondrial DNA molecules from the genus Drosophila. Nucleic Acids Res. 8: 741-757.
Greenberg, B.D., J.E. Newbold, and A. Sugino. 1983. Intraspecific nucleotide sequence variability surrounding the origin of replication in human mitochondrial DNA. Gene 21: 33-49.
Gyllensten, U., D. Wharton, A. Josefsson, and A.C. Wilson. 1991. Paternal inheritance of mitochondrial DNA in mice. Nature 352: 255-257.
Hale, L.R., and R.S. Singh. 1986. Extensive variation and heteroplasmy in size of mitochondrial DNA among geographic populations of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 83: 8813-8817.
Hale, L.R., and R.S. Singh. 1991. A comprehensive study of genetic variation in natural populations of Drosophila melanogaster. IV. Mitochondrial DNA variation and the role of history vs selection in the genetic structure of geographic populations. Genetics 129: 103-117.
Hauswirth, W.W., and P.J. Laipis. 1985. Transmission genetics of mammalian mitochondria: a molecular model and experimental evidence. Pp. 49-59 in E. Quagliariello, E.C. Slater, F. Palmieri, C. Saccone, and A.M. Kroon, eds. Achievements and perspectives of mitochondrial research, Vol. II: Biogenesis. Elsevier Science Publishers, New York.
Hauswirth, W.W., M.J. Van de Walle, P.J. Laipis, and P.D. Olivo. 1984. Heterogeneous mitochondrial D-loop sequences in bovine tissue. Cell 37: 1001-1007.
Hayasaka, K., T. Ishida, and S. Horai. 1991. Heteroplasmy and polymorphism in the major noncoding region of mitochondrial DNA in Japanese monkey: association with tandemly repeated sequences. Mol. Biol. Evol.
8: 399-415.
Hayashi, J.I., Y. Tagashira, and M.C. Yoshida. 1985. Absence of extensive recombination between inter- and intra-species mitochondrial DNA in mammalian cells. Exp. Cell. Res. 160: 387-395.
Heddi, A., P. Lestienne, D.C. Wallace, and G. Stepien. 1994. Steady state levels of mitochondrial and nuclear oxidative phosphorylation transcripts in Kearns-Sayre syndrome. Biochim. Biophys. Acta 1226: 206-212.
Heie, O.E. 1981. Morphology and phylogeny of some Mesozoic aphids (Insecta: Hemiptera). Entomol. Scand. 15: 401-415.
Heie, O.E. 1987. Palaeontology and phylogeny. Pp. 367-391 in A.K. Minks and P. Harrewijn, eds. Aphids, their biology, natural enemies, and control. World crop pests, Vol. 2A. Elsevier, Amsterdam.
Hess, J.F., M.A. Parisi, J.L. Bennett, and D.A. Clayton. 1991. Impairment of mitochondrial transcription termination by a point mutation associated with the MELAS subgroup of mitochondrial encephalomyopathies. Nature 351: 236-239.
Hixson, J.E., and W.M. Brown. 1986. A comparison of the small ribosomal RNA genes from the mitochondrial DNA of the great apes and humans: sequence structure, evolution, and phylogenetic implications. Mol. Biol. Evol. 3: 1-18.
Hixson, J.E., and D.A. Clayton. 1985. Initiation of transcription from each of the two human mitochondrial promoters requires unique nucleotides at the transcriptional start sites. Proc. Natl. Acad. Sci. USA 82: 2660-2664.
Hoeh, W.R., K.A. Blakley, and W.M. Brown. 1991. Heteroplasmy suggests limited biparental inheritance of Mytilus mitochondrial DNA. Science 251: 1488-1490.
Hoelzel, A.R., J.M. Hancock, and G.A. Dover. 1991. Evolution of the cetacean mitochondrial D-loop region. Mol. Biol. Evol. 8: 475-493.
Hoelzel, A.R., J.M. Hancock, and G.A. Dover. 1993. Generation of VNTRs and heteroplasmy by sequence turnover in the mitochondrial control region of two elephant seals. J. Mol. Evol. 37: 190-197.
Hoelzel, A.R., J.V. Lopez, G.A. Dover, and S.J. O'Brien. 1994.
Rapid evolution of a heteroplasmic repetitive sequence in the mitochondrial DNA control region of carnivores. J. Mol. Evol. 39: 191-199.
Hoffman, R.J., and W.M. Brown. 1992. A novel mitochondrial genome organization from the blue mussel Mytilus edulis. Genetics 131: 397-412.
Holt, I.J., A.E. Harding, and J.A. Morgan Hughes. 1988. Deletions of muscle mitochondrial DNA in patients with mitochondrial myopathies. Nature 331: 717-719.
Hutchison, C.A., III, J.E. Newbold, S.S. Potter, and M.H. Edgell. 1974. Maternal inheritance of mammalian mitochondrial DNA. Nature (London) 251: 536-538.
Hyman, B.C., and J.L. Beck Azevedo. 1996. Similar evolutionary patterning among repeated and single copy nematode mitochondrial genes. Mol. Biol. Evol. 13: 221-232.
Hyman, B.C., and T.M. Slater. 1990. Recent appearance and molecular characterization of mitochondrial DNA deletions within a defined nematode pedigree. Genetics 124: 845-853.
Jacobs, H.T., D.J. Elliot, J.B. Math, and A. Farquharson. 1988. Nucleotide sequence and gene organization of sea urchin mitochondrial DNA. J. Mol. Biol. 202: 185-217.
Jaeger, J.A., D.H. Turner, and M. Zuker. 1989a. Improved predictions of secondary structures for RNA. Proc. Natl. Acad. Sci. USA 86: 7706-7710.
Jaeger, J.A., D.H. Turner, and M. Zuker. 1989b. Predicting optimal and suboptimal secondary structure for RNA. In "Molecular evolution: computer analysis of protein and nucleic acid sequences", R.F. Doolittle, ed. Methods in Enzymology 183: 281-306.
Jermiin, L.S., D. Graur, R.M. Lowe, and R.H. Crozier. 1994. Analysis of directional mutation pressure and nucleotide content in mitochondrial cytochrome b genes. J. Mol. Evol. 39: 160-173.
Johansen, S., P.H. Guddal, and T. Johansen. 1990. Organization of the mitochondrial genome of Atlantic cod, Gadus morhua. Nucleic Acids Res. 18: 411-419.
Johnson, W.G. 1899. The pea louse, a new and important economic species of the genus Nectarophora. Sci. Am. 81: 325-326.
Jukes, T.H., and V. Bhushan. 1986.
Silent nucleotide substitution and G+C content of some mitochondrial and bacterial genes. J. Mol. Evol. 24: 39-44.
Kornberg, A. 1980. DNA replication. W.H. Freeman, San Francisco.
Kruse, B., N. Narasimhan, and G. Attardi. 1989. Termination of transcription in human mitochondria: identification and purification of a DNA binding protein factor that promotes termination. Cell 58: 391-397.
Kunkel, T.A. 1985. The mutational specificity of DNA polymerase-alpha and gamma during in vitro DNA synthesis. J. Biol. Chem. 260: 12866-12874.
Kunkel, T.A., and D.W. Mosbaugh. 1989. Exonucleolytic proofreading by a mammalian DNA polymerase gamma. Biochemistry 28: 988-995.
Lamb, R.J., and P.J. Pointing. 1972. Sexual morph determination in the aphid, Acyrthosiphon pisum. J. Insect Physiol. 18: 2029-2042.
La Roche, J., M. Snyder, D.I. Cook, K. Fuller, and E. Zouros. 1990. Molecular characterization of a repeat element causing large-scale size variation in the mitochondrial DNA of the deep-sea scallop Placopecten magellanicus. Mol. Biol. Evol. 7: 45-64.
Levinson, G., and G.A. Gutman. 1987. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203-221.
Lewis, D.L., C.L. Farr, A.L. Farquhar, and L.S. Kaguni. 1994. Sequence, organization, and evolution of the A+T region of Drosophila melanogaster mitochondrial DNA. Mol. Biol. Evol. 11: 523-538.
Lewis, D.L., C.L. Farr, and L.S. Kaguni. 1995. Drosophila melanogaster mitochondrial DNA: completion of the nucleotide sequence and evolutionary comparisons. Insect Mol. Biol. 4: 263-278.
Low, R.L., J.M. Buzan, and C.L. Couper. 1987. The preference of the mitochondrial endonuclease for a conserved sequence block in mitochondrial DNA is conserved during mammalian evolution. Nucleic Acids Res. 14: 6427-6445.
Low, R.L., O.W. Cummings, and T.C. King. 1988. The bovine mitochondrial endonuclease prefers a conserved sequence in the displacement loop region of mitochondrial DNA. J. Biol. Chem. 262: 16146-16170.
Lunt, D.A.,
and B.C. Hyman. 1997. Animal mitochondrial DNA recombination. Nature 387: 247.
Mackay, S.L.D., P.D. Olivo, P.J. Laipis, and W.W. Hauswirth. 1986. Template-directed arrest of mammalian mitochondrial DNA synthesis. Mol. Cell. Biol. 6: 1261-1267.
MacRae, A.F., and W.W. Anderson. 1988. Evidence for non-neutrality of mitochondrial DNA haplotypes in Drosophila pseudoobscura. Genetics 120: 485-494.
Madsen, C.S., S.C. Ghivizzani, and W.W. Hauswirth. 1993. In vivo and in vitro evidence for slipped-strand mispairing in mammalian mitochondria. Proc. Natl. Acad. Sci. USA 90: 7671-7675.
Martens, P.A., and D.A. Clayton. 1979. Mechanisms of mitochondrial DNA replication in mouse L-cells: localization and sequence of the light strand origin of replication. J. Mol. Biol. 135: 327-351.
Martinez, D., A. Moya, A. Latorre, and A. Fereres. 1992. Mitochondrial DNA variation in Rhopalosiphum padi (Homoptera: Aphididae) populations from four Spanish localities. Ann. Entomol. Soc. America 85: 241-246.
Martinez-Torres, D., J.C. Simon, A. Fereres, and A. Moya. 1996. Genetic variation in natural populations of the aphid Rhopalosiphum padi as revealed by maternally inherited markers. Mol. Ecol. 5: 659-670.
McKnight, S., and R. Tjian. 1986. Transcriptional selectivity of viral genes in mammalian cells. Cell 46: 795-805.
Mignotte, E., M. Barat, and J.C. Mounolou. 1985. Characterization of a mitochondrial protein binding to single-stranded DNA. Nucleic Acids Res. 13: 1703-1716.
Mignotte, E., M. Gueride, A.M. Champagne, and J.C. Mounolou. 1990. Direct repeats in the non-coding region of rabbit mitochondrial DNA. Eur. J. Biochem. 194: 561-571.
Mitchell, S.E., A.F. Cockburn, and J.A. Seawright. 1993. The mitochondrial genome of Anopheles quadrimaculatus species A: complete nucleotide sequence and gene organization. Genome 36: 1058-1073.
Monforte, A., E. Barrio, and A. Latorre. 1993. Characterization of the length polymorphism in the A+T-rich region of the Drosophila obscura group species. J. Mol.
Evol. 36: 214-223.
Monnerot, M., M. Solignac, and D.R. Wolstenholme. 1990. Discrepancy in divergence of the mitochondrial and nuclear genomes of Drosophila teissieri and Drosophila yakuba. J. Mol. Evol. 30: 500-508.
Montoya, J., T. Christianson, D. Levens, M. Rabinowitz, and G. Attardi. 1982. Identification of initiation sites for heavy strand and light strand transcription in human mitochondrial DNA. Proc. Natl. Acad. Sci. USA 79: 7195-7199.
Montoya, J., G. Gains, and G. Attardi. 1983. The pattern of transcription of the human mitochondrial rRNA genes reveals two overlapping transcription units. Cell 34: 151-159.
Moran, N.A. 1992. The evolution of aphid life cycles. Ann. Rev. Entomol. 37: 321-348.
Moritz, C. 1991. Evolutionary dynamics of mitochondrial DNA duplications in parthenogenetic geckos, Heteronotia binoei. Genetics 129: 221-230.
Moritz, C., and W.M. Brown. 1986. Tandem duplications of D-loop and ribosomal RNA sequences in lizard mitochondrial DNA. Science 233: 1425-1427.
Moritz, C., and W.M. Brown. 1987. Tandem duplications in animal mitochondrial DNAs: variation in incidence and gene content. Proc. Natl. Acad. Sci. USA 84: 7183-7187.
Moritz, C., T.E. Dowling, and W.M. Brown. 1987. Evolution of animal mitochondrial DNA: relevance for population biology and systematics. Ann. Rev. Ecol. Syst. 18: 269-292.
Neefs, J.M., Y. Van de Peer, P. De Rijk, S. Chapelle, and R. De Wachter. 1993. Compilation of small ribosomal subunit RNA structures. Nucleic Acids Res. 21: 3025-3049.
Ojala, D., and G. Attardi. 1977. A detailed physical map of HeLa cell mitochondrial DNA and its alignment with the positions of known genetic markers. Plasmid 1: 78-105.
Ojala, D., C. Merkel, R. Gelfand, and G. Attardi. 1980. The tRNA genes punctuate the reading of genetic information in human mitochondrial DNA. Cell 22: 393-403.
Ojala, D., J. Montoya, and G. Attardi. 1981. tRNA punctuation model of RNA processing in human mitochondria. Nature 290: 470-474.
Okimoto, R., H.M. Chamberlin, J.L.
Macfarlane, and D.R. Wolstenholme. 1991. Repeated sequence sets in mitochondrial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host-race identification. Nucleic Acids Res. 19: 1619-1626.
Okimoto, R., J.L. Macfarlane, D.O. Clary, and D.R. Wolstenholme. 1992. The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum. Genetics 130: 471-498.
Olson, M.W., and L.S. Kaguni. 1992. 3'-5' exonuclease in Drosophila mitochondrial DNA polymerase: substrate specificity and functional coordination of nucleotide polymerization and mispair hydrolysis. J. Biol. Chem. 267: 23136-23142.
Pardue, M.L., J.M. Fostel, and T.R. Cech. 1984. DNA-protein interactions in the Drosophila virilis mitochondrial chromosome. Nucleic Acids Res. 12: 1991-1999.
Parisi, M.A., and D.A. Clayton. 1991. Similarity of human mitochondrial transcription factor 1 to high mobility group proteins. Science 252: 965-969.
Pierce, J.C., D. Kong, and W. Masker. 1991. The effect of the length of direct repeats and the presence of palindromes on deletion between directly repeated DNA sequences in bacteriophage T7. Nucleic Acids Res. 19: 3901-3905.
Pissios, P., and Z.G. Scouras. 1993. Mitochondrial DNA evolution in the montium-species subgroup of Drosophila. Mol. Biol. Evol. 10: 375-382.
Potter, D.A., J.M. Fostel, M. Berninger, M.L. Pardue, and T.R. Cech. 1980. DNA-protein interactions in the Drosophila melanogaster mitochondrial genome as deduced from trimethylpsoralen crosslinking patterns. Proc. Natl. Acad. Sci. USA 77: 4118-4122.
Potter, S.S., J.E. Newbold, C.A. Hutchison III, and M.H. Edgell. 1975. Specific cleavage analysis of mammalian mitochondrial DNA. Proc. Natl. Acad. Sci. USA 72: 4496-4500.
Powers, T.O., S.G. Jensen, S.D. Kindler, C.J. Stryker, and L.J. Sandall. 1989. Mitochondrial DNA divergence among greenbug (Homoptera: Aphididae) biotypes. Ann. Entomol. Soc. America 82: 298-302.
Rand, D.M. 1993.
Endotherms, ectotherms, and mitochondrial genome-size variation. J. Mol. Evol. 37: 281-295.
Rand, D.M. 1994. Concerted evolution and RAPing in mitochondrial VNTRs and the molecular geography of cricket populations. Pp. 227-245 in B. Schierwater, B. Streit, G.P. Wagner, and R. DeSalle, eds. Molecular ecology and evolution: approaches and applications. Birkhauser Verlag, Basel.
Rand, D.M., and R.G. Harrison. 1986. Mitochondrial DNA transmission genetics in crickets. Genetics 114: 955-970.
Rand, D.M., and R.G. Harrison. 1989. Molecular population genetics of mtDNA size variation in crickets. Genetics 121: 551-569.
Reebeck, G.W., and L. Samson. 1991. Increased spontaneous mutation and alkylation sensitivity of Escherichia coli strains lacking the ogt O6-methylguanine DNA repair methyltransferase. J. Bacteriol. 173: 2068-2076.
Reilly, J.G., and G.A. Thomas, Jr. 1980. Length polymorphism, restriction site variation and maternal inheritance of mitochondrial DNA of Drosophila melanogaster. Plasmid 3: 109-115.
Robberson, D.L., D.A. Clayton, and J.F. Morrow. 1974. Cleavage of replicating forms of mitochondrial DNA by EcoRI nuclease. Proc. Natl. Acad. Sci. USA 71: 4447-4451.
Roe, B.A., D.P. Ma, R.K. Wilson, and J.F. Wong. 1985. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260: 9759-9774.
Saccone, C.G., M. Attimonelli, and E. Sbisà. 1987. Structural elements highly preserved during the evolution of the D-loop containing region in vertebrate mitochondrial DNA. J. Mol. Evol. 26: 205-211.
Saccone, C.G., G. Pesole, and E. Sbisà. 1991. The main regulatory region of mammalian mitochondrial DNA: structure-function model of evolutionary pattern. J. Mol. Evol. 33: 83-91.
Schaller, H. 1978. The intergenic region and the origin of filamentous phage DNA replication. Cold Spring Harbor Symp. Quant. Biol. 43: 401-408.
Shoubridge, E.A., G. Karpati, and K.E.M. Hastings. 1990.
Deletion mutants are functionally dominant over wild-type mitochondrial genomes in skeletal muscle fibre segments in mitochondrial disease. Cell 62: 43-49.
Snyder, M., A.R. Fraser, J. La Roche, K.E. Gartner-Kepkay, and E. Zouros. 1987. Atypical mitochondrial DNA from the deep-sea scallop Placopecten magellanicus. Proc. Natl. Acad. Sci. USA 84: 7595-7599.
Solignac, M., J. Genermont, M. Monnerot, and J.C. Mounolou. 1984. Genetics of mitochondria in Drosophila: inheritance in heteroplasmic strains of D. mauritiana. Mol. Gen. Genet. 197: 183-188.
Solignac, M., J. Genermont, M. Monnerot, and J.C. Mounolou. 1987. Drosophila mitochondrial genetics: evolution of heteroplasmy through germ line cell division. Genetics 117: 687-696.
Solignac, M., M. Monnerot, and J.C. Mounolou. 1986. Concerted evolution of sequence repeats in Drosophila mitochondrial DNA. J. Mol. Evol. 24: 53-60.
Southern, S.O., P.J. Southern, and A.E. Dizon. 1988. Molecular characterization of a cloned dolphin mitochondrial genome. J. Mol. Evol. 28: 32-42.
Stewart, D.T., and A.J. Baker. 1994a. Patterns of sequence variation in the mitochondrial D-loop region of shrews. Mol. Biol. Evol. 11: 9-21.
Stewart, D.T., and A.J. Baker. 1994b. Evolution of mtDNA D-loop sequences and their use in phylogenetic studies of shrews in the subgenus Otisorex (Sorex: Soricidae: Insectivora). Mol. Phylogenet. Evol. 3: 38-46.
Streisinger, G., Y. Okada, L. Emrich, J. Newton, A. Tsugita, E. Terzaghi, and M. Inouye. 1966. Frameshift mutations and the genetic code. Cold Spring Harbor Symp. Quant. Biol. 31: 77-84.
Takahata, N., and T. Maruyama. 1981. A mathematical model of extra-nuclear genes and the genetic variability maintained in a finite population. Genet. Res. 37: 291-302.
Tapper, D.P., and D.A. Clayton. 1981. Mechanism of replication of human mitochondrial DNA. Location of the 5' ends of nascent daughter strands. J. Biol. Chem. 256: 5109-5115.
Tautz, D., M. Trick, and G.A. Dover. 1986.
Cryptic simplicity in DNA is a major source of genetic variation. Nature 322: 652-656.
Tjian, R. 1978. Protein-DNA interactions at the origin of replication of simian virus 40 DNA replication. Cold Spring Harbor Symp. Quant. Biol. 43: 655-662.
Trinh, T.K., and R.R. Sinden. 1991. Preferential DNA secondary structure mutagenesis in the lagging strand of replication in E. coli. Nature 352: 544-547.
Tullo, A., W. Rossmanith, E.M. Imre, E. Sbisà, C. Saccone, and R. Karwan. 1994. RNase mitochondrial RNA processing cleaves RNA from the rat mitochondrial displacement loop at the origin of heavy-strand replication. Eur. J. Biochem. 227: 657-662.
Upholt, W.B., and I.B. Dawid. 1977. Mapping of mitochondrial DNA of individual sheep and goats: rapid evolution in the D-loop region. Cell 11: 571-583.
Valverde, J.R., B. Batuecas, C. Moratilla, R. Marco, and R. Garesse. 1994. The complete mitochondrial DNA sequence of the crustacean Artemia franciscana. J. Mol. Evol. 39: 400-408.
Vestweber, D., and G. Schatz. 1989. DNA-protein conjugates can enter mitochondria via the protein import pathway. Nature 338: 170-172.
Via, S. 1991. Specialized host plant performance of pea aphid clones is not altered by experience. Ecology 72: 1420-1427.
Walberg, W.M., and D.A. Clayton. 1981. Sequence and properties of human KB cell and the mouse L cell D-loop regions of mitochondrial DNA. Nucleic Acids Res. 9: 5411-5421.
Wallace, D.C. 1982. Structure and evolution of organelle genomes. Microbiol. Rev. 46: 208-240.
Wallace, D.C. 1989. Mitochondrial DNA mutations and neuromuscular disease. Trends Genet. 5: 9-13.
Wallis, G.P. 1987. Mitochondrial DNA insertion polymorphism and germ line heteroplasmy in the Triturus cristatus complex. Heredity 58: 229-238.
Warrior, R., and J. Gall. 1985. The mitochondrial DNA of Hydra attenuata and Hydra littoralis consists of 2 linear molecules. Arch. Sci. Geneva 38: 439-445.
Watson, J.D., N.H. Hopkins, J.W. Roberts, J.A. Steitz, and A.M. Weiner. 1987. Pp.
346-347 in Molecular biology of the gene, Ed. 4, Vol. 1. Benjamin/Cummings, Menlo Park, California.
Wernette, C.M., M.C. Conway, and L.S. Kaguni. 1988. Mitochondrial polymerase from Drosophila melanogaster embryos: kinetics, processivity, and fidelity of DNA polymerization. Biochemistry 27: 6046-6054.
Wilkinson, G.S., and A.M. Chapman. 1991. Length and sequence variation in evening bat D-loop mtDNA. Genetics 128: 607-617.
Wolfson, R., K.G. Higgins, and B.B. Sears. 1991. Evidence of replication slippage in the evolution of Oenothera chloroplast DNA. Mol. Biol. Evol. 8: 709-720.
Wolstenholme, D.R. 1992. Animal mitochondrial DNA: structure and evolution. Int. Rev. Cytol. 141: 173-216.
Wolstenholme, D.R., and D.O. Clary. 1985. Sequence evolution of Drosophila mitochondrial DNA. Genetics 109: 725-744.
Wolstenholme, D.R., J.M. Goddard, and M.R. Fauron. 1979. Replication of Drosophila mitochondrial DNA. Pp. 131-148 in Y. Becker, ed. Replication of viral and cellular genomes. Martinus Nijhoff, Boston.
Wolstenholme, D.R., J.M. Goddard, and M.R. Fauron. 1983. Replication of Drosophila mitochondrial DNA. Pp. 131-148 in Y. Becker, ed. Replication of viral and cellular genomes. Martinus Nijhoff, Boston.
Wong, J.F.H., D.P. Ma, R.K. Wilson, and B.A. Roe. 1983. DNA sequence of Xenopus laevis mitochondrial heavy and light strand replication origins and flanking tRNA genes. Nucleic Acids Res. 11: 4977-4995.
Wong, T.W., and D.A. Clayton. 1985. In vitro replication of human mitochondrial DNA: accurate initiation at the origin of light-strand synthesis. Cell 42: 951-958.
Yang, Y.G., Y.S. Lin, J.L. Wu, and C.F. Hui. 1994. Variation in mitochondrial DNA and population structure of the Taipei treefrog Rhacophorus taipeianus in Taiwan. Mol. Ecol. 3: 219-228.
Zevering, C.E., C. Moritz, A. Heideman, and R.A. Sturm. 1991. Parallel origins of duplication and the formation of pseudogenes in mitochondrial DNA from parthenogenetic lizards (Heteronotia binoei: Gekkonidae). J. Mol. Evol. 33: 431-441.
Zuker, M., and P. Stiegler. 1981. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9: 133-148.

Appendix 1d: Analysis of the temporal stability of length variants for regions 1 & 2 in four clones of A. pisum from alfalfa analyzed in 1990 and again in 1992 after approximately 30 generations of parthenogenetic reproduction (Table 6, Barrette et al., 1994)

Region 1 size class Clone L25 E85 E64 E68 1990 3 1992 3 2 2 2 3 2 3 Region 2 size class (relative frequency) 1992 1990 2 (0.45) 2 (0.31) 3 (0.55) 3 (0.69) 4 (0.60) 4 (0.16) 5 (0.22) 5 (0.65) 6 (0.18) 6 (0.19) 1 (0.47) 4 (0.53) 1 (0.46) 2 (0.57) 3 (0.43) 4 (0.54) 2 (0.52) 3 (0.48)

Note: Clone E85 was only analyzed with the enzyme Th11 and is not included in the other analysis. Length variant 6 in region 2 was observed only in this clone.

Appendix 1e. Restriction map and gene order in A. pisum. The internal gene order map is that of D. yakuba (Clary and Wolstenholme 1985). Genes for tRNA are hatched. Protein encoding genes are denoted COI, COII, and COIII for the genes encoding subunits 1, 2, and 3 of cytochrome c oxidase, Cyt b for the cytochrome b gene, and ND1-6 and ND4L for the genes encoding subunits 1 to 6 and 4L of the NADH dehydrogenase system. The AT-rich region is denoted "A + T." The genes encoding the small and large subunits of ribosomal RNA are denoted srRNA and lrRNA, respectively. The positions of the PCR primers that have been used successfully on A. pisum, and their direction of extension, are indicated on the gene order map by the solid flags. The primer numbers correspond to those in Table 8. The hatched bars on the restriction map represent regions of hybridization between total A. pisum mtDNA and the DIG-labelled PCR fragments amplified from A. pisum. The dotted lines connect the ends of the bars to the primers used to generate the probe fragments. Length variable regions (regions 1 and 2) are indicated on the external A.
pisum restriction site map by open bars. The solid bars on the restriction map represent regions of the A. pisum mtDNA that we have been unable to amplify with any available primers. Letter codes for the restriction sites mapped in A. pisum are as follows: Ac, AccI; Av, AvaI; Bc, BclI; Bg, BglII; Ec, EcoRI; Hi, HindIII; Ps, PstI; Rs, RsaI; Ss, SstI; Ta, TaqI; Xb, XbaI. (Reproduced from Barrette et al., 1994)

Appendix 2: Single-letter amino acid code designation used in codon usage Table 8.
A - Alanine          C - Cysteine         D - Aspartic acid
E - Glutamic acid    F - Phenylalanine    G - Glycine
H - Histidine        I - Isoleucine       K - Lysine
L - Leucine          M - Methionine       N - Asparagine
P - Proline          Q - Glutamine        R - Arginine
S - Serine           T - Threonine        Y - Tyrosine
W - Tryptophan       V - Valine

Appendix 3a. DNA sequence alignment used in the comparative analysis of the 12S rRNA partial gene sequence for the four insects. The abbreviations are as follows: A. mell, A. mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.

Appendix 3b. DNA sequence alignment used in the comparative analysis of the 16S rRNA partial gene sequence for the four insects. The abbreviations are as follows: A. mell, A. mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.

Appendix 3c. DNA sequence alignment used in the comparative analysis of the COI gene sequence for the four insects. The abbreviations are as follows: A. mell, A. mellifera; A. quad,
A. quadrimaculatus; D. yak, D. yakuba.

Appendix 3d. DNA sequence alignment used in the comparative analysis of the ND4 gene sequence for the four insects. The abbreviations are as follows: A. mell, A. mellifera; A. quad, A. quadrimaculatus; D. yak, D. yakuba.

Appendix 4a. Possible mechanisms of recombination generating length variation in cricket mtDNA. Bold lines represent repeated DNA. Numbers or letters serve as landmarks with which to identify ends of different repeats. A bold line running perpendicular to repeats indicates the site of recombination. A, Intramolecular recombination; B, intermolecular recombination. These are meant to serve as examples; other intermediates and products could be drawn. Figure and legend reproduced from Rand and Harrison, 1989.

Appendix 4b. Replication slippage model for the creation of tandem duplications. The exact 29-bp duplication in plastome I of Oenothera could have arisen by the following process: replication through the region destined for duplication (A), pausing and misalignment of newly synthesized strand, mediated by 6-bp repeat shown in capital letters (B), replication continuing and producing the duplicated sequence (C), and realignment of daughter strand, stabilized by 6-bp complementary sequence (D). Figure and legend reproduced from Wolfson et al. (1991).
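
The four steps of the replication slippage model in Appendix 4b can be illustrated with a short Python sketch. This is a simplified, single-strand illustration and not a reconstruction of the figure in Wolfson et al. (1991): the function name `slippage_duplication`, the flanking and spacer sequences, and the 6-bp repeat are all hypothetical values chosen only so that the two repeat copies start 29 bp apart. Base pairing with the template strand is not modelled; the daughter strand is simply written in the same orientation as the sequence being copied.

```python
def slippage_duplication(template: str, repeat: str) -> str:
    """Return a daughter strand carrying a tandem duplication.

    Assumes `template` contains at least two direct copies of `repeat`.
    The duplicated unit runs from just after the first copy to the end
    of the second copy, so its length equals the distance between the
    start positions of the two copies.
    """
    k = len(repeat)
    first = template.find(repeat)
    second = template.find(repeat, first + k) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("template needs two direct copies of the repeat")

    # (A) Replication proceeds through the region, up to the end of the
    #     second copy of the repeat.
    nascent = template[: second + k]

    # (B) The 3' end of the nascent strand (a copy of the repeat) slips
    #     back and mispairs with the FIRST copy of the repeat.
    resume_at = first + k

    # (C) Replication continues from that point, so the spacer and the
    #     second repeat are copied a second time.
    nascent += template[resume_at:]

    # (D) After realignment the daughter strand keeps the extra copy.
    return nascent


# Hypothetical sequences: 6-bp repeat plus 23-bp spacer gives a 29-bp unit.
repeat = "GAAATA"
spacer = "CCGTTAGCATCGGTACCTAGCTT"
template = "ACGTTGCA" + repeat + spacer + repeat + "TGCATCGA"

daughter = slippage_duplication(template, repeat)
print(len(daughter) - len(template))  # 29
```

Under these assumptions the daughter strand is 29 bp longer than the template and contains the unit `spacer + repeat` twice in tandem, which is the outcome the legend describes for steps (C) and (D).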