Insect Molecular Biology (1996) 5(3), 153-165

The insect cytochrome oxidase I gene: evolutionary patterns and conserved primers for phylogenetic studies

D. H. Lunt, D.-X. Zhang, J. M. Szymura* and G. M. Hewitt

Population Biology Sector, School of Biological Sciences, University of East Anglia, Norwich

*Present address: Department of Comparative Anatomy, Jagellonian University, Karasia 6, 30460 Krakow, Poland.

Received 14 July 1995; accepted 3 November 1995. Correspondence: Professor G. M. Hewitt, Population Biology Sector, School of Biological Sciences, University of East Anglia, Norwich NR4 7TJ.

© 1996 Blackwell Science Ltd

Abstract

Insect mitochondrial cytochrome oxidase I (COI) genes are used as a model to examine the within-gene heterogeneity of evolutionary rate and its implications for evolutionary analyses. The complete sequence (1537 bp) of the meadow grasshopper (Chorthippus parallelus) COI gene has been determined, and compared with eight other insect COI genes at both the DNA and amino acid sequence levels. This reveals that different regions evolve at different rates, and the patterns of sequence variability seem associated with functional constraints on the protein. The COOH-terminal was found to be significantly more variable than internal loops (I), external loops (E), transmembrane helices (M) or the NH2 terminal. The central region of COI (M5-M8) has lower levels of sequence variability, which is related to several important functional domains in this region. Highly conserved primers which amplify regions of different variabilities have been designed to cover the entire insect COI gene. These primers have been shown to amplify COI in a wide range of species, representing all the major insect groups; some even in an arachnid. Implications of the observed evolutionary pattern for phylogenetic analysis are discussed, with particular regard to the choice of regions of suitable variability for specific phylogenetic projects.

Keywords: insect, Chorthippus parallelus, cytochrome oxidase I, mitochondrial DNA, conserved PCR primers, genetic marker.

Introduction

The study of mitochondrial DNA (mtDNA) sequences has become the method of choice in recent years for a wide range of taxonomic, population and evolutionary investigations in animals. Many aspects of the structure and evolution of mtDNA have made it a valuable evolutionary tool. These include its ease of isolation, high copy number, lack of recombination, conservation of sequence and structure across metazoa, and range of mutational rates in different regions of the molecule (reviewed by Moritz et al., 1987; Harrison, 1989; Simon, 1991; Wolstenholme, 1992a).

The mitochondrial gene encoding subunit I of cytochrome oxidase (COI) possesses some extra characteristics which make it particularly suitable as a molecular marker for evolutionary studies. Firstly COI, as the terminal catalyst in the mitochondrial respiratory chain, has been relatively well studied at the biochemical level, and its size and structure appear to be conserved across all aerobic organisms investigated (Saraste, 1990). Mutational studies have been used to map the reaction centres of this subunit (Gennis, 1992) and these provide a background which enables interpretation of sequence differences in terms of gene function. Cytochrome oxidase I is involved in both electron transport and the associated translocation of protons across the membrane and it has been shown to contain a range of different types of functional domain including ligand sites, components of the proton channel, structural α-helices and interspersing hydrophilic loops (Saraste, 1990; Gennis, 1992). Amino acid residues in the reaction centres, which are highly conserved, do not dominate the entire COI molecule, allowing scope for considerable variability in some regions. Such a mix of highly conserved and variable regions so closely associated in a mitochondrial gene makes the COI gene particularly useful for evolutionary studies. Secondly, the COI gene is the largest of the three mitochondria-encoded cytochrome oxidase subunits (composed of 511 amino acids in D. yakuba, compared to 228 for COII and 261 for COIII; Clary & Wolstenholme, 1985), and is one of the largest protein-coding genes in the metazoan mitochondrial genome. This enables one to amplify and sequence many more characters (nucleotides), within the same functional complex, than is possible for almost any other mitochondrial gene.

A suitable genetic marker is an essential prerequisite for success in many evolutionary studies. The crucial characteristic in the choice of such a marker is the substitutional rate of the particular region. To a large degree it is the broad spectrum of substitutional rates which accounts for the popularity of animal mtDNA as a molecular tool, since it allows resolution of both intraspecific phylogenies (e.g. Avise et al., 1987) and the higher level systematics of anciently diverged taxa (e.g. Ballard et al., 1992). It is well known that different genes may evolve at different rates, and the same gene may have different rates of evolution in different lineages. However, within-gene heterogeneity of evolutionary rate has not yet received enough attention, especially in the field of lower taxonomic level phylogenetic studies. It may be misleading for many applications to consider a gene as fast or slowly evolving, because this implies a homogeneity of rate across the whole gene, which is rarely true due to the concentration of functional constraints in specific regions of the DNA sequence.
Hence it is highly advantageous to have information concerning the relative substitutional rates of different gene regions, as this will allow a much more informed choice of sequence for particular phylogenetic investigations. Sequences evolving too quickly are known to lose their ability to unambiguously reveal the phylogeny of anciently diverged taxa. Similarly, choice of a sequence which is too conserved when addressing questions of intraspecific phylogeography, for example, will not provide enough informative characters to determine the requisite relationships. Thus for many studies success will depend to a large degree on the sampling of a region containing a suitable level of variability.

In this study we have used the COI gene as a model to study the within-gene heterogeneity of evolutionary rate and discuss this in terms of its implications to phylogenetic studies. The complete sequence of cytochrome oxidase I for the meadow grasshopper (Chorthippus parallelus) has been determined. (At the time of writing, no other Orthopteran COI sequence has previously been published, though the Locusta migratoria sequence, kindly provided by Paul Flook, University of Berne, Switzerland, prior to publication, is also included here; see Flook et al., 1996.) Comparative analysis of this sequence with seven other published insect COI sequences has been carried out in order to identify and quantify areas of differing levels of variability. The relative rates of evolution (at the amino acid level) of these areas are considered in the context of both the structure-function model of COI and their utility in evolutionary studies. Areas of DNA sequence which are completely conserved are also shown and conserved primers designed and tested for their applicability to a wide range of insect species.

Results and Discussion

Sequence and structure of the C. parallelus COI gene

The COI gene of Chorthippus parallelus has been completely sequenced and is presented in Fig. 1 together with an alignment of eight other insect species. This sequence significantly extends our knowledge on insect COI genes, because six of the eight sequences previously available were from the same Order, Diptera. The Chorthippus parallelus COI gene is flanked by tRNA-Tyr at the 5' end and tRNA-Leu at the 3' end. Although the relocation of tRNA genes is not an uncommon event (Hauke & Gellisson, 1988; Paabo et al., 1991; Smith et al., 1993), Chorthippus shares its positioning of these genes with many of the other insects presented here (Beard et al., 1993).

The complete C. parallelus COI nucleotide sequence was found to be 69.4% A+T, comprised of 34.1% A, 15.5% C, 15.1% G and 35.3% T. The A+T percentage of nucleotides at the third codon position is much higher (90.8% AT) than either of the other two locations (first = 57.9%; second = 60.6%). With the exception of Apis, these values are typical of those reported for other insects (see Table 1).

The putative initiation codon for C. parallelus COI is TCG and the stop codon is a single T at position 1537. Although the exact initiation and termination codons for insect COI genes are probably less clear than those for any other insect mitochondrial gene (Beard et al., 1993) the codons suggested here are wholly consistent with those reported elsewhere (see Table 1). Beard et al. (1993) discuss the possibility that the initiation codon for D. yakuba is the TCG triplet immediately following the ATAA which is more usually recognized. Although TCG would fit well with many of the other insect sequences presented here, it is not supported by the other Drosophila sequences. Both D. simulans and D. sechellia share the ATAA motif but have undergone substitutions which alter the following TCG (Ser) triplet to CCG (Pro). The 5' COI sequence of two other Drosophila species have been reported by Satta et al. (1987). These species commence with the tetranucleotides ATAA (D.
rnelanoga- 0 1996 Blackwell Science Ltd, lnsect Molecular Biology5: 153-165 lnsect COI gene evolution and conserved primers 155 50 I I I I I .............. ---................... ---................ 100 I I I I I t c g C C G C A A A A A n ; A T A ~ ~ ~ Chorth attaA.. C...........C.....C.....T..AC.G..T........C..G........T.....A.................A Locuata tcg.G.C T.................T.....T.....T..C.....T.....T.....A.....G.....TT....A A. gambiae tcg.G.C T....................A.....T.....T........T...........A........T..TC....T A. quadrim ataaT...G.C....G........T.....T.....A...........T.....T..C..T........T.....C.....A...........TT....A D. yakuba A..C.........T....A D.aechel1 ataa....G.C....G........T.....T.....A...........T.....T.....C........C.....T. ataa....G.C.............T.....T.....A....,......T...,.T.....T........T T.....A..C.........T....A D.simul tcg.G.C T..T..T.....A........T..T.....T..C..T..C.....T...T.C.....A........T..TP....A Phormia ataAT...G.....CA.A.....C..T.....AA.......G.T...G..TA....TC.A.~....T.T.....AC.....T.......G..A Apia mZ +-..-.......-.... + I I I I I I I I zoo I I I I I A A ~ ~ ~ C A A A A A n ; A T M ~ A ~ ' Chorth Locuata A..T...T.A..T........AA.AA.A...AAC.........C.A........A...........A..C........T.......... A. gambiae CC.A. A..T A.. C..T..AG.AT C.................A...G....T........T...A.T.......... A. quadrim ..TT. A.....A......T.A.....C..T..AG..T.......T.......................n;.A...........T...A.T........T. ..TT.A.....A......T.A..T..T.....AG.AT.A..........................A..n;....T..A.....T...A.T........T. D yakuba .C..T..AG.AT.A..........................A..n;.A..T..A.....T...A.T........T. D aechell ..TT. A.....A..C...T.A..T. D.aimu1 ..TT. A.....A..C...T.A..T..T..T..AG.AT.A..........................A..TG.A..T..A.....T...A.T........T. CC.A.....A..C...T.A..G..C..T..AG.A..A...........C..............A..n;.A..G.....C..T...A.T.......... 
Phormia Apia C.T........AAT....T.AA..TCC.....A..ATGA...A.CA.................ACA..n;....TAG.........CC.A........T. El . - . - - - - . . . - - - . . . - . - - - - - - . . - - - - - - - x .... .. -. .. -.. .. .. .- . .. ~ --...--------.----.----...-..-.--.-~-.---.-.___.----_...~---~~.. 150 I ~ ..... .... ............. ---- ---- ~ ........... .. .... ..... ... L T .......... .. f-----....------------.----..---- 250 I I I I I 300 I ~ A T ~ T A C C T A T T A T A A ~ ~ M ~ ~ ~ C A T P A A T A A ~ ~ T A T A ~ C A C Chorth .C..G..T..G..A...........A.....C..A......T...................A..T..................... Locuata T.................A..G.....A..C...T....T..T......T.A..A.....T..............T...........T..... A. gambiae T.................A........A...........C..TC.....C.A.....C........G.....C..T...........T..... A.quadrb A.................G..G.....A......T....G..T......T.A..A..T..T..C.....A..C..............T..... D. yakuba T.......................C..A......T.......T......T.A..T..T..T........A..C..............T..... D.aechel1 T.......................C..A......T.......T......T.A..T..T..T........A.................T..... D.aimul A.....G...........A........A...........T..CC.T...T.A.....T..T........A..C..T...........T..... Phormia T.....AT..T.......A........A.....G..TA.T..T......C.A..AT....T... A..C..C... T..T.. Apia ............................................................. 11 ................................. .............. ....... ....... ....... ....... ....... ....... ....... ..... ........ 350 Chorth Locuata A. gambiae A.quadrim D.yakuba D aechell D.aimul Phonnia Apia . I I I 'C~~~~ATM~TAA~~~~CAAAAAn;~C~~~C~ I I I I 400 I I I A.........T................A..CC..C.AATG..T...G..G.A........A..T..T..............A..T........A...A.. A......A.G..T..T..T........A........TTIT.TA6AG....G.A..A..C..G..T..............T.....T.....T..AT.TPCP A......A..T....C..C..T.....T...C....TI.TAG.AG....G.A..A.....A..T........G.....T..A..T.....T..AT..TC. 
A..............T..TG.TC.TT.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..T.....T.....T...........TT.AT.~. A..............C..n;.TC.W.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..............T.....T..A..TT.AT.T.CT A..............C..TG.TC.'LT.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..............T.....T..A..~.AT.T.CP C...C.TP....T..n;.......TT.A...T.A.T.AL;TAG....G.A..A.....G..T..............T........A..TT.AT.TPCP A...........T..T..C..... .mA.AC.TP.ATT.AG.AA.T..T.TT..CCAA.AC.......T...........A..T..A...T.AT.A.C. ...................MJ ---....----....-..-..-------~-.---------.----.... EZ ....................... ... 450 I I I I I 500 I I I I I G C A A T n ; C C C A n ; O T G C T ~ A ~ T T ~ T C T ~ C ~ ~ A ~ A ~ O T ~ ~ A T T A C Chorth .TC..... T...A.A ...GCT.. T......T A.....T..............A........C..A...........TA.T..T..C.......... Locuata A. gambiae .G......T....C....GCT.........T....A.....T..TC.T...T....A..AA....T....................T............. .G......T....CA...GCT.........T..........T.........T....A..AA....T..A..............T..T............. A.quadrM .OT.. C..T......... GCT.. T......T.. T..TC.T...T....T..AA....T..A...........T.....T........G..T. D yakuba T. D.aechel1 .G................GCP.....T...T..........T..T......T....T..AA....T..A...C.......T.....T........ .G.........C......GCT.....T...T..........T..T......T....T..AA....T..A...C.......T.....T...........T. D.aimul AAT. T..C..A...GC...T..T...T.............TC.T..CP.G.....AA....T..A...........T.....T..C........T. Phormia TATT.ATAT ...TC.TC. Cff.........T.T..A.....T..TC.T...A..T.A..AA....C..A...A.....T..T.......A..ADPT...A Apia ................... EZ --......-------.-.-----------.-.-.------.--. M4 ........................... .... ........ . ... .... 550 I Chorth Locuata A.gambiae A.quadrim D.yakuba D. sechell D.simul Phormia Apia I I I I 600 I I C A T T ~ T A ~ ~ ~ T ~ ~ ~ ~ C A ....AC. .... A..A...A.T.AT.....C..T..............A................C.....C...CP.A...C.T..AT............. 
T.........A.....TCC.G....T...T.A....G..T......A...........G.....T....C....G.A........AT............ TT........ ~ ......... ~ I A O I T A A T T A..AG..CC.G....T..T..T..C.G..T...G..A..C........T. C....G.A........A..T..T........ T.. A..A...ACTG. T...T.A..C.G..T T..A.....A...........T....CT..TT.A...C.T..A.....T........ T. A..A...AC.G....TT..T.A....DI.T...T..A.. T....CT..CP.A......C..T.......G..... T. A. AC.G ....TP.. T.A....DI.T T..A.................T....CT..TT.A......C.AT............. T.. A..A..TAC.G....T...T.T....G..T...T..A...........T.....T....CT..T......C.T..AT....T........ TP.. A.TA..AAAAAA~....ATTAT..C....CAAAAAn;....A...CCA........TT.T....C....A........A.TA..........T.. ..-.-...-....-----------1.2 ................. ....... ........ ........... .. ....... ... ... ... Flgure 1. 01996 Blackwell Science Ltd, lnsect Molecular Biology 5: 153-165 ............... I ~ . ~ A T 156 D. H. L unf et al. 700 650 IC Chorth IG I U I C I T I A m A G A G A G U L T A T A T P A.....T................................CC.T.....G........C..C..G..A.....A..T.........T.A..T.....C... A. gambiae A...........T......................................-T..........,...A.....A..T.~T......T.A..T.....C... A.quadrim A.....T.....T.........C.T.....A........C....................C.,......G..A..G..A..C.A.T.A.,......C,.. D.yakuba .C.T..C.....T.................A..C..............T..T..T.................A.....T..T...T.G............. D sechell A.....G.....T.................A.......................T.....C......,....A........T...T.A............ D. a h 1 A.....G.....T.................A.......................T.....C.........,.A,.......T...T.A............ Phormia A........T..T....................C..............T.....T.................A........T...T.A............ Apis A.....T...............C.....TP............T...........T..C.....TATA...........T...........T......... I A ~ ~ I~ ~ ~ IT ~ ~ IC A ~ T T Locuelta . - ~ - . - - - - - - - - - - - ~ - - - - - - - - - - - - - - - - - - - - - - - - E3 - - ---- ---.. . 
-.-- - - - - . - - - - . - - - - - - - - - - - - - - - - 750 chorth Loculrta A. gambiae A.qU.¶dKim D.yakuba D.sechel1 D.simul Phormia Apia I I I I I 800 I I I I .............. T..C..A..................T......................A... ..C..T...........A...,............ .. C.....T.....T... .. A.....A............T.. .. T.....C..A..A.....G. ..... A.TAC...............G.AG...A... ........ T.. ...... .... T......... ....... A............A.TACA........A..T....AG...A... ........ T.....T..C................. ....TT........GT...... ...... ....AA....AA................ A.TA.A......TC...T....AG...A.T. ........ T.....T..... ................... ........A.TA.A......TCG. ......AG...A.T. ........ T.....T............ ............ T....T........G..A. ........... A.TA.A......TCA..G....AG...A.T. ........... . ....A............T....T.....C..A..A............A.TA........TCA.....G.AG...A.T. ........T.....CT....T..C...... A.... ..............T....T ........AT.A..C............ATAA.T. ....A.......AA...A~. C................. ..................... M6 I3 - - - - - - . - - - - - - - - - - - - - - - - - - - -. . . . . . . . . . . . . . . . . . . . . . . . . --------- 850 Chorth Locusta A. gambiae A. quadrim . . D yakuba D.sechel1 D s h l Phormia Apis chorth I T T T K A ~ ~ A T A " T I ' A A l T C l ' A C C A G G A ~ A T P A T P K T C A T A 1 T T P I U TTGAATCAT T .......AA. I I I I I 900 I I I I I I I I I I I I I ~ T A T.............................A........A..............T.....C...........A..A........A..... ......AC..... A.....C.....T...C..G.......C...C.T..A........T.....T....................A.............. .C....AC.....A...........T...C..G.T.....A...C....G........T.....T.....C...........A..A.............. .C..TT.......A.....C.....T...C.TG.T.....A...T....A........T.....T....................A.............. ....TT....... A...........T......G.T.....A...T....A..............T...........C.....A..A........A..... A...........T......G.T.....A...T....A..............T...........C.....C..G........A..... 
.C...T.......A........C..T......G.T.....A...T....A........T.....T...........C..C..A..A.............. T.A....A.A..................GC......A..TC.............T...........C...........C..AT....T........ .............I3 - - - - - - - - - - - - - - - - - - - - - - -M7 ---. -~ -..~ -~ -~ - .. -- . ~ . . ~ - ~ - - - . ~ ....~....... .... I I I I I I I I 950 I 1000 I ~ ~ T ~ ~ ~ ~ T A T..T..C..C.....A..............T.................G.....C......A.......AT.A..C.....T.....C..A..C Iocumta A. gambiae T.....T..............A..T...........T..G.....T........WL.T.....T...T....T..AT.AC.C......C....G.C..A. A.quadrim T.....T..T.....T..T..T..T...........T.....T.....T......A.T.....T...T.......AA.AC....T...C....A.C..A. D.yakuba T..T.....T..T..T..T...........G..T..T............A.T.........T....T...T.AC.......TC..C..TC..A. T..T..C..T.....T..T..............T..T..T..T......A.T.....C...T....T...T.AC.......TC..C..TC..A. D.sechel1 T..T..C..T.....T..T..............T..T..T..T......A.T.....C...T....T...T.AC.......TC..C..TC..A. D.elhl Phormia T.....T........T....................T..T.....T.........A.T.....T.......................TC....A....AC Apia T........T.....T.....A........C.....T....................T.........T........TA.C....~.......A..A..A . . Ed . . . . . . . . . . . . . . . . . . . . . . . . . . - - - - - . . - - - - - - . - - - . - - - - . - - - - -_ -_ -_ -_ -. _ -_ ... ..~ - ...... ...... ...... ...... 1050 Chorth A I 1100 A ~ ~ ~ ~ ~ ~ A ~ T.A.....AT.A...........C...........A..A........T..TC.TG.AC.T.....................G.T.... Locueta A. gambiae .GC... G.TA T.G.....AT... CG....CT........G........C. G.TC.AC....T.........A.T......G.TC.T. A.quadrim .G...TG..A..T.A......T.C......G.A...T........G.A..A........C..TG..G.A.....T........AA.T......G.T.. D. yakuba TC.... G.TA.TP.A......T.A......G.....T....C...G.A..A..A.........G.TG.A.....T........AG.T........T.... D sechell TC.... G.TA.TP.A......T.A......G.A...T....C...G.A..A..A...... G.n;.......T........AG.......C..T.... D.simul TC.... 
G.TA.TP.A......T.A......G.A...T....C...G.A..A..A.........G.n;.... T........AG..........T.... Phoda TCC...G.TACTT.A.. T.A. G.A...T.......n;.A..A..A.....T...G .n;..C....T..C..T...A.T........T.... A~..A.TP.A...T.A..A..T......A.A.... T.....b..A.........A.T..A...T.......T...A.T........TC.T. Apia .............. . .... ..... ... ....... ... .... I A ~ .. ... .... ng E5 ------.---------------------------------------_ _ -_ -_ ._ -_ -. -_ - .- I ChOrth Locu4ta A. gambiae A.qu4IdZ-h D.yakuba D. sachell D.aimu1 Phoda Apis I I I 1150 I 1200 I I I I I A ~ T ~ A ~ ~ ~ ~ ~ T ~ A T ~ ~ A T ~ A T C..A........A..............C..T..AT..................................A..............C-.T........ .C.................T..T...........T..AT....A.................T.......C...AT..G.A..T..............A.. A........A..T..C.....T.....T..(;T..........G.....C.....T.......C...DT.......T.....C......C.A.. A. T... C......T....A........T........T..,....C....T.......C.....C........... A.....T.....T.....T...T,...A........T........T.......C....T.......T.....C-.T....... A.....T.....T.....T...T....A........T........T.......C....T.......T.....C..T....... .C.....A........A.....T........C..T..AT....A........T..C.....T.......CT..AT..G....T....TC..T........ A.....C.....T.G......T.....T.....T..A.....T.................~.A.AT.......T............A... -----------------------M1O ---. --.---- --. ----.15 -_._..-------------.. .... ....... ................ ................ .... ..... . . ................ ....... . Flgurr 1 (continued) 0 1996 Blackwell Sclence Ltd, lnsect Molecular Biologys: 153-165 lnsect COI gene evolution and conserved primers 157 1250 Chorth Locusta A. gambiae A.quadrim D.yakuba D.sechel1 D simul Phormia Apia . 1300 I I ~ ~ T.......G..A....C....A..T.....T........................T.......C........................... T...............CCA..T........A..T.....TT...........G.A........TT...................T...T....T.....C T.....CCC..AT....... 
T.......TG.A..A......G..........TT....C.....C........T..CT....G..G..C T.....G...T.........A...G.....A.GT.....T.T......G..............TT......... .C..C.....T...T........... T.........T.........A.........A.OI.....T.T..C..................TT..........C........T...T..........T T.........T.........A.........A.GT.....T.T.....................TT..........C........T...T..........T T...C.....T.......C.AG.T......A.GT.....n;................T.....TT....C..C..C........T..CT..........T T ......TP. T......T A.........A..T.....T.T...A.................T.....T..C...........T...T.....C..AT ~ I I I I I I I I A ......... ......... .. .. . *- _~........_..._. .------------......~....--.--..---..---...... M11 ....................... 1350 I I I I I 1400 I I Chorth G G A A T A C C A C G A C G C A ~ ~ T A T C C A T..C..T.................T..C..C..A........A...........CAC......T....... -eta A. qambiae T...........A..C.T.......AGC...TT.A...........G.....TC.T.A..TAG......C...T.AT.C.CT..TT.ATAC. A.quadrim T........C..A....T.......~..CTT.G........A.TG.A..ATC.T.A...AGA........TP.AT.T.CT..TP.ATAC. D.yakuba T.....T.....A.....C..T.....T..C..TA..........n;.G....C......G..A..T......T.AT.......TP.AT.T. D.scchel1 G.....T...........A..............T..C...A........A.TG.G....C.........A..T......T.AT.......TC.AT.C. D simul T........C..A..............T..C...A........A.TG.G....C.........A..T......T.AT.......TC.AT.C. T........C.....C...........T..C...G.T................C......T..A.........T.AC.......TP.AT.T. Phonnia Apia TCT...........T.....A..C.........T.T...TAC.GT......TC......ATC...A.....A.T.......T.AAATA....A...T.T. ...................... E6 .................................................. ............................. I G A T I G C A I T A T ........ ........ . ........ .. ........ ........ ................. 
1450 I I I I I 1500 I I I I I T T A T C C P A A T P T P A T G R G A A A W V L T A A T A ~ ~ T A A T ~ T ~ ~ ~ ~ ~ T T chorth Locusta .C..TT.....A..................AGC.A....ATG.AT.TA..T...CA...................................T..A..A.. A. qambiae T.AT.T...A.T........T.....C.CTC.A.....TC~..CCCT.TAC.AT..TC.TC.....TT.......AC..T.CC~..T..... A.quadrim T.AT.T...A.T........T.......C.C.A......CCAGC...CCC..TAC.AC.TPCTLY:.....TT..G....AC..C.CCCTA..C..... D.yakuba T..TAT...A.T........TT..G.~.A...CA.G.A..T.A.CC..T.C.AT...ATPC...T.TP.......AT.......CA..C..A.. D.sechel1 T.TP.T...A.T........~..G.ATC.C.A..CCA.G.A..T.ACCC..T.C.AT...ATPC...T.TT.......AC.......CP..C..A-D.simul T.TT.T...A.T........W..G.ATC.C.A..GCA.G.A..T.A.CC..T.C.AT...ATLY:...T.TT.......AC.......CP..A..A-Phormia .CP.TP. C...A.T........TT..G.ATC.C.A..TCA.G..T....CCCTGPCC.AT...ATPC.....TT................eP..A..A.. Apis .A..TP.T...A.T.T.......T.....TCI..A....T.T.AT......A.TPC..CCA.T---C....CIT.........A.TIT.TTA..A...CT ..................................................... Coo" ___.._._....-....-...-.....~-- .. .. .. .. .. 1550 Chorth Locusta A.qambiae A.quadrim D.yakuba D.sechell D.s-1 Phormia Apia I I I I AGAACATAGATATPCAGRACTACCAATARmCPAGRt-----------------------------I 1568 I I ...... C.....C............C.....AA.TTATAGAt-------------------------........CT... G....G..T..TC..T.AA...ATAACt--------------------------- ........................... .................................................................... .................................................................... ........CT... G.....T.....T..T.ACPA.TAATTt T..............T...T.....C.TT.AA.A.ATtaa---------------------------- T.....C..C...AGT...T....TT..T.AA...ACTPCtaa------------------------- ...T...TC.C...T....A.T...T..T.AAT..AAAAmAAAmAAAATCAAmTAATPAAAt COOH .......................... ............................. * Figurel. DNAalignment ofthe COI genefor ninespeciesof insect. Identity to C.parallelusis indicated byaperiod, adeletion by adash. 
(Putative) termination and initiation codons are displayed in lower case. The twenty-five structural regions are indicated (see text and Fig. 3 for details).

sfer) and GTAA (D. mauritiana), indicating that differences in initiation codon can be present even between sibling species. The termination codons used for insect COI genes are shown in Table 1. Many organisms terminate with a single T, or TA, immediately adjacent to the tRNA gene, and it is known that complete (TAA) termination codons can be produced by post-transcriptional polyadenylation (Ojala et al., 1981). Drosophila and Phormia, however, show the complete TAA termination codon common in many other mitochondrial genes.

Translation of this sequence with the invertebrate mitochondrial genetic code (Clary & Wolstenholme, 1985) gives a protein sequence with a mixture of residues conserved across all the studied species and residue positions of differing levels of variability [see below and Fig. 2. Note that the first amino acid residues for all COI proteins discussed here are defined as methionines regardless of the initiation codons, as suggested by Wolstenholme (1992b)]. No insertion/deletion events are apparent between C. parallelus and eight of the nine other insects. Apis, however, shows a deletion of 3 bp at position 1464 (Fig. 1). The position indicated here differs by three codons from that given by Crozier & Crozier (1993) but seems to give a better overall alignment. This deletion falls within the COOH-terminal region and may not be constrained with respect to either size, structure or amino acid function to the same extent as would a deletion of non-terminal residues. These observations agree with the expectation of conservation of size and structure in functionally constrained systems.

Figure 3 shows a two-dimensional structural model of the COI protein, with functionally essential (boxed) and variable (filled circles) residues in insects highlighted. It is clear that the distribution of variable residues is not random along the molecule (see below for further discussion).

Table 1. Summary of data concerning the COI gene for nine species of insect. The sequences for D. simulans and D. sechellia terminate prematurely.

Ref  Organism            Order        Init codon  Term codon  Length (bp)  % AT  Accession number
1    C. parallelus       Orthoptera   TCG         T           1537         69.4
2    Locusta migratoria  Orthoptera   ATTA        T           1542         69.1  X80245
3    Anopheles gambiae   Diptera      TCG         T           1537         68.6  L20934
4    A. quadrimaculatus  Diptera      TCG         T           1537         68.1  L04272
5    Drosophila yakuba   Diptera      ATAA        TAA         1540         69.9  X03240
6    D. sechellia        Diptera      ATAA                    (1498)       70.2  M57908
7    D. simulans         Diptera      ATAA                    (1498)       70.7  M57911
8    Phormia regina      Diptera      TCG         TAA         1539         68.3  L14946
9    Apis mellifera      Hymenoptera  ATA         T           1561         75.9  L06178

References: (1) This paper; (2) Rook et al., 1988; (3) Beard et al., 1993; (4) Mitchell et al., 1993; (5) Clary & Wolstenholme, 1985; (6 and 7) Satta & Takahata, 1990; (8) Sperling et al., EMBL database accession L14946 (unpublished); (9) Crozier & Crozier, 1993.

Mode and tempo of evolution of the insect COI gene

The COI amino acid sequence was divided into twenty-five regions comprising five structural classes [twelve transmembrane helices (M1-M12), six external loops (E1-E6), five internal loops (I1-I5), carboxy (COOH) and amino (NH2) terminals] as shown in Figs 2 and 3. In order to test the null hypothesis that there is no difference in the average amino acid variability, per site, between the five structural classes, a Kruskal-Wallis analysis was employed, which led us to reject this hypothesis with an associated probability level of <0.0001. When the average variability per residue site was calculated for the five structural classes the COOH terminal was found to be significantly more variable than any other region (<5% significance level). The observed mean levels of variability were not significantly different between the amino terminal, internal loops, external loops or transmembrane regions (Table 2). This difference reflects the highly variable nature of COOH terminal amino acid sequences and agrees with Liu & Beckenbach (1992), who report that for an alignment of the insect COII gene the COOH terminal also appears to be the most variable region.

Pooling the data into classes in this way, however, will lose much information if there are large differences within these classes. Figures 4 and 5 show the mean variability per region (individual loops, transmembrane stretches or terminal regions) and there can be seen to be large differences in the mean variability of different regions of the same structural class.

Transmembrane helices M2, M6, M7 and M10 provide the metal ligands to interact with the two haem groups and copper atom which are essential for the activity of COI (Gennis, 1992). These regions can be seen to account for four of the seven highly conserved transmembrane helices. The fifth of these conserved transmembrane helices is M8, which is suggested to be involved with the cytochrome oxidase proton-conduction channel. This region contains three polar residues (Thr-352, Thr-359 and Lys-362) which are completely conserved among all organisms so far studied, and which are thought to be essential for this translocation activity (Gennis, 1992). Transmembrane helices M5 and M11 are also very conserved.

External loops E3, E5, and especially E4, seem to be very conserved (Fig. 4). Although the functional role played by these interhelical loops is unclear, E5 is thought to lie very close to haem A, an association in which Tyr-414 has been suggested to play an important role (Holm et al., 1987; Gennis, 1992).

Table 2. Mean variability, per amino acid position, for the different structural classes of COI.
Region          Size (amino acids)  Mean variability  Standard deviation  SE mean
NH2 terminal    13                  1.692             0.947               0.263
Internal loops  103                 1.689             0.908               0.089
External loops  120                 1.467             0.788               0.072
Transmembrane   232                 1.461             0.755               0.050
COOH terminal   30                  2.500             1.306               0.239

[Figure 2: amino acid alignment of the nine insect COI sequences, with the structural regions NH2, M1-M12, E1-E6, I1-I5 and COOH marked beneath. Residues labelled above the alignment: H102, H239, H333, H334, T352, T359, K362, Y414, H419, H421.]

Figure 2. Alignment of COI amino acid sequences for nine species of insect. Identity to the C. parallelus sequence is denoted by a period, a deletion by a dash. Asterisks denote universally conserved residues, or those with functional significance discussed in the text. The twenty-five structural regions are indicated (see text and Fig. 3 for details). Note that the first amino acid residues for all COI proteins discussed here are defined as methionines regardless of the initiation codons, as suggested by Wolstenholme (1992b).

Figure 4. A graph showing the mean amino acid variability for the twenty-five structural regions of the insect COI gene. The variability is expressed as the average number of amino acids per site observed in a given region.

Internal loop 1 is highly conserved, in contrast to the other four internal loops which are relatively variable (Fig. 4). This pattern of conservation is also apparent in a similar alignment of twenty-two animal COI genes (unpublished data). A functional role for internal loop 1 is as yet unclear from current structural models of COI, though this data indicates that its function may be relatively important.
The foregoing discussion of protein variability raises the question: does amino acid variability reflect DNA variability? In this situation it probably does. The evidence for this assertion is threefold. Firstly, a certain correspondence between amino acid substitutions and nucleotide substitutions is expected despite the degeneracy of the genetic code, since it is the DNA sequence which determines the amino acid sequence. (However, the form of this relationship does not follow a simple linear function, as the degenerate nature of genetic codes makes it possible for DNA variation to either match or outweigh protein variation, and it also falls under the influence of the phylogenetic distances of the organisms compared.) Secondly, the relationship between patterns of amino acid variability and nucleotide variability can clearly be seen by comparing Figs 1 and 2, with the COOH terminal coding region being the most variable part, followed by the regions coding for the E1, M3, E2, I2, I4, M9 and M12 structural regions. Finally, there is empirical evidence that the level of protein variability is a good predictor of DNA variability. The distribution of polymorphic sites in the COI gene between C. parallelus individuals in a population survey (Lunt, 1994) has been observed to match very closely the variability predictions made by this study. Furthermore, Howland & Hewitt (1995) have sequenced 400 bp of the COI gene in Coleopteran species with a variety of divergence levels. Their results show adjacent regions of DNA sequence conservation and variability, which correspond to the loop and transmembrane structures in exactly the same pattern as reported here. These studies show that the expectations of shared patterns of DNA and protein variability are well founded, and that the regions identified in this study will almost certainly express the expected level of variability in most organisms.
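The non-linear relationship between DNA and protein variability can be made concrete with a few codons. The sketch below uses a deliberately partial codon table for the invertebrate mitochondrial code (only the eight codons needed here) and hypothetical aligned codons from four taxa; it counts the distinct codons and the distinct amino acids at each codon position. The function name and the data are assumptions made for illustration.

```python
# Partial invertebrate mitochondrial codon table: only the codons used in
# this illustration are listed, so this is not a complete translation table.
CODON_TO_AA = {
    "TTA": "L", "TTG": "L", "CTT": "L", "CTA": "L",
    "ATA": "M", "ATG": "M",  # ATA is read as Met in this code
    "ATT": "I", "ATC": "I",
}


def variability_pair(codons):
    """Return (distinct codons, distinct amino acids) for one codon position."""
    amino_acids = [CODON_TO_AA[c] for c in codons]
    return len(set(codons)), len(set(amino_acids))


# Hypothetical data: one list per codon position, one codon per taxon.
codon_columns = [
    ["TTA", "TTG", "CTT", "CTA"],  # four different codons, all Leu
    ["ATA", "ATG", "ATA", "ATT"],  # three codons, two amino acids (Met, Ile)
    ["ATT", "ATT", "ATT", "ATT"],  # invariant at both levels
]

for column in codon_columns:
    print(variability_pair(column))
```

In the first position DNA variation outweighs protein variation (four codons, one amino acid); in the third the two levels match.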
Given this pattern of conservation of sequence and structure, one is able to choose a level of variability to suit one's particular phylogenetic investigation.

Conserved primers and their potential for use in evolutionary studies

Primers which are conserved across many insect groups would be extremely useful in many types of evolutionary studies. The variability observed between insect COI sequences has been quantified by position and conserved primers designed to exploit areas of differing variability. From the multiple alignment of the DNA sequences of nine insect COI genes presented in Fig. 1, it can be seen that several areas are very generally conserved and the same, or slightly modified, primers may be applicable to many organisms. Primers were thus designed to cover the whole of the insect COI gene and to be positioned to aid the sequencing of regions of different variability. The primers are described in Table 3 and their positions shown in Fig. 5. Primer UEA10, located in the tRNA-Leu gene, although identified independently in this study, turns out to be the general insect primer 'PAT' designed in the lab of R. Harrison (Cornell University, pers. comm.).

Table 3. Details of the ten conserved primers designed to cover the whole insect COI gene. Positions given are those in Fig. 1. Primers UEA1 and UEA10 are found in the tRNA-Tyr and tRNA-Leu genes respectively. Their positions relative to the COI gene (in the D. yakuba sequence; Clary & Wolstenholme, 1985) are -56 bp (Tyr) and +8 bp (Leu).

Primer  Strand     Position (3' base)  Length (bp)  Sequence
UEA1    Sense      tRNA-Tyr            26           GAATAATTCCCATAAATAGATTTACA
UEA2    Antisense  375                 26           TCAAGATAAAGGAGGATAAACAGTTC
UEA3    Sense      294                 24           TATAGCATTCCCACGAATAAATAA
UEA4    Antisense  618                 24           AATTTCGGTCAGTTAATAATATAG
UEA5    Sense      621                 24           AGTTTTAGCAGGAGCAATTACTAT
UEA6    Antisense  966                 29           TTAATWCCWGTWGGNACNGCAATRATTAT
UEA7    Sense      900                 24           TACAGTTGGAATAGACGTTGATAC
UEA8    Antisense  1266                24           AAAAATGTTGAGGGAAAAATGTTA
UEA9    Sense      1284                26           GTAAACCTAACATTTTTTCCTCAACA
UEA10   Antisense  tRNA-Leu            25           TCCAATGCACTAATCTGCCATATTA

[Figure 5: plot of average variability against amino acid position, with the twenty-five structural regions marked above the graph.]

Figure 5. An illustration of the distribution of amino acid variability (number of different amino acids per residue site) along the insect COI gene. The twenty-five structural regions, and their mean levels of variability, are shown above the graph. The positions of the ten conserved primers are given by arrows; primers UEA1 and UEA10 are in the tRNA-Tyr and tRNA-Leu genes respectively.

To test the universality of the conserved primers identified in this study, PCR amplifications have been carried out for nine insect taxa. These taxa cover the main divisions of the class Insecta, from wingless insects of the order Thysanura (silverfish, Lepisma saccharina, and firebrat, Thermobia domestica), to winged insects of the orders Odonata (damselfly, Calopteryx splendens), Orthoptera (desert locust, Schistocerca gregaria, and meadow grasshopper, Chorthippus parallelus), Hemiptera (pea aphid, Acyrthosiphon pisum), Coleoptera (beetle, Carabus violaceus), Diptera (fruit fly, Drosophila melanogaster) and Hymenoptera (bumble bee, Bombus lapidarius). Figure 6 shows the amplification product obtained using the primer pair UEA3-UEA8. These primers cover a large part (1018 bp) of the COI gene from the I1 region to the M11 region (see Figs 3 and 5). A single main band of the expected size was produced in all insect taxa described above with the exception of the pea aphid, A. pisum (no product under the assay conditions, although other primers amplify the COI gene in this aphid).
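One of the primers in Table 3 (UEA6) contains IUPAC ambiguity codes (W, N and R), so it stands for a pool of sequences rather than a single oligonucleotide. As a rough illustration, the Python sketch below counts how many distinct sequences a degenerate primer represents. The function name is hypothetical; the primer strings are copied from Table 3.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a (possibly degenerate) primer."""
    pool_size = 1
    for base in primer:
        pool_size *= len(IUPAC[base])
    return pool_size


UEA1 = "GAATAATTCCCATAAATAGATTTACA"      # no ambiguity codes
UEA6 = "TTAATWCCWGTWGGNACNGCAATRATTAT"   # contains W, N and R

print(degeneracy(UEA1), degeneracy(UEA6))
```

UEA6 carries three W, two N and one R, so its pool size is 2 x 2 x 2 x 4 x 4 x 2 = 256, whereas a non-degenerate primer such as UEA1 gives 1.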
Direct terminal sequence analysis of the amplified PCR fragments confirms that they are the COI gene and that all sequences are taxon-specific, thus excluding the possibility of cross-contamination (terminal sequences of the PCR fragments for these species are available from the authors). A preliminary phylogenetic analysis of these sequences further confirmed the authenticity of the amplified bands. PCR amplifications with other primer combinations indicate that most of the primers described here work well in many of these different taxa. Degenerate forms of some of these primers have also been designed based on the sequence alignment in Fig. 1. We are currently investigating further the universality of all these primers and results of this analysis will be reported later. Initial results indicate that the primers identified in this study are quite broadly conserved, with primers UEA7, UEA9 and UEA10 working well even between the superclasses Insecta and Arachnida, which are thought to have diverged during the Devonian period at least 400 million years ago (Pearse et al., 1987).

Figure 6. PCR products amplified from nine different arthropod taxa using primer pairs UEA3-UEA8 or UEA7-UEA10. A specific band of ~1 kb was amplified from all taxa except the pea aphid, A. pisum, using the primer pair UEA3-UEA8. The ~0.7 kb PCR product for A. pisum shown in the photograph was obtained using primer pair UEA7-UEA10. The marker is a 123 bp size ladder (GIBCO BRL).

As a general guide to the applicability of these primers, regions amplified by primer combinations UEA5-UEA6, UEA5-UEA8 or UEA7-UEA8 could be suitable for higher-level evolutionary studies (e.g. at genus or family level); regions amplified with primer pairs UEA3-UEA4 or UEA7-UEA10 should be more variable and thus suitable for lower-level analyses (such as the study of intraspecific variation, and the phylogenetics of closely related species). The use of the last two primer pairs (UEA3-UEA4 and UEA7-UEA10) in population analysis of beetles, grasshoppers and other insects in our laboratory suggests that these regions are probably variable enough to reveal intraspecific polymorphisms (unpublished data, G.M.H. et al.).

In this report we show that a detailed examination of the evolutionary patterns of a DNA region can provide valuable guidelines for its effective use as a molecular marker in phylogenetic studies. While this paper was in review, Simon et al. (1994) published a most useful compendium of conserved primers covering the whole insect mitochondrial genome. Increasing use of such primers will certainly reveal more information on the evolutionary patterns of other mitochondrial genes, which will in turn help us to assess their usefulness for addressing phylogenetic questions at different taxonomic levels.

Experimental procedures

Insect COI gene sequences

The sequence of the C. parallelus COI gene was obtained as part of the complete sequence of a 6.4 kb mtDNA HindIII fragment as described in Zhang et al. (1995). Both strands of the COI gene have been sequenced. Other insect DNA sequences were taken from the GenBank and EMBL databases, with the exception of the Locusta migratoria sequence which was kindly supplied by P. Flook prior to publication. The accession numbers of these sequences are L20934 (Anopheles gambiae), L04272 (Anopheles quadrimaculatus), L14946 (Phormia regina), L06178 (Apis mellifera), X03240 (Drosophila yakuba), M57908 (Drosophila sechellia), M57911 (Drosophila simulans) and X80245 (Locusta migratoria) (see Table 1 for individual references).
All sequences were aligned by the Clustal V method (Higgins & Sharp, 1989) and translated with the invertebrate mitochondrial genetic code using programs implemented in the LASERGENE computer software package (DNASTAR). The aligned DNA sequences were then examined manually using amino acid sequences and codon positions as references, producing the alignment shown in Fig. 1, which seems to be quite robust.

Analysis of variability and statistical tests

The aligned amino acid sequence was divided into twenty-five regions comprising five structural classes (twelve transmembrane helices, six external loops, five internal loops, carboxy and amino terminals; Fig. 3). The points of transition between these regions were taken from Gennis (1992). The number of different amino acids observed at each position of the protein alignment was recorded and the variability level expressed as the average number of amino acids per site observed in a given region. A spreadsheet written (by J. B. Lunt and D. H. Lunt) to score the variability across such alignments is available from the authors. This analysis was limited to the 498 homologous positions between the ends of the shortest sequence. Insertion and deletion events were scored equally to the possession of a novel amino acid. Statistical tests were carried out using the StatView SE v1.03 (Abacus Concepts Inc.) software package. A Kruskal-Wallis test (analysis of variance by ranks) was performed on the data sets to test the null hypothesis (H0): there is no difference in the average amino acid variability, per site, between the five structural classes. In the event of H0 being rejected, an analysis would be performed to determine between which of the samples significant differences occur. This analysis followed the method described by Zar (1984).
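The rank-based test described above can be sketched in a few lines. The Python below computes the Kruskal-Wallis H statistic with the usual correction for ties, which matters here because per-site variability scores are small integers and ties are frequent. It is an illustration only, not the StatView routine used in this study; whether that package applied the same tie correction is an assumption, and the function names and toy groups are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples (lists of numbers).

    Raises ZeroDivisionError if every observation is identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


# Toy per-site variability scores for two hypothetical structural classes.
print(kruskal_wallis_h([[1, 1, 2], [2, 3, 3]]))
```

Under the null hypothesis H is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups (four degrees of freedom for the five structural classes).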
Conserved primers, PCR amplification and direct sequencing

Primers for amplifying and sequencing the whole of the COI gene were designed using the Oligo 4.0 (National Biosciences Inc.) software package following the guidelines given by Rychlik (1992). Nine insect taxa, which cover the main divisions of the superclass Insecta, were used to test the primers identified in this study, viz: Lepisma saccharina (silverfish, Apterygota, order Thysanura), Thermobia domestica (firebrat, Apterygota, order Thysanura), Calopteryx splendens (damselfly, Pterygota, order Odonata), Schistocerca gregaria (locust, Pterygota, order Orthoptera), Chorthippus parallelus (grasshopper, Pterygota, order Orthoptera), Acyrthosiphon pisum (aphid, Pterygota, order Homoptera), Drosophila melanogaster (fruit fly, Pterygota, order Diptera), Carabus violaceus (beetle, Pterygota, order Coleoptera) and Bombus lapidarius (bee, Pterygota, order Hymenoptera). An arachnid (spider, Tegenaria domestica) was also included to gauge the broader generality of these primers.

DNA was purified from individual insects using a phenol/chloroform based extraction as described by Zhang et al. (1995). PCR was carried out in a 50 µl reaction containing 1.5-2.0 mM MgCl2, 200 µM dNTP, 0.15 µM of each primer, and 2 units of Taq polymerase (Promega) in 1 x reaction buffer (50 mM KCl, 10 mM Tris-HCl, 0.1% Triton X-100, pH 9.0 at 25°C, Promega). Following an initial denaturation at 94°C for 5 min, thirty to forty cycles were performed in a DNA Thermal Cycler 480 (Perkin Elmer Cetus), each consisting of melting at 95°C for 40 s, annealing at 48-45°C for 1 min, and extension at 72°C for 40 s to 1 min 40 s. 5 µl of PCR product were used for direct DNA sequencing using the Sequenase PCR Products Sequencing Kit (USB-Amersham) following the manufacturer's protocol.

Acknowledgements

We acknowledge the help of Pamy Noldner, Marcus Rowcliffe, Nick Watmough, John Noble-Nesbit, Graham Hopkins and John Lunt.
We also thank three anonymous reviewers for their valuable comments. We are grateful to Paul Flook for providing us with the Locusta COI sequence prior to publication. This work was supported by grants from the S.E.R.C. and E.U.

References

Avise, J.C., Arnold, J., Ball, R.M., Bermingham, E., Lamb, T., Neigel, J.E., Reeb, C.A. and Saunders, N.C. (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu Rev Ecol Syst 18: 489-522.
Ballard, J.W., Olsen, G.J., Faith, D.P., Odgers, W.A., Rowell, D.M. and Atkinson, P.W. (1992) Evidence from 12S ribosomal RNA sequences that Onychophorans are modified arthropods. Science 258: 1345-1348.
Beard, C.B., Hamm, D.M. and Collins, F.H. (1993) The mitochondrial genome of the mosquito Anopheles gambiae: DNA sequence, genome organization, and comparisons with mitochondrial sequences of other insects. Insect Mol Biol 2: 103-124.
Clary, D.O. and Wolstenholme, D.R. (1985) The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J Mol Evol 22: 252-271.
Crozier, R.H. and Crozier, Y.C. (1993) The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133: 97-117.
Flook, P.K., Rowell, C.H.F. and Gellissen, G. (1996) The sequence, organisation and evolution of the Locusta migratoria mitochondrial genome. J Mol Evol, in press.
Gennis, R.B. (1992) Site-directed mutagenesis studies on subunit I of the aa3-type cytochrome c oxidase of Rhodobacter sphaeroides: a brief review of progress to date. Biochim Biophys Acta 1101: 184-187.
Harrison, R.G. (1989) Animal mtDNA as a genetic marker in population and evolutionary biology. Trends Ecol Evol 4: 6-11.
Hauke, H.A. and Gellissen, G. (1988) Different mitochondrial gene orders among insects: exchanged tRNA gene positions in the COII/COIII region between an orthopteran and dipteran species. Curr Genetics 14: 471-476.
Higgins, D.G. and Sharp, P.M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. Computer Applic Biosci 5: 151-153.
Holm, L., Saraste, M. and Wikström, M. (1987) Structural models of the redox centers in cytochrome oxidase. EMBO J 6: 2819-2823.
Howland, D.E. and Hewitt, G.M. (1995) Phylogeny of the Coleoptera based on mitochondrial cytochrome oxidase I sequence data. Insect Mol Biol 4: 203-215.
Liu, H. and Beckenbach, A.T. (1992) Evolution of the mitochondrial cytochrome oxidase II gene among ten orders of insects. Mol Phylogen Evol 1: 41-52.
Lunt, D.H. (1994) MtDNA differentiation across Europe in the meadow grasshopper Chorthippus parallelus (Orthoptera: Acrididae). Ph.D. thesis, University of East Anglia, Norwich.
Moritz, C., Dowling, T.E. and Brown, W.M. (1987) Evolution of animal mitochondrial DNA: relevance for population biology and systematics. Annu Rev Ecol Syst 18: 269-292.
Ojala, D., Montoya, J. and Attardi, G. (1981) tRNA punctuation model of RNA processing in human mitochondria. Nature 290: 470-474.
Pääbo, S., Thomas, W.K., Whitfield, K.M., Kumazawa, Y. and Wilson, A.C. (1991) Rearrangements of mitochondrial transfer RNA genes in marsupials. J Mol Evol 33: 426-430.
Pearse, V., Pearse, J., Buchsbaum, M. and Buchsbaum, R. (1987) Living Invertebrates. The Boxwood Press, Pacific Grove, California.
Rychlik, W. (1992) Oligo Version 4.0: Reference Manual. National Biosciences, Inc., Plymouth, Minnesota.
Saraste, M. (1990) Structural features of cytochrome oxidase. Q Rev Biophys 23: 331-366.
Satta, Y., Ishiwa, H. and Chigusa, S.I. (1987) Analysis of nucleotide substitutions of mitochondrial DNAs in Drosophila melanogaster and its sibling species. J Mol Biol 4: 638-650.
Simon, C. (1991) Molecular systematics at the species boundary: exploiting conserved and variable regions of the mitochondrial genome of animals via direct sequencing from amplified DNA. Molecular Techniques in Taxonomy (Hewitt, G.M., Johnston, A.W.B., Young, J.P.W., eds), pp. 33-71. Springer, Berlin.
Simon, C., Frati, F., Beckenbach, A., Crespi, B., Liu, H. and Flook, P. (1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann Ent Soc Amer 87: 651-701.
Smith, M.J., Arndt, A., Gorski, S. and Fajber, E. (1993) The phylogeny of echinoderm classes based on mitochondrial gene arrangements. J Mol Evol 36: 545-554.
Wolstenholme, D.R. (1992a) Animal mitochondrial DNA: structure and evolution. Int Rev Cytol 141: 173-216.
Wolstenholme, D.R. (1992b) Genetic novelties in mitochondrial genomes of multicellular animals. Curr Op Genet Devel 2: 918-925.
Zar, J.H. (1984) Biostatistical Analysis. Prentice-Hall, London.
Zhang, D.-X., Szymura, J.M. and Hewitt, G.M. (1995) Evolution and structural conservation of the control region of insect mitochondrial DNA. J Mol Evol 40: 382-391.