Download The insect cytochrome oxidase I gene: evolutionary

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Frameshift mutation wikipedia , lookup

Transposable element wikipedia , lookup

Genomic library wikipedia , lookup

Gene therapy wikipedia , lookup

NUMT wikipedia , lookup

Gene expression programming wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Gene nomenclature wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Pathogenomics wikipedia , lookup

Gene expression profiling wikipedia , lookup

Genetic engineering wikipedia , lookup

Genome (book) wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Nutriepigenomics wikipedia , lookup

RNA-Seq wikipedia , lookup

Human genome wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Mitochondrial DNA wikipedia , lookup

Gene desert wikipedia , lookup

Non-coding DNA wikipedia , lookup

History of genetic engineering wikipedia , lookup

Expanded genetic code wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Genome evolution wikipedia , lookup

Metagenomics wikipedia , lookup

Gene wikipedia , lookup

DNA barcoding wikipedia , lookup

Genetic code wikipedia , lookup

Genomics wikipedia , lookup

Designer baby wikipedia , lookup

Genome editing wikipedia , lookup

Microsatellite wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Point mutation wikipedia , lookup

Microevolution wikipedia , lookup

Helitron (biology) wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
Insect Molecular Biology (1996) 5(3), 153-165
The insect cytochrome oxidase I gene:
evolutionary patterns and conserved primers
for phylogenetic studies
D. H. Lunt, D.-X. Zhang, J. M. Szymura* and 0. M. Hewltt
Introduction
Population Biology Sector, School of Biological
Sciences, University of fast Anglia, Norwich
The study of mitochondrial DNA (mtDNA) sequences
has become the method of choice in recent years for a
wide range of taxonomic, population and evolutionary
investigations in animals. Many aspects of the structure and evolution of mtDNA have made it a valuable
evolutionary tool. These include its ease of isolation,
high copy number, lack of recombination, conservation of sequence and structure across metazoa, and
range of mutational rates in different regions of the
molecule (reviewed by Moritz et al., 1987; Harrison,
1989; Simon, 1991; Wolstenholme, 1992a). The mitochondrial gene encoding subunit I of cytochrome
oxidase (COI) possesses some extra characteristics
which make it particularly suitable as a molecular
marker for evolutionary studies.
Firstly COI, as the terminal catalyst in the mitochondrial respiratory chain, has been relatively well
studied at the biochemical level, and its size and
structure appears to be conserved across all aerobic
organisms investigated (Saraste, 1990). Mutational
studies have been used to map the reaction centres of
this subunit (Gennis, 1992) and these provide a background which enables interpretation of sequence differences in terms of gene function. Cytochrome
oxidase I is involved in both electron transport and the
associated translocation of protons across the membrane and it has been shown to contain a range of
different types of functional domain including ligand
sites, components of the proton channel, structural ahelices and interspersing hydrophilic loops (Saraste,
1990; Gennis, 1992). Amino acid residues in the reaction centres, which are highly conserved, do not
dominate the entire COI molecule, allowing scope for
considerable variability in some regions. Such a mix
of highly conserved and variable regions so closely
associated in a mitochondrial gene make the COI
gene particularly useful for evolutionary studies.
Secondly, the COI gene is the largest of the three
mitochondria-encoded cytochrome oxidase subunits
(composed of 511 amino acids in D. yakuba, compared
Abstract
Insect mltochondrlal cytochrome oxidase I (COI)
genes are used as a model to examlne the wlthlngene heterogeneity of evolutlonary rate and Its lmpllcations for evolutionary analyses. The complete
sequence (1537 bp) of the meadow grasshopper
(Chorthlppus parallelus) COI gene has been determined, and compared with eight other Insect COI
genes at both the DNA and amino acid sequence
levels. This reveals that different regions evolve at
different rates, and the patterns of sequence varlablllty seems associated with functional constralnts on
the protein. The COOH-terminal was found to be
slgnlflcantly more variable than Internal loops (I),
external loops (E), transmembrane helices (M) or the
NH2 terminal. The central region of COI (MSM8) has
lower levels of sequence variability, which Is related
to several Important functional domains In thls reglon.
Highly conserved primers which amplify regions of
different variabilities have been designed to cover the
entire insect COI gene. These primers have been
shown to amplify COI in a wide range of species,
representing all the major insect groups; some even
In an arachnid. Implications of the observed evolutlonary pattern for phylogenetic analysis are dlscussed,
wlth particular regard to the choice of regions of
sultable variability for specific phylogenetic projects.
Keywords: insect, Chorthlppus parallelus, cytochrome oxidase I, mltochondrlal DNA, conserved
PCR primers, genetic marker.
'Present address: Department of Comparative Anatomy, Jagellonian
University, Karasia 6,30460Krakow, Poland.
Received 14 July 1995; accepted 3 November 1995. Correspondence:
Professor G. M. Hewttt, Population Biology Sector, School of Biological
Sciences, University of East Anglia, Norwich NR4 7TJ.
f, 1996 Blackwell Science Ltd
153
154
D. H. Luntet al.
to 228 for COll and 261 for COlll; Clary & Wolstenholme, 1985), and is one of the largest proteincoding
genes in the metazoan mitochondrial genome. This
enables one to amplify and sequence many more
characters (nucleotides), within the same functional
complex, than is possible for almost any other mitochondrial gene.
A suitable genetic marker is an essential prerequisite for success in many evolutionary studies. The
crucial characteristic in the choice of such a marker is
the substitutional rate of the particular region. To a
large degree it is the broad spectrum of substitutional
rates which accounts for the popularity of animal
mtDNA as a molecular tool, since it allows resolution
of both intraspecific phylogenies (e.g. Avise et a/.,
1987) and the higher level systematics of anciently
diverged taxa (e.g. Ballard et a/., 1992). It is well
known that different genes may evolve at different
rates, and the same gene may have different rates of
evolution in different lineages. However, within-gene
heterogeneity of evolution rate has not yet received
enough attention especially in the field of lower taxonomic level phylogenetic studies. It may be misleading for many applications to consider a gene as fast or
slowly evolving, because this implies a homogeneity
of rate across the whole gene, which is rarely true due
to the concentration of functional constraints in specific regions of the DNA sequence. Hence it is highly
advantageous to have information concerning the
relative substitutional rates of different gene regions,
as this will allow a much more informed choice of
sequence for particular phylogenetic investigations.
Sequences evolving too quickly are known to lose
their ability to unambiguously reveal the phylogeny of
anciently diverged taxa. Similarly, choice of a
sequence which is too conserved when addressing
questions of intraspecific phylogeography, for
example, will not provide enough informative characters to determine the requisite relationships. Thus for
many studies success will depend to a large degree
on the sampling of a region containing a suitable level
of variability.
In this study we have used the COI gene as a model
to study the within-gene heterogeneity of evolutionary
rate and discuss this in terms of its implications to
phylogenetic studies. The complete sequence of cytochrome oxidase I for the meadow grasshopper
(Chorthippusparallelus) has been determined. (At the
time of writing, no other Orthopteran COI sequence
has previously been published, though the Locusta
rnigratoria sequence, kindly provided by Paul Flook,
University of Berne, Switzerland, prior to publication,
is also included here; see Flook et a/., 1996.) Comparative analysis of this sequence with seven other
published insect COI sequences has been carried out
in order to identify and quantify areas of differing
levels of variability. The relative rates of evolution (at
the amino acid level) of these areas are considered in
the context of both the structurefunction model of COI
and their utility in evolutionary studies. Areas of DNA
sequence which are completely conserved are also
shown and conserved primers designed and tested for
their applicability to a wide range of insect species.
Results and Discussion
Sequence and structure ofthe C. parallelus COI gene
The COI gene of Chorthippus parallelus has been
completely sequenced and is presented in Fig. 1
together with an alignment of eight other insect
species. This sequence significantly extends our
knowledge on insect COI genes, because six of the
eight sequences previously available were from the
same Order, Diptera. The Chorthippus parallelus COI
gene is flanked by tRNA-Tyr at the 5’ end and tRNALeu at the 3 end. Although the relocation of tRNA
genes is not an uncommon event (Hauke & Gellisson,
1988; Paabo et a/., 1991; Smith et a/., 1993),
Chorthippus shares its positioning of these genes with
many of the other insects presented here (Beard eta/.,
1993). The complete C. parallelus COI nucleotide
sequence was found to be 69.4% A+T, comprised of
34.1% A, 15.5% C, 15.1% G and 35.3% T. The A + T
percentage of nucleotides at the third codon position
is much higher (90.8% AT) than either of the other two
locations (first = 57.9%; second = 60.6%). With the
exception of Apis, these values are typical of those
reported for other insects (see Table 1).
The putative initiation codon for C. parallelus COI is
TCG and the stop codon is a single T at position 1537.
Although the exact initiation and termination codons
for insect COI genes are probably less clear than
those for any other insect mitochondrial gene (Beard
et al., 1993) the codons suggested here are wholly
consistent with those reported elsewhere (see Table
1). Beard et a/. (1993) discuss the possibility that the
initiation codon for D. yakuba is the TCG triplet immediately following the ATAA which is more usually
recognized. Although TCG would fit well with many of
the other insect sequences presented here, it is not
supported by the other Drosophila sequences. Both D.
simulans and D. sechellia share the ATAA motif but
have undergone substitutions which alter the following TCG (Ser) triplet to CCG (Pro). The 5’ COI
sequence of two other Drosophila species have been
reported by Satta et a/. (1987). These species commence with the tetranucleotides ATAA (D. rnelanoga-
0 1996 Blackwell Science Ltd, lnsect Molecular Biology5: 153-165
lnsect COI gene evolution and conserved primers
155
50
I
I
I
I
I
..............
---...................
---................
100
I
I
I
I
I
t
c
g
C
C
G
C
A
A
A
A
A
n
;
A
T
A
~
~
~
Chorth
attaA..
C...........C.....C.....T..AC.G..T........C..G........T.....A.................A
Locuata
tcg.G.C
T.................T.....T.....T..C.....T.....T.....A.....G.....TT....A
A. gambiae
tcg.G.C
T....................A.....T.....T........T...........A........T..TC....T
A. quadrim
ataaT...G.C....G........T.....T.....A...........T.....T..C..T........T.....C.....A...........TT....A
D. yakuba
A..C.........T....A
D.aechel1 ataa....G.C....G........T.....T.....A...........T.....T.....C........C.....T.
ataa....G.C.............T.....T.....A....,......T...,.T.....T........T
T.....A..C.........T....A
D.simul
tcg.G.C
T..T..T.....A........T..T.....T..C..T..C.....T...T.C.....A........T..TP....A
Phormia
ataAT...G.....CA.A.....C..T.....AA.......G.T...G..TA....TC.A.~....T.T.....AC.....T.......G..A
Apia
mZ
+-..-.......-....
+
I
I
I
I
I
I
I
I
zoo
I
I
I
I
I
A
A
~
~
~
C
A
A
A
A
A
n
;
A
T
M
~
A
~
'
Chorth
Locuata
A..T...T.A..T........AA.AA.A...AAC.........C.A........A...........A..C........T..........
A. gambiae
CC.A.
A..T
A..
C..T..AG.AT
C.................A...G....T........T...A.T..........
A. quadrim ..TT. A.....A......T.A.....C..T..AG..T.......T.......................n;.A...........T...A.T........T.
..TT.A.....A......T.A..T..T.....AG.AT.A..........................A..n;....T..A.....T...A.T........T.
D yakuba
.C..T..AG.AT.A..........................A..n;.A..T..A.....T...A.T........T.
D aechell ..TT. A.....A..C...T.A..T.
D.aimu1
..TT. A.....A..C...T.A..T..T..T..AG.AT.A..........................A..TG.A..T..A.....T...A.T........T.
CC.A.....A..C...T.A..G..C..T..AG.A..A...........C..............A..n;.A..G.....C..T...A.T..........
Phormia
Apia
C.T........AAT....T.AA..TCC.....A..ATGA...A.CA.................ACA..n;....TAG.........CC.A........T.
El . - . - - - - . . . - - - . . . - . - - - - - - . . - - - - - - - x ....
..
-.
..
-..
..
..
.- .
..
~
--...--------.----.----...-..-.--.-~-.---.-.___.----_...~---~~..
150
I
~
..... ....
.............
----
----
~
...........
..
.... ..... ...
L
T
..........
..
f-----....------------.----..----
250
I
I
I
I
I
300
I
~ A T ~ T A C C T A T T A T A A ~ ~ M ~ ~ ~ C A T P A A T A A ~ ~ T A T A ~ C A C
Chorth
.C..G..T..G..A...........A.....C..A......T...................A..T.....................
Locuata
T.................A..G.....A..C...T....T..T......T.A..A.....T..............T...........T.....
A. gambiae
T.................A........A...........C..TC.....C.A.....C........G.....C..T...........T.....
A.quadrb
A.................G..G.....A......T....G..T......T.A..A..T..T..C.....A..C..............T.....
D. yakuba
T.......................C..A......T.......T......T.A..T..T..T........A..C..............T.....
D.aechel1
T.......................C..A......T.......T......T.A..T..T..T........A.................T.....
D.aimul
A.....G...........A........A...........T..CC.T...T.A.....T..T........A..C..T...........T.....
Phormia
T.....AT..T.......A........A.....G..TA.T..T......C.A..AT....T...
A..C..C...
T..T..
Apia
.............................................................
11 .................................
..............
.......
.......
.......
.......
.......
.......
.......
.....
........
350
Chorth
Locuata
A. gambiae
A.quadrim
D.yakuba
D aechell
D.aimul
Phonnia
Apia
.
I
I
I
'C~~~~ATM~TAA~~~~CAAAAAn;~C~~~C~
I
I
I
I
400
I
I
I
A.........T................A..CC..C.AATG..T...G..G.A........A..T..T..............A..T........A...A..
A......A.G..T..T..T........A........TTIT.TA6AG....G.A..A..C..G..T..............T.....T.....T..AT.TPCP
A......A..T....C..C..T.....T...C....TI.TAG.AG....G.A..A.....A..T........G.....T..A..T.....T..AT..TC.
A..............T..TG.TC.TT.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..T.....T.....T...........TT.AT.~.
A..............C..n;.TC.W.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..............T.....T..A..TT.AT.T.CT
A..............C..TG.TC.'LT.TT.A...T.A.T.AG.AGA...G.T..A..C..A..T..............T.....T..A..~.AT.T.CP
C...C.TP....T..n;.......TT.A...T.A.T.AL;TAG....G.A..A.....G..T..............T........A..TT.AT.TPCP
A...........T..T..C.....
.mA.AC.TP.ATT.AG.AA.T..T.TT..CCAA.AC.......T...........A..T..A...T.AT.A.C.
...................MJ ---....----....-..-..-------~-.---------.----....
EZ .......................
...
450
I
I
I
I
I
500
I
I
I
I
I
G C A A T n ; C C C A n ; O T G C T ~ A ~ T T ~ T C T ~ C ~ ~ A ~ A ~ O T ~ ~ A T T A C
Chorth
.TC.....
T...A.A ...GCT.. T......T
A.....T..............A........C..A...........TA.T..T..C..........
Locuata
A. gambiae .G......T....C....GCT.........T....A.....T..TC.T...T....A..AA....T....................T.............
.G......T....CA...GCT.........T..........T.........T....A..AA....T..A..............T..T.............
A.quadrM
.OT.. C..T.........
GCT.. T......T..
T..TC.T...T....T..AA....T..A...........T.....T........G..T.
D yakuba
T.
D.aechel1 .G................GCP.....T...T..........T..T......T....T..AA....T..A...C.......T.....T........
.G.........C......GCT.....T...T..........T..T......T....T..AA....T..A...C.......T.....T...........T.
D.aimul
AAT.
T..C..A...GC...T..T...T.............TC.T..CP.G.....AA....T..A...........T.....T..C........T.
Phormia
TATT.ATAT ...TC.TC. Cff.........T.T..A.....T..TC.T...A..T.A..AA....C..A...A.....T..T.......A..ADPT...A
Apia
...................
EZ --......-------.-.-----------.-.-.------.--.
M4 ...........................
....
........
.
...
....
550
I
Chorth
Locuata
A.gambiae
A.quadrim
D.yakuba
D. sechell
D.simul
Phormia
Apia
I
I
I
I
600
I
I
C
A T T ~ T A ~ ~ ~ T ~ ~ ~ ~ C A
....AC. ....
A..A...A.T.AT.....C..T..............A................C.....C...CP.A...C.T..AT.............
T.........A.....TCC.G....T...T.A....G..T......A...........G.....T....C....G.A........AT............
TT........
~
.........
~
I
A
O
I
T
A
A
T
T
A..AG..CC.G....T..T..T..C.G..T...G..A..C........T.
C....G.A........A..T..T........
T..
A..A...ACTG.
T...T.A..C.G..T
T..A.....A...........T....CT..TT.A...C.T..A.....T........
T.
A..A...AC.G....TT..T.A....DI.T...T..A..
T....CT..CP.A......C..T.......G.....
T.
A.
AC.G ....TP.. T.A....DI.T
T..A.................T....CT..TT.A......C.AT.............
T..
A..A..TAC.G....T...T.T....G..T...T..A...........T.....T....CT..T......C.T..AT....T........
TP.. A.TA..AAAAAA~....ATTAT..C....CAAAAAn;....A...CCA........TT.T....C....A........A.TA..........T..
..-.-...-....-----------1.2
.................
.......
........
...........
..
.......
...
...
...
Flgure 1.
01996 Blackwell Science Ltd, lnsect Molecular Biology 5: 153-165
...............
I
~
.
~
A
T
156
D. H. L unf et al.
700
650
IC
Chorth
IG
I
U
I
C
I
T
I
A
m A G
A G
A G
U L
T A
T A
T P
A.....T................................CC.T.....G........C..C..G..A.....A..T.........T.A..T.....C...
A. gambiae A...........T......................................-T..........,...A.....A..T.~T......T.A..T.....C...
A.quadrim A.....T.....T.........C.T.....A........C....................C.,......G..A..G..A..C.A.T.A.,......C,..
D.yakuba
.C.T..C.....T.................A..C..............T..T..T.................A.....T..T...T.G.............
D sechell A.....G.....T.................A.......................T.....C......,....A........T...T.A............
D. a h 1
A.....G.....T.................A.......................T.....C.........,.A,.......T...T.A............
Phormia
A........T..T....................C..............T.....T.................A........T...T.A............
Apis
A.....T...............C.....TP............T...........T..C.....TATA...........T...........T.........
I
A
~
~
I~
~
~
IT
~
~
IC A ~ T
T
Locuelta
.
- ~ - . - - - - - - - - - - - ~ - - - - - - - - - - - - - - - - - - - - - - - - E3
- - ---- ---.. . -.-- - - - - . - - - - . - - - - - - - - - - - - - - - -
750
chorth
Loculrta
A. gambiae
A.qU.¶dKim
D.yakuba
D.sechel1
D.simul
Phormia
Apia
I
I
I
I
I
800
I
I
I
I
..............
T..C..A..................T......................A...
..C..T...........A...,............
..
C.....T.....T...
..
A.....A............T..
..
T.....C..A..A.....G.
.....
A.TAC...............G.AG...A...
........
T.. ......
....
T.........
.......
A............A.TACA........A..T....AG...A...
........
T.....T..C.................
....TT........GT...... ......
....AA....AA................
A.TA.A......TC...T....AG...A.T.
........
T.....T.....
...................
........A.TA.A......TCG.
......AG...A.T.
........
T.....T............
............
T....T........G..A.
...........
A.TA.A......TCA..G....AG...A.T.
...........
.
....A............T....T.....C..A..A............A.TA........TCA.....G.AG...A.T.
........T.....CT....T..C......
A.... ..............T....T ........AT.A..C............ATAA.T.
....A.......AA...A~.
C.................
..................... M6
I3
- - - - - - . - - - - - - - - - - - - - - - - - - - -.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
---------
850
Chorth
Locusta
A. gambiae
A. quadrim
.
.
D yakuba
D.sechel1
D s h l
Phormia
Apis
chorth
I
T T T K A ~ ~ A T A " T I ' A A l T C l ' A C C A G G A ~ A T P A T P K T C A T A 1 T T P
I U
TTGAATCAT
T
.......AA.
I
I
I
I
I
900
I
I
I
I
I
I
I
I
I
I
I
I
I
~
T
A
T.............................A........A..............T.....C...........A..A........A.....
......AC.....
A.....C.....T...C..G.......C...C.T..A........T.....T....................A..............
.C....AC.....A...........T...C..G.T.....A...C....G........T.....T.....C...........A..A..............
.C..TT.......A.....C.....T...C.TG.T.....A...T....A........T.....T....................A..............
....TT.......
A...........T......G.T.....A...T....A..............T...........C.....A..A........A.....
A...........T......G.T.....A...T....A..............T...........C.....C..G........A.....
.C...T.......A........C..T......G.T.....A...T....A........T.....T...........C..C..A..A..............
T.A....A.A..................GC......A..TC.............T...........C...........C..AT....T........
.............I3 - - - - - - - - - - - - - - - - - - - - - - -M7
---.
-~
-..~
-~
-~
- ..
-- . ~ . . ~ - ~ - - - . ~
....~.......
....
I
I
I
I
I
I
I
I
950
I
1000
I
~
~
T
~
~
~
~
T
A
T..T..C..C.....A..............T.................G.....C......A.......AT.A..C.....T.....C..A..C
Iocumta
A. gambiae T.....T..............A..T...........T..G.....T........WL.T.....T...T....T..AT.AC.C......C....G.C..A.
A.quadrim T.....T..T.....T..T..T..T...........T.....T.....T......A.T.....T...T.......AA.AC....T...C....A.C..A.
D.yakuba
T..T.....T..T..T..T...........G..T..T............A.T.........T....T...T.AC.......TC..C..TC..A.
T..T..C..T.....T..T..............T..T..T..T......A.T.....C...T....T...T.AC.......TC..C..TC..A.
D.sechel1
T..T..C..T.....T..T..............T..T..T..T......A.T.....C...T....T...T.AC.......TC..C..TC..A.
D.elhl
Phormia
T.....T........T....................T..T.....T.........A.T.....T.......................TC....A....AC
Apia
T........T.....T.....A........C.....T....................T.........T........TA.C....~.......A..A..A
.
. Ed . . . . . . . . . . . . . . . . . . . . . . . . . .
- - - - - . . - - - - - - . - - - . - - - - . - - - - -_
-_
-_
-_
-.
_
-_
...
..~
-
......
......
......
......
1050
Chorth
A
I
1100
A
~
~
~
~
~
~
A
~
T.A.....AT.A...........C...........A..A........T..TC.TG.AC.T.....................G.T....
Locueta
A. gambiae .GC... G.TA T.G.....AT...
CG....CT........G........C.
G.TC.AC....T.........A.T......G.TC.T.
A.quadrim .G...TG..A..T.A......T.C......G.A...T........G.A..A........C..TG..G.A.....T........AA.T......G.T..
D. yakuba TC.... G.TA.TP.A......T.A......G.....T....C...G.A..A..A.........G.TG.A.....T........AG.T........T....
D sechell TC.... G.TA.TP.A......T.A......G.A...T....C...G.A..A..A......
G.n;.......T........AG.......C..T....
D.simul
TC.... G.TA.TP.A......T.A......G.A...T....C...G.A..A..A.........G.n;....
T........AG..........T....
Phoda
TCC...G.TACTT.A..
T.A.
G.A...T.......n;.A..A..A.....T...G
.n;..C....T..C..T...A.T........T....
A~..A.TP.A...T.A..A..T......A.A....
T.....b..A.........A.T..A...T.......T...A.T........TC.T.
Apia
..............
.
.... .....
...
.......
...
....
I
A
~
..
...
....
ng
E5
------.---------------------------------------_
_
-_
-_
._
-_
-.
-_
- .-
I
ChOrth
Locu4ta
A. gambiae
A.qu4IdZ-h
D.yakuba
D. sachell
D.aimu1
Phoda
Apis
I
I
I
1150
I
1200
I
I
I
I
I
A
~
T
~
A
~
~
~
~
~
T
~
A
T
~
~
A
T
~
A
T
C..A........A..............C..T..AT..................................A..............C-.T........
.C.................T..T...........T..AT....A.................T.......C...AT..G.A..T..............A..
A........A..T..C.....T.....T..(;T..........G.....C.....T.......C...DT.......T.....C......C.A..
A.
T...
C......T....A........T........T..,....C....T.......C.....C...........
A.....T.....T.....T...T,...A........T........T.......C....T.......T.....C-.T.......
A.....T.....T.....T...T....A........T........T.......C....T.......T.....C..T.......
.C.....A........A.....T........C..T..AT....A........T..C.....T.......CT..AT..G....T....TC..T........
A.....C.....T.G......T.....T.....T..A.....T.................~.A.AT.......T............A...
-----------------------M1O
---.
--.---- --.
----.15
-_._..-------------..
....
.......
................
................ .... .....
.
.
................
.......
.
Flgurr 1 (continued)
0 1996 Blackwell Sclence Ltd, lnsect Molecular Biologys: 153-165
lnsect COI gene evolution and conserved primers
157
1250
Chorth
Locusta
A. gambiae
A.quadrim
D.yakuba
D.sechel1
D simul
Phormia
Apia
.
1300
I
I
~
~
T.......G..A....C....A..T.....T........................T.......C...........................
T...............CCA..T........A..T.....TT...........G.A........TT...................T...T....T.....C
T.....CCC..AT.......
T.......TG.A..A......G..........TT....C.....C........T..CT....G..G..C
T.....G...T.........A...G.....A.GT.....T.T......G..............TT.........
.C..C.....T...T...........
T.........T.........A.........A.OI.....T.T..C..................TT..........C........T...T..........T
T.........T.........A.........A.GT.....T.T.....................TT..........C........T...T..........T
T...C.....T.......C.AG.T......A.GT.....n;................T.....TT....C..C..C........T..CT..........T
T ......TP. T......T
A.........A..T.....T.T...A.................T.....T..C...........T...T.....C..AT
~
I
I
I
I
I
I
I
I
A
.........
.........
..
..
.
*-
_~........_..._.
.------------......~....--.--..---..---...... M11 .......................
1350
I
I
I
I
I
1400
I
I
Chorth
G G A A T A C C A C G A C G C A ~ ~ T A T C C A
T..C..T.................T..C..C..A........A...........CAC......T.......
-eta
A. qambiae
T...........A..C.T.......AGC...TT.A...........G.....TC.T.A..TAG......C...T.AT.C.CT..TT.ATAC.
A.quadrim
T........C..A....T.......~..CTT.G........A.TG.A..ATC.T.A...AGA........TP.AT.T.CT..TP.ATAC.
D.yakuba
T.....T.....A.....C..T.....T..C..TA..........n;.G....C......G..A..T......T.AT.......TP.AT.T.
D.scchel1
G.....T...........A..............T..C...A........A.TG.G....C.........A..T......T.AT.......TC.AT.C.
D simul
T........C..A..............T..C...A........A.TG.G....C.........A..T......T.AT.......TC.AT.C.
T........C.....C...........T..C...G.T................C......T..A.........T.AC.......TP.AT.T.
Phonnia
Apia
TCT...........T.....A..C.........T.T...TAC.GT......TC......ATC...A.....A.T.......T.AAATA....A...T.T.
...................... E6 ..................................................
.............................
I
G
A
T
I
G
C
A
I
T
A
T
........
........
.
........
..
........
........
.................
1450
I
I
I
I
I
1500
I
I
I
I
I
T T A T C C P A A T P T P A T G R G A A A W V L T A A T A ~ ~ T A A T ~ T ~ ~ ~ ~ ~ T T
chorth
Locusta
.C..TT.....A..................AGC.A....ATG.AT.TA..T...CA...................................T..A..A..
A. qambiae
T.AT.T...A.T........T.....C.CTC.A.....TC~..CCCT.TAC.AT..TC.TC.....TT.......AC..T.CC~..T.....
A.quadrim
T.AT.T...A.T........T.......C.C.A......CCAGC...CCC..TAC.AC.TPCTLY:.....TT..G....AC..C.CCCTA..C.....
D.yakuba
T..TAT...A.T........TT..G.~.A...CA.G.A..T.A.CC..T.C.AT...ATPC...T.TP.......AT.......CA..C..A..
D.sechel1
T.TP.T...A.T........~..G.ATC.C.A..CCA.G.A..T.ACCC..T.C.AT...ATPC...T.TT.......AC.......CP..C..A-D.simul
T.TT.T...A.T........W..G.ATC.C.A..GCA.G.A..T.A.CC..T.C.AT...ATLY:...T.TT.......AC.......CP..A..A-Phormia
.CP.TP. C...A.T........TT..G.ATC.C.A..TCA.G..T....CCCTGPCC.AT...ATPC.....TT................eP..A..A..
Apis
.A..TP.T...A.T.T.......T.....TCI..A....T.T.AT......A.TPC..CCA.T---C....CIT.........A.TIT.TTA..A...CT
.....................................................
Coo" ___.._._....-....-...-.....~--
..
..
..
..
..
1550
Chorth
Locusta
A.qambiae
A.quadrim
D.yakuba
D.sechell
D.s-1
Phormia
Apia
I
I
I
I
AGAACATAGATATPCAGRACTACCAATARmCPAGRt-----------------------------I
1568
I
I
......
C.....C............C.....AA.TTATAGAt-------------------------........CT...
G....G..T..TC..T.AA...ATAACt---------------------------
...........................
....................................................................
....................................................................
........CT...
G.....T.....T..T.ACPA.TAATTt
T..............T...T.....C.TT.AA.A.ATtaa----------------------------
T.....C..C...AGT...T....TT..T.AA...ACTPCtaa-------------------------
...T...TC.C...T....A.T...T..T.AAT..AAAAmAAAmAAAATCAAmTAATPAAAt
COOH
..........................
.............................
*
Figurel. DNAalignment ofthe COI genefor ninespeciesof insect. Identity to C.parallelusis indicated byaperiod, adeletion by adash. (Putative) termination and
initiation codons are displayed in lower case. The twenty-five structural regions are indicated (see text and Fig. 3 for details).
sfer) and GTAA (D. rnauritiana), indicating that differences in initiation codon can be present even between
sibling species. The termination codons used for
insect COI genes are shown in Table 1. Many organisms terminate with a single T, or TA, immediately
adjacent to the tRNA gene, and it is known that
complete (TAA) termination codons can be produced
by post-transcriptional polyadenylation (Ojala et a/.,
1981). Drosophila and Phorrnia, however, show the
complete TAA termination codon common in many
other mitochondrial genes.
Translation of this sequence with the invertebrate
mitochondrial genetic code (Clary & Wolstenholme,
1985) gives a protein sequence with a mixture of
residues conserved across all the studied species and
residue positions of differing levels of variability [see
r; 1996 Blackwell Science Ltd, lnsect Molecular Biology5: 153-165
below and Fig. 2. Note that the first amino acid residues for all COI proteins discussed here are defined
as methionines regardless of the initiation codons, as
suggested by Wolstenholme (1992b)l. No insertion/
deletion events are apparent between C. parallelus
and eight of the nine other insects. Apis, however,
shows a deletion of 3 bp at position 1464 (Fig. 1). The
position indicated here differs by three codons from
that given by Crozier & Crozier (1993) but seems to
give a better overall alignment. This deletion falls
within the COOH-terminal region and may not be
constrained with respect to either size, structure or
amino acid function to the same extent as would a
deletion of non-terminal residues. These observations
agree with the expectation of conservation of size and
structure in functionally constrained systems. Figure 3
158
D. H. Luntet al.
T a b 1. Summary of data concerningthe COI gene for nine species of insect. The sequencesfor D. simulans and D. sechellia terminate prematurely
Ref
Organism
Order
lnit
codon
Term
codon
Length
(bp)
% AT
1
2
C. parallelus
Locusta migratoria
Anopheles gambiae
A,quadrimaculatis
Orosophila yakuba
D. sechellia
D. simulans
Phormia regina
Apis mellitera
Orthoptera
Orthoptera
Dlptera
Dlptera
Dlptera
Diptera
Diptera
Diptera
Hymenoptera
TCG
ATTA
TCG
TCG
ATAA
ATAA
ATAA
TCG
ATA
T
T
T
T
TAA
1537
1542
1537
1537
1540
( 1498)
(1498)
1539
1561
69.4
69.1
68.6
68.1
69.9
70.2
70.7
68.3
75.9
3
4
5
6
7
8
9
TAA
T
Accessior,
number
X80245
L20934
LO4272
X03240
M57908
M57911
L14946
LO6178
References: (1) This paper: (2) Rook etal., 1988; (3) Beard etal., 1993 (4) Mitchell etal., 1993; (5) Clary 8 Wolstenholme, 1985; (6 and 7) Satta 8 Takahata.
1990; (8) Sperling e t a / . , EMBL database access L14946 (unpublished): (9) Crozier B Crozier, 1993.
insect COll gene the COOH terminal also appears to
be the most variable region.
Pooling the data into classes in this way however
will lose much information if there are large differences within these classes. Figures 4 and 5 show the
mean variability per region (individual loops, transmembrane stretches or terminal regions) and there
can be seen to be large differences in the mean
variability of different regions of the same structural
class.
Transmembrane helices M2, M6, M7 and M10
provide the metal ligands to interact with the two
haem groups and copper atom which are essential for
the activity of COI (Gennis, 1992). These regions can
be seen to account for four of the seven highly conserved transmembrane helices. The fifth of these
conserved transmembrane helices is M8, which is
suggested to be involved with the cytochrome oxidase
proton-conduction channel. This region contains three
polar residues (Thr-352, Thr-359 and Lys-362) which
are completely conserved among all organisms so far
studied, and which are thought to be essential for this
translocation activity (Gennis, 1992). Transmembrane
helices M5 and M11 are also very conserved.
External loops E3, E5, and especially E4,seem to be
very conserved (Fig. 4). Although the functional role
played by these interhelical loops is unclear, E5 is
thought to lie very close to heme-A in the association
to which Tyr-414 has been suggested to play an
important role (Holm etal., 1987; Gennis, 1992).
shows a two-dimensional structural model of the COI
protein, with functionally essential (boxed) and variable (filled circles) residues in insects highlighted,
respectively. It is clear that the distribution of variable
residues is not random along the molecule (see below
for further discussion).
Mode and tempo of evolution of the insect COI gene
The COI amino acid sequence was divided into
twenty-five regions comprising five structural classes
[twelve transmembrane helices (Ml-M12), six external loops (El-E6), five internal loops (WE), carboxy
(COOH) and amino (NH2) terminals] as shown in Figs
2 and 3. In order to test the null hypothesis that there is
no difference between the average amino acid variability, per site, between the five structural classes, a
Kruskal-Wallis analysis was employed which led us to
reject this hypothesis with an associated probability
level of <0.0001. When the average variability per
residue site was calculated for the five structural
classes the COOH terminal was found to be significantly more variable than any other region (<5%
significance level). The observed mean levels of
variability were not significantly different between the
amino terminal, internal loops, external loops or
transmembrane regions (Table 2). This difference
reflects the highly variable nature of COOH terminal
amino acid sequences and agrees with Liu & Bechenbach (1992) who report that for an alignment of the
Table 2. Mean variability, per amino acid position, for the different structural classes 01 COI.
Region
NH2terminal
Internal loops
External loops
Transmembrane
COOH terminal
Size
(amino acids)
13
103
120
232
30
Standard deviation
SE mean
Mean variability
0.947
0.263
0.089
0.072
0.050
0.239
1.692
1.689
1.467
1.461
2.500
0.908
0.788
0.755
1.306
0 1996 Blackwell Science Ltd, lnsect M o l e c u l a r Wiology5: 153-165
Insect COI gene evolution and conserved primers
50
Chorthip
Locusta
A. gambiae
A.quadrim
D yakuba
D.sechel1
D.simu1
Phormia
Apis
.
I
I
I
I
159
H102
100
I
I
I
I
I
~ ~ S T N H K D I ~ Y E ~ W ~ ~ S H S M I I R A E ~ ~ S L I G D D Q I Y I I T A H A F V n I F F ~ I M I G F G
~..........................................TH.
N
L................................................
M-RQ..
I..
L.IL......H..AF..........V.....I......................L..............
M-RQ
I...........L.IL......H..~..........V.....I......................L.............R
MSRQ.
I...........L.IL......H..A...........V.....I.................
L..............
M.RQ
I...........L.IL......H..A...........V.....I......................L..............
M.RQ
I...........L.IL......H..A...........V.....I..
L..............
M-RQ
I....S......L.IL......H
A...........V.....I......................L..............
M-M m.....N..I..IILAL.S..L.S..RL...M..RS...W.SN.....T.V.S...L........FL........I...L.S..........IR
*-... NH2 . - . f c . - . . . .......*......... El ... . . . . . . u . . . .&Q. .. . . . . . . . u - . - .I1
. .........
.
U-
...
............. .........
...............
..............
...............
...............
...............
.....
....................
..
..
150
I
I
I
I
I
200
I
I
I
I
I
FSLHLAGVSSILGAVNFITTAINMRSESMTLGQTPLFVWWVI~LPV
Chorthip FWI.UPSLTLLIASSMMDNGAGTGWWYPPGADr(rWrVYPPLAGAIAHGGLAI
W...............SV...S.A....................I...........NN.............A.T..........
Locusta
M.........S...VE.............SSG...A.A.............I............V.....PGI...RM.........T.V........
A.gambiae
M
SR..VE.............SSG...A.A.............I............V....APGI...RM.........T.V.
A.quadrim
A.S..LV...VE.............SSG.....A.............I............V.....n;I...RH.........T..........
D yakuba
A
.
S
.
.
L
V
.
.
.
VE.............SAG.....A.............I............V.....n;IS..Rn.........T..........
D. sechell
A.S..LV...VE.............SAG.....A........
I............V.....n;IS..RH.........T..........
D.simu1
A....LV...VE.............SSN.....A.............I............V.....n;I.F.RM.........T..........
Phormia
........FPI
LLRNLFYPRP..........SAYLY.SSP...F..
....HS. I...M.SL.~.IHH.IWP..NY..IS..P...F.T.I..IM....
Apis
...... M3 ---...-.----.-......
E2 .......---..-tf---...
M4 --.-..-----..
12 - - - - - - - - -M5
. ..
. --
.
...........~..
..
.. .........
......
......
......
......
.......
.....
.
I
I
I
H239
*I
250
I
H333,H334
I
I
I
**
300
I
Chorthip ~ ~ ~ T D ~ S F F D P A ~ D P I L Y Q H L ~ F G ~ ~ I L I ~ ~ I I S H I V C Q E S ~ I E S F ~
I..............................
~ocusta
M....IT.....K.T..N........A...L........
A. gambiae
E
.
N
.
.
.
.
.
.
.
.
.
.
.
.
M
.
.
.
.
I
T
.
.
R
.
.
K
.
T
.
.
N
.
.
.
.
.
.
.
.
A
.
.
.
L.........
A.quadrim
M....IS.....K.T..S........A...L..................
D yakuba
N....IS.....K.T..S
A...L.............
D.sechel1
M....IS.....K.T..S........A...L..................
D.simul
M....IS.....K.T..S........A...L..................
Phormia
F
F.......M.............................L.....MN.R..K.I..N.R......G..n.............L....
Apis
-...u~.-.~3. .-.... - . . . . - u - . M6
-..
---....--.--...-...
.I3 - - - - - - - - - - - f t - -M7
------ - - - - u - -
.
.....................................................................
...................................................
.........................
...........
...................................................
...................................................
...................................................
...................................................
..........
.........
.....
........
......... ...
T352
T359
* I
K362
350
* I
I
I
I
Y414 H419 H421
* *
I
400
I
I
I
I
I
GI
IQWYPLFT
Chorthip ~ Y ~ S A ~ I I A V P T G I K V F S W L A T L Y O I X P " P P L L W A U ; F I n E T I G ILHDmWAHFHYVLSHGAVFAIMGGI
M........
K................M......V........V...................................
Locusta
I.
H..QLTYS .AM... F..V....V.....W.....I..V........................A.FVH....L.
A. gambiae
I.. ....~..Q
LTYS.AM...F..V....V.....W.....I..V........................A.F.H....L.
A.quadrim
I.......H..QLSYS.AI......V....V.....W.....V...........................A.F.H......
D yakuba
I..
H..QLSYS.AI..
V....V.....W.....V...........................A.F.H......
D.sechel1
I.......H..QLSYS.AI......V....V.....W.....V...........................A.F.H......
D.simul
I
.
.
.
.
.
.
.
.
.
.
Q
L
.
Y
S
.
A
T
.
.
.
.
.
.
V
.
.
.
.
V
.
.
.
.
.
W
.
.
.
..I...........................A.FVH.F....
Phormia
R....YH.S.~.ISI..S....M.........IM.S...I...........G..............ISRF.H....I.
Apia
-.----..Ma . ..-.----.14 - - - - - - - .M9
---..-..&---.-E5 - - - - - U - - -MI0
- - - - - - - - - U I5
- - ..
.
......................
..................
......
..................
..................
..................
.....
..................
..................
....................
....
450
I
I
I
I
I
500
I
I
I
Chorthip ~ ~ ~ I H F I G V N L r r F P Q H F L G L A G W R R Y S D Y P I S I V G I I W I L I L W E S M I M N R T I H F S N S ~ ~ P A
Locusta
I.........
T........M.....KQ.~I~...............
A. gambiae
P....I..S...V.......................F..S.LT...V..L.....LFA.LY.LF.I.....TQ..PA.PnQL...I..YHTL...
A.quadrim
PU
L..An..V.......................F..S.LA..IV..L.R...LFA.LY.LF.I.....TQ..PA.PPLQL...I..YHTL...
D. yakuba
L..K...S..I.....
T...V.T......LL..LF.M.I...LVSQ.QVIYPIQLN..I..Y..T...
D. sechell
L..K...S..IT................................T..IV.T......LL..LF.FF.I...LVSQ.QVIYPIQLEI..I..Y..T..D.0imul
L..K...S..I.................................T..IV.T......LL..LF.FF.I...LVSQ.QVIYPIQLN..I..Y..T..Phormia
L..n..S..A........
A.
T.. ....LL..LF.FF. I. LVSQ.QVL.WQ LN.. I.....T...
Apia
..LL. IK
I..M.................HS..........S.YC..S...M..M..LNRM.n.F.IL.RL.SK.M.~.Q..-L...Nn..L
-----15----------- -----------..E6 -....---.-.. ---.-.-..---.-.....
coon ............
.....~...
.....
..... ...
...
...
...
...
...
.........................................
............................
.........................
....
..
522
Chorthip
Locusts
I
Emm=S-----------
I
..
S.S...L.NIr--------........urn--------........LLrn--------..
S.S...LLrn--------......................
A. gambiae
A.quadrim
D.yakuba
D.sechel1
D.aimul
Phormia
s.s...LLrn--------D.SHL.I.LLIK"LKSIL1K
Apis
..
.......
coon - - - - - - - - *
Figure 2. Alignment of COI amino acid sequences for nine species of insect. Identity to C. parallelussequenceis denoted by period, a deletion by a dash. Asterisks
denote universally conserved residues, or those with functional significance discussed in the text. The twenty-five structural regions are indicated (see text and
Fig. 3 for details).Note that the first amino acid residues for all COI proteins discussed here are defined as methionines regardlessof the initiation codons. as
suggestedby Wolstenholme (1992).
01996 Blackwell Science Ltd,.lnsectMolecular BiOlOgY5: 153-165
d
i
4
Insect COI gene evolution and conserved primers
!7SO
~
PI
Flgure4. A graph showing the mean amino acid variabilityfor the twenty-five
structural regions of the insect COI gene. The variability is expressed as the
average number of amino acids per site observed in a given region.
Internal loop 1 is highly conserved, in contrast to the
other four internal loops which are relatively variable
(Fig. 4). This pattern of conservation is also apparent
in a similar alignment of twenty-two animal COI genes
(unpublished data). A functional role for internal loop 1
is as yet unclear from current structural models of
COI, though this data indicates that its function may be
relatively important.
The foregoing discussion of protein variability begs
the question: does amino acid variability reflect DNA
variability? In this situation it probably does. The
evidence for this assertion is twofold. Firstly, a certain
correspondence between amino acid substitutions
and nucleotide substitutions is expected despite of the
degeneracy of the genetic code, since it is DNA
sequence which determines amino acid sequence.
(However, the form of this relationship does not follow
a simple linear function, as the degenerate nature of
161
genetic codes makes it possible for DNA variation to
either match or outweigh protein variation, and it also
falls under the influence of the phylogenetic distances
of organisms compared.) Secondly, the relationship
between patterns of amino acid variability and nucleotide variability can clearly be seen by comparing Figs
1 and 2, with the COOH terminal coding region being
the most variable part, followed by regions coding for
El, M3, E2, 12, 14, M9 and M12 structural regions.
Finally, there is empirical evidence for the level of
protein variability being a good predictor of DNA
variability. The distribution of polymorphic sites in the
COI gene between C. parallelus individuals in a population survey (Lunt, 1994) has been observed to match
very closely to the variability predictions made by this
study. Furthermore, Howland & Hewitt (1995) have
sequenced 400 bp of the COI gene in Coleopteran
species with a variety of divergence levels. Their
results show adjacent regions of DNA sequence conservation and variability, which correspond to the loop
and transmembrane structures in exactly the same
pattern as reported here.
These studies show that the expectations of shared
patterns of DNA and protein variability are well
founded, and that the regions identified in this study
will almost certainly express the expected level of
variability in most organisms. Given this pattern of
conservation of sequence and structure, one is able to
choose a level of variability to suit one's particular
phylogenetic investigation.
Conserved primers and their potential for use in
evolutionary studies
Primers which are conserved across many insect
groups would be extremely useful in many types of
evolutionary studies. The variability observed
between insect COI sequences has been quantified by
position and conserved primers designed to exploit
Table 3. Details of the ten conserved primers designed to cover the whole insect COI gene. Positions given are those in Fig. 1. Primers UEAl and UEAlO
are found in the tRNA-Tyr and tRNA-Leu genes respectively. Their positions relative to the COI gene (in the D. .yakuba sequence; Clary B Wolstenholme,
1985) are 56 bp (Tyr) and + 8 bp (Leu).
Prlmer
Strand
Position (3' base)
Length (bp)
Sequence
UEAl
UEAZ
UEAl
UEA4
UEA5
UEA6
UEA7
UEA8
UEA9
UEA10
Sense
Antisense
Sense
Antisense
Sense
Antisense
Sense
Antisense
Sense
Antisense
tRNATyr
375
294
618
62 1
966
900
1266
1284
tRNALeu
26
26
24
24
24
29
24
24
26
25
GAATAATTCCCATAAATAGATTTACA
TCAAGATAAAGGAGGATAAACAGTTC
TATAGCATTCCCACGAATAAATAA
AATTTCGGTCAGTTAATAATATAG
AGTTTTAGCAGGAGCAATTACTAT
TTAATWCCWGTWGGNACNGCAATRATTAT
TACAGTTGGAATAGACGTTGATAC
AAAAATGTTGAGGGAAAAATGTTA
GTAAACCTAACATTTTTTCCTCAACA
TCCAATGCACTAATCTGCCATATTA
Q 1996 Blackwell Science Ltd, lnsect Molecular Biology 5: 153-165
162
D. H. Luntet al.
El
~1
Region
1.6
1.7
M4
I1
NH2
Average
variability
M3
1.8
1.2
1.6
1.7
M7
M8
I3
I2
1.9
1.1
M6
M5
E5
E4
E3
E2
M2
1.3
1.2
2.1
1.o
M9
E6
M10
1.o
1.5
1.1
1.6
2.1
MI2
I5
I4
1.2
M11
1.2
1.3
cI)cH
1.3
1.8
1.5
2.3
2.5
- 1
0
1
amino acid position
Flgure 5. An illustrationof the distributionof amino acid variability (number of different amino acids per residue site) along the insect COI gene. The twenty-five
structural regions, and their mean levels of variability, are shown above the graph. The position of the ten conserved primers are given by arrows, primers UEAl
and UEAIO are in the tRNA-Tyr and tRNA-Leu genes respectively.
areas of differing variability. From the multiple alignment of the DNA sequences of nine insect COI genes
presented in Fig. 1, it can be seen that several areas
are very generally conserved and the same, or slightly
modified, primers may be applicable to many organisms. Primers were thus designed to cover the whole
of the insect COI gene and to be positioned to aid the
sequencing of regions of different variability. The
primers are described in Table 3 and their positions
shown in Fig. 5. Primer UEA10, located in the tRNALeu gene, although identified independently in this
study, turns out to be the general insect primer 'PAT'
designed in the lab of R. Harrison (Cornell University,
pers. comm.).
To test the universality of the conserved primers
identified in this study, PCR amplifications have been
carried out for nine insect taxa. These taxa cover the
main divisions of the class Insecta, from wingless
insects of the order Thysanura (silverfish, Lepisma
saccharina, and firebrat, Thermobia dornestica), to
winged insects of the orders Odonata (damselfly,
Calopteryx splendens), Orthoptera (desert locust,
Schistocerca gragaria and meadow grasshopper
Chorthippus parallelus), Hemiptera (pea aphid,
Acyrthosiphon pisum), Coleoptera (beetle, Carabus
vidaceous), Diptera (fruit fly, Drosophila melanogaster) and Hymenoptera (bumble bee, Bombus lapidar-
ius). Figure 6 shows the amplification product
obtained using the primer pair UEA3-UEA8. These
primers cover a large part (1018 bp) of the COI gene
from the I1 region to M11 region (see Figs 3 and 5). A
single main band of the expected size was produced
in all insect taxa described above with the exception of
the pea aphid, A. pisum (no product under the assay
conditions, although other primers amplify the COI
gene in this aphid). Direct terminal sequence analysis
of the amplified PCR fragments confirms that they are
the COI gene and that all sequences are taxa-specific,
thus excluding the possibility of crosscontamination
(terminal sequences of the PCR fragments for these
species are available from the authors). A preliminary
phylogenetic analysis of these sequences further confirmed the authenticity of the amplified bands. PCR
amplifications with other primer combinations indicate that most of the primers described here work well
in many of these different taxa. Degenerate forms of
some of these primers have also been designed
based on the sequence alignment in Fig. 1. We are
currently investigating further the universality of all
these primers and results of this analysis will be
reported later. Initial results indicate that the primers
identified in this study are quite broadly conserved,
with primers UEA7, UEA9 and UEA10 working well
even between the superclasses lnsecta and Arach-
01996 Blackwell Science Ltd, Insect Molecular BiologyS: 153-165
Insect COI gene evolution and conserved primers
-1 kb
--t
-0.7 kb
4-
163
Figure6 PCR products amplified from nine different
arthropod taxa using primer pairs UEAWEAB or
UEA74EA10. A specific band of. . 1 kb was
amplified from all taxa except the pea aphid,
A. pisum, using the primer pair UEAIUEAB.
The -0.7 kb PCR product for A. pisumshowedin
the photograph was obtained using primer pair
UEA74EAlO. The marker is a 123 bp size ladder
(GIBCOBRL).
nida, which are thought to have diverged during the
Devonian period at least 400 million years ago
(Pearse etal., 1987).
As a general guide to the applicability of these
primers, regions amplified by primer combinations
UEASUEAG, UEA5-UEA8 or UEA7-UEA8 could be
suitable for higher-level evolutionary studies (e.g. at
genus or family level); regions amplified with primer
pairs UEA3-UEA4 or UEA7-UEA10 should be more
variable and thus suitable for lower-level analyses
(such as study of intraspecific variation, and the phylogenetics of closely related species). The use of the
last two primer pairs (UEA3-UEA4 and UEA7-UEA10)
in population analysis of beetles, grasshoppers and
other insects in our laboratory suggests that these
regions are probably variable enough for revealing
intraspecific polymorphisms (unpublished data,
G.M.H. etal.).
In this report we show that a detailed examination of
the evolutionary patterns of a DNA region could
provide valuable guidelines for its effective use as a
molecular marker in phylogenetic studies. While this
paper was in review, Simon et al. (1994) published a
most useful compendium of conserved primers covering the whole insect mitochondrial genome. Increasing practices employing such primers will certainly
reveal more information on the evolutionary patterns
of other mitochondrial genes, which will in turn help
us to assess their usefulness for addressing phylogenetic questions of different taxonomic levels.
01996 Blackwell Science Ltd, lnsect Molecular Biology5: 153-165
Experimental procedures
Insect COI gene sequences
The sequence of the C.parallelus COI gene was obtained as
part of the complete sequence of a 6.4 kb mtDNA Hindlll
fragment as described in Zhang et a/. (1995). Both strands of
the COI gene have been sequenced. Other insect DNA
sequences were taken from the GenBank and EMBL databases, with the exception of the Locusta migratoria sequence
which was kindly supplied by P. Flook prior to publication. The
accession numbers of these sequences are L20934 (Anopheles gambiae), LO4272 (Anopheles quadrimaculatis),
L14946 (Phormia regina), LO6178 (Apis mellifera), X03240
(Drosophila yakuba), M57908 (Drosophila sechellia), M57911
(Drosophila simulans) and X80245 (Locusta migratoria) (see
Table 1 for individual references). All sequences were aligned
by the Clustal V method (Higgins & Sharp, 1989) and translated
with the invertebrate mitochondrial genetic code using programs implanted in the LASERGENE computer software
package (DNASTAR). The aligned DNA sequences were then
examined manually using amino acid sequences and codon
positions as references, producing the alignment shown in
Fig. 1 which seems to be quite robust.
Analysis of variability and statistical tests
The aligned amino acid sequence was divided into twenty-five
regions comprising five structural classes (twelve transmembrane helices, six external loops, five internal loops, carboxy
and amino terminals: Fig. 3). The points of transition between
these regions were taken from Gennis (1992). The number of
different amino acids observed at each position of the protein
alignment was recorded and the variability level expressed as
the average number of amino acids per site observed in a
164
D. H. Luntet al.
given region. A spreadsheet written (by J. B. Lunt and D. H.
Lunt) to score the variability across such alignments is available from the authors. This analysis was limited to the 498
homologous positions between the ends of the shortest
sequence. Insertion and deletion events were scored equally
to the possession of a novel amino acid.
Statistical tests were carried out using the StatView SE
v1.03 (Abacus Concepts Inc.) software package. A KruskalWallis test (analysis of variance by ranks) was performed on
the data sets to test the null hypothesis (Ifo):there is no
difference between the average amino acid variability, per
site, between the five structural classes. In the event of this h
being rejected, an analysis would be performed to determine
between which of the samples significant differences occur.
This analysis followed the method described by Zar (1984).
Conserved primers, PCR amplification and direct
sequencing
Primers for amplifying and sequencing the whole of the COI
gene were designed using the Oligo 4.0 (National Biosclences
Inc.) software package following the guidelines given by
Rychlik (1992). Nine insect taxa, which cover the main divisions of the superclass Insecta, were used to test the primers
identified in this study, viz: Lepisma saccharina (silverfish,
Apterygota, order Thysanura), Thermobia domestica (firebrat,
Apterygota, order Thysanura), Calopteryx splendens (damselfly, Pterygota, order Odonata), Schistocerca gregaria (locust,
Pterygota, order Orthoptera), Chorthippus parallelus (grasshopper, Pterygota, order Orthoptera), Acyrthosiphon pisum
(aphid, Pterygota, order Homoptera), Drosophila melanogaster (fruitfly, Pterygota, order Diptera), Carabus vidaceous
(beetle, Pterygota, order Coleoptera) and Bombus lapidarius
(bee, Pterygota, order Hymenoptera). An arachnid (spider,
Tegenaria domesticus) was also included to gauge the
broader generality of these primers.
DNA was purified from individual insects using a phenol/
chloroform based extraction as described by Zhang et a/.
(1995). PCR was carried out in a 50 pl reaction containing 1.52.0 mM MgCI2, 200 PM dNTP, 0.15 p~ of each primer, and 2
units of Taq polymerase (Promega) in 1 x reaction buffer (50
mM KCI, 10 mM Tris-HCI, 0.1% Triton X-100, pH 9.0 at 25"C,
Promega). Following an initial denaturation at 94°C for 5 min.
thirty to forty cycles were performed in a DNA Thermal Cycler
480 (Perkin Elmer Cetus), each consisting of melting at 95°C
for 40 s, annealing at 4845°C for 1 min, and extension at 72°C
for 40 s to 1 min 40 s. 5 PIof PCR product were used for direct
DNA sequencing using the Sequenase PCR Products Sequencing Kit (USB-Amersham) following the manufacturer's protocol.
Acknowledgements
We acknowledge the help of Pamy Noldner, Marcus
Rowcliffe, Nick Watmough, John Noble-Nesbit,
Graham Hopkins and John Lunt. We also thank three
anonymous reviewers for their valuable comments.
We are grateful to Paul Flook for providing us with the
Locusta COI sequence prior to publication. This work
was supported by grants from the S.E.R.C. and E.U.
References
Avise, J.C., Arnold, J., Ball, R.M., Bermingham, E., Lamb, T., Neigel
J.E., Reeb, C.A. and Saunders, N.C. (1987) lntraspecific phylogeography: the rnitochondrial DNA bridge between population
genetics and systematics. Annu Rev fcol Systl8 489-522.
Ballard, J.W., Olsen, G.J., Faith, D.P., Odgers, W.A., Rowell, D.M.
and Atkinson, P.W. (1992) Evidence from 12s ribosomal RNA
sequences that Onychophorans are modified arthropods.
Science258: 1345-1348.
Beard, C.B., Hamm, D.M. and Collins, F.H. (1993) The mitochondrial
genome of the mosquito Anopheles gambiae: DNA sequence,
genome organization, and comparisons with rnitochondrial
sequences of other insects. lnsect MolBiol2: 103-124.
Clary, D.O. and Wolstenholme. D.R. (1985) The mitochondrial DNA
molecule of Drosophila yakuba: nucleotide sequence, gene
organization, and genetic code. J Molfvol22: 252-271.
Crozier, R.H. and Crozier, Y.C. (1993) The mitochondrial genome of
the Honeybee Apis mellifera: complete sequence and genome
organization. Geneticsl33:97-117.
Flook, P.K., Rowell, C.H.F. and Gellissen G. (1996) The sequence,
organisation and evolution of the Locusta migratoria mitochondrial genome. JMolEvol, in press.
Gennis, R.B. (1992) Site-directed mutagenesis studies on subunit I
of the aa3-type cytochrome c oxidase of Rhodobacter sphaeroides a brief review of progress to date. Biochim Siophys Acta
1101: 184-187.
Harrison, R.G. (1989) Animal mtDNA as a genetic marker in population and evolutionary biology. Trends fcolEvol4 6-1 1.
Hauke, H A . and Gellisson, G. (1988) Different mitochondrial gene
orders among insects: exchanged tRNA gene positions in the
COlllCOlll region between an orthopteran and dipteran
species. Cuff Genetics1 4 471-476.
Higgins, D.G. and Sharp, P.M. (1989) Fast and sensitive multiple
sequence alignments on a microcomputer. Computer Applic
Biosci5 151-153.
Holm, L.. Saraste, M. and Wilkstrom, M. (1987) Structural models of
the redox centers in cytochrome-oxidase. EMBOJB: 2819-2823.
Howland, D.E. and Hewitt, G.M. (1995) Phylogeny of the Coleoptera
based on mitochondrial cytochrome oxidase I sequence data.
lnsect Mol BiolB 203-215.
Liu, H. and Beckenbach, A.T. (1992) Evolution of the rnitochondrial
cytochrome oxidase II gene among ten orders of insects. Mol
Phylogen f v o l l : 41-52.
Lunt, D.H. (1994) MtDNA differentiation across Europe in the
meadow grasshopper Chorthippusparallelus (Orthoptera: Acrididae). Ph.D. thesis, University of East Anglia, Norwich.
Moritz, C., Dowling, T.E. and Brown, W.M. (1987) Evolution of
animal mitochondrial DNA: relevance for population biology
and systematics. Annu RevEcolSystl8 269-292.
Ojala, D., Montoya, J. and Attardi, G. (1981) tRNA punctuation
model of RNA processing in human mitochondria. Nature 290:
470474.
Paabo, S., Thomas, W.K., Whitfield, K.M., Kumazawa, Y. and
Wilson, A.C. (1991) Rearrangements of mitochondrial transferRNA genes in marsupials. J MolEvol33: 426430.
Pearse, V., Pearse, J., Buchsbaum, M. and Buchsbaum, R. (1987)
Living Invertebrates. The Boxwood Press, Pacific Grove, California.
Rychlik, W. (1992) Oligo Version 4.0: Reference Manual. National
Biosciences, Inc., Plymouth, Minnesota.
0 1996 Blackwell Science Ltd, lnsect Molecular Biology5: 153-165
Insect COI gene evolution and conserved primers
Saraste, M. (1990) Structural features of cytochrome oxidase. 0
Rev Biophys23: 331366.
Satta, Y., Ishiwa, H. and Chigusa, S.I. (1987) Analysis of nucleotide
substitutions of mitochondrial DNAs in Drosophila melanogasterand its sibling species. J Mol Biol4 638-650.
Simon, C. (1991) Molecular systematics at the species boundary:
exploiting conserved and variable regions of the mitochondrial
genome of animals via direct sequencing from amplified DNA.
M o k u l a r Techniques in Taxonomy (Hewitt, G.M., Johnston,
A.W.B.,Young, J.P.W., eds), pp. 33-71. Springer, Berlin.
Simon, C., Frati, F., Beckenbach, A,, Crespi, B., Liu, H. and Flook, P.
(1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene-sequences and a compilation of conserved
01996 Blackwell Science Ltd, lnsect Molecular BiologyJ: 153-165
165
polymerase chain-reaction primers. Ann Ent SocAmer87: 651701.
Smith, M.J., Arndt, A., Gorski, S . and Fajber, E. (1993) The phylogeny of echinoderm classes based on mitochondrial gene
arrangements. JMolEvo136 545654.
Wolstenholme, D.R. (1992a) Animal mitochondria1 DNA: structure
and evolution. lntRevCytoll41: 173-216.
Wolstenholme, D.R.(1992b) Genetic novelties in mitochondrial genomes of multlcellular animals. Curr Op GenetDevel2 91&925.
Zar, J.H. (1984) BiostatisticalAns/ysis. Prentice-Hall, London.
Zhang, D.-X., Szymura, J.M. and Hewitt, G.M. (1995) Evolution and
structural conservation of the control region of insect mitochondrial DNA. JMolEvol40: 382-391.