Bioscience Reports 1, 3-18 (1981)
Printed in Great Britain
Determination of nucleotide sequences in DNA
Nobel lecture, 8 December 1980
Frederick SANGER
Medical Research Council Laboratory of Molecular Biology,
Hills Road, Cambridge CB2 2QH, U.K.
In spite of the important role played by DNA sequences in living matter, it is only relatively recently that general methods for their determination have been developed. This is mainly because of the very large size of DNA molecules, the smallest being those of the simple bacteriophages such as φX174 (which contains about 5000 base pairs). It was therefore difficult to develop methods with such complicated systems. There are however some relatively small RNA molecules - notably the transfer RNAs of about 75 nucleotides - and these were used for the early studies on nucleic acid sequences (1).
Following my work on amino acid sequences in proteins (2) I turned my attention to RNA and, with G. G. Brownlee and B. G. Barrell, developed a relatively rapid small-scale method for the fractionation of 32P-labelled oligonucleotides (3). This became the basis for most subsequent studies of RNA sequences. The general approach used in these studies, and in those on proteins, depended on the principle of partial degradation. The large molecules were broken down, usually by suitable enzymes, to give smaller products which were then separated from each other and their sequences determined. When sufficient results had been obtained they were fitted together by a process of deduction to give the complete sequence. This approach was necessarily rather slow and tedious, often involving successive digestions and fractionations, and it was not easy to apply it to the larger DNA molecules. When we first studied DNA, some significant sequences of about 50 nucleotides in length were obtained with this method (4,5), but it seemed that to be able to sequence genetic material a new approach was desirable and we turned our attention to the use of copying procedures.
Copying Procedures
In the RNA field these procedures had been pioneered by C. Weissmann and his colleagues (6) in their studies on the RNA sequence of the bacteriophage Qβ. Qβ contains a replicase that will synthesize a complementary copy of the single-stranded RNA chain, starting from its 3' end. These workers devised elegant procedures involving pulse-labelling with radioactively labelled nucleotides, from which sequences could be deduced.
For DNA sequences we have used the enzyme DNA polymerase, which copies single-stranded DNA as shown in Fig. 1. The enzyme requires a primer, which is a single-stranded oligonucleotide having a sequence that is complementary to, and therefore able to hybridize with, a region on the DNA being sequenced (the template). Mononucleotide residues are added sequentially to the 3' end of the primer from the corresponding deoxynucleoside triphosphates, making a complementary copy of the template DNA. By using triphosphates containing 32P in the α position, the newly synthesized DNA can be labelled. In the early experiments synthetic oligonucleotides were used as primers, but after the discovery of restriction enzymes it was more convenient to use fragments resulting from their action as they were much more easily obtained.
Fig. 1. Specificity requirements for DNA polymerase.
Abbreviations: The abbreviations C, A, and G are used to describe both the ribonucleotides and the deoxyribonucleotides, according to context.
© Nobel Foundation 1981
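The copying reaction described above can be illustrated with a minimal sketch. It assumes an idealised polymerase that extends the primer all the way to the 5' end of the template; the function names `reverse_complement` and `extend_primer` are hypothetical and chosen only for this illustration.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template, primer):
    """Anneal `primer` to `template` (both written 5'->3') and extend its 3' end.

    Returns the newly synthesized strand (primer plus added residues),
    or None if the primer is not complementary to any region of the template.
    """
    site = reverse_complement(primer)  # template region the primer hybridizes with
    start = template.find(site)
    if start == -1:
        return None
    # Residues are added to the 3' end of the primer, so the part of the
    # template lying 5' of the annealing site is the part that gets copied.
    upstream = template[:start]
    return primer + reverse_complement(upstream)


template = "ACGTTGCAAGTC"
primer = "GACTT"  # complementary to the last five bases of the template
copy = extend_primer(template, primer)
```

With this idealisation the product is simply the reverse complement of the template from the annealing site onwards; real reactions stop earlier and at variable positions.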
The copying procedure was used initially to prepare a short specific region of labelled DNA which could then be subjected to partial digestion procedures. One of the difficulties of sequencing DNA was to find specific methods for breaking it down into small fragments. No suitable enzymes were known that would recognize only one nucleotide. However, Berg, Fancher, and Chamberlin (7) had shown earlier that under certain conditions it was possible to incorporate ribonucleotides, in place of the normal deoxyribonucleotides, into DNA chains with DNA polymerase. Thus, for instance, if copying were carried out using ribo CTP and the other three deoxynucleoside triphosphates, a chain could be built up in which the C residues were in the ribo form. Bonds involving ribonucleotides could be broken by alkali under conditions where those involving the deoxynucleotides were not, so that a specific splitting at C residues could be obtained. Using this method we were able to extend our sequencing studies to some extent (8). However, extensive fractionations and analyses were still required.
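As a rough illustration of the ribosubstitution idea, the sketch below splits a copied chain at every C. It assumes that alkali breaks the bond on the 3' side of each ribonucleotide, so that every fragment except possibly the last ends in C; that detail, and the name `alkali_fragments`, are assumptions made for this example rather than statements from the text.

```python
def alkali_fragments(copy, ribo="C"):
    """Split a copied chain (5'->3') after every ribo-substituted residue."""
    fragments, current = [], ""
    for base in copy:
        current += base
        if base == ribo:  # assumed cleavage point: 3' side of the ribonucleotide
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in the ribo residue
        fragments.append(current)
    return fragments


pieces = alkali_fragments("ATCGGCTA")
```

Each fragment would still have to be fractionated and analysed separately, which is the labour the later methods were designed to avoid.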
DNA SEQUENCING
The 'Plus and Minus' Method
In the course of these experiments we needed to prepare DNA copies of high specific radioactivity, and in order to do this the highly labelled substrates had to be present in low concentrations. Thus if [α-32P]dATP was used for labelling, its concentration was much lower than that of the other three triphosphates, and frequently when we analysed the newly synthesized DNA chains we found that they terminated at a position immediately before that at which an A should have been incorporated. Consequently a mixture of products was produced all having the same 5' end (the 5' end of the primer) and terminating at the 3' end at the position of the A residues. If these products could be fractionated on a system that separated only on the basis of the chain length, the pattern of their distribution on fractionation would be proportional to the distribution of the A's along the DNA chain. And this, together with the distribution of the other three mononucleotides, is the information required for sequence determination. Initial experiments carried out with J. E. Donelson suggested that this approach could be the basis for a more rapid method, and it was found that good fractionations according to size could be obtained by ionophoresis on acrylamide gels.
The method described above met with only limited success, but we were able to develop two modified techniques that depended on the same general principle and these provided a much more rapid and simpler method of DNA sequence determination than anything we had used before (9). This, which is known as the 'plus and minus' technique, was used to determine the almost complete sequence of the DNA of bacteriophage φX174, which contains 5386 nucleotides (10).
The 'Dideoxy' Method
More recently we have developed another, similar method which uses specific chain-terminating analogues of the normal deoxynucleoside triphosphates (11). This method is both quicker and more accurate than the plus and minus technique. It was used to complete the sequence of φX174 (12) and to determine the sequence of a related bacteriophage, G4 (13), and has now been applied to mammalian mitochondrial DNA.
The analogues most widely used are the dideoxynucleoside triphosphates (Fig. 2). They are the same as the normal deoxynucleoside triphosphates but lack the 3' hydroxyl group. They can be incorporated into a growing DNA chain by DNA polymerase but act as terminators because, once they are incorporated, the chain contains no 3' hydroxyl group and so no other nucleotide can be added.
The principle of the method is summarised in Fig. 3. Primer and template are denatured to separate the two strands of the primer, which is usually a restriction enzyme fragment, and then annealed to form the primer-template complex. The mixture is then divided into four samples. One (the T sample) is incubated with DNA polymerase in the presence of a mixture of ddTTP (dideoxythymidine triphosphate) and a low concentration of TTP, together with the other three deoxynucleoside triphosphates (one of which is labelled with 32P) at normal concentration. As the DNA chains are built up on the 3' end of the primer the position of the T's will be filled, in most cases by the normal substrate T and extended further, but occasionally by ddT and terminated. Thus at the end of incubation there remains a mixture of chains terminating with T at their 3' end but all having the same 5' end (the 5' end of the primer). Similar incubations are carried out in the presence of each of the other three dideoxy derivatives, giving mixtures terminating at the positions of C, A, and G respectively, and the four mixtures are fractionated in parallel by electrophoresis on acrylamide gel under denaturing conditions. This system separates the chains according to size, the small ones moving quickly and the large ones slowly. As all the chains in the T mixture end at T, the relative positions of the T's in the chain will define the relative sizes of the chains, and therefore their relative positions on the gel after fractionation. The actual sequence can then simply be read off from an autoradiograph of the gel (Fig. 4). The method is comparatively rapid and accurate, and sequences of up to about 300 nucleotides from the 3' end of the primer can usually be determined. Considerably longer sequences have been read off, but these are usually less reliable.
Fig. 2. Diagram showing chain termination with dideoxythymidine triphosphate (ddTTP). The top line shows the DNA polymerase-catalysed reaction of the normal deoxynucleoside triphosphate (TTP) with the 3' terminal nucleotide of the primer; the bottom line, the corresponding reaction with ddTTP.
Fig. 3. Principle of the chain-terminating method.
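The logic of the chain-terminating method can be sketched in a few lines. The example assumes an idealised experiment in which every possible termination product is present and resolved as a separate band; `dideoxy_lanes` and `read_gel` are hypothetical names used only here.

```python
def dideoxy_lanes(new_strand, primer_len):
    """Chain lengths expected in each of the four termination mixtures.

    `new_strand` is the full-length copy (5'->3', primer included).
    A chain terminated by the dideoxy analogue of the base at index i
    has length i + 1, and all chains share the primer's 5' end.
    """
    lanes = {base: [] for base in "ACGT"}
    for i in range(primer_len, len(new_strand)):
        lanes[new_strand[i]].append(i + 1)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest (fastest) band to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "GACTT" + "GCAACGT"  # 5-residue primer followed by the copied region
lanes = dideoxy_lanes(strand, primer_len=5)
sequence = read_gel(lanes)
```

Sorting the bands by length across the four lanes is the computational equivalent of reading the autoradiograph from the bottom of the gel upwards.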
One problem with the method is that it requires single-stranded DNA as template. This is the natural form of the DNA in the bacteriophages φX174 and G4, but most DNA is double-stranded, and it is frequently difficult to separate the two strands. One way of overcoming this was devised by A. J. H. Smith (14). If the double-stranded linear DNA is treated with exonuclease III (a double-strand-specific 3' exonuclease), each chain is degraded from its 3' end, as shown in Fig. 5, giving rise to a structure that is largely single-stranded and can be used as template for DNA polymerase with suitable small primers. This method is particularly suitable for use with fragments cloned in plasmid vectors and has been used extensively in the work on human mitochondrial DNA.
Cloning in Single-stranded Bacteriophage
Another method of preparing suitable template DNA that is being more widely used is to clone fragments in a single-stranded bacteriophage vector (15-17). This approach is summarised in Fig. 6. Various vectors have been described. We have used a derivative of bacteriophage M13 developed by Gronenborn and Messing (16) which contains an insert of the β-galactosidase gene with an EcoRI restriction enzyme site in it. The presence of β-galactosidase in a plaque can be readily detected by using a suitable colour-forming substrate (X-gal). The presence of an insert in the EcoRI site destroys the β-galactosidase gene, giving rise to a colourless plaque.
Fig. 4. Autoradiograph of a DNA sequencing gel. The origin is at the top, and migration of the DNA chains, according to size, is downwards. The gel on the left has been run for 2.5 h and that on the right for 5 h with the same polymerization mixtures.
Fig. 5. Degradation of double-stranded DNA with exonuclease III.
Besides being a simple and general method of preparing single-stranded DNA this approach has other advantages. One is that it is possible to use the same primer on all clones. Heidecker et al. (18) prepared a 96-nucleotide-long restriction fragment derived from a position in the M13 vector flanking the EcoRI site (see Fig. 6). This can be used to prime into, and thus determine, a sequence of about 200 nucleotides in the inserted DNA. Smaller synthetic primers have now been prepared (19,20) which allow longer sequences to be determined. The approach that we have used is to prepare clones at random from restriction enzyme digests and determine the sequence with the flanking primer. Computer programs (21) are then used to store, overlap, and arrange the data.
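To give a feel for what "overlapping and arranging" random reads involves, here is a minimal greedy sketch. It is not the program of reference 21; it assumes error-free reads from one strand and no repeated sequences, and the names `overlap` and `assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=3):
    """Greedily merge the pair of reads with the longest overlap until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard reads wholly contained in another read or contig.
        contigs = [r for r in contigs if not any(r != s and r in s for s in contigs)]
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [r for k, r in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


genome = "ATGGCATTCGAGCTAAC"
reads = ["GAGCTAAC", "ATGGCATT", "CATTCGAGC"]  # clones sequenced in arbitrary order
contigs = assemble(reads)
```

A greedy merge like this can be misled by repeated sequences, which is one reason confirmation of every region remains necessary.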
Another important advantage of the cloning technique is that it is a very efficient and rapid method of fractionating fragments of DNA. In all sequencing techniques, both for proteins and nucleic acids, fractionation has been an important step, and major progress has usually been dependent on the development of new fractionation methods. With the new rapid methods for DNA sequencing, fractionation is still important, and as the sequencing procedure itself is becoming more rapid more of the work has involved fractionation of the restriction enzyme fragments by electrophoresis on acrylamide. This becomes increasingly difficult as larger DNA molecules are studied and may involve several successive fractionations before pure fragments are obtained. In the new method these fractionations are replaced by a cloning procedure. The mixture is spread on an agar plate and grown. Each clone represents the progeny of a single molecule and is therefore pure, irrespective of how complex the original mixture was. It is particularly suitable for studying large DNAs. In fact there is no theoretical limit to the size of DNA that could be sequenced by this method.
Fig. 6. Diagram illustrating the cloning of DNA fragments in the single-stranded bacteriophage vector M13mp2 (16) and sequencing the insert with a flanking primer.
We have applied the method to fragments from mitochondrial DNA (22,23) and to bacteriophage λ DNA. Initially, new data can be accumulated very quickly (under ideal conditions at about 500-1000 nucleotides a day). However, at later stages much of the data produced will be in regions that have already been sequenced, and progress then appears to be much slower. Nevertheless we find that most new clones studied give some useful data, either for correcting or for confirming old sequences. Thus in the work with bacteriophage λ DNA we have about 90% of the molecule identified in sequences and most of the new clones we study contribute some new information. In most studies on DNA one is concerned with identifying the reading frames for protein genes, and to do this the sequence must be correct. Errors can readily occur in such extensive sequences and confirmation is always necessary. We usually consider it necessary to determine the sequence of each region on both strands of the DNA.
Although in theory it would be possible to complete a sequence determination solely by the random approach, it is probably better to use a more specific method to determine the final remaining nucleotides in a sequencing study. Various methods are possible (22,24), but all are slow compared with the random cloning approach.
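The slowing of progress in the random approach can be explored with a small simulation. It is a sketch under simplifying assumptions that are not in the text: a linear genome, reads of fixed length, and start positions drawn uniformly at random. The function name and all parameter values are illustrative only.

```python
import random


def simulate_random_clones(genome_len, read_len, n_clones, seed=0):
    """Track how many previously unseen nucleotides each random clone adds.

    Returns a list of (clone number, new bases from this clone, total covered).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = [False] * genome_len
    history = []
    total = 0
    for k in range(1, n_clones + 1):
        start = rng.randrange(genome_len - read_len + 1)
        new = 0
        for pos in range(start, start + read_len):
            if not covered[pos]:
                covered[pos] = True
                new += 1
        total += new
        history.append((k, new, total))
    return history


history = simulate_random_clones(genome_len=5000, read_len=200, n_clones=100)
```

Plotting the second column of `history` against the first typically shows early clones contributing almost a full read of new sequence and later clones contributing much less, which is why a directed method is attractive for the last remaining gaps.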
Bacteriophage φX174 DNA
The first DNA to be completely sequenced by the copying procedures was from bacteriophage φX174 (10,12) - a single-stranded circular DNA, 5386 nucleotides long, which codes for ten genes. The most unexpected finding from this work was the presence of 'overlapping' genes. Previous genetic studies had suggested that genes were arranged in a linear order along the DNA chains, each gene being encoded by a unique region of the DNA. The sequencing studies indicated however that there were regions of the φX DNA that were coding for two genes. This is made possible by the nature of the genetic code. Since a sequence of three nucleotides (a codon) codes for one amino acid, each region of DNA can theoretically code for three different amino acid sequences, depending on where translation starts. This is illustrated in Fig. 7. The reading frame or phase in which translation takes place is defined by the position of the initiating ATG codon, following which nucleotides are simply read off three at a time. In φX there is an initiating ATG within the gene coding for the D protein, but in a different reading frame. Consequently this initiates an entirely different sequence of amino acids, which is that of the E protein. Fig. 8 shows the genetic map of φX. The E gene is completely contained within the D gene and the B gene within the A.
Fig. 7. Diagram illustrating how one DNA sequence can code for three different amino acid sequences. The dots indicate the positions of triplet codons coding for the amino acids.
Further studies (25) on the related bacteriophage, G4, revealed the presence of a previously unidentified gene, which was called K. This overlaps both the A and C genes, and there is a sequence of four nucleotides that codes for part of all three proteins, A, C, and K, using all of its three possible reading frames.
It is uncertain whether overlapping genes are a general phenomenon or whether they are confined to viruses, whose survival may depend on their rate of replication and therefore on the size of the DNA: with overlapping genes more genetic information can be concentrated in a given-sized DNA.
Further details of the sequence of bacteriophage φX174 DNA have been published elsewhere (10,12).
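The point that one stretch of DNA can be read in three phases is easy to demonstrate. The sketch below uses the standard genetic code and one-letter amino acid symbols (`*` marks a termination codon); the example sequence and the function names are invented for the illustration and are not taken from φX174.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Read `dna` three nucleotides at a time, starting at offset `frame`."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_frames(dna):
    """Amino acid sequences for the three possible reading frames of one strand."""
    return [translate(dna, frame) for frame in range(3)]


frames = three_frames("ATGGCACGT")
```

Here the same nine nucleotides give Met-Ala-Arg, Trp-His, and Gly-Thr depending on the phase, which is the situation exploited by the overlapping D and E genes.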
Fig. 8. Gene map of φX174 DNA.
Mammalian Mitochondrial DNA
Mitochondria contain a small double-stranded DNA (mtDNA) which codes for two ribosomal RNAs (rRNAs), 22-23 transfer RNAs (tRNAs), and about 10-13 proteins which appear to be components of the inner mitochondrial membrane and are somewhat hydrophobic. Other proteins of the mitochondria are encoded by the nucleus of the cell and specifically transported into the mitochondria. Using the above methods we have determined the nucleotide sequence of human mtDNA (23) and almost the complete sequence of bovine mtDNA. The sequence revealed a number of unexpected features which indicated that the transcription and translation machinery of mitochondria is rather different from that of other biological systems.
The genetic code in mitochondria
Hitherto it has been believed that the genetic code was universal. No differences were found in the Escherichia coli, yeast, or mammalian systems that had been studied. Our initial sequence studies were on human mtDNA. No amino acid sequences of the proteins that were encoded by human mtDNA were known. However, Steffens and Buse (26) had determined the sequence of subunit II of cytochrome oxidase (CO II) from bovine mitochondria, and Barrell, Bankier, and Drouin (27) found that a region of the human mtDNA that they were studying had a sequence that would code for a protein homologous to this amino acid sequence - indicating that it most probably was the gene for the human CO II. Surprisingly the DNA sequence contained TGA triplets in the reading frame of the homologous protein. According to the normal genetic code, TGA is a termination codon and if it occurs in the reading frame of a protein the polypeptide chain is terminated at that position. It was noted that in the positions where TGA occurred in the human mtDNA sequence, tryptophan was found in the bovine protein sequence. The only possible conclusion seemed to be that in mitochondria TGA was not a termination codon but was coding for tryptophan. It was similarly concluded that ATA, which normally codes for isoleucine, was coding for methionine. As these studies were based on a comparison of a human DNA with a bovine protein, the possibility that the differences were due to some species variation, although unlikely, could not be completely excluded. For a conclusive determination of the mitochondrial code it was necessary to compare the DNA sequence of a gene with the amino acid sequence of the protein it was coding for. This was done by Young and Anderson (28), who isolated bovine mtDNA, determined the sequence of its CO II gene, and confirmed the above differences.
Fig. 9 shows the human and bovine mitochondrial genetic code and the frequency of use of the different codons in human mitochondria. All codons are used with the exception of UAA and UAG, which are terminators, and AGA and AGG, which normally code for arginine. This suggests that AGA and AGG are probably also termination codons in mitochondria. Further support for this is that no tRNA recognising the codons has been found (see below) and that some of the unidentified reading frames found in the DNA sequence appear to end with these codons.
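The effect of the coding changes described above can be seen by translating one sequence with both codes. This sketch builds the normal code and then applies only the mammalian mitochondrial differences named in the text; treating AGA and AGG as terminators follows the tentative conclusion above and is marked as such in the code. The test sequence is invented.

```python
BASES = "TCAG"
# Normal genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
NORMAL_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Mammalian mitochondrial differences described in the text.
MITO_CODE = dict(NORMAL_CODE)
MITO_CODE.update({
    "TGA": "W",  # tryptophan instead of termination
    "ATA": "M",  # methionine instead of isoleucine
    "AGA": "*",  # probably termination (tentative in the text)
    "AGG": "*",  # probably termination (tentative in the text)
})


def translate(dna, code):
    """Translate from the first nucleotide until a termination codon is met."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGTGAATAGCCAGATTT"  # invented example: ATG TGA ATA GCC AGA TTT
normal = translate(gene, NORMAL_CODE)
mito = translate(gene, MITO_CODE)
```

Read with the normal code the chain stops at the first TGA, whereas the mitochondrial code reads through it, which is the discrepancy that first revealed the altered code.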
Fig. 9. The human and bovine mitochondrial genetic code, showing the coding properties of the tRNAs (boxed codons), the frequency of use of the different codons in human mitochondria, and the total number of codons used in the whole genome shown in Fig. 10. (One methionine tRNA has been detected, but as there is some uncertainty about the number present and their coding properties, these codons are not boxed.)
In parallel with our studies on mammalian mtDNA, Tzagoloff and his colleagues (29,30) were studying yeast mtDNA. They also found changes in the genetic code, but surprisingly they are not all the same as those found in mammalian mitochondria. These differences are summarised in Table 1.
Transfer RNAs
Transfer RNAs have a characteristic base-pairing structure which can be drawn in the form of the 'cloverleaf' model. By examining the DNA sequence for cloverleaf structures and using a computer program (31), it was possible to identify genes coding for the mt-tRNAs.
Besides the cloverleaf structure, normal cytoplasmic tRNAs have a number of so-called 'invariable' features which are believed to be important to their biological function. Most of the mammalian mt-tRNAs are anomalous in that some of these invariable features are missing. The most bizarre is one of the serine tRNAs in which a complete loop of the cloverleaf structure is missing (32,33). Nevertheless it functions as a tRNA.
Table 1. Coding changes in mitochondria
                       Amino acid coded
Codon       Normal   Mammalian        Yeast
                     mitochondria     mitochondria
TGA         Term     Trp              Trp
AUA         Ile      Met              Ile
CTN         Leu      Leu              Thr
AGA, AGG    Arg      Term ?           Arg
CGN         Arg      Arg              Term ?
In normal cytoplasmic systems at least 32 tRNAs are required to code for all the amino acids. This is related to the 'wobble' effect. Codon-anticodon relationships in the first and second positions of the codons are defined by the normal base-pairing rules, but in the third position G can pair with U. The result of this is that one tRNA can recognise two codons. There are many cases in the genetic code where all four codons starting with the same two nucleotides code for the same amino acid. These are known as 'family boxes'. The situation for the alanine family box is shown in Table 2, indicating that with the normal wobble system two tRNAs are required to code for the four alanine codons.
Table 2. Normal coding properties of alanine tRNAs
Codon    Anticodon    Anticodon
         (wobble)     (mitochondria)
GCU      GGC          UGC
GCC      GGC          UGC
GCA      UGC          UGC
GCG      UGC          UGC
Note that the first position of the codon pairs with the third position of the anticodon and vice versa, e.g.
5' GCU 3' (codon)
3' CGG 5' (anticodon)
Only 22 tRNA genes could be found in mammalian mtDNA, and for all the family boxes there was only one, which had a T in the position corresponding to the third position of the codon (34). It seems very unlikely that none of the other predicted tRNAs would have been detected, and the most feasible explanation is that in mitochondria one tRNA can recognise all four codons in a family box and that a U in the first position of the anticodon can pair with U, C, A, or G in the third position of the codon. Clearly, in boxes in which two of the codons code for one amino acid and two for a different one, there must be two different tRNAs and the wobble effect still applies. Such tRNAs are found, as expected, in the mitochondrial genes. The coding properties of the mt-tRNAs are shown in Fig. 9. Similar conclusions have been reached by Heckman et al. (35) and by Bonitz et al. (36), working respectively on Neurospora and yeast mitochondria.
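The difference between normal wobble pairing and the mitochondrial reading of family boxes can be written as a small lookup. The pairing table below is an assumption based on the usual wobble rules (G in the first anticodon position reads C or U, U reads A or G); modified bases such as inosine are left out, and the names are hypothetical. The mitochondrial rule set changes only the entry for U, as proposed in the text.

```python
# Watson-Crick partners for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third-codon-position bases read by each first-anticodon-position base.
WOBBLE = {"G": "CU", "U": "AG", "C": "G", "A": "U"}  # assumed normal wobble rules
MITO_WOBBLE = dict(WOBBLE, U="UCAG")                  # U reads all four bases


def codons_recognised(anticodon, rules=WOBBLE):
    """List the codons (5'->3') read by an anticodon written 5'->3'.

    The first codon position pairs with the third anticodon position
    and vice versa.
    """
    first, second, third = anticodon
    stem = PAIR[third] + PAIR[second]
    return sorted(stem + base for base in rules[first])


normal_ggc = codons_recognised("GGC")
normal_ugc = codons_recognised("UGC")
mito_ugc = codons_recognised("UGC", MITO_WOBBLE)
```

Under the normal rules two anticodons are needed to cover the alanine box, whereas under the mitochondrial rule the single UGC anticodon reads all four codons, consistent with only 22 tRNA genes being found.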
Distribution of protein genes
Mitochondrial DNA was known to code for three of the subunits of
cytochrome oxidase, probably three subunits of the ATPase complex,
cytochrome b, and a number of other unidentified proteins. In order
to identify the protein-coding genes, the DNA was searched for
reading frames, i.e. long stretches of DNA containing no termination
codons in one of the phases and thus being capable of coding for long
polypeptide chains. Such reading frames should start with an initiation
codon, which in normal systems is nearly always ATG, and end with a
termination codon. Fig. 10 summarises the distribution of the reading
frames on the DNA, and these are believed to be the genes coding for
the proteins. The gene for CO II was identified from the amino acid
sequence as described above, for subunit I of cytochrome oxidase from
amino acid sequence studies on the bovine protein by J. E. Walker
(personal communication), and CO III, cytochrome b, and, probably,
ATPase 6 were identified by comparison with the DNA sequences of
the corresponding genes in yeast mitochondria. Tzagoloff and his
colleagues were able to identify these genes in yeast by genetic
methods. It has not yet been possible to assign proteins to the other
reading frames.
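
The search for reading frames described above can be sketched as follows. This is an illustrative sketch only, not the program actually used for the mtDNA analysis. The function name `open_reading_frames`, the default set of termination codons and the `min_codons` threshold are all assumptions made for the example. Only one strand is scanned, and the stretches are not trimmed to begin at an initiation codon.

```python
def open_reading_frames(dna, stops=("TAA", "TAG"), min_codons=3):
    """Find stop-free stretches in each of the three phases of one strand.

    Returns a list of (phase, start, end) tuples, with 0-based start and
    exclusive end. A stretch never includes the termination codon itself.
    The default `stops` is an assumption and should be set to suit the
    genetic code of the system under study.
    """
    dna = dna.upper()
    frames = []
    for phase in range(3):
        run_start = phase      # start of the current stop-free stretch
        last = phase           # end of the last complete codon seen
        for i in range(phase, len(dna) - 2, 3):
            last = i + 3
            if dna[i:i + 3] in stops:
                if (i - run_start) // 3 >= min_codons:
                    frames.append((phase, run_start, i))
                run_start = i + 3
        # Stretch running to the end of the sequence without a stop.
        if (last - run_start) // 3 >= min_codons:
            frames.append((phase, run_start, last))
    return frames


example = "ATGAAACCCGGGTAAAT"  # made-up sequence for illustration
for phase, start, end in open_reading_frames(example):
    print(phase, start, end, example[start:end])
```

On a real genome `min_codons` would be set much higher, so that only stretches long enough to code for a polypeptide chain are reported.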
One unusual feature of the mtDNA is that it has a very compact
structure. The reading frames coding for the proteins and the rRNA
genes appear to be flanked by the tRNA genes with no, or very few,
intervening nucleotides. This suggests a relatively simple model for
transcription, in which a large RNA is copied from the DNA and the
tRNAs are cut out by a processing enzyme, and this same processing
leads to the production of the rRNAs and the messenger RNAs
(mRNAs), most of which will be monocistronic. Strong support for
this model comes from the work of Attardi (37,38), who has identified
the RNA sequence at the 5' and 3' ends of the mRNAs, thus locating
them on the DNA sequence. One consequence of this arrangement is
that the initiation codon is at, or very near, the 5' end of the
mRNAs. This suggests that there must be a different mechanism of
initiation from that found in other systems. In bacteria there is
usually a ribosomal binding site before the initiating ATG codon,
whereas in eucaryotic systems the 'cap' structure on the 5' end of
the mRNA appears to have a similar function and the first ATG
following the cap acts as initiator. It seems that mitochondria may
have a more simple, and perhaps more primitive, system with the
translation machinery recognising simply the 5' end of the mRNA.
Another unique feature of mitochondria is that ATA and possibly ATT
can act as initiator codons as well as ATG.
Fig. 10. Gene map of human mtDNA deduced from the DNA sequence.
Boxed regions are the predicted reading frames for the proteins.
URF = unidentified reading frame. tRNA genes are denoted by the
one-letter amino acid code and are either L-strand coded or H-strand
coded (distinguished by symbols in the original figure). Numbers above
the genes show the scale in nucleotides and those below, the predicted
number of nucleotides between genes. *Indicates that termination
codons are created by polyadenylation of the mRNA.
[Map not reproduced. Legible labels include cytochrome b, the CO genes,
ATPase 6, URF 1 to 6, URF A6L, 12S rRNA, 16S rRNA, the D loop, and the
H-strand and L-strand origins.]
On the basis of the above model, some of the mRNAs will not
contain termination codons at their 3' ends after the tRNAs are cut
out. However, they have T or TA at the 3' end. The mRNAs are
generally found polyadenylated at their 3' ends and this process will
necessarily give rise to the codon TAA to terminate those protein
reading frames.
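
The way polyadenylation completes the termination codon can be shown with a short sketch. The function names and the example transcripts are hypothetical. The sketch assumes that the reading frame begins at the first nucleotide of the transcript, in line with the observation that the initiation codon lies at or very near the 5' end, and it uses the DNA alphabet (T rather than U) as the text does.

```python
def polyadenylate(transcript, tail_length=20):
    """Append a poly(A) tail of the given (illustrative) length."""
    return transcript + "A" * tail_length


def completed_terminal_codon(transcript, tail_length=20):
    """Return the codon formed from the trailing partial codon plus the tail.

    Returns None when the transcript already ends on a codon boundary.
    Assumes the reading frame starts at position 0 of the transcript.
    """
    remainder = len(transcript) % 3  # 1 for a trailing T, 2 for a trailing TA
    if remainder == 0:
        return None
    start = len(transcript) - remainder
    return polyadenylate(transcript, tail_length)[start:start + 3]


# Made-up transcripts ending in T and in TA after two complete codons.
print(completed_terminal_codon("ATGGCCT"))
print(completed_terminal_codon("ATGGCCTA"))
```

In both cases the first one or two A residues of the tail supply the missing bases of TAA.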
The compact structure of the mammalian mitochondrial genome is
in marked contrast to that of yeast, which is about five times as
large and yet codes for only about the same number of proteins and
RNAs. In yeast the genes are separated by long AT-rich stretches of
DNA with no obvious biological function. There are also insertion
sequences within some of the yeast genes, whereas these appear to be
absent from mammalian mtDNA.
References
1. Holley, R.W., Les Prix Nobel, p. 183 (1968).
2. Sanger, F., Les Prix Nobel, p. 134 (1958).
3. Sanger, F., Brownlee, G.G., & Barrell, B.G., J. Mol. Biol.
13, 373 (1965).
4. Robertson, H.D., Barrell, B.G., Weith, H.L., & Donelson, J.E.,
Nature New Biol. 241, 38 (1973).
5. Ziff, E.B., Sedat, J.W., & Galibert, F., Nature New Biol.
241, 34 (1973).
6. Billeter, M.A., Dahlberg, J.E., Goodman, H.M., Hindley, J., &
Weissmann, C., Nature, 224, 1083 (1969).
7. Berg, P., Fancher, H., & Chamberlin, M., Symposium on
'Informational Macromolecules', pp. 467-483, Academic Press,
New York & London (1963).
8. Sanger, F., Donelson, J.E., Coulson, A.R., Kössel, H., &
Fischer, D., Proc. Natl. Acad. Sci. U.S.A. 70, 1209
(1973).
9. Sanger, F., & Coulson, A.R., J. Mol. Biol. 94, 441 (1975).
10. Sanger, F., Air, G.M., Barrell, B.G., Brown, N.L.,
Coulson, A.R., Fiddes, J.C., Hutchison, C.A., Slocombe, P.M.,
& Smith, M., Nature, 265, 687 (1977).
11. Sanger, F., Nicklen, S., & Coulson, A.R., Proc. Natl. Acad.
Sci. U.S.A. 74, 5463 (1977).
12. Sanger, F., Coulson, A.R., Friedmann, T., Air, G.M.,
Barrell, B.G., Brown, N.L., Fiddes, J.C., Hutchison, C.A.,
Slocombe, P.M., & Smith, M., J. Mol. Biol. 125, 225 (1978).
13. Godson, G.N., Barrell, B.G., Staden, R., & Fiddes, J.C.,
Nature, 276, 236 (1978).
14. Smith, A.J.H., Nucl. Acids Res. 6, 831 (1979).
15. Barnes, W.M., Gene, 5, 127 (1979).
16. Gronenborn, B., & Messing, J., Nature, 272, 375 (1978).
17. Herrmann, R., Neugebauer, K., Schaller, H., & Zentgraf, H., in
The Single-stranded DNA Phages (Denhardt, D.T.,
Dressler, D., & Ray, D.S., eds.), pp. 473-476, Cold Spring
Harbor Laboratory, New York (1978).
18. Heidecker, G., Messing, J., & Gronenborn, B., Gene, 10, 69
(1980).
19. Anderson, S., Gait, M.J., Mayol, L., & Young, I.G., Nucl.
Acids Res. 8, 1731 (1980).
20. Gait, M.J., Unpublished results (1980).
21. Staden, R., Nucl. Acids Res. 8, 3673 (1980).
22. Sanger, F., Coulson, A.R., Barrell, B.G., Smith, A.J.H., &
Roe, B.A., J. Mol. Biol. 143, 161 (1980).
23. Barrell, B.G., Anderson, S., Bankier, A.T., de Bruijn, M.H.L.,
Chen, E., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich,
D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.H.,
Staden, R., & Young, I.G., 31st Mosbach Colloq., Biological
Chemistry and Organelle Formation, in press.
24. Winter, G., & Fields, S., Nucl. Acids Res. 8, 1965 (1980).
25. Shaw, D.C., Walker, J.E., Northrop, F.D., Barrell, B.G.,
Godson, G.N., & Fiddes, J.C., Nature, 272, 510 (1978).
26. Steffans, G.J., & Buse, G., Hoppe-Seyler's Z. Physiol.
Chem. 360, 613 (1979).
27. Barrell, B.G., Bankier, A.T., & Drouin, J., Nature, 282, 189
(1979).
28. Young, I.G., & Anderson, S., Gene, in press.
29. Macino, G., Coruzzi, G., Nobrega, F.G., Li, M., &
Tzagoloff, A., Proc. Natl. Acad. Sci. U.S.A. 76, 3784
(1979).
30. Macino, G., & Tzagoloff, A., Proc. Natl. Acad. Sci.
U.S.A. 76, 131 (1979).
31. Staden, R., Nucl. Acids Res. 8, 817 (1980).
32. de Bruijn, M.H.L., Schreier, P.H., Eperon, I.C.,
Barrell, B.G., Chen, E.Y., Armstrong, P.W., Wong, J.F.H., &
Roe, B.A., Nucl. Acids Res. 8, 5213 (1980).
33. Arcari, P., & Brownlee, G.G., Nucl. Acids Res., in press.
34. Barrell, B.G., Anderson, S., Bankier, A.T., de Bruijn, M.H.L.,
Chen, E., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich,
D.P., Roe, B.A., Sanger, F., Schreier, P.H., Smith, A.J.H.,
Staden, R., & Young, I.G., Proc. Natl. Acad. Sci. U.S.A.
77, 3164 (1980).
35. Heckman, J.E., Sarnoff, J., Alzner-De Weerd, B., Yin, S., &
RajBhandary, U.L., Proc. Natl. Acad. Sci. U.S.A. 77,
3159 (1980).
36. Bonitz, S.G., Berlani, R., Coruzzi, G., Li, M., Macino, G.,
Nobrega, F.G., Nobrega, M.P., Thalenfeld, B.E., &
Tzagoloff, A., Proc. Natl. Acad. Sci. U.S.A. 77, 3167
(1980).
37. Montoya, J., Ojala, D., & Attardi, G., Nature, in press.
38. Ojala, D., Montoya, J., & Attardi, G., Nature, in press.