Download Sequence Analysis of the DNA Encoding the Eco RI Endonuclease

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Transformation (genetics) wikipedia , lookup

Real-time polymerase chain reaction wikipedia , lookup

Nucleosome wikipedia , lookup

Transposable element wikipedia , lookup

Metalloprotein wikipedia , lookup

Molecular ecology wikipedia , lookup

SNP genotyping wikipedia , lookup

Gene expression wikipedia , lookup

Amino acid synthesis wikipedia , lookup

DNA supercoil wikipedia , lookup

Multilocus sequence typing wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Molecular cloning wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

RNA-Seq wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Bisulfite sequencing wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Genomic library wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Restriction enzyme wikipedia , lookup

Non-coding DNA wikipedia , lookup

Genetic code wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Biosynthesis wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Gene wikipedia , lookup

Promoter (genetics) wikipedia , lookup

Community fingerprinting wikipedia , lookup

Point mutation wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Transcript
THEJOURNAL OF BIOLOGICAL
CHEMISTRY
Vol. 25fi.No. 5, Issue of March 10. pp. 2143-2153.19RI
Printed Ln U S A .
Sequence Analysis of the DNA Encoding the Eco RI Endonuclease and
Methylase*
(Received for publication, October 22, 1980)
Patricia J. GreeneS, Madhu Gupta, and HerbertW. Boyer
From the Howard Hughes Medical Institute and the
Department of Biochemistry and Biophysics, University of California,
San Francisco, Sun Francisco, California 94143
William E. Brown
From the Depurtment of Biological Sriences, Curnrgic-Mellon lintrvrstty. Prttrhurgh, Prnnsvfrwniu 15213
John M. Rosenbergg
From the Department
of
Biological Suences, Unioersity of Pittsburgh, Pittsburgh. Pennsyltaniu 1.5260
acid sequences
The Eco R1 endonuclease and methylase recognize RI endonuclease and methylase and the amino
the same hexanucleotide substrate sequence. We have of both enzymes deduced from the DNA sequence.
determined the sequenceof a fragment of DNA which
The source of DNA for this analysis is the plasmid, pMB1,
encodestheseenzymesusingthechain-termination
and derivatives constructedby recombination in zlitro. pMBl
method of Sanger (Sanger,F., Nicklen, S., and Coulson, was obtained from the original clinical strain of Escherichia
A. R. (1977)h c . Natl. Acad. Sci. U. S. A. 74,5463-5467). coli found to have the Eco RI host specificity. It is closely
The amino acid sequences of both enzymes were de- related to ColE1. Both plasmids encode colicin immunity and
rived from the DNA sequence. The coding regions seproduction and have identicalreplication properties. pMBl is
lected include the only open translational frames of larger by approximately two kilobases, the amount required
sufficient length to accommodate the enzymes. They to accommodate the Eco RI genes. Examination of heterocoincide with previously established gene boundaries
duplexes of pMBl with ColEl
by electron microscopy revealed
and orientation. The predicted amino acid sequences
no mismatched regionsexcept theEcoRI
genes ( 6 ) . The
of the purified protein.
correlate well with analyses
approximate boundariesof the EcoRI genes have been deterComparison of the nucleotide and protein sequences
reveals no homology between the endonuclease and mined by subcloning restriction fragments of pMB1, and the
methylase which might provide insight into the origin direction of transcription has been shown to be endonuclease
to methylase ( 4 ) (Fig. 1).
o f the restriction-modification system or the mechaT h e 2234-base pair DNA sequencepresentedhere
was
nism of common substrate recognition. Based on secdetermined by the dideoxyribonucleotide chain-termination
ondarystructure predictions, thetwo enzymesalso
method of Sanger (7). Single-stranded template DNA was
have grossly different molecular architecture. The base
obtained by cloning the Eco RI genes in M13mp5, a single
composition of the sequence is 65% A + T, and the
that ob- strand bacteriophage vectordeveloped by Messing and hiscocodon usage is significantlydifferentfrom
served in several Escherichia coli chromosomal genes.
workers (8, 9).
In some cases, frequently selected codons
are recogIn the accompanying paper, Newman et al. (10) report a
nized by minor tRNA species. A spontaneous mutation sequence analysis of the Eco RI genes contained in pMB4. a
in the endonuclease gene was isolated. Serine replaces derivative of pMBl which determines ampicillin resistance
arginine at residue 187. In crude extracts, Eco RI spe- but not colicin production.These twoplasmids have been
cific cleavage is-0.3% wild type.
maintained as separate laboratorylines for approximately 10
years (6, 11). TheDNA sequence in the accompanying paper
was determined by the method of Maxam and Gilbert (12).
The two DNA sequences, obtained in different laboratories
TheEco RI restrictionendonucleaseand
modification
methylase,togetherwiththeirDNAsubstrate,
provide a using differentplasmids assources of DNA and different
model system for probing the molecular mechanisms of se- methods of DNA sequencing, are identical in both thecoding
and noncoding regions. We are, therefore, confident that the
quence-specific DNA-protein interactions. The two enzymes
(1, nucleotide sequence and the predicted amino acid sequences
recognize the same nucleotide sequence d(-GAATTC-)
2) but differ in several aspectsof interaction with the substrate are correct.
(see Ref. 3 for a recent review). The endonuclease and methMATERIALS A N D METHODS’
ylase have been purified to homogeneity andx-ray diffraction
Figs. 1 and 2 depict the plasmids and phage which carry the E C O
( 4 , 5 ) .We HI genes from pMR1.
analysis of the crystalline endonuclease is underway
report here the sequenceof the DNA which encodes the Eco . _ _ _ _ _ _ _ _ _ _ ~
_”
_________”
* The costs of publication of this article were defrayed in part by
the payment of page charges. This article must therefore be hereby
marked “aduertisernent” in accordance with 18 U.S.C. Section 1734
solely to indicate this fact.
Recipient of Grant GM 25729-03 from the National Institutes of
Health.
8 Recipient of Grant GM 25671 from the National Institutes of
Health.
*
’ Portions of this paper (including “Materials and Methods,” Figs.
6 to 8, Tables IV and V, and additional references) are presented in
miniprint at the end of this paper. Miniprint is easily read with the
aid ofa standard magnifying glass. Full size photocopies are available
from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, Md. 20014. Request Document No. $OM-2241, cite author(s),
and include a check or money order for $7.60 per set of photocopies.
Full size photocopies are also included in themicrofilm edition of the
Journal that is availablefrom Waverly Press.
2143
Eco RI: Gene and Protein Sequences
2144
Sal1
SEQUENCING STRATEGY
EcoRII
? pPG30
0
Hindm
e
0
EcoRlI
EcoRI
HincII
FIG. 1. Eco RI-containing plasmids. The approximate boundaries of the Eco KI genes were determined by subcloning restriction
fragments of pMBl (4, 6, 13). The 2300-base pair Eco RII fragment
indicated by the inner heaty line contains all of the information
necessary for normal expression of both genes. The 1280-base pair
PstI-Hind111 fragment indicated by the outer heavy line contains the
methylase gene. pPG30 and pPG31 contain the 2300-base pair E ~ C
KIIfragmentinserted
into the Eco HI site of pBH20 (a pBR322
derivative) (14)in opposite orientations relative to the lac promoter.
The arrouw indicate the direction of transcription from the lac, tet,
and amp promoters. pPG30 and pPG31 were constructed by filling in
the Eco HI ends of the vehicle and theEco RII ends of the fragment
using T4 DNA polymerase. The resulting blunt-ended fragmentswere
joined using T4 DNA ligase (4). The direction of transcription of the
Eco KI genes was determined by measuring the effect of the lac
promoter on the specific activity of the Eco RI enzymes in strains
containing these plasmids. In strains containing pPG31, the endonuclease and methylase levels are both elevated by growth in glycerol
Enand by the addition of isopropyl-1-thio-P-D-galactopyranoside.
zyme levels are not affected by these growth conditions in pPG30 or
pMB1. ENDO, endonuclease; METH, methylase.
J
M13-RI recombinant phage were used to prepare template DNA.
A HindIII fragment which spans the Eco HI genes was isolated from
a partial digest of pPG30 and inserted into the HindIII of
site
M13mp5
(Fig. 2 4 ) . Isolateswith each of theorientations of thefragment
relative to the phage DNA were obtained to provide templates of
both strands of the Eco HI genes. Opposite orientations were identified by assessing the ability of phage DNA from two M13-KI isolates
to form double-stranded DNA in the region of interest. M13-KI
recombinant phageareunstable,
eliminating at high frequencya
segment of DNA approximately the same size as the insert encoding
the Eco RI enzymes. In spite of this problem, we were able to obtain
high quality templatefor sequence analysis. The cloning experiments,
analysis of therecombinant phage, andtemplatepreparationare
described in the miniprint.
Primers wereisolated from restriction endonucleasedigests of
pMBl and pPG30 DNA as described in the miniprint. A synthetic
primer, 5' p-C-C-C-A-G-T-C-A-C-G-A-C-G-T-T-OH3', which hybridizes to M13mp5 adjacent to the
cloning site, was a gift from Dr.
Roberto Crea (Genentech, Inc.). The
hybridization site of this sequence is shown in Fig. 2 A . Restriction sites used to prepare primers
and a summary of sequencing experiments are shown in Fig. 3.
The absolute orientation of generated sequence was deduced by
using primers with different restrictionsites at theends. For example,
when the smaller HindIII-Pst I fragment was used as a primer with
one template orientation, the reaction mixtures were divided in half
and digested with either Pst I or H i n d I I . Readable sequence was
obtained withonly oneset of cleavage products.Whensequence
could be read following Pst I cleavage, the orientation, designated a,
corresponded to the direction of transcription of the Eco RI genes.
When sequence could be read following HindIII cleavage, the ori-
( A)
EcoRI(Hlnf1)
i/
ECORIl
Hmdm EcoRII
HindIII
Tat
t
Htndm ECORI
P51r
- ....\.
pPG30
ym-
+METH
ENDO
t OP L a c
-
H l n d Ill
-+
az
Z t O P Lac
M E TEHN D O
I
+MEEN D O
LOCPO4Z
TH
toc
E c oERcI o R I I
pMG31
-
Hindm
... .
Lac P O +
PstI
HindlII
4
T eMt E N
TH
DO
EcoRI EcoRlI
pMG31-6
Hlndlll
Hlndm
I
PstI
-,,=,,, v
Lac P O
-+
+ ENDO
FIG. 2. Recombinant plasmids and phage carrying the Eco
RI genes from pMB1. The distances between restriction sites near
the end of the Eco RII fragment from pMBl are exaggerated to show
details of gene transfer. The endonuclease and methylase genes are
indicated by h e a y lines. Arrows show the direction of transcription
HI promoters. DNA originatingfrom
from the lac, tet,andEco
pBH20, the vehicle used to construct pPG30 and pPG31, is indicated
by hatched lines. ENDO, endonuclease; METH, methylase. A , derivation of MIS-RI hybrid phage: M13mp5, designated by w a ~ ylines,
contains a HindIII site in the DNA encoding the lacZ a peptide (8,
9). The HindIII fragment from pPGSO, which was inserted into this
METH
"
+ Tet
site, lacks approximately 40 nucleotides from the methylase end of
the Eco HI1 fragment of pMBl and contains 34 nucleotidesfrom
pBH20. The two orientations of the Eco RI genes relative to theM13
are designated a and b. B, derivation of pMG31 and pMG31-6: pMG31
is analogous to pPG31 except for deletion of a HindIII fragment of
approximately 70 base pairs. pMG31-6 was constructed by isolating
the 1685-base pair HindIII fragment from MI3-RI6a R F DNA and
inserting it back intopPG31. MI3-RI6a contains a spontaneous mutation in the endonuclease gene marked (V in M13-RI6a and pMG316.
2145
Eco RI: Gene and Protein Sequences
A
t
Hlnd
I
I
I
Hlnd JII
EcoRIJ
Pst
I
Avo
I
Eco R I I
HindDl
1
II
Y
II
Sau3AI
I
I
FnuD
1
I
342
1
I 1
1
ENDONUCLEASE
I
I
0
500
1
1171 1205
I
1000
I
I
1
METHYLASE
I
I500
2 0 0 PAIRS
0 BASE
FIG. 3. Summary of sequencing strategy. Nucleotide number-
above. The Taq I site indicatedby a dotted line overlaps a methylated
Mbo I site andis not cleavedby Taq I (15).B , sequencing experiments.
ing starts with the Eco RII siteof pMB1. A , restriction sites used to
The start of a DNA segment is indicated by 0 and a letter which
make primers. The HindIII site at the left marked t occurs only in
designatedtherestrictionsite:
A , Aua I; F, FnuDII (Tha I); H,
pPG30, and the DNA between the HindIII and Eco RII sites origiHindIII; h, H i n f f ;S, Sau3AI (Mbo I); T,Taq I; Syn, synthetic. The
nates from pBHZ0 (via pPG30). pMBl DNA was digested with Eco
HII, and the 2300-base pair fragment carrying the Eco RI genes was length of the solid line corresponds to the length of sequence read.
off prior to electrophoresis.
DNA segments
isolated. This was further digested with the
following enzymes either Long primers were cleaved
V. Several primers were
read from cleavage points are designated
singly or in combination: H i n f f ,Taq I, FnuDII, and Snu3aI. pPG30
A L JI,~and PstI; and the four digested with Exonuclease111; * designates sequence read from these,
DNA was triply digested with HindIII,
and the dotted line indicates the distance from the
5' end of the
fragments which span the Eco RI genes were isolated. These were
primer that readable sequence began.
used directly as primers or further digested with the enzymes listed
entation was designatedb. These assignments were further confirmed dodecyl sulfate gel electrophoresis (16). The size of the methby use of the synthetic primer.
ylase predicted from the nucleotide sequence is 38,050 daltons
RESULTS AND DISCUSSION
We havedeterminedthesequence
of a 2234-nucleotide
DNA fragment from pMBl by the sequencing strategy outlined. The nucleotide sequence together with the amino acid
sequences of the Eco RI endonuclease and methylase are
presented in Fig. 4. Most of the DNA sequence was determined onbothtemplates
with morethanone
primer. In
sections where only one primer was available, or only one
strand was used as template, reactions were performed more
than once with the sameprimer. The sequence in lower case
letters represents DNA from the M13 cloning vector and from
pBH2O (via pPG3O). The numbering of the nucleotidesequence begins with DNA originating from pMB1.
The Amino Acid Sequences
Agreement of Predicted Sequenceswith Experimental Obseroations-The methionines selected as startcodons for the
endonuclease and methylase precede the only open-reading
frames of thecorrectlengthtoaccommodatethesubunit
molecular weights of the two proteins. The size of the endonuclease predicted from the nucleotide sequence is 31,065
daltons compared with 29,500 daltons measured by sodium
compared with 36,000 daltons measured by sodium dodecyl
sulfate gel electrophoresis (16).Other potentialreading frames
in either orientation have frequent translational stop codons
with no open translational frame starting from an initiation
codon specifying a peptide over 49 amino acids. The direction
of transcription and translation corresponds to the direction
predicted from measuring the effect of the lac promoter on
levels of Eco RI endonuclease and methylase in strains containing pPG3O and pPG31. The boundaries of the translated
segments fall within the gene boundaries established by subcloning experiments (Fig. 1) (4, 6, 13).
Theamino acid sequences derivedfrom the nucleotide
sequence correlatewell with available data derived from analyses of the proteins. NH?-terminalsequence analysis by
Hsiang' confirms the startpoint selected forthe endonuclease.
There is reasonable agreement between previously published
amino acid compositions of the endonuclease and methylase
and the amino acid compositions predicted from the nucleotide sequence (Table I). Further substantiation
of the predicted amino acid sequence of the endonuclease was accornplished by comparing the composition of peptides obtained
' M. Hsiang, personal communication.
Eco RI: Gene and Protein Sequences
2146
TTTCGGGTGGmGlTGCCATTTTTACCTGTCT~TGCCGTGATCGCGATGAAC~G~TTAGC~GTGCGTACYTTAAGGGATTATGGT~TCAAACGTATGTTAATCTATCGACATATGT
308
10
S e rP h eA r gT y rA r gA s pS e r
TCA TTT CGA TAT AGA
GAT
AGT
ATA
AAG
ACA
I l e L y sL y sT h r
90
Asn S eSr e r
I l e L yPsrA
o sG
p lG
y ly
AAT TCc AGC A W AAA CCT
GAT
GGT
GG:
G 1 U I l e Asn G l u A l aL e uL y sL y s
11.2 A s pp r o ASP ~ e G
u l yG l y" h i
GAA ATA M T GAA
GCT
TTA
MA AAA ATT
GAC
CCT GAT CTT GGC
GGT
ACT
TTA
100
I l e V a l G 1 U Val L yAs s p
ASP ~ y r
ATT
GTA
GAG
AAA GAT
GAT
TAT
GGT
110
~ l y
~ l vua l val
~ r L~~
p
v a l ~1~
GAA TGG AGA GTT
GTA
CTT GTT
GCT
GAA
GCC
~ e pu h e v a l
TTT
GTT
TCA
ser
~1~L~~
~1~
y
593
CAC
CAA
686
12 0
G l y LyS Asp I l e I l e Asn I l e Arg Asn G l y LeuLeu
GGT AAA GAT ATT
ATA
AAT
ATA
AGG AAT
GGT
150
H i s LyS Asn I l e S e r G 1 U I l e A l a AsnPhe
:AT
AAG
AAT
ATA
TCA
GAG
A
GT
C!G
130
140
Val G l yL y sA r gG l y
Asp G l nA s pL e u
Met Ala la G l y ~ s nla 11e ~ l u
S~e r r g
T X TTAGTT
GGG AAA AGA
GGA
GAT
CAA
GAT
TTA A X GCT GCT GGT AAT GCT
ATC
GAA AGA TCT
179
160
Met Leu S e r G l u S e r H i s P h eP r oT y r
AAT TTT :X
LBO
CT(: TCT GAG AGC CAC
210
220
T y rG l y
Met P r o I l e Asn S e r AsnLeuCys
I l e Asn L y sP h eV a lA s n
TAT GFA ATG CCT ATA
AAT
AGT U T CTA
TGT
ATT
AAC AAA TTT GTA
AAT
240
G 1 U T r p ASP S e rL y s
AGG GAG
TGG
FAT TCG A M
Phe G i u LST
L
Gyeh
elsru
n
r
TTT GAA CAG CTT ACA TCT AAG
2 50
I l e Met P h eG l u
AT
A s pA r gL e uT h r
Ala A l aA s n
GAT CGA CTA ACT GCA GCT AAT
965
His L y sA s p
CAT AAA GAC AAA AGC ATT
ATG
CTA
CAA
GCA
GCA
,
I l e S e rT h rT h r
TlT FAT ATA
TCA
ACG
ACT
10
AlaLeu
Leu
Lys
Asn
Thr
AGA AAT
GCA
ACA
AAC
Met Ala Asn
Arg
ATG
GCT
AAG AAC GGA
H i s P h eS e rA s p
TTC
TCT
GAT
1435
100
T y rG l uT y r
H i s L y sG l u Asn G l yL y s
GAA TAT
CAT
M A GAA AAT
GGA
AAG
TTT TAC TAT
130
I l e AspLeuLeuLysLysSerAsp
AAA TCA
GAT
GfT CTG
CTA
I l e L y sT y rA s pL y sL y sP h eL e u
AAG TAT
GAT
RAG
TTC
I l e V a lV a lT h r
$TT
GTT
GTT
ACG
ATT
GCT
AAT
GTT
AAT
TCA
ATA
ACA
200
"
A
:
I l e V a lP r oG l u
His
GGA TTT ATT
GTT
CfA GAG CAT
220
G 1 U Ala Arg I l e A s pS e rA s nG l y
GAG GCG AGA ATT GAT TCT AAT
GGT
2 50
240
P h e G l y Asn G l uS e r
S e r T y rP r oL y sT y rA s pA s n
Leu P r o Leu T h rA r gL y sT y r
P h e I l e Arg His L y sA s p
AAA TAT
GAT
AAT
CCA
AAT
AGT TCA TAT
TTT ATT AGG CAT AAA GAC TTG CCT CTT ACA AGA M A TAT
GGG
GY
1807
2 30
AsnArg
I l e I l e S e r P r o AsnAsnCysLeuTrpLeuThrAsnLeuAspVal
Y T AGA ATT
ATC
TCG CCA
AAC
AAC
TGC
TTA
I*;G CTA
ACT
TTT
1621
1714
190
2 10
1528
I l e I l e Ala Asn Val Asn S e r I l e T h r
T y rL y sG l uV a lP h eA s n
Leu I l e L y sG l u Asn L y s I l e T r p Leu G l y V a l His L e uG l yA r gG l yV a lS e rG l yP h e
TAT AAA GAG GTG TTT AAT
CTA
ATT
F G GAA
AAT
ATT
TGG
CTT YGG GTT
CAT
CTC
GGG AGA GGT GTT
TCT
Y C CTA
GAT
GTC
260
TyT Asp Ala I l e Asn V a l
TAT
G
T:
GCT ATA AAT GTA
1900
1993
280
2 70
2200
Val P h e
LysLeu
AAA TTA
GTT
TTT
170
CTT AT:
180
LY s
AAG
AA$
160
Asn P r oP r oP h eS e rL e uP h eA r gG l uT y rL e uA s pG l nL e u
AAT CCT
CCA
TTC
TCG
TTA
TTT AGA
GAG
TAT CTT GAT
CAA
CTA
ATT
Asp I l e P r o
Asn L y sT h rL y s
GAT
ATT
CCA
AAC
AAA tCA AAG
300
L y sG l y Val
I l e L y sP h eA r g
ATA AAA T l T AGA
M G GGT
GTT
1249
Asp AsnLeuGlyLeuLysLysLeu
I l e Ala S e rC y sT Y K
TTT GAT
AAT
CTT GGC
TTG
AAA AAG TTA
ATA
GCA
TCT TGC
TAT
T y rA r gG l u
TAC
AGA
GAG
CAC
150
T y r G1u L e uT y rG l yT h r
TAT GAA
TTA
TAT
GGT
ACT
AAA AAA
H i s L y s Ala Lys
Lys
AAG TTA
CTG
CAC
AAA
GCT
1342
110
1 20
A s pA s p
I l e Ser V a lS e rS e rP h eC y sG l yA s pG l yA s pP h eA r gS e rS e rG l uS e r
GAT
GAT
ATT
AGT
GTT
TCT
TCT
TGT
GGC GfT GGC
GAT
TTf CGC
AGT
TCG
FAG
AGC
ATT
140
1151
L y sV a l
Val T y rC y s
GTT
GTT
TAT
TGC
90
Ala L y s Asn G l yP h eT y r
L y sG l uG l yP h e
Ser S e rS e rG l uA l a
GTA GAG AAT AAA GAA GGT TTT TCT AGT
AGC
GAA GCC
GCG
T X
70
TTT AAA TAT TTT GCA
GTG
AAT
Val G l uA s n
1058
40
50
60
Asp Asp P r o Arg Val Ser A s nP h eP h eL y sT y rP h eA l aV a lA s nP h e
80
ACT
SeK L e uA r gV a lL e uG l yA r gA s pL e u
TCG CTC AGA GTG TTG GGG CGT
GAC
30
AGA GTA
AGC
AAT
TTC
TCT ATA
TAT
2 70
I l e Met PheAsp
1200
TGATATTTTTTATTTTAATAAGGTTTTAATTA
20
230
~ y Ss e r I l e Met Leu G l n A l a Ala S e r 11e T y r T h r
S e rL y sS e r
ASP G l uP h eT y r
T h rG l nT y rC y s
Asp I l e G l u Asn G l u L e uG l nT y r
TCG AAA AGC
GAC
GAA
TTT TAC ACT
CAG
TAT
Y T GAT ATT GAG AAC GAA CTG
CAA
TAC
AsnCys
AAT TGT
GAT
GAT
CCT
872
260
ATG TTT GAA ATA
A%
,"_"
A
A
:
l Gu l y S e r Asn P h e ~ e u
T h rG l u
TTT TTA
GGG TCT
ACA
$AA
190
200
V a l AsnLeu
G l u T y r Asn Ser G l y I l e Leu Asn Arg
GTT GTT AAT CTT GAG TAT AAT TCT GGT ATA TTA AAT AGG TTA
Asn I l e S e r I l e T h rA r gP r oA s pG l yA r gV a l
AAT ATC
TCA
ATA ACA AGA CCA
GAT
GGA
AGG
G l nG l yA s pG l yA r g
CAA
GGA
GAT
GGF
170
Leu p h e Leu ~
TTT CCT TAC VGTCa l FTT
TM: TTA
GAG
Leu AspTyrAsnGly
TTA
GAT
TAC
AAT
290
Val Met G l yV a l
P r o I l e T h rP h e
GGG GTT ATG GGG GTTCCT
TTC
ATC ACA
310
A s pG l uL y sA s pL e u
GAT GAA AAA GAT
TTG
Ser I l e A s nG l yL y s
TCT ATA
C y s P r oT y r
CCT
TAT
Y T GGT AAA TG:
Leu H i s Lys PheAsn
P r o G l u G l nP h e
TTG CAT AAG TTT AAC CCT GAG CAA
TTT
G l uL e u
GAG TTA
32 0
L e uG l n
P h e Arg I l e Leu I l e L y s Asn LysArg
TTA
CAA
TTC
TTG ATA AAA AAC
EGA
:GA ATT
e
-
2080
2179
TAATTGATETTTGTTAG~TTTcTTG$GATCATTAGf?T
FIG. 4. Nucleotide sequence of the DNA cloned in M13mp5 loss of onehasepair.Nucleotidenumberingstartswithsequence
and amino acid sequences of the Eco R1 endonuclease and originating from pMBl (upper m s e letters). Aminoacidnumbering
methylase. Sequence originating from M13mp5 and
pBH2O (via starts with the initial methionine
of each protein. Homology with the
pPG30) is depicted in l o u w case lettem. T h e boundaries are marked
3' end of 16 S ribosomal RNA is indicated with tuaty underlines.
with I . Two Eco HI sites precede the HindIII site of M13mp5. The
Inverted repeats discussed in the text are underlined with arrou1.s.
Kco HI site expected in pPG30 has been converted to a Hinfl site by
Eco RI: Gene and Protein Sequences
2147
predicted amino acidsequences. They find t h a t t h e N H r
in purified
terminal alanine as
well as the methionine is absent
methylase.
Search for SequenceHomologies-Since the endonuclease
Methylase
Endonuclease
and
methylase recognize a common substrate sequence, and
Predicted
Predicted
the presence of the endonuclease without a functional methfrom nucleofrom nucleo- Reported,z
tide setide seylase is apparently lethal, Boyer et af. (16) postulated that
quence"
quence"
the two enzymes might haveevolved from a commonancestral
Ala
15
15.6
IO
9.8
gene and suggested that sequence homology would be found
Arg
14
12.9
13
12.0
between the two proteins in regions involved in site recogni23
Asp
61.1
40.0
tion. Subsequently, physical characterization of the enzymes
38
Asn
and studies on their reaction mechanisms have revealed excvs
1
1.8
7
6.5
Glu
tensive differences in the way the two enzymes recognize the
25.7
26
29.0
Gln
common substrate sequence (3,4).
Comparison of the nucleoGlY
21
16.8
17
17.8
tide
and
amino
acid
sequences
of
these
two enzymes reveals
7
5.6
His
6 6.6
no extensive regions of sequence homology, and it now seems
23
19.5
23
21.9
Ile
likely that recognition of the same DNA sequence by these
28
26.0
28.2 Leu
27
two enzymes represents convergent evolution from unrelated
LYS
22
21.1
35
30.4
1
0.9
Met
6
4.7
precursors. This is not the only example of two nonhomolo23
29.7
11
16.2
Phe
gous proteins recognizing the identical DNA sequence. The
1
0.9
Pro
6
4.7
products of the cI and cro genes of phage h bind to the same
23
29.7
24
18.9
Ser
set of operatorsequencesbut
show no obvioussequence
9
8.4
10
9.9
Thr
homology (17).
2
2
Try'
The amino acid sequences of the endonuclease and methTYr
8
7.6
21
17.9
Val
16
15.5
19
16.5
ylase do containa number of common tri- anddipeptides, and
Newman et al. (10) have identified a region with five out of
" NH,-terminal methionine not included.
Reported mole per cent values (16) are multiplied by 276 residues nine homologous amino acids. Solution of the tertiary strucfor the endonuclease and 325 for the methylase, the totals obtained ture will reveal whether any of these have functional homolfrom the predicted sequence excluding the NH!-terminal methionine.
ogy.
' Tryptophan was not determined.
Secondary
Structure
Predictions-The
translated
sequences of the proteinswere analyzed for expected secondary
TABLE
I1
structure using the method of Chou and Fasman (18, 19).
Amino acid composition of cyanogen bromide peptides
Newman et af. (10) present similar secondary structure prePeptide positions
dictions based on their identical sequences. While their pre138-157
252-255
256-277
Amino acids
dictions differ from ours in detail, the generalimpressions
gained from both are the same, particularly the substantial
differencesbetween the endonuclease and methylase. The
0.1
4
Ala 3.6
results of our analysis are presented in the miniprint (Tables
2
2.5
1
1.7
A%
IV and V and Fig. 7 and 8) together with a discussion of the
2.0
2
0.2
3
3.3
ASP
differencesbetween the two predictions. Since the current
2
2.1
2
1
1.1
Glu2.0
state
of the art of secondary structure prediction is imperfect
1
1.1
1
1.2
0.1
Gly
(20, 21), these predictions should be viewed with appropriate
1 1
.o
0.1
His
1
1.0
3
3.1
1
1.0
Ile
caution, and are mainly useful in suggesting further experi4
3.9
0.2
0.3
Leu
ments such as the location of sites for specific mutagenesis.
1.0
1
1
1.2
LYS
With this caveat in mind, we have drawn the following con1
0.5
1
0.7
Met"
clusions.
1.a
2
0.8
1
1
1.o
Phe
1) The endonuclease and methylase exhibit gross differ3
2
1.8
0.2
3.6
Ser
ences in predicted molecular architecture. The former con3
2.7
0.2
Thr
1
1.0
0.1
Val
tains significantly more predicted (Y helix than the methylase,
Methionine is determined as homoserine in the first two peptides; while the latter contains substantiallymore amino acid residues which are predicted to be neither a helix nor /3 sheet.
none is found in the third, indicating it is the COOH-terminal peptide.
Specifically, the endonuclease is predicted to be 35% (Y helix
after cyanogenbromidecleavage
of the proteinwith the and 26%p sheet. Newman et al. (10) predicted 39% and 17%
corresponding peptides predictedfrom
the nucleotide se- for a helix and p sheet, respectively. There are two experiquence. Table I1 gives the compositions of three peptides
mental measurements of a helix and p sheet in the endonuincluding residues 138-157 near themiddle and 252-277 at the clease: Goppelt et af. (22) obtained values of 36% a helix and
COOH terminus of the protein. The close match between the 21% /3 sheet, while Newman et af. (10) report 31% and 13%,
measured and predicted values confirms the postulate that
respectively. There is very good agreement of all these results
the peptides predicted from the nucleotide sequence exist in for a helical content. The p sheet values are more variable,
the purified protein. These data strengthen our
confidence but all values indicate that the endonuclease contains more
that the derived amino acid sequences are correct. Hsiang's
a helix than /3 sheet, Our prediction for the methylase is 19%
NHZ-terminal sequence analysis indicates that the methionine
a helix and 30% p sheet; Newman et al. (10) predict 18%and
is not present in the purified endonuclease.' Comparison of 27% for a helix and p sheet, respectively. In both predictions,
the predicted and measured amino
acid compositions suggests the p sheetcontent is greaterthanthe
a helix content.
that the NH?-terminal methionine is also removed from the Newman et al. (10) report circular dichroism results of 11% (Y
methylase.
helix and 5% /3 sheet for this enzyme; however, they regard
Newman et al. (10) present additional data verifying the these values with caution. Indeed, caution is required in the
TABLEI
Amino acid composition of the Eco RI endonuclease and
methylase
;:1
- i -
"
'i}
2148
Eco RI: Protein
Gene and
Sequences
interpretation of all these secondary structure estimations.
stacking distance in helical DNA.
Even so, the differences between the two molecules appears
A notable feature of the methylase sequence is the repeat
significant.
of the tripeptide,Leu-Ile-Lys, fourtimes beginning at residues
2) Eco RI endonuclease can be divided into two domains of 153,177,294, and 318. The fist pair and the lastpair are each
dissimilar architecture. The region from residues 1 to 174 is separated by21 amino acids. Three of the tripeptides fa11
predicted to consist of 44% a helix and 18%p sheet in alter- within predicted a helices while the fourth overlaps a prenating segments; this is a well known motif in protein struc- dicted p sheet strand. It remains
a matter of speculation
tures, forming the basis of the domainswhich bind nucleotides
whetherthispatternhas
asignificant relationshiptothe
in avariety of oxidation-reductionenzymes(23).Residues
function of the methylase.
175-277 are predicted to contain20% a helix and 39%/3 sheet.
Additional limited homology in the amino acid sequences
Besides the differences in amount of predicted (Y helix and /1 adjacent to the two pairs and underlying genetic homologies
sheet, the NH,- and COOH-terminalregions appear different are discussed by Newman et al. (10).
in a more subtle sense. The predictions are quite consistentin
the NH2-terminal region whereas considerable ambiguity is
The Nucleotide Sequence
encountered in the COOH-terminalregion; there are competIn comparing the nucleotide sequence derivedfrom the two
ing tendencies to predict (x helix and p sheet in the same
region. The differences in statistical associations of amino acid templateorientations represented by M13-RI6a and M13RI13b, a perfect match was obtained with two exceptions.
residues in these two regions of the molecule support the
prediction of two domains, even if details of the predicted
One of these proved to be a mutation which will be discussed
was foundin the noncoding
secondary structure do not prove to be accurate. The predic- later. The other unclear area
tion of two structural domainsin the endonuclease is interest- region 168-181. This 14-base pairregion contains 12 G-C pairs
ing since thereis considerable data tosuggest that other DNA and the following restriction sites: one Hha I, two FnuDII
(Tha I), oneFnu4H1, and one Puu I site which is also a
h repressors and
recognition proteins, for example, the lac and
Sau3AI (MboI) site. Thepresence of at least one FnuDII and
CAP, aredivided into two functional domains (17,24).
Limited
proteolysis and site-directedmutagenesis will be employed to the Hha I and Puu I sites was established by analysis with
these enzymes. The Puu I site is at the center of a 10-base
further analyze the endonuclease for the presence of different
pair sequence with2-fold symmetry. Thiskind of symmetrical
functional domains.
Both enzymes exhibit clusterings of amino acids which are sequence rich in GC residues can form secondary structure
potential points of contact with substrate. The most obvious which leads to compression and poor resolution on sequencing
of these are several clusters of basic residues, which might be gels (26) and may also interfere with enzymatic chain elongation. In this case, we were able to read sequence through
expected to interactwith phosphate moieties in the backbone
the Sau3AI site on both templates. From the appearance of
of DNA. Many of these occur in regions predicted to be devoid
of secondary structure,i.e. in loops between secondary struc- autoradiographs of the sequencing gels, it was clear that the
ture elements. In enzymes withknown structure, similar loops sequence immediately following the site could not be read
accurately. Therefore, thesequence presented was established
commonly form the active sites. The remaining basic clusters
in the Eco RI enzymes can be found in regions predicted to by combining the apparently normal regions from each template. An inverted repeat which involves most of these resiform a helices, primarily at the COOH end of those helices.
These clusters could also be playing a structural role since dues is marked in Fig. 4. Nucleotides 170-183 and 220-233
there is a statistical tendencyfor helices to exhibit this charge would pair (in mRNA)with an energy of -35.2 kcal (27). The
not
distribution (18,19). Both
enzymes also containclusters which function of this noncoding region of the sequence is known,
are rich in hydroxyls: in the endonuclease, Ser(84)-Asn-Ser- but this is the most impressive free energy change of any of
the possible stem and loop structures in this sequence.
SerandSer(259)-Thr-Thr-Ser; in themethylase,Ser(85)Base Composition a n d Codon Usage-The DNA sequence
Ser-Ser,
Tyr(95)-Tyr-Glu-Tyr,
Ser(ll2)-Val-Ser-Ser,
and
shown in Fig. 4 is 65% A + T, significantly higher than thatof
Ser( 144)-Ser-Glu-Ser.
ColE1 (28) and E . coli (29), the host cell for both pMBl and
Internal Homologies in the Amino Acid Sequences-One
ColE1. Furthermore, a shift in base composition occurs near
of the most striking features of the endonuclease amino acid
sequence is an iterative tetrapeptidebeginning at residue 250, residue 260, so that residues 1-260 are 49% A + T, similar to
where thesequence reads: Ile-Met-Phe-Glu-Ile-Met-Phe-Asp.ColEl and E. coli. The codon usage in the endonuclease and
methylase genes reflects the high A + T content. There is a
The DNA which encodes this region is equally iterative with
only two base changes, one in the wobble position of the Ile strong preference for codons which end with A or T, and in
the case of arginine and leucine, wherethe choice is available,
and the other leading to the Glu-Asp substitution. This is
suggestive of a duplication event in the history of the endo- a preference is also manifest for those codons which begin
nuclease. This octapeptide is predicted to form a strand of p with A or T. This pattern of codon usage is quite different
from that seenin a number of E. coli chromosomal genes, and
sheet. Furthermore, commencing at residue 230, there is a
(Table 111) (30-36).
similar tetrapeptide, Ile-Met-Leu-Gln,which is also predicted especially the ribosomalproteingenes
Post and Nomura (30) have argued that codon preference in
to lie within a strand of p sheet. If this prediction is accurate,
to themost
the methionines and
acidic groupswould project outfrom one the ribosomal protein genes generally corresponds
side of the sheet and the hydrophobic isoleucines and phen- abundant tRNAspecies, reflecting the need for efficient transylalanines would project fromthe otherside, probablytowards lation of these proteins. In most cases, wobble pairing allows
the interior of the molecule. The residues immediately pre- translation of the Eco RI codons by major tRNA species;
ceding both these regions are also similar in character, con- however, exceptions occur. AGA and AGG for arginine and
sisting of aspartate, lysine, and serinein different order. With- AUA for isoleucine are frequentlyused codons in the Eco RI
out further analysis, the function of the region cannot be sequence which are represented by minor tRNA species and
are almost never selected in a number of other E. coli gene
assigned; however the occurrence of this periodicity in predicted p sheets is tantalizing sincethe distancebetween amino sequences (Table 111).The frequentuse of codons represented
by minor tRNA species may affectthe translationalefficiency
acid side chains in /? sheets is 3.4 8, (25), which means that
of these genes. Betlach et al. (6) speculated that pMB1 arose
the repetitive amino acids occur at multiples of the base-
Eco RI: Protein
Gene and
Sequences
2149
TABLEI11
Codon usage (in frequency per 1000)
ENDO
CODON
Leu
4
3
19 S e r
19
13
10
10
2
13
CGG
AGA
AGG
CUU
CUC 8
CUA
CUG
U UA
UUG
UCU
22
ucc
UCA
2
UCG
AGU
AGC
T h r 28 ACU
ACC
3
.ACA
.2 ACG
Pro
CCU
ccc
22
CCA
CCG
Ala
GCU
GCC
38 GCA
GCG
0
27
10
18
7
15
5
33
13
27
2
17
10
10
13
13
23 0
15
3
18
0
10
0
22
3
12
42 5
1
RIBO-
+SOMAL'
METH
COLI M I X 2
1
1
0
4
3
0
61
ENDO
CODON
+SOMAL
METH
GGG
GUU
GUC
38 GUA
GUG
Lvs
AAA
I "
25AAG
18
35
23
5
13
28 5
65
30
Val
2
0
8
13
4
70
10
9
9
RIBO-
COLI M I X
0
5
43
7
13
I
15
70
24
1
7
His
CAU
4
6
5
13
Glu
ASD
GAA
GAG
GAU
13
4
Cys
UGU
a
Phe
UUU
28
60
19
4
1
5
29
68
12
25
7 i
6
19
15
28
AUC
AUA
25
22
30 43
8
22
12
' See Kef. 30.
'Codon usage for lac1 (31), l a c y (32), trypA (33), recA (34, 35), lipoprotein (36), and part of KNAP (30).
by translocation of a segment of DNA into a commonancestor
TGAAAAATGGCAGGTTCGGTGGATTTTGACGGGCTAATGTGGTCTGCACC
113
here
of ColE1. The features of the DNAsequencenoted
TGTGGTCTGCACCATCTGGTTGCATAGGTATTCATACGGTTAAAATTTAT
150
suggest that the Eco RI genes may originate from a species
CTGCACCATCTGGTTGCATAGGTATTCATACGGTTAAAATTTATCAGGCG
156
whose DNA has high A + T content. We are extending our
GCCGTGATCGCGATGAACGCGTTTTAGCGGTGCGTACAATTAAGGGATTA
256
sequence analysis of pMBl and comparing it to ColEl DNA
GAACGCGTTTTAGCGGTGCGTACAATTAAGGGATTATGGTAAATCAAACG
270
sequence in order to determine the boundaries of nonhomolTGCGTACAATTAAGGGATTATGGTAAATCAAACGTATGTTAATCTATCGA
286
ogy.
ACAATTAAGGGATTATGGTAAATCAAACGTATGTTAATCTATCGACATAT
291
Transcriptional and TranslationalSignals-We have preGTAAATCAAACGTATGTTAATCTATCGACATATGTAACTTTATAAAATAA
308
viously suggested that the endonuclease and methylase are
AACGTATGTTAATCTATCGACATATGTAACTTTATAAAATAACAGTGGAA
316
coordinately controlled. The level of both enzymes is altered
by conditions which affect the lac control system when the
- 35
- 10
+1
endonuclease is adjacent to thelac promoter (pPG31) but not
aAa-t"-'TTGACa
_____
- t - - T_
A t A_
AT-_
- - - - C.A _
T-- _ _ _
t
g
9
when the orientation is reversed (pPGSO), and the magnitude
FIG. 5. Possible promoter sequences. The bottom line shows
of the effect is similar for both enzymes ( 4 ) . Accordingly, the
DNA sequence preceding the endonuclease initiation codon homologous regions of published promoter sequences (37, 38). Variwas examined for sequences
homologous to known promoters. ation in spacing between the Pribnow sequence (-10) and the -35
region in wild type promoters is two more or one less than shown;
Severalpartial homologies occur, but none is convincing variation in spacing between the Pribnow sequence and the messenger
enough to select a promoter by inspection of the sequence
initiation site (+ 1) is f 2 . Capital letters designate bases which occur
alone. The potential promoter sequences most distant from in 61% or more of the sequences, small letters designate frequency
in DNA between 46R and 61%. Two smallletters designate a combined
the initiation of translation are probably located
.
number following each of the possible prohomologous to ColEl, but this does not automatically
exclude frequency of ~ 7 0 % The
moter
sequences
indicates
the position of the first nucleotide of the
them as possible choices. The potential promoters (Fig. 5)
Pribnow box, designated *. The sequences at positions 156, 256, 270,
include five sequences that match reported Pribnow boxes 286, and 316 match published Pribnow boxes. Thoseat positions 113,
and four more with the three most
conserved bases (seeRefs. 150,291, and 308 match the three most frequently occurring bases
37 and 38 for recent reviews). Of these, only the Pribnowbox TA---T.
beginning at position 156 has significant homology with the
-35 region, and, in this case, the spacing of two homologous residue 200, but the accompanying Pribnow box would have
in any published
regions is one nucleotide lessthan that found
to be an uncharacteristic -TGCCGT- or -CGTGAT-.
wild type promoter sequence. There is a sequence with very
The mode of regulation of the Eco RI genes has not been
good homology to the consensus -35 region beginning a t elucidated. The restriction-modificationsystemprovidesa
2150
Eco RI: Protein
Gene and
barrier to foreign DNA entering a cell containing both enzymes, but when a plasmid carrying the Eco RI genes enters
a host cell with unmodified DNA, hostDNA is restricted with
low efficiency. The order of translation of the genes would
appear to present a source of difficulty for such a plasmid.
However, since the endonuclease requires at least two subunits while the methylase is active as a monomer (3), the
methylase is functional first in spite of the translational order.
The earlierpresence of active methylase achieved by the
subunit structureof the enzymes may be sufficient to account
for the survival of the genes. Alternatively, there may be a
separate promoter for the methylase. When different fragments (HindIII, HindIII-PstI, andBgl 11-HincII), which contain the methylase but not the
endonuclease gene, are cloned
in avariety of sites and orientations, the resultinghybrid
plasmids always conferthe Eco RI
modification phenotype on
the host cell (4, 6, 13). Methylase specific activity was measured in one of these clones and found to be 5% of normal (6).
Early expression of this amountof enzyme could be functionally important for the restriction-modification system. We are
currently examining promoter activity in the regions preceding both methylase and the endonuclease genes.
The DNA sequence following each of the structural genes
was examined for RNA polymerasechain-termination signals
(38). A series of T residues occurs following both the endonuclease andmethylasestop
codons. Neither T series is
preceded by a GC-rich region as is characteristic of some
transcription-termination sequences. The sequence near the
end of themethylase gene has two hyphenatedinverted
repeats which could form secondary structure in the message
preceding the series of Ts. Unlike other messenger-termination sequences, these overlap the coding sequence. The possibility remains that the true termination sequence occurs
beyond the sequence presented here.
Homology with the 3' end of 16 S rRNA sequence (39) is
indicated in Fig. 4 for both genes. The trinucleotide -GGAwhich occurs twice, 6and 14 base pairsbefore the endonuclease ATG, and the pentanucleotide-TAAGG- which occurs 14
nucleotides before the methylase ATG, are likely ribosomebinding sites. It hasbeen suggsted that the termination
codons
UAA and UGA may also be part of the recognition signal for
ribosomal initiation (40). One UAA codon begins at position
320 preceding the endonuclease Shine-Dalgarno sequence.
Four UAA codons occur in the region of the methylase initiation site.
The sequence of the intercistronic region is over 90% A +
T; in the coding strand, 19 of 32 nucleotides are Ts. The
presence of stems and loops in intercistronic regions with the
ribosome-binding site exposed in the loops has been noted in
a number of sequences (32,41). Residues1144-1155 and 12211232 could pair to forma stem andloop structure (AG = -16.8
kcal) (27). In scanning the entire sequence for possible secondary structure, however, the low potential for the intercistronic region to participate in any stem is more impressive
than the presence of the stem andloop noted.
A Spontaneous Mutation in the EcoRI Endonuclease Gene
The sequences obtained
from the two templates, M13-RI6a
and M13-R113b, differed a t residue 902. The M13-RI6a sequence was GCG and the complementarysequence obtained
from M13-RI13b was CCC. Autoradiographs of sequencing
gels from each templatewere clear and unambiguous. Therefore, template DNAwas preparedfrom two other clones,
M13-RI3a and M13-R122b, representing independent isolates
of the two templates orientations. The sequences read from
both of these terndates match the sequence obtained from
M13-RI13b. M13-Rf6a contains a spo~taneousmutation in
Sequences
the Eco RI endonuclease gene which replaces the arginine at
residue 187 with aserine. Endonuclease activity was measured
in crude extracts of strain 71-18 infected with M13-RI6a and
M13-RI3a. Specific cleavage at Eco RI sites
was undetectable
in M13-RI6a/71-18 under assay conditions in which about 1%
of the activity of M13-RI3a/71-18 would have been detectable.
Extracts of the two strainscontained comparable amounts of
Eco RI methylase.
The specific activity of the Eco RIenzymes is lower in the
M13-RI strains than in plasmid-containing strains. Furthermore we have been unable to measure Eco RI restriction of
h phage in the M13-RI strains even whena substantial amount
of Eco RI endonuclease activity is measurable in vitro. Therefore, we transferred the mutant sequence
back into a plasmid
derivative of pPG31 for further study of the mutant endonuclease.
Two plasmids, pMG31 and pMG31-6 (Fig. 2 8 ) , were constructed. pMG31-6 contains the 1685-base pair Hind111 fragmentcarryingthemutantsequence
from M13-R16a, and
pMG31 contains the analogous fragment from pPG31. Both
plasmids differ from pPG31 (Figs. 1 and 2B) in that the small
Hind111 fragment which spans one of the Eco RI-Eco RII
junctions hasbeen deleted.
The activity of the Eco RI enzymes in E. coli strain 294
(42) carrying pMG31 or pMG31-6 was measured after growth
in minimal medium containing glycerol + isopropyl-1-thio$D-galactopyranoside. These growth conditions yield the highest specific activity of the Eco RIenzymes in pPG31/294 (4).
The specific activity of the Eco RI methylase
in the two
strains is approximately equivalent, 860 and 1090 units/mg of
protein, in pMG31 and pMG31-6, respectively. Endonuclease
activity is measured
in relative numbers because of the nature
of the assayused. We could not use quantitative assayswhich
rely on measuring conversion of circular to linear forms in
plasmids with one Eco RI site (43,44), or on methodswhich
quantitate cleavage by measuring the increase in 5' termini
(45),since these methods do not distinguish Eco RI specific
cuts from nonspecific cuts. Eco RI endonuclease activity is
high enough in normal strains that nonspecific background
cleavage is not a problem. However, to assess activity in the
mutant, we had to rely on the appearance of specific DNA
fragments. Quite good relativevaluescan
be assigned by
comparing different times of digestion and different dilutions
of the extracts. The mutant
endonuclease clearly cleavesDNA
a t Eco RI sites. The level of enzyme activity is 0.3% of the
level observedin pMG31. Furthercharacterization of the
mutant endonuclease is underway.
Achnoulledgments-We thank Patricia Clausen for preparing the
manuscript and Sonja Bock for editing the sequence in the computer
to produce Fig. 4. We also thank Keith Backman and Jon Lawrie for
comments on the manuscript and Dr. Paul Modrich for interesting
and useful discussions of results prior to publication.
REFERENCES
1. Hedgpeth, J., Goodman, H. M., and Boyer, H.
W. (1972) Proc.
Natl. Acad. Sci. U. S. A . 69, 3448-3452
2. Dugaiczyk, A,, Hedgpeth, J., Boyer, H. W., and Goodman, H. M.
(1974) BiochemistQ 13,503-512
3. Modrich, P. (1979) Q.Reu. Biophys. 12,315-369
4. Rosenberg, J. M., Boyer, H. W., and Greene, P. J. (1980) in The
Site Specific Restriction Endonucleases(Chirikjian, J. G., ed),
Elsevier-North Holland, in press
5. Rosenberg, J. M., Dickerson, R. E., Greene, P. J., and Boyer, H.
W. (1978) J. Mol. Biol. 122,241-245
6. Betlach, M., Hershfield, V., Chow, L., Brown, W., Goodman, H.
M., and Boyer, H. W. (1976) Fed. Proc. 35, 2037-2043
7. Sanger, F., Nicklen, S., and Coulson, A. R.(1977) Proc. Natl.
Acad. ScL. U. S. A. 74, 5463-5467
A-_MPacinrr
€3.. Muller-Hill. B.. and Hofschneider.
D, .I
", Gronenborn.
" ~ .
Eco RI: Protein
Gene and
P. H. (1977) Proc. Natl. Acad. Sci. U. S. A . 74, 3642-3646
9. Gronenborn, B., and Messing, J . (1978) Nature 272, 375-377
10. Newman, A. K., Rubin, R. A,, Kim, S.-H., and Modrich, P. (1981)
J.Biol. Chem. 256,2131-2137
11. Goodman, H. M., Greene, P. J., Garfin, D. E., and Boyer, H. W.
(1977) in Nucleic Acid-Protein Recognition(Vogel, H. J., ed),
pp. 239-259, Academic Press, New York
12. Maxam. A. M., and Gilbert, W. (1977) Proc. Nu/[. Acad.Sei. U.
S. A . 74, 560-564
13. Bolivar,F.,Rodriguez,
K. L., Greene,P.J.,Betlach,M.
C.,
Heyneker, H. L., Boyer. H. W., Crosa, J. H., and Falkow. S.
(1977) Gene 2.95-113
Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L.,
Bolivar, F., and Boyer, H. W. (1977) Science 198, 1056-1063
15. Backman, K. (1980) Gene 11, 169-171
16. Boyer,H. W., Greene,P. J., Meagher,R. B., Betlach,M. C.,
Russel, D., and Goodman, H. M. (1975) Fed. Eur. Biochem.
SOC.
Symp. (Budapest) 34, 23-37
17. Ptashne, M., Jeffrey, A., Johnson, A. D., Maurer, R.. Meyer. B.
J., Pabo, C.O., Roberts, T. M., and Sauer, R. T.(1980) Cell 19,
14.
1-11
18. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13,211-222
19. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13,222-245
20. Schulz, G. E., Barry, C. D., Friedman, J., Fasman, G. D., Chou, P.
Y., Finkelstein, A. V., Lim, V. I., Ptifsyn, 0. B., Kabat, E. A,,
Wu, T. T., Levitt,M.,Robson,B.,andNagano,
K. (1974)
Nature 250, 140-142
21. Matthews, B. W. (1975) Biochim. Biophys. Acta 405,442-451
22. Goppelt, M., Pingoud, A,, Maas, G., Moyer, H., Koster, H., and
Frank. H. (1980) Eur. J . Biochem. 104. 101-107
23. Rossman, M. G., Liljas, A,, Branden, C.'I., and Banaszak, L. J .
(1975) in The Enzymes (Boyer, P. D., ed) Vol. XI, pp. 61-102,
Academic Press, New York
Sequences
2151
0. C., Crothers, D. M., and Grolla, J. (1973) Nature New Biol
246,40-4 1
28. Falkow, S. (1975) in Infectious Multtple Drug Resis/ance (Lag
nado, J. R., ed), pp. 98-143, Pion Limited, London
29. Shapiro, H. S. (1970) CRC Handbook of Biochemistry Selecteu
Data For Molecular Biology (Sober, H. A., ed) 2nd Ed, pp
H80-H88, The Chemical Rubber Co., Cleveland
30. Post, L. E., and Nomura, M. (1980)J . Biol. Chem. 255,4660-466f
31. Farabaugh, P. J. (1978) Nature 274, 765-769
32. Buchel, D. E., Gronenborn, B., and Muller-Hill,B. (1980) Naturt
283, 541-545
33. Nichols, B. P., and Yanofsky, C. (1979) Proc. Natl. Acad. Sei.Zi
S. A . 76,5244-5248
34. Horii, T., Ogawa, T., and Ogawa, H.(1980) Proc. Nut[.Acad. SCL
U. S. A . 77,313-317
35. Sancar, A,, Stachelek, C., Konigsberg, W., and Hupp,W. D. (1980:
Proc. Natl. Acad.Sci. U. S. A . 77, 2611-2615
I. L., Takeishi,K., and Inouye
36. Nakamura, K., Pirtle, R. M., Pirtle,
M. (1980) J. BioL Chem. 255, 210-216
37. Siebenlist, U., Simpson, R. B., and Gilbert,
W. (1980) Cell 20,
269-281
38. Rosenberg, M., and Court, D. (1979) Annu. Reu. Genet. 13, 319353
39. Shine, J., and Dalgarno. L. (1974) Proc. Nutl. Acad.Sei. U. S. A
71, 1342-1346
40. Atkins. d . F. (1979) Nucleic Acids Res. 7, 1035-1041
41. Selker, E., and Yanofksy, C. (1979) J . Mol. Biol. 130, 135-143
W. (1976) Proc. Natl
42. Backman. K., Ptashne, M., and Gilbert,
Acad. Sci. U. S. A . 73,4174-4178
43. Greene, P. J., Betlach. M. C., Boyer, H. W., and Goodman, H. M
(1974) in DNA Replication (Methods in Moleculur B i o l o ~ g ~ ]
(Wickner, R. B., ed) Vol. 7, pp. 87-111, Marcel Dekker, Inc.
24. Ogata, R. T., and Gilbert, W. (1979) J . Mol. Biol. 132, 709-728
New York
25. Church, G . M., Sussman, J . L., and Kim, S.-H. (1977) Proc. Natl.
44. Lackey, D., and Linn, S. (1980) Methods h'nzymol. 65, 26-28
Acad.
Sci.
U. S. A . 74, 1458-1462
45. Berkner. K . L., and Folk. W. H. (1980) Method,%Enrvmol.65, 2826. Sanger,
F.,
and
Coulson,
A. H. (1978) FEBS
Le/t. 87, 107-110
36
are found on p. 2153
27. Tinoco, I., Borer, P. N., Dengler, D., Levine, M. D., Uhlenbeck, Additional references
2152
Eco RI: Gene and Protein Sequences
44
S E C O W M R I SIWCTURE PREOiCllOW
Eco RI: Gene and Protein Sequences
2153
TABLE iV
&RI
Secondary S t r u c f ~ r eP r e d i c t i o n s f o r
Number oPf r e d i c t e d
residues
struc
"B't u r e
Region
Le" 10
Val20
Ala 28
Gln5752
Glu 65
G l Y 79
I l e 94
6
-
-
-
d l u 103
I l e 119
Asp
GI" I 8
i33
Glu 152
Tyr 165
I l e 179
Arg 187
Leu 201
Tyr 209
L y r 221
l i e 210
I l e 250
Leu 261
Leu 270
-
Phe 24
Ser 47
Tyr
I l e 7)
Val 83
Asp6 99
-
-
Gln
Ile
Gly
Lyr
i.19
B
0
1.17
0.97
B
0.97
1.19
0.99
1.11
D
1.21
B
a
191
207
21)
226
240
258
267
277
B
0.89
B
B
I
AAA+
A.
2
B
1.11
C
5
B
0.88
1.21
A-
1.01
1.21
1.04
CC
C
B
5
B
7
5
6
G
I1
1.17
a
0.79
o.
i.06
1.09
5
B
B
B
8
0
9
i.11
1.01
1.15
3, 5
4, 5
B-
5
B
5
1.090
0.95
a
I
c
1.12
1.00
1.16
0.92
1.22
1.15
1.11
7
12
18)
1.14
1.24
D
Conmenf
Grade
1.16
0.96
5
IJ
I1
129
119
161
"a'
20
9
5
G I " 115
Gly
Ala
Phe
169
Leu
Arg
Leu
Ala
Ile
Lyr
0.95
o
9
Endonuclease
n
5
A
1.20
0.96
1.25
6
7
9
IO
c+
1.11
C
1.16
0.98
A
io, I1
12
B
Conmenfs on T a b l eI v
.
.
1.
I
2
4.
5
6
7
There tw elements o f recoodary Structure are very clear i nt h e i ri n d l v i d u a ip r e d i c flonr b u ta r ec r a d e dt o g e t h e r .
T h i sl o n g
h e l i x c o u l d be two h e h c e r , k i n k e d i n t h em l d d l e a t Gly 36.
P r e d i c t i o n i s v e r y ambiguoushere
~ i n c e<PG>=<PB>. The h e l i x was p r e f e r r e db e c a u s e o f
t h e r l i g h f l y h i g h e r p r o b a b i l i t y and because of the presenceof
G l u 96. a r e s i d u e
r a r e l yf o u n di n
B rheefr.
There i s a strong tendency towaTds B i n the m i d d l eo ft h i s
region.
The a h e l i x wa3
choren becaure o f the n u c l e u so u t s i d e
the ambiguousreqioo.
i . ~ .r e s i d u e s i i 0 to 115.
and becaure o f 1 f a f l 5 f i c d l p m f e r e n c e r :V a la n d
Leu a r e very o f t e nf o u n d near the
center o f ~1 h e l i c e s (there are r e s p o n s i b l ef o rt h ea m b i g u i t y ) ,
G l u i s a good N - t e r m i n a l
r e r i d u e . and L y r . H i s and G l n are the t h r e e mlt Cornon C-ferrnlnai Y h e l i c a lr e s i d u e s .
The secondary ~ f r u ~ f ~element5
r e
i n t h e r er e o i o n r
e r e a i i f f l e crowded. # . e t h e 1 0 0 ~ 1
are too r h o r f .
However, 8 f i s not c l e a r w h i i hp r e d l c f i o n ss h o u l d
be madsfled fa
r e i l e v e the c r a d i n g , hencethey
remain w i t h t h i s c a v e a t .
The o v e r l a p p i n gr e g i o n
Val166
G l u 177 also ha$ h i g h u pofenflal i<Pm>
I I1 and
<Pa> = 0 . 9 8 ) .T h i sZ o r e i b i l i f Y
was not acceDfedbecause
there was no o n u c l e a t i o n
center outride the arnbiguour r e g l o n (note that the p r e c e d i n gh e l i x cannot propagate
t h r o u g hP r o1 6 4 ) ,
the o v e r a l l B p o f e n f # a l was much higherandtheboundartes
were
c l e a r l y Chore o f a B sheet.
The cloxe p r o x i m i t y t o t h ep r e c e d i n g h e l i x i s a f u r t h e r
problemhere.
This prediction i s extremely ambiguous ~ i n c et h e regson Arg 187 t o G l u 192 ~ i v e r
<Pa>
1 . 1 1 . <Pg> = 1.07.
The method i s s a y i n g t h e r e i s s e c o n d a r ys t r u c t u r e
here, b u t
cannot d i s c r i m i n a t ew h i c hk i n d .
The B was chosen because i f has the h i g h e s tp r o b a b i l -
-
-
&I.
The p r e d l c f e dr e c o n d a r y
ifrucfure o f EcoRl endonuclease i sr e p r e s e n t e dr c h e m a t i c a l l y : Amlna acid r e r i d u e ir e p r e s e n t e d
by a d l x a r e p a r t of a p r e d i c t e d Y h e l i x ,t h o s e
represented by 8rregular (folded)hexagons
are p a r t o f a r f r a n do f
B sheet, while r e s i d u e s
which a r e n o t a p a r t o f a d e f i n e d secondary s.frucfuie are represented a s ~ i r c l e s . The
r h a d l n g scheme reprelentsthechernlcalcharacterof
the amino a c i dr e s i d u e :H e a v i l y
shaded
r e r l d u e r are hydrophobic. hence usually found i n t h e i n t e r i o r o f a p r ~ f e l n( , . e . , Phe. Trp,
T y r , l l e , Leu. Her and V a l ) . Charged residues, which are " I u a l i y found ov t h e e x t e r i o ro f
a r c b l a n k .e x c e p t
forthe
~ i g no f t h ec h a r g e( i . e . .
G l u and Asp are minus Lyr
t h eP r o t e i n
andArg
a r e p l u s w h i l eH i s
i s blank).
Thore r e s i d u e sw h i c h a w m m o n l v found i n u s r l e d
-
i t y score.
8
9
10
/I
I2
T h i sr e g i o n has a more d i f f u s e o n u c l e u ~than u s u a l l y r e q u i r c d ,b u t
s f c l e a r l y has
h i g h 0 p o t e n t i a l( c o m p o s i t i o n b i z ) . Hence the p r e d i c t i o n ,b u tw i t h
a low grade.
There a r e fw p r o b l e mh e r e . one 8 1 the presence ofPro211,which
i s rare in B rheefr,
t h e ofher i s the crowding w i t h the previous p r e d l c f i o n .
Hence the
low
grade.
an OL nucleus i n t h e
The r e g i o nf r o m
221 t o 240 c o n t a i n e d many a m b i g u i f i e r .T h e r ei s
221to226
region and a B n u e i e s s hefween 237 and 240. Both c o u l d propagate i n t o t h e
middle. The h e l i xp r o p a g a t i o n was stopped a t Lys 226 becaulethe
r e g i o n f r m Arn 224
to Ser 228 i s a ~ O n f l n u o ~sst r e t c h o f s i x h y d r o p h i l i c r e l l d u e r , f o u r o f w h i c h
am
one t u r n o fh e l i xw o u l d
be ronfincharged, i . e . i t was c o n r i d e r e du n h k e l yt h a to v e r
u o u ~ l ycharged.
The regmnf I l e 230 t o Ala215 c o n t a r n 3 b o t h Y and B n u c l e < . b u t t h e l a t t e r i s $ f a t # $ tically dominant (<Pa> = 1.23, <PO>
1.14). There 8 1 d l 3 0 the p r e v i o u s l y noted B
n ~ c l e u1217-240).
~
The boundaryresidues
are much _ r e conrirfenf W i t h a B sheet.
Her 251and Phe 256(<Pa>
The region has b o t h a and B p o t e n t i a l .e i p e c i a l l yb e t w e e n
i.20
<P > = 1.29).
However, B c l e a r l y dominafer the I f a f l ~ f i C I . The region I l e 254
to /;e2%has
an unambiguous B n u ~ l e u s(<Po> * 1.06, <Pg>
1 . 1 9 ) . The boundary
r e s i d u e sa r e c l e a r l y t h o s eo f a 8 sheet.
Hence t h i s p r e d i c t i o n .
-
-
-
TABLE V
Secondary S t r u c t u r e P r e d i c t i o n s f o r
Re4 i on
2
7
Le"
Phe
6
lie
41
Val
62
Phe
Leu
Phe
Aen
Leu
Ser
Asp
Pro
Phe
Glu
5 Ile
Val
Pro
Cyr
Leu
276 Val
Pro
Tyr
Number of
rerlduer
9 - t y r 17
21 - Cy$ 26
28 - Gln 31
- Cyr 48
56 - Val
72 - Val 78
84 - 90
Ala
92 - Tyr 96
106
I l e Ill
127 - Leu 13i
135
5
139
Thr
142
L y r 155
160
Ala 164
I71
Glu 180
187
181 Val
i91 - Val 198
199 - Leu 204
224 - I l e 214
241 - Phe 246
- Leu 284
289 - Lyr 299
314 - 6I l e 119
structure
9
6
e
0.87
B
B
1.25
0
6
7
7
5
-
k R 1 Methylase
Predicted
B0.72
B
B
1.37
8
6
5
i4
5
B1.13
8
B
B
6
6
I1
6
9
"m>
"B>
1.16
0.87
0.85
1.22
0.86
1.09
1.01
1.21
8A
1.11
A
1.21
C
C
1
1.13
0.81
c
5
1.09
1.23
B
C-
1.09
1.02
1.11
1.06
1.06
1.01
1.18
I. i 8
1.03
0.95
0.96
1.29
1.14
1.29
0.71
1.21
B
1.10
B
6
1.07
0.96
0.99
1.12
1.10
0.92
8
0.98
I 18
I1
Grade
Commnf
I
c
4
6
C
A
C
00B888CB
AA
7
8
8
9
10
C o m n f 3 on Table v
I. The p r e d i c t i o n i s quite c o n s i s t e n t , b u t the confinuoui s t r e t c h o f h y d r o p h i l i c
and
c h a r g e dr e r i d u e r .w h i c hw o u l d
wrap around the e n t i r e c i r c u m f e r e n c eo ft h e
helix. i s a
2.
1.
5.
6.
7.
8.
9.
10.
The o v e r l a p p i n gr e g i o n
Leu 12 - Arg 36 i s a B n u c i e u ~(<Pa> = 0.90,<Pa>
= 1.19).
T h i s was r e j e c r e db e c a u s e the m p o f e n f i a i wa3 gomewhat h l g h e t and theaminoacids
on
the C terminus are c o m n i y f o u n da f t e r n helices.
Af u r t h e rp r o b l e m
i~thecrowding
w i t h the p r e u i o u rp r e d i c t i o n .
The r e g i o n a l s o ha5 r i g n i f l c a n f 0 p o f e n f l a l . however t h e B c l e a r l y dominafer.
Ala 74 i s an 0 n Y C l e u I (<P.>
1 . 2 1 , <P > = 1.08);
The o v e r l a p p i n gr e g i o n
Leu 69
however. the B o r e d i c t i o n was baled on t h e exirtence o f a c l e a r B nucIe!$ o u t r i d e t h e
ambigu&sregion.
The o nucleus was not complete, however t h e region c l e a r l y has sfrang a p o t e n t i a l .
T h i lp r e d i c t i o ni s
fenuouS because the two Asp r e s i d u e si n ~ ~ C C e l l i own u i d p r o j e c t out
from bath r i d e r o f B s t r a n d - f o r c i n g one t a a r d r the i n t e r i o r o f the p r o f e s n .
The r e g i o n fromLeu
I 5 0t oL y r
155 i s ambiguousandhas
b o t h u and B p o f e n f r a l .
The G
a r r i g n m n t WaI bared On the strong D n u c l e u ~between Phe 1 4 ) and Glu 148 ((Po>
1.12.
<Pa>
0.94).
One o ft h e
two ~ o o t i g u o u eN - L e r m i n a i p r o l i n e r was assigned t o the h e l i x
and one t o t h eN - t e r m i n a ln o n - h e l i c a l
region _ r e or l e s s a r b i t r a r i l y .
Each o f t h e w p r e d i c t i o n s Ieemr acceptable, b u r t h e f a c t t h a t they a r e i m e d i a f e l y
a d j a c e n tr e n d e r st h ep r e d i c t i o n s
somewhat f e n u o u ~ .
The r e g i o nh a sc o n r i d e r a b l e
G p t e n f i a l b u t no o n u ~ l e u ~ .
The a d j a c e n tA r g 243
L y r 244 renders t h i s p r e d i c t i o n very f e n u o u ~
-
-
-
appear a f t e r theparentpaper.
Note:References1-45
46.
Greene.
nrnhlrn
I." . .
A.A.,
P.J.,Heyneker.
Backman, K . ,
B o l ~ v a r . F . , Rodriguez.
R.L.,
H.L.,
R u s l e l , D . J .T
. aif.
R..
Beflach. H . C . .
and Boyer. H.Y.
Covarrubisr.
11978) N u c l e i cA c i d r
Res.
2, 2373-2180
van de Sande,
Panel, A,,
48.
Marinur. M . G .
49.
H a n i a t i r . 1. 11975) B i o c h e m i s t r y
50.
Bolivar.
51
52.
HcDanell. H.Y., Simon. M.N., and S f u d i e r . F . Y . (1977) I.Mol. n i o l . 110. 119-146
Rogers, S.G., and
Weirs,
8. (1980) Method5 Enzymol. 65, 201-216
51.
Smith.
54.
Hermann, R . . Neugebauer.
J.R..andKleppe,
S.H.,
Zoerwen, P.C.,
H.G., Raae, A . J . L, i l l e h a u g .
47.
(1973) Biochemistry%.
K.
(1971) Mol. Gen. Genet.
x,
Khorana.
5045-5050
47-55
9,
1787-1794
F.. and Backman, K. (1979)
Methods
Enzymol.
A.J.H.
Genet.
u.
Givol.
o.,
56.
Ararnatorio.
57.
Balley,
58.
Greene, P.J..Poonian,
245-267
Enrymol. 65, 560-580
(1980)
Methods
55.
68.
K..
Pirki, E.,
Zentgraf,
H.,
and Schaller.
H.
11980)
w. =.
231-242
and P o r t e r . R . R .
O.K.,
Packer.J..
J.L.(1967)
In
11965) Blochem. J.
2.32c
and Brown, W.E.
(1980)
Anal.
Techniquer -~
i n Protein Chemilfry,
Blochem. 103.
150-158
2nd I d s f i o n , pp.
140-352,
E i r e v i e r ,h 3 t e r d a m
Goodman. H.M.
59.
11975)
Herrhfieid, V.,
~
Bacferiol.
126,
447-451
"
N.S..
Nurrbaum, A.L..
T o h i a s ,L . .G a r f i n .
_?.u.K.z.
237-261
Boyer, H.Y.,
Char. L..
and
H e l i n r k i . 0.
(19761
D.E.,
1.
Boyer, H.Y.,
and