* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Sequence Analysis of the DNA Encoding the Eco RI Endonuclease
Transformation (genetics) wikipedia , lookup
Real-time polymerase chain reaction wikipedia , lookup
Transposable element wikipedia , lookup
Metalloprotein wikipedia , lookup
Molecular ecology wikipedia , lookup
SNP genotyping wikipedia , lookup
Gene expression wikipedia , lookup
Amino acid synthesis wikipedia , lookup
DNA supercoil wikipedia , lookup
Multilocus sequence typing wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Molecular cloning wikipedia , lookup
Zinc finger nuclease wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Bisulfite sequencing wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Genomic library wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Restriction enzyme wikipedia , lookup
Non-coding DNA wikipedia , lookup
Genetic code wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Biosynthesis wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Community fingerprinting wikipedia , lookup
THEJOURNAL OF BIOLOGICAL CHEMISTRY Vol. 25fi.No. 5, Issue of March 10. pp. 2143-2153.19RI Printed Ln U S A . Sequence Analysis of the DNA Encoding the Eco RI Endonuclease and Methylase* (Received for publication, October 22, 1980) Patricia J. GreeneS, Madhu Gupta, and HerbertW. Boyer From the Howard Hughes Medical Institute and the Department of Biochemistry and Biophysics, University of California, San Francisco, Sun Francisco, California 94143 William E. Brown From the Depurtment of Biological Sriences, Curnrgic-Mellon lintrvrstty. Prttrhurgh, Prnnsvfrwniu 15213 John M. Rosenbergg From the Department of Biological Suences, Unioersity of Pittsburgh, Pittsburgh. Pennsyltaniu 1.5260 acid sequences The Eco R1 endonuclease and methylase recognize RI endonuclease and methylase and the amino the same hexanucleotide substrate sequence. We have of both enzymes deduced from the DNA sequence. determined the sequenceof a fragment of DNA which The source of DNA for this analysis is the plasmid, pMB1, encodestheseenzymesusingthechain-termination and derivatives constructedby recombination in zlitro. pMBl method of Sanger (Sanger,F., Nicklen, S., and Coulson, was obtained from the original clinical strain of Escherichia A. R. (1977)h c . Natl. Acad. Sci. U. S. A. 74,5463-5467). coli found to have the Eco RI host specificity. It is closely The amino acid sequences of both enzymes were de- related to ColE1. Both plasmids encode colicin immunity and rived from the DNA sequence. The coding regions seproduction and have identicalreplication properties. pMBl is lected include the only open translational frames of larger by approximately two kilobases, the amount required sufficient length to accommodate the enzymes. They to accommodate the Eco RI genes. Examination of heterocoincide with previously established gene boundaries duplexes of pMBl with ColEl by electron microscopy revealed and orientation. The predicted amino acid sequences no mismatched regionsexcept theEcoRI genes ( 6 ) . The of the purified protein. correlate well with analyses approximate boundariesof the EcoRI genes have been deterComparison of the nucleotide and protein sequences reveals no homology between the endonuclease and mined by subcloning restriction fragments of pMB1, and the methylase which might provide insight into the origin direction of transcription has been shown to be endonuclease to methylase ( 4 ) (Fig. 1). o f the restriction-modification system or the mechaT h e 2234-base pair DNA sequencepresentedhere was nism of common substrate recognition. Based on secdetermined by the dideoxyribonucleotide chain-termination ondarystructure predictions, thetwo enzymesalso method of Sanger (7). Single-stranded template DNA was have grossly different molecular architecture. The base obtained by cloning the Eco RI genes in M13mp5, a single composition of the sequence is 65% A + T, and the that ob- strand bacteriophage vectordeveloped by Messing and hiscocodon usage is significantlydifferentfrom served in several Escherichia coli chromosomal genes. workers (8, 9). In some cases, frequently selected codons are recogIn the accompanying paper, Newman et al. (10) report a nized by minor tRNA species. A spontaneous mutation sequence analysis of the Eco RI genes contained in pMB4. a in the endonuclease gene was isolated. Serine replaces derivative of pMBl which determines ampicillin resistance arginine at residue 187. In crude extracts, Eco RI spe- but not colicin production.These twoplasmids have been cific cleavage is-0.3% wild type. maintained as separate laboratorylines for approximately 10 years (6, 11). TheDNA sequence in the accompanying paper was determined by the method of Maxam and Gilbert (12). The two DNA sequences, obtained in different laboratories TheEco RI restrictionendonucleaseand modification methylase,togetherwiththeirDNAsubstrate, provide a using differentplasmids assources of DNA and different model system for probing the molecular mechanisms of se- methods of DNA sequencing, are identical in both thecoding and noncoding regions. We are, therefore, confident that the quence-specific DNA-protein interactions. The two enzymes (1, nucleotide sequence and the predicted amino acid sequences recognize the same nucleotide sequence d(-GAATTC-) 2) but differ in several aspectsof interaction with the substrate are correct. (see Ref. 3 for a recent review). The endonuclease and methMATERIALS A N D METHODS’ ylase have been purified to homogeneity andx-ray diffraction Figs. 1 and 2 depict the plasmids and phage which carry the E C O ( 4 , 5 ) .We HI genes from pMR1. analysis of the crystalline endonuclease is underway report here the sequenceof the DNA which encodes the Eco . _ _ _ _ _ _ _ _ _ _ ~ _” _________” * The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “aduertisernent” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. Recipient of Grant GM 25729-03 from the National Institutes of Health. 8 Recipient of Grant GM 25671 from the National Institutes of Health. * ’ Portions of this paper (including “Materials and Methods,” Figs. 6 to 8, Tables IV and V, and additional references) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid ofa standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, Md. 20014. Request Document No. $OM-2241, cite author(s), and include a check or money order for $7.60 per set of photocopies. Full size photocopies are also included in themicrofilm edition of the Journal that is availablefrom Waverly Press. 2143 Eco RI: Gene and Protein Sequences 2144 Sal1 SEQUENCING STRATEGY EcoRII ? pPG30 0 Hindm e 0 EcoRlI EcoRI HincII FIG. 1. Eco RI-containing plasmids. The approximate boundaries of the Eco KI genes were determined by subcloning restriction fragments of pMBl (4, 6, 13). The 2300-base pair Eco RII fragment indicated by the inner heaty line contains all of the information necessary for normal expression of both genes. The 1280-base pair PstI-Hind111 fragment indicated by the outer heavy line contains the methylase gene. pPG30 and pPG31 contain the 2300-base pair E ~ C KIIfragmentinserted into the Eco HI site of pBH20 (a pBR322 derivative) (14)in opposite orientations relative to the lac promoter. The arrouw indicate the direction of transcription from the lac, tet, and amp promoters. pPG30 and pPG31 were constructed by filling in the Eco HI ends of the vehicle and theEco RII ends of the fragment using T4 DNA polymerase. The resulting blunt-ended fragmentswere joined using T4 DNA ligase (4). The direction of transcription of the Eco KI genes was determined by measuring the effect of the lac promoter on the specific activity of the Eco RI enzymes in strains containing these plasmids. In strains containing pPG31, the endonuclease and methylase levels are both elevated by growth in glycerol Enand by the addition of isopropyl-1-thio-P-D-galactopyranoside. zyme levels are not affected by these growth conditions in pPG30 or pMB1. ENDO, endonuclease; METH, methylase. J M13-RI recombinant phage were used to prepare template DNA. A HindIII fragment which spans the Eco HI genes was isolated from a partial digest of pPG30 and inserted into the HindIII of site M13mp5 (Fig. 2 4 ) . Isolateswith each of theorientations of thefragment relative to the phage DNA were obtained to provide templates of both strands of the Eco HI genes. Opposite orientations were identified by assessing the ability of phage DNA from two M13-KI isolates to form double-stranded DNA in the region of interest. M13-KI recombinant phageareunstable, eliminating at high frequencya segment of DNA approximately the same size as the insert encoding the Eco RI enzymes. In spite of this problem, we were able to obtain high quality templatefor sequence analysis. The cloning experiments, analysis of therecombinant phage, andtemplatepreparationare described in the miniprint. Primers wereisolated from restriction endonucleasedigests of pMBl and pPG30 DNA as described in the miniprint. A synthetic primer, 5' p-C-C-C-A-G-T-C-A-C-G-A-C-G-T-T-OH3', which hybridizes to M13mp5 adjacent to the cloning site, was a gift from Dr. Roberto Crea (Genentech, Inc.). The hybridization site of this sequence is shown in Fig. 2 A . Restriction sites used to prepare primers and a summary of sequencing experiments are shown in Fig. 3. The absolute orientation of generated sequence was deduced by using primers with different restrictionsites at theends. For example, when the smaller HindIII-Pst I fragment was used as a primer with one template orientation, the reaction mixtures were divided in half and digested with either Pst I or H i n d I I . Readable sequence was obtained withonly oneset of cleavage products.Whensequence could be read following Pst I cleavage, the orientation, designated a, corresponded to the direction of transcription of the Eco RI genes. When sequence could be read following HindIII cleavage, the ori- ( A) EcoRI(Hlnf1) i/ ECORIl Hmdm EcoRII HindIII Tat t Htndm ECORI P51r - ....\. pPG30 ym- +METH ENDO t OP L a c - H l n d Ill -+ az Z t O P Lac M E TEHN D O I +MEEN D O LOCPO4Z TH toc E c oERcI o R I I pMG31 - Hindm ... . Lac P O + PstI HindlII 4 T eMt E N TH DO EcoRI EcoRlI pMG31-6 Hlndlll Hlndm I PstI -,,=,,, v Lac P O -+ + ENDO FIG. 2. Recombinant plasmids and phage carrying the Eco RI genes from pMB1. The distances between restriction sites near the end of the Eco RII fragment from pMBl are exaggerated to show details of gene transfer. The endonuclease and methylase genes are indicated by h e a y lines. Arrows show the direction of transcription HI promoters. DNA originatingfrom from the lac, tet,andEco pBH20, the vehicle used to construct pPG30 and pPG31, is indicated by hatched lines. ENDO, endonuclease; METH, methylase. A , derivation of MIS-RI hybrid phage: M13mp5, designated by w a ~ ylines, contains a HindIII site in the DNA encoding the lacZ a peptide (8, 9). The HindIII fragment from pPGSO, which was inserted into this METH " + Tet site, lacks approximately 40 nucleotides from the methylase end of the Eco HI1 fragment of pMBl and contains 34 nucleotidesfrom pBH20. The two orientations of the Eco RI genes relative to theM13 are designated a and b. B, derivation of pMG31 and pMG31-6: pMG31 is analogous to pPG31 except for deletion of a HindIII fragment of approximately 70 base pairs. pMG31-6 was constructed by isolating the 1685-base pair HindIII fragment from MI3-RI6a R F DNA and inserting it back intopPG31. MI3-RI6a contains a spontaneous mutation in the endonuclease gene marked (V in M13-RI6a and pMG316. 2145 Eco RI: Gene and Protein Sequences A t Hlnd I I I Hlnd JII EcoRIJ Pst I Avo I Eco R I I HindDl 1 II Y II Sau3AI I I FnuD 1 I 342 1 I 1 1 ENDONUCLEASE I I 0 500 1 1171 1205 I 1000 I I 1 METHYLASE I I500 2 0 0 PAIRS 0 BASE FIG. 3. Summary of sequencing strategy. Nucleotide number- above. The Taq I site indicatedby a dotted line overlaps a methylated Mbo I site andis not cleavedby Taq I (15).B , sequencing experiments. ing starts with the Eco RII siteof pMB1. A , restriction sites used to The start of a DNA segment is indicated by 0 and a letter which make primers. The HindIII site at the left marked t occurs only in designatedtherestrictionsite: A , Aua I; F, FnuDII (Tha I); H, pPG30, and the DNA between the HindIII and Eco RII sites origiHindIII; h, H i n f f ;S, Sau3AI (Mbo I); T,Taq I; Syn, synthetic. The nates from pBHZ0 (via pPG30). pMBl DNA was digested with Eco HII, and the 2300-base pair fragment carrying the Eco RI genes was length of the solid line corresponds to the length of sequence read. off prior to electrophoresis. DNA segments isolated. This was further digested with the following enzymes either Long primers were cleaved V. Several primers were read from cleavage points are designated singly or in combination: H i n f f ,Taq I, FnuDII, and Snu3aI. pPG30 A L JI,~and PstI; and the four digested with Exonuclease111; * designates sequence read from these, DNA was triply digested with HindIII, and the dotted line indicates the distance from the 5' end of the fragments which span the Eco RI genes were isolated. These were primer that readable sequence began. used directly as primers or further digested with the enzymes listed entation was designatedb. These assignments were further confirmed dodecyl sulfate gel electrophoresis (16). The size of the methby use of the synthetic primer. ylase predicted from the nucleotide sequence is 38,050 daltons RESULTS AND DISCUSSION We havedeterminedthesequence of a 2234-nucleotide DNA fragment from pMBl by the sequencing strategy outlined. The nucleotide sequence together with the amino acid sequences of the Eco RI endonuclease and methylase are presented in Fig. 4. Most of the DNA sequence was determined onbothtemplates with morethanone primer. In sections where only one primer was available, or only one strand was used as template, reactions were performed more than once with the sameprimer. The sequence in lower case letters represents DNA from the M13 cloning vector and from pBH2O (via pPG3O). The numbering of the nucleotidesequence begins with DNA originating from pMB1. The Amino Acid Sequences Agreement of Predicted Sequenceswith Experimental Obseroations-The methionines selected as startcodons for the endonuclease and methylase precede the only open-reading frames of thecorrectlengthtoaccommodatethesubunit molecular weights of the two proteins. The size of the endonuclease predicted from the nucleotide sequence is 31,065 daltons compared with 29,500 daltons measured by sodium compared with 36,000 daltons measured by sodium dodecyl sulfate gel electrophoresis (16).Other potentialreading frames in either orientation have frequent translational stop codons with no open translational frame starting from an initiation codon specifying a peptide over 49 amino acids. The direction of transcription and translation corresponds to the direction predicted from measuring the effect of the lac promoter on levels of Eco RI endonuclease and methylase in strains containing pPG3O and pPG31. The boundaries of the translated segments fall within the gene boundaries established by subcloning experiments (Fig. 1) (4, 6, 13). Theamino acid sequences derivedfrom the nucleotide sequence correlatewell with available data derived from analyses of the proteins. NH?-terminalsequence analysis by Hsiang' confirms the startpoint selected forthe endonuclease. There is reasonable agreement between previously published amino acid compositions of the endonuclease and methylase and the amino acid compositions predicted from the nucleotide sequence (Table I). Further substantiation of the predicted amino acid sequence of the endonuclease was accornplished by comparing the composition of peptides obtained ' M. Hsiang, personal communication. Eco RI: Gene and Protein Sequences 2146 TTTCGGGTGGmGlTGCCATTTTTACCTGTCT~TGCCGTGATCGCGATGAAC~G~TTAGC~GTGCGTACYTTAAGGGATTATGGT~TCAAACGTATGTTAATCTATCGACATATGT 308 10 S e rP h eA r gT y rA r gA s pS e r TCA TTT CGA TAT AGA GAT AGT ATA AAG ACA I l e L y sL y sT h r 90 Asn S eSr e r I l e L yPsrA o sG p lG y ly AAT TCc AGC A W AAA CCT GAT GGT GG: G 1 U I l e Asn G l u A l aL e uL y sL y s 11.2 A s pp r o ASP ~ e G u l yG l y" h i GAA ATA M T GAA GCT TTA MA AAA ATT GAC CCT GAT CTT GGC GGT ACT TTA 100 I l e V a l G 1 U Val L yAs s p ASP ~ y r ATT GTA GAG AAA GAT GAT TAT GGT 110 ~ l y ~ l vua l val ~ r L~~ p v a l ~1~ GAA TGG AGA GTT GTA CTT GTT GCT GAA GCC ~ e pu h e v a l TTT GTT TCA ser ~1~L~~ ~1~ y 593 CAC CAA 686 12 0 G l y LyS Asp I l e I l e Asn I l e Arg Asn G l y LeuLeu GGT AAA GAT ATT ATA AAT ATA AGG AAT GGT 150 H i s LyS Asn I l e S e r G 1 U I l e A l a AsnPhe :AT AAG AAT ATA TCA GAG A GT C!G 130 140 Val G l yL y sA r gG l y Asp G l nA s pL e u Met Ala la G l y ~ s nla 11e ~ l u S~e r r g T X TTAGTT GGG AAA AGA GGA GAT CAA GAT TTA A X GCT GCT GGT AAT GCT ATC GAA AGA TCT 179 160 Met Leu S e r G l u S e r H i s P h eP r oT y r AAT TTT :X LBO CT(: TCT GAG AGC CAC 210 220 T y rG l y Met P r o I l e Asn S e r AsnLeuCys I l e Asn L y sP h eV a lA s n TAT GFA ATG CCT ATA AAT AGT U T CTA TGT ATT AAC AAA TTT GTA AAT 240 G 1 U T r p ASP S e rL y s AGG GAG TGG FAT TCG A M Phe G i u LST L Gyeh elsru n r TTT GAA CAG CTT ACA TCT AAG 2 50 I l e Met P h eG l u AT A s pA r gL e uT h r Ala A l aA s n GAT CGA CTA ACT GCA GCT AAT 965 His L y sA s p CAT AAA GAC AAA AGC ATT ATG CTA CAA GCA GCA , I l e S e rT h rT h r TlT FAT ATA TCA ACG ACT 10 AlaLeu Leu Lys Asn Thr AGA AAT GCA ACA AAC Met Ala Asn Arg ATG GCT AAG AAC GGA H i s P h eS e rA s p TTC TCT GAT 1435 100 T y rG l uT y r H i s L y sG l u Asn G l yL y s GAA TAT CAT M A GAA AAT GGA AAG TTT TAC TAT 130 I l e AspLeuLeuLysLysSerAsp AAA TCA GAT GfT CTG CTA I l e L y sT y rA s pL y sL y sP h eL e u AAG TAT GAT RAG TTC I l e V a lV a lT h r $TT GTT GTT ACG ATT GCT AAT GTT AAT TCA ATA ACA 200 " A : I l e V a lP r oG l u His GGA TTT ATT GTT CfA GAG CAT 220 G 1 U Ala Arg I l e A s pS e rA s nG l y GAG GCG AGA ATT GAT TCT AAT GGT 2 50 240 P h e G l y Asn G l uS e r S e r T y rP r oL y sT y rA s pA s n Leu P r o Leu T h rA r gL y sT y r P h e I l e Arg His L y sA s p AAA TAT GAT AAT CCA AAT AGT TCA TAT TTT ATT AGG CAT AAA GAC TTG CCT CTT ACA AGA M A TAT GGG GY 1807 2 30 AsnArg I l e I l e S e r P r o AsnAsnCysLeuTrpLeuThrAsnLeuAspVal Y T AGA ATT ATC TCG CCA AAC AAC TGC TTA I*;G CTA ACT TTT 1621 1714 190 2 10 1528 I l e I l e Ala Asn Val Asn S e r I l e T h r T y rL y sG l uV a lP h eA s n Leu I l e L y sG l u Asn L y s I l e T r p Leu G l y V a l His L e uG l yA r gG l yV a lS e rG l yP h e TAT AAA GAG GTG TTT AAT CTA ATT F G GAA AAT ATT TGG CTT YGG GTT CAT CTC GGG AGA GGT GTT TCT Y C CTA GAT GTC 260 TyT Asp Ala I l e Asn V a l TAT G T: GCT ATA AAT GTA 1900 1993 280 2 70 2200 Val P h e LysLeu AAA TTA GTT TTT 170 CTT AT: 180 LY s AAG AA$ 160 Asn P r oP r oP h eS e rL e uP h eA r gG l uT y rL e uA s pG l nL e u AAT CCT CCA TTC TCG TTA TTT AGA GAG TAT CTT GAT CAA CTA ATT Asp I l e P r o Asn L y sT h rL y s GAT ATT CCA AAC AAA tCA AAG 300 L y sG l y Val I l e L y sP h eA r g ATA AAA T l T AGA M G GGT GTT 1249 Asp AsnLeuGlyLeuLysLysLeu I l e Ala S e rC y sT Y K TTT GAT AAT CTT GGC TTG AAA AAG TTA ATA GCA TCT TGC TAT T y rA r gG l u TAC AGA GAG CAC 150 T y r G1u L e uT y rG l yT h r TAT GAA TTA TAT GGT ACT AAA AAA H i s L y s Ala Lys Lys AAG TTA CTG CAC AAA GCT 1342 110 1 20 A s pA s p I l e Ser V a lS e rS e rP h eC y sG l yA s pG l yA s pP h eA r gS e rS e rG l uS e r GAT GAT ATT AGT GTT TCT TCT TGT GGC GfT GGC GAT TTf CGC AGT TCG FAG AGC ATT 140 1151 L y sV a l Val T y rC y s GTT GTT TAT TGC 90 Ala L y s Asn G l yP h eT y r L y sG l uG l yP h e Ser S e rS e rG l uA l a GTA GAG AAT AAA GAA GGT TTT TCT AGT AGC GAA GCC GCG T X 70 TTT AAA TAT TTT GCA GTG AAT Val G l uA s n 1058 40 50 60 Asp Asp P r o Arg Val Ser A s nP h eP h eL y sT y rP h eA l aV a lA s nP h e 80 ACT SeK L e uA r gV a lL e uG l yA r gA s pL e u TCG CTC AGA GTG TTG GGG CGT GAC 30 AGA GTA AGC AAT TTC TCT ATA TAT 2 70 I l e Met PheAsp 1200 TGATATTTTTTATTTTAATAAGGTTTTAATTA 20 230 ~ y Ss e r I l e Met Leu G l n A l a Ala S e r 11e T y r T h r S e rL y sS e r ASP G l uP h eT y r T h rG l nT y rC y s Asp I l e G l u Asn G l u L e uG l nT y r TCG AAA AGC GAC GAA TTT TAC ACT CAG TAT Y T GAT ATT GAG AAC GAA CTG CAA TAC AsnCys AAT TGT GAT GAT CCT 872 260 ATG TTT GAA ATA A% ,"_" A A : l Gu l y S e r Asn P h e ~ e u T h rG l u TTT TTA GGG TCT ACA $AA 190 200 V a l AsnLeu G l u T y r Asn Ser G l y I l e Leu Asn Arg GTT GTT AAT CTT GAG TAT AAT TCT GGT ATA TTA AAT AGG TTA Asn I l e S e r I l e T h rA r gP r oA s pG l yA r gV a l AAT ATC TCA ATA ACA AGA CCA GAT GGA AGG G l nG l yA s pG l yA r g CAA GGA GAT GGF 170 Leu p h e Leu ~ TTT CCT TAC VGTCa l FTT TM: TTA GAG Leu AspTyrAsnGly TTA GAT TAC AAT 290 Val Met G l yV a l P r o I l e T h rP h e GGG GTT ATG GGG GTTCCT TTC ATC ACA 310 A s pG l uL y sA s pL e u GAT GAA AAA GAT TTG Ser I l e A s nG l yL y s TCT ATA C y s P r oT y r CCT TAT Y T GGT AAA TG: Leu H i s Lys PheAsn P r o G l u G l nP h e TTG CAT AAG TTT AAC CCT GAG CAA TTT G l uL e u GAG TTA 32 0 L e uG l n P h e Arg I l e Leu I l e L y s Asn LysArg TTA CAA TTC TTG ATA AAA AAC EGA :GA ATT e - 2080 2179 TAATTGATETTTGTTAG~TTTcTTG$GATCATTAGf?T FIG. 4. Nucleotide sequence of the DNA cloned in M13mp5 loss of onehasepair.Nucleotidenumberingstartswithsequence and amino acid sequences of the Eco R1 endonuclease and originating from pMBl (upper m s e letters). Aminoacidnumbering methylase. Sequence originating from M13mp5 and pBH2O (via starts with the initial methionine of each protein. Homology with the pPG30) is depicted in l o u w case lettem. T h e boundaries are marked 3' end of 16 S ribosomal RNA is indicated with tuaty underlines. with I . Two Eco HI sites precede the HindIII site of M13mp5. The Inverted repeats discussed in the text are underlined with arrou1.s. Kco HI site expected in pPG30 has been converted to a Hinfl site by Eco RI: Gene and Protein Sequences 2147 predicted amino acidsequences. They find t h a t t h e N H r in purified terminal alanine as well as the methionine is absent methylase. Search for SequenceHomologies-Since the endonuclease Methylase Endonuclease and methylase recognize a common substrate sequence, and Predicted Predicted the presence of the endonuclease without a functional methfrom nucleofrom nucleo- Reported,z tide setide seylase is apparently lethal, Boyer et af. (16) postulated that quence" quence" the two enzymes might haveevolved from a commonancestral Ala 15 15.6 IO 9.8 gene and suggested that sequence homology would be found Arg 14 12.9 13 12.0 between the two proteins in regions involved in site recogni23 Asp 61.1 40.0 tion. Subsequently, physical characterization of the enzymes 38 Asn and studies on their reaction mechanisms have revealed excvs 1 1.8 7 6.5 Glu tensive differences in the way the two enzymes recognize the 25.7 26 29.0 Gln common substrate sequence (3,4). Comparison of the nucleoGlY 21 16.8 17 17.8 tide and amino acid sequences of these two enzymes reveals 7 5.6 His 6 6.6 no extensive regions of sequence homology, and it now seems 23 19.5 23 21.9 Ile likely that recognition of the same DNA sequence by these 28 26.0 28.2 Leu 27 two enzymes represents convergent evolution from unrelated LYS 22 21.1 35 30.4 1 0.9 Met 6 4.7 precursors. This is not the only example of two nonhomolo23 29.7 11 16.2 Phe gous proteins recognizing the identical DNA sequence. The 1 0.9 Pro 6 4.7 products of the cI and cro genes of phage h bind to the same 23 29.7 24 18.9 Ser set of operatorsequencesbut show no obvioussequence 9 8.4 10 9.9 Thr homology (17). 2 2 Try' The amino acid sequences of the endonuclease and methTYr 8 7.6 21 17.9 Val 16 15.5 19 16.5 ylase do containa number of common tri- anddipeptides, and Newman et al. (10) have identified a region with five out of " NH,-terminal methionine not included. Reported mole per cent values (16) are multiplied by 276 residues nine homologous amino acids. Solution of the tertiary strucfor the endonuclease and 325 for the methylase, the totals obtained ture will reveal whether any of these have functional homolfrom the predicted sequence excluding the NH!-terminal methionine. ogy. ' Tryptophan was not determined. Secondary Structure Predictions-The translated sequences of the proteinswere analyzed for expected secondary TABLE I1 structure using the method of Chou and Fasman (18, 19). Amino acid composition of cyanogen bromide peptides Newman et af. (10) present similar secondary structure prePeptide positions dictions based on their identical sequences. While their pre138-157 252-255 256-277 Amino acids dictions differ from ours in detail, the generalimpressions gained from both are the same, particularly the substantial differencesbetween the endonuclease and methylase. The 0.1 4 Ala 3.6 results of our analysis are presented in the miniprint (Tables 2 2.5 1 1.7 A% IV and V and Fig. 7 and 8) together with a discussion of the 2.0 2 0.2 3 3.3 ASP differencesbetween the two predictions. Since the current 2 2.1 2 1 1.1 Glu2.0 state of the art of secondary structure prediction is imperfect 1 1.1 1 1.2 0.1 Gly (20, 21), these predictions should be viewed with appropriate 1 1 .o 0.1 His 1 1.0 3 3.1 1 1.0 Ile caution, and are mainly useful in suggesting further experi4 3.9 0.2 0.3 Leu ments such as the location of sites for specific mutagenesis. 1.0 1 1 1.2 LYS With this caveat in mind, we have drawn the following con1 0.5 1 0.7 Met" clusions. 1.a 2 0.8 1 1 1.o Phe 1) The endonuclease and methylase exhibit gross differ3 2 1.8 0.2 3.6 Ser ences in predicted molecular architecture. The former con3 2.7 0.2 Thr 1 1.0 0.1 Val tains significantly more predicted (Y helix than the methylase, Methionine is determined as homoserine in the first two peptides; while the latter contains substantiallymore amino acid residues which are predicted to be neither a helix nor /3 sheet. none is found in the third, indicating it is the COOH-terminal peptide. Specifically, the endonuclease is predicted to be 35% (Y helix after cyanogenbromidecleavage of the proteinwith the and 26%p sheet. Newman et al. (10) predicted 39% and 17% corresponding peptides predictedfrom the nucleotide se- for a helix and p sheet, respectively. There are two experiquence. Table I1 gives the compositions of three peptides mental measurements of a helix and p sheet in the endonuincluding residues 138-157 near themiddle and 252-277 at the clease: Goppelt et af. (22) obtained values of 36% a helix and COOH terminus of the protein. The close match between the 21% /3 sheet, while Newman et af. (10) report 31% and 13%, measured and predicted values confirms the postulate that respectively. There is very good agreement of all these results the peptides predicted from the nucleotide sequence exist in for a helical content. The p sheet values are more variable, the purified protein. These data strengthen our confidence but all values indicate that the endonuclease contains more that the derived amino acid sequences are correct. Hsiang's a helix than /3 sheet, Our prediction for the methylase is 19% NHZ-terminal sequence analysis indicates that the methionine a helix and 30% p sheet; Newman et al. (10) predict 18%and is not present in the purified endonuclease.' Comparison of 27% for a helix and p sheet, respectively. In both predictions, the predicted and measured amino acid compositions suggests the p sheetcontent is greaterthanthe a helix content. that the NH?-terminal methionine is also removed from the Newman et al. (10) report circular dichroism results of 11% (Y methylase. helix and 5% /3 sheet for this enzyme; however, they regard Newman et al. (10) present additional data verifying the these values with caution. Indeed, caution is required in the TABLEI Amino acid composition of the Eco RI endonuclease and methylase ;:1 - i - " 'i} 2148 Eco RI: Protein Gene and Sequences interpretation of all these secondary structure estimations. stacking distance in helical DNA. Even so, the differences between the two molecules appears A notable feature of the methylase sequence is the repeat significant. of the tripeptide,Leu-Ile-Lys, fourtimes beginning at residues 2) Eco RI endonuclease can be divided into two domains of 153,177,294, and 318. The fist pair and the lastpair are each dissimilar architecture. The region from residues 1 to 174 is separated by21 amino acids. Three of the tripeptides fa11 predicted to consist of 44% a helix and 18%p sheet in alter- within predicted a helices while the fourth overlaps a prenating segments; this is a well known motif in protein struc- dicted p sheet strand. It remains a matter of speculation tures, forming the basis of the domainswhich bind nucleotides whetherthispatternhas asignificant relationshiptothe in avariety of oxidation-reductionenzymes(23).Residues function of the methylase. 175-277 are predicted to contain20% a helix and 39%/3 sheet. Additional limited homology in the amino acid sequences Besides the differences in amount of predicted (Y helix and /1 adjacent to the two pairs and underlying genetic homologies sheet, the NH,- and COOH-terminalregions appear different are discussed by Newman et al. (10). in a more subtle sense. The predictions are quite consistentin the NH2-terminal region whereas considerable ambiguity is The Nucleotide Sequence encountered in the COOH-terminalregion; there are competIn comparing the nucleotide sequence derivedfrom the two ing tendencies to predict (x helix and p sheet in the same region. The differences in statistical associations of amino acid templateorientations represented by M13-RI6a and M13RI13b, a perfect match was obtained with two exceptions. residues in these two regions of the molecule support the prediction of two domains, even if details of the predicted One of these proved to be a mutation which will be discussed was foundin the noncoding secondary structure do not prove to be accurate. The predic- later. The other unclear area tion of two structural domainsin the endonuclease is interest- region 168-181. This 14-base pairregion contains 12 G-C pairs ing since thereis considerable data tosuggest that other DNA and the following restriction sites: one Hha I, two FnuDII (Tha I), oneFnu4H1, and one Puu I site which is also a h repressors and recognition proteins, for example, the lac and Sau3AI (MboI) site. Thepresence of at least one FnuDII and CAP, aredivided into two functional domains (17,24). Limited proteolysis and site-directedmutagenesis will be employed to the Hha I and Puu I sites was established by analysis with these enzymes. The Puu I site is at the center of a 10-base further analyze the endonuclease for the presence of different pair sequence with2-fold symmetry. Thiskind of symmetrical functional domains. Both enzymes exhibit clusterings of amino acids which are sequence rich in GC residues can form secondary structure potential points of contact with substrate. The most obvious which leads to compression and poor resolution on sequencing of these are several clusters of basic residues, which might be gels (26) and may also interfere with enzymatic chain elongation. In this case, we were able to read sequence through expected to interactwith phosphate moieties in the backbone the Sau3AI site on both templates. From the appearance of of DNA. Many of these occur in regions predicted to be devoid of secondary structure,i.e. in loops between secondary struc- autoradiographs of the sequencing gels, it was clear that the ture elements. In enzymes withknown structure, similar loops sequence immediately following the site could not be read accurately. Therefore, thesequence presented was established commonly form the active sites. The remaining basic clusters in the Eco RI enzymes can be found in regions predicted to by combining the apparently normal regions from each template. An inverted repeat which involves most of these resiform a helices, primarily at the COOH end of those helices. These clusters could also be playing a structural role since dues is marked in Fig. 4. Nucleotides 170-183 and 220-233 there is a statistical tendencyfor helices to exhibit this charge would pair (in mRNA)with an energy of -35.2 kcal (27). The not distribution (18,19). Both enzymes also containclusters which function of this noncoding region of the sequence is known, are rich in hydroxyls: in the endonuclease, Ser(84)-Asn-Ser- but this is the most impressive free energy change of any of the possible stem and loop structures in this sequence. SerandSer(259)-Thr-Thr-Ser; in themethylase,Ser(85)Base Composition a n d Codon Usage-The DNA sequence Ser-Ser, Tyr(95)-Tyr-Glu-Tyr, Ser(ll2)-Val-Ser-Ser, and shown in Fig. 4 is 65% A + T, significantly higher than thatof Ser( 144)-Ser-Glu-Ser. ColE1 (28) and E . coli (29), the host cell for both pMBl and Internal Homologies in the Amino Acid Sequences-One ColE1. Furthermore, a shift in base composition occurs near of the most striking features of the endonuclease amino acid sequence is an iterative tetrapeptidebeginning at residue 250, residue 260, so that residues 1-260 are 49% A + T, similar to where thesequence reads: Ile-Met-Phe-Glu-Ile-Met-Phe-Asp.ColEl and E. coli. The codon usage in the endonuclease and methylase genes reflects the high A + T content. There is a The DNA which encodes this region is equally iterative with only two base changes, one in the wobble position of the Ile strong preference for codons which end with A or T, and in the case of arginine and leucine, wherethe choice is available, and the other leading to the Glu-Asp substitution. This is suggestive of a duplication event in the history of the endo- a preference is also manifest for those codons which begin nuclease. This octapeptide is predicted to form a strand of p with A or T. This pattern of codon usage is quite different from that seenin a number of E. coli chromosomal genes, and sheet. Furthermore, commencing at residue 230, there is a (Table 111) (30-36). similar tetrapeptide, Ile-Met-Leu-Gln,which is also predicted especially the ribosomalproteingenes Post and Nomura (30) have argued that codon preference in to lie within a strand of p sheet. If this prediction is accurate, to themost the methionines and acidic groupswould project outfrom one the ribosomal protein genes generally corresponds side of the sheet and the hydrophobic isoleucines and phen- abundant tRNAspecies, reflecting the need for efficient transylalanines would project fromthe otherside, probablytowards lation of these proteins. In most cases, wobble pairing allows the interior of the molecule. The residues immediately pre- translation of the Eco RI codons by major tRNA species; ceding both these regions are also similar in character, con- however, exceptions occur. AGA and AGG for arginine and sisting of aspartate, lysine, and serinein different order. With- AUA for isoleucine are frequentlyused codons in the Eco RI out further analysis, the function of the region cannot be sequence which are represented by minor tRNA species and are almost never selected in a number of other E. coli gene assigned; however the occurrence of this periodicity in predicted p sheets is tantalizing sincethe distancebetween amino sequences (Table 111).The frequentuse of codons represented by minor tRNA species may affectthe translationalefficiency acid side chains in /? sheets is 3.4 8, (25), which means that of these genes. Betlach et al. (6) speculated that pMB1 arose the repetitive amino acids occur at multiples of the base- Eco RI: Protein Gene and Sequences 2149 TABLEI11 Codon usage (in frequency per 1000) ENDO CODON Leu 4 3 19 S e r 19 13 10 10 2 13 CGG AGA AGG CUU CUC 8 CUA CUG U UA UUG UCU 22 ucc UCA 2 UCG AGU AGC T h r 28 ACU ACC 3 .ACA .2 ACG Pro CCU ccc 22 CCA CCG Ala GCU GCC 38 GCA GCG 0 27 10 18 7 15 5 33 13 27 2 17 10 10 13 13 23 0 15 3 18 0 10 0 22 3 12 42 5 1 RIBO- +SOMAL' METH COLI M I X 2 1 1 0 4 3 0 61 ENDO CODON +SOMAL METH GGG GUU GUC 38 GUA GUG Lvs AAA I " 25AAG 18 35 23 5 13 28 5 65 30 Val 2 0 8 13 4 70 10 9 9 RIBO- COLI M I X 0 5 43 7 13 I 15 70 24 1 7 His CAU 4 6 5 13 Glu ASD GAA GAG GAU 13 4 Cys UGU a Phe UUU 28 60 19 4 1 5 29 68 12 25 7 i 6 19 15 28 AUC AUA 25 22 30 43 8 22 12 ' See Kef. 30. 'Codon usage for lac1 (31), l a c y (32), trypA (33), recA (34, 35), lipoprotein (36), and part of KNAP (30). by translocation of a segment of DNA into a commonancestor TGAAAAATGGCAGGTTCGGTGGATTTTGACGGGCTAATGTGGTCTGCACC 113 here of ColE1. The features of the DNAsequencenoted TGTGGTCTGCACCATCTGGTTGCATAGGTATTCATACGGTTAAAATTTAT 150 suggest that the Eco RI genes may originate from a species CTGCACCATCTGGTTGCATAGGTATTCATACGGTTAAAATTTATCAGGCG 156 whose DNA has high A + T content. We are extending our GCCGTGATCGCGATGAACGCGTTTTAGCGGTGCGTACAATTAAGGGATTA 256 sequence analysis of pMBl and comparing it to ColEl DNA GAACGCGTTTTAGCGGTGCGTACAATTAAGGGATTATGGTAAATCAAACG 270 sequence in order to determine the boundaries of nonhomolTGCGTACAATTAAGGGATTATGGTAAATCAAACGTATGTTAATCTATCGA 286 ogy. ACAATTAAGGGATTATGGTAAATCAAACGTATGTTAATCTATCGACATAT 291 Transcriptional and TranslationalSignals-We have preGTAAATCAAACGTATGTTAATCTATCGACATATGTAACTTTATAAAATAA 308 viously suggested that the endonuclease and methylase are AACGTATGTTAATCTATCGACATATGTAACTTTATAAAATAACAGTGGAA 316 coordinately controlled. The level of both enzymes is altered by conditions which affect the lac control system when the - 35 - 10 +1 endonuclease is adjacent to thelac promoter (pPG31) but not aAa-t"-'TTGACa _____ - t - - T_ A t A_ AT-_ - - - - C.A _ T-- _ _ _ t g 9 when the orientation is reversed (pPGSO), and the magnitude FIG. 5. Possible promoter sequences. The bottom line shows of the effect is similar for both enzymes ( 4 ) . Accordingly, the DNA sequence preceding the endonuclease initiation codon homologous regions of published promoter sequences (37, 38). Variwas examined for sequences homologous to known promoters. ation in spacing between the Pribnow sequence (-10) and the -35 region in wild type promoters is two more or one less than shown; Severalpartial homologies occur, but none is convincing variation in spacing between the Pribnow sequence and the messenger enough to select a promoter by inspection of the sequence initiation site (+ 1) is f 2 . Capital letters designate bases which occur alone. The potential promoter sequences most distant from in 61% or more of the sequences, small letters designate frequency in DNA between 46R and 61%. Two smallletters designate a combined the initiation of translation are probably located . number following each of the possible prohomologous to ColEl, but this does not automatically exclude frequency of ~ 7 0 % The moter sequences indicates the position of the first nucleotide of the them as possible choices. The potential promoters (Fig. 5) Pribnow box, designated *. The sequences at positions 156, 256, 270, include five sequences that match reported Pribnow boxes 286, and 316 match published Pribnow boxes. Thoseat positions 113, and four more with the three most conserved bases (seeRefs. 150,291, and 308 match the three most frequently occurring bases 37 and 38 for recent reviews). Of these, only the Pribnowbox TA---T. beginning at position 156 has significant homology with the -35 region, and, in this case, the spacing of two homologous residue 200, but the accompanying Pribnow box would have in any published regions is one nucleotide lessthan that found to be an uncharacteristic -TGCCGT- or -CGTGAT-. wild type promoter sequence. There is a sequence with very The mode of regulation of the Eco RI genes has not been good homology to the consensus -35 region beginning a t elucidated. The restriction-modificationsystemprovidesa 2150 Eco RI: Protein Gene and barrier to foreign DNA entering a cell containing both enzymes, but when a plasmid carrying the Eco RI genes enters a host cell with unmodified DNA, hostDNA is restricted with low efficiency. The order of translation of the genes would appear to present a source of difficulty for such a plasmid. However, since the endonuclease requires at least two subunits while the methylase is active as a monomer (3), the methylase is functional first in spite of the translational order. The earlierpresence of active methylase achieved by the subunit structureof the enzymes may be sufficient to account for the survival of the genes. Alternatively, there may be a separate promoter for the methylase. When different fragments (HindIII, HindIII-PstI, andBgl 11-HincII), which contain the methylase but not the endonuclease gene, are cloned in avariety of sites and orientations, the resultinghybrid plasmids always conferthe Eco RI modification phenotype on the host cell (4, 6, 13). Methylase specific activity was measured in one of these clones and found to be 5% of normal (6). Early expression of this amountof enzyme could be functionally important for the restriction-modification system. We are currently examining promoter activity in the regions preceding both methylase and the endonuclease genes. The DNA sequence following each of the structural genes was examined for RNA polymerasechain-termination signals (38). A series of T residues occurs following both the endonuclease andmethylasestop codons. Neither T series is preceded by a GC-rich region as is characteristic of some transcription-termination sequences. The sequence near the end of themethylase gene has two hyphenatedinverted repeats which could form secondary structure in the message preceding the series of Ts. Unlike other messenger-termination sequences, these overlap the coding sequence. The possibility remains that the true termination sequence occurs beyond the sequence presented here. Homology with the 3' end of 16 S rRNA sequence (39) is indicated in Fig. 4 for both genes. The trinucleotide -GGAwhich occurs twice, 6and 14 base pairsbefore the endonuclease ATG, and the pentanucleotide-TAAGG- which occurs 14 nucleotides before the methylase ATG, are likely ribosomebinding sites. It hasbeen suggsted that the termination codons UAA and UGA may also be part of the recognition signal for ribosomal initiation (40). One UAA codon begins at position 320 preceding the endonuclease Shine-Dalgarno sequence. Four UAA codons occur in the region of the methylase initiation site. The sequence of the intercistronic region is over 90% A + T; in the coding strand, 19 of 32 nucleotides are Ts. The presence of stems and loops in intercistronic regions with the ribosome-binding site exposed in the loops has been noted in a number of sequences (32,41). Residues1144-1155 and 12211232 could pair to forma stem andloop structure (AG = -16.8 kcal) (27). In scanning the entire sequence for possible secondary structure, however, the low potential for the intercistronic region to participate in any stem is more impressive than the presence of the stem andloop noted. A Spontaneous Mutation in the EcoRI Endonuclease Gene The sequences obtained from the two templates, M13-RI6a and M13-R113b, differed a t residue 902. The M13-RI6a sequence was GCG and the complementarysequence obtained from M13-RI13b was CCC. Autoradiographs of sequencing gels from each templatewere clear and unambiguous. Therefore, template DNAwas preparedfrom two other clones, M13-RI3a and M13-R122b, representing independent isolates of the two templates orientations. The sequences read from both of these terndates match the sequence obtained from M13-RI13b. M13-Rf6a contains a spo~taneousmutation in Sequences the Eco RI endonuclease gene which replaces the arginine at residue 187 with aserine. Endonuclease activity was measured in crude extracts of strain 71-18 infected with M13-RI6a and M13-RI3a. Specific cleavage at Eco RI sites was undetectable in M13-RI6a/71-18 under assay conditions in which about 1% of the activity of M13-RI3a/71-18 would have been detectable. Extracts of the two strainscontained comparable amounts of Eco RI methylase. The specific activity of the Eco RIenzymes is lower in the M13-RI strains than in plasmid-containing strains. Furthermore we have been unable to measure Eco RI restriction of h phage in the M13-RI strains even whena substantial amount of Eco RI endonuclease activity is measurable in vitro. Therefore, we transferred the mutant sequence back into a plasmid derivative of pPG31 for further study of the mutant endonuclease. Two plasmids, pMG31 and pMG31-6 (Fig. 2 8 ) , were constructed. pMG31-6 contains the 1685-base pair Hind111 fragmentcarryingthemutantsequence from M13-R16a, and pMG31 contains the analogous fragment from pPG31. Both plasmids differ from pPG31 (Figs. 1 and 2B) in that the small Hind111 fragment which spans one of the Eco RI-Eco RII junctions hasbeen deleted. The activity of the Eco RI enzymes in E. coli strain 294 (42) carrying pMG31 or pMG31-6 was measured after growth in minimal medium containing glycerol + isopropyl-1-thio$D-galactopyranoside. These growth conditions yield the highest specific activity of the Eco RIenzymes in pPG31/294 (4). The specific activity of the Eco RI methylase in the two strains is approximately equivalent, 860 and 1090 units/mg of protein, in pMG31 and pMG31-6, respectively. Endonuclease activity is measured in relative numbers because of the nature of the assayused. We could not use quantitative assayswhich rely on measuring conversion of circular to linear forms in plasmids with one Eco RI site (43,44), or on methodswhich quantitate cleavage by measuring the increase in 5' termini (45),since these methods do not distinguish Eco RI specific cuts from nonspecific cuts. Eco RI endonuclease activity is high enough in normal strains that nonspecific background cleavage is not a problem. However, to assess activity in the mutant, we had to rely on the appearance of specific DNA fragments. Quite good relativevaluescan be assigned by comparing different times of digestion and different dilutions of the extracts. The mutant endonuclease clearly cleavesDNA a t Eco RI sites. The level of enzyme activity is 0.3% of the level observedin pMG31. Furthercharacterization of the mutant endonuclease is underway. Achnoulledgments-We thank Patricia Clausen for preparing the manuscript and Sonja Bock for editing the sequence in the computer to produce Fig. 4. We also thank Keith Backman and Jon Lawrie for comments on the manuscript and Dr. Paul Modrich for interesting and useful discussions of results prior to publication. REFERENCES 1. Hedgpeth, J., Goodman, H. M., and Boyer, H. W. (1972) Proc. Natl. Acad. Sci. U. S. A . 69, 3448-3452 2. Dugaiczyk, A,, Hedgpeth, J., Boyer, H. W., and Goodman, H. M. (1974) BiochemistQ 13,503-512 3. Modrich, P. (1979) Q.Reu. Biophys. 12,315-369 4. Rosenberg, J. M., Boyer, H. W., and Greene, P. J. (1980) in The Site Specific Restriction Endonucleases(Chirikjian, J. G., ed), Elsevier-North Holland, in press 5. Rosenberg, J. M., Dickerson, R. E., Greene, P. J., and Boyer, H. W. (1978) J. Mol. Biol. 122,241-245 6. Betlach, M., Hershfield, V., Chow, L., Brown, W., Goodman, H. M., and Boyer, H. W. (1976) Fed. Proc. 35, 2037-2043 7. Sanger, F., Nicklen, S., and Coulson, A. R.(1977) Proc. Natl. Acad. ScL. U. S. A. 74, 5463-5467 A-_MPacinrr €3.. Muller-Hill. B.. and Hofschneider. D, .I ", Gronenborn. " ~ . Eco RI: Protein Gene and P. H. (1977) Proc. Natl. Acad. Sci. U. S. A . 74, 3642-3646 9. Gronenborn, B., and Messing, J . (1978) Nature 272, 375-377 10. Newman, A. K., Rubin, R. A,, Kim, S.-H., and Modrich, P. (1981) J.Biol. Chem. 256,2131-2137 11. Goodman, H. M., Greene, P. J., Garfin, D. E., and Boyer, H. W. (1977) in Nucleic Acid-Protein Recognition(Vogel, H. J., ed), pp. 239-259, Academic Press, New York 12. Maxam. A. M., and Gilbert, W. (1977) Proc. Nu/[. Acad.Sei. U. S. A . 74, 560-564 13. Bolivar,F.,Rodriguez, K. L., Greene,P.J.,Betlach,M. C., Heyneker, H. L., Boyer. H. W., Crosa, J. H., and Falkow. S. (1977) Gene 2.95-113 Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F., and Boyer, H. W. (1977) Science 198, 1056-1063 15. Backman, K. (1980) Gene 11, 169-171 16. Boyer,H. W., Greene,P. J., Meagher,R. B., Betlach,M. C., Russel, D., and Goodman, H. M. (1975) Fed. Eur. Biochem. SOC. Symp. (Budapest) 34, 23-37 17. Ptashne, M., Jeffrey, A., Johnson, A. D., Maurer, R.. Meyer. B. J., Pabo, C.O., Roberts, T. M., and Sauer, R. T.(1980) Cell 19, 14. 1-11 18. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13,211-222 19. Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13,222-245 20. Schulz, G. E., Barry, C. D., Friedman, J., Fasman, G. D., Chou, P. Y., Finkelstein, A. V., Lim, V. I., Ptifsyn, 0. B., Kabat, E. A,, Wu, T. T., Levitt,M.,Robson,B.,andNagano, K. (1974) Nature 250, 140-142 21. Matthews, B. W. (1975) Biochim. Biophys. Acta 405,442-451 22. Goppelt, M., Pingoud, A,, Maas, G., Moyer, H., Koster, H., and Frank. H. (1980) Eur. J . Biochem. 104. 101-107 23. Rossman, M. G., Liljas, A,, Branden, C.'I., and Banaszak, L. J . (1975) in The Enzymes (Boyer, P. D., ed) Vol. XI, pp. 61-102, Academic Press, New York Sequences 2151 0. C., Crothers, D. M., and Grolla, J. (1973) Nature New Biol 246,40-4 1 28. Falkow, S. (1975) in Infectious Multtple Drug Resis/ance (Lag nado, J. R., ed), pp. 98-143, Pion Limited, London 29. Shapiro, H. S. (1970) CRC Handbook of Biochemistry Selecteu Data For Molecular Biology (Sober, H. A., ed) 2nd Ed, pp H80-H88, The Chemical Rubber Co., Cleveland 30. Post, L. E., and Nomura, M. (1980)J . Biol. Chem. 255,4660-466f 31. Farabaugh, P. J. (1978) Nature 274, 765-769 32. Buchel, D. E., Gronenborn, B., and Muller-Hill,B. (1980) Naturt 283, 541-545 33. Nichols, B. P., and Yanofsky, C. (1979) Proc. Natl. Acad. Sei.Zi S. A . 76,5244-5248 34. Horii, T., Ogawa, T., and Ogawa, H.(1980) Proc. Nut[.Acad. SCL U. S. A . 77,313-317 35. Sancar, A,, Stachelek, C., Konigsberg, W., and Hupp,W. D. (1980: Proc. Natl. Acad.Sci. U. S. A . 77, 2611-2615 I. L., Takeishi,K., and Inouye 36. Nakamura, K., Pirtle, R. M., Pirtle, M. (1980) J. BioL Chem. 255, 210-216 37. Siebenlist, U., Simpson, R. B., and Gilbert, W. (1980) Cell 20, 269-281 38. Rosenberg, M., and Court, D. (1979) Annu. Reu. Genet. 13, 319353 39. Shine, J., and Dalgarno. L. (1974) Proc. Nutl. Acad.Sei. U. S. A 71, 1342-1346 40. Atkins. d . F. (1979) Nucleic Acids Res. 7, 1035-1041 41. Selker, E., and Yanofksy, C. (1979) J . Mol. Biol. 130, 135-143 W. (1976) Proc. Natl 42. Backman. K., Ptashne, M., and Gilbert, Acad. Sci. U. S. A . 73,4174-4178 43. Greene, P. J., Betlach. M. C., Boyer, H. W., and Goodman, H. M (1974) in DNA Replication (Methods in Moleculur B i o l o ~ g ~ ] (Wickner, R. B., ed) Vol. 7, pp. 87-111, Marcel Dekker, Inc. 24. Ogata, R. T., and Gilbert, W. (1979) J . Mol. Biol. 132, 709-728 New York 25. Church, G . M., Sussman, J . L., and Kim, S.-H. (1977) Proc. Natl. 44. Lackey, D., and Linn, S. (1980) Methods h'nzymol. 65, 26-28 Acad. Sci. U. S. A . 74, 1458-1462 45. Berkner. K . L., and Folk. W. H. (1980) Method,%Enrvmol.65, 2826. Sanger, F., and Coulson, A. H. (1978) FEBS Le/t. 87, 107-110 36 are found on p. 2153 27. Tinoco, I., Borer, P. N., Dengler, D., Levine, M. D., Uhlenbeck, Additional references 2152 Eco RI: Gene and Protein Sequences 44 S E C O W M R I SIWCTURE PREOiCllOW Eco RI: Gene and Protein Sequences 2153 TABLE iV &RI Secondary S t r u c f ~ r eP r e d i c t i o n s f o r Number oPf r e d i c t e d residues struc "B't u r e Region Le" 10 Val20 Ala 28 Gln5752 Glu 65 G l Y 79 I l e 94 6 - - - d l u 103 I l e 119 Asp GI" I 8 i33 Glu 152 Tyr 165 I l e 179 Arg 187 Leu 201 Tyr 209 L y r 221 l i e 210 I l e 250 Leu 261 Leu 270 - Phe 24 Ser 47 Tyr I l e 7) Val 83 Asp6 99 - - Gln Ile Gly Lyr i.19 B 0 1.17 0.97 B 0.97 1.19 0.99 1.11 D 1.21 B a 191 207 21) 226 240 258 267 277 B 0.89 B B I AAA+ A. 2 B 1.11 C 5 B 0.88 1.21 A- 1.01 1.21 1.04 CC C B 5 B 7 5 6 G I1 1.17 a 0.79 o. i.06 1.09 5 B B B 8 0 9 i.11 1.01 1.15 3, 5 4, 5 B- 5 B 5 1.090 0.95 a I c 1.12 1.00 1.16 0.92 1.22 1.15 1.11 7 12 18) 1.14 1.24 D Conmenf Grade 1.16 0.96 5 IJ I1 129 119 161 "a' 20 9 5 G I " 115 Gly Ala Phe 169 Leu Arg Leu Ala Ile Lyr 0.95 o 9 Endonuclease n 5 A 1.20 0.96 1.25 6 7 9 IO c+ 1.11 C 1.16 0.98 A io, I1 12 B Conmenfs on T a b l eI v . . 1. I 2 4. 5 6 7 There tw elements o f recoodary Structure are very clear i nt h e i ri n d l v i d u a ip r e d i c flonr b u ta r ec r a d e dt o g e t h e r . T h i sl o n g h e l i x c o u l d be two h e h c e r , k i n k e d i n t h em l d d l e a t Gly 36. P r e d i c t i o n i s v e r y ambiguoushere ~ i n c e<PG>=<PB>. The h e l i x was p r e f e r r e db e c a u s e o f t h e r l i g h f l y h i g h e r p r o b a b i l i t y and because of the presenceof G l u 96. a r e s i d u e r a r e l yf o u n di n B rheefr. There i s a strong tendency towaTds B i n the m i d d l eo ft h i s region. The a h e l i x wa3 choren becaure o f the n u c l e u so u t s i d e the ambiguousreqioo. i . ~ .r e s i d u e s i i 0 to 115. and becaure o f 1 f a f l 5 f i c d l p m f e r e n c e r :V a la n d Leu a r e very o f t e nf o u n d near the center o f ~1 h e l i c e s (there are r e s p o n s i b l ef o rt h ea m b i g u i t y ) , G l u i s a good N - t e r m i n a l r e r i d u e . and L y r . H i s and G l n are the t h r e e mlt Cornon C-ferrnlnai Y h e l i c a lr e s i d u e s . The secondary ~ f r u ~ f ~element5 r e i n t h e r er e o i o n r e r e a i i f f l e crowded. # . e t h e 1 0 0 ~ 1 are too r h o r f . However, 8 f i s not c l e a r w h i i hp r e d l c f i o n ss h o u l d be madsfled fa r e i l e v e the c r a d i n g , hencethey remain w i t h t h i s c a v e a t . The o v e r l a p p i n gr e g i o n Val166 G l u 177 also ha$ h i g h u pofenflal i<Pm> I I1 and <Pa> = 0 . 9 8 ) .T h i sZ o r e i b i l i f Y was not acceDfedbecause there was no o n u c l e a t i o n center outride the arnbiguour r e g l o n (note that the p r e c e d i n gh e l i x cannot propagate t h r o u g hP r o1 6 4 ) , the o v e r a l l B p o f e n f # a l was much higherandtheboundartes were c l e a r l y Chore o f a B sheet. The cloxe p r o x i m i t y t o t h ep r e c e d i n g h e l i x i s a f u r t h e r problemhere. This prediction i s extremely ambiguous ~ i n c et h e regson Arg 187 t o G l u 192 ~ i v e r <Pa> 1 . 1 1 . <Pg> = 1.07. The method i s s a y i n g t h e r e i s s e c o n d a r ys t r u c t u r e here, b u t cannot d i s c r i m i n a t ew h i c hk i n d . The B was chosen because i f has the h i g h e s tp r o b a b i l - - - &I. The p r e d l c f e dr e c o n d a r y ifrucfure o f EcoRl endonuclease i sr e p r e s e n t e dr c h e m a t i c a l l y : Amlna acid r e r i d u e ir e p r e s e n t e d by a d l x a r e p a r t of a p r e d i c t e d Y h e l i x ,t h o s e represented by 8rregular (folded)hexagons are p a r t o f a r f r a n do f B sheet, while r e s i d u e s which a r e n o t a p a r t o f a d e f i n e d secondary s.frucfuie are represented a s ~ i r c l e s . The r h a d l n g scheme reprelentsthechernlcalcharacterof the amino a c i dr e s i d u e :H e a v i l y shaded r e r l d u e r are hydrophobic. hence usually found i n t h e i n t e r i o r o f a p r ~ f e l n( , . e . , Phe. Trp, T y r , l l e , Leu. Her and V a l ) . Charged residues, which are " I u a l i y found ov t h e e x t e r i o ro f a r c b l a n k .e x c e p t forthe ~ i g no f t h ec h a r g e( i . e . . G l u and Asp are minus Lyr t h eP r o t e i n andArg a r e p l u s w h i l eH i s i s blank). Thore r e s i d u e sw h i c h a w m m o n l v found i n u s r l e d - i t y score. 8 9 10 /I I2 T h i sr e g i o n has a more d i f f u s e o n u c l e u ~than u s u a l l y r e q u i r c d ,b u t s f c l e a r l y has h i g h 0 p o t e n t i a l( c o m p o s i t i o n b i z ) . Hence the p r e d i c t i o n ,b u tw i t h a low grade. There a r e fw p r o b l e mh e r e . one 8 1 the presence ofPro211,which i s rare in B rheefr, t h e ofher i s the crowding w i t h the previous p r e d l c f i o n . Hence the low grade. an OL nucleus i n t h e The r e g i o nf r o m 221 t o 240 c o n t a i n e d many a m b i g u i f i e r .T h e r ei s 221to226 region and a B n u e i e s s hefween 237 and 240. Both c o u l d propagate i n t o t h e middle. The h e l i xp r o p a g a t i o n was stopped a t Lys 226 becaulethe r e g i o n f r m Arn 224 to Ser 228 i s a ~ O n f l n u o ~sst r e t c h o f s i x h y d r o p h i l i c r e l l d u e r , f o u r o f w h i c h am one t u r n o fh e l i xw o u l d be ronfincharged, i . e . i t was c o n r i d e r e du n h k e l yt h a to v e r u o u ~ l ycharged. The regmnf I l e 230 t o Ala215 c o n t a r n 3 b o t h Y and B n u c l e < . b u t t h e l a t t e r i s $ f a t # $ tically dominant (<Pa> = 1.23, <PO> 1.14). There 8 1 d l 3 0 the p r e v i o u s l y noted B n ~ c l e u1217-240). ~ The boundaryresidues are much _ r e conrirfenf W i t h a B sheet. Her 251and Phe 256(<Pa> The region has b o t h a and B p o t e n t i a l .e i p e c i a l l yb e t w e e n i.20 <P > = 1.29). However, B c l e a r l y dominafer the I f a f l ~ f i C I . The region I l e 254 to /;e2%has an unambiguous B n u ~ l e u s(<Po> * 1.06, <Pg> 1 . 1 9 ) . The boundary r e s i d u e sa r e c l e a r l y t h o s eo f a 8 sheet. Hence t h i s p r e d i c t i o n . - - - TABLE V Secondary S t r u c t u r e P r e d i c t i o n s f o r Re4 i on 2 7 Le" Phe 6 lie 41 Val 62 Phe Leu Phe Aen Leu Ser Asp Pro Phe Glu 5 Ile Val Pro Cyr Leu 276 Val Pro Tyr Number of rerlduer 9 - t y r 17 21 - Cy$ 26 28 - Gln 31 - Cyr 48 56 - Val 72 - Val 78 84 - 90 Ala 92 - Tyr 96 106 I l e Ill 127 - Leu 13i 135 5 139 Thr 142 L y r 155 160 Ala 164 I71 Glu 180 187 181 Val i91 - Val 198 199 - Leu 204 224 - I l e 214 241 - Phe 246 - Leu 284 289 - Lyr 299 314 - 6I l e 119 structure 9 6 e 0.87 B B 1.25 0 6 7 7 5 - k R 1 Methylase Predicted B0.72 B B 1.37 8 6 5 i4 5 B1.13 8 B B 6 6 I1 6 9 "m> "B> 1.16 0.87 0.85 1.22 0.86 1.09 1.01 1.21 8A 1.11 A 1.21 C C 1 1.13 0.81 c 5 1.09 1.23 B C- 1.09 1.02 1.11 1.06 1.06 1.01 1.18 I. i 8 1.03 0.95 0.96 1.29 1.14 1.29 0.71 1.21 B 1.10 B 6 1.07 0.96 0.99 1.12 1.10 0.92 8 0.98 I 18 I1 Grade Commnf I c 4 6 C A C 00B888CB AA 7 8 8 9 10 C o m n f 3 on Table v I. The p r e d i c t i o n i s quite c o n s i s t e n t , b u t the confinuoui s t r e t c h o f h y d r o p h i l i c and c h a r g e dr e r i d u e r .w h i c hw o u l d wrap around the e n t i r e c i r c u m f e r e n c eo ft h e helix. i s a 2. 1. 5. 6. 7. 8. 9. 10. The o v e r l a p p i n gr e g i o n Leu 12 - Arg 36 i s a B n u c i e u ~(<Pa> = 0.90,<Pa> = 1.19). T h i s was r e j e c r e db e c a u s e the m p o f e n f i a i wa3 gomewhat h l g h e t and theaminoacids on the C terminus are c o m n i y f o u n da f t e r n helices. Af u r t h e rp r o b l e m i~thecrowding w i t h the p r e u i o u rp r e d i c t i o n . The r e g i o n a l s o ha5 r i g n i f l c a n f 0 p o f e n f l a l . however t h e B c l e a r l y dominafer. Ala 74 i s an 0 n Y C l e u I (<P.> 1 . 2 1 , <P > = 1.08); The o v e r l a p p i n gr e g i o n Leu 69 however. the B o r e d i c t i o n was baled on t h e exirtence o f a c l e a r B nucIe!$ o u t r i d e t h e ambigu&sregion. The o nucleus was not complete, however t h e region c l e a r l y has sfrang a p o t e n t i a l . T h i lp r e d i c t i o ni s fenuouS because the two Asp r e s i d u e si n ~ ~ C C e l l i own u i d p r o j e c t out from bath r i d e r o f B s t r a n d - f o r c i n g one t a a r d r the i n t e r i o r o f the p r o f e s n . The r e g i o n fromLeu I 5 0t oL y r 155 i s ambiguousandhas b o t h u and B p o f e n f r a l . The G a r r i g n m n t WaI bared On the strong D n u c l e u ~between Phe 1 4 ) and Glu 148 ((Po> 1.12. <Pa> 0.94). One o ft h e two ~ o o t i g u o u eN - L e r m i n a i p r o l i n e r was assigned t o the h e l i x and one t o t h eN - t e r m i n a ln o n - h e l i c a l region _ r e or l e s s a r b i t r a r i l y . Each o f t h e w p r e d i c t i o n s Ieemr acceptable, b u r t h e f a c t t h a t they a r e i m e d i a f e l y a d j a c e n tr e n d e r st h ep r e d i c t i o n s somewhat f e n u o u ~ . The r e g i o nh a sc o n r i d e r a b l e G p t e n f i a l b u t no o n u ~ l e u ~ . The a d j a c e n tA r g 243 L y r 244 renders t h i s p r e d i c t i o n very f e n u o u ~ - - - appear a f t e r theparentpaper. Note:References1-45 46. Greene. nrnhlrn I." . . A.A., P.J.,Heyneker. Backman, K . , B o l ~ v a r . F . , Rodriguez. R.L., H.L., R u s l e l , D . J .T . aif. R.. Beflach. H . C . . and Boyer. H.Y. Covarrubisr. 11978) N u c l e i cA c i d r Res. 2, 2373-2180 van de Sande, Panel, A,, 48. Marinur. M . G . 49. H a n i a t i r . 1. 11975) B i o c h e m i s t r y 50. Bolivar. 51 52. HcDanell. H.Y., Simon. M.N., and S f u d i e r . F . Y . (1977) I.Mol. n i o l . 110. 119-146 Rogers, S.G., and Weirs, 8. (1980) Method5 Enzymol. 65, 201-216 51. Smith. 54. Hermann, R . . Neugebauer. J.R..andKleppe, S.H., Zoerwen, P.C., H.G., Raae, A . J . L, i l l e h a u g . 47. (1973) Biochemistry%. K. (1971) Mol. Gen. Genet. x, Khorana. 5045-5050 47-55 9, 1787-1794 F.. and Backman, K. (1979) Methods Enzymol. A.J.H. Genet. u. Givol. o., 56. Ararnatorio. 57. Balley, 58. Greene, P.J..Poonian, 245-267 Enrymol. 65, 560-580 (1980) Methods 55. 68. K.. Pirki, E., Zentgraf, H., and Schaller. H. 11980) w. =. 231-242 and P o r t e r . R . R . O.K., Packer.J.. J.L.(1967) In 11965) Blochem. J. 2.32c and Brown, W.E. (1980) Anal. Techniquer -~ i n Protein Chemilfry, Blochem. 103. 150-158 2nd I d s f i o n , pp. 140-352, E i r e v i e r ,h 3 t e r d a m Goodman. H.M. 59. 11975) Herrhfieid, V., ~ Bacferiol. 126, 447-451 " N.S.. Nurrbaum, A.L.. T o h i a s ,L . .G a r f i n . _?.u.K.z. 237-261 Boyer, H.Y., Char. L.. and H e l i n r k i . 0. (19761 D.E., 1. Boyer, H.Y., and