Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
doi:10.1006/jmbi.2000.3844 available online at http://www.idealibrary.com on J. Mol. Biol. (2000) 300, 353±362 A TOPRIM Domain in the Crystal Structure of the Catalytic Core of Escherichia coli Primase Confirms a Structural Link to DNA Topoisomerases Marjetka Podobnik1,3, Peter McInerney2, Mike O'Donnell2 and John Kuriyan1* 1 Department of Biochemistry and Molecular Biology, Jozef Stefan Institute, Slovenia Primases synthesize short RNA strands on single-stranded DNA templates, thereby generating the hybrid duplexes required for the initiation of synthesis by DNA polymerases. We present the crystal structure of the catalytic unit of a primase enzyme, that of a 320 residue fragment of Ê resolution. Central to the catEscherichia coli primase, determined at 2.9 A alytic unit is a TOPRIM domain that is strikingly similar in its structure to that of corresponding domains in DNA topoisomerases, but is unrelated to the catalytic centers of other DNA or RNA polymerases. The catalytic domain of primase is crescent-shaped, and the concave face of the crescent is predicted to accommodate about 10 base-pairs of RNA-DNA duplex in a loose interaction, thereby limiting processivity. *Corresponding author Keywords: DNA replication; RNA-polymerase; primase; TOPRIM; topoisomerase Laboratories of Molecular Biophysics and 2 Laboratory of DNA Replication, Howard Hughes Medical Institute, The Rockefeller University, New York, NY 10021, USA 3 # 2000 Academic Press Introduction None of the known DNA polymerases is able to initiate DNA synthesis by using single-stranded DNA as templates. Instead, DNA replication relies on a variety of mechanisms to generate short primer regions, typically RNA-DNA hybrid duplexes, that then serve as initiation sites for DNA synthesis by DNA polymerases. During chromosomal replication, in particular, the repeated generation of RNA primers required for the replication of lagging strands, is brought about by DNA-dependent RNA polymerases, known as primases. The prototypical bacterial primase is the enzyme encoded by the dnaG gene of Escherichia coli (Kornberg & Baker, 1991). All known bacterial primases, as well as primases from several bacteriophages, are homologous to E. coli primase. Referred to collectively as DnaG-type primases, these are often closely associated with DNA helicases in mobile assemblies known as primosomes, which track the moving replication fork (McMacken & Kornberg, 1978). E. coli primase is a Abbreviations used: MAD, multiwavelength anomalous dispersion; TEV, tobacco etch virus; NCS, non-crystallographic symmetry. E-mail address of the corresponding author: [email protected] 0022-2836/00/020353±10 $35.00/0 65.5 kDa protein (581 residues) that contains three functionally distinct regions that are separable by proteolysis. The N-terminal region (12 kDa, residues 1-110) contains a zinc-binding domain that is required for primase function, potentially because of a role in recognizing singlestranded DNA. The crystal structure of this domain has been determined (Pan & Wigley, 2000). A central catalytic domain (36 kDa, residues 111-433) is responsible for the catalysis of RNA synthesis, and a C-terminal fragment (residues 434-581) is involved in interactions between the primase and the helicase (Tougu & Marians, 1996; Tougu et al., 1994). Primase activity requires both the N-terminal zinc-binding domain and the central catalytic domain. Bacterial and bacteriophage primases of the DnaG family have no apparent relationship to eukaryotic and archaebacterial primases, or to any other RNA or DNA polymerases (Aravind et al., 1998; Leipe et al., 1999). Unexpectedly, iterative sequence pro®le searches using the program PSIBLAST and the E. coli primase sequence as a starting input revealed a potential structural relationship between a central region (100 residues) of the catalytic domain of DnaG-type primases and otherwise unrelated proteins such as certain DNA topoisomerases, several nucleases and proteins involved in recombinational repair (Aravind et al., # 2000 Academic Press 354 1998). This shared region was named the TOPRIM domain (for topoisomerase/primase). The level of sequence similarity upon which the presence of TOPRIM domains in bacterial primases was postulated is relatively low. The TOPRIM domains in primases are smaller than those in topoisomerases (80 residues instead of 120 residues), and only 10 % of the residues in the TOPRIM domains of known structure (all of which are in topoisomerases) are preserved identically in any one of the primase sequences. We have used X-ray crystallography to determine the crystal structure of the catalytic core of E. coli primase. Embedded within the catalytic domain is a TOPRIM fold that is similar to that seen in the topoisomerases, con®rming the striking conservation of this domain among functionally different replication proteins. The structural similarity between the TOPRIM domains of primase and topoisomerase VI (Nichols et al., 1999) allows us to identify a metal-binding site in the primase catalytic domain, and thereby to locate the likely site for nucleotide addition to a growing primer chain. Combined with analysis of surface features of the primase this has enabled a hypothesis to be generated regarding the interaction of the single- Structure of E. coli Primase Catalytic Core stranded DNA template and the nascent RNADNA duplex with the primase. Our conclusions are in general agreement with an independent report of the structure of the catalytic core of E. coli primase that was published at the same time as the submission of this paper (Keck et al., 2000). Results and Discussion General features of the structure The boundaries of the catalytic core of E. coli primase were de®ned by subjecting the full-length protein to limited proteolytic digestion with trypsin, as described (Tougu et al., 1994), and characterizing the resulting fragments by N-terminal sequencing and mass spectrometry (Figure 1(a); data not shown). A stable fragment comprising residues 111-429 was identi®ed, corresponding to the central catalytic region of primase (Mustaev & Godson, 1995). The structure of E. coli primase catÊ resolution alytic core was determined at 2.9 A using multiwavelength anomalous dispersion (MAD) and a single crystal of selenomethionyl substituted protein (Table 1). The crystallographic asymmetric unit contains ®ve molecules of primase, which are arranged in a Figure 1. The structure of the E. coli primase catalytic domain. (a) A diagram of the domain organization of fulllength primase. The residue numbers corresponding to the domain boundaries in the E. coli protein are indicated. (b) Ribbon representation of the structure of the primase catalytic domain. The orientation shown here is similar to that used in most of the Figures in the paper. The b strands are marked with the red numbers (1-12) and a-helices with black letters (A-N). The circles indicate the boundaries of the TOPRIM sub-domain. 355 Structure of E. coli Primase Catalytic Core Table 1. Data collection and re®nement statistics A. MAD phasing Energies (eV) No. of reflections (total/unique) Completeness (%) l1 12,570 273,234/40,874 l2 12,660 276,321/40,801 l3 12,664 279,034/40,836 l4 12,845 279,399/40,704 Ê -2.9 A Ê ) (centric/acentric)0.51/0.58 Mean overall figure of merit (15.0 A B. Refinement and stereochemical statistics R-value (%) Free R-value (%) Ê 2) Average B-factorsa (A Main-chain Side-chains Whole RMS deviations Ê) Bonds (A Angles (deg.) 97.6 97.9 98.0 98.5 (90.8) (93.2) (93.7) (97.4) Rsym 5.8 5.8 7.5 6.2 (20.7) (21.2) (23.4) (22.2) 21.9 27.6 46.81 50.95 48.82 0.0114 1.414 Rsym 100 jI ÿ hIij/I, where I is the integrated intensity of a given re¯ection. For Rsym and completeness, numbers in parentheses refer to data in the highest resolution shell. Figure of merit hjP(a)eia/P(a)ji, where a is the phase and P(a) is the phase probability distribution. R-value jFp ÿ Fp(calc)j/Fp, where Fp is the structure factor amplitude. The free R-value is the R-value for a 10 % subset of the data that was not included in the crystallographic re®nement. a Average B-factors for the ®ve molecules in the asymmetric unit. helical spiral. The ®ve molecules differ slightly in their inter-domain orientation, but the structures of the individual domains are virtually unchanged. This assembly is unlikely to be of physiological signi®cance, but it is of interest, since it originates from the interaction of an arginine residue sidechain (residue 299) in one protein with the putative metal-binding site of another. This arrangement positions the guanidium group of the arginine residue such that it mimics the metal observed in the TOPRIM domain of topoisomerase VI (not shown). Our attempts to soak nucleotides into this crystal form have been unsuccessful, almost certainly because the addition of divalent metal ions will disrupt this interaction between molecules in the crystal. The polypeptide chain of the catalytic domain folds into three consecutive sub-domains, with no cross-back connections. The TOPRIM sub-domain is at the center of the structure, and is ¯anked on one side by an a/b N-terminal domain that contains an anti-parallel b-sheet and several helices, and on the other side by a helical C-terminal subdomain (Figure 1(b)). The overall shape of the molecule is that of a crescent, with the conserved acidic residues of the TOPRIM domain being located at the base of a concave depression on the inner surface of the crescent. The high degree of sequence conservation that is seen between various DnaG-type primases indicates that the structure described here will be a good model for the catalytic domains of the DnaG family members (Figure 2). The TOPRIM domain Type I and type II DNA topoisomerases are fundamentally different from each other in terms of their molecular architecture and their enzymatic mechanisms (Lima et al., 1994; Berger et al., 1996; Nichols et al., 1999). Nevertheless, the three-dimensional structures of both type IA and type II topoisomerases have a small core region in common, which corresponds to the TOPRIM domain (Aravind et al., 1998; Berger et al., 1998). The fold of the TOPRIM domains in these proteins resembles a Rossmann-like nucleotide-binding fold, with a central b-sheet formed by four parallel b-strands, ¯anked by three a-helices. Both type IA and type II topoisomerases contain a tyrosine residue that forms a covalent linkage with DNA during one stage of the topoisomerase reaction cycle. This tyrosine residue is not part of the TOPRIM domain, but is positioned near it during one or more of the conformational intermediates that occur during DNA strand exchange. As in the comparison between the structures of type IA and type II topoisomerases, the architecture of the catalytic domain of primase is similar to that of the topoisomerases only in this restricted region (Figure 3(a)). Based on the structural elements that are common to these proteins, the TOPRIM domain in primase is de®ned to extend from residues 259 to 341. The TOPRIM fold in primases is more appropriately referred to as a subdomain that is embedded within the larger catalytic domain, since it has extensive hydrophobic interfaces with the N and C-terminal sub-domains. The structural elements that make up the TOPRIM sub-domain include b-strands 8-11, which form a parallel b-sheet, and helices aG, aH and aI, which pack against this sheet in a manner that resembles the Rossmann fold. The inferred location of the active site of the primase, which is within the TOPRIM domain (see below), overlaps closely with the binding site for dinucleotide phosphates in 356 Structure of E. coli Primase Catalytic Core Figure 2. Sequence alignment of the catalytic domains of bacterial primases of the DnaG family. Ec, E. coli; St, Salmonella typhimurium; Hi, Haemophilus in¯uenzae; Bs, Bacillus subtilis; Lm, Listeria monocytogenes; Rp, Rickettsia prowazakeii. The residues are highlighted according to their similarity, from white (40 % or lower sequence similarity) to dark green for 100 % similarity. The numbering is of the E. coli protein. Several regions of sequence conservation that had been noted previously (but not discussed further here) are indicated by boxes. Cyan boxes designate the conserved motifs in bacterial and bacteriophage primases (Ilyina et al., 1992). The so-called RNAP (RNA-polymerase) region is marked by a pink box, but note that the structure presented here reveals no similarity to other RNA polymerases. The bacterial signature sequence (Versalovic & Lupski, 1993) is indicated by a magenta box. The conserved acidic residues are marked by black stars. The secondary structure elements for E. coli primase are represented by cylinders for helices and arrows for strands. The elements colored in red represent the TOPRIM region. Alignments were obtained using CLUSTALX (Thompson et al., 1997) and edited by hand to match the alignment of (Ilyina et al., 1992). The sequence similarity levels used for coloring the alignment were calculated by averaging the similarity scores at each position of all possible pairs of sequences (David Jeruzalmi, unpublished software). The level of equivalence of non-identical residues was established by use of the BLOSUM62 amino acid substitution matrix (Henikoff & Henikoff, 1993). Rossman fold-containing proteins, such as alcohol dehydrogenases (not shown). When the TOPRIM sub-domain of primase is aligned structurally with the corresponding regions of topoisomerases IA, II and VI the rms deviation Ê within the a helices and in Ca positions is 2.0 A b-strands. This is a relatively high degree of structural conservation, given that the level of sequence identity between the primase and topoisomerase TOPRIM domains is low. While only 10 % of the residues within individual members of the larger topoisomerase TOPRIM domains are identical in primase, the fractional sequence identity is higher when the smaller primase sequence is used as a reference, with 18 %, 14 % and 16 % of the residues in the primase TOPRIM domain being conserved identically in topoisomerase IA, topoisomerase II and topoisomerase VI, respectively. Only ®ve residues are strictly conserved across all TOPRIM sequences (Figure 3(c)). Two of these are glycine residues that are likely to play a structural role. The other three are acidic residues Structure of E. coli Primase Catalytic Core 357 Figure 3. Structural alignment of the TOPRIM domains. (a) Backbone structure of the TOPRIM domains. The individual domains are indicated as follows: primase, E. coli primase; TopoI, E. coli topoisomerase I (Lima et al., 1994); TopoII, yeast topoisomerase II (Berger et al., 1996); TopoVI, M. jannaschii topoisomerase VI (Nichols et al., 1999). The alignment was performed by the DALI server (Holm & Sander, 1993). (b) The conserved acidic side-chains in the TOPRIM domains, colored as in (a). (c) Sequence alignment based on the structural alignment of TOPRIM domains. (Glu265, Asp309 and Asp311 in E. coli primase) that are present in two conserved sequence motifs. The importance of these acidic residues has been documented for the primases (Dracheva et al., 1995; Strack et al., 1992), as well as for topoisomerases (Chen & Wang, 1998; Zhu et al., 1998). The manner in which TOPRIM domains interact with substrates is unknown. One clue is provided by the structure of topoisomerase VI, a type II topoisomerase, in which Mg2 is seen to be coordinated by the three conserved acidic residues of the TOPRIM domain (Nichols et al., 1999). The acidic 358 side-chains of all the TOPRIM domains of known structure superimpose closely (Figure 3(b)), and it is likely that they all share a conserved metal-binding function. However, the Rossmann-fold architecture of the TOPRIM domain, and the manner in which it coordinates metal ions does not resemble any feature of DNA or RNA polymerases of known structure. The primase catalytic site and implications for DNA/RNA binding The structure of the primase catalytic core has been determined in the absence of any substrates. Nevertheless, several features of the structure reveal the likely location of the polymerase active site. A large groove is formed at the center of the concave surface of the catalytic core (Figure 4). This groove is encircled by residues that are very highly conserved among DnaG-type primases, and these constitute the only such patch of extremely high sequence conservation on the surface of the molecule (Figure 4(a)). In addition to the sequence conservation, electrostatic features of this surface groove are strongly suggestive of how the DNA template and the RNA-DNA hybrid duplex might be bound by the primase catalytic core. One corner of this groove contains several conserved acidic residues, including the three that are Structure of E. coli Primase Catalytic Core characteristic of the TOPRIM domain (Figures 1 and 3). While our structure of primase contains no bound metal ions, we infer that these three residues will coordinate divalent metal ions based upon the similarity to the structure of topoisomerase VI, which does have metal ion bound to its active site (Nichols et al., 1999) (Figure 3(b)). It is known that primases require Mg2 for their activity (Urlacher & Griep, 1995), and it is likely that the polymerization reaction is catalyzed in a metal-facilitated manner, as is the case for other DNA and RNA polymerases (Steitz, 1998) as well as for the topoisomerases IA and topoisomerases II (Wang, 1996; Zhu et al., 1998). The location of the three conserved acidic residues thus allows us to localize the site where the phosphate transfer reaction is likely to occur in primase (Figure 4(b)). It might be expected that the catalytic center of primase would contain binding sites for one or more nucleotides that act as substrates for the formation of the ®rst phosphodiester linkage in the nascent RNA strand. A notable feature of the primase surface is a relatively deep and largely hydrophobic pocket that is adjacent to the putative metal binding site and that is ringed by very highly conserved side-chains (Figure 4(a)). This pocket is large enough to accommodate a purine base (Figure 5(b)). The location of the pocket is Figure 4. Molecular surface of the primase catalytic domain. (a) Distribution of conserved residues among the DnaG-family of primases. The surface is colored in ramp from white to green, with white representing sequence identity (calculated as described in the legend to Figure 2) of 40 % or lower, and darkest green corresponding to regions with 100 % sequence identity, according to the sequence similarity (40-100 %) calculated as in Figure 2. (b) The electrostatic potential mapped onto the molecular surface, calculated using GRASP (Nicholls et al., 1991). The green sphere represents the location of the Mg2 in the structure of the TOPRIM domain of topoisomerase VI, which was superimposed onto the TOPRIM domain of primase in order to position the metal ion (Nichols et al., 1999). The black circle marks the potential site of catalysis and the yellow ellipses mark potential binding sites for the phosphate backbone of DNA or RNA (see Figure 5(a)). Both surfaces are in the standard view (shown in Figure 1). Structure of E. coli Primase Catalytic Core 359 Figure 5. Potential modes of interaction between the primase and nucleic acids. (a) A speculative model for how a newly synthesized RNA-DNA hybrid might interact with the primase. The molecular surface of the primase catalytic domain is shown, colored according to electrostatic potential as in Figure 4(b). An A-form double helix has been placed so as to locate the end of one strand near the metal-binding site, indicated by the yellow ellipse. The shape of the central groove is such that the double helix can extend only up or down along the surface. The groove narrows towards the top, and so the helix was extended downwards. Note that the phosphate backbone of the template strand (blue) runs close to surface regions with positive electrostatic potential (blue). The singlestranded template is extended upwards (broken line) along a region of positive electrostatic potential, generated by highly conserved residues (see Figure 4(a)). The model shown here is not based on any kind of energetic analysis, but represents instead a simple hypothesis that awaits experimental testing. (b) A potential binding site for nucleotides in the TOPRIM domain. A cross-sectional view of the primase is shown, rotated approximately 90o with respect to the view in (a). A deep pocket, lined by conserved surface residues (see Figure 4(a)) can be seen. This pocket is adjacent to the metal-binding site, which is directly behind it in this view. The pocket is large enough to accommodate a purine base, and an adenine residue is shown here merely to illustrate this fact. If such an interaction were to correspond to an actual binding mode for an adenine nucleotide in the initiation of primer synthesis, it would have to swing out of this pocket in order to base-pair with the complementary base in the template strand. such that the phosphate groups of the nucleotide bound at the site might be able to coordinate metal ions at the adjacent acidic site. Such an interaction would require that the base swing out of the pocket in order to base-pair with the complementary nucleotide in the template strand. In order to visualize how DNA and RNA strands might interact with the surface of primase, we carried out the following simple modeling exercise (Figure 5(a)). An A-form double helix, corresponding to an RNA-DNA hybrid, was placed on the surface of the primase such that one of the phosphate linkages in one strand was close to the metal-binding site (this marks the site of nucleotide addition to the growing RNA strand). It was immediately apparent that the surface groove is so constructed that the path of the duplex is constrained to move away from the active site essen- tially in one direction only. Upward extension of the double helix (in the orientation of Figure 5(a)) is prevented by a narrowing of the surface groove due to the juxtaposition of elements from the Nterminal and TOPRIM sub-domains. Lateral movements of the helix are restricted by the anti-parallel b-sheet of the N-terminal domain on one side, and elements of the TOPRIM domain on the other. These form a cup-shaped depression into which the A-form double helix can be placed nicely. The side of the groove opposite to the metalbinding site is lined by several conserved basic residues, and these appear to be positioned for favorable interaction with the phosphate backbone of the template DNA strand in the model (Figure 5(a)). Residues in the upper patch could interact with single-stranded DNA, which is proposed to extend upwards, away from the active 360 site. The N-terminal residue of the catalytic core is on the surface that is distal to the proposed activesite groove (Figure 1(b)). This suggests that the zinc-binding domain, which is thought to interact with single-stranded DNA (Pan & Wigley, 2000), might be placed behind the catalytic core (with reference to the standard orientation in Figure 5(a)). It is not known how the domain interacts with DNA, and whether its interaction with the catalytic domain places any particular constraints on the direction of approach of the template. Primases usually synthesize short RNA primers, that are often less than ten nucleotides long (Kornberg & Baker, 1991). Thus, one may expect that certain features of the structure of the catalytic domain of the primase may have a bearing on the length of primer synthesis. The proposed interaction between the DNA-RNA duplex and the surface of the catalytic domain of the primase involves a small area, and therefore does not seem likely to confer a high level of processivity upon the primase. The interaction between the primase and the newly synthesized primer is limited to less than about ten nucleotides, and there are no apparent mechanisms for trapping the duplex on the surface. As a consequence, it is likely that the primer may disengage from the template after the synthesis of just a few nucleotides. The interaction of primase with the DnaB helicase may help primase remain associated with DNA, but such an interaction could have consequences for the lengths of the primers that are synthesized, as discussed below. One intriguing consequence of the predicted interaction between the DNA-RNA duplex and the primase concerns potential restrictions on the continued synthesis of RNA by primase. Our model suggests that the hybrid DNA-RNA duplex is extruded by the catalytic domain along the path indicated in Figure 5(a) such that it emerges from the surface of the primase catalytic domain more or less perpendicular to a rather ¯at edge of the domain. This ¯at surface might abut other proteins of the replication complex, and thereby affect the ability of primase to continue synthesis. In many cases, primases are recruited to single-stranded DNA by helicases, such as the DnaB helicase of E. coli, and the primases interact with the helicases either covalently or non-covalently via the C-terminal region of the primase catalytic domain. The positioning of a helicase below the primase (with reference to the orientation shown in Figure 5) might enforce a length restriction on the newly synthesized primer that is set by the extent to which the linkage between the helicase and the primase can ¯ex. The geometry of the predicted interaction between the RNA-DNA duplex and the surface of the primase catalytic domain is such that up to ten nucleotides of the primer could potentially emerge from the primase active site without running into a protein that would abut the lower edge of the primase. Structure of E. coli Primase Catalytic Core Conclusions Primases are DNA-dependent RNA polymerases, and it seems remarkable to us that their catalytic centers are closely related to that of topoisomerases and not to RNA or DNA polymerases. Almost all DNA polymerases (as well as some RNA polymerases) share a common architecture at their catalytic centers, which involves the presentation of metal-coordinating acidic residues by a bsheet platform known as the palm (Steitz, 1998). The structure of a bacterial RNA polymerase has been determined, and while this has a metalcoordination site at its catalytic center, the organization of the protein is unrelated to either the DNA polymerases or to primase (Zhang et al., 1999). The connection between primases and topoisomerases which is established by the identi®cation of TOPRIM domains in these two crucial sets of replication proteins suggests that diverse proteins involved in replication and repair, including the primase RNA polymerases, have originated from a common ancestor that is unrelated to DNA polymerases and the transcriptional RNA polymerases. The impressive accuracy of the sequence comparisons by Koonin and co-workers (Aravind et al., 1998), which ®rst showed the commonality of the TOPRIM domain, bodes well for the ¯ow of insights from the further elucidation of highly divergent patterns of protein evolution to continue. In conclusion, the structure of the catalytic core of E. coli primase shows a novel fold for a nucleic acid polymerase. A centrally placed Rossman fold, the TOPRIM domain, helps form a surface groove that we have tentatively identi®ed as the channel for binding double-helical RNA-DNA hybrid. The groove includes a putative active-site cleft that contains a conserved cluster of acidic residues, suggesting that metal ions play a crucial role in the phosphotransfer reaction. It is anticipated that the structural results reported here will enable rapid progress to now be made in obtaining a complete understanding of the mechanism of primase action. Materials and Methods Purification and crystallization The catalytic core of E. coli primase (residues 111 to 429) was sub-cloned into the pPROEXHTa expression vector, using the SfoI and NotI restriction sites, and expressed in E. coli BL21 cells. The resulting protein contains a histidine tag connected to the N terminus of the primase domain by a cleavage site for the tobacco etch virus (TEV) protease (Parks et al., 1994). The cells were lysed in 20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 10 mM imidazole, containing 5 mM b-mercaptoethanol and 1 mM PMSF, using an Emulsi¯ex French press (Avestin). The soluble lysate was applied to Ni-NTA (Qiagen) column to isolate the histidine-tagged fragment. After the treatment with TEV protease, the tag-free protein was ¯owed over the Ni-NTA column again to remove uncleaved fusion protein, and was further puri- 361 Structure of E. coli Primase Catalytic Core ®ed by gel ®ltration over a Superdex S75 column (Pharmacia) equilibrated in 20 mM Tris-HCl (pH 7.5), 50 mM NaCl, and 1 mM DTT. Crystals of the catalytic core were prepared using hanging drop vapor diffusion by mixing equal volumes of a 15 mg/ml protein solution and a reservoir solution containing 16-19 % PEG 4000 (Fluka), 0.053-0.063 mM Tris (pH 7.9-8.5) and 0.11-0.13 M sodium acetate, and then equilibrating over 0.5 ml reservoir at 20 C for about one week. Crystals form in space group P212121 Ê , b 107.6 A Ê, with unit cell dimensions of a 62.7 A Ê . There are ®ve molecules of p36 fragment c 263.1 A per asymmetric unit, with a solvent content of approximately 50 %. A selenomethionine-substituted primase fragment (with nine methionine residues per molecule) was also produced and crystallized as for the native protein, except for an additional 5 mM DTT in the crystallization solution. Complete incorporation of selenomethionine was con®rmed by mass spectrometry (data not shown). Structure determination Crystals of the catalytic core were transferred to stabilization solution containing 22.35 % PEG 4000, 0.075 M Tris (pH 8.3), 0.15 M sodium acetate, 5 mM DTT and 20 % (v/v) ethlyene glycol. Multiwavelength anomalous dispersion (MAD) data were collected on one selenomethionyl-derivatized crystal at four wavelengths at beamline X4A of the National Synchrotron Light Source (Brookhaven, NY) (Table 1). All data were integrated and reduced with DENZO/SCALEPACK programs (Otwinowski & Minor, 1997). The scaled and reduced intensity data were converted to amplitudes using TRUNCATE (CCP4, 1994), and cross-wavelength scaling was performed using FHSCALE (CCP4, 1994), treating l2 as native (Table 1). Out of 45 selenium sites, 22 were found by SOLVE (Terwilliger & Berendzen, 1995) and the rest by difference Fourier maps. Phase calculation and heavy-atom re®nement were performed in MLPHARE (CCP4, 1994), and map modi®cations were carried out using DM (CCP4, 1994) and MAMA (Kleywegt & Read, 1998). Automated solvent ¯attening resulted in an electron density map of exceptional quality into which a nearly complete model could be built using the program O (Jones et al., 1991). The primase catalytic core consists of three domains (residues 120-241, 242-367, 368-415), and the inter-domain orientations are slightly, but signi®cantly, different in each of the ®ve molecules in the asymmetric unit. Re®nement of the model by CNS (BruÈnger et al., 1998) proceeded by utilizing non-crystallographic symmetry (NCS) restraints for each of the three domains separately. The Ê , at l1, for the selenostructure factors measured to 2.9 A methionine crystals were used as the native data set for re®nement. The ®nal re®ned model has a crystallographic R-value of 21.9 %, a free R-value of 27.6 % and is well de®ned for the residues 113-421. No electron density could be seen for the residues 111, 112 and 429, or for the residues 177, 193, 237, 287, 288, 364 and 422-428, only the main chain could be traced. The geometry was analyzed using PROCHECK (Laskowski et al.,1993), with 88 % of the residues lying in the most favored regions of the Ramachandran plot, and no residues in disallowed regions. Protein Data Bank accession number The coordinates have been deposited with the RCSB Protein Data Bank and have the accession code 1EQN. Acknowledgments We thank Jeffrey Bonanno, David Jeruzalmi and Hiroto Yamaguchi for valuable assistance and discussions. We thank the staff at the X4A beamline at the National Synchrotron Light Source for assistance with synchrotron data collection. This work was supported in part by a grant from National Institutes of Health (GM38839; MO). References Aravind, L., Leipe, D. D. & Koonin, E. V. (1998). Toprim ± a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucl. Acids Res. 26, 4205-4213. Berger, J. M., Fass, D., Wang, J. C. & Harrison, S. C. (1998). Structural similarities between topoisomerases that cleave one or both DNA strands. Proc. Natl Acad. Sci. USA, 95, 7876-7881. Berger, J. M., Gamblin, S. J., Harrison, S. C. & Wang, J. C. (1996). Structure and mechanism of DNA topoisomerase. Nature, 379, 225-232. BruÈnger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallog. sect. D, 54, 905-921. Callaborative Computational Project No. 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallog. sect. D, 50, 760-763. Chen, S. J. & Wang, J. C. (1998). Identi®cation of activesite residues in Escherichia coli DNA topoisomerase I. J. Biol. Chem. 273, 6050-6056. Dracheva, S., Koonin, E. V. & Crute, J. J. (1995). Identi®cation of the primase active site of the herpes simplex virus type 1 helicase-primase. J. Biol. Chem. 270, 14148-14153. Henikoff, S. & Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins: Struct. Funct. Genet. 17, 49-61. Holm, L. & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233, 123-138. Ilyina, T. V., Gorbalenya, A. E. & Koonin, E. V. (1992). Organization and evolution of bacterial and bacteriophage primase-helicase systems. J. Mol. Evol. 34, 351-357. Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallog sect. A, 47, 110-119. Keck, J. L., Roche, D. D., Lynch, A. S. & Berger, J. M. (2000). Structure of the RNA polymerase domain of E. coli primase. Science, 287, 2482-2486. Kleywegt, G. J. & Read, R. J. (1998). Not your average density. Structure, 5, 1557-1569. Kornberg, A. & Baker, T. A. (1991). DNA Replication, 2nd edit., W.H. Freeman and Company, New York. 362 Structure of E. coli Primase Catalytic Core Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291. Leipe, D. D., Aravind, L. & Koonin, E. V. (1999). Did DNA replication evolve twice independently? Nucl. Acids Res. 27, 3389-3401. Lima, C. D., Wang, J. C. & Mondragon, A. (1994). Three-dimensional structure of the 67 K N-terminal fragment of E. coli DNA topoisomerase I. Nature, 367, 138-146. McMacken, R. & Kornberg, A. (1978). A multienzyme system for priming the replication of phiX174 viral DNA. J. Biol. Chem. 253, 3313-3319. Mustaev, A. A. & Godson, G. N. (1995). Studies of the functional topography of the catalytic center of Escherichia coli primase. J. Biol. Chem. 270, 1571115718. Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296. Nichols, M. D., DeAngelis, K., Keck, J. L. & Berger, J. M. (1999). Structure and function of an archaeal topoisomerase VI subunit with homology to the meiotic recombination factor Spo11. EMBO J. 18, 6177-6188. Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307-326. Pan, H. & Wigley, D. B. (2000). Structure of the zincbinding domain of Bacillus stearothermophilus DNA primase. Structure, 8, 231-239. Parks, T. D., Leuther, K. K., Howard, E. D., Johnston, S. A. & Dougherty, W. G. (1994). Release of proteins and peptides from fusion proteins using a recombinant plant virus proteinase. Annu. Rev. Biochem. 216, 413-417. Steitz, T. A. (1998). A mechanism for all polymerases. Nature, 391, 231-232. Strack, B., Lessl, M., Calendar, R. & Lanka, E. (1992). A common sequence motif, -E-G-Y-A-T-A-, ident- i®ed within the primase domains of plasmidencoded I- and P-type DNA primases and the alpha protein of the Escherichia coli satellite phage P4. J. Biol. Chem. 267, 13062-13072. Terwilliger, T. C. & Berendzen, J. (1995). Difference re®nement: obtaining differences between two related structures. Acta. Crystallog. sect. D, 51, 609618. Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL X windows interface: ¯exible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25, 4876-4882. Tougu, K. & Marians, K. J. (1996). The interaction between helicase and primase sets the replication fork clock. J. Biol. Chem. 271, 21398-21405. Tougu, K., Peng, H. & Marians, K. J. (1994). Identi®cation of a domain of Escherichia coli primase required for functional interaction with the DnaB helicase at the replication fork. J. Biol. Chem. 269, 4675-4682. Urlacher, T. M. & Griep, M. A. (1995). Magnesium acetate induces a conformational change in Escherichia coli primase. Biochemistry, 34, 16708-16714. Versalovic, J. & Lupski, J. R. (1993). The Haemophilus in¯uenzae dnaG sequence and conserved bacterial primase motifs. Gene, 136, 281-286. Wang, J. C. (1996). DNA topoisomerases. Annu. Rev. Biochem. 65, 635-692. Zhang, G., Campbell, E. A., Minakhin, L., Richter, C., Severinov, K. & Darst, S. A. (1999). Crystal structure of Thermus aquaticus core RNA polymerase at Ê resolution. Cell, 98, 811-824. 3.3 A Zhu, C. X., Roche, C. J., Papanicolaou, N., DiPietrantonio, A. & Tse-Dinh, Y. C. (1998). Sitedirected mutagenesis of conserved aspartates, glutamates and arginines in the active site region of Escherichia coli DNA topoisomerase I. J. Biol. Chem. 273, 8783-8789. Edited by P. E. Wright (Received 3 April 2000; received in revised form 28 April 2000; accepted 28 April 2000)