Download A TOPRIM Domain in the Crystal Structure of the Catalytic Core of

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsatellite wikipedia , lookup

DNA nanotechnology wikipedia , lookup

Helicase wikipedia , lookup

DNA replication wikipedia , lookup

DNA polymerase wikipedia , lookup

Helitron (biology) wikipedia , lookup

Replisome wikipedia , lookup

Transcript
doi:10.1006/jmbi.2000.3844 available online at http://www.idealibrary.com on
J. Mol. Biol. (2000) 300, 353±362
A TOPRIM Domain in the Crystal Structure of the
Catalytic Core of Escherichia coli Primase Confirms a
Structural Link to DNA Topoisomerases
Marjetka Podobnik1,3, Peter McInerney2, Mike O'Donnell2
and John Kuriyan1*
1
Department of Biochemistry
and Molecular Biology, Jozef
Stefan Institute, Slovenia
Primases synthesize short RNA strands on single-stranded DNA templates, thereby generating the hybrid duplexes required for the initiation
of synthesis by DNA polymerases. We present the crystal structure of the
catalytic unit of a primase enzyme, that of a 320 residue fragment of
Ê resolution. Central to the catEscherichia coli primase, determined at 2.9 A
alytic unit is a TOPRIM domain that is strikingly similar in its structure
to that of corresponding domains in DNA topoisomerases, but is unrelated to the catalytic centers of other DNA or RNA polymerases. The catalytic domain of primase is crescent-shaped, and the concave face of the
crescent is predicted to accommodate about 10 base-pairs of RNA-DNA
duplex in a loose interaction, thereby limiting processivity.
*Corresponding author
Keywords: DNA replication; RNA-polymerase; primase; TOPRIM;
topoisomerase
Laboratories of Molecular
Biophysics and
2
Laboratory of DNA
Replication, Howard Hughes
Medical Institute, The
Rockefeller University, New
York, NY 10021, USA
3
# 2000 Academic Press
Introduction
None of the known DNA polymerases is able to
initiate DNA synthesis by using single-stranded
DNA as templates. Instead, DNA replication relies
on a variety of mechanisms to generate short primer regions, typically RNA-DNA hybrid duplexes,
that then serve as initiation sites for DNA synthesis
by DNA polymerases. During chromosomal replication, in particular, the repeated generation of
RNA primers required for the replication of lagging strands, is brought about by DNA-dependent
RNA polymerases, known as primases.
The prototypical bacterial primase is the enzyme
encoded by the dnaG gene of Escherichia coli
(Kornberg & Baker, 1991). All known bacterial primases, as well as primases from several bacteriophages, are homologous to E. coli primase.
Referred to collectively as DnaG-type primases,
these are often closely associated with DNA helicases in mobile assemblies known as primosomes,
which track the moving replication fork
(McMacken & Kornberg, 1978). E. coli primase is a
Abbreviations used: MAD, multiwavelength
anomalous dispersion; TEV, tobacco etch virus; NCS,
non-crystallographic symmetry.
E-mail address of the corresponding author:
[email protected]
0022-2836/00/020353±10 $35.00/0
65.5 kDa protein (581 residues) that contains
three functionally distinct regions that are separable by proteolysis. The N-terminal region
(12 kDa, residues 1-110) contains a zinc-binding
domain that is required for primase function,
potentially because of a role in recognizing singlestranded DNA. The crystal structure of this
domain has been determined (Pan & Wigley,
2000). A central catalytic domain (36 kDa, residues 111-433) is responsible for the catalysis of
RNA synthesis, and a C-terminal fragment (residues 434-581) is involved in interactions between
the primase and the helicase (Tougu & Marians,
1996; Tougu et al., 1994). Primase activity requires
both the N-terminal zinc-binding domain and the
central catalytic domain.
Bacterial and bacteriophage primases of the
DnaG family have no apparent relationship to
eukaryotic and archaebacterial primases, or to any
other RNA or DNA polymerases (Aravind et al.,
1998; Leipe et al., 1999). Unexpectedly, iterative
sequence pro®le searches using the program PSIBLAST and the E. coli primase sequence as a starting input revealed a potential structural relationship between a central region (100 residues) of
the catalytic domain of DnaG-type primases and
otherwise unrelated proteins such as certain DNA
topoisomerases, several nucleases and proteins
involved in recombinational repair (Aravind et al.,
# 2000 Academic Press
354
1998). This shared region was named the TOPRIM
domain (for topoisomerase/primase). The level of
sequence similarity upon which the presence of
TOPRIM domains in bacterial primases was postulated is relatively low. The TOPRIM domains in
primases are smaller than those in topoisomerases
(80 residues instead of 120 residues), and only
10 % of the residues in the TOPRIM domains of
known structure (all of which are in topoisomerases) are preserved identically in any one of the
primase sequences.
We have used X-ray crystallography to determine the crystal structure of the catalytic core of
E. coli primase. Embedded within the catalytic
domain is a TOPRIM fold that is similar to that
seen in the topoisomerases, con®rming the striking
conservation of this domain among functionally
different replication proteins. The structural similarity between the TOPRIM domains of primase
and topoisomerase VI (Nichols et al., 1999) allows
us to identify a metal-binding site in the primase
catalytic domain, and thereby to locate the likely
site for nucleotide addition to a growing primer
chain. Combined with analysis of surface features
of the primase this has enabled a hypothesis to be
generated regarding the interaction of the single-
Structure of E. coli Primase Catalytic Core
stranded DNA template and the nascent RNADNA duplex with the primase. Our conclusions
are in general agreement with an independent
report of the structure of the catalytic core of E. coli
primase that was published at the same time as the
submission of this paper (Keck et al., 2000).
Results and Discussion
General features of the structure
The boundaries of the catalytic core of E. coli primase were de®ned by subjecting the full-length
protein to limited proteolytic digestion with trypsin, as described (Tougu et al., 1994), and characterizing the resulting fragments by N-terminal
sequencing and mass spectrometry (Figure 1(a);
data not shown). A stable fragment comprising
residues 111-429 was identi®ed, corresponding to
the central catalytic region of primase (Mustaev &
Godson, 1995). The structure of E. coli primase catÊ resolution
alytic core was determined at 2.9 A
using multiwavelength anomalous dispersion
(MAD) and a single crystal of selenomethionyl
substituted protein (Table 1).
The crystallographic asymmetric unit contains
®ve molecules of primase, which are arranged in a
Figure 1. The structure of the E. coli primase catalytic domain. (a) A diagram of the domain organization of fulllength primase. The residue numbers corresponding to the domain boundaries in the E. coli protein are indicated. (b)
Ribbon representation of the structure of the primase catalytic domain. The orientation shown here is similar to that
used in most of the Figures in the paper. The b strands are marked with the red numbers (1-12) and a-helices with
black letters (A-N). The circles indicate the boundaries of the TOPRIM sub-domain.
355
Structure of E. coli Primase Catalytic Core
Table 1. Data collection and re®nement statistics
A. MAD phasing
Energies (eV)
No. of reflections (total/unique)
Completeness (%)
l1 12,570
273,234/40,874
l2 12,660
276,321/40,801
l3 12,664
279,034/40,836
l4 12,845
279,399/40,704
Ê -2.9 A
Ê ) (centric/acentric)ˆ0.51/0.58
Mean overall figure of merit (15.0 A
B. Refinement and stereochemical statistics
R-value (%)
Free R-value (%)
Ê 2)
Average B-factorsa (A
Main-chain
Side-chains
Whole
RMS deviations
Ê)
Bonds (A
Angles (deg.)
97.6
97.9
98.0
98.5
(90.8)
(93.2)
(93.7)
(97.4)
Rsym
5.8
5.8
7.5
6.2
(20.7)
(21.2)
(23.4)
(22.2)
21.9
27.6
46.81
50.95
48.82
0.0114
1.414
Rsym ˆ 100 jI ÿ hIij/I, where I is the integrated intensity of a given re¯ection. For Rsym and completeness, numbers in
parentheses refer to data in the highest resolution shell.
Figure of merit ˆ hjP(a)eia/P(a)ji, where a is the phase and P(a) is the phase probability distribution.
R-value ˆ jFp ÿ Fp(calc)j/Fp, where Fp is the structure factor amplitude. The free R-value is the R-value for a 10 % subset of the
data that was not included in the crystallographic re®nement.
a
Average B-factors for the ®ve molecules in the asymmetric unit.
helical spiral. The ®ve molecules differ slightly in
their inter-domain orientation, but the structures of
the individual domains are virtually unchanged.
This assembly is unlikely to be of physiological signi®cance, but it is of interest, since it originates
from the interaction of an arginine residue sidechain (residue 299) in one protein with the putative
metal-binding site of another. This arrangement
positions the guanidium group of the arginine residue such that it mimics the metal observed in the
TOPRIM domain of topoisomerase VI (not shown).
Our attempts to soak nucleotides into this crystal
form have been unsuccessful, almost certainly
because the addition of divalent metal ions will
disrupt this interaction between molecules in the
crystal.
The polypeptide chain of the catalytic domain
folds into three consecutive sub-domains, with no
cross-back connections. The TOPRIM sub-domain
is at the center of the structure, and is ¯anked on
one side by an a/b N-terminal domain that contains an anti-parallel b-sheet and several helices,
and on the other side by a helical C-terminal subdomain (Figure 1(b)). The overall shape of the molecule is that of a crescent, with the conserved
acidic residues of the TOPRIM domain being
located at the base of a concave depression on the
inner surface of the crescent. The high degree of
sequence conservation that is seen between various
DnaG-type primases indicates that the structure
described here will be a good model for the catalytic domains of the DnaG family members
(Figure 2).
The TOPRIM domain
Type I and type II DNA topoisomerases are fundamentally different from each other in terms of
their molecular architecture and their enzymatic
mechanisms (Lima et al., 1994; Berger et al., 1996;
Nichols et al., 1999). Nevertheless, the three-dimensional structures of both type IA and type II topoisomerases have a small core region in common,
which corresponds to the TOPRIM domain
(Aravind et al., 1998; Berger et al., 1998). The fold
of the TOPRIM domains in these proteins
resembles a Rossmann-like nucleotide-binding
fold, with a central b-sheet formed by four parallel
b-strands, ¯anked by three a-helices. Both type IA
and type II topoisomerases contain a tyrosine residue that forms a covalent linkage with DNA
during one stage of the topoisomerase reaction
cycle. This tyrosine residue is not part of the
TOPRIM domain, but is positioned near it during
one or more of the conformational intermediates
that occur during DNA strand exchange.
As in the comparison between the structures of
type IA and type II topoisomerases, the architecture of the catalytic domain of primase is similar to
that of the topoisomerases only in this restricted
region (Figure 3(a)). Based on the structural
elements that are common to these proteins, the
TOPRIM domain in primase is de®ned to extend
from residues 259 to 341. The TOPRIM fold in primases is more appropriately referred to as a subdomain that is embedded within the larger catalytic domain, since it has extensive hydrophobic
interfaces with the N and C-terminal sub-domains.
The structural elements that make up the TOPRIM
sub-domain include b-strands 8-11, which form a
parallel b-sheet, and helices aG, aH and aI, which
pack against this sheet in a manner that resembles
the Rossmann fold. The inferred location of the
active site of the primase, which is within the
TOPRIM domain (see below), overlaps closely with
the binding site for dinucleotide phosphates in
356
Structure of E. coli Primase Catalytic Core
Figure 2. Sequence alignment of the catalytic domains of bacterial primases of the DnaG family. Ec, E. coli; St, Salmonella typhimurium; Hi, Haemophilus in¯uenzae; Bs, Bacillus subtilis; Lm, Listeria monocytogenes; Rp, Rickettsia prowazakeii. The residues are highlighted according to their similarity, from white (40 % or lower sequence similarity) to dark
green for 100 % similarity. The numbering is of the E. coli protein. Several regions of sequence conservation that had
been noted previously (but not discussed further here) are indicated by boxes. Cyan boxes designate the conserved
motifs in bacterial and bacteriophage primases (Ilyina et al., 1992). The so-called RNAP (RNA-polymerase) region is
marked by a pink box, but note that the structure presented here reveals no similarity to other RNA polymerases.
The bacterial signature sequence (Versalovic & Lupski, 1993) is indicated by a magenta box. The conserved acidic
residues are marked by black stars. The secondary structure elements for E. coli primase are represented by cylinders
for helices and arrows for strands. The elements colored in red represent the TOPRIM region. Alignments were
obtained using CLUSTALX (Thompson et al., 1997) and edited by hand to match the alignment of (Ilyina et al., 1992).
The sequence similarity levels used for coloring the alignment were calculated by averaging the similarity scores at
each position of all possible pairs of sequences (David Jeruzalmi, unpublished software). The level of equivalence of
non-identical residues was established by use of the BLOSUM62 amino acid substitution matrix (Henikoff &
Henikoff, 1993).
Rossman fold-containing proteins, such as alcohol
dehydrogenases (not shown).
When the TOPRIM sub-domain of primase is
aligned structurally with the corresponding regions
of topoisomerases IA, II and VI the rms deviation
Ê within the a helices and
in Ca positions is 2.0 A
b-strands. This is a relatively high degree of structural conservation, given that the level of sequence
identity between the primase and topoisomerase
TOPRIM domains is low. While only 10 % of the
residues within individual members of the larger
topoisomerase TOPRIM domains are identical in
primase, the fractional sequence identity is higher
when the smaller primase sequence is used as a
reference, with 18 %, 14 % and 16 % of the residues
in the primase TOPRIM domain being conserved
identically in topoisomerase IA, topoisomerase II
and topoisomerase VI, respectively.
Only ®ve residues are strictly conserved across
all TOPRIM sequences (Figure 3(c)). Two of these
are glycine residues that are likely to play a structural role. The other three are acidic residues
Structure of E. coli Primase Catalytic Core
357
Figure 3. Structural alignment of the TOPRIM domains. (a) Backbone structure of the TOPRIM domains. The individual domains are indicated as follows: primase, E. coli primase; TopoI, E. coli topoisomerase I (Lima et al., 1994);
TopoII, yeast topoisomerase II (Berger et al., 1996); TopoVI, M. jannaschii topoisomerase VI (Nichols et al., 1999). The
alignment was performed by the DALI server (Holm & Sander, 1993). (b) The conserved acidic side-chains in the
TOPRIM domains, colored as in (a). (c) Sequence alignment based on the structural alignment of TOPRIM domains.
(Glu265, Asp309 and Asp311 in E. coli primase)
that are present in two conserved sequence motifs.
The importance of these acidic residues has been
documented for the primases (Dracheva et al.,
1995; Strack et al., 1992), as well as for topoisomerases (Chen & Wang, 1998; Zhu et al., 1998).
The manner in which TOPRIM domains interact
with substrates is unknown. One clue is provided
by the structure of topoisomerase VI, a type II
topoisomerase, in which Mg2‡ is seen to be coordinated by the three conserved acidic residues of the
TOPRIM domain (Nichols et al., 1999). The acidic
358
side-chains of all the TOPRIM domains of known
structure superimpose closely (Figure 3(b)), and it
is likely that they all share a conserved metal-binding function. However, the Rossmann-fold architecture of the TOPRIM domain, and the manner in
which it coordinates metal ions does not resemble
any feature of DNA or RNA polymerases of
known structure.
The primase catalytic site and implications for
DNA/RNA binding
The structure of the primase catalytic core has
been determined in the absence of any substrates.
Nevertheless, several features of the structure
reveal the likely location of the polymerase active
site. A large groove is formed at the center of the
concave surface of the catalytic core (Figure 4).
This groove is encircled by residues that are very
highly conserved among DnaG-type primases, and
these constitute the only such patch of extremely
high sequence conservation on the surface of the
molecule (Figure 4(a)). In addition to the sequence
conservation, electrostatic features of this surface
groove are strongly suggestive of how the DNA
template and the RNA-DNA hybrid duplex might
be bound by the primase catalytic core.
One corner of this groove contains several conserved acidic residues, including the three that are
Structure of E. coli Primase Catalytic Core
characteristic of the TOPRIM domain (Figures 1
and 3). While our structure of primase contains no
bound metal ions, we infer that these three residues will coordinate divalent metal ions based
upon the similarity to the structure of topoisomerase VI, which does have metal ion bound to its
active site (Nichols et al., 1999) (Figure 3(b)). It is
known that primases require Mg2‡ for their
activity (Urlacher & Griep, 1995), and it is likely
that the polymerization reaction is catalyzed in a
metal-facilitated manner, as is the case for other
DNA and RNA polymerases (Steitz, 1998) as well
as for the topoisomerases IA and topoisomerases II
(Wang, 1996; Zhu et al., 1998). The location of the
three conserved acidic residues thus allows us to
localize the site where the phosphate transfer reaction is likely to occur in primase (Figure 4(b)).
It might be expected that the catalytic center of
primase would contain binding sites for one or
more nucleotides that act as substrates for the formation of the ®rst phosphodiester linkage in the
nascent RNA strand. A notable feature of the primase surface is a relatively deep and largely
hydrophobic pocket that is adjacent to the putative
metal binding site and that is ringed by very
highly conserved side-chains (Figure 4(a)). This
pocket is large enough to accommodate a purine
base (Figure 5(b)). The location of the pocket is
Figure 4. Molecular surface of
the primase catalytic domain.
(a) Distribution of conserved
residues among the DnaG-family of
primases. The surface is colored in
ramp from white to green, with
white representing sequence identity (calculated as described in the
legend to Figure 2) of 40 % or
lower, and darkest green corresponding to regions with 100 %
sequence identity, according to the
sequence similarity (40-100 %) calculated as in Figure 2. (b) The electrostatic potential mapped onto the
molecular surface, calculated using
GRASP (Nicholls et al., 1991). The
green sphere represents the location
of the Mg2‡ in the structure of the
TOPRIM domain of topoisomerase
VI, which was superimposed onto
the TOPRIM domain of primase in
order to position the metal ion
(Nichols et al., 1999). The black circle marks the potential site of catalysis and the yellow ellipses mark
potential binding sites for the phosphate backbone of DNA or RNA
(see Figure 5(a)). Both surfaces are
in the standard view (shown in
Figure 1).
Structure of E. coli Primase Catalytic Core
359
Figure 5. Potential modes of
interaction between the primase
and nucleic acids. (a) A speculative
model for how a newly synthesized
RNA-DNA hybrid might interact
with the primase. The molecular
surface of the primase catalytic
domain is shown, colored according to electrostatic potential as in
Figure 4(b). An A-form double
helix has been placed so as to
locate the end of one strand near
the metal-binding site, indicated by
the yellow ellipse. The shape of the
central groove is such that the
double helix can extend only up or
down along the surface. The
groove narrows towards the top,
and so the helix was extended
downwards. Note that the phosphate backbone of the template
strand (blue) runs close to surface
regions with positive electrostatic
potential
(blue).
The
singlestranded template is extended
upwards (broken line) along a
region of positive electrostatic
potential, generated by highly conserved residues (see Figure 4(a)).
The model shown here is not based
on any kind of energetic analysis,
but represents instead a simple
hypothesis that awaits experimental testing. (b) A potential binding
site for nucleotides in the TOPRIM
domain. A cross-sectional view of
the primase is shown, rotated
approximately 90o with respect to
the view in (a). A deep pocket,
lined by conserved surface residues
(see Figure 4(a)) can be seen. This pocket is adjacent to the metal-binding site, which is directly behind it in this
view. The pocket is large enough to accommodate a purine base, and an adenine residue is shown here merely to
illustrate this fact. If such an interaction were to correspond to an actual binding mode for an adenine nucleotide in
the initiation of primer synthesis, it would have to swing out of this pocket in order to base-pair with the complementary base in the template strand.
such that the phosphate groups of the nucleotide
bound at the site might be able to coordinate metal
ions at the adjacent acidic site. Such an interaction
would require that the base swing out of the pocket in order to base-pair with the complementary
nucleotide in the template strand.
In order to visualize how DNA and RNA
strands might interact with the surface of primase,
we carried out the following simple modeling exercise (Figure 5(a)). An A-form double helix, corresponding to an RNA-DNA hybrid, was placed on
the surface of the primase such that one of the
phosphate linkages in one strand was close to the
metal-binding site (this marks the site of nucleotide
addition to the growing RNA strand). It was
immediately apparent that the surface groove is so
constructed that the path of the duplex is constrained to move away from the active site essen-
tially in one direction only. Upward extension of
the double helix (in the orientation of Figure 5(a))
is prevented by a narrowing of the surface groove
due to the juxtaposition of elements from the Nterminal and TOPRIM sub-domains. Lateral movements of the helix are restricted by the anti-parallel
b-sheet of the N-terminal domain on one side, and
elements of the TOPRIM domain on the other.
These form a cup-shaped depression into which
the A-form double helix can be placed nicely.
The side of the groove opposite to the metalbinding site is lined by several conserved basic
residues, and these appear to be positioned for
favorable interaction with the phosphate backbone
of the template DNA strand in the model
(Figure 5(a)). Residues in the upper patch could
interact with single-stranded DNA, which is proposed to extend upwards, away from the active
360
site. The N-terminal residue of the catalytic core is
on the surface that is distal to the proposed activesite groove (Figure 1(b)). This suggests that the
zinc-binding domain, which is thought to interact
with single-stranded DNA (Pan & Wigley, 2000),
might be placed behind the catalytic core (with
reference to the standard orientation in Figure 5(a)).
It is not known how the domain interacts with
DNA, and whether its interaction with the catalytic
domain places any particular constraints on the
direction of approach of the template.
Primases usually synthesize short RNA primers,
that are often less than ten nucleotides long
(Kornberg & Baker, 1991). Thus, one may expect
that certain features of the structure of the catalytic
domain of the primase may have a bearing on the
length of primer synthesis. The proposed interaction between the DNA-RNA duplex and the surface of the catalytic domain of the primase
involves a small area, and therefore does not seem
likely to confer a high level of processivity upon
the primase. The interaction between the primase
and the newly synthesized primer is limited to less
than about ten nucleotides, and there are no apparent mechanisms for trapping the duplex on the
surface. As a consequence, it is likely that the primer may disengage from the template after the
synthesis of just a few nucleotides. The interaction
of primase with the DnaB helicase may help primase remain associated with DNA, but such an
interaction could have consequences for the lengths
of the primers that are synthesized, as discussed
below.
One intriguing consequence of the predicted
interaction between the DNA-RNA duplex and the
primase concerns potential restrictions on the continued synthesis of RNA by primase. Our model
suggests that the hybrid DNA-RNA duplex is
extruded by the catalytic domain along the path
indicated in Figure 5(a) such that it emerges from
the surface of the primase catalytic domain more
or less perpendicular to a rather ¯at edge of the
domain. This ¯at surface might abut other proteins
of the replication complex, and thereby affect the
ability of primase to continue synthesis. In many
cases, primases are recruited to single-stranded
DNA by helicases, such as the DnaB helicase of
E. coli, and the primases interact with the helicases
either covalently or non-covalently via the C-terminal region of the primase catalytic domain. The
positioning of a helicase below the primase (with
reference to the orientation shown in Figure 5)
might enforce a length restriction on the newly
synthesized primer that is set by the extent to
which the linkage between the helicase and the primase can ¯ex. The geometry of the predicted interaction between the RNA-DNA duplex and the
surface of the primase catalytic domain is such that
up to ten nucleotides of the primer could potentially emerge from the primase active site without
running into a protein that would abut the lower
edge of the primase.
Structure of E. coli Primase Catalytic Core
Conclusions
Primases are DNA-dependent RNA polymerases, and it seems remarkable to us that their
catalytic centers are closely related to that of topoisomerases and not to RNA or DNA polymerases.
Almost all DNA polymerases (as well as some
RNA polymerases) share a common architecture at
their catalytic centers, which involves the presentation of metal-coordinating acidic residues by a bsheet platform known as the palm (Steitz, 1998).
The structure of a bacterial RNA polymerase has
been determined, and while this has a metalcoordination site at its catalytic center, the organization of the protein is unrelated to either the DNA
polymerases or to primase (Zhang et al., 1999). The
connection between primases and topoisomerases
which is established by the identi®cation of
TOPRIM domains in these two crucial sets of replication proteins suggests that diverse proteins
involved in replication and repair, including the
primase RNA polymerases, have originated from a
common ancestor that is unrelated to DNA polymerases and the transcriptional RNA polymerases.
The impressive accuracy of the sequence comparisons by Koonin and co-workers (Aravind et al.,
1998), which ®rst showed the commonality of the
TOPRIM domain, bodes well for the ¯ow of
insights from the further elucidation of highly
divergent patterns of protein evolution to continue.
In conclusion, the structure of the catalytic core
of E. coli primase shows a novel fold for a nucleic
acid polymerase. A centrally placed Rossman fold,
the TOPRIM domain, helps form a surface groove
that we have tentatively identi®ed as the channel
for binding double-helical RNA-DNA hybrid. The
groove includes a putative active-site cleft that contains a conserved cluster of acidic residues,
suggesting that metal ions play a crucial role in the
phosphotransfer reaction. It is anticipated that the
structural results reported here will enable rapid
progress to now be made in obtaining a complete
understanding of the mechanism of primase
action.
Materials and Methods
Purification and crystallization
The catalytic core of E. coli primase (residues 111 to
429) was sub-cloned into the pPROEXHTa expression
vector, using the SfoI and NotI restriction sites, and
expressed in E. coli BL21 cells. The resulting protein contains a histidine tag connected to the N terminus of the
primase domain by a cleavage site for the tobacco etch
virus (TEV) protease (Parks et al., 1994). The cells were
lysed in 20 mM Tris-HCl (pH 8.0), 300 mM NaCl,
10 mM imidazole, containing 5 mM b-mercaptoethanol
and 1 mM PMSF, using an Emulsi¯ex French press
(Avestin). The soluble lysate was applied to Ni-NTA
(Qiagen) column to isolate the histidine-tagged fragment.
After the treatment with TEV protease, the tag-free protein was ¯owed over the Ni-NTA column again to
remove uncleaved fusion protein, and was further puri-
361
Structure of E. coli Primase Catalytic Core
®ed by gel ®ltration over a Superdex S75 column (Pharmacia) equilibrated in 20 mM Tris-HCl (pH 7.5), 50 mM
NaCl, and 1 mM DTT.
Crystals of the catalytic core were prepared using
hanging drop vapor diffusion by mixing equal volumes
of a 15 mg/ml protein solution and a reservoir solution
containing 16-19 % PEG 4000 (Fluka), 0.053-0.063 mM
Tris (pH 7.9-8.5) and 0.11-0.13 M sodium acetate, and
then equilibrating over 0.5 ml reservoir at 20 C for
about one week. Crystals form in space group P212121
Ê , b ˆ 107.6 A
Ê,
with unit cell dimensions of a ˆ 62.7 A
Ê . There are ®ve molecules of p36 fragment
c ˆ 263.1 A
per asymmetric unit, with a solvent content of approximately 50 %. A selenomethionine-substituted primase
fragment (with nine methionine residues per molecule)
was also produced and crystallized as for the native protein, except for an additional 5 mM DTT in the crystallization
solution.
Complete
incorporation
of
selenomethionine was con®rmed by mass spectrometry
(data not shown).
Structure determination
Crystals of the catalytic core were transferred to stabilization solution containing 22.35 % PEG 4000, 0.075 M
Tris (pH 8.3), 0.15 M sodium acetate, 5 mM DTT and
20 % (v/v) ethlyene glycol. Multiwavelength anomalous
dispersion (MAD) data were collected on one selenomethionyl-derivatized crystal at four wavelengths at
beamline X4A of the National Synchrotron Light Source
(Brookhaven, NY) (Table 1). All data were integrated
and reduced with DENZO/SCALEPACK programs
(Otwinowski & Minor, 1997). The scaled and reduced
intensity data were converted to amplitudes using
TRUNCATE (CCP4, 1994), and cross-wavelength scaling
was performed using FHSCALE (CCP4, 1994), treating
l2 as native (Table 1). Out of 45 selenium sites, 22 were
found by SOLVE (Terwilliger & Berendzen, 1995) and
the rest by difference Fourier maps. Phase calculation
and heavy-atom re®nement were performed in
MLPHARE (CCP4, 1994), and map modi®cations were
carried out using DM (CCP4, 1994) and MAMA
(Kleywegt & Read, 1998). Automated solvent ¯attening
resulted in an electron density map of exceptional quality into which a nearly complete model could be built
using the program O (Jones et al., 1991).
The primase catalytic core consists of three domains
(residues 120-241, 242-367, 368-415), and the inter-domain
orientations are slightly, but signi®cantly, different in
each of the ®ve molecules in the asymmetric unit. Re®nement of the model by CNS (BruÈnger et al., 1998) proceeded by utilizing non-crystallographic symmetry (NCS)
restraints for each of the three domains separately. The
Ê , at l1, for the selenostructure factors measured to 2.9 A
methionine crystals were used as the native data set for
re®nement. The ®nal re®ned model has a crystallographic
R-value of 21.9 %, a free R-value of 27.6 % and is well
de®ned for the residues 113-421. No electron density
could be seen for the residues 111, 112 and 429, or for the
residues 177, 193, 237, 287, 288, 364 and 422-428, only the
main chain could be traced. The geometry was analyzed
using PROCHECK (Laskowski et al.,1993), with 88 % of
the residues lying in the most favored regions of the
Ramachandran plot, and no residues in disallowed
regions.
Protein Data Bank accession number
The coordinates have been deposited with the RCSB
Protein Data Bank and have the accession code 1EQN.
Acknowledgments
We thank Jeffrey Bonanno, David Jeruzalmi and Hiroto Yamaguchi for valuable assistance and discussions.
We thank the staff at the X4A beamline at the National
Synchrotron Light Source for assistance with synchrotron
data collection. This work was supported in part by a
grant from National Institutes of Health (GM38839; MO).
References
Aravind, L., Leipe, D. D. & Koonin, E. V. (1998). Toprim
± a conserved catalytic domain in type IA and II
topoisomerases, DnaG-type primases, OLD family
nucleases and RecR proteins. Nucl. Acids Res. 26,
4205-4213.
Berger, J. M., Fass, D., Wang, J. C. & Harrison, S. C.
(1998). Structural similarities between topoisomerases that cleave one or both DNA strands. Proc.
Natl Acad. Sci. USA, 95, 7876-7881.
Berger, J. M., Gamblin, S. J., Harrison, S. C. & Wang,
J. C. (1996). Structure and mechanism of DNA
topoisomerase. Nature, 379, 225-232.
BruÈnger, A. T., Adams, P. D., Clore, G. M., DeLano,
W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S.,
Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J.,
Rice, L. M., Simonson, T. & Warren, G. L. (1998).
Crystallography and NMR system: a new software
suite for macromolecular structure determination.
Acta Crystallog. sect. D, 54, 905-921.
Callaborative Computational Project No. 4 (1994). The
CCP4 suite: programs for protein crystallography.
Acta Crystallog. sect. D, 50, 760-763.
Chen, S. J. & Wang, J. C. (1998). Identi®cation of activesite residues in Escherichia coli DNA topoisomerase
I. J. Biol. Chem. 273, 6050-6056.
Dracheva, S., Koonin, E. V. & Crute, J. J. (1995). Identi®cation of the primase active site of the herpes simplex virus type 1 helicase-primase. J. Biol. Chem.
270, 14148-14153.
Henikoff, S. & Henikoff, J. G. (1993). Performance evaluation of amino acid substitution matrices. Proteins:
Struct. Funct. Genet. 17, 49-61.
Holm, L. & Sander, C. (1993). Protein structure comparison by alignment of distance matrices. J. Mol. Biol.
233, 123-138.
Ilyina, T. V., Gorbalenya, A. E. & Koonin, E. V. (1992).
Organization and evolution of bacterial and bacteriophage primase-helicase systems. J. Mol. Evol.
34, 351-357.
Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M.
(1991). Improved methods for building protein
models in electron density maps and the location of
errors in these models. Acta Crystallog sect. A, 47,
110-119.
Keck, J. L., Roche, D. D., Lynch, A. S. & Berger, J. M.
(2000). Structure of the RNA polymerase domain of
E. coli primase. Science, 287, 2482-2486.
Kleywegt, G. J. & Read, R. J. (1998). Not your average
density. Structure, 5, 1557-1569.
Kornberg, A. & Baker, T. A. (1991). DNA Replication,
2nd edit., W.H. Freeman and Company, New York.
362
Structure of E. coli Primase Catalytic Core
Laskowski, R. A., MacArthur, M. W., Moss, D. S. &
Thornton, J. M. (1993). PROCHECK: a program to
check the stereochemical quality of protein structures. J. Appl. Crystallog. 26, 283-291.
Leipe, D. D., Aravind, L. & Koonin, E. V. (1999). Did
DNA replication evolve twice independently? Nucl.
Acids Res. 27, 3389-3401.
Lima, C. D., Wang, J. C. & Mondragon, A. (1994).
Three-dimensional structure of the 67 K N-terminal
fragment of E. coli DNA topoisomerase I. Nature,
367, 138-146.
McMacken, R. & Kornberg, A. (1978). A multienzyme
system for priming the replication of phiX174 viral
DNA. J. Biol. Chem. 253, 3313-3319.
Mustaev, A. A. & Godson, G. N. (1995). Studies of the
functional topography of the catalytic center of
Escherichia coli primase. J. Biol. Chem. 270, 1571115718.
Nicholls, A., Sharp, K. A. & Honig, B. (1991). Protein
folding and association: insights from the interfacial
and thermodynamic properties of hydrocarbons.
Proteins: Struct. Funct. Genet. 11, 281-296.
Nichols, M. D., DeAngelis, K., Keck, J. L. & Berger, J. M.
(1999). Structure and function of an archaeal topoisomerase VI subunit with homology to the meiotic
recombination factor Spo11. EMBO J. 18, 6177-6188.
Otwinowski, Z. & Minor, W. (1997). Processing of X-ray
diffraction data collected in oscillation mode.
Methods Enzymol. 276, 307-326.
Pan, H. & Wigley, D. B. (2000). Structure of the zincbinding domain of Bacillus stearothermophilus DNA
primase. Structure, 8, 231-239.
Parks, T. D., Leuther, K. K., Howard, E. D., Johnston,
S. A. & Dougherty, W. G. (1994). Release of proteins and peptides from fusion proteins using a
recombinant plant virus proteinase. Annu. Rev.
Biochem. 216, 413-417.
Steitz, T. A. (1998). A mechanism for all polymerases.
Nature, 391, 231-232.
Strack, B., Lessl, M., Calendar, R. & Lanka, E. (1992).
A common sequence motif, -E-G-Y-A-T-A-, ident-
i®ed within the primase domains of plasmidencoded I- and P-type DNA primases and the
alpha protein of the Escherichia coli satellite phage
P4. J. Biol. Chem. 267, 13062-13072.
Terwilliger, T. C. & Berendzen, J. (1995). Difference
re®nement: obtaining differences between two
related structures. Acta. Crystallog. sect. D, 51, 609618.
Thompson, J. D., Gibson, T. J., Plewniak, F.,
Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL X windows interface: ¯exible strategies for
multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 25, 4876-4882.
Tougu, K. & Marians, K. J. (1996). The interaction
between helicase and primase sets the replication
fork clock. J. Biol. Chem. 271, 21398-21405.
Tougu, K., Peng, H. & Marians, K. J. (1994). Identi®cation of a domain of Escherichia coli primase
required for functional interaction with the DnaB
helicase at the replication fork. J. Biol. Chem. 269,
4675-4682.
Urlacher, T. M. & Griep, M. A. (1995). Magnesium acetate induces a conformational change in Escherichia
coli primase. Biochemistry, 34, 16708-16714.
Versalovic, J. & Lupski, J. R. (1993). The Haemophilus
in¯uenzae dnaG sequence and conserved bacterial
primase motifs. Gene, 136, 281-286.
Wang, J. C. (1996). DNA topoisomerases. Annu. Rev. Biochem. 65, 635-692.
Zhang, G., Campbell, E. A., Minakhin, L., Richter, C.,
Severinov, K. & Darst, S. A. (1999). Crystal structure of Thermus aquaticus core RNA polymerase at
Ê resolution. Cell, 98, 811-824.
3.3 A
Zhu, C. X., Roche, C. J., Papanicolaou, N.,
DiPietrantonio, A. & Tse-Dinh, Y. C. (1998). Sitedirected mutagenesis of conserved aspartates, glutamates and arginines in the active site region of
Escherichia coli DNA topoisomerase I. J. Biol. Chem.
273, 8783-8789.
Edited by P. E. Wright
(Received 3 April 2000; received in revised form 28 April 2000; accepted 28 April 2000)