Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Histone acetylation and deacetylation wikipedia , lookup
List of types of proteins wikipedia , lookup
Protein structure prediction wikipedia , lookup
Eukaryotic transcription wikipedia , lookup
RNA polymerase II holoenzyme wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Cell, Vol. 87, 127–136, October 4, 1996, Copyright 1996 by Cell Press Crystal Structure of a s70 Subunit Fragment from E. coli RNA Polymerase Arun Malhotra, Elena Severinova,* and Seth A. Darst The Rockefeller University 1230 York Avenue New York, New York 10021 Summary The 2.6 Å crystal structure of a fragment of the s70 promoter specificity subunit of E. coli RNA polymerase is described. Residues involved in core RNA polymerase binding lie on one face of the structure. On the opposite face, aligned along one helix, are exposed residues that interact with the 210 consensus promoter element (the Pribnow box), including four aromatic residues involved in promoter melting. The structure suggests one way in which DNA interactions may be inhibited in the absence of RNA polymerase and provides a framework for the interpretation of a large number of genetic and biochemical analyses. Introduction Transcription is the major control point of gene expression and DNA-dependent RNA polymerase (RNAP) is the central enzyme of transcription. The core RNAPs from bacterial and eukaryotic cells, which are catalytically active in RNA chain elongation, are homologous in structure and function (Allison et al., 1985; Biggs et al., 1985; Ahearn et al., 1987; Sweetser et al., 1987; Darst et al., 1989; Darst et al., 1991; Schultz et al., 1993; Polyakov et al., 1995). Specific initiation of transcription requires additional protein factors. Promoter-specific initiation of mRNA synthesis in eukaryotes requires a collection of basal initiation factors (TFIIB, TBP, TFIIE, TFIIF, and TFIIH, reviewed in Roeder, 1991; Conaway and Conaway, 1993) comprising more than a dozen polypeptides with a total mass z750 kDa. In bacteria, specific initiation by RNAP requires a single polypeptide known as a s factor, which binds to core RNAP to form the holoenzyme (Burgess et al., 1969; Travers and Burgess, 1969). One primary s factor directs the bulk of transcription during exponential growth. Specialized, alternative s factors direct transcription of specific regulons during unusual physiological or developmental conditions (reviewed in Helmann and Chamberlin, 1988; Gross et al., 1992). The primary and most of the alternative s factors comprise a highly homologous family of proteins (Stragier et al., 1985; Gribskov and Burgess, 1986) with four regions of highly conserved amino acid sequence (Figure 1; reviewed in Lonetto et al., 1992). The primary s factor in Escherichia coli, s70 , directs transcription from promoters characterized by two elements of consensus DNA sequence: TATAAT (the Pribnow box), centered at about 210 with respect to the *On leave from the Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russia. transcription start site (11), and TTGACA, centered at about 235. The 210 and 235 elements are usually separated by 17 base pairs of nonconserved sequence (Hawley and McClure, 1983; Harley and Reynolds, 1987). Alternative s factors of the s70 family also direct transcription from promoters organized into 210 and 235 elements but with different consensus sequences (reviewed in Helmann and Chamberlin, 1988), leading to the proposal that s factors themselves directly contact the DNA at the consensus promoter elements to confer sequence-specific recognition (Losick and Pero, 1981). Studies demonstrating the formation of chemical crosslinks between promoter DNA and s70 support this proposal (Simpson, 1979; Park et al., 1980; Hilton and Whiteley, 1985; Buckle et al., 1991). Stronger support comes from genetic studies demonstrating allele-specific suppression of promoter mutations by specific mutations in the corresponding s factor (Gardella et al., 1989; Siegele et al., 1989; Zuber et al., 1989; Daniels et al., 1990; Waldburger et al., 1990), identifying residues in s conserved regions 2.4 and 4.2 as specifying recognition of the 210 and 235 promoter elements, respectively. However, s70 does not bind promoter DNA in the absence of core RNAP. Recently, specific interactions between N-terminally truncated derivatives of s70 and promoter DNA were indicated by competetive filter retention assays, leading to the hypothesis that the latent DNA binding activity of s70 is inhibited by N-terminal regions and that this inhibition is relieved upon the binding of s70 to core RNAP (Dombroski et al., 1992). Since only the holoenzyme form of RNAP forms transcription-competent open complexes on doublestranded DNA at promoters, and since the melted DNA includes the 210 consensus element (Siebenlist et al., 1980; Kirkegaard et al., 1983), it has also been suggested that s factors are directly involved in DNA melting (Hinkle and Chamberlin, 1972), perhaps by sequence-specific binding to one of the DNA strands (Helmann and Chamberlin, 1988). Chemical probing of open complexes indicates strong protection of the nontemplate strand by RNAP holoenzyme, particularly in the region around the 210 consensus element (Siebenlist et al., 1980; Buckle and Buc, 1989; Chan et al., 1990). Moreover, cross-links between s and promoter DNA in open complexes form preferentially with the nontemplate strand (Simpson, 1979; Park et al., 1980; Hilton and Whiteley, 1985; Buckle et al., 1991). Interaction of s with the nontemplate strand would maintain the melted region of DNA in the open complex while allowing access of the RNAP catalytic machinery and nucleotide substrates to the template strand. Despite the central importance of s factors in the control of bacterial gene expression, fundamental questions regarding the mechanism of action, the regulation, and the role of s in such processes as promoter recognition, promoter melting, and promoter clearance, remain unanswered. This is due, in large part, to a total lack of structural information. We now report the X-ray crystal structure of a s70 fragment containing conserved region 2, which is the most highly conserved region in the s 70 Cell 128 Figure 1. Conserved Regions of the s70 Family of s Factors The bar at top represents the E. coli s70 primary sequence with amino acid numbering shown above. Evolutionarily conserved regions are labeled below the bar and colored either gray or as follows: 1.2, red; 2.1, green; 2.2, yellow; 2.3, cyan; 2.4, orange. Expanded underneath is a sequence alignment showing regions 1.2 to 2.4 of 11 primary s factors taken from the more extensive alignment of Lonetto et al., 1992, but also including the primary s from E. coli C (Christie and Cale, 1995). The sequences are presented in one-letter amino acid code and are identified by a three-letter species code (ANA, Anabaena; BSU, Bacillus subtilis; CTR, Chlamydia trachomatis; ECO, E. coli; MXA, Myxococcus xanthus; PAE, Pseudomonas aeruginosa; SAU, Staphylococcus aureaus; SCE, Streptomyces coelicolor; STY, Salmonella typhimurium) and a four-letter s abbreviation. Numbers at the beginning of each line indicate amino acid positions relative to the start of each mature protein sequence. Numbers at the top indicate the amino acid position in E. coli K-12 s 70. Amino acid similarity >50% in the full alignment of Lonetto et al. (1992) is indicated by a gray background, gaps are indicated by dotted lines. Groups of residues considered similar are: ST, RK, DE, NQ, FYW, and ILVM. Helices are indicated above the sequences as rectangles; loops, as a solid black line. The dashed parts indicate disordered regions that were not modeled. The patterned helices form intramolecular, antiparallel coiled-coil dimers with the like-patterned helix. Indicated below the sequences are the locations of deletions (lines) or point mutations (dots) that affect core RNAP binding (green; Lesley and Burgess, 1989; Shuler et al., 1995), promoter melting (blue; Jones and Moran, 1992; Juang and Helmann, 1994; deHaseth and Helmann, 1995; Juang and Helmann, 1995), or 210 consensus element recognition (orange; Kenney et al., 1989; Siegele et al., 1989; Zuber et al., 1989; Daniels et al., 1990; Waldburger et al., 1990; Tatti et al., 1991). Also denoted is a short region of homology between s 70 and s54, an alternative s factor that is not a member of the s 70 family, in which point mutations occur that cause loss of s 54 binding to core RNAP (green bracket; Tintut and Gralla, 1995). family (Lonetto et al., 1992). Conserved region 2 has been implicated in core RNAP binding (Lesley and Burgess, 1989), recognition and binding of the 210 promoter element (Siegele et al., 1989; Zuber et al., 1989; Daniels et al., 1990; Waldburger et al., 1990), and promoter melting (Helmann and Chamberlin, 1988; Juang and Helmann, 1994; Rong and Helmann, 1994). The structure reveals a new fold with a topology unlike other known double- or single-stranded DNA binding proteins. The structure also provides insight into the role of s in promoter melting, indicates one way in which DNA interactions could be inhibited in free s, and provides a framework for the interpretation of a large number of genetic and biochemical analyses as well as for the design of future studies. Results and Discussion Crystallographic Analysis Limited proteolysis, N-terminal sequencing and mass spectrometry were used to determine the domain organization of E. coli s70 (E. S. et al., submitted). One of the domains (s70 residues 114 to 448, hereafter referred to as s702) contains a part of conserved region 1.2 and all Structure of a s 70 Subunit Fragment 129 Figure 3. Core RNAP Binding Surface of s702 Figure 2. Structure of s702 (a) Schematic diagram of the secondary structure of s 702. Helices are shown as cylinders. Three disordered regions that were not modeled are indicated (not to scale) as dotted lines. These are residues 168–172, 192–211, and 238–241. Conserved regions 1.2, 2.1, 2.2, 2.3, and 2.4 of the s70 family are color coded as in Figure 1. The nonconserved region inserted between conserved regions 1.2 and 2.1 is colored gray. (b) RIBBONS (Carson, 1991) diagram of the three-dimensional structure of s702. Color coding is as before. Unmodeled, disordered regions are indicated (not to scale) as dotted lines: (left) similar view as Figure 2a, perpendicular to the horizontal, pseudo-2-fold symmetry axis; (right) view along the pseudo-2-fold symmetry axis. but the C-terminal 8 residues of conserved region 2 (Figure 1). The 39 kDa domain binds core RNAP competetively with s70 , and the complex thus formed specifically binds single-stranded DNA containing the 210 consensus sequence of the nontemplate strand. The structure was solved using multiwavelength anomalous diffraction (MAD) data collected from a single crystal of selenomethionyl-substituted protein (Hendrickson, 1991). The current model, refined to 2.6 Å resolution, contains residues 114–446 of s 70, along with one N-terminal methionine residue from the expression vector. Three N-terminal residues from the vector, the s70 C-terminal residues 447–448, and a total of 29 internal residues in 3 loops are disordered and could not be modeled (Figure 1). Overall Topology and Organization The structure of s702 (Figure 2), which consists entirely of helices and connecting loops, can be divided conceptually into three substructures comprising two structural motifs. One motif consists of an antiparallel a-helical The backbone trace is displayed as a white tube, showing the cluster of four helices comprising the conserved regions. A partially transparent, solvent-accessible surface encloses the structure. The large black labels denote the helices. The view is directly at the face containing helices 12a and 12b, comprising conserved region 2.1. The kink between these helices is centered about Asn-383. Helices 1 and 12a form an intramolecular, antiparallel coiled-coil dimer, only part of which is shown. Some selected, conserved hydrophobic residues are shown in yellow. Hydrophobic a and d positions of the coiled-coil heptad repeat are occupied by Ile-119, Ile-123, Ala-375, and Met-379. The green backbone and side chains (residues 380– 385) denote a region known from mutagenesis studies to be important for core RNAP binding (see Figure 1). Shown also are other highly conserved residues that we speculate may be involved in core RNAP binding. These are, first, an adjacent patch of conserved, solvent-exposed hydrophobic residues (Leu-384, Ile-388, Phe-401, Leu-402, Ile-405, shown in yellow) that are suggestive of a protein interaction surface and, second, three other highly conserved, exposed residues (shown in blue) (generated using the program GRASP; Nicholls et al., 1991). coiled-coil dimer (Figure 3) with a helical bundle at one end. This motif is roughly repeated twice, giving rise to a “V”-shaped structure with pseudo-2-fold symmetry (Figures 2A and 2B) that is not detectably reflected in the sequence. The second motif comprises a small helical domain that is situated at the vertex of the V. The width of the V is z50 Å, and the distance separating the ends of the arms is z75 Å. In the perpendicular direction, the molecule is z30 Å thick. The conserved regions (colored) are clustered together and closely associated in the tertiary structure (Figure 2). The rest of the structure (gray) comprises a large insertion between conserved regions 1.2 and 2.1 (up to 250 amino acids in the primary s of Pseudomonas aeruginosa, 245 in E. coli s70 ), present only in some primary s factors but nonconserved in both sequence and length (Figure 1). The C-terminus of conserved region 1.2 is close enough to the N-terminus of conserved region 2.1 to be linked by only a few residues, suggesting how the conserved regions could constitute a stably Cell 130 folded domain with essentially the same structure in the absence of the nonconserved region. The clustering and tertiary fold of the conserved regions appears to be determined primarily by a tightly packed hydrophobic core (Figures 4A and 5). As might be expected, residues contributing to this hydrophobic core are some of the most highly conserved of all s factors, including several absolutely conserved residues (Val-387, Gly-408, Gly-411, Leu-412, Ala-431, and Ile435). The absolutely conserved glycine residues occur at the points of closest approach between helices 12a and 13 (Gly-408) and helices 13 and 14 (Gly-411). Overall, at the 17 positions contributing to the hydrophobic core, the residues found at each position in E. coli s70 are 93% identical when compared with primary s factors (77% in all s factors), and hydrophobic residues are 100% conserved in primary s factors (97% in all s factors). Helix 13 (essentially region 2.2, yellow in Figure 2), being sandwiched between the helices constituting the other conserved regions, contributes the most residues to the hydrophobic core. This helps explain why region 2.2 is the most highly conserved region in s factors (Lonetto et al., 1992). Core RNA Polymerase Binding A number of observations indicate that conserved region 2.1 is critical for high affinity binding of s to core RNAP (Figure 1). Deletion analysis has identified a region of s70 , residues 361–390, including most of region 2.1, that seems to be necessary and sufficient for core RNAP binding (Lesley and Burgess, 1989). This is consistent with a similar analysis investigating core RNAP binding of E. coli s 32 truncation mutants (Lesley et al., 1991). Mutations in region 2.1 of B. subtilis s E cause defects in binding to both B. subtilis and E. coli core RNAP (Figure 1) (Shuler et al., 1995). PCR mutagenesis was used to identify specific amino acids important for core RNAP binding within E. coli s 54, an alternative s factor that is not a member of the s70 family (Wong et al., 1994; Tintut and Gralla, 1995). The largest concentration of mutations causing loss of core RNAP binding fell within or just adjacent to a short stretch of residues that bears resemblance to s 70 residues 381–385 within region 2.1. Moreover, the sequence through this region of s54 is arranged in a heptad repeat characteristic of coiledcoils, as is found in this region of the s70 2 structure. Conserved region 2.1 (green in Figure 2) forms two a helices, the C-terminal part of helix 12a and helix 12b, with an z458 kink between them, centered about Asn383 (Figure 3). Helix 12a makes hydrophobic coiled-coil interactions with helix 1 while both helices 12a and 12b contribute to the hydrophobic core formed by the cluster of conserved region helices (Figures 4A and 5). On the opposite face, the helices are solvent exposed. Some of the residues that appear to be important for core RNAP binding are illustrated in Figure 3. These residues include or fall very near the kink between helices 12a and 12b, suggesting that this structural feature may be important for core RNAP binding. Adjacent to this feature is a cluster of conserved, hydrophobic residues that form a solvent-exposed hydrophobic patch that is suggestive of a protein-binding surface and may also be involved in s–core RNAP interactions (Figure 3). Also denoted in Figure 3 are nearby solvent-exposed but highly conserved residues (Glu-114, Arg-374, Lys-392). Because these residues do not appear to make intramolecular contacts contributing to the s702 structure, their location and conservation suggest they may also be involved in core RNAP contacts. Pribnow Box Recognition and Melting A number of studies using a variety of primary and alternative s factors from E. coli and B. subtilis (summarized in Figure 1) have identified site-specific mutations within region 2.4 that suppress single-base mutations within the 210 promoter element (Figure 1). All of these studies converge on the conclusion that residues corresponding to 437 and 440 of s 70 (unless otherwise specified, all amino acid numbering refers to E. coli K-12 s70 ) suppress specific base changes at the promoter position corresponding to 212 (Kenney et al., 1989; Siegele et al., 1989; Daniels et al., 1990; Waldburger et al., 1990). Specifically in E. coli s70 , substitution of Gln-437 with His (Gln-437-His) causes a substantial increase in activity from mutant promoters having a T-to-C substitution at 212 (Waldburger et al., 1990), and a Thr-440-Ile substitution increases activity from mutant promoters having T-to-C or -G substitutions at 212 (Siegele et al., 1989). Although the 213 position is not conserved in the 210 element recognized by s 70, many alternative s factors recognize 210 consensus elements that extend upstream to 213 or even further. Similar analyses for two of these s factors, sH and sE of B. subtilis, have demonstrated that substitution of the residue corresponding to 441 of s70 suppresses specific base changes at 213 (Zuber et al., 1989; Tatti et al., 1991). The spacing of these residues, along with the presence of three absolutely conserved hydrophobic residues at 435, 439, and 443 (Figure 1) led to the proposal that this region of s factors forms an amphipathic helix with exposed residues at positions 437 and 440 directly contacting the 212 base, while the residue corresponding to position 441 contacts the 213 base (Kenney et al., 1989; Daniels et al., 1990; Waldburger et al., 1990). Region 2.4 does indeed form an amphipathic a helix (helix 14), with the conserved hydrophobic residues contributing to the hydrophobic core formed by the conserved regions, while the residues implicated in 210 element recognition are solvent-exposed on the opposite face of the helix (Figures 4 and 5). A role for conserved region 2.3 in promoter melting has been proposed (Helmann and Chamberlin, 1988) based on its proximity to region 2.4 (the 210 recognition region) and its high proportion of conserved aromatic and basic residues, which might serve to stack with exposed bases and neutralize the DNA phosphate backbone, as in single-stranded nucleic acid–binding proteins (Nagai et al., 1990; Shamoo et al., 1995). This model has been tested with B. subtilis s A, which is 83% identical with E. coli s70 in region 2.3 (100% homologous since the only substitutions are conservative), by substituting each of the seven conserved aromatic residues in region 2.3 (Figure 1) with alanine (Juang and Helmann, 1994). The results of this study are interesting to consider in light of the s702 structure. Four of the substitutions (in E. coli s70 numbering), Tyr-425, Tyr-430, Trp-433, and Structure of a s 70 Subunit Fragment 131 Figure 4. DNA Interaction Surface of s 702 (A) Stereo RIBBONS (Carson, 1991) diagram of the cluster of four helices comprising the conserved regions. The view is 1808 about a vertical axis from the view of Figure 3. Helix 14, containing part of conserved region 2.3 and conserved region 2.4, runs nearly horizontally across the middle of the picture. Shown in yellow are residues that comprise the conserved hydrophobic core (Ile-119, Ile-123, Ala-375, Met-379, Val380, Val-387, Ala-391, Leu-399, Leu-404, Leu-412, Ala-415, Val-416, Phe-419, Phe-427, Ala-431, Ile-435, Ile-439, Ile-443). Other residues are shown in color as follows: cyan, exposed conserved aromatic residues from region 2.3, important for promoter melting; orange, residues known to interact with the 212 position of the 210 consensus element; blue, conserved basic residues flanking the promoter recognition and promoter melting residues that may be involved in DNA phosphate backbone interactions. (B) Likely orientation of helix 14/nontemplate DNA strand interactions. The backbone of helix 14 is shown as a coil with the solvent-exposed face of the helix facing down. The a-carbon positions of residues important for promoter recognition or melting are indicated. Schematically illustrated below is the nontemplate strand sequence of the 210 consensus element. Interactions between specific residues and bases determined from genetic or biochemical studies are indicated by dashed lines. The interaction indicated between the residue at position 441 and the 213 position is not specific in the case of s70 (the 213 position is not conserved in the 210 element recognized by s70) but is indicated from genetic studies on alternative s factors that recognize 210 elements with a conserved 213 position (Daniels et al., 1990). Trp-434 (cyan in Figures 4A and 5), cause impaired DNA melting even when the promoter is saturated with RNAP (Juang and Helmann, 1994). Consistent with melting defects, the impairment can be overcome by using supercoiled templates or by raising the reaction temperature (Aiyar et al., 1994; Juang and Helmann, 1994). Several of these mutations exert a trans-dominant lethal effect in vivo (Rong and Helmann, 1994). In addition, a substitution in an alternative s factor at the position corresponding to 434 also results in what appears to be a melting defect (Jones and Moran, 1992). Each of these conserved aromatic residues is solvent exposed and on the same face of helix 14 as the region 2.4 residues implicated in 210 element recognition (Figure 4). Two other substitutions, Phe-419 and Phe-427, exhibit properties that suggest these mutants are unable to fold properly. Consistent with this result, these two residues are buried and play important roles in forming the conserved hydrophobic core (Figures 4A and 5). Substitution of Tyr-421 has little to no effect in vivo and in vitro, and in the s702 structure this residue does not participate in the hydrophobic core and is not in a position to participate in DNA interactions since it faces the opposite side of the structure as the 210 element recognition and melting residues. Potassium-permanganate footprinting indicates that promoter melting nucleates within the 210 element around the 211/210 position. Interestingly, the Tyr-430 Cell 132 Figure 5. Potential Autoinhibition of DNA Binding RIBBONS (Carson, 1991) diagram showing a view of the conserved region helices, z908 about a vertical axis from the view of Figure 4. Helix 14 is viewed from the C-terminal end nearly down its axis. Selected conserved residues are color coded as follows: yellow, residues comprising conserved hydrophobic core; green, residues in region important for core RNAP binding; cyan, aromatic residues important for promoter melting; orange, residues important for recognition of the 212 position of the 210 consensus element. The location of core binding and DNA binding determinants on opposite sides of the structure is noted. Illustrated schematically are the sequence (in single-letter amino acid code) and charge of the 20residue, disordered acidic loop (residues 192–211). cleft might sterically inhibit DNA interaction and would also repel the negatively charged DNA electrostatically. This would help to explain why s70 and also s70 2 do not interact with DNA in the absence of core RNAP, consistent with studies implicating regions of s70 N-terminal of region 2 in inhibiting DNA interactions (Dombroski et al., 1992, 1993). Since many s factors lack the acidic loop (Figure 1), it cannot be the only mechanism by which specific interactions of s factors with promoter DNA are inhibited in the absence of core RNAP, also consistent with the studies of Dombroski et al., 1992. Since the s70 2-core RNAP complex binds specifically to a single-stranded DNA oligo with the nontemplate 210 consensus sequence, a mechanism involving substantial conformational changes of s and/or neutralization of the acidic loop upon core RNAP binding must exist for the autoinhibition of DNA binding from the acidic loop to be overcome. and Trp-433 substitutions appear to be defective in the nucleation of melting (Juang and Helmann, 1995), suggesting that these residues interact with the bases at the 211 and/or 210 positions. Thus, residues implicated in interacting with the 213, 212, and 211/210 bases are aligned along one face of helix 14. Flanking this row of aligned residues on top and bottom are several conserved, positively charged arginine and lysine residues (Figure 4A) that could interact with the negatively charged phosphate backbone of the DNA. Cross-linking and chemical probing experiments (discussed above) suggest that s interacts primarily with the nontemplate strand in the open promoter complex. Furthermore, a complex between core RNAP and s70 2 specifically binds a single-stranded DNA oligo with the 210 consensus sequence of the nontemplate strand (E. S. et al., submitted). Thus, we propose that in the open promoter complex, the single-strand region of the nontemplate strand interacts with the residues of region 2.3 and 2.4 in the manner and orientation schematically illustrated in Figure 4B. Relationship to Eukaryotic Factors Putative sequence relationships have been suggested between s conserved region 2 and various eukaryotic basal transcription factors such as TATA-binding protein (TBP; Horikoshi et al., 1989), TFIIF or RAP30/74 (Sopta et al., 1989), TFIIB (Ha et al., 1991), subunits of TFIIE (Ohkuma et al., 1991; Sumimoto et al., 1991), and the mitochondrial RNAP factor MTF1 (Jang and Jaehning, 1991). The proposed similarity between s 70 conserved region 2 and TBP is particularly intriguing considering that each protein recognizes essentially the same DNA sequence. However, none of these sequence alignments are statistically significant based on the criteria of Lonetto et al., 1992. Indeed, considering s 70 region 2 and TBP in light of their structures, the relevant region of s70 (residues 417–457, conserved regions 2.3 and 2.4) is mostly an a helix, while the relevant region of TBP (residues 129–170 of Arabidopsis thaliana TBP) is a short a helix and three strands of antiparallel b sheet (Nikolov et al., 1992). Thus, this proposed sequence relationship does not appear to be meaningful, and functional evidence will be required to establish the significance of the other noted similarities. In one case, the proposed similarity between the RAP30 subunit of RAP30/74 (TFIIF) and the core binding region 2.1 of s factors (Sopta et al., 1989) is supported by the finding that RAP30/74 binds E. coli core RNAP competetively with s 70 (McCracken and Greenblatt, 1991). It is also interesting to note that bg, the TFIIF homolog derived from rat liver, binds to RNAP II and inhibits its nonspecific DNA binding, which is also a function attributed to s factors (Conaway and Conaway, 1990). Inhibition of DNA Binding The core RNAP binding determinants and DNA binding determinants of s 702 face opposite sides of the structure (Figure 5). The residues implicated in 210 recognition and melting all face into a cleft-like feature. However, potentially occupying this cleft is a highly acidic stretch of residues from 188 to 209. In this stretch of 22 residues, 18 are negatively charged (Figure 1). Most of this acidic loop (192–209) is disordered. The presence of this highly acidic loop within or near the apparent DNA-binding Conclusion The crystal structure of an E. coli RNAP s70 fragment reveals an entirely helical, “V”-shaped protein with all the regions of conserved primary sequence closely associated with one another. Residues known to interact with core RNAP (Lesley and Burgess, 1989; Shuler et al., 1995) face one side of the structure, while on the opposite side, residues important for promoter recognition (Siegele et al., 1989; Waldburger et al., 1990) and four conserved aromatic residues involved in promoter Structure of a s 70 Subunit Fragment 133 Table 1. Summary of the Crystallographic Analysis Diffraction Data Native Se-Met Se-Met Se-Met Se-Met Wavelength (Å) Resolution (Å) Total reflections Unique reflections Completeness (%) [last shell] I/s [last shell] Rsym (%) [last shell] 0.9663 2.6 76887 13160 94.9 [78.3] 15.1 [2.5] 6.3 [27.0] l1 5 0.9879 2.8 58230 10267 92.3 [65.7] 17.4 [3.6] 6.8 [17.5] l2 5 0.9793 2.8 58476 10229 92.6 [67.9] 16.6 [3.2] 7.6 [19.6] l3 5 0.9791 2.8 58421 10221 92.7 [68.0] 16.1 [3.0] 8.8 [22.5] l4 5 0.9686 2.8 59371 10327 93.2 [67.6] 16.0 [2.9] 8.3 [20.2] — — 10 1.13 10 — 10 0.55 10 1.11 Phasing statistics Number of sites Phasing power Overall figure of merit: 0.591 Refinement Resolution (Å): 6-2.6 Reflections (|F | . 2s|F |): 11376 Number of atoms in model: 2593 Number of solvent molecules: 111 R-factor (%): 21.8 Free R-factor (%): 31.5 RMS bond-length (Å): 0.012 RMS bond angle (degrees): 1.637 Rsym 5 S|I 2 ,I.|/SI, where I 5 observed intensity, ,I. 5 average intensity from multiple observations of symmetry related reflections. Phasing power 5 root mean square (|FH|/E), where |FH| 5 heavy-atom structure factor amplitude and E 5 residual lack of closure. RMS bond lengths and angles are the deviations from ideal values. melting (Juang and Helmann, 1994) are aligned along the solvent-exposed face of a single helix. The promoter recognition and melting helix is surrounded on top and bottom by conserved basic residues that may interact with the DNA phosphate-backbone. Finally, we suggest that a disordered loop of acidic residues may sterically and electrostatically inhibit DNA interactions of s70 2 in the absence of RNAP. Substantial conformational changes of s 702 upon RNAP binding and/or binding and neutralization of the acidic loop by a basic region of RNAP would be required to overcome this inhibition. Exposed aromatic residues surrounded by basic residues are a characteristic of single-strand nucleic acid binding proteins, and this feature of s conserved region 2.3, noted earlier from the primary sequence (Helmann and Chamberlin, 1988), is borne out in the structure. Moreover, s702 promotes the specific binding of a singlestranded oligo containing the nontemplate strand sequence of the 210 consensus element by RNAP (E. S. et al., submitted). These findings suggest that, in the open complex between RNAP holoenzyme and promoter DNA, s functions in part as a sequence-specific single-strand DNA binding protein. The 210 promoter consensus element, with its T-A base steps, has a distorted structure in solution (Spassky et al., 1988). Further distortion and unwinding would occur upon RNAP binding as a result of DNA bending (Amouyal and Buc, 1987; Travers, 1990). The promoter melting and recognition region of s conserved regions 2.3 and 2.4 may thus be poised to recognize and promote melting of the highly distorted 210 consensus element. Sequence-specific binding of the nontemplate strand would stabilize the transcription bubble in the open promoter complex and leave the template strand available for the RNAP catalytic machinery. Experimental Procedures The identification, characterization, purification, and crystallization of s702 (s70 residues 114–448 plus four N-terminal residues, Gly-SerHis-Met, left from the expression vector after thrombin treatment) was as described (E. S. et al., submitted). The crystals belong to the space group P41 21 2 with unit cell parameters a 5 b 5 79.229 Å, c 5 134.06 Å and one molecule in the asymmetric unit. Data were collected from frozen crystals held at 21708C, which diffracted X-rays (produced from a rotating anode generator) strongly to z3.5 Å and weakly to 2.9 Å resolution. Synchrotron radiation was used to collect diffraction data to 2.6 Å resolution. We were unable to obtain isomorphous derivatized crystals, prompting us to prepare the protein with its 10 methionine residues (9 in the s domain and 1 from the expression vector) substituted with selenomethionine (Hendrickson et al., 1990). MAD data were collected from one selenomethionyl crystal held at 21708C at the HHMI beamline X4A (NSLS, Brookhaven, NY) using Fuji imaging plates. The crystal was aligned with its c-axis parallel with v (the oscillation axis) to facilitate the simultaneous collection of anomalous pairs. The four wavelengths correspond to the inflection, the peak, and two remote values of the X-ray absorption spectrum of the derivatized crystal (l2, l3, l1 , and l4, respectively, Table 1). Data were processed using DENZO and SCALEPACK (Z. Otwinowski and W. Minor). Six of the ten possible selenium sites were found using SHELXS-90 in Patterson search mode (Sheldrick, 1991). The other 4 sites were found from difference Fouriers. Phasing was treated as a MIRAS problem, all 10 selenium sites were refined and phases calculated to 2.8 Å using MLPHARE (Z. Otwinowski) with three of the wavelengths treated as derivatives and one (l2) treated as parent. The resulting map revealed clear delineation between protein and solvent and a few helices were discernable (Figure 6). A dramatically improved map was generated using SQUASH (Zhang and Main, 1990), into which a polyalanine model of z50% of the residues was built using the program O (Jones et al., 1991). The map was improved by cycles of refinement using X-PLOR (Brünger, 1992), phase combination using SIGMAA (Read, 1986), and model building. None of the nonglycine backbone torsion angle combinations lie in unfavorable regions of the Ramachandran plot; 92% lie in the most favored regions. Coordinates will be submitted to the Brookhaven Protein Data Bank. Cell 134 Figure 6. Electron Density Maps Stereo views of (upper panel) the experimental, nonsolvent flattened electron density map (2.8 Å, 1.0 s), and (lower panel) the 2|Fo| 2 |Fc | electron density map (2.6 Å, 1.3 s) calculated from the final refined coordinates, shown as atomic stick figures using color coding for atom type (C, yellow; O, red; N, blue; two water molecules are shown as red crosses), generated using the program O (Jones et al., 1991). Solventexposed aromatic residues surrounded by basic residues from conserved region 2.3, proposed to be involved in promoter melting (Juang and Helmann, 1994; deHaseth and Helmann, 1995), are displayed. Structure of a s 70 Subunit Fragment 135 Acknowledgments (1991). Three-dimensional structure of yeast RNA polymerase II at 16 Å resolution. Cell 66, 121–128. We thank J. Kuriyan and S. K. Burley and the members of their laboratories for help and encouragement, and in particular J. Goldberg for invaluable advice. We thank W. A. Hendrickson and C. Ogata for access to and support at beamline X4A, which is funded by the Howard Hughes Medical Institute at the Brookhaven National Laboratory. We also thank W. Minor for help with DENZO and M. Lonetto for communicating unpublished sequence alignment information. S. A. D. is a Lucille P. Markey Scholar and a Pew Scholar in the Biomedical Sciences. This work was supported in part by grants to S. A. D. from the Lucille P. Markey Charitable Trust, the Irma T. Hirschl Trust, the Human Frontier Science Project, the Pew Foundation, and the National Institutes of Health. deHaseth, P.L., and Helmann, J.D. (1995). Open complex formation by Escherichia coli RNA polymerase: The mechanism of polymerase-induced strand separation of double helical DNA. Mol. Microbiol. 16, 817–824. Received July 19, 1996; revised August 13, 1996. References Ahearn, J.M., Bartolomei, M.S., West, M.L., Cisek, L.J., and Corden, J.L. (1987). Cloning and sequence analysis of the mouse genomic locus encoding the largest subunit of RNA polymerase II. J. Biol. Chem. 262, 10695–10705. Aiyar, S.E., Juang, Y.L., Helmann, J.D., and deHaseth, P.L. (1994). Mutations in sigma factor that affect the temperature dependence of transcription from a promoter, but not from a mismatch bubble in double-stranded DNA. Biochemistry 33, 11501–11506. Allison, L.A., Moyle, M., Shales, M., and Ingles, C.J. (1985). Extensive homology among the largest subunits of eukaryotic and prokaryotic RNA polymerases. Cell 42, 599–610. Amouyal, M., and Buc, H. (1987). Topological unwinding of strong and weak promoters by RNA polymerase. A comparison between the lac wild-type and the UV5 sites of Escherichia coli. J. Mol. Biol. 195, 795–808. Biggs, J., Searles, L.L., and Greenleaf, A.L. (1985). Structure of the eukaryotic transcription apparatus: Features of the gene for the largest subunit of Drosophila RNA polymerase II. Cell 42, 611–621. Brünger, A. (1992). X-PLOR (Version 3.1) Manual. 260 Whitney Avenue, New Haven, CT 06511, The Howard Hughes Medical Institute and Department of Molecular Biophysics and Biochemistry, Yale University. Buckle, M., and Buc, H. (1989). Fine mapping of DNA single-stranded regions using base-specific chemical probes. Biochemistry 28, 4388–4396. Buckle, M., Geiselmann, J., Kolb, A., and Buc, H. (1991). Protein-DNA cross-linking at the lac promoter. Nucleic Acids Res. 19, 833–840. Burgess, R.R., Travers, R.R., Dunn, J.J., and Bautz, E.K. F. (1969). Factor stimulating transcription by RNA polymerase. Nature 221, 43–44. Dombroski, A.J., Walter, W.A., Record, M.T., Siegele, D.A., and Gross, C.A. (1992). Polypeptides containing highly conserved regions of transcription initiation factor sigma70 exhibit specificity of binding to promoter DNA. Cell 70, 501–512. Dombroski, A.J., Walter, W.A., and Gross, C.A. (1993). Amino-terminal amino acids modulate sigma-factor DNA-binding activity. Genes Dev. 7, 2446–2455. Gardella, T., Moyle, T., and Susskind, M.M. (1989). A mutant Escherichia coli sigma 70 subunit of RNA polymerase with altered promoter specificity. J. Mol. Biol. 206, 579–590. Gribskov, M., and Burgess, R.R. (1986). Sigma factors from E. coli, B. subtilis, phase SPO1, and phage T4 are homologous proteins. Nucleic Acids Res. 14, 6745–6763. Gross, C.A., Lonetto, M., and Losick, R. (1992). Bacterial sigma factors. Transcriptional regulation Eds. K. Yamamoto and S. McKnight. Cold Spring Harbor, New York, Cold Spring Harbor Laboratory. Ha, I., Lane, W.S., and Reinberg, D. (1991). Cloning of a human gene encoding the general transcription initiation factor IIB. Nature 352, 689–694. Harley, C.B., and Reynolds, R.P. (1987). Analysis of E. coli promoter sequences. Nucleic Acids Res. 15, 2343–2361. Hawley, D.K., and McClure, W.R. (1983). Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Res. 11, 2237–2255. Helmann, J.D., and Chamberlin, M.J. (1988). Structure and function of bacterial sigma factors. Ann. Rev. Biochem. 57, 839–872. Hendrickson, W., Norton, J.R., and LeMaster, D.M. (1990). Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD). EMBO J. 9, 1665–1672. Hendrickson, W.A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254, 51–58. Hilton, M.D., and Whiteley, H.R. (1985). UV cross-linking of the Bacillus subtilis RNA polymerase to DNA in promoter and non-promoter complexes. J. Biol. Chem. 260, 8121–8117. Hinkle, D.C., and Chamberlin, M.J. (1972). Studies of the binding of Escherichia coli RNA polymerase to DNA. I. The role of sigma subunit in site selection. J. Mol. Biol. 70, 157–185. Carson, M. (1991). Ribbons 2.0. J. Appl. Crystallogr. 24, 958–961. Horikoshi, M., Wang, C.K., Fujii, H., Cromlish, J.A., Weil, P.A., and Roeder, R.B. (1989). Cloning and structure of a yeast gene encoding a general transcription initiation factor TFIID that binds to the TATA box. Nature 341, 299–303. Chan, B., Minchin, S., and Busby, S. (1990). Unwinding of duplex DNA during transcription initiation at the Escherichia coli galactose operon overlapping promoters. FEBS Lett. 267, 46–50. Jang, S.H., and Jaehning, J.A. (1991). The yeast mitochondrial RNA polymerase specificity factor, MTF1, is similar to bacterial s factors. J. Biol. Chem. 266, 22671–22677. Christie, G.E., and Cale, S.B. (1995). The sigma-70 subunit from Escherichia coli C differs from that of E. coli K-12. Gene 162, 161–162. Jones, C.H., and Moran, C.P.J. (1992). Mutant s factor blocks transition between promoter binding and initiation of transcription. Proc. Natl. Acad. Sci. USA 89, 1958–1962. Conaway, J.W., and Conaway, R.C. (1990). An RNA polymerase II transcription factor shares functional properties with Escherichia coli sigma 70. Science 248, 1550–1553. Jones, T.A., Zou, J.-Y., Cowan, S., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron denstity maps and the location of errors in these models. Acta Crystallogr. A47, 110–119. Conaway, R.C., and Conaway, J.W. (1993). General initiation factors for RNA polymerase II. Ann. Rev. Biochem. 62, 161–190. Daniels, D., Zuber, P., and Losick, R. (1990). Two amino acids in an RNA polymerase s factor involved in the recognition of adjacent base pairs in the 210 region of a cognate promoter. Proc. Natl. Acad. Sci. USA 87, 8075–8079. Juang, Y.-L., and Helmann, J.D. (1994). A promoter melting region in the primary sigma factor of Bacillus subtilis: identification of functionally important aromatic amino acids. J. Mol. Biol. 235, 1470– 1488. Darst, S.A., Kubalek, E.W., and Kornberg, R.D. (1989). Three-dimensional structure of Escherichia coli RNA polymerase holoenzyme determined by electron crystallography. Nature 340, 730–732. Juang, Y.-L., and Helmann, J.D. (1995). Pathway of promoter melting by Bacillus subtilis RNA polymerase at a stable RNA promoter: effects of temperature, d protein, and s factor mutations. Biochemistry 34, 8465–8473. Darst, S.A., Edwards, A.M., Kubalek, E.W., and Kornberg, R.D. Kenney, T.J., York, K., Youngman, P., and Moran, C.P.J. (1989). Cell 136 Genetic evidence that RNA polymerase associated with s A factor uses a sporulation-specific promoter in Bacillus subtilis. Proc. Natl. Acad. Sci. USA 86, 9109–9113. Kirkegaard, K., Buc, H., Spassky, A., and Wang, J.C. (1983). Mapping of single-stranded regions in duplex DNA at the sequence level: single-strand-specific cytosine methylation in RNA polymerase-promoter complexes. Proc. Natl. Acad. Sci. USA 80, 2544–2548. Lesley, S.A., Brow, M.A.D., and Burgess, R.R. (1991). Use of in vitro protein synthesis from polymerase chain reaction-generated templates to study interactions of Escherichia coli transcription factors with core RNA polymerase and for epitope mapping of monoclonal antibodies. J. Biol. Chem. 266, 2632–2638. Lesley, S.A., and Burgess, R.R. (1989). Characterization of the Escherichia coli transcription factor sigma 70: localization of a region involved in the interaction with core RNA polymerase. Biochemistry 28, 7728–7734. Lonetto, M., Gribskov, M., and Gross, C.A. (1992). The s 70 family: sequence conservation and evolutionary relationships. J. Bacteriol. 174, 3843–3849. polymerase interacts homologously with two different promoters. Cell 20, 269–281. Siegele, D.A., Hu, J.C., Walter, W.A., and Gross, C.A. (1989). Altered promoter recognition by mutant forms of the sigma 70 subunit of Escherichia coli RNA polymerase. J. Mol. Biol. 206, 591–603. Simpson, R.B. (1979). The molecular topology of RNA polymerasepromoter interaction. Cell 18, 277–285. Sopta, M., Burton, Z.F., and Greenblatt, J. (1989). Structure and associated DNA-helicase activity of a general transcription initiation factor that binds to RNA polymerase II. Nature 341, 410–414. Spassky, A., Rimsky, S., Buc, H., and Busby, S. (1988). Correlation between the conformation of Escherichia coli 210 hexamer sequences and promoter strength: use of orthophenanthroline cuprous complex as a structural index. EMBO J. 7, 1871–1879. Stragier, P., Parsot, C., and Bouvier, J. (1985). Two functional domains conserved in major and alternate bacterial sigma factors. FEBS Lett. 187, 11–15. Losick, R., and Pero, J. (1981). Cascades of sigma factors. Cell 25, 582–584. Sumimoto, H., Ohkuma, Y., Sinn, E., Kato, H., Shimasaki, S., Horikoshi, M., and Roeder, R.G. (1991). Conserved sequence motifs in the small subunit of human general transcription factor TFIIE. Nature 354, 401–404. McCracken, S., and Greenblatt, J. (1991). Related RNA polymerasebinding regions in human RAP30/74 and Escherichia coli s 70. Science 253, 900–902. Sweetser, D., Nonet, M., and Young, R.A. (1987). Prokaryotic and eukaryotic RNA polymerases have homologous core subunits. Proc. Natl. Acad. Sci. USA 84, 1192–1196. Nagai, K., Oubridge, C., Jessen, T.H., Li, J., and Evans, P.R. (1990). Crystal structure of the RNA-binding domain of the U1 small nuclear ribonucleoprotein A. Nature 348, 515–520. Tatti, K.M., Jones, C.H., and Moran, C.P.J. (1991). Genetic evidence for interaction of sigma E with the spoIIID promoter in Bacillus subtilis. J. Bacteriol. 173, 7828–7833. Nicholls, A., Sharp, K.A., and Honig, B. (1991). Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins Struct. Funct. Genet. 11, 281–296. Tintut, Y., and Gralla, J.D. (1995). PCR mutagenesis identifies a polymerase-binding sequence of sigma 54 that includes a sigma 70 homology region. J. Bacteriol. 177, 5818–5825. Nikolov, D.B., Hu, S.H., Lin, J., Gasch, A., Hoffmann, A., Horikoshi, M., Chua, N.H., Roeder, R.G., and Burley, S.K. (1992). Crystal structure of TFIID TATA-box binding protein. Nature 360, 40–46. Travers, A.A. (1990). Why bend DNA? Cell 60, 177–180. Ohkuma, Y., Sumimoto, H., Hoffmann, A., Shimasaki, S., Horikoshi, M., and Roeder, R.G. (1991). Structural motifs and potential s homologies in the large subunit of human general transcription factor TFIIE. Nature 354, 398–401. Waldburger, C., Gardella, T., Wong, R., and Susskind, M.M. (1990). Changes in conserved region 2 of Escherichia coli sigma 70 affecting promoter recognition. J. Mol. Biol. 215, 267–276. Park, C.S., Hillel, Z., and Wu, C.-W. (1980). DNA strand specificity in promoter recognition by RNA polymerase. Nucleic Acids Res. 8, 5895–5912. Polyakov, A., Severinova, E., and Darst, S.A. (1995). Three-dimensional structure of Escherichia coli core RNA polymerase: promoter binding and elongation conformations of the enzyme. Cell 83, 365–373. Read, R. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A42, 140–149. Roeder, R.G. (1991). The complexities of eukaryotic transcription initiation: regulation of preinitiation complex assembly. Trends Biochem. Sci. 16, 402–408. Rong, J.C., and Helmann, J.D. (1994). Genetic and physiological studies of Bacillus subtilis sA mutants defective in promoter melting. J. Bacteriol. 176, 5218–5224. Schultz, P., Celia, H., Riva, M., Sentenac, A., and Oudet, P. (1993). Three-dimensional model of yeast RNA polymerase I determined by electron microscopy of two-dimensional crystals. EMBO J. 12, 2601–2607. Shamoo, Y., Friedman, A.M., Parsons, M.R., Konigsberg, W.H., and Steitz, T.A. (1995). Crystal structure of a replication fork singlestranded DNA binding protein (T4 gp32) complexed to DNA. Nature 376, 362–366. Sheldrick, G.M. (1991). Isomorphous Replacement and Anomalous Scattering, In Proc. CCP4 St. W., W. Wolf, P.R. Evans, and A.G.W. Leslie, eds. (Warrington, UK: Daresbury Laboratory), 23–28. Shuler, M.F., Tatti, K.M., Wade, K.H., and Moran, C.P.J. (1995). A single amino acid substitution in s E affects its ability to bind core RNA polymerase. J. Bacteriol. 177, 3687–3694. Siebenlist, U., Simpson, R.B., and Gilbert, W. (1980). E. coli RNA Travers, A.A., and Burgess, R.R. (1969). Cyclic re-use of the RNA polymerase sigma factor. Nature 222, 537–540. Wong, C., Tintut, Y., and Gralla, J.D. (1994). The domain structure of sigma 54 as determined by analysis of a set of deletion mutants. J. Mol. Biol. 236, 81–90. Zhang, K.Y.J., and Main, P. (1990). Histogram matching as a new density modification technique for phase refinement and extension of protein molecules. Acta Crystallogr. A46, 41–46. Zuber, P., Healy, J., Carter, H.L., III, Cutting, S., Moran, C.P., Jr., and Losick, R. (1989). Mutation changing the specificity of an RNA polymerase sigma factor. J. Mol. Biol. 206, 605–614. Note Added in Proof The data referred to above as E.S. et al., submitted, are now in press as the following: Severinova, E., Severinov, K., Fenyö, D., Marr, M., Brody, E.N., Roberts, J.W., Chait, B.T., and Darst, S.A. (1996). Domain organization of the Escherichia coli RNA polymerase s70 subunit. J. Microbiol., in press.