10
Similarities and Differences between RNA and DNA Recognition by Proteins

Thomas A. Steitz
Department of Molecular Biophysics and Biochemistry and Department of Chemistry and Howard Hughes Medical Institute
Yale University
New Haven, Connecticut 06511

Many DNA and RNA molecules are recognized by proteins that interact preferentially with a specific DNA sequence or a particular RNA molecule. I address here the structural basis by which these proteins recognize their target nucleic acid and show in what ways recognition of RNA and DNA is both similar and different. Sequence-specific DNA-binding proteins interact with duplex DNA that is in B-form. RNA molecules, on the other hand, invariably consist of duplex regions, often stacked one on another, that are A-form, as well as regions of single-stranded loops and bulges, making possible a more complex and richly varied three-dimensional shape than can be assumed by duplex DNA.

Presently, the crystallographic and nuclear magnetic resonance (NMR) structural database of proteins complexed with DNA is very large, revealing some patterns and general conclusions about the source of sequence-specific DNA recognition (for reviews, see Steitz 1990; Harrison 1991; Pabo and Sauer 1992). On the other hand, the structural database for RNA-binding proteins, particularly in complex with RNA, is very meager indeed, so that any generalizations made may soon be overturned by the next structure determination of an RNA-protein complex. Nevertheless, some patterns of similarity and difference in the structural basis of nucleic acid recognition by proteins can be seen at this time.
Structural, biochemical, and molecular genetic studies of protein-nucleic acid complexes have established at least three important sources of sequence specificity in protein-nucleic acid interactions:

(1) Direct hydrogen bonding and van der Waals interaction between protein side chains and the exposed edges of base pairs provide structural complementarity to the correct, but not to the incorrect, sequences. The interactions are primarily, but not exclusively, in the major groove of B-DNA and in both the minor groove and the major groove at the end of a helix or at a bulge in RNA structures.
(2) The sequence-dependent bendability or deformability of duplex DNA or RNA molecules provides sequence selectivity by virtue of the ability of some nucleic acid sequences to take up a particular structure required for binding to a protein at a lower free energy cost than other sequences.
(3) Bases of RNA that are in single-stranded regions or in bulges can be directly recognized by pockets on the protein that are complementary to these bases in shape and hydrogen-bonding capabilities.

The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0/93 $5 + .00 For conditions see www.cshlpress.com/copyright.

THE PROBLEM THAT IS SET: WHAT IS BEING RECOGNIZED?

Let us first consider the problem confronting proteins interacting with either duplex DNA or the duplex portion of an RNA molecule. The three-dimensional structure of double-stranded DNA is highly polymorphic (Kennard and Hunter 1989), but variations of two forms, A-form and B-form, are of relevance to the proteins of interest here. Figure 1 shows an important difference between A-form and B-form DNA. In B-DNA, the major groove is wide enough to accommodate either an α-helix or an antiparallel β-ribbon, and the functional groups on the exposed edges of the base pairs can be directly contacted by side chains of the protein.
The minor groove, on the other hand, is deep and narrow (5.8 Å wide) and thus less accessible to secondary structures such as an α-helix. For RNA, which is always A-form, the opposite is true: The minor groove is shallow and broad (10-11 Å wide), whereas the major groove is very deep and narrow (4 Å) (Delarue and Moras 1989). The width of the minor groove in B-DNA varies depending on its base composition; AT-rich sequences have a narrower minor groove (3.5 Å) than GC-rich sequences (Yoon et al. 1988).

Where adequate information is available, it appears (as might be expected) that in general most DNA-binding proteins directly decode DNA sequences via interactions in the major groove, although some important exceptions are known. Escherichia coli integration host factor and eukaryotic TFIID appear to recognize sequences by interactions in the minor groove, although co-crystal structures of these proteins interacting with DNA are not yet available. Whereas on the basis of RNA structure alone one might expect proteins to discriminate among duplex RNA molecules by interactions with sequences via the minor groove, examples of interaction between protein and RNA in both the major and minor groove are now known (Rould et al. 1989; Ruff et al. 1991). Although it is true that the edges of base pairs are inaccessible in the major groove of A-form RNA in the central portion of a long duplex, most naturally occurring RNA molecules contain relatively short duplex regions interrupted by bulges or loops. Although these short duplex regions may be expected to stack as occurs in tRNA, the edges of base pairs exposed in the major grooves are accessible at the ends of these RNA helices.

Figure 1 Structures of A-form (top) and B-form (bottom) DNA in space-filling representation showing differences in major and minor groove widths and shapes. In the models on the left, the helix axes are parallel to the page; on the right, the helix axes have been tilted up by 32° to show the groove shapes. Bases are colored blue, phosphorus atoms are green, and all other atoms are white. The edges of the bases are easily accessible from the major groove of B-DNA and the minor or shallow groove of A-DNA (or RNA). (m) Minor groove; (M) major groove. (Reprinted, with permission, from Steitz 1990.)

A second important consideration in the suitability of the major and minor grooves for direct sequence recognition is the degree of structural variation of the four base pairs as viewed from the two grooves. Seeman et al. (1976) pointed out that the base pairs presented a more richly varied set of hydrogen-bond donors to the major groove as compared to the minor groove. Figure 2 shows that the minor groove side of base pairs is a veritable recognition desert with only the N2 of guanine distinguishing AT from GC. The patterns of donors and acceptors on the major groove side, however, can distinguish all four base pairs.

In duplex regions, RNA has an opportunity available that does not exist in duplex DNA sequences: Non-Watson-Crick base pairs can exist within RNA helices (see, e.g., Fig. 8) and thus present to both the major and the minor groove hydrogen-bonding and shape differences not seen in the four orientations of the two Watson-Crick base pairs. Although GU base pairs are perhaps the most common non-Watson-Crick base pairs seen in RNA, AG and various kinds of UU base pairs have been seen, and others may exist. Three other recognition opportunities that occur in RNA and not in duplex DNA, and that appear to be utilized by proteins binding specifically to RNA, are the single-stranded loop regions at the ends of helices, single-stranded bulges within helices, and modified bases.

Figure 2 Hydrogen-bond donors and acceptors presented by Watson-Crick pairs to the major groove and the minor groove (adapted from Lewis et al. 1985). The symbols for hydrogen-bond donors (hourglasses) and acceptors (diamonds) (Woodbury et al. 1980) show a varied pattern presented by the base pairs into the major groove and a poor information array presented into the minor groove. Although it is possible to distinguish among AT, TA, GC, and CG in the major groove, functional groups in the minor groove allow easy discrimination only between AT- and GC-containing base pairs. (Open circle) Methyl group. (Reprinted, with permission, from Steitz 1990.)

ROLE OF THE MAJOR GROOVE IN DNA AND RNA RECOGNITION

The extensive hydrogen bonding and shape complementarity between the major groove of B-DNA and the surfaces of many of the sequence-specific DNA-binding proteins as a source of recognition has been extensively documented from high-resolution crystal structures and a few NMR structures of DNA complexes (for reviews, see Steitz 1990; Harrison 1991; Pabo and Sauer 1992). In general, structural complementarity between a protein and a specific DNA sequence is achieved in idiosyncratic manners: There does not appear to be a code for nucleic acid sequence recognition (Pabo 1983; Matthews 1988). Although particular amino acid side chains do not always recognize the same base pair, there are some apparent preferences, as suggested by Seeman et al. (1976). The guanidinium group of arginine very often makes a bidentate interaction with the N7 and O6 of guanine, although other interactions are also seen.
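The information content of the two grooves summarized in Figure 2 can be sketched in a few lines of Python. The encoding below is a simplification assumed for illustration and is not taken from the figure itself: each Watson-Crick pair is reduced to the ordered set of acceptor (A), donor (D), methyl (M), and nonpolar hydrogen (H) positions it presents across a groove, in the spirit of Seeman et al. (1976). The names MAJOR_GROOVE, MINOR_GROOVE, and distinguishable_classes are hypothetical, and the small positional asymmetry of the guanine N2, which might allow GC to be told from CG in the minor groove, is ignored.

```python
# Functional groups presented by each Watson-Crick pair, read across the
# groove from the first-named base to its partner (simplified, assumed encoding).
# A = H-bond acceptor, D = H-bond donor, M = methyl, H = nonpolar hydrogen.
MAJOR_GROOVE = {
    "AT": ("A", "D", "A", "M"),  # A N7, A N6-H, T O4, T 5-methyl
    "TA": ("M", "A", "D", "A"),
    "GC": ("A", "A", "D", "H"),  # G N7, G O6, C N4-H, C C5-H
    "CG": ("H", "D", "A", "A"),
}
MINOR_GROOVE = {
    "AT": ("A", "H", "A"),  # A N3, A C2-H, T O2
    "TA": ("A", "H", "A"),
    "GC": ("A", "D", "A"),  # G N3, G N2-H, C O2
    "CG": ("A", "D", "A"),
}


def distinguishable_classes(groove):
    """Group base pairs that present identical patterns in a groove."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(sorted(members) for members in classes.values())


if __name__ == "__main__":
    print("major:", distinguishable_classes(MAJOR_GROOVE))
    print("minor:", distinguishable_classes(MINOR_GROOVE))
```

With this encoding the major groove yields four separate classes, whereas the minor groove yields only two (AT with TA, and GC with CG), which is the sense in which the minor groove presents a poor information array.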
Similarly, the hydrogen-bond donors and acceptors of the glutamine side chains are observed frequently to interact with the corresponding hydrogen-bond donors and acceptors of adenine. The ability of these side chains to make bidentate interactions with DNA greatly enhances their suitability for sequence-specific recognition (Seeman et al. 1976). The van der Waals interactions between the protein and the 5-methyl group of thymine appear also to contribute to specificity. Presumably, the close packing of a protein against the GC base pair would in many cases sterically exclude its replacement by an AT base pair with its accompanying bulky 5-methyl group.

Information concerning protein interaction in the major groove of RNA is sparse. Although details of the interaction have not yet been published, a protein loop at the end of a β-hairpin in aspartyl-tRNA synthetase is observed to interact with at least the terminal base pair via the major groove of the acceptor stem of tRNAAsp (Ruff et al. 1991). Interactions between human immunodeficiency virus (HIV) tat and its target RNA TAR are hypothesized to occur in the region of a 3-nucleotide bulge and on the major groove side (Weeks et al. 1990, 1991).

The potential accessibility of an RNA major groove to protein side chains has been probed by K.M. Weeks and D.M. Crothers (in prep.), using diethyl pyrocarbonate (DEPC). DEPC carbethoxylates purines primarily at the N7 position in a reaction that is sensitive to the solvent exposure of the base (Vincze et al. 1973; Peattie and Gilbert 1980). The reagent is comparable in size to those protein side chains such as arginine that mediate RNA-protein interactions, suggesting that the rates of reactivity of this probe are likely to reflect the steric accessibility of purines to protein interaction.
Although the major groove of an uninterrupted RNA duplex is relatively inaccessible to this reagent, as expected, the major groove at helix termini is accessible to modification, with the effect extending further on the 3' strand (Fig. 3). Furthermore, bulges in RNA helices larger than one nucleotide greatly increase the accessibility of flanking duplexes to reaction with DEPC. The structure of a portion of HIV TAR RNA containing a 3-nucleotide bulge and bound to arginine has been deduced from NMR data, showing one example of how a bulge can make the major groove of RNA accessible to a protein side chain (Puglisi et al. 1992). A cytosine from the 3-nucleotide bulge makes a triple base pair with an adjacent GC, forming a binding site for the guanidinium group of arginine and opening the major groove.

Figure 3 Schematic representation of an RNA duplex with the reactivity of purines to DEPC shown by filled circles whose diameter is proportional to reactivity (accessibility). Although the 5' base is most accessible, the accessibility to DEPC extends further into the duplex on the 3' strand (K.M. Weeks and D.M. Crothers, in prep.).

ROLE OF NUCLEIC ACID BENDABILITY

The sequence-dependent nucleic acid distortability is a very important source of specificity in many protein-RNA, as well as protein-DNA, interactions. Nucleic acid distortability as a more indirect source of sequence specificity arises from two facts: (1) Proteins often bind a conformation of a nucleic acid that is altered from its uncomplexed solution conformation. (2) The free energy cost for various nucleic acid sequences to assume the conformation that is required for binding to the protein is not the same for different sequences.
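These two facts can be written as a simple free energy bookkeeping. The decomposition below is a standard thermodynamic sketch rather than a formula given in the studies cited; the symbols $\Delta G_{\text{contact}}$ and $\Delta G_{\text{distort}}$ are labels assumed here for the direct-contact and distortion contributions for a sequence $s$.

```latex
\[
\Delta G_{\text{bind}}(s) = \Delta G_{\text{contact}}(s) + \Delta G_{\text{distort}}(s),
\qquad \Delta G_{\text{distort}}(s) \ge 0
\]
\[
\frac{K_a(s_1)}{K_a(s_2)}
  = \exp\!\left(-\,\frac{\Delta G_{\text{bind}}(s_1) - \Delta G_{\text{bind}}(s_2)}{RT}\right)
\]
```

If two sequences make the same direct contacts, the ratio of their association constants depends only on the difference in $\Delta G_{\text{distort}}$, so a sequence that is cheaper to deform binds more tightly even though no base-specific contact distinguishes it.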
Evidence for significant distortion of DNA upon binding to proteins now abounds, and in a few cases this protein-induced DNA distortion has been experimentally correlated with the ability of a protein to bind a specific sequence. DNA distortion is seen in the crystal structures of DNA complexes with EcoRI (Frederick et al. 1984), 434 repressor (Aggarwal et al. 1988), trp repressor (Otwinowski et al. 1988), DNase I (Suck et al. 1988), Klenow fragment (Freemont et al. 1988), CAP (Schultz et al. 1991), met repressor, and a growing roster of other proteins. The distortions of duplex DNA structure that have been observed in complexes include changes in twist, groove width, and kinks (Steitz 1990).

Perhaps the most completely documented example of the correlations among DNA sequence, bendability, and affinity for proteins is in the case of E. coli catabolite gene activator protein (CAP). Gartenberg and Crothers (1988) found that CAP-binding sites containing AT bases at base-pair positions 10 and 11 from the center of the binding site bend more than those containing GC bases when bound to CAP (as assessed by polyacrylamide gel electrophoresis) and also bind CAP 14-fold more tightly. Thus, sequence, bending, and binding are correlated. The crystal structure of the CAP-DNA complex shows an 80 bend near base pair 10 and a very narrow minor groove that allows better interaction with the protein (Schultz et al. 1991). AT-rich sequences favor bending into the minor groove (as occurs) (Drew and Travers 1984) and also favor a narrow minor groove (Yoon et al. 1988).

Experiments with 434 repressor also show a sequence dependence to DNA binding that most likely is a result of DNA distortability (Koudelka et al. 1987). Replacement of AT by GC base pairs at the dyad axis reduces binding of intact repressor by as much as 50-fold, despite the lack of base-specific contacts in this region.
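To put the fold differences quoted above on an energy scale, a ratio of binding constants can be converted to a free energy difference with the relation ΔΔG = RT ln(ratio). The short Python sketch below does this conversion; the function name fold_to_ddg and the temperature of 25°C are assumptions made for illustration, since the conditions of the original measurements are not stated here.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_to_ddg(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) equivalent to a fold change in affinity.

    temperature_k defaults to 25 C, an assumed value for illustration.
    """
    return R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    for label, fold in [("CAP, AT vs GC at positions 10-11", 14),
                        ("434 repressor, AT vs GC at the dyad", 50)]:
        print(f"{label}: {fold}-fold = {fold_to_ddg(fold):.2f} kcal/mol")
```

Under these assumptions the 14-fold and 50-fold effects correspond to roughly 1.6 and 2.3 kcal/mol, respectively, which is comparable to the contribution expected from one or two hydrogen bonds even though no direct base contact is involved.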
The sequence-dependent deformability of duplex DNA or RNA that provides specificity for sequences being recognized by a protein can include the melting of base pairs. If binding to a protein requires melting of one or more base pairs, then the binding of mismatched base pairs should be favored over AT pairs, which in turn should bind better than GC base pairs. The order of binding should reflect the thermodynamic stability of base pairs.

Two examples of the role of duplex meltability in sequence specificity can be cited, one in RNA and one in DNA. Binding of tRNAGln to its cognate synthetase results in the breaking of the terminal base pair of the acceptor stem between nucleotides U1 and A72 (Rould et al. 1989). For glutaminyl-tRNA synthetase (GlnRS) recognition in charging of tRNA, it is important that this base pair not be GC (Yarus et al. 1977). The added free energy cost of breaking the GC base pair makes tRNAs containing a GC at 1-72 less suitable for proper binding to the enzyme, reducing kcat/Km by about 10-fold (Jahn et al. 1991). In a second example, the 3',5'-exonuclease active site of E. coli DNA polymerase I is observed to denature duplex DNA and bind four single-stranded nucleotides at the 3' terminus (Freemont et al. 1988). In a competition between the duplex-binding polymerase active site and the single-strand-binding exonuclease active site for the 3' end of the primer strand, duplex DNA containing a mismatched base pair will bind to the exonuclease site with greater frequency than a correctly matched duplex, thus enhancing the editing out of mismatched base pairs (Joyce and Steitz 1987; Freemont et al. 1988).
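The competition between the two active sites can be illustrated with a minimal two-site partition model. This is a sketch under stated assumptions, not a model taken from the studies cited: the exonuclease site is assumed to pay a melting penalty dg_melt that the polymerase site does not, the function name exo_fraction and its parameters are hypothetical, and the penalties in the example are illustrative values chosen only to show the ordering, not measured free energies.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 C (assumed temperature)


def exo_fraction(dg_melt, k_pol=1.0, k_exo_unpaired=1.0):
    """Fraction of primer 3' ends occupying the exonuclease site.

    dg_melt: free energy cost (kcal/mol) of melting the terminal duplex.
    k_pol, k_exo_unpaired: hypothetical relative affinities of the polymerase
    site for duplex and of the exonuclease site for already-melted DNA.
    """
    k_exo = k_exo_unpaired * math.exp(-dg_melt / RT)
    return k_exo / (k_exo + k_pol)


if __name__ == "__main__":
    # Illustrative (not measured) melting penalties in kcal/mol.
    for terminus, dg in [("mismatch", 0.5), ("AT", 1.5), ("GC", 2.5)]:
        print(f"{terminus:8s} dg_melt={dg:.1f}  exo fraction={exo_fraction(dg):.3f}")
```

Whatever values are chosen, the occupancy of the exonuclease site falls monotonically as the melting penalty rises, so a terminus that is easier to melt is edited more often, which is the ordering argued for in the text.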
Sequence recognition in RNA also arises from the sequence-dependent ability of single-stranded RNA to take up the conformation required for protein binding, as occurs in the single-stranded acceptor end of tRNAGln (Fig. 4). The observed interaction between the N2 of G73 and the backbone phosphate of A72 is not possible for the other three bases (Rould et al. 1989), consistent with the observation that changing G73 to A, C, or U reduces the kcat/Km for charging by one, three, and four orders of magnitude, respectively (Jahn et al. 1991). Furthermore, two non-Watson-Crick base pairs are formed at the end of the anticodon stem in tRNAGln (see Fig. 8), producing a structure that is recognized by the synthetase (Rould et al. 1991). Other bases unable to make these non-Watson-Crick base pairs would not allow formation of the structure being recognized and bound to this enzyme. Although binding of tRNAAsp to its cognate synthetase results in a very major change in the conformation of the anticodon loop (Ruff et al. 1991), it has not yet been published whether or not any part of this structural change involves alterations in RNA-RNA interactions that are dependent on the RNA sequence, as occurs with GlnRS.

Figure 4 Conformation of the end of the acceptor stem and the 3' strand in tRNAGln bound to GlnRS (from Rould and Steitz 1992). The expected base pair between U1 and A72 is broken by Leu-136, which packs against the guanine of the G2-C71 base pair. The 2-amino group of guanine 73 hydrogen-bonds to the phosphate backbone, stabilizing the hairpin conformation of the 3' strand into the active site. Cytosine 74 binds into a tight pocket in the protein, allowing the bases of nucleotides 73, 75, and 76 to stack.

ROLE OF WATER MOLECULES IN SEQUENCE RECOGNITION

Buried water molecules appear to play a very important but underrecognized role in both DNA and RNA sequence recognition. Ascertaining the role of water molecules in sequence recognition requires crystal structures at sufficiently high resolution (usually 2.5 Å or better) and sufficiently well refined that water molecules can be reliably located. Water (or a protein hydroxyl group) can only make a base-specific hydrogen bond if it is also making at least two other hydrogen bonds with obligate donors or acceptors on the protein and is sequestered from bulk solvent. In this circumstance, the two unsatisfied water H-bond donors/acceptors directed toward the nucleic acid become obligate donors/acceptors and consequently become part of the H-bonding template surface of the protein to which the nucleic acid must be complementary for optimal binding (Fig. 5).

Figure 5 Schematic drawing showing how a water molecule can be specifically oriented by interactions with the protein, turning it into a surrogate side chain. For example, here two obligate proton donors from the protein bind a water molecule such that it requires H-bond acceptors on the nucleic acid.

In the trp repressor-DNA complex, there are three water molecules per half operator bound in the major groove between the protein and the DNA bases; at least two of them appear to be making hydrogen bonds that specify base pairs 5, 6, and 7 from the dyad axis (Otwinowski et al. 1988; Steitz 1990). In this case, water molecules are playing the role of "honorary" protein side chains. In the GlnRS complex with tRNA, two buried water molecules are an integral part of the hydrogen-bonding matrix presented in the shallow groove of the tRNA acceptor stem (Rould et al. 1989; Rould and Steitz 1992).
Hydrogen bonds between these two water molecules, as well as both a buried carboxylate of Asp-235 and a backbone amide of residue 183, serve to orient one hydrogen-bond donor of water toward the O2 of cytosine 71 and one acceptor toward the N2 of guanine (Fig. 6).

Figure 6 View of the recognition interface between GlnRS and base pairs G2-C71 and G3-C70 of tRNAGln (from Rould and Steitz 1992). Asp-235 directly bonds to the 2-amino group of guanine 3 via the minor groove. The backbone carbonyl of Pro-181 is rigidly directed to hydrogen-bond to the 2-amino group of guanine 2. A network of water molecules between the protein and the minor groove of the tRNA, only two of which are shown here, appears to enforce a requirement for GC base pairs at these positions. The hydrophobic environment formed by the proline, phenylalanine, isoleucine, and the underside of the ribose sugars enhances the strength and specificity of these direct and water-mediated hydrogen bonds.

ROLE OF THE MINOR GROOVE IN DNA AND RNA RECOGNITION

As pointed out by Seeman et al. (1976), there are fewer features presented by base pairs in the minor groove that allow discrimination among the two base pairs in their two orientations (Fig. 2). The hydrogen-bond acceptors (N3 on guanine and adenine and O2 on cytosine and thymine) occur in almost the identical place in the minor groove for all four bases. Only the exocyclic N2 of guanine distinguishes AT from GC and perhaps GC from CG. Furthermore, the minor groove of B-DNA is in general too narrow to accommodate an α-helix and too deep for bases to be reached by side chains alone.

DNA Interaction in the Minor Groove

There are ways, however, in which the interactions in the minor groove can be made sequence-specific.
For example, the sequence preferences exhibited in the DNase I cleavage of DNA arise from its interactions in the minor groove (Suck et al. 1988). The side chain of a tyrosine observed to bind in the minor groove will fit into the normal-width minor groove but not into the narrower minor groove that characterizes AT-rich sequences.

Biochemical evidence for several sequence-specific DNA-binding proteins implies that they interact with DNA via the minor groove, although direct structural visualization of such an interaction has not yet been achieved. Yang and Nash (1989) have argued on the basis of methylation protection studies that E. coli integration host factor (IHF) interacts in the minor groove. IHF has significant sequence similarity with E. coli Hu protein, whose crystal structure (Tanaka et al. 1984) shows two long antiparallel β-loops, one from each subunit of the dimer, which form outstretched arms that create a large cleft sufficient in size to accommodate duplex DNA. The model for an IHF-DNA complex (Yang and Nash 1989) is based on one for Hu-DNA (Tanaka et al. 1984) and places the antiparallel β-loops in the minor groove in a manner proposed earlier for antiparallel β-strands (Carter and Kraut 1974; Church et al. 1977).

The recently determined crystal structure of Arabidopsis thaliana TFIID similarly portrays a protein with pseudo-dyad symmetry and a twisted, antiparallel β-sheet forming a cleft of size sufficient to accommodate B-DNA (Nikolov et al. 1992). Biochemical data likewise point to minor groove interaction by this protein (Lee et al. 1991; Starr and Hawley 1991), although the structural basis of this interaction is not yet established.

RNA Interaction in the Minor Groove

There are now well-established examples of specific recognition of RNA in the minor groove (Rould et al. 1989; Musier-Forsyth and Schimmel 1992). With duplex RNA, which is A-form, the minor groove is shallow, wide, and accessible. Several sequence-specific interactions between GlnRS and the minor groove of tRNAGln have been observed (Rould et al. 1989, 1991; Rould and Steitz 1992). Base pairs G2-C71 and G3-C70 are recognized in a base-specific manner by two protein "fingers," one an α-helix and the other a turn of an antiparallel β-loop. In both cases, recognition involves contact between the 2-amino group of the guanine and hydrogen-bond acceptors of the protein. The carboxylate of the aspartic acid side chain 235 emanating from the amino end of α-helix H interacts with both the N2 of guanine 3 and a buried water molecule (Fig. 6). The peptide carbonyl group of Pro-181 interacts with the N2 of guanine 2. Substantiating the hypothesis that these two base pairs are among the recognition elements of tRNAGln, replacement of either by AU reduces the kcat/Km for charging by two to three orders of magnitude (Jahn et al. 1991). Furthermore, mutations in GlnRS that have increased rates of mischarging of noncognate tRNAs are changes of Asp-235 to asparagine or glycine (Conley et al. 1988; Perona et al. 1989), showing the importance of this interaction for discrimination. An additional interaction in the minor groove that is important for discrimination is between the carboxylate of Glu-323 and the N2 of G10.

The importance of protein interaction with the 2-amino group of guanine in the minor groove of RNA has also been established in the case of alanine tRNA synthetase recognition of tRNAAla (Hou and Schimmel 1988; McClain and Foss 1988; Hou et al. 1989; Musier-Forsyth and Schimmel 1992). The alanine synthetase has been clearly shown to recognize base pair 3-70, which is GU in tRNAAla.
Replacing G3 by an inosine, which lacks the N2, dramatically reduces charging of a minihelix (Musier-Forsyth and Schimmel 1992).

ROLE OF SINGLE-STRANDED REGIONS IN RECOGNITION

Since RNA molecules have single-stranded regions in loops and bulges and between helical stems, these regions are potential targets for recognition by proteins that are not available in duplex DNA. The recognition of anticodon loops in tRNAs by cognate synthetases provides the best-characterized examples of protein recognition of single-stranded regions. Molecular genetic and biochemical studies have shown that the anticodon bases of tRNA serve as recognition elements for many of the aminoacyl-tRNA synthetases (Schulman and Pelka 1985; Normanly and Abelson 1989; Sampson et al. 1989). The co-crystal structures of glutaminyl-tRNA synthetase and aspartyl-tRNA synthetase complexed with their cognate tRNAs show that, upon forming a complex, the anticodon bases become unstacked so that they may bind into separate base recognition pockets. The energy required to unstack the anticodon bases (as they exist in the uncomplexed tRNA) is provided by interactions with the protein. Since bases in loop regions of uncomplexed RNAs tend to be stacked on each other and since optimal recognition of bases by a protein requires their unstacking in order for them to interact in separate recognition pockets, it may be the case more often than not that protein recognition of an RNA single-stranded region is accompanied by a significant conformational change in the RNA.

Although details of the anticodon base interactions are not yet published for the aspartic acid enzyme (Ruff et al. 1991), Figure 7 shows how the three anticodon bases of tRNAGln are interacting with GlnRS (Rould et al. 1991; Rould and Steitz 1992). Each anticodon nucleotide is recognized primarily by a polypeptide segment of five or six amino acids.
In all three cases, at least one positively charged amino acid from this segment forms a salt link with an adjacent negatively charged phosphate. The aliphatic portion of this residue generally packs against either the base or the hydrophobic "underside" of ribose. With all three anticodon bases, recognition is achieved by direct hydrogen bonding between the backbone and side chains of a short recognition peptide and the Watson-Crick hydrogen-bonding groups of the bases. Furthermore, several of the interactions presumed to be discriminating involve hydrogen bonds with charged side chains that are buried from solvent in the complex. Although there is no conserved sequence or structural simGln The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0 For conditions see www.cshlpress.com/copyright. The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0 For conditions see www.cshlpress.com/copyright. , ,/ ;~J~ )1' e l' .. NH,o.,/ t 0 "'-'" e ji R341 ;} '--00 ~"!"";/ c<k' aff-!$~/~o~ R412 !I[~\ \..... co ..o.~'.t.'g'~ C-34 <. . ~irl' R341 E519 '..•' Irl·r f{, ~~\~" UJJl If 0 'b • . ,j(...... .... ""Il! -~ {O\( ~ '% ! V- • ct " , '7 " aff-1:.I~o R412 ,r~~\ : :.. 0=\ .' .Il! .,¥.. '"t/" r ~J'-! = IlII ~ NH~,C"'/ U-35 R520 ' 1?'. J....!. .V-),,{ . ._I~ ~<t'\ ~R410~ " .... .e Ii, .g••c~~;:;\.~ C-34 r. ~&" co II J,,,J' '=/l.~ R402.··: K401 . _ .,.... i: ~~ .~ ,:.{'b~~ 0 517 f .., ~ J~. ~Q\ 0.0 ,,,, .. .. e. ~ "- V ~~ G-36 J .)'. 1 ~ J \ y6' J{f. \ o\~ ,,' •••~. .9 co) .. e~,l' ~ Figure 7 Stereo view of the binding pockets for the anticodon bases C34, U35, and G36 (dark) in the GlnRS-tRNAGln complex. Each nucleotide is recognized primarily by a single short polypeptide segment in the enzyme (light). In each case, an arginine or lysine from the polypeptide anchors the nucleotide by its phosphate group, allowing peptide backbone and side chains of the segment to specifically recognize the base (from Rould and Steitz 1992). 
l~1 '\r" E519 ,irl·""o~~,\ nf!t .....__ !' 0 /1 •••• 1": ~& .... ....• .! Co ::.. U-35 ~0' -I -II. II" . 0 ~,}~~ . " J.. v-Q~r . ._)<l~'\~o~R410~ R520 "r .' i \.~~ {~•.('••' r:;=:!/.. .•' •~ h' v-:» <.i? ~</~ ~R402 .......:: K401 0...0 ". 05 17 . O~,( G-36 It t -T/r ~1'1 i i ~ ...\ if 1".. N .,~s.,. co) ", .9 U. .0';· ~ e~~ ~ N Ff ~ >- -l N Coo) Protein Recognition of RNA and DNA 233 ilarity among these segments, they are predominantly in extended p-type conformation. That the three anticodon bases of t R N A serve as recognition elements is confirmed by kinetic studies of mutant tRNAs (Jahn et al. 1991). Changes of the anticodon bases reduce k /K by 3 - 4 orders of magnitude. Interestingly, it is k that changes the most, not K . A structural mechanism involving an anticodon-induced conformational change in the protein transmitted to the ATP-binding site has been hypothesized to account for the importance of anticodon base identity for catalysis (Rould et al. 1991). G i n CM m cat m ROLE OF MODIFIED BASES Many of the RNA molecules that are recognized by proteins contain modified bases, whose role in specific recognition remains largely unknown. There are at least two ways that modified bases might enhance recognition: by stabilizing RNA conformations that otherwise would be less favored and by changing the shape of a recognition site. The N at position 5 of pseudo-uridine is observed in t R N A to interact with a water molecule, which in turn is interacting with the phosphate of the pseudo-uridine and the preceding phosphate, an interaction and conformation seen in all but one of the pseudo-uridines in tRNAs of known structure (J. Arnez and T.A. Steitz, unpubl.). This structure so stabilized may be significant in protein recognition. Base modifications of noncognate tRNAs at nucleotides 34 and 37 may act as negative determinants of aminoacylation by GlnRS. The tightly packed interface between A37 and the protein (Fig. 
8) suggests that bulky modifications of A37, for example, 6-carbamoylthreonyl adenine or 2-methylthio-N6-isopentenyl adenine, may provide an additional source of discrimination against the many noncognate tRNA molecules bearing these modifications. A role for modified bases at position 37 in tRNA discrimination has already been suggested by biochemical studies showing that ArgRS misacylates tRNAAsp lacking modified bases (Perret et al. 1990). Likewise, C34 is tightly packed into a pocket that is covered by a loop of protein; this pocket may not accommodate certain modified bases at position 34, such as queuosine.

The 2'-ribosylated adenosine 64 in the initiator methionine tRNA from yeast appears to play an important role in ensuring that this tRNA is used only in the initiation of protein synthesis and not in elongation (Kiesewetter et al. 1990). Again, the modification appears to function as a negative effector, since the modified tRNAiMet will not bind to EF-Tu or be inserted in elongation, whereas the tRNAiMet that is unmodified at position 64 will both bind to EF-Tu and participate in elongation.

Figure 8 Stereo view of the two novel non-Watson-Crick base pairs that extend the anticodon stem of tRNAGln when complexed to GlnRS, showing the water network between these bases and the sugar-phosphate backbone. Asp-370 directly contacts both base pairs via the minor groove.

SUMMARY

Many aspects of protein recognition of RNA and of DNA are very similar, such as the importance of sequence-dependent distortability of the nucleic acid and the role of specific water-mediated interactions.
Although recognition of both nucleic acids can be achieved through direct protein interaction with the exposed edges of base pairs in either the major or minor groove of the duplex, interactions via the major groove appear to dominate in DNA recognition, whereas the opposite preference may occur with RNA (although many more examples are required to establish this point). Interactions with single-stranded bases in RNA may prove to be the most significant in RNA recognition and are not at all characteristic of DNA recognition.

Whether there are simple, recurring protein motifs involved in RNA recognition, as have been found for DNA, is not yet known. Direct recognition of bases in DNA is achieved by various simple motifs that present an α-helix, antiparallel β-loop, or polypeptide chain end into the major groove of DNA. Although RNA recognition domains such as the RNP motif are known, it is too early to tell whether there are simple and general ways in which protein secondary structures (α-helix, antiparallel β-strands, or loops) interact with RNA.

REFERENCES

Aggarwal, A.K., D.W. Rodgers, M. Drottar, M. Ptashne, and S.C. Harrison. 1988. Recognition of a DNA operator by the repressor of phage 434: A view at high resolution. Science 242: 99-107.
Carter, C.W. and J. Kraut. 1974. A proposed model for interaction of polypeptides with RNA. Proc. Natl. Acad. Sci. 71: 283-287.
Church, G.M., J.L. Sussman, and S.-H. Kim. 1977. Secondary structure complementarity between DNA and proteins. Proc. Natl. Acad. Sci. 74: 1458-1462.
Conley, J., H. Uemura, F. Yamao, J. Rogers, and D. Söll. 1988. E. coli glutaminyl tRNA synthetase: A single amino acid replacement relaxes tRNA specificity. Protein Sequences Data Anal. 1: 479-485.
Delarue, M. and D. Moras. 1989. RNA structure. Nucleic Acids Mol. Biol. 3: 182-196.
Drew, H.R. and A.A. Travers. 1984. DNA structural variations in the E. coli tyrT promoter. Cell 37: 491-502.
Frederick, C.A., J. Grable, M. Melia, C. Samudzi, L. Jen-Jacobson, B.-C. Wang, P. Greene, H.W. Boyer, and J.M. Rosenberg. 1984. Kinked DNA in crystalline complex with EcoRI endonuclease. Nature 309: 327-331.
Freemont, P.S., J.M. Friedman, L.S. Beese, M.R. Sanderson, and T.A. Steitz. 1988. Cocrystal structure of an editing complex of Klenow fragment with DNA. Proc. Natl. Acad. Sci. 85: 8924-8928.
Gartenberg, M.R. and D.M. Crothers. 1988. DNA sequence determinants of CAP-induced bending and protein binding affinity. Nature 333: 824-829.
Harrison, S.C. 1991. A structural taxonomy of DNA-binding domains. Nature 353: 715-719.
Hou, Y.-M. and P. Schimmel. 1988. A simple structural feature is a major determinant of the identity of a transfer RNA. Nature 333: 140-145.
Hou, Y.-M., C. Francklyn, and P. Schimmel. 1989. Molecular dissection of a transfer RNA and the basis for its identity. Trends Biochem. Sci. 14: 233-237.
Jahn, M., J. Rogers, and D. Söll. 1991. Anticodon and acceptor stem nucleotides in tRNAGln are major recognition elements for E. coli glutaminyl-tRNA synthetase. Nature 352: 258-260.
Joyce, C.M. and T.A. Steitz. 1987. DNA polymerase I: From crystal structure to function via genetics. Trends Biochem. Sci. 12: 288-292.
Kennard, O. and W.N. Hunter. 1989. Oligonucleotide structure: A decade of results from single crystal X-ray diffraction studies. Q. Rev. Biophys. 22: 327-379.
Kiesewetter, S., G. Ott, and M. Sprinzl. 1990. The role of modified purine 64 in initiator/elongator discrimination of tRNAiMet from yeast and wheat germ. Nucleic Acids Res. 18: 4677-4682.
Koudelka, G.B., S.C. Harrison, and M. Ptashne. 1987. Effect of non-contacted bases on the affinity of 434 operator for 434 repressor and cro. Nature 326: 886-888.
Lee, D.K., M. Horikoshi, and R.G. Roeder. 1991. Interaction of TFIID in the minor groove of the TATA element. Cell 67: 1241-1250.
Lewis, M., J. Wang, and C. Pabo. 1985. Structure of the operator binding domain of lambda repressor. In Biological macromolecules and assemblies, vol. 2 (ed. F.A. Jurnak and A. McPherson), pp. 266-287. Wiley, New York.
Matthews, B.W. 1988. No code for recognition. Nature 335: 294-295.
McClain, W.H. and K. Foss. 1988. Changing the identity of a tRNA by introducing a GU wobble pair near the 3' acceptor end. Science 240: 793-796.
Musier-Forsyth, K. and P. Schimmel. 1992. Functional contact of a transfer RNA [...]
[...] ture of a protein with histone-like properties in prokaryotes. Nature 310: 376-381.
Vincze, A., R.E.L. Henderson, J.J. McDonald, and N.J. Leonard. 1973. Reaction of diethyl pyrocarbonate with nucleic acid components. Bases and nucleosides derived from guanine, cytosine, and uracil. J. Am. Chem. Soc. 95: 2677-2682.
Weeks, K.M. and D.M. Crothers. 1991. RNA recognition by tat-derived peptides: Interaction in the major groove? Cell 66: 577-588.
Weeks, K.M., C. Ampe, S.C. Schultz, T.A. Steitz, and D.M. Crothers. 1990. Fragments of the HIV-1 tat protein specifically bind TAR RNA: Peptide recognition of bulged RNA. Science 249: 1281-1285.
Woodbury, C.P., O. Hagenbüchle, and P.H. von Hippel. 1980. DNA site recognition and reduced specificity of the EcoRI endonuclease. J. Biol. Chem. 255: 11534-11546.
Yang, C.-C. and H.A. Nash. 1989. The interaction of E. coli IHF protein with its specific-binding sites. Cell 57: 869-880.
Yarus, M., R. Knowlton, and L. Soll. 1977. Aminoacylation of the ambivalent Su+7 amber suppressor tRNA. In Nucleic acid-protein recognition (ed. H.J. Vogel), pp. 391-409. Academic Press, New York.
Yoon, C., G.G. Privé, D.S. Goodsell, and R.E. Dickerson. 1988. Structure of an alternating-B DNA helix and its relationship to A-tract DNA. Proc. Natl. Acad. Sci. 85: 6332-6336.