10
Similarities and Differences between
RNA and DNA Recognition by Proteins
Thomas A. Steitz
Department of Molecular Biophysics and Biochemistry
and Department of Chemistry
and Howard Hughes Medical Institute
Yale University
New Haven, Connecticut 06511
Many DNA and RNA molecules are recognized by proteins that interact
preferentially with a specific DNA sequence or a particular RNA
molecule. I address here the structural basis by which these proteins
recognize their target nucleic acid and show in what ways recognition of
RNA and DNA is both similar and different. Sequence-specific DNA-binding proteins interact with duplex DNA that is in B-form. RNA
molecules, on the other hand, invariably consist of duplex regions, often
stacked one on another, that are A-form, as well as regions of single-stranded loops and bulges, making possible a more complex and richly
varied three-dimensional shape than can be assumed by duplex DNA.
Presently, the crystallographic and nuclear magnetic resonance (NMR)
structural database of proteins complexed with DNA is very large,
revealing some patterns and general conclusions about the source of
sequence-specific DNA recognition (for reviews, see Steitz 1990; Harrison 1991; Pabo and Sauer 1992). On the other hand, the structural database for RNA-binding proteins, particularly in complex with RNA, is
very meager indeed, so that any generalizations made may soon be overturned by the next structure determination of an RNA-protein complex.
Nevertheless, some patterns of similarity and difference in the structural
basis of nucleic acid recognition by proteins can be seen at this time.
Structural, biochemical, and molecular genetic studies of protein-nucleic
acid complexes have established at least three important sources
of sequence specificity in protein-nucleic acid interactions: (1) Direct
hydrogen bonding and van der Waals interaction between protein side
chains and the exposed edges of base pairs provide structural complementarity to the correct, but not to the incorrect, sequences. The interactions are primarily, but not exclusively, in the major groove of B-DNA
The RNA World
© 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0/93 $5 + .00
The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0
For conditions see www.cshlpress.com/copyright.
and to both the minor groove and the major groove at the end of a helix
or at a bulge in RNA structures. (2) The sequence-dependent bendability
or deformability of duplex DNA or RNA molecules provides sequence
selectivity by virtue of the ability of some nucleic acid sequences to take
up a particular structure required for binding to a protein at a lower free
energy cost than other sequences. (3) Bases of RNA that are in single-stranded regions or in bulges can be directly recognized by pockets on
the protein that are complementary to these bases in shape and hydrogen-bonding capabilities.
THE PROBLEM THAT IS SET: WHAT IS BEING RECOGNIZED?
Let us first consider the problem confronting proteins interacting with either duplex DNA or the duplex portion of an RNA molecule. The three-dimensional structure of double-stranded DNA is highly polymorphic
(Kennard and Hunter 1989), but variations of two forms, A-form and B-form, are of relevance to the proteins of interest here. Figure 1 shows an
important difference between A-form and B-form DNA. In B-DNA, the
major groove is wide enough to accommodate either an α-helix or an
antiparallel β-ribbon, and the functional groups on the exposed edges of
the base pairs can be directly contacted by side chains of the protein. The
minor groove, on the other hand, is deep and narrow (5.8 Å wide) and
thus less accessible to secondary structures such as an α-helix. For RNA,
which is always A-form, the opposite is true. The minor groove is shallow and broad (10-11 Å wide), whereas the major groove is very deep
and narrow (4 Å) (Delarue and Moras 1989). The width of the minor
groove in B-DNA varies depending on its base composition: AT-rich sequences have a narrower minor groove (3.5 Å) than GC-rich sequences
(Yoon et al. 1988). Where adequate information is available it appears
(as might be expected) that in general most DNA-binding proteins directly decode DNA sequences via interactions in the major groove, although
some important exceptions are known. Escherichia coli integration host
factor and eukaryotic TFIID appear to recognize sequences by interactions in the minor groove, although co-crystal structures of these proteins
interacting with DNA are not yet available.
Whereas on the basis of RNA structure alone one might expect
proteins to discriminate among duplex RNA molecules by interactions
with sequences via the minor groove, examples of interaction between
protein and RNA in both the major and minor groove are now known
(Rould et al. 1989; Ruff et al. 1991). Although it is true that the edges of
base pairs are inaccessible in the major groove of A-form RNA in the
central portion of a long duplex, most naturally occurring RNA molecules
contain relatively short duplex regions interrupted by bulges or
loops. Although these short duplex regions may be expected to stack as
occurs in tRNA, the edges of base pairs exposed in the major grooves are
accessible at the ends of these RNA helices.

Figure 1 Structures of A-form (top) and B-form (bottom) DNA in space-filling
representation showing differences in major and minor groove widths and
shapes. In the models on the left, the helix axes are parallel to the page; on the
right, the helix axes have been tilted up by 32° to show the groove shapes. Bases
are colored blue, phosphorus atoms are green, and all other atoms are white. The
edges of the bases are easily accessible from the major groove of B-DNA and
the minor or shallow groove of A-DNA (or RNA). (m) Minor groove; (M) major
groove. (Reprinted, with permission, from Steitz 1990.)

A second important consideration in the suitability of the major and
minor grooves for direct sequence recognition is the degree of structural
variation of the four base pairs as viewed from the two grooves. Seeman
et al. (1976) pointed out that the base pairs presented a more richly
varied set of hydrogen-bond donors to the major groove as compared to
the minor groove. Figure 2 shows that the minor groove side of base
pairs is a veritable recognition desert with only the N2 of guanine distinguishing AT from GC. The patterns of donors and acceptors on the
major groove side, however, can distinguish all four base pairs. In duplex
regions, RNA has an opportunity available that does not exist in duplex
DNA sequences: Non-Watson-Crick base pairs can exist within RNA
helices (see, e.g., Fig. 8) and thus present to both the major and the minor
groove hydrogen-bonding and shape differences not seen in the four
orientations of the two Watson-Crick base pairs. Although GU base pairs
are perhaps the most common non-Watson-Crick base pairs seen in
RNA, AG and various kinds of UU base pairs have been seen, and others
may exist.

Figure 2 Hydrogen-bond donors and acceptors presented by Watson-Crick pairs
to the major groove and the minor groove (adapted from Lewis et al. 1985). The
symbols for hydrogen-bond donors (hourglasses) and acceptors (diamonds)
(Woodbury et al. 1980) show a varied pattern presented by the base pairs into
the major groove and a poor information array presented into the minor groove.
Although it is possible to distinguish among AT, TA, GC, and CG in the major
groove, functional groups in the minor groove allow easy discrimination only
between AT- and GC-containing base pairs. (Open circle) Methyl group.
(Reprinted, with permission, from Steitz 1990.)

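The argument of Seeman et al. (1976) about the information content of the two grooves can be illustrated with a minimal sketch. The single-letter codes (A, D, M, H) and the function name are assumptions introduced here for illustration only; the original figure uses hourglass and diamond symbols. The patterns are a simplified reading of the functional groups on each Watson-Crick pair, and the small positional differences that might "perhaps" separate GC from CG in the minor groove are deliberately ignored.

```python
# Functional groups presented by each Watson-Crick pair, read across the groove.
# Codes (illustrative, not from the source): A = H-bond acceptor, D = H-bond donor,
# M = methyl group, H = nonpolar hydrogen.
MAJOR = {
    "AT": "ADAM",  # adenine N7, N6-H; thymine O4, 5-methyl
    "TA": "MADA",
    "GC": "AADH",  # guanine N7, O6; cytosine N4-H, C5-H
    "CG": "HDAA",
}
MINOR = {
    "AT": "AHA",   # adenine N3, C2-H; thymine O2
    "TA": "AHA",
    "GC": "ADA",   # guanine N3, N2-H; cytosine O2
    "CG": "ADA",
}

def distinguishable_classes(patterns):
    """Group base pairs that present identical patterns in a groove."""
    groups = {}
    for pair, pattern in patterns.items():
        groups.setdefault(pattern, []).append(pair)
    return sorted(sorted(group) for group in groups.values())

print(distinguishable_classes(MAJOR))  # four classes: every pair is unique
print(distinguishable_classes(MINOR))  # two classes: AT/TA versus GC/CG
```

Under this simplified encoding the major groove yields four distinct patterns, whereas the minor groove collapses to two, with the guanine N2 donor as the only distinguishing feature.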
Three other recognition opportunities that occur in RNA and not in
duplex DNA, and that appear to be utilized by proteins binding specifically to RNA, are the single-stranded loop regions at the ends of helices,
single-stranded bulges within helices, and modified bases.
ROLE OF THE MAJOR GROOVE IN DNA AND
RNA RECOGNITION
The extensive hydrogen bonding and shape complementarity between the
major groove of B-DNA and the surfaces of many of the sequence-specific DNA-binding proteins as a source of recognition have been extensively documented from high-resolution crystal structures and a few
NMR structures of DNA complexes (for reviews, see Steitz 1990; Harrison 1991; Pabo and Sauer 1992). In general, structural complementarity
between a protein and a specific DNA sequence is achieved in idiosyncratic manners: There does not appear to be a code for nucleic acid sequence recognition (Pabo 1983; Matthews 1988). Although particular
amino acid side chains do not always recognize the same base pair, there
are some apparent preferences, as suggested by Seeman et al. (1976).
The guanidinium group of arginine very often makes a bidentate interaction with the N7 and O6 of guanine, although other interactions are also
seen. Similarly, the hydrogen-bond donors and acceptors of the
glutamine side chains are observed frequently to interact with the corresponding hydrogen-bond donors and acceptors of adenine. The ability
of these side chains to make bidentate interactions with DNA greatly enhances their suitability for sequence-specific recognition (Seeman et al.
1976). The van der Waals interactions between the protein and the 5-methyl group of thymine appear also to contribute to specificity.
Presumably, the close packing of a protein against the GC base pair
would in many cases sterically exclude its replacement by an AT base
pair with its accompanying bulky 5-methyl group.
Information concerning protein interaction in the major groove of
RNA is sparse. Although details of the interaction have not yet been published, a protein loop at the end of a β-hairpin in aspartyl-tRNA
synthetase is observed to interact with at least the terminal base pair via
the major groove of the acceptor stem of tRNAAsp (Ruff et al. 1991). Interactions between human immunodeficiency virus (HIV) tat and its
target RNA TAR are hypothesized to occur in the region of a 3-nucleotide bulge and on the major groove side (Weeks et al. 1990, 1991).
The potential accessibility of an RNA major groove to protein side
chains has been probed by K.M. Weeks and D.M. Crothers (in prep.),
using diethyl pyrocarbonate (DEPC). DEPC carbethoxylates purines
primarily at the N7 position in a reaction that is sensitive to the solvent
exposure of the base (Vincze et al. 1973; Peattie and Gilbert 1980). The
reagent is comparable in size to those protein side chains such as arginine
that mediate RNA-protein interactions, suggesting that the rates of reactivity of this probe are likely to reflect the steric accessibility of purines
to protein interaction. Although the major groove of an uninterrupted
RNA duplex is relatively inaccessible to this reagent, as expected, the
major groove at helix termini is accessible to modification, with the effect extending further on the 3' strand (Fig. 3). Furthermore, bulges in
RNA helices larger than one nucleotide greatly increase the accessibility
of flanking duplexes to reaction with DEPC.
The structure of a portion of HIV TAR RNA containing a 3-nucleotide bulge and bound to arginine has been deduced from NMR
data, showing one example of how a bulge can make the major groove of
RNA accessible to a protein side chain (Puglisi et al. 1992). A cytosine
from the 3-nucleotide bulge makes a triple base pair with an adjacent GC
forming a binding site for the guanidinium group of arginine and opening
the major groove.
ROLE OF NUCLEIC ACID BENDABILITY
The sequence-dependent nucleic acid distortability is a very important
source of specificity in many protein-RNA, as well as protein-DNA, interactions. Nucleic acid distortability as a more indirect source of sequence specificity arises from two facts: (1) Proteins often bind a conformation
of a nucleic acid that is altered from its uncomplexed solution
conformation. (2) The free energy cost for various nucleic acid sequences to assume the conformation that is required for binding to the
protein is not the same for different sequences.

Figure 3 Schematic representation of an RNA duplex with the reactivity of
purines to DEPC shown by filled circles whose diameter is proportional to reactivity (accessibility). Although the 5' base is most accessible, the accessibility to
DEPC extends further into the duplex on the 3' strand (K.M. Weeks and D.M.
Crothers, in prep.).

Evidence for significant distortion of DNA upon binding to proteins
now abounds, and in a few cases this protein-induced DNA distortion has
been experimentally correlated with the ability of a protein to bind a
specific sequence. DNA distortion is seen in the crystal structures of
DNA complexes with EcoRI (Frederick et al. 1984), 434 repressor (Aggarwal et al. 1988), trp repressor (Otwinowski et al. 1988), DNase I
(Suck et al. 1988), Klenow fragment (Freemont et al. 1988), CAP
(Schultz et al. 1991), met repressor, and a growing roster of other
proteins. The distortions of duplex DNA structure that have been observed in complexes include changes in twist, groove width, and kinks
(Steitz 1990).
Perhaps the most completely documented example of the correlations
among DNA sequence, bendability, and affinity for proteins is in the
case of E. coli catabolite gene activator protein (CAP). Gartenberg and
Crothers (1988) found that CAP-binding sites containing AT bases at
base-pair positions 10 and 11 from the center of the binding site bend
more than those containing GC bases when bound to CAP (as assessed
by polyacrylamide gel electrophoresis) and also bind CAP 14-fold more
tightly. Thus, sequence, bending, and binding are correlated. The crystal
structure of the CAP-DNA complex shows an 80 bend near base pair 10
and a very narrow minor groove that allows better interaction with the
protein (Schultz et al. 1991). AT-rich sequences favor bending into the
minor groove (as occurs) (Drew and Travers 1984) and also favor a narrow minor groove (Yoon et al. 1988). Experiments with 434 repressor
also show a sequence dependence to DNA binding that most likely is a
result of DNA distortability (Koudelka et al. 1987). Replacement of AT
by GC base pairs at the dyad axis reduces binding of intact repressor by
as much as 50-fold, despite the lack of base-specific contacts in this
region.
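To put the fold changes quoted above on an energy scale, a fold difference in binding affinity can be converted to a free energy difference with the standard relation ΔΔG = RT ln(fold). The sketch below is illustrative only: the temperature of 298.15 K is an assumption (the binding conditions are not stated here), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold, temperature_k=298.15):
    """Free energy difference (kcal/mol) for a given fold change in affinity.

    Uses ddG = R * T * ln(fold); the default temperature is an assumption.
    """
    return R_KCAL * temperature_k * math.log(fold)

examples = [
    ("CAP, AT versus GC at positions 10 and 11", 14),
    ("434 repressor, AT versus GC at the dyad axis", 50),
]
for label, fold in examples:
    print(f"{label}: {fold}-fold is about {ddg_from_fold(fold):.1f} kcal/mol")
```

On this scale the 14-fold and 50-fold effects correspond to roughly 1.6 and 2.3 kcal/mol, respectively, which gives a sense of how modest a difference in deformation energy is needed to produce the observed sequence selectivity.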
The sequence-dependent deformability of duplex DNA or RNA that
provides specificity for sequences being recognized by a protein can include the melting of base pairs. If binding to a protein requires melting of
one or more base pairs, then the binding of mismatched base pairs should
be favored over AT pairs, which in turn should bind better than GC base
pairs. The order of binding should reflect the thermodynamic stability of
base pairs. Two examples of the role of duplex meltability in sequence
specificity can be cited: one in RNA and one in DNA. Binding of
tRNAGln to its cognate synthetase results in the breaking of the terminal
base pair of the acceptor stem between nucleotides U1 and A72 (Rould et
al. 1989). For glutaminyl-tRNA synthetase (GlnRS) recognition in charging of tRNA, it is important that this base pair not be GC (Yarus et al.
1977). The added free energy cost of breaking the GC base pair makes
tRNAs containing a GC at 1-72 less suitable for proper binding to the enzyme, reducing kcat/Km by about 10-fold (Jahn et al. 1991). In a second
example, the 3',5'-exonuclease active site of E. coli DNA polymerase I
is observed to denature duplex DNA and bind four single-stranded
nucleotides at the 3' terminus (Freemont et al. 1988). In a competition
between the duplex-binding polymerase active site and the single-strand-binding exonuclease active site for the 3' end of the primer strand,
duplex DNA containing a mismatch base pair will bind to the exonuclease site with greater frequency than a correctly matched duplex,
thus enhancing the editing out of mismatch base pairs (Joyce and Steitz
1987; Freemont et al. 1988).
Sequence recognition in RNA also arises from the sequence-dependent ability of single-stranded RNA to take up the conformation required for protein binding, as occurs in the single-stranded acceptor end
of tRNAGln (Fig. 4). The observed interaction between the N2 of G73
and the backbone phosphate of A72 is not possible for the other three
bases (Rould et al. 1989), consistent with the observation that changing
G73 to A, C, or U reduces the kcat/Km for charging by one, three, and
four orders of magnitude, respectively (Jahn et al. 1991). Furthermore,
two non-Watson-Crick base pairs are formed at the end of the anticodon
stem in tRNAGln (see Fig. 8), producing a structure that is recognized by
the synthetase (Rould et al. 1991). Other bases unable to make these non-Watson-Crick base pairs would not allow formation of the structure
being recognized and bound to this enzyme. Although binding of
tRNAAsp to its cognate synthetase results in a very major change in the
conformation of the anticodon loop (Ruff et al. 1991), it is not yet published whether or not any part of this structural change involves alterations in RNA-RNA interactions that are dependent on the RNA sequence, as occurs with GlnRS.
ROLE OF WATER MOLECULES IN SEQUENCE RECOGNITION
Buried water molecules appear to play a very important but underrecognized role in both DNA and RNA sequence recognition. Ascertaining the role of water molecules in sequence recognition requires
crystal structures at sufficiently high resolution (usually 2.5 Å or better)
and refinement that water molecules can be reliably located. Water (or a
protein hydroxyl group) can only make a base-specific hydrogen bond if
it is also making at least two other hydrogen bonds with obligate donors
or acceptors on the protein and is sequestered from bulk solvent. In this
circumstance, the two unsatisfied water H-bond donors/acceptors
directed toward the nucleic acid become obligate donors/acceptors and
consequently become part of the H-bonding template surface of the
protein to which the nucleic acid must be complementary for optimal
binding (Fig. 5). In the trp repressor-DNA complex, there are three water
molecules per half operator bound in the major groove between the
protein and the DNA bases; at least two of them appear to be making
hydrogen bonds that specify base pairs 5, 6, and 7 from the dyad axis
(Otwinowski et al. 1988; Steitz 1990). In this case, water molecules are
playing the role of "honorary" protein side chains. In the GlnRS complex
with tRNA, two buried water molecules are an integral part of the
hydrogen-bonding matrix presented in the shallow groove of the tRNA
acceptor stem (Rould et al. 1989; Rould and Steitz 1992). Hydrogen
bonds between these two water molecules, as well as both a buried carboxylate of Asp-235 and a backbone amide of residue 183, serve to
orient one hydrogen-bond donor of water toward the O2 of cytosine 71
and one acceptor toward the N2 of guanine (Fig. 6).

Figure 4 Conformation of the end of the acceptor stem and the 3' strand in
tRNAGln bound to GlnRS (from Rould and Steitz 1992). The expected base pair
between U1 and A72 is broken by Leu-136, which packs against the guanine of
the G2-C71 base pair. The 2-amino group of guanine 73 hydrogen-bonds to the
phosphate backbone, stabilizing the hairpin conformation of the 3' strand into
the active site. Cytosine 74 binds into a tight pocket in the protein, allowing the
bases of nucleotides 73, 75, and 76 to stack.

Figure 5 Schematic drawing showing how a water molecule can be specifically
oriented by interactions with the protein, turning it into a surrogate side chain.
For example, here two obligate proton donors from the protein bind a water
molecule such that it requires H-bond acceptors on the nucleic acid.

ROLE OF THE MINOR GROOVE IN DNA AND
RNA RECOGNITION
As pointed out by Seeman et al. (1976), there are fewer features presented by base pairs in the minor groove that allow discrimination among the
two base pairs in their two orientations (Fig. 2). The hydrogen-bond acceptors (N3 on guanine and adenine and O2 on cytosine and thymine)
occur in almost the identical place in the minor groove for all four bases.
Only the exocyclic N2 of guanine distinguishes AT from GC and perhaps GC from CG. Furthermore, the minor groove of B-DNA is in general too narrow to accommodate an α-helix or too deep for bases to be
reached by side chains alone.
Figure 6 View of the recognition interface between GlnRS and base pairs G2-C71 and G3-C70 of tRNAGln (from Rould and Steitz 1992). Asp-235 directly
bonds to the 2-amino group of guanine 3 via the minor groove. The backbone
carbonyl of Pro-181 is rigidly directed to hydrogen-bond to the 2-amino group
of guanine 2. A network of water molecules between the protein and minor
groove of the tRNA, only two of which are shown here, appears to enforce a requirement for GC base pairs at these positions. The hydrophobic environment
formed by the proline, phenylalanine, isoleucine, and the underside of the ribose
sugars enhances the strength and specificity of these direct and water-mediated
hydrogen bonds.
DNA Interaction in the Minor Groove
There are ways, however, in which the interactions in the minor groove
can be made sequence-specific. For example, the sequence preferences
exhibited in the DNase I cleavage of DNA arise from its interactions in
the minor groove (Suck et al. 1988). The side chain of a tyrosine observed to bind in the minor groove will fit into the normal-width minor
groove but not into the narrower minor groove that characterizes AT-rich
sequences.
Biochemical evidence for several sequence-specific DNA-binding
proteins implies that they interact with DNA via the minor groove, although direct structural visualization of such an interaction has not yet
been achieved. Yang and Nash (1989) have argued on the basis of
methylation protection studies that E. coli integration host factor (IHF)
interacts in the minor groove. IHF has significant sequence similarity
with E. coli Hu protein, whose crystal structure (Tanaka et al. 1984)
shows two long antiparallel β-loops, one from each subunit of the dimer,
which form outstretched arms that create a large cleft sufficient in size to
accommodate duplex DNA. The model for an IHF-DNA complex (Yang
and Nash 1989) is based on one for Hu-DNA (Tanaka et al. 1984) and
places the antiparallel β-loops in the minor groove in a manner proposed
earlier for antiparallel β-strands (Carter and Kraut 1974; Church et al.
1977). The recently determined crystal structure of Arabidopsis thaliana
TFIID similarly portrays a protein with pseudo-dyad symmetry and a
twisted, antiparallel β-sheet forming a cleft of size sufficient to accommodate B-DNA (Nikolov et al. 1992). Biochemical data likewise point to
minor groove interaction by this protein (Lee et al. 1991; Starr and
Hawley 1991), although the structural basis of this interaction is not yet
established.
RNA Interaction in the Minor Groove
There are now well-established examples of specific recognition of RNA
in the minor groove (Rould et al. 1989; Musier-Forsyth and Schimmel
1992). With duplex RNA, which is A-form, the minor groove is shallow,
wide, and accessible. Several sequence-specific interactions between
GlnRS and the minor groove of tRNAGln have been observed (Rould et al.
1989, 1991; Rould and Steitz 1992). Base pairs G2-C71 and G3-C70 are
recognized in a base-specific manner by two protein "fingers," one an α-helix and the other a turn of an antiparallel β-loop. In both cases, recognition involves contact between the 2-amino group of the guanine and
hydrogen-bond acceptors of the protein. The carboxylate of an aspartic
acid side chain 235 emanating from the amino end of α-helix H interacts
with both the N2 of guanine 3 and a buried water molecule (Fig. 6). The
peptide carbonyl group of Pro-181 interacts with the N2 of guanine 2.
Substantiating the hypothesis that these two base pairs are among the
recognition elements of tRNAGln, replacement of either by AU reduces
the kcat/Km for charging by two to three orders of magnitude (Jahn et al.
1991). Furthermore, mutations in GlnRS that have increased rates of mischarging of noncognate tRNAs are changes of Asp-235 to asparagine or
glycine (Conley et al. 1988; Perona et al. 1989), showing the importance
of this interaction for discrimination. An additional interaction in the
minor groove that is important for discrimination is between the carboxylate of Glu-323 and the N2 of G10.
The importance of protein interaction with the 2-amino group of
guanine in the minor groove of RNA has also been established in the
case of alanine tRNA synthetase recognition of tRNAAla (Hou and
Schimmel 1988; McClain and Foss 1988; Hou et al. 1989; Musier-Forsyth
and Schimmel 1992). The alanine synthetase has been clearly
shown to recognize base pair 3-70, which is GU in tRNAAla. Replacing
G3 by an inosine, which lacks the N2, dramatically reduces charging of a
minihelix (Musier-Forsyth and Schimmel 1992).
ROLE OF SINGLE-STRANDED REGIONS IN RECOGNITION
Since RNA molecules have single-stranded regions in loops and bulges
and between helical stems, these regions are potential targets for recognition by proteins that are not available in duplex DNA. The recognition of
anticodon loops in tRNAs by cognate synthetases provides the best-characterized examples of protein recognition of single-stranded regions.
Molecular genetic and biochemical studies have shown that the
anticodon bases of tRNA serve as recognition elements for many of the
aminoacyl-tRNA synthetases (Schulman and Pelka 1985; Normanly and
Abelson 1989; Sampson et al. 1989). The co-crystal structures of
glutaminyl-tRNA synthetase and aspartyl-tRNA synthetase complexed
with their cognate tRNAs show that, upon forming a complex, the
anticodon bases become unstacked so that they may bind into separate
base recognition pockets. The energy required to unstack the anticodon
bases (as they exist in the uncomplexed tRNA) is provided by interactions with the protein. Since bases in loop regions of uncomplexed RNAs
tend to be stacked on each other and since optimal recognition of bases
by a protein requires their unstacking in order for them to interact in
separate recognition pockets, it may be the case more often than not that
protein recognition of an RNA single-stranded region is accompanied by
a significant conformational change in the RNA.
Although details of the anticodon base interactions are not yet published for the aspartic acid enzyme (Ruff et al. 1991), Figure 7 shows
how the three anticodon bases of tRNAGln
are interacting with GlnRS
(Rould et al. 1991; Rould and Steitz 1992). Each anticodon nucleotide is
recognized primarily by a polypeptide segment of five or six amino
acids. In all three cases, at least one positively charged amino acid from
this segment forms a salt link with an adjacent negatively charged
phosphate. The aliphatic portion of this residue generally packs against
either the base or the hydrophobic "underside" of ribose. With all three
anticodon bases, recognition is achieved by direct hydrogen bonding between the backbone and side chains of a short recognition peptide and
the Watson-Crick hydrogen-bonding groups of the bases. Furthermore,
several of the interactions presumed to be discriminating involve
hydrogen bonds with charged side chains that are buried from solvent in
the complex. Although there is no conserved sequence or structural similarity
among these segments, they are predominantly in extended β-type
conformation.

Figure 7 Stereo view of the binding pockets for the anticodon bases C34, U35, and G36 (dark) in the GlnRS-tRNAGln complex.
Each nucleotide is recognized primarily by a single short polypeptide segment in the enzyme (light). In each case, an arginine or
lysine from the polypeptide anchors the nucleotide by its phosphate group, allowing peptide backbone and side chains of the segment to specifically recognize the base (from Rould and Steitz 1992).

That the three anticodon bases of t R N A
serve as recognition elements is confirmed by kinetic studies of mutant tRNAs (Jahn et al.
1991). Changes of the anticodon bases reduce k /K
by 3 - 4 orders of
magnitude. Interestingly, it is k that changes the most, not K . A structural mechanism involving an anticodon-induced conformational change
in the protein transmitted to the ATP-binding site has been hypothesized
to account for the importance of anticodon base identity for catalysis
(Rould et al. 1991).
G i n
CM
m
cat
m
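The size of this kinetic effect can be placed on a free-energy scale. The following is a minimal sketch and is not taken from the cited kinetic study: it assumes the standard transition-state relation ΔΔG = RT ln(fold reduction in kcat/Km) and an illustrative temperature of 298.15 K, which the text does not specify. The function name `ddg_from_fold_reduction` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_reduction(fold_reduction, temperature_k=298.15):
    """Apparent transition-state destabilization in kcal/mol.

    Uses ddG = R * T * ln(fold), where `fold` is the factor by which
    kcat/Km is reduced. The default temperature is an assumption.
    """
    if fold_reduction <= 0:
        raise ValueError("fold_reduction must be positive")
    return R_KCAL * temperature_k * math.log(fold_reduction)


# Reductions of 3 and 4 orders of magnitude in kcat/Km
for exponent in (3, 4):
    ddg = ddg_from_fold_reduction(10 ** exponent)
    print(f"10^{exponent}-fold: {ddg:.1f} kcal/mol")
```

Under these assumptions, a loss of 3-4 orders of magnitude in kcat/Km corresponds to roughly 4.1-5.5 kcal/mol of apparent transition-state destabilization.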
ROLE OF MODIFIED BASES
Many of the RNA molecules that are recognized by proteins contain
modified bases, whose role in specific recognition remains largely unknown. There are at least two ways that modified bases might enhance
recognition: by stabilizing RNA conformations that otherwise would be
less favored and by changing the shape of a recognition site. The N at
position 5 of pseudo-uridine is observed in tRNAGln to interact with a
water molecule, which in turn interacts with the phosphate of the
pseudo-uridine and the preceding phosphate, an interaction and conformation seen in all but one of the pseudo-uridines in tRNAs of known
structure (J. Arnez and T.A. Steitz, unpubl.). The structure so stabilized
may be significant in protein recognition.
Base modifications of noncognate tRNAs at nucleotides 34 and 37
may act as negative determinants of aminoacylation by GlnRS. The
tightly packed interface between A37 and the protein (Fig. 8) suggests
that bulky modifications of A37, for example, 6-carbamoylthreonyl adenine or 2-methylthio-N6-isopentenyl adenine, may provide
an additional source of discrimination against the many noncognate
tRNA molecules bearing these modifications. A role for modified bases
at position 37 in tRNA discrimination has already been suggested by
biochemical studies showing that ArgRS misacylates tRNAAsp lacking
modified bases (Perret et al. 1990). Likewise, C34 is tightly packed into
a pocket that is covered by a loop of protein; this pocket may not accommodate certain modified bases at position 34, such as queuosine.
The 2'-ribosylated adenosine 64 in the initiator methionine tRNA
from yeast appears to play an important role in ensuring that this tRNA is
used only in the initiation of protein synthesis and not in elongation
(Kiesewetter et al. 1990). Again, the modification appears to function as
a negative effector, since the modified tRNAiMet will not bind to EF-Tu
or be inserted in elongation, whereas the tRNAiMet that is unmodified
at position 64 will both bind to EF-Tu and participate in
elongation.
Figure 8 Stereo view of the two novel non-Watson-Crick base pairs that extend
the anticodon stem of tRNAGln when complexed to GlnRS, showing the water
network between these bases and the sugar-phosphate backbone. Asp-370
directly contacts both base pairs via the minor groove.
The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0
For conditions see www.cshlpress.com/copyright.
SUMMARY
Many aspects of protein recognition of RNA and of DNA are very
similar, such as the importance of sequence-dependent distortability of
the nucleic acid and the role of specific water-mediated interaction. Although recognition of both nucleic acids can be achieved through true
direct protein interaction with the exposed edges of base pairs in either
the major or minor grooves of duplex, interactions via the major groove
appear to dominate in DNA recognition, whereas the opposite preference
may occur with RNA (although many more examples are required to establish this point). Interactions with single-stranded bases in RNA may
prove to be the most significant in RNA recognition and are not at all
characteristic of DNA recognition.
Whether there are simple, recurring protein motifs involved in RNA
recognition, as has been found for DNA, is not yet known. Direct recognition of bases in DNA is achieved by various simple motifs that present
an α-helix, antiparallel β-loop, or polypeptide chain end into the major
groove of DNA. Although RNA recognition domains such as the RNP
motif are known, it is too early to tell whether there are simple and general ways in which protein secondary structures (helix, antiparallel
β-strands, or loops) interact with RNA.
REFERENCES
Aggarwal, A.K., D.W. Rodgers, M. Drottar, M. Ptashne, and S.C. Harrison. 1988. Recognition of a DNA operator by the repressor of phage 434: A view at high resolution.
Science 242: 99-107.
Carter, C.W. and J. Kraut. 1974. A proposed model for interaction of polypeptides with
RNA. Proc. Natl. Acad. Sci. 71: 283-287.
Church, G.M., J.L. Sussman, and S.-H. Kim. 1977. Secondary structure complementarity
between DNA and proteins. Proc. Natl. Acad. Sci. 74: 1458-1462.
Conley, J., H. Uemura, F. Yamao, J. Rogers, and D. Söll. 1988. E. coli glutaminyl tRNA
synthetase: A single amino acid replacement relaxes tRNA specificity. Protein Sequences Data Anal. 1: 479-485.
Delarue, M. and D. Moras. 1989. RNA structure. Nucleic Acids Mol. Biol. 3: 182-196.
Drew, H.R. and A.A. Travers. 1984. DNA structural variations in the E. coli tyrT
promoter. Cell 37: 491-502.
Frederick, C.A., J. Grable, M. Melia, C. Samudzi, L. Jen-Jacobsen, B.-C. Wang, P.
Greene, H.W. Boyer, and J.M. Rosenberg. 1984. Kinked DNA in crystalline complex
with EcoRI endonuclease. Nature 309: 327-331.
Freemont, P.S., J.M. Friedman, L.S. Beese, M.R. Sanderson, and T.A. Steitz. 1988. Cocrystal structure of an editing complex of Klenow fragment with DNA. Proc. Natl.
Acad. Sci. 85: 8924-8928.
Gartenberg, M.R. and D.M. Crothers. 1988. DNA sequence determinants of CAP-induced bending and protein binding affinity. Nature 333: 824-829.
Harrison, S.C. 1991. A structural taxonomy of DNA-binding domains. Nature 353:
715-719.
Hou, Y.-M. and P. Schimmel. 1988. A simple structural feature is a major determinant of
the identity of a transfer RNA. Nature 333: 140-145.
Hou, Y.-M., C. Francklyn, and P. Schimmel. 1989. Molecular dissection of a transfer
RNA and the basis for its identity. Trends Biochem. Sci. 14: 233-237.
Jahn, M., J. Rogers, and D. Söll. 1991. Anticodon and acceptor stem nucleotides in
tRNAGln are major recognition elements for E. coli glutaminyl-tRNA synthetase. Nature 352: 258-260.
Joyce, C.M. and T.A. Steitz. 1987. DNA polymerase I: From crystal structure to function
via genetics. Trends Biochem. Sci. 12: 288-292.
Kennard, O. and W.N. Hunter. 1989. Oligonucleotide structure: A decade of results from
single crystal X-ray diffraction studies. Q. Rev. Biophys. 22: 327-379.
Kiesewetter, S., G. Ott, and M. Sprinzl. 1990. The role of modified purine 64 in initiator/elongator discrimination of tRNAiMet from yeast and wheat germ. Nucleic Acids
Res. 18: 4677-4682.
Koudelka, G.B., S.C. Harrison, and M. Ptashne. 1987. Effect of non-contacted bases on
the affinity of 434 operator for 434 repressor and cro. Nature 326: 886-888.
Lee, D.K., M. Horikoshi, and R.G. Roeder. 1991. Interaction of TFIID in the minor
groove of the TATA element. Cell 67: 1241-1250.
Lewis, M., J. Wang, and C. Pabo. 1985. Structure of the operator binding domain of
lambda repressor. In Biological macromolecules and assemblies, vol. 2 (ed. F.A.
Jurnak and A. McPherson), pp. 266-287. Wiley, New York.
Matthews, B.W. 1988. No code for recognition. Nature 335: 294-295.
McClain, W.C. and K. Foss. 1988. Changing the identity of a tRNA by introducing a G-U wobble pair near the 3' acceptor end. Science 240: 793-796.
Musier-Forsyth, K. and P. Schimmel. 1992. Functional contact of a transfer RNA
ture of a protein with histone-like properties in prokaryotes. Nature 310: 376-381.
Vincze, A., R.E.L. Henderson, J.J. McDonald, and N.J. Leonard. 1973. Reaction of
diethyl pyrocarbonate with nucleic acid components. Bases and nucleosides derived
from guanine, cytosine, and uracil. J. Am. Chem. Soc. 95: 2677-2682.
Weeks, K.M. and D.M. Crothers. 1991. RNA recognition by tat-derived peptides: Interaction in the major groove? Cell 66: 577-588.
Weeks, K.M., C. Ampe, S.C. Schultz, T.A. Steitz, and D.M. Crothers. 1990. Fragments
of the HIV-1 tat protein specifically bind TAR RNA: Peptide recognition of bulged
RNA. Science 249: 1281-1285.
Woodbury, C.P., O. Hagenbüchle, and P.H. von Hippel. 1980. DNA site recognition and
reduced specificity of the EcoRI endonuclease. J. Biol. Chem. 255: 11534-11546.
Yang, C.-C. and H.W. Nash. 1989. The interaction of E. coli IHF protein with its
specific-binding sites. Cell 57: 869-880.
Yarus, M., R. Knowlton, and L. Soll. 1977. Aminoacylation of the ambivalent Su+7 amber suppressor tRNA. In Nucleic acids protein recognition (ed. H.J. Vogel), pp.
391-409. Academic Press, New York.
Yoon, C., G.G. Privé, D.S. Goodsell, and R.E. Dickerson. 1988. Structure of an
alternating-B DNA helix and its relationship to A-tract DNA. Proc. Natl. Acad. Sci. 85:
6332-6336.