Download Microenvironment analysis and identification of magnesium binding

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Messenger RNA wikipedia , lookup

Membrane potential wikipedia , lookup

List of types of proteins wikipedia , lookup

SR protein wikipedia , lookup

Bottromycin wikipedia , lookup

P-type ATPase wikipedia , lookup

RNA interference wikipedia , lookup

Silencer (genetics) wikipedia , lookup

RNA polymerase II holoenzyme wikipedia , lookup

Eukaryotic transcription wikipedia , lookup

Magnesium in biology wikipedia , lookup

Ligand binding assay wikipedia , lookup

Transcriptional regulation wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Polyadenylation wikipedia , lookup

Evolution of metal ions in biological systems wikipedia , lookup

Cooperative binding wikipedia , lookup

Gene expression wikipedia , lookup

RNA wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

RNA-Seq wikipedia , lookup

RNA silencing wikipedia , lookup

Non-coding RNA wikipedia , lookup

Epitranscriptome wikipedia , lookup

Transcript
4450±4460 Nucleic Acids Research, 2003, Vol. 31, No. 15
DOI: 10.1093/nar/gkg471
Microenvironment analysis and identi®cation of
magnesium binding sites in RNA
D. Rey Banatao, Russ B. Altman* and Teri E. Klein
Department of Genetics and Stanford Medical Informatics, 251 Campus Drive, Stanford University, CA 94305,
USA
Received April 16, 2003; Revised and Accepted May 21, 2003
ABSTRACT
Interactions with magnesium (Mg2+) ions are essential for RNA folding and function. The locations and
function of bound Mg2+ ions are dif®cult to characterize both experimentally and computationally. In
particular, the P456 domain of the Tetrahymena
thermophila group I intron, and a 58 nt 23s rRNA
from Escherichia coli have been important systems
for studying the role of Mg2+ binding in RNA, but
characteristics of all the binding sites remain
unclear. We therefore investigated the Mg2+ binding
capabilities of these RNA systems using a computational approach to identify and further characterize
their Mg2+ binding sites. The approach is based on
the FEATURE algorithm, reported previously for
microenvironment analysis of protein functional
sites. We have determined novel physicochemical
descriptions of site-bound and diffusely bound
Mg2+ ions in RNA that are useful for prediction.
Electrostatic calculations using the Non-Linear
Poisson Boltzmann (NLPB) equation provided
further evidence for the locations of site-bound
ions. We con®rmed the locations of experimentally
determined sites and further differentiated between
classes of ion binding. We also identi®ed potentially
important, high scoring sites in the group I intron
that are not currently annotated as Mg2+ binding
sites. We note their potential function and believe
they deserve experimental follow-up.
INTRODUCTION
The importance of RNA and Mg2+
RNA molecules form complex three-dimensional folds that
ultimately determine their biological role. Metal ions, particularly Mg2+, are crucial to stabilizing RNA structure and
hence, its function. They counter the negatively charged
phosphate backbone and participate in the catalytic reactions
of some ribozymes (1±3). Reliably locating these binding sites
is important for understanding RNA folding, catalysis and
ultimately in drug development (4±10).
Two main modes of Mg2+ binding have been proposed (11).
In diffuse binding, fully hydrated Mg2+ ions interact with RNA
through long-range electrostatic interactions, while site-bound
Mg2+ ions are bound directly via anionic ligands from the
RNA. Site binding is energetically costly, as it requires 10±
30 kcal/mol to remove a single water molecule from the outer
and inner hydration layers of an ion, respectively, and
therefore requires a large electrostatic contribution to binding
(12).
Locating Mg2+ binding sites
Mg2+ binding sites in RNA are typically identi®ed through
X-ray crystallography or NMR methods. However, electron
densities of metal ions can be dif®cult to distinguish and NMR
techniques have only just recently given direct evidence of ion
binding in RNA (13). Many sites can be easily mislabeled as
waters or may be missing entirely from fully re®ned crystal
structures or solution structures of RNA, and must be located
through further experimentation. Ion binding sites of high
af®nity can be probed using cleavage experiments, where
hydrolytic or oxidative metals cleave surrounding RNA (14).
Additionally, mutation experiments can create the loss of a
functional Mg2+ binding site (15,16). Another approach
involves rescuing phosphorothioate effects in the presence
of thiophilic metal ions such as Cd2+ or Mn2+ (17,18). This
method can detect site-bound ions, as they require direct
coordination between metal ions and phosphoryl oxygens in
the RNA. If a structure is known, computational methods may
be useful for predicting and characterizing Mg2+ binding
sites in RNA structures and can aid crystallographers in
distinguishing ion densities.
A straightforward and computationally fast algorithm is
valence screening for metal ions (19,20). This method
essentially calculates the valence potential of oxygens within
a de®ned radius along a ®ne three-dimensional grid laid over a
molecular structure. A more rigorous approach would be to
use electrostatic calculations, where solutions to the Poisson±
Boltzmann equation, or in the case of nucleic acids, the nonlinear Poisson±Boltzmann (NLPB) equation, can be used to
identify negative potential on the surface of an RNA molecule
where ions most likely aggregate (4,21). In addition to
calculating electrostatic potentials, Brownian-Dynamics (BD)
simulations can be used to further elucidate ion binding sites in
RNA folds (22±24). However, none of these computational
approaches are suitable for analyzing large complexes without
signi®cant calculation (25).
*To whom correspondence should be addressed. Tel: +1 650 725 3394; Fax: +1 650 725 7944; Email: [email protected]
Nucleic Acids Research, Vol. 31 No. 15 ã Oxford University Press 2003; all rights reserved
Nucleic Acids Research, 2003, Vol. 31, No. 15
A novel approach in the context of structural genomics
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) now
includes over 1900 structures of nucleic acids, ranging from
simple helices to complex, entangled, assemblies of RNA
and protein (26). With advances in structure determination
techniques, and the many structural genomic efforts
currently underway, the number of solved RNA structures
and complexes is rapidly increasing (see http://www.rcsb.org/
pdb/strucgen.htmlWorldwide for projects worldwide) (27,28).
A major goal of structural genomics is to classify all the
possible molecular folds that exist in nature. Although, the
number of known RNA structural motifs is less than those for
proteins, structures of RNA±protein complexes, such as the
ribosome, are providing numerous examples of these motifs
(29±34), and causing increased interest in computational
methods for analysis of RNA structural motifs (35,36).
Computational approaches that can scale to analyze databases
of RNA structures, including large assemblages, will be able
to mine the structural data and provide examples of these
motifs.
FEATURE was originally developed for the analysis of
protein microenvironments (37). The FEATURE method uses
a supervised learning algorithm to recognize sites from a
training set of sites and non-sites. Details of the algorithm and
its performance characteristics have been published elsewhere
(37±43). Site models have been created for different types of
functional sites in proteins, such as various metal binding
sites, serine protease active sites and ATP binding sites
(40,41). These sites were detected with high sensitivity and
speci®city. Until now, FEATURE has not been applied to
analyzing nucleic acid structures.
Although Mg2+ binding is driven mostly by electrostatics, it
has been shown that non-coulombic properties can in¯uence
the electrostatic ®eld (44). We believe that Mg2+ binding sites
can be differentiated not only by charge alone, but also by the
biochemical and structural properties surrounding the binding
site. We used FEATURE to study the differences between
site-bound and diffusely bound Mg2+ ions in complex RNA
folds. To con®rm our hypothesis that sites could be differentiated by these microenvironments, we used our statistical
models to identify sites in well-characterized RNA systems
where both types of binding occur. These RNA structures
were not included in the training sets and served as an
independent test set. FEATURE analysis was compared with
the results of valence and electrostatic calculations on these
various systems.
MATERIALS AND METHODS
RNA modi®ed FEATURE
FEATURE was originally developed and implemented for
analyzing protein structures (37,41). We extended the
property set to include RNA biochemical and structural
properties. The resulting properties are listed in Figure 1.
To create the diffuse binding model, we de®ned sites as
Ê of any RNA atom. The atomic
Mg2+ ions not within 3 A
density surrounding each site was calculated and averaged
over the total number of sites. We generated non-sites by
randomly choosing grid points laid over an RNA structure.
Ê of a Mg2+ ion, and whose
Only grid points not within 10 A
4451
radial density was within one standard deviation of the
average atomic density of the sites were included as non-sites.
These selection criteria prevent bias for sites or non-sites
Ê of any
based on atomic density alone. Mg2+ ions within 3 A
three RNA ligands (site bound) were also included as nonsites. A total of 126 sites and 334 non-sites were used to
generate the diffuse binding model. For the site-bound model,
Ê of any three RNA ligands were
Mg2+ ions within 3 A
Ê of an
considered as sites, while Mg2+ ions not within 3 A
RNA ligand were used as non-sites. The randomly generated
non-sites from the diffuse model were also used as non-sites
for the site-bound model. A total of 30 sites and 330 non-sites
were used to generate the site-bound model. The structures
used for training are listed in Table 1. FEATURE source code
and documentation is available at http://feature.stanford.edu/.
A web interface for FEATURE (http://feature.stanford.edu/
webfeature/) (45) is available for real-time scanning and
visualization of functional sites, including Mg2+ binding sites,
in both RNA and protein structures.
FEATURE results consist of two separate outputs. The
result of training is the site model, visualized as a twodimensional plot of properties against volumes in Figure 1.
Statistical signi®cance is determined using the Wilcoxon ranksum test. Statistically signi®cant property-volume pairs (at
P = .05) are colored blue, while statistically de®cient pairs are
colored green. Insigni®cant pairs are left blank. The second
output is the result of scanning a query structure for sites.
Detected hits are scored using a log-odds scoring function (41)
as shown below (Equation 1); where higher scoring hits are
more likely to be a site of interest.
score ˆ
P…site j environment†
P…site†
1
Score cut-offs for the site-bound and diffusely bound models
were determined using a ROC (Receiver Operating Curve)
plot of 1-speci®city against sensitivity; where sensitivity and
speci®city were calculated by FEATURE's ability to detect
crystallographically bound Mg2+ ions in a set of RNA
structures. Sensitivity provided a measure of the method's
ability to identify all crystallographical Mg2+ ions, while
speci®city provided a measure of the false positive rate. As
crystallographical Mg2+ ions are not comprehensively determined, we opted for score cut-offs that gave a high speci®city
at the loss of sensitivity so as to assure the reliability of
predicted sites. The ROC plot is shown in Figure 2. Details of
the statistical model, learning method, and inference method
are available in previous publications (37,38,40±43).
MC-Annotate
We used a modi®ed version of MC-Annotate (45), to calculate
secondary and tertiary RNA structural properties. MCAnnotate takes RNA pdb ®les as input and generates a list
of RNA secondary and tertiary structure properties. These
properties were translated into a FEATURE readable format
and then assigned to each base within an RNA molecule. For
analysis of proteins, FEATURE previously used DSSP (46)
®les to calculate properties such as solvent accessibility,
mobility and secondary structural properties. FEATURE uses
4452
Nucleic Acids Research, 2003, Vol. 31, No. 15
Figure 1. The microenvironments of site-bound and diffusely bound Mg2+ ions in RNA. Physicochemical properties of signi®cance (at P = 0.05) are listed
Ê in width. Colored squares represent property-volume pairs
down the left-most column. Increasing volumes move away from the center of the site and are 2 A
of statistical signi®cance (see legend). Properties that were statistically insigni®cant between sites and non-sites for both models are not shown. These include:
atom-based properties: Sulfur, Other, Positive Charge; residue-based properties: T, Sugar Pucker C1¢ endo, Sugar Pucker C4¢ endo; secondary structure based
properties: Adjacent Base Pair, A-Form Helix, X-Form Helix, Strand is Other; and tertiary structure based properties: Loop±Strand, Helix±Strand,
Bulge±Loop, Bulge±Helix, Internal Loop±Bulge, Internal Loop±Loop, Other and Pseudoknot.
Nucleic Acids Research, 2003, Vol. 31, No. 15
4453
Table 1. A list of the structures used for training site-bound and
diffusely bound models
For scans of the group I intron, sites from 1gid and 1hr2 were removed
from the training set.
Figure 2. ROC curve for FEATURE detection of site-bound and diffusely
bound Mg2+ ions in RNA were used to determine score cut-offs. Higher
sensitivity and speci®city is achieved for site-bound ions as their locations
are fewer, more localized, and thus, more easily detected in RNA. The
score cut-offs used in our analysis are labeled on the curve in relation to the
resulting sensitivity and speci®city of detection. Cut-offs were chosen to
maximize speci®city over sensitivity, so as to assure reliable detection of
sites.
RESULTS
a proprietary format of the MC-Annotate output to count the
secondary and tertiary structural properties. The current
implementation of FEATURE can read both DSSP and our
formatted version of MC-Annotate ®les in order to support
complexes of both RNA and proteins. MC-Annotate is
accessible at http://www-lbit.iro.umontreal.ca/mcannotate/.
ViewFEATURE
The results of FEATURE scans were analyzed in
ViewFEATURE, a visualization tool developed previously
(39). ViewFEATURE allows the user to view the FEATURE
hits as spheres within the RNA structure of interest. Radial
shells, sites and non-sites, as well as the properties of interest,
can be visualized with an interactive graphical user interface.
ViewFEATURE is available as an extension to the molecular
modeling tool, Chimera (47). The ViewFEATURE source
code can be downloaded from http://feature.stanford.edu/.
Chimera is available at http://www.cgl.ucsf.edu/chimera/.
Electrostatics
The non-linear solution for the Poisson±Boltzmann equation
was used to calculate the electrostatic potential of each
RNA system. The speci®c program we used, Qnifft, was
downloaded from http://pylelab.org/research/computation/
electrostatics/. Calculations were performed as published
previously (21).
Valence calculations
For valence calculations (19), parameters were varied to try
and predict both site-bound and diffusely bound ions. For siteÊ,
bound ions, grid spacing was varied from 0.1 to 0.15 A
Ê , and number of ligands from one
ligation sphere from 3 to 4 A
to three. For diffusely bound ions, the ligation sphere was
Ê to simulate a hydration layer. Valence
increased to 6±7 A
source code was downloaded from http://biochem.wustl.edu/
~enrico/vale.htm/.
In order to support analysis of RNA structures, in addition to
proteins, we selected a set of physicochemical and structural
properties that pertain to either or both types of molecules,
which resulted in a total of 112 of properties. These properties
provided descriptions for the statistical models of the
microenvironments surrounding site-bound and diffusely
bound Mg2+ ions, and are summarized in Figure 1.
Our statistical models reveal, in addition to charge, the
major differences between the microenvironments of sitebound and diffusely bound Mg2+ ions. We found that diffusely
bound Mg2+ ions are coordinated by a larger repertoire of
potential ligands than are site-bound ions. However, sitebound Mg2+ ions are more often located near `irregular' or
non-helical RNA tertiary interactions. To verify these structural motifs, we used FEATURE to identify binding sites in a
number of RNA systems not included in the training set. We
compared FEATURE with two other computational methods
for ion binding site detection and found that often FEATURE
could better pinpoint crystallographically observed Mg2+
binding sites. Because of their unique environment, sitebound Mg2+ ions were easier to positively predict than
diffusely bound Mg2+ ions in complex RNA systems.
Microenvironment analysis of Mg2+ binding
To differentiate between site-bound and diffusely bound Mg2+
ions, we computed distances between crystallographically
observed Mg2+ ions and the surrounding RNA. For diffuse
binding, we trained FEATURE on Mg2+ ions that were either
Ê of any RNA-derived atom.
fully hydrated or not within 3 A
Ê of at least three possible
Ions that were within 3 A
coordinating ligands, such as phosphate oxygens or amine
nitrogens, were considered site-bound. In high-resolution
Ê ), inner-sphere contacts with the ion can be
structures (~2 A
observed, as in the 50s ribosomal subunit (33). Site-bound
Mg2+ ions as well as randomly chosen non-sites served as
controls (see Materials and Methods for description of how
these non-sites were generated). For the site-bound model,
diffusely bound Mg2+ ions and randomly generated non-sites
4454
Nucleic Acids Research, 2003, Vol. 31, No. 15
Table 2. A list of the cyrstallographical Mg2+ ions in the group I intron and 23s rRNA for which FEATURE was able to distinguish between site-bound
and diffusely bound sites
The ®rst three columns, from left, list the structures and their PDB identi®ers. The fourth column lists crystallographical Mg2+ ions as labeled in their
respective structures. Site-bound ions are labeled with a single asterisk. Columns 5 through to 8 describe the FEATURE hits near the crystallographically
bound ion. Column 5 provides the FEATURE hit's rank, by score, with respect to the total number of hits above the de®ned cut-offs of 50 and 35 for the
site-bound and diffusely bound models, respectively. Column 6 has the percentile rank of the hit's score compared with the hits above cut-off. Column 7
lists the score of the hit while column 8 gives the hit's distance to the crystallographical Mg2+ ion. Columns 9 and 10 provide the rank and percentile, with
respect to all detected locations above cut-off, for valence calculations. The values in columns 9 and 10 provide a measure of comparison for FEATURE's
values in columns 5 and 6. Valence calculations were not able to detect many of the crystallographical sites, while those that were detected were below the
cut-off of 1.8. Column 10 shows the valence score, above cut-off, of a detected location, while column 11 provides its distance to the crystallographical
ion. The highest scoring valence positions, as labeled in the legend, did not agree with electrostatics and FEATURE. As electrostatics calculations do not
provide single locations for potential sites, column 12 refers to the ®gures for a visual comparison of the results of all three methods. B/C, below cutoff;
***, distance to nearest site-bound ion; **, as labelled in PDB; *, site bound ion; Ù, highest valence points, not near any crystallographically bound
magnesium.
served as controls. The resulting statistical models revealed
that biochemical and structural properties, in addition to
charge, contribute to differentiating between site-bound and
diffusely bound Mg2+ ions. The site-bound and diffusely
bound models are plotted in Figure 1 as property-volume pairs
of statistical signi®cance.
Mg2+ binding site detection
To detect Mg2+ binding sites based on their microenvironments, we used FEATURE to identify and differentiate
between sites in simple and complex RNA folds. The P5b
stem±loop (1ajf) (48) and P456 domain (1gid and 1hr2)
(49,50) from the Tetrahymena thermophila group I intron,
and a 58 nt sequence from Escherichia coli and Bacillus
stearothermophilus 23s rRNA (1qa6 and 1hc8) (51,52)
provided examples of known, important metal binding sites
in RNA. We performed electrostatic and valence calculations
on these structures and compared results with those of
FEATURE. Previous studies differentiated between the
modes of Mg2+ binding in these structures (4,21). In our
analyses, we used these examples of site-bound and diffusely
bound Mg2+ ions as test cases and found that we were able to
identify both types of sites occurring in simple (P5b stem±
loop) and complex systems (23s rRNA and group I intron).
Table 2 summarizes the results of the various computational
methods for Mg2+ binding site prediction on the test structures.
In most cases, FEATURE identi®ed the crystallographically
bound ion more precisely than the other methods. Figures 3±5
display the results of these predictions superimposed upon the
RNA test structures.
Nucleic Acids Research, 2003, Vol. 31, No. 15
4455
Figure 3. The results of FEATURE detection for site-bound Mg2+ ions (a), valence calculations (b) and electrostatic calculations (c) on the P456 domain of
the Tetrahymena group I intron. Crystallographically bound Mg2+ ions are shown as yellow spheres. FEATURE successfully detects [red spheres in (a)] the
well-known ion binding core of the intron. Valence hits [blue spheres in (b)] do not correlate well with the ion binding core while electrostatics (c), in
agreement with FEATURE, detects this site as the most negative potential in the molecule.
Predicted sites in the Tetrahymena group I intron
We predicted two binding sites in the P456 domain of the
group I intron. The hits at the sites and their potential
coordinating ligands are highlighted in Figure 6 and labeled as
`Site 1' and `Site 2'. Table 3 contains the coordinates and
scores of the FEATURE hits. Site 1 was observed using both
the site-bound and diffusely bound models for Mg2+ binding.
Site 2 was seen only using the diffusely bound model.
Although the scores of the hits were just above the de®ned cutoff, they nevertheless ranked higher than other sites for
crystallographically observed Mg2+ ions. Interestingly, the hits
at Site 1 overlap with a crystallographically observed water
molecule in 1hr2.
DISCUSSION
Statistical models for Mg2+ binding
A major advantage FEATURE has over both electrostatic and
valence calculations is that FEATURE reveals detailed
statistical differences of the physicochemical properties
between the two site models for Mg2+ binding in RNA. At a
signi®cance of P = 0.05, the following description of these
statistical differences can also be visualized in Figure 1. A
major difference between the site models for both types of
Mg2+ binding was the presence of most of the atom-derived
Ê ) in the site-bound
properties in the ®rst two shells (0±4 A
model versus a lack thereof in the diffuse model. The
hydration layer surrounding diffusely bound Mg2+ may
explain this result. Diffusely bound Mg2+ ions tend to sit in
large pockets or grooves on the surface of the RNA. Whereas
site-bound Mg2+ ions locate in tighter, more electronegative
pockets where potential coordinating ligands can directly bind
the ion, and penetrate the surrounding layer of water
molecules (53). Our results con®rm the notion that site
binding occurs in highly electronegative pockets. In contrast
to the diffuse binding model, many of the signi®cant
properties in the site bound model occur within the innermost
Ê of a site. Speci®cally, phosphate groups, which are most
4A
often the coordinating ligand to Mg2+ ions, are signi®cant
Ê . Also, a signi®cance of oxygen atoms
from 0 to 8 A
Ê.
continuously surround the Mg2+ ion from 2 to 10 A
Ê
Ê
Hydroxyl groups are signi®cant at 4 A and extend to 10 A.
Our site-bound model shows other potential ligands, such as
amines and carbonyls, occurring only in the outermost shell
Ê , suggesting that phosphates and sometimes
from 8 to 10 A
hydroxyl groups are the predominant ligand donors for siteÊ
bound Mg2+ ions. Negative charge is signi®cant from 2 to 4 A
Ê within the site model. This is most
and then from 6 to 10 A
likely due to the fact that negatively charged atoms that
contribute to the site-bound pocket are covalently bound to
non-polar atoms, such as carbons. This creates a binding
Ê layer surrounded immediately by a nonpocket in the 2±4 A
Ê . This contrasts the diffuse model,
polar layer from 4 to 6 A
where negative charge only ®rst appears signi®cant outside
Ê from a Mg2+ ion.
4A
4456
Nucleic Acids Research, 2003, Vol. 31, No. 15
Figure 4. FEATURE detects both site-bound (a) and diffusely bound (b) ions in the E.coli 58 nucleotide 23s rRNA. Crystallographically bound Mg2+ ions
are shown as yellow spheres. FEATURE also detects the K+ ion (orange sphere), suggested to help stabilize the buried RNA backbone (52). Valence hits
[values 1.6±1.7 as blue spheres in (c)] are observed in the site-bound region, but are below cut-off. Electrostatic calculations (d) reveal the highly negative
potential (red surface color) contributing to the binding of a site-bound ion near the buried RNA backbone.
Of all the bases, adenosines were the most signi®cant from 2
Ê in the site-bound model. This suggests that adenosines
to 10 A
are highly prevalent near site-bound Mg2+ ions. This may be
explained by adenosine, of all the bases, having the most
number of negatively charged atoms that could behave as
ligands. Another explanation is the regular participation of
adenosines in irregular RNA conformations that cause the
tight twisting of the phosphate backbone, such as in the
A-minor motif, as seen in the Tetrahymena group I intron. In
this motif, ¯ipped adenosines are stabilized through the
interaction with the minor groove of an adjacent helix and
inner sphere contacts of phosphate oxygens with a Mg2+ ion.
This contrasts the diffusely bound model where adenosines
had no signi®cant role in the site and was de®cient in the ®rst
Ê ). Instead, uridines and guanosines were
two shells (0±4 A
signi®cant in the outer two to three shells of the diffusely
Nucleic Acids Research, 2003, Vol. 31, No. 15
4457
Figure 5. The results of FEATURE detection for diffusely bound Mg2+ ions (a), valence calculations (b) and electrostatic calculations (c) on the P5b stem±
loop of the group I intron. The crystallographically determined cobalt hexamine, located at the center of the stem±loop is displayed in yellow in (a) and (b).
Cobalt hexamine is representative of diffuse ion binding in RNA. Feature's hits [red spheres in (a)] above cut-off nearly superimpose the cobalt hexamine.
Highest valence values [blue spheres in (b)] range from 1.0±1.2 and miss the diffusely bound ion. Electrostatics calculations (c) show a negatively charged
pocket (red = approximately ±20 kT/e) where the diffusely bound ion locates.
bound model. As non-Watson±Crick base pairs were also
signi®cant in the outer two shells, it may be that this
signi®cance of G's and U's are actually base paired in a
non-canonical fashion. Interestingly, it has been described that
the 5¢-UG/GU sequence is the most common and stable among
non-Watson±Crick pairs, and that GU tandem pairs provide
potential metal binding sites with a unique structural environment (54). This GU tandem metal binding motif is seen in the
group I intron as well as other smaller RNA structures (48,49).
Our analysis provides further evidence of GU tandems
providing locations for ion binding.
Finally, many of the properties that might be associated
with irregular, non-helical, RNA structure were signi®cant
Ê in the site-bound model. These properties
from 2 to 10 A
included base conformation in syn, non-WC base pairing, base
triples, non-adjacent base stacking, single-stranded RNA, and
many tertiary RNA interactions. These results suggests that
irregular RNA conformations most likely generate regions
where site-bound Mg2+ ions would be needed to counteract
highly negative pockets formed by the twisting RNA backbone and bases. This is seen in both the Tetrahymena group I
intron and Escherichia coli 23s rRNA where the phosphate
backbone forms a tight kink and becomes buried beneath the
surface of the RNA, thus requiring a site-bound Mg2+ to
stabilize the fold (see Figs 3 and 4).
FEATURE site detection
The P456 domain from the group I intron, and the 58 nt 23s
rRNA provided well-characterized examples of known sitebound Mg2+s (see Table 2). In all cases, using the site-bound
Figure 6. Two predicted Mg2+ binding sites in the group I intron.
FEATURE hits (red spheres) at Site 1 overlap the crystallographically
bound water (cyan colored sphere, WAT 123 in PDB ID 1hr2). Site 1 is
potentially site-bound (micro view of Site 1 is rotated 180 degrees to provide a better view of the site). Residues that could potentially participate in
coordination of an ion are displayed as sticks. Site 2 shows a potential
diffusely bound Mg2+ site. Atoms are colored by type: C = green, N = blue,
O = red.
model, FEATURE was able to locate the site-bound Mg2+
ions. The top scoring hits, above the cut-off, were always
Ê of the crystallographically bound
located within several A
2+
Mg and in most cases overlapped the documented site-bound
ion. In all these examples, the highest scoring hits were well
4458
Nucleic Acids Research, 2003, Vol. 31, No. 15
Table 3. Coordinates and scores of predicted Mg2+ binding sites
above the cut-off, which gave us con®dence in predicting
those locations as site-bound. Lower scoring hits also located
near the true binding site. In theory, site-bound locations
should be easier to predict as their microenvironments are
somewhat unique compared with diffuse binding sites, as seen
in the statistical models.
Diffuse sites have been described to be more pervasive
throughout RNA structure and bind in grooves and pockets on
the surface of the RNA. Although this makes their binding
sites more dif®cult to pinpoint to a single location, diffuse
binding sites have been observed in crystal structures, such as
those in the aforementioned structures, as well as in less
complex systems such as in the P5b stem±loop, hairpin
ribozyme, lead-dependent ribozyme and 7s RNA of human
SRP. These sites might be described as more `speci®c' than
most diffusely bound ions involved in counter ion effects, yet
not quite site-bound (by de®nition) since they do not make
contact with non-water ligands. Using the diffusely bound
model, FEATURE was able to locate diffuse sites in the
aforementioned test structures. Because FEATURE was
trained on diffuse sites in crystal structures, it should be
successful in predicting diffuse sites to single locations in
query structures. In the case of structures containing sitebound Mg2+ ions, the diffuse model also identi®ed those
locations as high-scoring sites.
Comparison of methods for Mg2+ site prediction
P456 domain of Tetrahymena group I intron. Figure 3 shows
the results of a FEATURE analysis, as well as valence and
electrostatics calculations for the P456 domain of the
Tetrahymena group I intron. Near the center of the molecule
is the well-known ion binding core, implicated in folding,
stability and function of this ribozyme (2). The electrostatic
potential is highly negative over this region of the RNA, where
the tight kink in the phosphate backbone results in the
placement of many electronegative groups near one another.
The Mg2+ ions observed in the crystal structure have an
important role in countering the negative potential and
stabilizing the fold. FEATURE's top hits, above the score
cut-off, also lie in this region, and nearly superimpose the
experimentally observed Mg2+ ions. In contrast, we found that
valence calculations did not perform as well in picking out
site-bound Mg2+ ions. The highest valence hits (valence values
of 2.0±2.1) above the de®ned cut-off of 1.8 (19) were not
located at the ion binding core of the group I intron, in
disagreement with electrostatic and FEATURE calculations.
Some valence hits above the cut-off did locate near the ion
binding core, but other hits of the same value also scattered
over the surface of the molecule without much correlation
with the crystallographically bound ions, electrostatic and
FEATURE predictions. Although valence calculations should
theoretically work on RNA structures, we show that inclusion
of other properties in addition to the coordinating ligands can
improve ion binding site detection.
The two predicted binding sites we observed in the group I
intron had scores just above the de®ned cut-off for both the
site-bound and diffusely bound model. Our results suggest that
a partially site-bound ion may exist at Site 1, while more
diffuse ion binding interactions might take place at Site 2 (see
Fig. 6). Valence calculations detected hits at Site 1 (valence
values 1.4±1.6), but were below the required cut-off. Valence
hits were not detected at Site 2. Electrostatics calculations
show negative potential at both locations, but at cut-offs too
low to locate a speci®c binding site. Using the site-bound
model, we observed that Site 1 overlaps a crystallographically
observed water. Visual inspection of Site 1 shows that the
potential site is surrounded by three 2¢ hydroxyl groups from
the ribose groups of three nearby guanosines that could act as
potential ligands to a site-bound ion. Site 2, on the other hand,
was only observed when using the diffusely bound model. As
in Site 1, 2¢ hydroxyl groups from nearby ribose groups may
also be the major coordinating ligands in diffuse interactions
with a bound ion at Site 2.
23s rRNA
Figure 4 shows the results of the same calculations on the 23s
rRNA. All the known site-bound Mg2+ ions are successfully
detected by both electrostatics and FEATURE. Diffusely
bound Mg2+ ions were also detected by FEATURE. However,
FEATURE missed two of the diffuse binding sites because the
cut-off was set high enough to maintain a reasonable false
positive rate. It was more dif®cult to pinpoint diffuse binding
sites using electrostatics calculations since the RNA surface is
so negatively charged. As the kT/e cut-off was moved towards
zero, the entire surface of the RNA molecule quickly became
covered by negative potential and describes more of a
distribution in which binding could occur. In the case of
the 23s rRNA, electrostatics had a harder time detecting the
speci®c locations of more diffusely bound Mg2+ ions on the
whole RNA structure.
P5b stem±loop
Figure 5 shows our binding site analysis of the P5b stem±loop,
a simple RNA fold. The crystal structure reveals a single
cobalt hexamine, representative of a diffusely bound Mg2+ ion.
Both electrostatic calculations and valence calculations only
give general indications of where diffusely bound ions can be
observed. In the case of valence calculations, sites of low
valence (values of 1.1±1.2) were detected along the phosphate
backbone, while the crystallographically bound site was
missed completely. Theoretically, in solution, diffusely
bound Mg2+ ions can migrate through deep grooves and
along the phosphate backbone in RNA (4). At a cut-off around
10±20 kT/e, characteristic of diffuse ion binding, electrostatic
calculations reveal that most of the major groove and
phosphate backbone are regions where diffusely bound Mg2+
could migrate. However, FEATURE analysis of this RNA
molecule predicts a highly localized binding site, precisely at
Nucleic Acids Research, 2003, Vol. 31, No. 15
the location of the crystallographically observed ion.
FEATURE's ability to pinpoint binding sites is most likely
due to the fact that our training set consists of sites from
crystal structures. It is therefore more likely to pick out sites
that could potentially be observed in actual crystal structures
versus those in solution. Thus we ®nd that the computational
methods may be complementary to each other as they provide
different pieces of evidence towards predicting binding sites.
FEATURE is complementary to other computational
methods for predicting Mg2+ binding sites in RNA.
FEATURE provides statistical evidence for ion binding in
crystal structures, whereas electrostatics and valence calculations provide physical quanti®cation of the electrostatic
potential of regions within the molecule. Together these
methods can provide complementary evidence for site binding
of Mg2+. We have analyzed RNA systems whose site-bound
and diffusely bound Mg2+ binding sites are well characterized.
FEATURE was successful at detecting both site-bound and
diffusely bound Mg2+ ions. We ®nd that site-bound ions are
easier to detect because of their unique microenvironment.
Such is the case in both the E.coli 23s rRNA and the P456
domain from the Tetrahymena group I intron. In simple RNA
folds, such as the P5b stem±loop, FEATURE is successful at
detecting diffuse Mg2+ binding sites and locates them to
speci®c areas, whereas the other techniques give more general
descriptions of ion binding. We also report the observation of
two predicted Mg2+ binding sites in the P456 domain of the
Tetrahymena group I intron; one site-bound and one diffusely
bound. Presently, we do not report overall sensitivity and
speci®city for FEATURE's performance, as it is unclear as to
the number of missing or mislabeled Mg2+ ions in known RNA
crystal structures. In other words, there lacks a standard for
statistical comparison. As the number of known Mg2+ ions in
RNA crystal structures increases, and as structure determination methods improve, we will be able to report better
performance statistics for FEATURE. Since FEATURE runs
in linear time, we propose that FEATURE be used as a
screening tool for Mg2+ binding site detection. High scoring
hits can be further explored with rigorous NLPB calculations.
As structure determination methods improve, the types of
RNA structures and complexes being solved become increasingly sophisticated. FEATURE lends itself well to the analysis
of large complexes, such as the ribosome, where there are
thousands of interactions. Once site-models are built, databases can be quickly screened for functional sites (38).
Computational methods, like FEATURE, provide a high level
annotation of function to molecular structures, and are needed
as structural genomics projects provide a wealth of structural
data. Finally, as the importance of RNA as a tool and target for
drug discovery increases, our approach provides a foundation
for the analysis of other RNA functional sites, such as RNA±
RNA, RNA±protein and RNA±small molecule interactions.
ACKNOWLEDGEMENTS
The authors would like to thank Mike Liang, Dr Sean Mooney
and Dr Andrew Bogan for helpful discussions on this work and
manuscript. We also thank Dr Francois Major and Dr Patrick
Gendron for providing a modi®ed version of MC-Annotate.
This work is supported by NIH Grants LM-05652 and LM06422 (R.B. Altman P.I.). D.R.B. is supported by NIH 2 R25
4459
GM056847-04 and is a PhD candidate in the Graduate
Program in Biological and Medical Informatics at the
University of California, San Francisco.
REFERENCES
1. Hanna,R. and Doudna,J.A. (2000) Metal ions in ribozyme folding and
catalysis. Curr. Opin. Chem. Biol., 4, 166±170.
2. Cate,J.H., Hanna,R.L. and Doudna,J.A. (1997) A magnesium ion core at
the heart of a ribozyme domain. Nature Struct. Biol., 4, 553±558.
3. Brannvall,M. and Kirsebom,L.A. (2001) Metal ion cooperativity in
ribozyme cleavage of RNA. Proc. Natl Acad. Sci. USA, 98, 12943±
12947.
4. Misra,V.K. and Draper,D.E. (2002) The linkage between magnesium
binding and RNA folding. J. Mol. Biol., 317, 507±521.
5. Shan,S., Yoshida,A., Sun,S., Piccirilli,J.A. and Herschlag,D. (1999)
Three metal ions at the active site of the Tetrahymena group I ribozyme.
Proc. Natl Acad. Sci. USA, 96, 12299±12304.
6. Tinoco,I.,Jr, and Bustamante,C. (1999) How RNA folds. J. Mol. Biol.,
293, 271±281.
7. Hermann,T., Auf®nger,P., Scott,W.G. and Westhof,E. (1997) Evidence
for a hydroxide ion bridging two magnesium ions at the active site of the
hammerhead ribozyme. Nucleic Acids Res., 25, 3421±3427.
8. Mikkelsen,N.E., Johansson,K., Virtanen,A. and Kirsebom,L.A. (2001)
Aminoglycoside binding displaces a divalent metal ion in a tRNAneomycin B complex. Nature Struct. Biol., 8, 510±514.
9. Hermann,T. and Westhof,E. (1998) Aminoglycoside binding to the
hammerhead ribozyme: a general model for the interaction of cationic
antibiotics with RNA. J. Mol. Biol., 276, 903±912.
10. Lind,K.E., Du,Z., Fujinaga,K., Peterlin,B.M. and James,T.L. (2002)
Structure-based computational database screening, in vitro assay and
NMR assessment of compounds that target TAR RNA. Chem. Biol., 9,
185±193.
11. Misra,V.K. and Draper,D.E. (1998) On the role of magnesium ions in
RNA stability. Biopolymers, 48, 113±135.
12. Peschke,M., Blades,A.T. and Kebarle,P. (1998) Hydration energies and
entropies for Mg2+, Ca2+, Sr2+, Ba2+. J. Phys. Chem. A, 102, 9978±9985.
13. Tanaka,Y., Kojima,C., Morita,E.H., Kasai,Y., Yamasaki,K., Ono,A.,
Kainosho,M. and Taira,K. (2002) Identi®cation of the metal ion binding
site on an RNA motif from hammerhead ribozymes using (15)N NMR
spectroscopy. J. Am. Chem. Soc., 124, 4595±4601.
14. Sigel,R.K., Vaidya,A. and Pyle,A.M. (2000) Metal ion binding sites in a
group II intron core. Nature Struct. Biol., 7, 1111±1116.
15. Rasmussen,T.A. and Nolan,J.M. (2002) G350 of Escherichia coli RNase
P RNA contributes to Mg2+ binding near the active site of the enzyme.
Gene, 294, 177±185.
16. Tzokov,S.B., Murray,I.A. and Grasby,J.A. (2002) The role of magnesium
ions and 2¢-hydroxyl groups in the VS ribozyme-substrate interaction.
J. Mol. Biol., 324, 215±226.
17. Crary,S.M., Kurz,J.C. and Fierke,C.A. (2002) Speci®c phosphorothioate
substitutions probe the active site of Bacillus subtilis ribonuclease P.
RNA, 8, 933±947.
18. Christian,E.L. and Yarus,M. (1993) Metal coordination sites that
contribute to structure and catalysis in the group I intron from
Tetrahymena. Biochemistry, 32, 4475±4480.
19. Nayal,M. and Di Cera,E. (1994) Predicting Ca(2+)-binding sites in
proteins. Proc. Natl Acad. Sci. USA, 91, 817±821.
20. Nayal,M. and Di Cera,E. (1996) Valence screening of water in protein
crystals reveals potential Na+ binding sites. J. Mol. Biol., 256, 228±234.
21. Chin,K., Sharp,K.A., Honig,B. and Pyle,A.M. (1999) Calculating the
electrostatic properties of RNA provides new insights into molecular
interactions and function. Nature Struct. Biol., 6, 1055±1061.
22. Madura,J.D., Davis,M.E., Gilson,M.K., Wade,R.C., Luty,B.A. and
McCammon,J.A. (1994) Biological applications of electrostatic
calculations and Brownian Dynamics simulations. Rev. Comput. Chem.,
5, 229±267.
23. Hermann,T. and Westhof,E. (1998) Exploration of metal ion binding
sites in RNA folds by Brownian-dynamics simulations. Structure, 6,
1303±1314.
24. vanBuuren,B.N., Hermann,T., Wijmenga,S.S. and Westhof,E. (2002)
Brownian-dynamics simulations of metal-ion binding to four-way
junctions. Nucleic Acids Res., 30, 507±514.
4460
Nucleic Acids Research, 2003, Vol. 31, No. 15
25. Baker,N.A., Sept,D., Joseph,S., Holst,M.J. and McCammon,J.A. (2001)
Electrostatics of nanosystems: application to microtubules and the
ribosome. Proc. Natl Acad. Sci. USA, 98, 10037±10041.
26. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N.,
Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data
Bank. Nucleic Acids Res., 28, 235±242.
27. Al-Hashimi,H.M., Gorin,A., Majumdar,A., Gosser,Y. and Patel,D.J.
(2002) Towards structural genomics of RNA: rapid NMR resonance
assignment and simultaneous RNA tertiary structure determination using
residual dipolar couplings. J. Mol. Biol., 318, 637±649.
28. Burley,S.K. and Bonanno,J.B. (2002) Structuring the universe of
proteins. Annu. Rev. Genomics Hum. Genet., 3, 243±262.
29. Hansen,J.L., Ippolito,J.A., Ban,N., Nissen,P., Moore,P.B. and Steitz,T.A.
(2002) The structures of four macrolide antibiotics bound to the large
ribosomal subunit. Mol. Cell, 10, 117±128.
30. Yusupov,M.M., Yusupova,G.Z., Baucom,A., Lieberman,K.,
Earnest,T.N., Cate,J.H. and Noller,H.F. (2001) Crystal structure of the
Ê resolution. Science, 292, 883±896.
ribosome at 5.5 A
31. Wimberly,B.T., Brodersen,D.E., Clemons,W.M. Jr, Morgan-Warren,R.J.,
Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000)
Structure of the 30S ribosomal subunit. Nature, 407, 327±339.
32. Ban,N., Nissen,P., Hansen,J., Moore,P.B. and Steitz,T.A. (2000) The
Ê
complete atomic structure of the large ribosomal subunit at 2.4 A
resolution. Science, 289, 905±920.
33. Klein,D.J., Schmeing,T.M., Moore,P.B. and Steitz,T.A. (2001) The kinkturn: a new RNA secondary structure motif. EMBO J., 20, 4214±4221.
34. Nissen,P., Ippolito,J.A., Ban,N., Moore,P.B. and Steitz,T.A. (2001) RNA
tertiary interactions in the large ribosomal subunit: the A-minor motif.
Proc. Natl Acad. Sci. USA, 98, 4899±4903.
35. Doudna,J.A. (2000) Structural genomics of RNA. Nature Struct. Biol., 7
(Suppl), 954±956.
36. Klosterman,P.S., Tamura,M., Holbrook,S.R. and Brenner,S.E. (2002)
SCOR: a structural classi®cation of RNA database. Nucleic Acids Res.,
30, 392±394.
37. Bagley,S.C. and Altman,R.B. (1995) Characterizing the
microenvironment surrounding protein sites. Protein Sci., 4, 622±635.
38. Waugh,A., Williams,G.A., Wei,L. and Altman,R.B. (2001) Using meta
computing tools to facilitate large-scale analyses of biological databases.
Pac. Symp. Biocomput., 6, 360±371.
39. Banatao,D.R., Huang,C.C., Babbitt,P.C., Altman,R.B. and Klein,T.E.
(2001) ViewFeature: integrated feature analysis and visualization. Pac.
Symp. Biocomput., 6, 240±250.
40. Bagley,S.C. and Altman,R.B. (1996) Conserved features in the active site
of nonhomologous serine proteases. Fold Des., 1, 371±379.
41. Wei,L. and Altman,R.B. (1998) Recognizing protein binding sites using
statistical descriptions of their 3D environments. Pac. Symp. Biocomput.,
3, 497±508.
42. Wei,L., Altman,R.B. and Chang,J.T. (1997) Using the radial distributions
of physical features to compare amino acid environments and align
amino acid sequences. Pac. Symp. Biocomput., 2, 465±476.
43. Wei,L., Huang,E.S. and Altman,R.B. (1999) Are predicted structures
good enough to preserve functional sites? Structure Fold Des., 7, 643±
650.
44. van Buuren,B.N., Schleucher,J. and Wijmenga,S.S. (2000) NMR
structural studies on a DNA four-way junction: stacking preferences and
localization of the metal-ion binding site. J. Biomol. Struct. Dyn., 237±
244.
45. Liang,M.P., Banatao,D.R., Klein,T.E., Brutlag,D.L. and Altman,R.B.
(2003) WebFEATURE: an interactive web tool for identifying and
visualizing functional sites on macromolecular structures. Nucleic Acids
Res., 31, 3324±3327.
46. Gendron,P., Lemieux,S. and Major,F. (2001) Quantitative analysis of
nucleic acid three-dimensional structures. J. Mol. Biol., 308, 919±936.
47. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and geometrical
features. Biopolymers, 22, 2577±2637.
48. Huang,C.C., Couch,G.S., Pettersen,E.F. and Ferrin,T.E. (1996) Chimera:
an extensible molecular modeling application constructed using standard
components. Pac. Symp. Biocomput., 1, 724.
49. Kieft,J.S. and Tinoco,I.,Jr (1997) Solution structure of a metal-binding
site in the major groove of RNA complexed with cobalt (III) hexammine.
Structure, 5, 713±721.
50. Cate,J.H., Gooding,A.R., Podell,E., Zhou,K., Golden,B.L., Kundrot,C.E.,
Cech,T.R. and Doudna,J.A. (1996) Crystal structure of a group I
ribozyme domain: principles of RNA packing. Science, 273, 1678±1685.
51. Juneau,K., Podell,E., Harrington,D.J. and Cech,T.R. (2001) Structural
basis of the enhanced stability of a mutant ribozyme domain and a
detailed view of RNA±solvent interactions. Structure, 9, 221±231.
52. Conn,G.L., Draper,D.E., Lattman,E.E. and Gittis,A.G. (1999) Crystal
structure of a conserved ribosomal protein±RNA complex. Science, 284,
1171±1174.
53. Conn,G.L., Gittis,A.G., Lattman,E.E., Misra,V.K. and Draper,D.E.
(2002) A compact RNA tertiary structure contains a buried backbone-K+
complex. J. Mol. Biol., 318, 963±973.
54. Misra,V.K. and Draper,D.E. (2001) A thermodynamic framework for
Mg2+ binding to RNA. Proc. Natl Acad. Sci. USA, 98, 12456±12461.
55. Serra,M.J., Baird,J.D., Dale,T., Fey,B.L., Retatagos,K. and Westhof,E.
(2002) Effects of magnesium ions on the stabilization of RNA oligomers
of de®ned structures. RNA, 8, 307±323.