4450–4460 Nucleic Acids Research, 2003, Vol. 31, No. 15
DOI: 10.1093/nar/gkg471

Microenvironment analysis and identification of magnesium binding sites in RNA

D. Rey Banatao, Russ B. Altman* and Teri E. Klein

Department of Genetics and Stanford Medical Informatics, 251 Campus Drive, Stanford University, CA 94305, USA

Received April 16, 2003; Revised and Accepted May 21, 2003

ABSTRACT

Interactions with magnesium (Mg2+) ions are essential for RNA folding and function. The locations and function of bound Mg2+ ions are difficult to characterize both experimentally and computationally. In particular, the P456 domain of the Tetrahymena thermophila group I intron, and a 58 nt 23s rRNA from Escherichia coli have been important systems for studying the role of Mg2+ binding in RNA, but characteristics of all the binding sites remain unclear. We therefore investigated the Mg2+ binding capabilities of these RNA systems using a computational approach to identify and further characterize their Mg2+ binding sites. The approach is based on the FEATURE algorithm, reported previously for microenvironment analysis of protein functional sites. We have determined novel physicochemical descriptions of site-bound and diffusely bound Mg2+ ions in RNA that are useful for prediction. Electrostatic calculations using the non-linear Poisson–Boltzmann (NLPB) equation provided further evidence for the locations of site-bound ions. We confirmed the locations of experimentally determined sites and further differentiated between classes of ion binding. We also identified potentially important, high scoring sites in the group I intron that are not currently annotated as Mg2+ binding sites. We note their potential function and believe they deserve experimental follow-up.

INTRODUCTION

The importance of RNA and Mg2+

RNA molecules form complex three-dimensional folds that ultimately determine their biological role.
Metal ions, particularly Mg2+, are crucial to stabilizing RNA structure and hence its function. They counter the negatively charged phosphate backbone and participate in the catalytic reactions of some ribozymes (1–3). Reliably locating these binding sites is important for understanding RNA folding and catalysis, and ultimately for drug development (4–10). Two main modes of Mg2+ binding have been proposed (11). In diffuse binding, fully hydrated Mg2+ ions interact with RNA through long-range electrostatic interactions, while site-bound Mg2+ ions are bound directly via anionic ligands from the RNA. Site binding is energetically costly, as it requires 10–30 kcal/mol to remove a single water molecule from the outer and inner hydration layers of an ion, respectively, and therefore requires a large electrostatic contribution to binding (12).

Locating Mg2+ binding sites

Mg2+ binding sites in RNA are typically identified through X-ray crystallography or NMR methods. However, electron densities of metal ions can be difficult to distinguish and NMR techniques have only recently given direct evidence of ion binding in RNA (13). Many sites can be easily mislabeled as waters or may be missing entirely from fully refined crystal structures or solution structures of RNA, and must be located through further experimentation. Ion binding sites of high affinity can be probed using cleavage experiments, where hydrolytic or oxidative metals cleave surrounding RNA (14). Additionally, mutation experiments can cause the loss of a functional Mg2+ binding site (15,16). Another approach involves rescuing phosphorothioate effects in the presence of thiophilic metal ions such as Cd2+ or Mn2+ (17,18). This method can detect site-bound ions, as they require direct coordination between metal ions and phosphoryl oxygens in the RNA.
If a structure is known, computational methods may be useful for predicting and characterizing Mg2+ binding sites in RNA structures and can aid crystallographers in distinguishing ion densities. A straightforward and computationally fast algorithm is valence screening for metal ions (19,20). This method essentially calculates the valence potential of oxygens within a defined radius along a fine three-dimensional grid laid over a molecular structure. A more rigorous approach would be to use electrostatic calculations, where solutions to the Poisson–Boltzmann equation, or in the case of nucleic acids, the non-linear Poisson–Boltzmann (NLPB) equation, can be used to identify negative potential on the surface of an RNA molecule where ions most likely aggregate (4,21). In addition to calculating electrostatic potentials, Brownian dynamics (BD) simulations can be used to further elucidate ion binding sites in RNA folds (22–24). However, none of these computational approaches is suitable for analyzing large complexes without significant computation (25).

*To whom correspondence should be addressed. Tel: +1 650 725 3394; Fax: +1 650 725 7944; Email: [email protected]
Nucleic Acids Research, Vol. 31 No. 15 © Oxford University Press 2003; all rights reserved

A novel approach in the context of structural genomics

The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) now includes over 1900 structures of nucleic acids, ranging from simple helices to complex, entangled assemblies of RNA and protein (26). With advances in structure determination techniques, and the many structural genomic efforts currently underway, the number of solved RNA structures and complexes is rapidly increasing (see http://www.rcsb.org/pdb/strucgen.htmlWorldwide for projects worldwide) (27,28). A major goal of structural genomics is to classify all the possible molecular folds that exist in nature.
Although the number of known RNA structural motifs is smaller than that for proteins, structures of RNA–protein complexes, such as the ribosome, are providing numerous examples of these motifs (29–34) and increasing interest in computational methods for analysis of RNA structural motifs (35,36). Computational approaches that can scale to analyze databases of RNA structures, including large assemblages, will be able to mine the structural data and provide examples of these motifs.

FEATURE was originally developed for the analysis of protein microenvironments (37). The FEATURE method uses a supervised learning algorithm to recognize sites from a training set of sites and non-sites. Details of the algorithm and its performance characteristics have been published elsewhere (37–43). Site models have been created for different types of functional sites in proteins, such as various metal binding sites, serine protease active sites and ATP binding sites (40,41). These sites were detected with high sensitivity and specificity. Until now, FEATURE has not been applied to analyzing nucleic acid structures.

Although Mg2+ binding is driven mostly by electrostatics, it has been shown that non-coulombic properties can influence the electrostatic field (44). We believe that Mg2+ binding sites can be differentiated not by charge alone, but also by the biochemical and structural properties surrounding the binding site. We used FEATURE to study the differences between site-bound and diffusely bound Mg2+ ions in complex RNA folds. To confirm our hypothesis that sites could be differentiated by these microenvironments, we used our statistical models to identify sites in well-characterized RNA systems where both types of binding occur. These RNA structures were not included in the training sets and served as an independent test set. FEATURE analysis was compared with the results of valence and electrostatic calculations on these various systems.
MATERIALS AND METHODS

RNA-modified FEATURE

FEATURE was originally developed and implemented for analyzing protein structures (37,41). We extended the property set to include RNA biochemical and structural properties. The resulting properties are listed in Figure 1.

To create the diffuse binding model, we defined sites as Mg2+ ions not within 3 Å of any RNA atom. The atomic density surrounding each site was calculated and averaged over the total number of sites. We generated non-sites by randomly choosing grid points laid over an RNA structure. Only grid points not within 10 Å of a Mg2+ ion, and whose radial density was within one standard deviation of the average atomic density of the sites, were included as non-sites. These selection criteria prevent bias for sites or non-sites based on atomic density alone. Mg2+ ions within 3 Å of any three RNA ligands (site bound) were also included as non-sites. A total of 126 sites and 334 non-sites were used to generate the diffuse binding model.

For the site-bound model, Mg2+ ions within 3 Å of any three RNA ligands were considered as sites, while Mg2+ ions not within 3 Å of an RNA ligand were used as non-sites. The randomly generated non-sites from the diffuse model were also used as non-sites for the site-bound model. A total of 30 sites and 330 non-sites were used to generate the site-bound model. The structures used for training are listed in Table 1.

FEATURE source code and documentation are available at http://feature.stanford.edu/. A web interface for FEATURE (http://feature.stanford.edu/webfeature/) (45) is available for real-time scanning and visualization of functional sites, including Mg2+ binding sites, in both RNA and protein structures.

FEATURE results consist of two separate outputs. The result of training is the site model, visualized as a two-dimensional plot of properties against volumes in Figure 1. Statistical significance is determined using the Wilcoxon rank-sum test.
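The distance criteria used to separate training ions into the two classes can be illustrated with a short sketch. This is not the FEATURE implementation; the function name `classify_mg_ion`, the coordinate format, and the handling of ions that meet neither criterion (labelled "unassigned" here) are assumptions made for illustration. Only the 3 Å threshold and the three-ligand requirement come from the text.

```python
import math

SITE_RADIUS = 3.0  # Å, distance threshold quoted in the text
MIN_LIGANDS = 3    # "any three RNA ligands" for a site-bound ion


def classify_mg_ion(ion_xyz, ligand_coords, rna_atom_coords):
    """Label one Mg2+ ion using the distance criteria described above.

    ion_xyz         -- (x, y, z) of the ion
    ligand_coords   -- coordinates of potential RNA ligand atoms
                       (e.g. phosphate oxygens); hypothetical input format
    rna_atom_coords -- coordinates of all RNA atoms
    """
    # Count potential ligands within 3 Å of the ion.
    n_close = sum(1 for a in ligand_coords
                  if math.dist(ion_xyz, a) <= SITE_RADIUS)
    if n_close >= MIN_LIGANDS:
        return "site-bound"
    # Diffuse: no RNA atom at all within 3 Å.
    if all(math.dist(ion_xyz, a) > SITE_RADIUS for a in rna_atom_coords):
        return "diffuse"
    # Neither criterion met; how such ions were treated is not stated here.
    return "unassigned"
```

For example, an ion at the origin with three ligand atoms 2 Å away along each axis would be labelled "site-bound", while an ion whose nearest RNA atom is 5 Å away would be labelled "diffuse".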
Statistically significant property-volume pairs (at P = 0.05) are colored blue, while statistically deficient pairs are colored green. Insignificant pairs are left blank. The second output is the result of scanning a query structure for sites. Detected hits are scored using a log-odds scoring function (41) as shown below (Equation 1), where higher scoring hits are more likely to be a site of interest.

$$\mathrm{score} = \log \frac{P(\mathrm{site} \mid \mathrm{environment})}{P(\mathrm{site})} \qquad (1)$$

Score cut-offs for the site-bound and diffusely bound models were determined using a ROC (receiver operating characteristic) plot of 1-specificity against sensitivity, where sensitivity and specificity were calculated from FEATURE's ability to detect crystallographically bound Mg2+ ions in a set of RNA structures. Sensitivity provided a measure of the method's ability to identify all crystallographic Mg2+ ions, while specificity provided a measure of the false positive rate. As crystallographic Mg2+ ions are not comprehensively determined, we opted for score cut-offs that gave a high specificity at the loss of sensitivity so as to assure the reliability of predicted sites. The ROC plot is shown in Figure 2. Details of the statistical model, learning method, and inference method are available in previous publications (37,38,40–43).

MC-Annotate

We used a modified version of MC-Annotate (45) to calculate secondary and tertiary RNA structural properties. MC-Annotate takes RNA pdb files as input and generates a list of RNA secondary and tertiary structure properties. These properties were translated into a FEATURE readable format and then assigned to each base within an RNA molecule. For analysis of proteins, FEATURE previously used DSSP (46) files to calculate properties such as solvent accessibility, mobility and secondary structural properties. FEATURE uses a proprietary format of the MC-Annotate output to count the secondary and tertiary structural properties. The current implementation of FEATURE can read both DSSP and our formatted version of MC-Annotate files in order to support complexes of both RNA and proteins. MC-Annotate is accessible at http://www-lbit.iro.umontreal.ca/mcannotate/.

ViewFEATURE

The results of FEATURE scans were analyzed in ViewFEATURE, a visualization tool developed previously (39). ViewFEATURE allows the user to view the FEATURE hits as spheres within the RNA structure of interest. Radial shells, sites and non-sites, as well as the properties of interest, can be visualized with an interactive graphical user interface. ViewFEATURE is available as an extension to the molecular modeling tool, Chimera (47). The ViewFEATURE source code can be downloaded from http://feature.stanford.edu/. Chimera is available at http://www.cgl.ucsf.edu/chimera/.

Electrostatics

The non-linear solution for the Poisson–Boltzmann equation was used to calculate the electrostatic potential of each RNA system. The specific program we used, Qnifft, was downloaded from http://pylelab.org/research/computation/electrostatics/. Calculations were performed as published previously (21).

Valence calculations

For valence calculations (19), parameters were varied to try to predict both site-bound and diffusely bound ions. For site-bound ions, grid spacing was varied from 0.1 to 0.15 Å, ligation sphere from 3 to 4 Å, and number of ligands from one to three. For diffusely bound ions, the ligation sphere was increased to 6–7 Å to simulate a hydration layer. Valence source code was downloaded from http://biochem.wustl.edu/~enrico/vale.htm/.

Figure 1. The microenvironments of site-bound and diffusely bound Mg2+ ions in RNA. Physicochemical properties of significance (at P = 0.05) are listed down the left-most column. Increasing volumes move away from the center of the site and are 2 Å in width. Colored squares represent property-volume pairs of statistical significance (see legend). Properties that were statistically insignificant between sites and non-sites for both models are not shown. These include: atom-based properties: Sulfur, Other, Positive Charge; residue-based properties: T, Sugar Pucker C1′ endo, Sugar Pucker C4′ endo; secondary structure based properties: Adjacent Base Pair, A-Form Helix, X-Form Helix, Strand is Other; and tertiary structure based properties: Loop–Strand, Helix–Strand, Bulge–Loop, Bulge–Helix, Internal Loop–Bulge, Internal Loop–Loop, Other and Pseudoknot.

Table 1. A list of the structures used for training site-bound and diffusely bound models. For scans of the group I intron, sites from 1gid and 1hr2 were removed from the training set.

Figure 2. ROC curves for FEATURE detection of site-bound and diffusely bound Mg2+ ions in RNA were used to determine score cut-offs. Higher sensitivity and specificity are achieved for site-bound ions as their locations are fewer, more localized, and thus more easily detected in RNA. The score cut-offs used in our analysis are labeled on the curve in relation to the resulting sensitivity and specificity of detection. Cut-offs were chosen to maximize specificity over sensitivity, so as to assure reliable detection of sites.

RESULTS

In order to support analysis of RNA structures, in addition to proteins, we selected a set of physicochemical and structural properties that pertain to either or both types of molecules, which resulted in a total of 112 properties. These properties provided descriptions for the statistical models of the microenvironments surrounding site-bound and diffusely bound Mg2+ ions, and are summarized in Figure 1. Our statistical models reveal, in addition to charge, the major differences between the microenvironments of site-bound and diffusely bound Mg2+ ions.
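The microenvironment descriptions rest on tallying properties in concentric radial shells around a candidate site. The sketch below shows one way such bookkeeping could look, assuming 2 Å shells out to 10 Å as indicated by Figure 1. The function name `shell_counts`, the property labels, and the input format are hypothetical and are not taken from the FEATURE source code.

```python
import math
from collections import Counter

SHELL_WIDTH = 2.0  # Å, shell width stated in the Figure 1 caption
N_SHELLS = 5       # assumed: shells cover 0-10 Å from the site center


def shell_counts(center, atoms):
    """Count property occurrences per radial shell around `center`.

    atoms -- iterable of (property_label, (x, y, z)); labels such as
             "oxygen" or "phosphate" are illustrative only.
    Returns a Counter keyed by (property_label, shell_index), where
    shell 0 is 0-2 Å, shell 1 is 2-4 Å, and so on.
    """
    counts = Counter()
    for label, xyz in atoms:
        shell = int(math.dist(center, xyz) // SHELL_WIDTH)
        if shell < N_SHELLS:  # ignore atoms beyond the outermost shell
            counts[(label, shell)] += 1
    return counts
```

Comparing such per-shell counts between sites and non-sites, one property-volume pair at a time, is the kind of input a rank-sum test would then operate on.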
We found that diffusely bound Mg2+ ions are coordinated by a larger repertoire of potential ligands than are site-bound ions. However, site-bound Mg2+ ions are more often located near 'irregular' or non-helical RNA tertiary interactions. To verify these structural motifs, we used FEATURE to identify binding sites in a number of RNA systems not included in the training set. We compared FEATURE with two other computational methods for ion binding site detection and found that often FEATURE could better pinpoint crystallographically observed Mg2+ binding sites. Because of their unique environment, site-bound Mg2+ ions were easier to positively predict than diffusely bound Mg2+ ions in complex RNA systems.

Microenvironment analysis of Mg2+ binding

To differentiate between site-bound and diffusely bound Mg2+ ions, we computed distances between crystallographically observed Mg2+ ions and the surrounding RNA. For diffuse binding, we trained FEATURE on Mg2+ ions that were either fully hydrated or not within 3 Å of any RNA-derived atom. Ions that were within 3 Å of at least three possible coordinating ligands, such as phosphate oxygens or amine nitrogens, were considered site-bound. In high-resolution structures (~2 Å), inner-sphere contacts with the ion can be observed, as in the 50s ribosomal subunit (33). Site-bound Mg2+ ions as well as randomly chosen non-sites served as controls (see Materials and Methods for description of how these non-sites were generated). For the site-bound model, diffusely bound Mg2+ ions and randomly generated non-sites served as controls. The resulting statistical models revealed that biochemical and structural properties, in addition to charge, contribute to differentiating between site-bound and diffusely bound Mg2+ ions. The site-bound and diffusely bound models are plotted in Figure 1 as property-volume pairs of statistical significance.

Table 2. A list of the crystallographic Mg2+ ions in the group I intron and 23s rRNA for which FEATURE was able to distinguish between site-bound and diffusely bound sites. The first three columns, from left, list the structures and their PDB identifiers. The fourth column lists crystallographic Mg2+ ions as labeled in their respective structures. Site-bound ions are labeled with a single asterisk. Columns 5 through to 8 describe the FEATURE hits near the crystallographically bound ion. Column 5 provides the FEATURE hit's rank, by score, with respect to the total number of hits above the defined cut-offs of 50 and 35 for the site-bound and diffusely bound models, respectively. Column 6 has the percentile rank of the hit's score compared with the hits above cut-off. Column 7 lists the score of the hit while column 8 gives the hit's distance to the crystallographic Mg2+ ion. Columns 9 and 10 provide the rank and percentile, with respect to all detected locations above cut-off, for valence calculations. The values in columns 9 and 10 provide a measure of comparison for FEATURE's values in columns 5 and 6. Valence calculations were not able to detect many of the crystallographic sites, while those that were detected were below the cut-off of 1.8. Column 10 shows the valence score, above cut-off, of a detected location, while column 11 provides its distance to the crystallographic ion. The highest scoring valence positions, as labeled in the legend, did not agree with electrostatics and FEATURE. As electrostatics calculations do not provide single locations for potential sites, column 12 refers to the figures for a visual comparison of the results of all three methods. B/C, below cut-off; ***, distance to nearest site-bound ion; **, as labelled in PDB; *, site-bound ion; ^, highest valence points, not near any crystallographically bound magnesium.
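The per-hit quantities reported in Table 2 (rank among hits above cut-off, percentile, score, and distance to the crystallographic ion) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `summarize_hits`, the input format, and in particular the percentile definition used here are assumptions.

```python
import math


def summarize_hits(hits, cutoff, ion_xyz):
    """Rank hits scoring at or above `cutoff` and measure their distance
    to a reference ion.

    hits    -- list of (score, (x, y, z)) tuples; hypothetical format
    cutoff  -- score cut-off (the text quotes 50 for the site-bound model
               and 35 for the diffusely bound model)
    ion_xyz -- coordinates of the crystallographic ion
    """
    kept = sorted((h for h in hits if h[0] >= cutoff),
                  key=lambda h: h[0], reverse=True)
    n = len(kept)
    rows = []
    for rank, (score, xyz) in enumerate(kept, start=1):
        rows.append({
            "rank": rank,
            # Assumed definition: share of retained hits scoring at or
            # below this one (the top hit is at the 100th percentile).
            "percentile": 100.0 * (n - rank + 1) / n,
            "score": score,
            "distance": math.dist(xyz, ion_xyz),
        })
    return rows
```

With a cut-off of 50, a hit list containing scores of 60, 55 and 40 would keep only the first two, ranked 1 and 2.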
Mg2+ binding site detection

To detect Mg2+ binding sites based on their microenvironments, we used FEATURE to identify and differentiate between sites in simple and complex RNA folds. The P5b stem–loop (1ajf) (48) and P456 domain (1gid and 1hr2) (49,50) from the Tetrahymena thermophila group I intron, and a 58 nt sequence from Escherichia coli and Bacillus stearothermophilus 23s rRNA (1qa6 and 1hc8) (51,52) provided examples of known, important metal binding sites in RNA. We performed electrostatic and valence calculations on these structures and compared results with those of FEATURE. Previous studies differentiated between the modes of Mg2+ binding in these structures (4,21). In our analyses, we used these examples of site-bound and diffusely bound Mg2+ ions as test cases and found that we were able to identify both types of sites occurring in simple (P5b stem–loop) and complex systems (23s rRNA and group I intron). Table 2 summarizes the results of the various computational methods for Mg2+ binding site prediction on the test structures. In most cases, FEATURE identified the crystallographically bound ion more precisely than the other methods. Figures 3–5 display the results of these predictions superimposed upon the RNA test structures.

Figure 3. The results of FEATURE detection for site-bound Mg2+ ions (a), valence calculations (b) and electrostatic calculations (c) on the P456 domain of the Tetrahymena group I intron. Crystallographically bound Mg2+ ions are shown as yellow spheres. FEATURE successfully detects [red spheres in (a)] the well-known ion binding core of the intron. Valence hits [blue spheres in (b)] do not correlate well with the ion binding core while electrostatics (c), in agreement with FEATURE, detects this site as the most negative potential in the molecule.

Predicted sites in the Tetrahymena group I intron

We predicted two binding sites in the P456 domain of the group I intron.
The hits at the sites and their potential coordinating ligands are highlighted in Figure 6 and labeled as 'Site 1' and 'Site 2'. Table 3 contains the coordinates and scores of the FEATURE hits. Site 1 was observed using both the site-bound and diffusely bound models for Mg2+ binding. Site 2 was seen only using the diffusely bound model. Although the scores of the hits were just above the defined cut-off, they nevertheless ranked higher than other sites for crystallographically observed Mg2+ ions. Interestingly, the hits at Site 1 overlap with a crystallographically observed water molecule in 1hr2.

DISCUSSION

Statistical models for Mg2+ binding

A major advantage FEATURE has over both electrostatic and valence calculations is that FEATURE reveals detailed statistical differences of the physicochemical properties between the two site models for Mg2+ binding in RNA. At a significance of P = 0.05, the following description of these statistical differences can also be visualized in Figure 1.

A major difference between the site models for both types of Mg2+ binding was the presence of most of the atom-derived properties in the first two shells (0–4 Å) in the site-bound model versus a lack thereof in the diffuse model. The hydration layer surrounding diffusely bound Mg2+ may explain this result. Diffusely bound Mg2+ ions tend to sit in large pockets or grooves on the surface of the RNA, whereas site-bound Mg2+ ions locate in tighter, more electronegative pockets where potential coordinating ligands can directly bind the ion and penetrate the surrounding layer of water molecules (53).

Our results confirm the notion that site binding occurs in highly electronegative pockets. In contrast to the diffuse binding model, many of the significant properties in the site-bound model occur within the innermost 4 Å of a site. Specifically, phosphate groups, which are most often the coordinating ligand to Mg2+ ions, are significant from 0 to 8 Å. Significant oxygen atoms also continuously surround the Mg2+ ion from 2 to 10 Å. Hydroxyl groups are significant at 4 Å and extend to 10 Å. Our site-bound model shows other potential ligands, such as amines and carbonyls, occurring only in the outermost shell from 8 to 10 Å, suggesting that phosphates and sometimes hydroxyl groups are the predominant ligand donors for site-bound Mg2+ ions. Negative charge is significant from 2 to 4 Å and then from 6 to 10 Å within the site model. This is most likely due to the fact that negatively charged atoms that contribute to the site-bound pocket are covalently bound to non-polar atoms, such as carbons. This creates a binding pocket in the 2–4 Å layer surrounded immediately by a non-polar layer from 4 to 6 Å. This contrasts with the diffuse model, where negative charge only first appears significant outside 4 Å from a Mg2+ ion.

Figure 4. FEATURE detects both site-bound (a) and diffusely bound (b) ions in the E.coli 58 nucleotide 23s rRNA. Crystallographically bound Mg2+ ions are shown as yellow spheres. FEATURE also detects the K+ ion (orange sphere), suggested to help stabilize the buried RNA backbone (52). Valence hits [values 1.6–1.7 as blue spheres in (c)] are observed in the site-bound region, but are below cut-off. Electrostatic calculations (d) reveal the highly negative potential (red surface color) contributing to the binding of a site-bound ion near the buried RNA backbone.

Of all the bases, adenosines were the most significant from 2 to 10 Å in the site-bound model. This suggests that adenosines are highly prevalent near site-bound Mg2+ ions. This may be explained by adenosine, of all the bases, having the largest number of negatively charged atoms that could behave as ligands.
Another explanation is the regular participation of adenosines in irregular RNA conformations that cause the tight twisting of the phosphate backbone, such as in the A-minor motif, as seen in the Tetrahymena group I intron. In this motif, flipped adenosines are stabilized through the interaction with the minor groove of an adjacent helix and inner sphere contacts of phosphate oxygens with a Mg2+ ion. This contrasts with the diffusely bound model, where adenosines had no significant role in the site and were deficient in the first two shells (0–4 Å). Instead, uridines and guanosines were significant in the outer two to three shells of the diffusely bound model. As non-Watson–Crick base pairs were also significant in the outer two shells, it may be that these significant G's and U's are actually base paired in a non-canonical fashion. Interestingly, it has been described that the 5′-UG/GU sequence is the most common and stable among non-Watson–Crick pairs, and that GU tandem pairs provide potential metal binding sites with a unique structural environment (54). This GU tandem metal binding motif is seen in the group I intron as well as other smaller RNA structures (48,49).

Figure 5. The results of FEATURE detection for diffusely bound Mg2+ ions (a), valence calculations (b) and electrostatic calculations (c) on the P5b stem–loop of the group I intron. The crystallographically determined cobalt hexamine, located at the center of the stem–loop, is displayed in yellow in (a) and (b). Cobalt hexamine is representative of diffuse ion binding in RNA. FEATURE's hits [red spheres in (a)] above cut-off nearly superimpose the cobalt hexamine. Highest valence values [blue spheres in (b)] range from 1.0–1.2 and miss the diffusely bound ion. Electrostatics calculations (c) show a negatively charged pocket (red = approximately −20 kT/e) where the diffusely bound ion locates.
Our analysis provides further evidence of GU tandems providing locations for ion binding.

Finally, many of the properties that might be associated with irregular, non-helical RNA structure were significant from 2 to 10 Å in the site-bound model. These properties included base conformation in syn, non-WC base pairing, base triples, non-adjacent base stacking, single-stranded RNA, and many tertiary RNA interactions. These results suggest that irregular RNA conformations most likely generate regions where site-bound Mg2+ ions would be needed to counteract highly negative pockets formed by the twisting RNA backbone and bases. This is seen in both the Tetrahymena group I intron and Escherichia coli 23s rRNA, where the phosphate backbone forms a tight kink and becomes buried beneath the surface of the RNA, thus requiring a site-bound Mg2+ to stabilize the fold (see Figs 3 and 4).

FEATURE site detection

The P456 domain from the group I intron, and the 58 nt 23s rRNA provided well-characterized examples of known site-bound Mg2+ ions (see Table 2). In all cases, using the site-bound model, FEATURE was able to locate the site-bound Mg2+ ions. The top scoring hits, above the cut-off, were always located within several Å of the crystallographically bound Mg2+ and in most cases overlapped the documented site-bound ion. In all these examples, the highest scoring hits were well above the cut-off, which gave us confidence in predicting those locations as site-bound. Lower scoring hits also located near the true binding site.

In theory, site-bound locations should be easier to predict as their microenvironments are somewhat unique compared with diffuse binding sites, as seen in the statistical models. Diffuse sites have been described to be more pervasive throughout RNA structure and bind in grooves and pockets on the surface of the RNA. Although this makes their binding sites more difficult to pinpoint to a single location, diffuse binding sites have been observed in crystal structures, such as those in the aforementioned structures, as well as in less complex systems such as in the P5b stem–loop, hairpin ribozyme, lead-dependent ribozyme and 7s RNA of human SRP. These sites might be described as more 'specific' than most diffusely bound ions involved in counter ion effects, yet not quite site-bound (by definition) since they do not make contact with non-water ligands. Using the diffusely bound model, FEATURE was able to locate diffuse sites in the aforementioned test structures. Because FEATURE was trained on diffuse sites in crystal structures, it should be successful in predicting diffuse sites to single locations in query structures. In the case of structures containing site-bound Mg2+ ions, the diffuse model also identified those locations as high-scoring sites.

Figure 6. Two predicted Mg2+ binding sites in the group I intron. FEATURE hits (red spheres) at Site 1 overlap the crystallographically bound water (cyan colored sphere, WAT 123 in PDB ID 1hr2). Site 1 is potentially site-bound (micro view of Site 1 is rotated 180 degrees to provide a better view of the site). Residues that could potentially participate in coordination of an ion are displayed as sticks. Site 2 shows a potential diffusely bound Mg2+ site. Atoms are colored by type: C = green, N = blue, O = red.

Table 3. Coordinates and scores of predicted Mg2+ binding sites

Comparison of methods for Mg2+ site prediction

P456 domain of Tetrahymena group I intron. Figure 3 shows the results of a FEATURE analysis, as well as valence and electrostatics calculations for the P456 domain of the Tetrahymena group I intron. Near the center of the molecule is the well-known ion binding core, implicated in folding, stability and function of this ribozyme (2).
The electrostatic potential is highly negative over this region of the RNA, where the tight kink in the phosphate backbone results in the placement of many electronegative groups near one another. The Mg2+ ions observed in the crystal structure have an important role in countering the negative potential and stabilizing the fold. FEATURE's top hits, above the score cut-off, also lie in this region, and nearly superimpose the experimentally observed Mg2+ ions. In contrast, we found that valence calculations did not perform as well in picking out site-bound Mg2+ ions. The highest valence hits (valence values of 2.0–2.1) above the defined cut-off of 1.8 (19) were not located at the ion binding core of the group I intron, in disagreement with electrostatic and FEATURE calculations. Some valence hits above the cut-off did locate near the ion binding core, but other hits of the same value were scattered over the surface of the molecule without much correlation with the crystallographically bound ions or with the electrostatic and FEATURE predictions. Although valence calculations should theoretically work on RNA structures, we show that inclusion of other properties in addition to the coordinating ligands can improve ion binding site detection.

The two predicted binding sites we observed in the group I intron had scores just above the defined cut-off for both the site-bound and diffusely bound model. Our results suggest that a partially site-bound ion may exist at Site 1, while more diffuse ion binding interactions might take place at Site 2 (see Fig. 6). Valence calculations detected hits at Site 1 (valence values 1.4–1.6), but these were below the required cut-off. Valence hits were not detected at Site 2. Electrostatics calculations show negative potential at both locations, but at cut-offs too low to locate a specific binding site. Using the site-bound model, we observed that Site 1 overlaps a crystallographically observed water.
Visual inspection of Site 1 shows that the potential site is surrounded by three 2′-hydroxyl groups from the ribose groups of three nearby guanosines that could act as potential ligands to a site-bound ion. Site 2, on the other hand, was only observed when using the diffusely bound model. As in Site 1, 2′-hydroxyl groups from nearby ribose groups may also be the major coordinating ligands in diffuse interactions with a bound ion at Site 2.

23s rRNA. Figure 4 shows the results of the same calculations on the 23s rRNA. All the known site-bound Mg2+ ions are successfully detected by both electrostatics and FEATURE. Diffusely bound Mg2+ ions were also detected by FEATURE. However, FEATURE missed two of the diffuse binding sites because the cut-off was set high enough to maintain a reasonable false positive rate. It was more difficult to pinpoint diffuse binding sites using electrostatics calculations since the RNA surface is so negatively charged. As the kT/e cut-off was moved towards zero, the entire surface of the RNA molecule quickly became covered by negative potential, describing more of a distribution in which binding could occur than a set of specific sites. In the case of the 23s rRNA, electrostatics had a harder time detecting the specific locations of more diffusely bound Mg2+ ions on the whole RNA structure.

P5b stem–loop. Figure 5 shows our binding site analysis of the P5b stem–loop, a simple RNA fold. The crystal structure reveals a single cobalt hexammine, representative of a diffusely bound Mg2+ ion. Both electrostatic calculations and valence calculations only give general indications of where diffusely bound ions can be observed. In the case of valence calculations, sites of low valence (values of 1.1–1.2) were detected along the phosphate backbone, while the crystallographically bound site was missed completely. Theoretically, in solution, diffusely bound Mg2+ ions can migrate through deep grooves and along the phosphate backbone in RNA (4).
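The loss of localization noted above, as the electrostatic cut-off is moved towards zero, can be illustrated with a small sketch that reports what fraction of sampled surface points falls at or below each cut-off. The potentials in `toy_potentials` are made-up numbers in kT/e, not NLPB output for any structure discussed here, and the function names are hypothetical.

```python
def covered_fraction(potentials, cutoff):
    """Fraction of sampled points whose potential (kT/e) is at or below cutoff."""
    if not potentials:
        return 0.0
    return sum(1 for p in potentials if p <= cutoff) / len(potentials)


def sweep(potentials, cutoffs):
    """Return (cutoff, covered fraction) pairs for each cut-off in order."""
    return [(c, covered_fraction(potentials, c)) for c in cutoffs]


# Made-up surface potentials in kT/e, for illustration only.
toy_potentials = [-55.0, -42.0, -30.0, -18.0, -15.0, -12.0, -8.0, -5.0, -3.0, -1.0]

for cutoff, fraction in sweep(toy_potentials, [-40.0, -20.0, -10.0, 0.0]):
    print(f"cut-off {cutoff:6.1f} kT/e -> {fraction:.0%} of points selected")
```

A strongly negative cut-off selects only a few points, which is useful for locating site-bound ions, whereas a cut-off near zero selects nearly the whole surface and no longer singles out a location.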
At a cut-off around 10–20 kT/e, characteristic of diffuse ion binding, electrostatic calculations reveal that most of the major groove and phosphate backbone are regions where diffusely bound Mg2+ could migrate. However, FEATURE analysis of this RNA molecule predicts a highly localized binding site, precisely at the location of the crystallographically observed ion. FEATURE's ability to pinpoint binding sites is most likely due to the fact that our training set consists of sites from crystal structures. It is therefore more likely to pick out sites that could potentially be observed in actual crystal structures than those in solution. Thus we find that the computational methods may be complementary to each other as they provide different pieces of evidence towards predicting binding sites.

FEATURE is complementary to other computational methods for predicting Mg2+ binding sites in RNA. FEATURE provides statistical evidence for ion binding in crystal structures, whereas electrostatics and valence calculations provide physical quantification of the electrostatic potential of regions within the molecule. Together these methods can provide complementary evidence for site binding of Mg2+.

We have analyzed RNA systems whose site-bound and diffusely bound Mg2+ binding sites are well characterized. FEATURE was successful at detecting both site-bound and diffusely bound Mg2+ ions. We find that site-bound ions are easier to detect because of their unique microenvironment. Such is the case in both the E. coli 23s rRNA and the P456 domain from the Tetrahymena group I intron. In simple RNA folds, such as the P5b stem–loop, FEATURE is successful at detecting diffuse Mg2+ binding sites and locates them to specific areas, whereas the other techniques give more general descriptions of ion binding.
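The comparison of scored hits with crystallographically observed ions described above can be sketched as a two-step check: keep hits at or above a score cut-off, then ask whether any retained hit lies within a distance tolerance of each observed ion. This is a minimal sketch, not FEATURE's own code; the hit tuples, the ion labels, the cut-off of 8.0 and the 1.5 Å tolerance are all hypothetical values chosen for illustration.

```python
import math


def hits_above(hits, cutoff):
    """Keep ((x, y, z), score) hits with score >= cutoff, highest score first."""
    kept = [h for h in hits if h[1] >= cutoff]
    return sorted(kept, key=lambda h: h[1], reverse=True)


def nearest_hit(ion_xyz, hits, tol):
    """Return the best-scoring hit within tol angstroms of ion_xyz, or None."""
    near = [h for h in hits if math.dist(h[0], ion_xyz) <= tol]
    return max(near, key=lambda h: h[1]) if near else None


# Hypothetical scored grid points and observed ion positions (angstroms).
toy_hits = [((0.0, 0.0, 0.0), 12.0), ((0.5, 0.0, 0.0), 7.0),
            ((10.0, 0.0, 0.0), 3.0), ((20.0, 0.0, 0.0), 9.5)]
toy_ions = {"MG1": (0.2, 0.0, 0.0), "MG2": (10.5, 0.0, 0.0)}

kept = hits_above(toy_hits, cutoff=8.0)
for name, xyz in toy_ions.items():
    print(name, nearest_hit(xyz, kept, tol=1.5))
```

In this toy case the second ion has a nearby hit only below the cut-off, which mirrors how a cut-off chosen to limit false positives can miss lower-scoring sites. A retained hit far from any observed ion, such as the one at 20 Å here, is the kind of unannotated prediction that would need follow-up.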
We also report the observation of two predicted Mg2+ binding sites in the P456 domain of the Tetrahymena group I intron: one site-bound and one diffusely bound.

Presently, we do not report overall sensitivity and specificity for FEATURE's performance, because the number of missing or mislabeled Mg2+ ions in known RNA crystal structures is unclear. In other words, there is no standard for statistical comparison. As the number of known Mg2+ ions in RNA crystal structures increases, and as structure determination methods improve, we will be able to report better performance statistics for FEATURE.

Since FEATURE runs in linear time, we propose that FEATURE be used as a screening tool for Mg2+ binding site detection. High scoring hits can be further explored with rigorous NLPB calculations. As structure determination methods improve, the types of RNA structures and complexes being solved become increasingly sophisticated. FEATURE lends itself well to the analysis of large complexes, such as the ribosome, where there are thousands of interactions. Once site-models are built, databases can be quickly screened for functional sites (38). Computational methods, like FEATURE, provide a high level annotation of function to molecular structures, and are needed as structural genomics projects provide a wealth of structural data. Finally, as the importance of RNA as a tool and target for drug discovery increases, our approach provides a foundation for the analysis of other RNA functional sites, such as RNA–RNA, RNA–protein and RNA–small molecule interactions.

ACKNOWLEDGEMENTS

The authors would like to thank Mike Liang, Dr Sean Mooney and Dr Andrew Bogan for helpful discussions on this work and manuscript. We also thank Dr Francois Major and Dr Patrick Gendron for providing a modified version of MC-Annotate. This work is supported by NIH Grants LM-05652 and LM-06422 (R.B. Altman P.I.). D.R.B.
is supported by NIH 2 R25 GM056847-04 and is a PhD candidate in the Graduate Program in Biological and Medical Informatics at the University of California, San Francisco.

REFERENCES

1. Hanna,R. and Doudna,J.A. (2000) Metal ions in ribozyme folding and catalysis. Curr. Opin. Chem. Biol., 4, 166–170.
2. Cate,J.H., Hanna,R.L. and Doudna,J.A. (1997) A magnesium ion core at the heart of a ribozyme domain. Nature Struct. Biol., 4, 553–558.
3. Brannvall,M. and Kirsebom,L.A. (2001) Metal ion cooperativity in ribozyme cleavage of RNA. Proc. Natl Acad. Sci. USA, 98, 12943–12947.
4. Misra,V.K. and Draper,D.E. (2002) The linkage between magnesium binding and RNA folding. J. Mol. Biol., 317, 507–521.
5. Shan,S., Yoshida,A., Sun,S., Piccirilli,J.A. and Herschlag,D. (1999) Three metal ions at the active site of the Tetrahymena group I ribozyme. Proc. Natl Acad. Sci. USA, 96, 12299–12304.
6. Tinoco,I.,Jr and Bustamante,C. (1999) How RNA folds. J. Mol. Biol., 293, 271–281.
7. Hermann,T., Auffinger,P., Scott,W.G. and Westhof,E. (1997) Evidence for a hydroxide ion bridging two magnesium ions at the active site of the hammerhead ribozyme. Nucleic Acids Res., 25, 3421–3427.
8. Mikkelsen,N.E., Johansson,K., Virtanen,A. and Kirsebom,L.A. (2001) Aminoglycoside binding displaces a divalent metal ion in a tRNA-neomycin B complex. Nature Struct. Biol., 8, 510–514.
9. Hermann,T. and Westhof,E. (1998) Aminoglycoside binding to the hammerhead ribozyme: a general model for the interaction of cationic antibiotics with RNA. J. Mol. Biol., 276, 903–912.
10. Lind,K.E., Du,Z., Fujinaga,K., Peterlin,B.M. and James,T.L. (2002) Structure-based computational database screening, in vitro assay and NMR assessment of compounds that target TAR RNA. Chem. Biol., 9, 185–193.
11. Misra,V.K. and Draper,D.E. (1998) On the role of magnesium ions in RNA stability. Biopolymers, 48, 113–135.
12. Peschke,M., Blades,A.T. and Kebarle,P. (1998) Hydration energies and entropies for Mg2+, Ca2+, Sr2+, Ba2+. J. Phys. Chem. A, 102, 9978–9985.
13. Tanaka,Y., Kojima,C., Morita,E.H., Kasai,Y., Yamasaki,K., Ono,A., Kainosho,M. and Taira,K. (2002) Identification of the metal ion binding site on an RNA motif from hammerhead ribozymes using (15)N NMR spectroscopy. J. Am. Chem. Soc., 124, 4595–4601.
14. Sigel,R.K., Vaidya,A. and Pyle,A.M. (2000) Metal ion binding sites in a group II intron core. Nature Struct. Biol., 7, 1111–1116.
15. Rasmussen,T.A. and Nolan,J.M. (2002) G350 of Escherichia coli RNase P RNA contributes to Mg2+ binding near the active site of the enzyme. Gene, 294, 177–185.
16. Tzokov,S.B., Murray,I.A. and Grasby,J.A. (2002) The role of magnesium ions and 2′-hydroxyl groups in the VS ribozyme-substrate interaction. J. Mol. Biol., 324, 215–226.
17. Crary,S.M., Kurz,J.C. and Fierke,C.A. (2002) Specific phosphorothioate substitutions probe the active site of Bacillus subtilis ribonuclease P. RNA, 8, 933–947.
18. Christian,E.L. and Yarus,M. (1993) Metal coordination sites that contribute to structure and catalysis in the group I intron from Tetrahymena. Biochemistry, 32, 4475–4480.
19. Nayal,M. and Di Cera,E. (1994) Predicting Ca(2+)-binding sites in proteins. Proc. Natl Acad. Sci. USA, 91, 817–821.
20. Nayal,M. and Di Cera,E. (1996) Valence screening of water in protein crystals reveals potential Na+ binding sites. J. Mol. Biol., 256, 228–234.
21. Chin,K., Sharp,K.A., Honig,B. and Pyle,A.M. (1999) Calculating the electrostatic properties of RNA provides new insights into molecular interactions and function. Nature Struct. Biol., 6, 1055–1061.
22. Madura,J.D., Davis,M.E., Gilson,M.K., Wade,R.C., Luty,B.A. and McCammon,J.A. (1994) Biological applications of electrostatic calculations and Brownian Dynamics simulations. Rev. Comput. Chem., 5, 229–267.
23. Hermann,T. and Westhof,E. (1998) Exploration of metal ion binding sites in RNA folds by Brownian-dynamics simulations. Structure, 6, 1303–1314.
24. van Buuren,B.N., Hermann,T., Wijmenga,S.S. and Westhof,E. (2002) Brownian-dynamics simulations of metal-ion binding to four-way junctions. Nucleic Acids Res., 30, 507–514.
25. Baker,N.A., Sept,D., Joseph,S., Holst,M.J. and McCammon,J.A. (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl Acad. Sci. USA, 98, 10037–10041.
26. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
27. Al-Hashimi,H.M., Gorin,A., Majumdar,A., Gosser,Y. and Patel,D.J. (2002) Towards structural genomics of RNA: rapid NMR resonance assignment and simultaneous RNA tertiary structure determination using residual dipolar couplings. J. Mol. Biol., 318, 637–649.
28. Burley,S.K. and Bonanno,J.B. (2002) Structuring the universe of proteins. Annu. Rev. Genomics Hum. Genet., 3, 243–262.
29. Hansen,J.L., Ippolito,J.A., Ban,N., Nissen,P., Moore,P.B. and Steitz,T.A. (2002) The structures of four macrolide antibiotics bound to the large ribosomal subunit. Mol. Cell, 10, 117–128.
30. Yusupov,M.M., Yusupova,G.Z., Baucom,A., Lieberman,K., Earnest,T.N., Cate,J.H. and Noller,H.F. (2001) Crystal structure of the ribosome at 5.5 Å resolution. Science, 292, 883–896.
31. Wimberly,B.T., Brodersen,D.E., Clemons,W.M.,Jr, Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Structure of the 30S ribosomal subunit. Nature, 407, 327–339.
32. Ban,N., Nissen,P., Hansen,J., Moore,P.B. and Steitz,T.A. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905–920.
33. Klein,D.J., Schmeing,T.M., Moore,P.B. and Steitz,T.A. (2001) The kink-turn: a new RNA secondary structure motif. EMBO J., 20, 4214–4221.
34. Nissen,P., Ippolito,J.A., Ban,N., Moore,P.B. and Steitz,T.A. (2001) RNA tertiary interactions in the large ribosomal subunit: the A-minor motif. Proc. Natl Acad. Sci. USA, 98, 4899–4903.
35. Doudna,J.A. (2000) Structural genomics of RNA. Nature Struct. Biol., 7 (Suppl), 954–956.
36. Klosterman,P.S., Tamura,M., Holbrook,S.R. and Brenner,S.E. (2002) SCOR: a structural classification of RNA database. Nucleic Acids Res., 30, 392–394.
37. Bagley,S.C. and Altman,R.B. (1995) Characterizing the microenvironment surrounding protein sites. Protein Sci., 4, 622–635.
38. Waugh,A., Williams,G.A., Wei,L. and Altman,R.B. (2001) Using meta computing tools to facilitate large-scale analyses of biological databases. Pac. Symp. Biocomput., 6, 360–371.
39. Banatao,D.R., Huang,C.C., Babbitt,P.C., Altman,R.B. and Klein,T.E. (2001) ViewFeature: integrated feature analysis and visualization. Pac. Symp. Biocomput., 6, 240–250.
40. Bagley,S.C. and Altman,R.B. (1996) Conserved features in the active site of nonhomologous serine proteases. Fold Des., 1, 371–379.
41. Wei,L. and Altman,R.B. (1998) Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac. Symp. Biocomput., 3, 497–508.
42. Wei,L., Altman,R.B. and Chang,J.T. (1997) Using the radial distributions of physical features to compare amino acid environments and align amino acid sequences. Pac. Symp. Biocomput., 2, 465–476.
43. Wei,L., Huang,E.S. and Altman,R.B. (1999) Are predicted structures good enough to preserve functional sites? Structure Fold Des., 7, 643–650.
44. van Buuren,B.N., Schleucher,J. and Wijmenga,S.S. (2000) NMR structural studies on a DNA four-way junction: stacking preferences and localization of the metal-ion binding site. J. Biomol. Struct. Dyn., 237–244.
45. Liang,M.P., Banatao,D.R., Klein,T.E., Brutlag,D.L. and Altman,R.B. (2003) WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res., 31, 3324–3327.
46. Gendron,P., Lemieux,S. and Major,F. (2001) Quantitative analysis of nucleic acid three-dimensional structures. J. Mol. Biol., 308, 919–936.
47. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.
48. Huang,C.C., Couch,G.S., Pettersen,E.F. and Ferrin,T.E. (1996) Chimera: an extensible molecular modeling application constructed using standard components. Pac. Symp. Biocomput., 1, 724.
49. Kieft,J.S. and Tinoco,I.,Jr (1997) Solution structure of a metal-binding site in the major groove of RNA complexed with cobalt (III) hexammine. Structure, 5, 713–721.
50. Cate,J.H., Gooding,A.R., Podell,E., Zhou,K., Golden,B.L., Kundrot,C.E., Cech,T.R. and Doudna,J.A. (1996) Crystal structure of a group I ribozyme domain: principles of RNA packing. Science, 273, 1678–1685.
51. Juneau,K., Podell,E., Harrington,D.J. and Cech,T.R. (2001) Structural basis of the enhanced stability of a mutant ribozyme domain and a detailed view of RNA–solvent interactions. Structure, 9, 221–231.
52. Conn,G.L., Draper,D.E., Lattman,E.E. and Gittis,A.G. (1999) Crystal structure of a conserved ribosomal protein–RNA complex. Science, 284, 1171–1174.
53. Conn,G.L., Gittis,A.G., Lattman,E.E., Misra,V.K. and Draper,D.E. (2002) A compact RNA tertiary structure contains a buried backbone-K+ complex. J. Mol. Biol., 318, 963–973.
54. Misra,V.K. and Draper,D.E. (2001) A thermodynamic framework for Mg2+ binding to RNA. Proc. Natl Acad. Sci. USA, 98, 12456–12461.
55. Serra,M.J., Baird,J.D., Dale,T., Fey,B.L., Retatagos,K. and Westhof,E. (2002) Effects of magnesium ions on the stabilization of RNA oligomers of defined structures. RNA, 8, 307–323.
