Using Robotics to Fold Proteins and Dock Ligands

Serkan Apaydin (J.C. Latombe)
Carlos Guestrin (Daphne Koller)
Chris Varma (MIT)
David Hsu (UNC)
Amit Singh
Doug Brutlag

© Doug Brutlag 2002

Why Robotics?
Ligand = Articulated Robot?

Simulating Ligand Docking with Robotic Motion Planning
[Figure: an articulated robot alongside a ligand]

Obstacles in a Workspace
[Figure panels: obstacle seen by a 0-D robot; obstacles seen by fixed-orientation 1-D robots]

Work Space vs. Configuration Space
[Figure: a 1-D robot at (x, y) with orientation θ in the work space, and the corresponding configuration space]
• DOF = 3: x, y, θ
• 1-D robot in 2-D workspace = 0-D robot in 3-D configuration space

Ligand Modeling
[Figure: ligand with root atom position (x, y, z), first-bond angles (α, β), and torsional angles ψ]
• Degrees of Freedom (DOF) = 9
  – 3 coordinates to position the root atom (x, y, z)
  – 2 angles to specify the first bond (α, β)
  – Torsional angles for all remaining nonterminal atoms (ψ)
  – Bond angles are assumed constant
  – Terminal hydrogens are modeled by increasing the radius of terminal atoms

Robotic Roadmap Planner
• A complete representation of obstacles in high-dimensional configuration space is very difficult
• Hence milestones are generated by sampling randomly from C-space and accepting only samples that are collision free
• Connect milestones to their nearest neighbors with a local path planner

Distribution of Samples

Local Path Planner
• Connect the two milestones in C-space with a straight line
• Discretize the line into small segments such that the likelihood of a collision within a segment is very small
• Check for collision at each discretized point along the straight-line path
• If there is no collision, then a path exists

Energy of Interaction
Energy = electrostatic interaction ($E_c$) + van der Waals interaction ($E_v$)
$E_c = 332\, Q_i Q_j / (\varepsilon R_{ij})$
$E_v = 0.2\,[(R_0/R_{ij})^{12} - 2\,(R_0/R_{ij})^{6}]$
[Figure: plots of $E_v$ and $E_c$ against $R_{ij}$]

Solvent Effects
$E_c = 332\, Q_i Q_j / (\varepsilon R_{ij})$
• Is only valid for an infinite medium of uniform dielectric
• Dielectric discontinuities result in induced surface charges
• Solution: Poisson-Boltzmann equation
  $\nabla \cdot [\varepsilon(r)\, \nabla \phi(r)] - \varepsilon(r)\, \kappa(r)^2 \sinh[\phi(r)] + 4\pi \rho^f(r)/kT = 0$
• Can only be solved analytically for simple dielectric boundaries like spheres and planes
• Finite difference solution by DelPhi [Sharp and Honig, 1990] is based on discretizing the workspace into a uniform grid

Computing Energy
• Both $E_c$ and $E_v$ are pre-computed on a uniform grid of 0.5 Å resolution
• van der Waals interactions are cut off after 10 Å
• Total energy of ligand:
  – Energy of interaction of the ligand with the receptor: two lookups into precomputed arrays for $E_c$ and $E_v$
  – Internal energy of the ligand: standard van der Waals and Coulombic equations

Grid Points with Energy <= -3 kcal/mol
(For a single negatively charged oxygen atom)

Energy-Based Probabilistic Roadmaps
[Figure: configuration space and energy space]
• Key Differences:
  – Each point in configuration space has an associated energy
  – Randomly generated landmarks are probabilistically accepted based on the energy of the configuration
  – Local path planner is energy based, such that paths are weighted in proportion to the difficulty of motion

Computing Path Weights
• Need to assign weights to each link in the graph such that the minimum path weight between any two nodes corresponds to energetically favorable motion
[Figure: energy at consecutive points i-1, i, i+1 along a link]
$\Delta E_1 = E_{i+1} - E_i$
$\Delta E_2 = E_{i-1} - E_i$
$P(i \to i+1) = \frac{1}{N} e^{-\Delta E_1/kT}$
$P(i \to i-1) = \frac{1}{N} e^{-\Delta E_2/kT}$

Local Probabilistic Path Planning
[Figure: configuration space and energy space]
• Edge weight = $\sum -\log P(i \to i+1)$
• "Difficulty score" of a given path = sum of individual edge weights along the path

Results

Results - 1ldm
Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
Ligand: Oxamate (6 atoms, 7 degrees of freedom)
Columns: row number; RMSD from catalytic configuration (Å); configuration energy (kcal/mol); average path weight entering configuration; average path weight leaving configuration

Row   RMSD (Å)   Energy (kcal/mol)   Entering   Leaving
0       0.00        -11.79            112.98     134.54
1      31.04        -13.65             85.07     109.94
2      27.49        -12.66             90.48     111.98
3       1.73        -11.72            113.81     137.28
4      28.99        -11.54             85.32     105.19
5      24.67        -11.31             86.26     103.95
6      29.84        -11.27             86.49     107.53
7      29.32        -11.04             85.24     104.64
8      27.07        -10.96             81.70     102.28
9      31.00        -10.13             87.69     104.50
10     28.24         -9.97             86.36      98.89

Results - 4ts1
Receptor: Tyrosyl-tRNA synthetase (2423 atoms, 319 residues)
Ligand: Tyrosine (13 atoms, 9 degrees of freedom)
Columns: row number; RMSD from catalytic configuration (Å); configuration energy (kcal/mol); average path weight entering configuration; average path weight leaving configuration

Row   RMSD (Å)   Energy (kcal/mol)   Entering   Leaving
0       0.00        -19.44            130.73     173.76
1       1.91        -20.31            128.61     166.73
2      21.59        -15.92            105.65     118.72
3      15.16        -14.53            109.82     129.15
4      23.55        -14.39            111.87     134.96
5      20.59        -14.30            114.13     133.87
6      22.19        -13.97            113.84     135.90
7      24.62        -12.89            118.82     138.15
8      19.13        -12.74            115.45     136.72
9      17.05        -12.31            120.24     142.72
10     36.81        -11.81            115.48     131.98

Results - 1stp
Receptor: Streptavidin (901 atoms, 121 residues)
Ligand: Biotin (16 atoms, 11 degrees of freedom)
Columns: row number; RMSD from true binding configuration (Å); configuration energy (kcal/mol); average path weight entering configuration; average path weight leaving configuration

Row   RMSD (Å)   Energy (kcal/mol)   Entering   Leaving
0       0.00        -15.06            110.80     146.87
1      21.76        -15.79             80.78     108.42
2      27.14        -12.83             96.29     117.67
3      18.59        -12.82             85.84     101.24
4      23.52        -11.45             96.45     122.01
5      13.67        -11.36             86.51     106.05
6      15.18        -10.79             88.22      96.89
7      13.93        -10.68             95.14     116.92
8      14.63        -10.42             85.61     105.16
9      24.64         -9.96             85.71     105.17
10     20.43         -9.87             83.81     102.54

Distinguishing the Catalytic Site from Other Potential Binding Sites
• Results indicate the following:
  – The catalytic binding site is not necessarily the one with the lowest ligand energy
  – The catalytic binding site is instead characterized by a distinct energy barrier around the site
  – The difficulty of leaving the catalytic site is
    higher than at other potential binding sites. The difficulty of entering the catalytic site is also correspondingly higher.
• Robotics permits calculation of association and dissociation energies, not just the binding energy.
[Figure: energy profile. Labels: Other Low Energy Site (10-12 kcal/mol), Catalytic Binding Site (15-20 kcal/mol), Other Low Energy Site (10-12 kcal/mol)]

Advantages of Robotic Approach
• Can sample conformation space and workspace simultaneously.
• Can use arbitrary energy functions.
• Can calculate association and dissociation energies simultaneously, not just binding energies.
• Permits estimation of association rates and dissociation rates.

Molecular Motion: Ligand Docking & Protein Folding
[Figure panels: Lactate Dehydrogenase, NAD and Oxamate [Singh et al. '99]; HIV integrase, http://foldingathome.stanford.edu]

Simulating Protein Folding Using Robotics
• Molecule represented by parameters
• Energy for each conformation
• Use Monte Carlo (MC) or Molecular Dynamics (MD) simulation to study molecular motion

Monte Carlo Simulation

Difficulties Calculating Monte Carlo Simulations
• Expensive to compute
• Gets stuck at local minima
• Single path at a time
• Similar problems with molecular dynamics

Stochastic Roadmap Simulation (SRS)
• Multiple paths at once
• No local minimum problem
• Molecular properties computed analytically
• Monte Carlo simulation converges to the same distribution as stochastic roadmap simulation.

Roadmap Construction
[Figure: nodes $v_i$ and $v_j$ joined by an edge with probability $P_{ij}$]
• Sample nodes from conformation space
• Edge weights are probabilities

Edge Probabilities
Follow the Metropolis criterion, where $N_i$ is the number of neighbors of node $v_i$:
$P_{ij} = \frac{1}{N_i} \exp(-\Delta E_{ij} / k_B T)$, if $\Delta E_{ij} > 0$;
$P_{ij} = \frac{1}{N_i}$, otherwise.
Self-transition probabilities:
$P_{ii} = 1 - \sum_{j \neq i} P_{ij}$
[Figure: node $v_i$ with self-transition $P_{ii}$ and edge $P_{ij}$ to node $v_j$]
• Correspond to probabilities in Monte Carlo simulation
• Different from roadmaps in previous work

Relationship to Monte Carlo simulation
[Figure: roadmap paths between nodes $v_s$ and $v_g$]
• Each path on the graph = a path of Monte Carlo simulation
• Roadmap represents many Monte Carlo simulation paths simultaneously
• Proven: SRS and MC converge to the same distribution

Application: probability of folding pfold [Du et al. '98]
[Figure: from a given conformation, the folded set is reached first with probability pfold and the unfolded set with probability 1 - pfold]

Computing pfold: The Monte Carlo approach
• For each conformation:
  – Perform many MC/MD simulations;
  – Count the number of folded simulations.
• Too slow for any practical application!

Roadmap for pfold computation
[Figure: roadmap with the folded set, the unfolded set, and a node i labeled $f_i$]
Want to compute: $f_i$ = probability of folding starting from node i

First Step Analysis
Consider what happens after one step of simulation:
[Figure: node i with neighbors j, k, l, m; nodes l and m lie in the folded set]
After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
But $f_l = f_m = 1$:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} \cdot 1 + P_{im} \cdot 1$
• One linear equation for each node;
• Find pfold for all nodes in the roadmap simultaneously.

pfold on real protein: 1ROP
• Comparison:
  – SRS;
  – MC simulation for 36 starting conformations.
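
The first-step analysis above amounts to solving one linear system over the roadmap. The following is a minimal sketch rather than the authors' implementation: the function names, the toy five-node chain, and the value kT = 0.6 kcal/mol (roughly room temperature) are all assumptions made for illustration. It builds the Metropolis transition matrix from node energies and then solves for $f_i$ at every node at once.

```python
import numpy as np

def metropolis_transition_matrix(energies, neighbors, kT=0.6):
    """Build the row-stochastic matrix P of a roadmap.

    energies:  energy of each node (kcal/mol)
    neighbors: neighbors[i] lists the nodes adjacent to node i
    kT:        thermal energy in kcal/mol (0.6 is an assumed value)
    """
    n = len(energies)
    P = np.zeros((n, n))
    for i in range(n):
        n_i = len(neighbors[i])
        for j in neighbors[i]:
            dE = energies[j] - energies[i]
            # Metropolis criterion: uphill moves are accepted with
            # probability exp(-dE/kT), downhill moves always.
            accept = np.exp(-dE / kT) if dE > 0 else 1.0
            P[i, j] = accept / n_i
        # Rejected moves stay in place (self-transition).
        P[i, i] = 1.0 - P[i].sum()
    return P

def pfold(P, folded, unfolded):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded, f = 0 on unfolded."""
    n = P.shape[0]
    A = np.eye(n) - P
    b = np.zeros(n)
    for i in folded:
        A[i] = 0.0
        A[i, i] = 1.0
        b[i] = 1.0
    for i in unfolded:
        A[i] = 0.0
        A[i, i] = 1.0
        b[i] = 0.0
    return np.linalg.solve(A, b)

# Hypothetical toy roadmap: a chain 0-1-2-3-4 with flat energies,
# node 0 unfolded and node 4 folded.
energies = [0.0, 0.0, 0.0, 0.0, 0.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
P = metropolis_transition_matrix(energies, neighbors)
f = pfold(P, folded=[4], unfolded=[0])
print(f)  # flat energies give a symmetric random walk: 0, 0.25, 0.5, 0.75, 1
```

With flat energies the chain reduces to a symmetric random walk, so the result can be checked by hand. Changing the entries of `energies` shows how an energy barrier between a node and the folded set lowers its pfold.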
Correlation: SRS versus MC

Computation time on 1ROP

Method        Conformations   Computer time   Energy computations
Monte Carlo   36              100 days        Over 10^9
SRS           5000            1 hour          5000

Conclusion
• Roadmap for analysis of molecular motion;
• Efficiently considers many MC paths simultaneously;
• Application to pfold:
  – More accurate results;
  – Fewer energy computations;
  – Many orders of magnitude speed-up.
• Other applications:
  – Ligand-protein docking;
  – Order of formation of secondary structure.

Conclusion
"To conclude, we stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive."
Du, Pande, Grosberg, Tanaka, and Shakhnovich, "On the Transition Coordinate for Protein Folding," Journal of Chemical Physics (1998).
We believe that Stochastic Roadmap Simulation could make computations such as pfold practical for the first time.

Ligand-Protein Binding

Kinetics of binding
Singh et al. 1999

Potential Low-Energy Binding Sites

Two biological concepts (1):
• Funnel of attraction;
  – Energy gradient around a site that guides the ligand to that site.

Funnels

Funnels on Protein Surface

Funnels in Space

Ligand-protein modeling
• Ligand modeled as flexible, protein as rigid;
• Electrostatic, van der Waals, and solvation free energy terms considered;
• Used a grid of 0.5-1 Å resolution, and solved the Poisson-Boltzmann equation on this grid;
• Funnels: regions within 10 Å RMSD of the catalytic site.
[Camacho and Vajda '01]

Previous work [Singh-Brutlag99]
• Energy is not a good metric to distinguish the catalytic site
• Energy barrier around catalytic site: detected using the average path weight of all the BEST paths.
[Figure: energy profile around the catalytic binding site]

Escape time [Singh et al. '99]
[Figure: energy profile with the catalytic site between two potential binding sites]

Our hypothesis
• Binding interactions affect escape time;
• Example:
  – Favorable interactions: larger time to leave the funnel;
  – Bad steric complementarity: smaller escape time;
• Energy is not a good discriminator to determine the catalytic site;
• Will compute escape time with SRS.

Computational mutagenesis
• Based on site-directed mutagenesis;
• Some amino acids deleted entirely, replaced by other amino acids, or side chains altered;
• Useful for understanding the binding mechanism and identifying catalytic residues;
• A new area in computer-aided protein design.

Enzyme studied: LDH
• Catalyzes the reduction of pyruvate to lactate, in the presence of NADH.
[Figure: chemical environment of the LDH-NADH-substrate complex]

LDH: Lactate Dehydrogenase, NAD and Oxamate

Roadmap data
• Created 4000 samples, biased in energy;
• 100 nodes around the catalytic site;
• Average of 20 roadmaps.

Escape Time Results

Mutant                        Escape time
Wildtype                      3.216E6
His193 → Ala, Arg106 → Ala    4.126E2

[Figure: chemical environment of the LDH active site]

Distinguishing the catalytic site
• Goal: Given some potential binding sites, predict which one is the catalytic site.
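
The slides do not spell out how escape time is obtained from the roadmap. One standard way, assumed here and not confirmed by the source, is to reuse first-step analysis: for every node i inside the funnel, the expected number of steps to leave satisfies $t_i = 1 + \sum_j P_{ij} t_j$, with $t_j = 0$ outside the funnel. The sketch below solves that system for a hypothetical three-node transition matrix; the function name and the numbers are illustrative only.

```python
import numpy as np

def escape_time(P, funnel):
    """Expected number of steps to leave `funnel`, for each node in it.

    P:      row-stochastic transition matrix of the roadmap
    funnel: indices of the nodes that make up the funnel
    Solves (I - P_FF) t = 1, where P_FF restricts P to funnel nodes.
    """
    funnel = list(funnel)
    P_ff = P[np.ix_(funnel, funnel)]
    steps = np.linalg.solve(np.eye(len(funnel)) - P_ff, np.ones(len(funnel)))
    return dict(zip(funnel, steps))

# Hypothetical transition matrix: nodes 0 and 1 form the funnel,
# node 2 stands for everything outside it.
P = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.00, 1.00],
])
t = escape_time(P, funnel=[0, 1])
print(t)  # node 0 needs 8 steps on average, node 1 needs 6
```

Lowering the probability of stepping from node 1 to node 2 models a higher barrier around the site and lengthens both escape times, which matches the hypothesis that favorable interactions make the funnel harder to leave.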
Roadmap data
• Initial sampling of the space to pick 4 potential binding sites;
• Sampled N nodes biased in energy;
• Sampled an extra M nodes around both the catalytic site and the potential binding sites;
• Geometric average of 20 roadmaps.

Complexes Studied
(M = 100)
[Table: complexes studied, listed by ligand]

Energies (kcal/mol)
[Table: energy of the bound state and of the best potential site for each protein, grouped by whether energy was able or not able to distinguish the catalytic site]

Escape times (# steps)
[Table: escape time from the bound state and from the best potential site for each protein, grouped by whether escape time was able or not able to distinguish the catalytic site]

Summary of results
• Computational mutagenesis:
  – Results agree with biological interpretation;
  – Computing the rate at which the molecules diffuse away from each other;
• Ligand-protein complexes:
  – In 5/7 cases, escape time is largest for the catalytic site;
  – Hardest to escape from the funnel of attraction of the catalytic site.

Conclusion
• Studied ligand-protein binding with SRS, computed the expected time to escape from the catalytic site
• Mutagenesis study agreed well with biological validation
• Escape time is a good discriminator for the catalytic site
• Future work: Binding time study
• Future work: Connection between escape time and association/dissociation constants
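
As a small illustration of the averaging step mentioned under "Roadmap data", the sketch below takes the geometric mean of escape times across several roadmaps for each candidate site and picks the site with the largest average. The site names and every number are made up for the example (three roadmaps instead of 20), and the selection rule is only a reading of the summary above, not the authors' code.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed in log space to handle large step counts."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical escape times (# steps), one entry per roadmap.
escape_times = {
    "site_a": [1.0e6, 4.0e6, 2.0e6],
    "site_b": [1.0e2, 1.0e4, 1.0e3],
}

averaged = {site: geometric_mean(ts) for site, ts in escape_times.items()}

# Predict the catalytic site as the one that is hardest to escape from.
predicted = max(averaged, key=averaged.get)
print(averaged, predicted)
```

A geometric mean is a natural fit when escape times span several orders of magnitude, since a single roadmap with an unusually long escape time does not dominate the average.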