Download Using Robotic Motion Planning to Fold Proteins and Dock Ligands, European Conference on Computational Biology, October 7, 2002

Document related concepts
no text concepts found
Transcript
Using Robotics to
Fold Proteins and
Dock Ligands
Serkan Apaydin (J.C. Latombe)
Carlos Guestrin (Daphne Koller)
Chris Varma (MIT)
David Hsu (UNC)
Amit Singh
Doug Brutlag
© Doug Brutlag 2002
Why Robotics?
Ligand
?
=
Articulated
Robot
© Doug Brutlag 2002
Simulating Ligand Docking with
Robotic Motion Planning
Articulated Robot
Ligand
© Doug Brutlag 2002
Obstacles in a Workspace
Obstacle seen
by a 0-D
robot
Obstacles seen by fixed orientation 1-D robots
© Doug Brutlag 2002
Work Space vs.. Configuration
Space
θ
θ
(x,y)
x
Work Space
y
Configuration Space
• DOF = 3 : x, y, θ
• 1-D robot in 2-D workspace = 0-D robot in 3-D
configuration space
© Doug Brutlag 2002
Ligand Modeling
x,y,z
α, β
ψ
ψ
ψ
ψ
• Degrees of Freedom (DOF) = 9
– 3 coordinates to position root atom (x,y,z)
– 2 angles to specify first bond (α, β)
– Torsional angles for all remaining nonterminal atoms (ψ )
– Bond angles are assumed constant
– Terminal hydrogens are modeled by
increasing radius of terminal atoms
© Doug Brutlag 2002
Robotic Roadmap Planner
•
•
•
Complete representation of obstacles in high dimensional
configuration space is very difficult
Hence milestones are generated by sampling randomly
from C-space and only accepting samples that are collision
free
Connect milestones to their nearest neighbors with a local
path planner
© Doug Brutlag 2002
Distribution of Samples
© Doug Brutlag 2002
Local Path Planner
•
•
•
•
Connect the two milestones in C-space with a straight line
Discretize the line into small segments such that likelihood of
a collision within a segment is very small
Check for collision at each discretized point along the
straight line path
If there is no collision then a path exists
© Doug Brutlag 2002
Energy of Interaction
Energy = electrostatic interaction (Ec)
+
van der Waals interaction (Ev)
Ec = 332 QiQj/(εRij) Ev = 0.2[(R0/Rij)12 - 2(R0/Rij)6 ]
Ev
Ec
Rij
Rij
© Doug Brutlag 2002
Solvent Effects
Ec = 332 QiQj/(εRij)
• Is only valid for an infinite medium of uniform
dielectric
• Dielectric discontinuities result in induced surface
charges
• Solution: Poisson-Boltzman equation
[ε(r) . φ(r)] - ε(r)k(r)2sinh([φ(r)] + 4πrf(r)/kT = 0
• Can only be solved analytically for simple
dielectric boundaries like spheres and planes
• Finite difference solution by Delphi [Sharp and
Honig, 1990] is based on discretizing the
workspace into a uniform grid
© Doug Brutlag 2002
Computing Energy
•
•
•
Both Ec and Ev are pre-computed on a uniform grid of
0.5 Å resolution
van der Waals interactions are cutoff after 10 Å
Total energy of ligand:
• Energy of interaction of the ligand with the
receptor
• Two lookups into precomputed arrays for Ec and Ev
• Internal energy of the ligand
• Standard van der Waal’s and Coulombic equations
© Doug Brutlag 2002
Grid Points with Energy <= -3 kCal/Mol
(For a single negatively charged Oxygen atom)
© Doug Brutlag 2002
Energy-Based Probabilistic
Roadmaps
configuration space
energy space
• Key Differences:
– Each point in configuration space has an associated
energy
– Randomly generated landmarks are probabilistically
accepted based on energy of the configuration
– Local path planner is energy based such that paths
are weighted proportional to difficulty of motion
© Doug Brutlag 2002
Computing Path Weights
• Need to assign weights to each link in the graph
such that the minimum path weight between any
two nodes corresponds to energetically favorable
motion
energy
∆E1= Ei+1 - Ei
∆E2= Ei -1 - Ei
i-1 i i+1
P(going from i to i+1) =
P(going from i to i-1) =
1/N*e
1/N*e
- ∆E1/kT
- ∆E2/kT
© Doug Brutlag 2002
Local Probabilistic Path Planning
•
Edge Weight = Σ - log (Probability going from i to i+1)
configuration space
energy space
“Difficulty score” of a given path = sum of individual edge weights
along the path
© Doug Brutlag 2002
Results
© Doug Brutlag 2002
Results - 1ldm
Row
number
0
1
2
3
4
5
6
7
8
9
10
RMSD from
catalytic
configuration (Å)
0.00
31.04
27.49
1.73
28.99
24.67
29.84
29.32
27.07
31.00
28.24
Receptor:
Ligand:
Configuration
energy (kcal/mol)
-11.79
-13.65
-12.66
-11.72
-11.54
-11.31
-11.27
-11.04
-10.96
-10.13
-9.97
Avg path weight
entering
configuration
112.98
85.07
90.48
113.81
85.32
86.26
86.49
85.24
81.70
87.69
86.36
Avg path weight
leaving
configuration
134.54
109.94
111.98
137.28
105.19
103.95
107.53
104.64
102.28
104.50
98.89
Lactate Dehydrogenase (2386 atoms, 309 residues)
Oxamate (6 atoms, 7 degrees of freedom)
© Doug Brutlag 2002
Results - 4ts1
Row
number
0
1
2
3
4
5
6
7
8
9
10
RMSD from
catalytic
configuration (Å)
0.00
1.91
21.59
15.16
23.55
20.59
22.19
24.62
19.13
17.05
36.81
Receptor:
Ligand:
Configuration
energy (kcal/mol)
-19.44
-20.31
-15.92
-14.53
-14.39
-14.30
-13.97
-12.89
-12.74
-12.31
-11.81
Avg path weight
entering
configuration
130.73
128.61
105.65
109.82
111.87
114.13
113.84
118.82
115.45
120.24
115.48
Avg path weight
leaving
configuration
173.76
166.73
118.72
129.15
134.96
133.87
135.90
138.15
136.72
142.72
131.98
Tyrosyl-tRNA synthetase (2423 atoms, 319 residues)
Tyrosine (13 atoms, 9 degrees of freedom)
© Doug Brutlag 2002
Results - 1stp
Row
number
0
1
2
3
4
5
6
7
8
9
10
RMSD from
true binding
configuration (Å)
0.00
21.76
27.14
18.59
23.52
13.67
15.18
13.93
14.63
24.64
20.43
Configuration
energy (kcal/mol)
Receptor:
Ligand:
-15.06
-15.79
-12.83
-12.82
-11.45
-11.36
-10.79
-10.68
-10.42
-9.96
-9.87
Avg path weight
entering
configuration
110.80
80.78
96.29
85.84
96.45
86.51
88.22
95.14
85.61
85.71
83.81
Avg path weight
leaving
configuration
146.87
108.42
117.67
101.24
122.01
106.05
96.89
116.92
105.16
105.17
102.54
Streptavidin (901 atoms, 121 residues)
Biotin (16 atoms, 11 degrees of freedom)
© Doug Brutlag 2002
Distinguishing the Catalytic Site
from Other Potential Binding Sites
•
Results indicate the following:
– The catalytic binding site is not necessarily the one with the
lowest ligand energy
– The catalytic binding site is instead characterized by a
distinct energy barrier around the site
– The difficulty of leaving the catalytic site is higher than other
potential binding sites. The difficulty of entering the catalytic
site is also correspondingly higher.
•
Robotics permits calculation of association and dissociation
energies, not just the binding energy.
energy
15-20 kcal/mol
10 -12 kcal/mol
Other Low
Energy Site
Catalytic Binding
Site
10-12 kcal/mol
Other Low
Energy Site
© Doug Brutlag 2002
Advantages of Robotic Approach
• Can sample conformation space and
workspace simultaneously.
• Can use arbitrary energy functions
• Can calculate association and dissociation
energies simultaneously, not just binding
energies.
• Permits estimation of association rates and
dissociation rates.
© Doug Brutlag 2002
Molecular Motion:
Ligand Docking & Protein Folding
Lactate Dehydrogenase
NAD and Oxamate
[Singh et al. ‘99]
HIV integrase
http://foldingathome.stanford.edu
© Doug Brutlag 2002
Simulating Protein Folding
Using Robotics
• Molecule represented
by parameters
• Energy for each
conformation
• Use Monte Carlo(MC)
or Molecular
Dynamics(MD)
simulation to study
molecular motion
α3
© Doug Brutlag 2002
Monte Carlo Simulation
© Doug Brutlag 2002
Monte Carlo Simulation
© Doug Brutlag 2002
Difficulties Calculating
Monte Carlo Simulations
•
•
•
•
Expensive to compute
Gets stuck at local minimum
Single path at a time
Similar problems with molecular
dynamics
© Doug Brutlag 2002
Stochastic Roadmap Simulation
(SRS)
•
•
•
•
Multiple paths at once
No local minimum problem
Molecular properties computed analytically
Monte Carlo simulation converges to the
same distribution as stochastic roadmap
simulation.
© Doug Brutlag 2002
Roadmap Construction
vi
Pij
vj
• Sample nodes from conformation space
• Edge weights are probabilities
© Doug Brutlag 2002
Edge Probabilities
Follow Metropolis criteria:
 exp(−∆E ij /k B T)
, if ∆E ij > 0;

Ni

Pij = 
 1 , otherwise.
 N i
Self transition probabilities:
Pii = 1 − ∑ Pij
j ≠i
Pii
vi
Pij
vj
• Correspond to probabilities in Monte Carlo
simulation
• Different from roadmaps in previous work © Doug Brutlag 2002
Relationship to Monte Carlo
simulation
vs
vg
•
•
•
Each path on graph = a path of Monte Carlo simulation
Roadmap represents many Monte Carlo simulation paths
simultaneously
Proven: SRS and MC converge to the same distribution
© Doug Brutlag 2002
Application:
probability of folding pfold
[Du et al. ‘98]
1- pfold
Unfolded set
pfold
Folded set
© Doug Brutlag 2002
Computing pfold:
The Monte Carlo approach
• For each conformation:
– Perform many MC/MD simulations;
– Count number of folded simulations.
• Too slow for any practical application!
© Doug Brutlag 2002
Roadmap for pfold computation
Folded set
Unfolded set
fi
Want to compute: fi = probability of folding
starting from node i
© Doug Brutlag 2002
First Step Analysis
Consider what happens after one step of simulation:
k
l
m
j
i
Unfolded set
After one step:
But fl=fm=1 :
Folded set
fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
fi = Pii fi + Pij fj + Pik fk + Pil 1 + Pim 1
© Doug Brutlag 2002
First Step Analysis
Consider what happens after one step of simulation:
• One linear equation for each node;
k
• Find pfold for all nodes in the roadmap
simultaneously.
j
l
m
Folded set
i
Unfolded set
After one step:
fi = Pii fi + Pij fj + Pik fk + Pil fl + Pim fm
But fl=fm=1 :
fi = Pii fi + Pij fj + Pik fk + Pil 1 + Pim 1
© Doug Brutlag 2002
pfold on real protein: 1ROP
• Comparison:
• SRS;
• MC simulation
for 36 starting
conformations.
© Doug Brutlag 2002
Correlation: SRS versus MC
© Doug Brutlag 2002
Correlation: SRS versus MC
© Doug Brutlag 2002
Correlation: SRS versus MC
© Doug Brutlag 2002
Computation time on 1ROP
Monte Carlo:
36 conformations
100 days of
computer time
Over 109 energy
computations
1 hour of
computer time
5000 energy
computations
SRS:
5000 conformations
© Doug Brutlag 2002
Conclusion
•
•
•
Roadmap for analysis of molecular
motion;
Efficiently considers many MC paths
simultaneously;
Application to pfold:
•
•
•
•
More accurate results;
Fewer energy computations;
Many orders of magnitude speed-up.
Other applications:
•
•
Ligand-protein docking;
Order of formation of secondary structure.
© Doug Brutlag 2002
Conclusion
“To conclude, we stress that we do not suggest
using pfold as a transition coordinate for
practical purposes as it is very
computationally intensive.”
Du, Pande, Grosberg, Tanaka, and Shakhnovich “On the Transition
Coordinate for Protein Folding” Journal of Chemical Physics (1998).
We believe that Stochastic Roadmap Simulation
could make computations such as pfold practical
for the first time.
© Doug Brutlag 2002
Ligand-Protein Binding
© Doug Brutlag 2002
Kinetics of binding
Singh et. al. 1999
© Doug Brutlag 2002
Potential Low-Energy Binding
Sites
© Doug Brutlag 2002
Two biological concepts(1):
• Funnel of attraction;
– Energy gradient around a site that guides the
ligand to that site.
© Doug Brutlag 2002
Funnels
© Doug Brutlag 2002
Funnels on Protein Surface
© Doug Brutlag 2002
Funnels in Space
© Doug Brutlag 2002
Ligand-protein modeling
• Ligand modeled as flexible, protein as rigid;
• Electrostatic – VanDerWaals - Solvation free
energies terms considered;
• Used a grid of 0.5A-1A resolution, and solved
Poisson-Boltzmann Equation on this grid;
• Funnels: regions within 10A rmsd of the catalytic
site. [Camacho and Vajda ‘01]
© Doug Brutlag 2002
Previous work
[Singh-Brutlag99]
•Energy is not a good metric to distinguish the catalytic site
•Energy barrier around catalytic site: detected using
average path weight of all the BEST paths.
energy
Catalytic Binding
Site
© Doug Brutlag 2002
Escape time
[Singh et al. ‘99]
energy
Catalytic
Site
Potential binding site
Potential binding site
© Doug Brutlag 2002
Our hypothesis
•
Binding interactions affect escape time;
•
Example:
•
•
Favorable interactions: larger time to leave the
funnel;
Bad steric complementarity: smaller escape time;
•
Energy is not a good discriminator to
determine catalytic site;
•
Will compute escape time with SRS.
© Doug Brutlag 2002
Computational mutagenesis
• Based on site-directed mutagenesis;
• Some amino acids deleted entirely,
replaced by other amino acids, or
sidechains altered;
• Useful to understand binding mechanism,
identifying catalytic residues;
• A new area in computer-aided protein
design.
© Doug Brutlag 2002
Enzyme studied: LDH
• Catalyzes the reduction of pyruvate to lactate,
in the presence of NADH.
A
S
P
1
9
5
L
o
o
p
H
I
S
-A
1S
9P
31
6
6
A
R+
G+
1
0
6
G
L
N
- N
O1 C
0
1
C
TH
R245
O
O
+
A
R
G
-
N
A
D
H
Chemical
environment of
LDH-NADHsubstrate complex
© Doug Brutlag 2002
LDH: Lactate Dehydrogenase,
NAD and Oxamate
© Doug Brutlag 2002
Roadmap data
• Created 4000 samples, biased in energy;
• 100 nodes around the catalytic site;
• average of 20 roadmaps.
© Doug Brutlag 2002
Escape Time Results
Mutant
•Esc
ape
Tim
e
•Wildtype
•3.2
16E
6
•His193
→ Ala•Arg106
→ Ala
•4.1
26E
2
L
o
o
p
A
R+
A
G+
S H P I 1
- S 0
1 A
- 6
9 S
1
5 P
9
-3
1
6
6
G
L
N
THR-245
1 N
O0 C
1
C
N
A
O
O
D
+
H
A
R
G
1
6
© Doug Brutlag 2002
Distinguishing the catalytic site
• Goal: Given some potential binding sites,
predict which one is the catalytic site.
© Doug Brutlag 2002
Roadmap data
• Initial sampling of the space to pick 4
potential binding sites;
• Samples N nodes biased in energy;
• Sampled extra M nodes around both the
catalytic site and potential binding sites;
• geometric average of 20 roadmaps.
© Doug Brutlag 2002
Complexes Studied
l
i
g
a
n
d
(M = 100)
© Doug Brutlag 2002
Energies
P
r
o
t
e
i
n
B
o
u
n
d
s
t
a
t
e
(kcal/mol)
B
e
Able
to
s
distinguish
t
catalytic
p
osite
t able
Not
e
n
t
i
a © Doug Brutlag 2002
Escape times
P
r
o
t
e
i
n
(# steps)
B
o
u
n
d
s
t
a
t
e
B
e
Able to
s
distinguish
t
catalytic
p
site
o
t Not able
e
n
t
i
© Doug Brutlag 2002
Summary of results
• Computational mutagenesis :
– Results agree with biological interpretation;
– Computing the rate at which the molecules
diffuse away from each other;
• Ligand-protein complexes :
– 5/7 cases, escape time largest for catalytic
site;
– Hardest to escape from the funnel of attraction
of the catalytic site.
© Doug Brutlag 2002
Conclusion
• Studied ligand-protein binding with SRS,
computed the expected time to escape from the
catalytic site
• Mutagenesis study agreed well with biological
validation
• Escape time is a good discriminator for the
catalytic site
• Future work: Binding time study
• Future work: Connection between escape time
and association/dissociation constants
© Doug Brutlag 2002