Determination of Enzymatic Reaction
Pathways Using QM/MM Methods
GÉRALD MONARD,1 XAVIER PRAT-RESINA,2 ANGELS GONZÁLEZ-LAFONT,2 JOSÉ M. LLUCH2
1 Equipe de Chimie et Biochimie Théorique, UMR 7675 CNRS-UHP-INPL, Université Henry Poincaré Nancy I, Faculté des Sciences, B.P. 239, F-54506 Vandœuvre-lès-Nancy, France
2 Unitat de Química Física, Departament de Química, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
Received 25 May 2002; accepted 8 January 2003
DOI 10.1002/qua.10555
ABSTRACT: Enzymes are among the most powerful known catalysts. Understanding the functions of these proteins is one of the central goals of contemporary chemistry and biochemistry. However, because these systems are large, they are difficult to handle using standard theoretical chemistry tools. In the last 10 years, we have seen the rapid development of so-called QM/MM methods, which combine quantum chemistry and molecular mechanics to elucidate the structure and functions of systems with many degrees of freedom, including enzymatic systems. In this article, we review the numerical aspects of QM/MM methods applied to enzymes: the energy definition, the special treatment of the covalent QM/MM frontiers, and the exploration of the QM/MM potential energy surface. Special emphasis is placed on the use of the local self-consistent field method and rational function optimization. © 2003 Wiley Periodicals, Inc. Int J Quantum Chem 93: 229–244, 2003
Key words: QM/MM methods; geometry optimization; transition-state search;
enzyme catalysis
Correspondence to: G. Monard; e-mail: gmonard@lctn.u-nancy.fr
International Journal of Quantum Chemistry, Vol. 93, No. 3, 229–244 (2003)
© 2003 Wiley Periodicals, Inc.
1. Introduction
Life implies the ability of a complex and highly organized organism to synthesize chemicals through metabolic processes into structures that have defined purposes [1]. Most of the reactions carried out in these organisms are mediated by a series of remarkable biological catalysts known as enzymes. These enzymes are proteins, which differ from ordinary chemical catalysts in several important respects: their high kinetic rates compared with the corresponding uncatalyzed reactions; the relatively mild conditions under which the catalyzed reactions can occur (i.e., temperature below 100°C, atmospheric pressure, nearly neutral pH, etc.); their great reaction specificity, which enables them to select reactants and transform them into well-defined chemical products (i.e., enzymatic reactions rarely have side products); and their capacity for regulation in response to the concentration of substances other than their substrates. Elucidating and mastering enzymatic reactions is one of the most thrilling challenges facing contemporary chemistry and biochemistry. A broad range of applications can follow from it, from the development of new drugs to the design of new protein-based catalysts.
Computational chemistry can be a determining factor in understanding enzymatic reactivity because theoretical tools give molecular-level insights into enzyme catalysis that can be difficult to obtain by experimental means. The main problem theoretical chemists face in determining realistic enzymatic reaction pathways is choosing a proper modeling approach. For a long time, most researchers confined the study of enzyme reactivity to models containing only a few representative atoms (i.e., those believed to contribute most to the reactivity), either inserted or not in a cavity representing the electrostatic effect of the surrounding enzyme and aqueous environment [2, 3]. This drastic limitation in the size of model systems was mainly due to both the limited computing power available and the necessity of using quantum chemistry to describe the making and breaking of bonds that usually occur in enzymatic reactions. The main advantage of this type of approach was that it addressed what we could call the intrinsic reactivity of a system: whether putting some atoms together in a defined position translates into a possible reactivity or not (e.g., a nucleophile can react with an electrophile, whereas two nucleophiles will not react together). However, these studies were not able to account for the specific features of enzymes described above. For example, they cannot explain the differences in activity between different enzymes that bear the same active site (i.e., the model systems are identical, but experimental results usually show different kinetics). They also cannot explain the catalytic effect of an enzyme compared with the same reaction in aqueous solution.
Several answers to these problems have been
suggested in the literature. They can be divided into
three main groups:
1. The empirical valence bond (EVB) method [4, 5] of Warshel and coworkers, in which a chemical reaction is described using a valence bond approach, i.e., the system wave function is represented by a linear combination of the most important ionic and covalent resonance forms and the potential energy is found by solving the related secular equation. The electronic interaction Hamiltonian is built using parameter terms extracted from empirical values and/or ab initio surfaces [6].
The main advantage of the EVB approach is its ability to give good quantitative results in comparison with experiment as long as the incorporated empirical terms are carefully chosen. This is mainly accomplished by first calibrating free energy surfaces from reference reactions in solution before incorporating the enzyme effects. However, the choice of correct EVB parameters is crucial and can also be seen as a disadvantage of EVB methods: if the valence bond forms (i.e., the most prevalent ionic and covalent forms) have not been properly defined, one can miss either unusual reaction pathways that can occur in reactive chemical systems or a chemical reaction not previously introduced in the valence bond forms.
2. The linear scaling approach [7, 8], which changes the way quantum calculations are done, enabling computations on large systems. Numerous examples of calculations on systems with more than 1000 atoms have been reported [9–11]. However, while this new methodology seems promising, the CPU time involved in today's calculations only allows for single-point energy calculations. Improvements, both in linear scaling algorithms and in computing power, are still needed to make full quantum statistical simulations practical.
3. The combined quantum mechanics/molecular mechanics (QM/MM) [12–17] methodology, in which the small reactive part of a chemical system is described by QM, whereas the remaining large nonreactive part is described by MM. This methodology is today the most widely used to address the reactivity of biochemical systems.
The main advantage of QM/MM methods is their easy implementation in computational codes while giving good chemical results. Their main disadvantage, especially in enzymatic systems, is the difficulty of going beyond qualitative results and, thus, obtaining quantitative numbers from QM/MM computations. This problem is mainly due to three factors: (1) the need for a good ab initio description of the QM part, whereas the usual size of the QM part mostly only allows for semiempirical calculations; (2) the need to access free energies through extensive sampling that is (too) computationally expensive; (3) the difficult calibration of the interaction between the QM part and the MM part, especially in biochemical systems, as discussed below.
The first two factors are related to current computational bottlenecks and should be overcome in the near future.
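As a small illustration of the secular equation mentioned for the EVB approach in item 1, the sketch below diagonalizes a two-state valence bond Hamiltonian. The function name `evb_ground_state`, the diabatic energies `h11` and `h22`, and the coupling `h12` are hypothetical placeholders, not parameters from Refs. [4–6]; a real EVB model would use calibrated, geometry-dependent terms.

```python
import numpy as np

def evb_ground_state(h11, h22, h12):
    """Lowest root and resonance-form weights of a 2x2 EVB-like secular problem."""
    # h11, h22: energies of the two resonance forms; h12: their coupling (all hypothetical).
    h = np.array([[h11, h12], [h12, h22]], dtype=float)
    energies, vectors = np.linalg.eigh(h)   # eigenvalues returned in ascending order
    weights = vectors[:, 0] ** 2            # contribution of each form to the ground state
    return energies[0], weights

# Two degenerate resonance forms coupled by -1 (arbitrary energy units).
e0, w = evb_ground_state(0.0, 0.0, -1.0)
```

With degenerate forms the ground state mixes both equally and is stabilized by the magnitude of the coupling, which is the qualitative behavior expected near a transition state in a two-state picture.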
In this article, we review QM/MM methodology as used to describe biochemical systems. We first define the QM/MM energy and emphasize the problem of the interactions between the QM and MM parts. Second, we present the problem of cutting covalent bonds in biochemical systems and its possible solutions, in particular the local self-consistent field (LSCF) method. Third, we show what can be done with a potential energy surface as defined by QM/MM methodology, and especially how to locate transition-state structures and energy minima efficiently in large molecules.
2. Energy Definition
2.1. SPLITTING THE SYSTEM
In most reactive systems, the number of atoms involved in a chemical reaction is fairly limited (i.e., whose electronic properties are changed during the reaction); the rest of the atoms may have a strong influence on the reaction, but this is usually limited to short- and long-range nonbonded interactions that can be represented through both electrostatic and van der Waals interactions. The main idea of QM/MM methodology [12, 14] is to split the chemical system into two parts: The first is small and described by quantum mechanics (it is called the quantum part); the second is the rest of the system and is described by molecular mechanics (it is called the classic part). The full Hamiltonian can therefore be expressed as

$$H = H_{\mathrm{QM}} + H_{\mathrm{MM}} + H_{\mathrm{QM/MM}}, \tag{1}$$

where $H_{\mathrm{QM}}$ is the Hamiltonian describing the quantum atoms. In the Born–Oppenheimer approximation, it can be defined by

$$H_{\mathrm{QM}} = -\frac{1}{2}\sum_{i}^{\mathrm{electrons}} \Delta_i - \sum_{i}^{\mathrm{electrons}}\sum_{K}^{\mathrm{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i}^{\mathrm{electrons}}\sum_{j,\,i>j}^{\mathrm{electrons}} \frac{1}{r_{ij}} + \sum_{K}^{\mathrm{nuclei}}\sum_{L,\,K>L}^{\mathrm{nuclei}} \frac{Z_K Z_L}{R_{KL}}. \tag{2}$$

The first term defines the kinetic energy of electrons i, the second term expresses the electron–nuclei attraction between electrons i and nuclei K of charges $Z_K$, the third term is the electron–electron repulsion, and the last term defines the nuclei–nuclei repulsion.
In Eq. (1), the Hamiltonian $H_{\mathrm{MM}}$ describes the classic part. As usually defined in molecular mechanics, a set of atoms interacting in this part can be seen as a set of point charges $\{Q_C\}$ in space interacting through a defined force field. For example, if we choose the AMBER force field [18] we have

$$H_{\mathrm{MM}} = \sum_{\mathrm{bonds}} k_r (r - r_0)^2 + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\mathrm{dihedrals}}\sum_{n} \frac{V_n}{2}\left[1 + \cos(n\phi - \omega)\right] + \sum_{(i,j)} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}}\right]. \tag{3}$$

The last term $H_{\mathrm{QM/MM}}$ in Eq. (1) represents the interaction between the quantum part and the classic one. It can be represented as the sum of two terms: A van der Waals term describing the nonelectrostatic interactions between quantum and classic atoms and an electrostatic term describing the interaction between a classic point charge $\{Q_C\}$ and the electrons and nuclei of the quantum part:

$$H_{\mathrm{QM/MM}} = V_{\mathrm{QM/MM}}^{\text{van der Waals}} - \sum_{i}^{\mathrm{electrons}}\sum_{C}^{\mathrm{classical}} \frac{Q_C}{r_{iC}} + \sum_{K}^{\mathrm{nuclei}}\sum_{C}^{\mathrm{classical}} \frac{Z_K Q_C}{R_{KC}}. \tag{4}$$

We can group the terms in Eqs. (2), (3), and (4) depending on whether they describe electrons on the quantum part ($H_{\mathrm{elec}}$) or not ($H_{\text{non-elec}}$). This gives

$$H_{\mathrm{elec}} = -\frac{1}{2}\sum_{i}^{\mathrm{electrons}} \Delta_i - \sum_{i}^{\mathrm{electrons}}\sum_{K}^{\mathrm{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i}^{\mathrm{electrons}}\sum_{j,\,i>j}^{\mathrm{electrons}} \frac{1}{r_{ij}} - \sum_{i}^{\mathrm{electrons}}\sum_{C}^{\mathrm{classical}} \frac{Q_C}{r_{iC}} \tag{5}$$

and

$$\begin{aligned} H_{\text{non-elec}} &= H_{\mathrm{MM}} + V_{\mathrm{QM/MM}}^{\text{van der Waals}} + \sum_{K}^{\mathrm{nuclei}}\sum_{C}^{\mathrm{classical}} \frac{Z_K Q_C}{R_{KC}} + \sum_{K}^{\mathrm{nuclei}}\sum_{L,\,K>L}^{\mathrm{nuclei}} \frac{Z_K Z_L}{R_{KL}} \\ &= H_{\mathrm{MM}} + V_{\mathrm{QM/MM}}^{\text{van der Waals}} + V_{\mathrm{QM+QM/MM}}^{\mathrm{nuclei}}. \end{aligned} \tag{6}$$

$H_{\mathrm{elec}}$ and $V_{\mathrm{QM+QM/MM}}^{\mathrm{nuclei}}$ can be computed using a standard quantum mechanics code. The term describing the electrons–classic charge interaction is incorporated into the core Hamiltonian of the quantum subsystem. Examples of QM/MM studies in the literature have used popular quantum mechanics codes such as MOPAC [19], Gaussian [20], Gamess [21], etc.
$H_{\mathrm{MM}}$ and $V_{\mathrm{QM/MM}}^{\text{van der Waals}}$ are computed using standard molecular mechanics code and are relatively easy to implement. Examples in the literature include the use of codes like AMBER [18], CHARMM [22], GROMOS [23], etc.
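To make the bookkeeping of Eq. (6) concrete, the following sketch evaluates the purely classical pieces of the energy that involve the quantum nuclei: the nuclei–point-charge Coulomb sum, the nuclei–nuclei repulsion, and a 12-6 van der Waals term of the form used in Eq. (3). It assumes atomic units and, as a simplification, a single pair of coefficients `a` and `b` shared by all QM/MM pairs. The function names and the toy geometry are hypothetical and are not taken from any of the codes cited above.

```python
import numpy as np

def pair_distances(xyz_a, xyz_b):
    """Matrix of distances between every point in xyz_a and every point in xyz_b."""
    diff = xyz_a[:, None, :] - xyz_b[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def nuclei_charge_coulomb(z, xyz_qm, q, xyz_mm):
    """Sum over K, C of Z_K Q_C / R_KC (nuclei-classic charge term of Eq. 6)."""
    r = pair_distances(xyz_qm, xyz_mm)
    return float(np.sum(np.outer(z, q) / r))

def nuclei_nuclei_repulsion(z, xyz_qm):
    """Sum over K > L of Z_K Z_L / R_KL (nuclei-nuclei term of Eq. 6)."""
    r = pair_distances(xyz_qm, xyz_qm)
    k, l = np.triu_indices(len(z), k=1)   # each pair counted once
    return float(np.sum(z[k] * z[l] / r[k, l]))

def lennard_jones_12_6(xyz_qm, xyz_mm, a, b):
    """Sum over QM/MM pairs of A/R^12 - B/R^6 with one shared (A, B) pair (assumption)."""
    r = pair_distances(xyz_qm, xyz_mm)
    return float(np.sum(a / r**12 - b / r**6))

# Toy system: two QM nuclei and one MM point charge on a line (atomic units).
z = np.array([1.0, 1.0])
xyz_qm = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
q = np.array([-0.5])
xyz_mm = np.array([[0.0, 0.0, 5.4]])

v_nuclei = nuclei_charge_coulomb(z, xyz_qm, q, xyz_mm) + nuclei_nuclei_repulsion(z, xyz_qm)
v_vdw = lennard_jones_12_6(xyz_qm, xyz_mm, a=1.0, b=1.0)
```

None of these terms depends on the electronic wave function, which is why they can be evaluated once per geometry, outside the SCF cycle.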
2.2. QM/MM INTERACTIONS
The main problem arising from the use of QM/MM methods is the calibration of the interaction between the quantum and classic parts. As stated previously, this interaction can be divided into two parts: an electrostatic and a nonelectrostatic interaction. The influence of the classic part on the reactive part is usually described using point charges and van der Waals potentials, which should reproduce quantitatively the interaction between the QM and MM parts as if the system were computed fully quantum mechanically [14, 24, 25]. The quantitative reproduction of the QM/MM interactions depends on three points: (1) the choice of the set of nonpolarizable point charges $\{Q_C\}$ or, more generally, of the MM force field [26]; (2) the choice of the van der Waals parameters to describe $V_{\mathrm{QM/MM}}^{\text{van der Waals}}$; (3) the way the classic charges polarize the quantum subsystem.
The set of point charges $\{Q_C\}$ must be chosen to reproduce the electrostatic field exerted by the MM part on the QM part. It is usually a good approximation to take the charge definition from an empirical force field and incorporate those charges into the core Hamiltonian of the quantum subsystem. This gives reliable results because the charges in molecular mechanics are defined to reproduce the electrostatic potential properly. However, it is well known that atomic charge definitions can change dramatically between different force fields, and even between different generations of the same force field [27]. These differences between sets of charges should have a nonnegligible impact on the quality of a QM/MM study. Although this point seems important, to the best of our knowledge no publication has yet examined the effect of different charge sets on the quality of a QM/MM computation in enzymatic systems.
The choice of van der Waals parameters is another crucial point in obtaining a good QM/MM interaction. In theory, a new set of parameters should be defined for each atom in the system exclusively to compute $V_{\mathrm{QM/MM}}^{\text{van der Waals}}$. In practice, one uses the van der Waals parameters from the current force field definition for each atom in the MM part and, when possible, defines a new set of van der Waals parameters for the QM atoms. Such a set of parameters is only valid at a defined QM level (e.g., AM1, PM3, 6-31G, etc.) and in conjunction with a defined force field. This approach has been used by Luque et al. [28] and Cummins et al. [29] for modeling reactions in solution, but to the best of our knowledge it has not been applied to enzymatic systems. This is understandable, as it is easier to optimize a few van der Waals parameters for a small set of atoms in solution than to optimize a full set of van der Waals parameters compatible with a given force field at a given QM level. Thus, most of the literature devoted to QM/MM studies of biochemical systems has used van der Waals parameters taken from standard force fields such as CHARMM or AMBER [30–32].
A third point needing clarification in the description of QM/MM interactions is the way the classic set of charges $\{Q_C\}$ polarizes the electronic wave function of the QM subsystem. This is usually done by adding to the core Hamiltonian of the QM part a perturbation describing the interaction between the QM electrons and the classic charges:

$$H'^{\,\mathrm{core}} = H^{\mathrm{core}} - \sum_{i}^{\mathrm{electrons}}\sum_{C}^{\mathrm{classical}} \frac{Q_C}{r_{iC}}. \tag{7}$$

At the ab initio level, an element of the core Hamiltonian matrix is expressed as

$$H'^{\,\mathrm{core}}_{\mu\nu} = \langle \mu | H'^{\,\mathrm{core}} | \nu \rangle = H^{\mathrm{core}}_{\mu\nu} - \sum_{i}\sum_{C} \left\langle \mu \left| \frac{Q_C}{r_{iC}} \right| \nu \right\rangle. \tag{8}$$

The second term is similar to the QM electron–nuclei interaction and can be computed straightforwardly in the same way. Likewise, the term $\sum_{K}^{\mathrm{nuclei}}\sum_{C}^{\mathrm{classical}} (Z_K Q_C / R_{KC})$ from Eq. (4) is computed in the same way as the QM nuclei–nuclei interactions.
However, with NDDO semiempirical methods, Luque et al. [28] have shown that electron–classic charge and nuclei–classic charge interactions should not be treated in the same way as electron–nuclei and nuclei–nuclei interactions.
In AM1 [33] or PM3 [34], the matrix element of the core Hamiltonian describing the electron–nuclei interaction between the electrons projected onto two atomic orbitals $\mu$ and $\nu$ centered on a quantum atom K and all other quantum atoms $L \neq K$ is expressed as

$$H^{\text{electrons–nuclei}}_{\mu\nu} = \sum_{L \neq K} P_{\mu\nu} Z_L (\mu\nu | s_L s_L), \tag{9}$$

where $(\mu\nu | s_L s_L)$ is a two-center two-electron integral depending on the electronic parameters $(\rho_x^K)_{x=0,1,2}$ and $(\rho_x^L)_{x=0,1,2}$. The core–core interaction between K and L is expressed as

$$V^{\text{nuclei–nuclei}} = Z_K Z_L \left[ (s_K s_K | s_L s_L)\, f(K, L) + \frac{1}{R_{KL}}\, g(K, L) \right], \tag{10}$$

with

$$f(K, L) = 1 \pm e^{-\alpha_K R_{KL}} \pm e^{-\alpha_L R_{KL}} \tag{11}$$

and

$$g(K, L) = \sum_{i} a_{i,K}\, e^{-b_{i,K}(R_{KL} - c_{i,K})^2} + \sum_{j} a_{j,L}\, e^{-b_{j,L}(R_{KL} - c_{j,L})^2}. \tag{12}$$

In Ref. [28], Luque et al. have shown that one of the following conditions must be fulfilled to reproduce correctly the electrostatic interaction between a classic atom C and a quantum atom K:
1. $(\rho_x^K)_{x=0,1,2} = (\rho_x^C)_{x=0,1,2} = 0$, $f(K, C) = 1$, and $g(K, C) = 0$ (i.e., $V_{\text{charge–nuclei}} = Z_K Q_C / R_{KC}$)
2. $(\rho_x^C)_{x=0,1,2} = 0$, $f(K, C) = 1 \pm e^{-\alpha_K R_{KC}}$, and $g(K, C) = \sum_i a_{i,K}\, e^{-b_{i,K}(R_{KC} - c_{i,K})^2}$.
As outlined above, with semiempirical models it is also necessary to reparameterize the van der Waals potentials between the classic and quantum atoms to reproduce the nonbonded interactions properly. The forms and coefficients of these new van der Waals potentials differ depending on whether solution 1 or 2 is chosen [28].
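The two solutions of Ref. [28] can be compared numerically for the core–charge analogue of Eq. (10). The sketch below is an illustration under explicit assumptions that go beyond the text: the (ss|ss) integral is taken in the Klopman–Ohno form $[R^2 + (\rho_0^K + \rho_0^C)^2]^{-1/2}$ commonly used in MNDO-type methods, the ± sign of Eq. (11) is taken as +, and every numerical parameter is a made-up placeholder rather than an AM1 or PM3 value.

```python
import math

def ss_integral(r, rho_k, rho_c):
    """(s_K s_K | s_C s_C) in a Klopman-Ohno-type form (assumed, not from the text)."""
    return 1.0 / math.sqrt(r * r + (rho_k + rho_c) ** 2)

def core_charge_energy(z_k, q_c, r, rho_k=0.0, rho_c=0.0, alpha_k=None, gaussians=()):
    """Z_K Q_C [ (ss|ss) f(K,C) + g(K,C)/R ], following the structure of Eq. (10).

    alpha_k=None means f = 1 (solution 1); otherwise f = 1 + exp(-alpha_k R),
    where the '+' sign is an assumption about the +/- of Eq. (11).
    gaussians is an iterable of (a, b, c) triples defining g; empty means g = 0.
    """
    f = 1.0 if alpha_k is None else 1.0 + math.exp(-alpha_k * r)
    g = sum(a * math.exp(-b * (r - c) ** 2) for a, b, c in gaussians)
    return z_k * q_c * (ss_integral(r, rho_k, rho_c) * f + g / r)

# Solution 1: all rho set to zero, f = 1, g = 0, i.e. plain Coulomb Z_K Q_C / R.
v1 = core_charge_energy(z_k=4.0, q_c=-0.8, r=3.0)

# Solution 2: rho_C = 0, while atom K keeps its own (placeholder) parameters.
v2 = core_charge_energy(z_k=4.0, q_c=-0.8, r=3.0, rho_k=0.8,
                        alpha_k=2.5, gaussians=[(0.01, 5.0, 1.6)])
```

Under these assumptions both solutions tend to the same Coulomb limit at large separation and differ only at short range, which is consistent with the statement that the van der Waals terms must be refitted separately for each choice.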
3. Cutting Covalent Bonds
3.1. DIFFERENT SOLUTIONS
The main concepts of QM/MM methods defined in the preceding section (i.e., splitting a molecular system into two parts and ensuring a proper interaction between them) have proved successful, especially in the study of chemical reactions in solution. Usually, in these systems the reactants (a set of small molecules) are described by quantum mechanics, whereas the solvent (water, methanol, etc.) is described by molecular mechanics using polarizable or nonpolarizable [15, 32] force fields. There, the delineation between the quantum and classic parts is clear, as each molecule belongs exclusively to one of the subsystems. However, in enzymatic systems composed of an enzyme, its substrate, sometimes a cofactor, and the surrounding solvent, it is not possible to include the whole protein in the quantum subsystem because of the computational cost. It is therefore necessary to define a small subset of atoms (i.e., the reactive ones) that will be incorporated into the quantum part, whereas the others will be part of the classic subsystem. Some covalent bonds then lie at the frontier between the classic and quantum parts. They link what we call a quantum frontier atom, denoted X in the rest of this article, with a classic frontier atom, denoted Y (see Fig. 1). A problem occurs at this frontier because the electron of X involved in the covalent bond with Y is not paired with any other electron, since in molecular mechanics the electrons (of Y) are not explicitly represented. Thus, this electron needs a special treatment.
FIGURE 1. Example of the problem of cutting covalent bonds in QM/MM methods: A covalent C–C bond at the frontier between a classic and a quantum part.
Several solutions have been suggested in the literature. They can be divided into two main categories: those that add an atom or pseudoatom to fill the valencies of the quantum frontier atoms (e.g., the link atom method [14], the connection atom method [35], etc.), and those that deal specifically with the frontier bond orbital by trying to compute its main characteristics directly from known parameters (e.g., the local self-consistent field method [36–38] and the generalized hybrid orbitals method [39]).
3.1.1. Link Atom Method
This is the first and simplest method to have been implemented. It consists of adding a monovalent atom, the so-called link or dummy atom, along the X–Y bond to fill the valency of the quantum frontier atom X. Usually, this link atom is a hydrogen, but some implementations use a halogen such as fluorine or chlorine [40]. There has been some debate about whether this dummy atom should interact with the classic part. Today, it seems accepted that this dummy atom should interact with the classic part like the other quantum atoms, with the notable exception of the few closest classic atoms [41–43]. Another point still under debate is whether the link atom should be free to move or should be fixed at 1 Å from X along the X–Y bond. To the best of our knowledge, there has not been any detailed study on this topic. However, both solutions have advantages and disadvantages: When the link atom is allowed to move during a geometry optimization, the perturbation due to this atom should be reduced, as the atom can adopt an optimal conformation, different from that of the classic frontier atom it represents. But this solution could be a problem during molecular dynamics, where the free dummy atom introduces supplementary degrees of freedom and frequencies that can be problematic in statistical simulations. Conversely, fixing the link atom along the frontier bond does not put the monovalent atom in its optimal conformation and, thus, introduces a stronger perturbation, but it adds neither supplementary degrees of freedom nor new frequencies in a statistical simulation.
Overall, the main advantages of the link atom method are its easy implementation in current quantum chemistry codes and its reliability in providing accurate answers to chemical problems as long as the frontier bonds are placed sufficiently far away from the reactive atoms. This is why it has been used so much in QM/MM computational studies of enzymatic systems. However, its main disadvantages are the supplementary degrees of freedom it implies and the perturbation it adds to the quantum calculation because, for example, a C–H bond cannot exactly replace a C–C covalent bond.
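The fixed-position variant of the link atom discussed above can be sketched in a few lines. The function below places a link atom at a fixed distance from X along the X–Y bond; the default of 1 Å is the value quoted in the text, while the function name and the sample coordinates are hypothetical.

```python
import numpy as np

def place_link_atom(x_pos, y_pos, distance=1.0):
    """Return the position at `distance` (in Å) from X along the X->Y direction."""
    x_pos = np.asarray(x_pos, dtype=float)
    y_pos = np.asarray(y_pos, dtype=float)
    bond = y_pos - x_pos
    length = np.linalg.norm(bond)
    if length == 0.0:
        raise ValueError("X and Y coincide; the bond direction is undefined")
    return x_pos + distance * bond / length

# Hypothetical C-C frontier bond of 1.53 Å along an arbitrary direction.
x_atom = np.array([0.0, 0.0, 0.0])
y_atom = 1.53 * np.array([1.0, 2.0, 2.0]) / 3.0
link = place_link_atom(x_atom, y_atom)
```

Because the link atom position is a function of the X and Y coordinates only, this variant adds no degrees of freedom, at the price of the stronger perturbation described above.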
3.1.2. Connection Atom Method
To solve the problems arising with the link atom method, some authors have suggested replacing the classic frontier atom Y in the quantum calculation by a monovalent pseudoatom parameterized to reproduce the behavior of the X–Y bond. Antes et al. called this dummy atom the connection atom and developed AM1 and PM3 semiempirical parameters for a pseudoatom mimicking the behavior of a methyl group [35, 42]. Zhang et al. [44] used an equivalent approach to develop a density functional theory (DFT) pseudopotential for a monovalent atom capable of properly representing the properties of a covalent frontier bond.
The main advantage of the connection atom method is that it avoids adding a supplementary atom to the system because the connection atom and the classic frontier atom are one and the same. However, the main disadvantage of this approach is the need to reparameterize each type of covalent frontier bond (e.g., C–C, C–N, C–O, etc.) at each quantum level (AM1, PM3, B3LYP, etc.), which is a long and tedious task.
3.1.3. LSCF Method
To avoid the use of supplementary atoms, Rivail and coworkers developed the so-called LSCF method [36–38]. In this formalism, derived from the original work of Warshel and Levitt, the two electrons of the frontier bond are described by a strictly localized bond orbital (SLBO). Assuming this SLBO is far enough from the reactive center of the system (i.e., at least four covalent bonds away), its electronic properties (e.g., its electronic density, its hybridization, etc.) can be considered constant during the chemical reaction. Using model systems and the assumption of transferability of bond properties, as used in molecular mechanics, it is possible to determine the representation of the SLBO in the atomic orbital basis set of the quantum part. With this representation frozen, the molecular orbitals that describe the rest of the quantum subsystem and are orthogonal to the SLBOs can then be generated using a local self-consistent procedure.
This method has been implemented at both the semiempirical [36] and ab initio levels [38]. Its main advantage is that it avoids the use of dummy atoms and properly describes the chemical properties of the frontier bond. However, it is more difficult to implement, especially at the ab initio level [45]. The easier semiempirical implementation, its qualities, and its shortcomings are addressed in the next section.
3.1.4. Generalized Hybrid Orbitals Method
As an extension of the LSCF method, Gao and coworkers [39] developed the generalized hybrid orbitals (GHO) method, in which the classic frontier atom is described by a set of orbitals divided into two sets of auxiliary and active orbitals. The latter set is included in the SCF calculation, while the former generates an effective core potential for the frontier atom. Parameters for classic frontier atoms have been computed at the semiempirical level, but to the best of our knowledge no DFT or ab initio extension of the GHO method has been proposed. In our opinion, the advantages and disadvantages of the GHO approach are similar to those of the LSCF method.
However, some differences between the two approaches can be noted: (1) for an identical QM/MM system and partitioning, the semiempirical molecular orbitals of the QM fragment are described in the LCAO approximation with more basis functions in the GHO method than in the LSCF method (two more hybrid orbitals per frontier bond); (2) the LSCF method only modifies the SCF procedure [36], whereas the GHO method introduces new semiempirical parameters to describe the auxiliary and active orbitals [39, 46].
3.2. SEMIEMPIRICAL LSCF: CLOSER LOOK
In QM/MM studies of enzymatic systems, semiempirical levels are often used to describe the quantum subsystem, both because the latter can be large, given the need to cut covalent bonds far away from the reactive center, and because semiempirical calculations require far fewer computational resources than ab initio and DFT calculations. We address below the semiempirical LSCF [36, 37, 43] formalism as well as its performance for reactivity on small systems.
3.2.1. Semiempirical LSCF Procedure
In the semiempirical approximation, only valence shell electrons are treated and overlap integrals between atomic orbitals centered on different atoms are taken to be zero. As defined previously, the two electrons of a frontier bond X–Y are represented by an SLBO that is here described by a linear combination of two hybrid orbitals (HOs), one centered on X ($|l\rangle$) and the other centered on Y. To generate molecular orbitals (MOs) orthogonal to the SLBOs, it is sufficient that the hybrid orbital $|l\rangle$ belong to a set of four orthogonal hybrid orbitals centered on X. The three other HOs can, therefore, be used in conjunction with the atomic orbitals (AOs) of the other atoms of the quantum part to form a basis set of orbitals from which the molecular orbitals of the quantum subsystem are built.
The $|l\rangle$ hybrid orbital can be expressed as a linear combination of the AOs of X:

$$|l\rangle = a_{l1}|s\rangle + a_{l2}|x\rangle + a_{l3}|y\rangle + a_{l4}|z\rangle. \tag{13}$$

The parameters $a_{l1}$, $a_{l2}$, $a_{l3}$, and $a_{l4}$ must fulfill the following requirements:
▪ $|l\rangle$ is normalized.
▪ $|l\rangle$ contains a fraction of the two electrons involved in the corresponding SLBO. This introduces a parameter called $P_{ll}$, which is the electronic density of $|l\rangle$. This parameter is close to 1.0 for a covalent nonpolarized C–C bond.
▪ $|l\rangle$ must be directed toward Y.
▪ The $|s\rangle$ contribution (the hybridization) is supposed to be a transferable property of the X–Y bond. Thus, $a_{l1}$ is a precomputed parameter.
The hybrid orbital $|l\rangle$ is called the frozen orbital and does not enter the SCF calculation. It is, therefore, completely defined by its direction, the contribution $a_{l1}$ of the $|s\rangle$ AO, and its electronic density $P_{ll}$. The transformation of the four AOs $|s\rangle$, $|x\rangle$, $|y\rangle$, and $|z\rangle$ into four HOs $|i\rangle$, $|j\rangle$, $|k\rangle$, and $|l\rangle$ is made using a T matrix built from the combination of the basis set transformation in the diatomic reference frame (i.e., AOs into HOs) and the orthogonal transformation from the diatomic reference frame into the laboratory reference frame.
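The requirements above fix the frozen hybrid once its s coefficient and the X→Y direction are known, and this can be sketched numerically. The code below builds the coefficients of Eq. (13) and then completes them to an orthogonal 4 × 4 matrix. It uses a generic QR completion rather than the diatomic-frame construction described in the text, so the three remaining hybrids are only one valid orthonormal choice; the function names and sample values are hypothetical.

```python
import numpy as np

def frozen_hybrid(a_l1, x_pos, y_pos):
    """Coefficients (s, x, y, z) of |l>: fixed s weight, p part pointing from X to Y."""
    direction = np.asarray(y_pos, dtype=float) - np.asarray(x_pos, dtype=float)
    direction = direction / np.linalg.norm(direction)
    p_weight = np.sqrt(1.0 - a_l1 ** 2)   # normalization: a_l1^2 + |p part|^2 = 1
    return np.concatenate(([a_l1], p_weight * direction))

def hybrid_transformation(l_coeffs):
    """Orthogonal 4x4 matrix whose last column is |l>; columns 0-2 span the other HOs."""
    q, _ = np.linalg.qr(l_coeffs.reshape(4, 1), mode="complete")
    first = q[:, 0]
    if first @ l_coeffs < 0.0:            # QR may return -|l>; restore the sign
        first = -first
    return np.column_stack([q[:, 1:], first])

# Hypothetical frontier bond along z with an s coefficient of 0.5.
l_vec = frozen_hybrid(0.5, [0.0, 0.0, 0.0], [0.0, 0.0, 1.5])
t_matrix = hybrid_transformation(l_vec)
```

Any orthogonal completion gives the same active space, because steps of the procedure that follow only require the three remaining hybrids to be orthogonal to the frozen one.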
For a quantum subsystem containing N AOs and L frontier bonds, the LSCF procedure is divided into the following steps:
1. Choose an initial density matrix $\mathbf{P}$ of size N × N.
2. Build the $\mathbf{T}$ matrix.
3. Build the Fock matrix $\mathbf{F}$ in the AO basis set.
4. Transform the Fock matrix into the hybrid orbital set: $\mathbf{F}' = \mathbf{T}^{T}\mathbf{F}\mathbf{T}$.
5. Remove from $\mathbf{F}'$ the rows and columns corresponding to the $|l\rangle$ HOs (i.e., they are assumed to be zero). The size of $\mathbf{F}'$ is then (N − L) × (N − L).
6. Compute the N − L eigenvalues of $\mathbf{F}'$.
7. Build the density matrix $\mathbf{P}'$ in the HO basis set.
8. Add the parameter element $P_{ll}$ to $\mathbf{P}'$ to form an N × N matrix in the HO basis set.
9. Back-transform the density matrix into the atomic orbital set: $\mathbf{P} = \mathbf{T}\mathbf{P}'\mathbf{T}^{T}$.
10. Go back to step 3 unless convergence is reached.
FIGURE 2. LSCF test model: histidine + aspartic acid.
TABLE I
Influence of the parameters Pll and as1 on LSCF calculations.
LSCF parameters
as1      Pll      ΔE (kcal/mol)
0.50     0.70     −27.1
0.50     0.80     −24.8
0.50     0.90     −22.6
0.50     1.00     −20.2
0.50     1.10     −17.8
0.50     1.20     −15.4
0.50     1.30     −13.4
0.50     1.40     −10.8
0.30     1.00     −20.3
0.35     1.00     −20.3
0.40     1.00     −20.2
0.45     1.00     −20.2
0.50     1.00     −20.2
0.55     1.00     −20.2
0.60     1.00     −20.2
0.65     1.00     −20.2
Full quantum calculations: ΔE = −19.0 kcal/mol.
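Steps 4 to 9 of the LSCF procedure amount to a projected diagonalization, which the following sketch illustrates for a closed-shell system in an orthonormal basis. It is a minimal illustration, not the implementation of Refs. [36–38]: the Fock matrix is supplied as a fixed input (a real code would rebuild it from the density at step 3 of every cycle), and the function name, argument names, and random test matrices are hypothetical.

```python
import numpy as np

def lscf_density_update(fock, t, frozen, p_ll, n_occ):
    """One pass through steps 4-9: returns the new AO density matrix.

    fock   : (N, N) symmetric Fock matrix in an orthonormal AO basis
    t      : (N, N) orthogonal AO -> HO transformation
    frozen : list of indices of the frozen hybrid orbitals |l> in the HO basis
    p_ll   : electronic population assigned to each frozen orbital
    n_occ  : number of doubly occupied MOs built from the active orbitals
    """
    n = fock.shape[0]
    active = np.setdiff1d(np.arange(n), frozen)
    f_ho = t.T @ fock @ t                                   # step 4
    f_act = f_ho[np.ix_(active, active)]                    # step 5
    _, c = np.linalg.eigh(f_act)                            # step 6
    c_occ = c[:, :n_occ]
    p_ho = np.zeros((n, n))
    p_ho[np.ix_(active, active)] = 2.0 * c_occ @ c_occ.T    # step 7
    p_ho[frozen, frozen] = p_ll                             # step 8
    return t @ p_ho @ t.T                                   # step 9

# Toy input: random symmetric "Fock" matrix and random orthogonal T (N = 4, L = 1).
rng = np.random.default_rng(0)
m = rng.standard_normal((4, 4))
fock = 0.5 * (m + m.T)
t, _ = np.linalg.qr(rng.standard_normal((4, 4)))
density = lscf_density_update(fock, t, frozen=[3], p_ll=1.0, n_occ=1)
```

In this sketch the trace of the resulting density equals the number of active electrons plus the frozen population, and the frozen orbital carries exactly `p_ll` electrons regardless of the Fock matrix, which is the defining property of the frozen orbital.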
3.2.2. Influence of the Semiempirical LSCF Parameters
In the semiempirical LSCF method, two parameters, al1 and Pll, are involved. Several tests have been made to evaluate the influence of these parameters on the quality of the LSCF results [47]. We show here some results on a test system composed of a histidine and an aspartic acid, as represented in Figure 2. The starting geometry for this test system has been taken from crystallographic data of the catalytic site of trypsin. QM/MM frontier bonds are located at the Cα–Cβ covalent bond of histidine and between the methyl groups and the side peptide bonds of aspartic acid. A proton can be transferred between histidine and aspartic acid. Table I reports the energetics associated with this proton transfer at the AM1 semiempirical level and its variation with the LSCF parameters Pll or as1 of the Cα–Cβ bond of the histidine. Here, the two peptide backbones have been kept fixed and the number of degrees of freedom is identical in all test calculations.
VOL. 93, NO. 3
ENZYMATIC REACTION PATHWAYS
TABLE II ______________________________________
Mulliken charges on histidine for two different values of as1.

        Mulliken charges
Atom    as1 = 0.30    as1 = 0.70
Cβ      -0.170        -0.021
Hβ1      0.148         0.088
Hβ2      0.155         0.097
Cγ      -0.035        -0.066
Nδ      -0.109        -0.111
Hδ1      0.375         0.375
Cε       0.057         0.059
Hε1      0.299         0.300
Nε      -0.185        -0.185
Hε2      0.284         0.285
Cδ      -0.123        -0.125
Hδ2      0.212         0.211

Table I shows that the larger the electronic population of the frozen hybrid orbital, the smaller the
difference in energy between the two possible chemical states (i.e., the proton either on the histidine
or on the aspartic acid). This is easily explained: increasing the value of Pll at the Cα-Cβ frontier
bond increases the electronic density in the histidine side chain, so the imidazole ring becomes a weaker
proton donor. If one performs a localization of the molecular orbitals, one finds a value of 0.99 for the
Pll parameter. Likewise, in most covalent peptidic bonds the Pll parameter lies between 0.95 and 1.05,
which, according to Table I, gives an uncertainty of ±1.2 kcal/mol on the proton transfer energy of our
test system. This is an acceptable result (~5% error) compared with the uncertainty of AM1 semiempirical
calculations relative to full ab initio calculations.
The variations of the as1 parameter in Table I show that this parameter does not directly influence the
energetics of the reaction pathway. In fact, its influence is localized on the few atoms close to the
frontier bond, as shown in Table II. This can be explained as follows: increasing the s character of the
frozen hybrid orbital "moves" the mean position of the frozen electron toward the quantum part and thus
increases its interaction with it. This influence is local: it is no longer noticeable beyond a distance
of three covalent bonds, and therefore it does not perturb the reactivity of the system.
Overall, these results, together with other tests performed in Rivail's group, show that the influence
of the as1 parameter is negligible along a reaction path and that the electronic population of the frozen
orbitals, Pll, does not induce a clear change in energies for weakly polarized bonds (0.98 < Pll < 1.02).
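The ±1.2 kcal/mol figure quoted for 0.95 < Pll < 1.05 can be checked directly against Table I. The short sketch below is only an illustration: it assumes that ΔE varies linearly between neighboring tabulated Pll values (the rows with as1 = 0.50), and the function names are ours, not part of any LSCF code.

```python
# Table I, rows with as1 = 0.50: (Pll, Delta E in kcal/mol)
PLL_SCAN = [
    (0.70, -27.1), (0.80, -24.8), (0.90, -22.6), (1.00, -20.2),
    (1.10, -17.8), (1.20, -15.4), (1.30, -13.4), (1.40, -10.8),
]

def interpolate_delta_e(pll, scan=PLL_SCAN):
    """Piecewise-linear interpolation of Delta E between tabulated Pll values."""
    for (x0, y0), (x1, y1) in zip(scan, scan[1:]):
        if x0 <= pll <= x1:
            return y0 + (pll - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("Pll lies outside the tabulated range")

def uncertainty(pll_low, pll_high, pll_ref=1.00):
    """Half-width of the Delta E interval, absolute and relative to Delta E(pll_ref)."""
    spread = interpolate_delta_e(pll_high) - interpolate_delta_e(pll_low)
    half_width = 0.5 * abs(spread)
    return half_width, half_width / abs(interpolate_delta_e(pll_ref))

half_width, relative = uncertainty(0.95, 1.05)
print(f"+/-{half_width:.2f} kcal/mol, {100 * relative:.1f}% of Delta E at Pll = 1.00")
```

Under this linear assumption the half-width follows from the local slope of about 2.4 kcal/mol per 0.1 unit of Pll, which is consistent with the uncertainty stated in the text.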
4. Using the Potential Energy Surface
4.1. DYNAMICS
Within the Born–Oppenheimer approximation, the potential energy surface (PES) of an enzymatic reaction provides the total energy of each nuclear configuration of the substrate–enzyme complex as if all the nuclei were fixed at that position. Nuclei are actually moving, and the nuclear kinetic energy has to be introduced to understand enzymatic reactivity. Molecular dynamics simulations therefore have to be carried out to sample the configuration space extensively, exploring the regions of the PES around the minimum energy structures that represent possible reactant and product complexes. However, both energetic and entropic factors make it impossible in practice to generate, by molecular dynamics in a canonical ensemble at a given temperature, reactive trajectories going from the reactant region to the product region. This is true even for gas-phase chemical reactions involving just half a dozen nuclei and energy barriers of more than a few kcal/mol. Canonical rate constants k(T) then have to be calculated by means of transition-state theory, a statistical approach to the real dynamics. According to variational transition-state theory [48, 49], the canonical rate constant depends on the generalized free energy barrier, that is, the maximum value of the generalized free energies associated with a set of dividing surfaces built along a suitable reaction pathway taken as a reference. The generalized free energies can be obtained, for instance, from molecular dynamics simulations using the umbrella sampling technique with an adequate biasing potential, or by means of statistical perturbation theory. Once the reactant and the product have been located, a progress coordinate connecting them, based on suitable internal coordinates, can be adopted to define the reaction pathway. Another approach is the following [4, 50, 51]: knowing the valence bond structures of both reactant and product, a mapping potential that is a function of the diagonal elements of an empirical valence bond Hamiltonian can be used to define the reaction pathway through a collective reaction coordinate analogous to the solvent coordinate used in Marcus theory for electron transfer reactions. In all cases, nuclear quantum effects and corrections accounting for recrossing of the dividing surface can be introduced in different ways.
INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY
MONARD ET AL.
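As a rough numerical illustration of how a rate constant follows from a free energy barrier, the sketch below evaluates the standard transition-state theory expression k(T) = γ (k_B T / h) exp(−ΔG‡ / RT). It is not the variational procedure of Refs. [48, 49]: the free energy profile is invented for the example, the barrier is simply taken as the maximum of the profile relative to its first point, and the transmission factor γ is a user-supplied stand-in for recrossing and quantum corrections.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s
R_GAS = 1.987204259e-3      # gas constant, kcal/(mol K)

def generalized_barrier(free_energy_profile):
    """Maximum free energy along the path, relative to the first (reactant) point."""
    values = [g for _, g in free_energy_profile]
    return max(values) - values[0]

def tst_rate_constant(barrier, temperature=298.15, transmission=1.0):
    """Rate constant in 1/s for a barrier in kcal/mol (unimolecular step)."""
    prefactor = K_B * temperature / H_PLANCK
    return transmission * prefactor * math.exp(-barrier / (R_GAS * temperature))

# Hypothetical profile: (reaction coordinate, generalized free energy in kcal/mol)
profile = [(-1.0, 0.0), (-0.5, 6.2), (0.0, 14.8), (0.5, 9.1), (1.0, -3.4)]
barrier = generalized_barrier(profile)
rate = tst_rate_constant(barrier)
print(f"barrier = {barrier:.1f} kcal/mol, k(298.15 K) = {rate:.3g} 1/s")
```

Because the barrier enters exponentially, an error of about 1.4 kcal/mol in ΔG‡ changes the rate constant by roughly one order of magnitude at room temperature, which is why the choice of reference path discussed next matters.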
Although the free energy approaches described above are fruitful, their practical implementation is in general based on a reference path constructed from information about the reactant and product only. This procedure can lead to inaccurate results. Because of the complexity of enzymatic reactions, the real reaction pathway (the one joining reactant and product through the set of stationary points involved in the mechanism) can differ from the one that is apparent at first glance. An enzymatic reaction can actually take place through several parallel and kinetically competitive channels, each consisting of multiple steps and involving several intermediates between reactant and product, which leads to reaction paths that are unexpected a priori. Consequently, an exploration of the corresponding PES, to locate the set of stationary points that connect the reactant and the product along the real reaction pathway, is highly recommended prior to the free energy calculations. The set of dividing surfaces should then be built along a path close to that real reaction pathway. In the rest of this section we review different methods to explore a QM/MM PES in search of stationary points. Once the stationary points have been located, the associated reaction pathway can be built.
4.2. STATICS
As noted above, any dynamic treatment should be preceded by a static study. Several problems arise in locating stationary points in systems where thousands of degrees of freedom must be taken into account. Besides the high computational effort required, which will be discussed later, a more general question appears. In a flexible system, many different stationary points, and reaction paths connecting them, exist. Some of them are chemically equivalent (differing only in noncrucial configurations of the solvent or the enzymatic environment), while others are substantially different, and their study leads to different results.
4.2.1. Algorithms for Locating Minimum
Energy Structures
To identify the reactants, products, and possible intermediates, a geometry optimization (minimization of the gradient norm) first has to be performed. An important aspect is the choice of the starting structure for such a minimization. If we start from an enzyme structure obtained experimentally, usually from X-ray crystallography or NMR spectroscopy, we will fall into a minimum close to this experimental structure, but perhaps even hundreds of kcal/mol higher in energy than the absolute minimum of the PES determined for our molecular model.
On the contrary, if we perform this minimization after relaxing the system by running molecular dynamics, simulated annealing, or other available algorithms, we will reach a structure that is more stable in our model but perhaps geometrically far from the experimental one. Both strategies have been used in the literature. However, we must keep in mind that the deepest minimum will not always be a representative structure of the reactants' configuration.
Several procedures are used for the optimization of molecular structures. Systems to which molecular quantum chemistry is usually applied have no more than about 100 atoms. In this case the most common algorithms are those using second derivatives, or approximate second derivatives, of the energy. The most popular methods are quasi-Newton–Raphson [52], rational function optimization (RFO) [53, 54], and direct inversion in the iterative subspace (DIIS) [55].
The simplest second derivative method is Newton–Raphson. In a system with N degrees of freedom, a quadratic Taylor expansion of the potential energy about the point $q_k$ is made, where the subscript k stands for the step number along the optimization:
$$E(q_k + \Delta q_k) = E(q_k) + g_k^{T} \Delta q_k + \frac{1}{2} \Delta q_k^{T} H_k \Delta q_k. \tag{14}$$
The vector $\Delta q_k = (q_{k+1} - q_k)$ describes the displacement from the reference geometry $q_k$ to the desired new geometry $q_{k+1}$, $g_k$ is the first derivative vector (gradient) at the point $q_k$, and $H_k$ is the second derivative matrix (Hessian) at the same geometry.
Under the approximation of a purely quadratic PES, and imposing the stationary-point condition (zero gradient) at the new geometry, we obtain the Newton–Raphson equation, which predicts the displacement needed to reach the stationary point in just one step:
$$\Delta q_k = -H_k^{-1} g_k. \tag{15}$$
Because real PESs are not quadratic, in practice an iterative process is needed and several steps are required to reach the stationary point. The Hessian should then be calculated at every step, which is computationally very demanding. A variation on the Newton–Raphson method is the family of quasi-Newton–Raphson methods, in which an approximate Hessian matrix $B_k$ (or its inverse) is gradually updated using the gradient and displacement vectors of the previous steps.
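The iterative use of Eq. (15) can be illustrated with a minimal sketch on a two-dimensional model surface. The quartic test function, its analytic gradient and Hessian, and the convergence threshold are all assumptions chosen for the example; the linear system of the Newton–Raphson step is solved directly rather than by inverting the Hessian.

```python
import numpy as np

def gradient(q):
    """Gradient of E = u^4 + u^2 + 2 v^2 + u v, with u = x - 1 and v = y + 0.5."""
    u, v = q[0] - 1.0, q[1] + 0.5
    return np.array([4.0 * u**3 + 2.0 * u + v, u + 4.0 * v])

def hessian(q):
    """Analytic second derivatives of the same model energy."""
    u = q[0] - 1.0
    return np.array([[12.0 * u**2 + 2.0, 1.0], [1.0, 4.0]])

def newton_raphson(q0, gradient, hessian, gtol=1e-8, max_steps=50):
    """Repeat dq_k = -H_k^{-1} g_k until the gradient norm falls below gtol."""
    q = np.asarray(q0, dtype=float)
    for step in range(max_steps):
        g = gradient(q)
        if np.linalg.norm(g) < gtol:
            return q, step
        q = q + np.linalg.solve(hessian(q), -g)   # solves H_k dq_k = -g_k
    raise RuntimeError("Newton-Raphson did not converge")

q_min, n_steps = newton_raphson([3.0, 2.0], gradient, hessian)
print(q_min, n_steps)
```

Because the model surface is not quadratic, several steps are needed, and each one requires a fresh Hessian; this is the cost that quasi-Newton–Raphson updates are designed to avoid.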
While standard Newton–Raphson is based on the optimization of a quadratic model, replacing this quadratic model by a rational function approximation gives the RFO method:
$$E(q_k + \Delta q_k) - E(q_k) \cong \frac{\dfrac{1}{2} \begin{pmatrix} 1 & \Delta q_k^{T} \end{pmatrix} \begin{pmatrix} 0 & g_k^{T} \\ g_k & B_k \end{pmatrix} \begin{pmatrix} 1 \\ \Delta q_k \end{pmatrix}}{\begin{pmatrix} 1 & \Delta q_k^{T} \end{pmatrix} \begin{pmatrix} 1 & 0^{T} \\ 0 & S_k \end{pmatrix} \begin{pmatrix} 1 \\ \Delta q_k \end{pmatrix}}. \tag{16}$$
The numerator in Eq. (16) is the quadratic model of Eq. (14). The matrix in this numerator is the so-called augmented Hessian (AH). $B_k$ is the Hessian (analytic or approximate). $S_k$ is a symmetric matrix that has to be specified but is normally taken as the unit matrix I. The solution of the RFO equation is obtained by diagonalizing the AH matrix, that is, by solving the (N + 1)-dimensional eigenvalue equation
$$\begin{pmatrix} 0 & g_k^{T} \\ g_k & B_k \end{pmatrix} v_\theta^{(k)} = \lambda_\theta^{(k)} v_\theta^{(k)}, \qquad \forall\, \theta = 1, \ldots, N + 1, \tag{17}$$
and then the displacement vector $\Delta q_k$ for the kth step is evaluated as
$$\Delta q_k = \frac{1}{v_{1,\theta}^{(k)}} v_\theta^{\prime (k)}, \tag{18}$$
where
$$\left( v_\theta^{\prime (k)} \right)^{T} = \left( v_{2,\theta}^{(k)}, \ldots, v_{N+1,\theta}^{(k)} \right). \tag{19}$$
In Eq. (19), θ = 1 if one is interested in locating a minimum and θ = 2 for a transition structure. As the optimization process converges, $v_{1,\theta}^{(k)}$ tends to 1 and $\lambda_\theta^{(k)}$ tends to 0.
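Equations (17) to (19) translate into a few lines of linear algebra. The sketch below assumes $S_k = I$, builds the augmented Hessian explicitly, and diagonalizes it in full, which is only sensible for small systems; the function name and the two-dimensional test data are ours. The `root` argument is zero-based, so `root=0` corresponds to θ = 1 (minimum) and `root=1` to θ = 2.

```python
import numpy as np

def rfo_step(g, B, root=0):
    """Return (displacement, eigenvalue) from the augmented Hessian with S_k = I."""
    n = g.size
    augmented = np.zeros((n + 1, n + 1))
    augmented[0, 1:] = g
    augmented[1:, 0] = g
    augmented[1:, 1:] = B
    eigenvalues, eigenvectors = np.linalg.eigh(augmented)   # ascending eigenvalues
    v = eigenvectors[:, root]
    return v[1:] / v[0], eigenvalues[root]                  # Eq. (18)

# Model quadratic surface E = 0.5 q^T B q, so that g = B q.
B = np.array([[1.0, 0.0], [0.0, 3.0]])
q = np.array([0.2, -0.1])
g = B @ q
dq, lam = rfo_step(g, B)
print(dq, lam)
```

Writing out the two block rows of Eq. (17) shows that the step satisfies $(B_k - \lambda I)\,\Delta q_k = -g_k$, that is, a Newton–Raphson step with a shifted Hessian; as λ tends to 0 near convergence the shift vanishes.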
For quasi-Newton–Raphson and RFO methods, the approximate Hessian matrix is updated at every step from the information of the previous steps:
$$B_{k+1} = B_0 + \sum_{i=0}^{k} \left[ j_i u_i^{T} + u_i j_i^{T} - \left( j_i^{T} \Delta q_i \right) u_i u_i^{T} \right], \qquad k = 0, 1, \ldots, \tag{20}$$
where $j_i = D_i - A_i$, $D_i = g_{i+1} - g_i$, $A_i = B_i \Delta q_i$, and $u_i = M_i \Delta q_i / (\Delta q_i^{T} M_i \Delta q_i)$. Different choices of the matrix $M_i$ lead to different Hessian update formulae. In particular, for the BFGS update $M_i = a_i B_{i+1} + b_i B_i$ for some selected positive scalars $a_i$ and $b_i$. For the Powell update the matrix $M_i$ is equal to the unit matrix I.
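One term of the sum in Eq. (20) can be written as a small function. The sketch below is illustrative only: it uses dense matrices, defaults to the Powell choice $M_i = I$, and checks on an invented quadratic surface that the updated matrix reproduces the observed gradient change along the last displacement (the secant condition), which follows from $u_i^{T} \Delta q_i = 1$.

```python
import numpy as np

def update_hessian(B, dq, g_old, g_new, M=None):
    """Add one term of Eq. (20) to B; M=None selects M_i = I (Powell update)."""
    M = np.eye(dq.size) if M is None else M
    j = (g_new - g_old) - B @ dq                 # j_i = D_i - A_i
    u = M @ dq / (dq @ M @ dq)                   # u_i
    return B + np.outer(j, u) + np.outer(u, j) - (j @ dq) * np.outer(u, u)

# Model surface with constant Hessian H_true, so that g(q) = H_true q.
H_true = np.array([[2.0, 0.5], [0.5, 1.0]])
q0 = np.array([0.4, -0.3])
dq = np.array([0.3, -0.2])
B0 = np.eye(2)                                   # unit matrix as initial guess
B1 = update_hessian(B0, dq, H_true @ q0, H_true @ (q0 + dq))
print(B1)
```

The update keeps the matrix symmetric but, with the Powell choice, does not force it to be positive definite, a property that becomes relevant for transition-state searches.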
These methods are effective and can reach a stationary point in a few steps. This is important when the energy evaluation is expensive, as in systems described by a QM PES. On the other hand, systems treated with a molecular mechanics potential usually have thousands of atoms, but the energy and gradient evaluation is computationally much cheaper than with quantum potentials. The limiting factor is then the storage and manipulation of a Hessian for thousands of moving atoms. This is the main reason why such minimizations are carried out with algorithms that do not require the evaluation of a Hessian matrix, although they are less efficient than quasi-Newton–Raphson or RFO methods. Steepest descent and conjugate gradient-like algorithms are examples of methods that only need to store the position and gradient vectors and therefore have low computer memory requirements.
In enzymatic systems treated with QM/MM potentials, thousands of atoms are moved and the energy evaluation for each nuclear configuration is CPU time demanding. As mentioned above, the size of the system rules out a standard quasi-Newton–Raphson method, because a high-dimensional Hessian matrix cannot be manipulated: this usually implies an O(N^3) diagonalization and O(N^2) storage. Nor can we use a conjugate gradient procedure, because it needs too many optimization steps (i.e., energy and gradient evaluations) to converge. We therefore need a method efficient enough to converge in a few optimization steps while avoiding the use of a large amount of computer memory.
Different kinds of methods, explained below, are used to solve this problem: limited memory, adopted basis Newton–Raphson (ABNR), truncated Newton–Raphson, and coupled methods. All of them are based on the Newton–Raphson equation and have in common that they avoid manipulation of the full Hessian.
4.2.1.1. Limited Memory. A first solution is the so-called limited memory methodology. In this case a quasi-Newton–Raphson algorithm is used. However, the inverse of the Hessian matrix is never built; instead, the product of the inverse Hessian and the gradient is computed directly, so no Hessian diagonalization is required. What makes this method powerful is that only the information of the last m steps is used to update this matrix product. In this way only the geometries and gradients of these last steps have to be stored. When a BFGS [56] update formula is used, this procedure is called L-BFGS [57]. The unit matrix can be used as the initial Hessian for the minima search [58].
This method, useful for minima, cannot be applied to transition-state searches, as will be explained later.
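The way the product of the inverse Hessian and the gradient is obtained without building any matrix can be sketched with the standard two-loop recursion of L-BFGS. The code below is a minimal illustration, not the implementation of Ref. [57]: it takes the unit matrix as initial inverse Hessian, uses a simple backtracking line search, and is tested on an invented three-dimensional quadratic surface.

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: -(inverse Hessian) @ g from stored (s, y) pairs only."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):        # newest to oldest
        alpha = (s @ q) / (y @ s)
        alphas.append(alpha)
        q = q - alpha * y
    r = q                                   # unit matrix as initial inverse Hessian
    for s, y, alpha in zip(s_list, y_list, reversed(alphas)):   # oldest to newest
        beta = (y @ r) / (y @ s)
        r = r + (alpha - beta) * s
    return -r

def minimize_lbfgs(energy, gradient, q0, m=5, gtol=1e-8, max_steps=200):
    """Minimize using only the displacements and gradient changes of the last m steps."""
    q = np.asarray(q0, dtype=float)
    g = gradient(q)
    s_list, y_list = [], []
    for _ in range(max_steps):
        if np.linalg.norm(g) < gtol:
            break
        d = lbfgs_direction(g, s_list, y_list)
        step = 1.0
        while energy(q + step * d) > energy(q) + 1e-4 * step * (g @ d):
            step *= 0.5                     # backtracking (Armijo) line search
        q_new = q + step * d
        g_new = gradient(q_new)
        s, y = q_new - q, g_new - g
        if y @ s > 1e-10 * (s @ s):         # keep only positive-curvature pairs
            s_list = (s_list + [s])[-m:]
            y_list = (y_list + [y])[-m:]
        q, g = q_new, g_new
    return q

A = np.diag([1.0, 4.0, 9.0])                # model Hessian (assumption)
C_MIN = np.array([1.0, -2.0, 0.5])          # position of the model minimum
energy = lambda q: 0.5 * (q - C_MIN) @ A @ (q - C_MIN)
gradient = lambda q: A @ (q - C_MIN)
q_opt = minimize_lbfgs(energy, gradient, np.zeros(3))
print(q_opt)
```

Storage grows as 2mN numbers instead of N^2, which is what makes the approach usable for thousands of atoms.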
4.2.1.2. ABNR. In the limited memory method described above, although the inverse of the Hessian matrix is never constructed, second derivative information about the whole system is used. An alternative solution consists of constructing the Hessian matrix, or its inverse, only in a reduced basis set of the whole space. This is the so-called ABNR method [22]. This procedure still avoids the diagonalization and storage of the full Hessian. In this case the Newton–Raphson equations are solved in a subspace, while conjugate gradient is applied to the remaining directions.
4.2.1.3. Truncated Newton–Raphson Methods. The Newton–Raphson equation [Eq. (15)] can be rewritten as
$$H_k \Delta q_k = -g_k. \tag{21}$$
The truncated Newton–Raphson method [59, 60] iteratively finds an approximation to the solution $\Delta q_k$ of Eq. (21) using the preconditioned conjugate gradient method.
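A minimal sketch of one truncated Newton–Raphson step is given below. It departs from Refs. [59, 60] in two labeled ways: the conjugate gradient iteration is not preconditioned, and the Hessian–vector products are approximated by finite differences of the gradient (the step size `fd` is an assumption), so that no Hessian is ever built. The inner loop is "truncated" by the relative residual tolerance `rtol` and by `max_inner`.

```python
import numpy as np

def truncated_newton_step(gradient, q, max_inner=20, rtol=1e-2, fd=1e-6):
    """Approximately solve H dq = -g by conjugate gradients, without forming H."""
    g = gradient(q)

    def hess_vec(p):
        return (gradient(q + fd * p) - g) / fd    # finite-difference H @ p

    dq = np.zeros_like(g)
    r = -g                                        # residual of H dq = -g at dq = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_inner):
        if np.sqrt(rs) <= rtol * np.linalg.norm(g):
            break                                 # truncation: residual small enough
        Hp = hess_vec(p)
        curvature = p @ Hp
        if curvature <= 0.0:
            break                                 # stop on non-positive curvature
        alpha = rs / curvature
        dq = dq + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return dq if np.any(dq) else -g               # fall back to steepest descent

# Model quadratic surface with gradient A q - b (illustrative numbers).
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
b = np.array([1.0, -2.0, 0.5])
gradient = lambda q: A @ q - b
q0 = np.zeros(3)
dq = truncated_newton_step(gradient, q0, rtol=1e-8)
print(q0 + dq)
```

With a loose `rtol` the inner solve stops early, which is usually acceptable far from the stationary point and saves gradient evaluations.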
4.2.1.4. Coupled Methods. Although these methods are usually used in transition-state searches, they have also been applied to locate minimum energy structures. They try to take advantage of applying different standard methods in different zones: a quasi-Newton–Raphson or RFO scheme is used for a small core of the enzyme, which includes most of the quantum atoms, whereas a steepest-descent or conjugate gradient-like method, or any of the more efficient methods described above, is applied to the rest of the environment atoms, treated mostly with molecular mechanics. This approach is explained in detail in Section 4.2.2 for the transition-state search.
4.2.2. Algorithms for Locating Transition
States
As noted before, locating the transition-state (TS) structure through which the system evolves from reactants to products on a PES is essential to understand the reaction dynamics. Minimization algorithms have been widely studied because of their broad utility in macromolecular chemistry (e.g., preparation of structures for molecular dynamics, docking, harmonic analysis, comparing and fitting force fields). On the other hand, because TSs are related only to chemical reactions, their search in high-dimensional systems was not studied until adequate potentials such as QM/MM became available. In this section we describe some of the methods used to find these structures.
4.2.2.1. Reaction Coordinate Method. The most intuitive strategy to find a TS structure is to identify an internal coordinate (bond distance, angle, dihedral) as the reaction coordinate and then perform several restrained energy minimizations with this coordinate held at different values. From one restrained minimization to the next the coordinate is modified, going from reactant to product, to obtain a discrete representation of the supposed reaction path. The coordinate is usually held by applying a harmonic potential with a force constant large enough to keep the atoms involved in this internal coordinate from moving:
$$V_{\mathrm{total}} = V_{\mathrm{system}} + k \left( x_{a,\mathrm{fix}} - x_a \right)^2. \tag{22}$$
Here $x_{a,\mathrm{fix}}$ is the intended fixed value in each restrained minimization and $x_a$ is the current value of the reaction coordinate along the simulation. When more than one internal coordinate is identified with the reaction coordinate, all of them must be modified in the discrete energy profile. In this case the restraining harmonic potential is built from all of these distinguished coordinates (restrained distances: RESD [61]):
$$V_{\mathrm{total}} = V_{\mathrm{system}} + k \left( \left( x_{a,\mathrm{fix}} - x_a \right) + \left( x_{b,\mathrm{fix}} - x_b \right) + \ldots \right)^2. \tag{23}$$
The RESD method is also useful to discriminate between a concerted and a stepwise mechanism; in this case a and b are the coordinates governing the two reaction steps.
When there are two coupled internal coordinates, for instance, the breaking and forming bonds in a proton transfer reaction, there are several possibilities: only the distance between the acceptor atom and the transferring atom, or only the distance between the donor atom and the transferring atom, is chosen to define the reaction coordinate; both are chosen simultaneously; or the difference between the two distances is used. In this last case
$$V_{\mathrm{total}} = V_{\mathrm{system}} + k \left( \mathrm{diff}_{\mathrm{fix}} - \mathrm{diff} \right)^2, \tag{24}$$
where $\mathrm{diff} = r_{\mathrm{donor-H}} - r_{\mathrm{acceptor-H}}$. This last option is the best because only one degree of freedom is restrained while the variation of both distances is taken into account.
The point of maximum potential energy along the reaction coordinate can be taken as a first approximation to the TS structure. However, it is not always easy or intuitive to identify an internal coordinate with the reaction coordinate. If the coordinate is not appropriate, we cannot be sure of visiting the saddle point region. In any case, even when a coordinate seems intuitive, one should check that the Hessian matrix has a single negative eigenvalue, which is associated with the transition eigenvector.
4.2.2.2. Conjugate Peak Refinement. Some more sophisticated algorithms still avoid any computation of second derivatives. Conjugate peak refinement (CPR) [62] has been applied to the search for TSs in enzymatic reactivity.
To converge to a saddle point from a distance at which the energy can be approximated by a quadratic expansion around that saddle point, it is necessary to obtain a set of vectors that are conjugate with respect to the Hessian matrix. First, a direction along which the energy has a local maximum is found; this direction is called $s_0$. For instance, $s_0$ can be the vector that connects reactants and products. The rest of the conjugate basis set is then built recursively by means of a recurrence formula, which constructs a set of conjugate directions $(s_1, \ldots, s_j)$ starting from the direction $s_0$:
$$s_0: \text{given}$$
$$s_1 = -g_1 + \frac{g_1^{T} h}{s_0^{T} h}\, s_0, \tag{25}$$
$$s_j = -g_j + \frac{g_j^{T} h}{s_0^{T} h}\, s_0 + \frac{|g_j|^2}{|g_{j-1}|^2}\, s_{j-1}, \qquad j > 1, \tag{26}$$
where $g_j$ is the gradient vector at the energy extremum $y_j$ along $s_{j-1}$ and h is an estimate of $H s_0$. One cycle of maximizing the energy along $s_0$ and minimizing along the successive $s_j$ (j > 0) yields the saddle point on a quadratic energy surface. Because real PESs are not quadratic, in general several such maximization/minimization cycles have to be performed to locate the saddle point. These iterative maximizations and conjugate line minimizations lead to an approximation of the reaction path whose maximum is an approximation to the saddle point structure. The process stops when the gradient norm at the saddle point structure falls below a given convergence criterion. Nonetheless, we must insist that an analysis of the number of negative eigenvalues of the Hessian at this point is necessary to be sure that a true saddle point has been found.
In large molecular systems with N degrees of freedom, these conjugate line minimizations cannot be carried out for all the N − 1 conjugate directions. They are interrupted at direction j as soon as the quantity $\tau_j$ is greater than a given tolerance μ:
$$\tau_j \equiv N^{1/2}\, \frac{g_j^{T} s_0}{|g_j|\,|s_0|} > \mu \qquad (j > 1). \tag{27}$$
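Returning to the reaction coordinate method of Section 4.2.2.1, the restrained scan of Eq. (24) can be sketched on a toy model of a proton transfer. Everything numerical here is an assumption made for illustration: the system is described by two internal coordinates only, d = diff and s (the sum of the two distances), the model potential is a tilted double well in d coupled to a harmonic term in s, and each restrained minimization is done by Newton–Raphson with analytic derivatives.

```python
import numpy as np

A, D0 = 10.0, 0.6                  # quartic double well along d (toy values)
EPS = 0.5                          # small tilt making the two wells inequivalent
K_S, S_TS, C = 100.0, 2.5, 0.5     # harmonic term coupling s to d
K_RESTR = 500.0                    # restraint force constant k of Eq. (24)

def v_system(d, s):
    w = s - S_TS - C * d**2
    return A * (d**2 - D0**2)**2 + EPS * d + 0.5 * K_S * w**2

def restrained_minimum(d_fix, d, s, gtol=1e-8, max_steps=50):
    """Minimize V_total = V_system + k (d_fix - d)^2 over (d, s) by Newton-Raphson."""
    for _ in range(max_steps):
        w = s - S_TS - C * d**2
        grad = np.array([
            4.0 * A * d * (d**2 - D0**2) + EPS - 2.0 * C * d * K_S * w
            - 2.0 * K_RESTR * (d_fix - d),
            K_S * w,
        ])
        if np.linalg.norm(grad) < gtol:
            return d, s
        h_dd = (4.0 * A * (3.0 * d**2 - D0**2) - 2.0 * C * K_S * w
                + 4.0 * C**2 * d**2 * K_S + 2.0 * K_RESTR)
        h_ds = -2.0 * C * d * K_S
        step = np.linalg.solve(np.array([[h_dd, h_ds], [h_ds, K_S]]), -grad)
        d, s = d + step[0], s + step[1]
    raise RuntimeError("restrained minimization did not converge")

def scan(d_values):
    """Restrained minimizations from reactant to product; returns the energy profile."""
    d, s = d_values[0], S_TS + C * d_values[0]**2
    profile = []
    for d_fix in d_values:
        d, s = restrained_minimum(d_fix, d, s)   # start from the previous point
        profile.append((d_fix, d, s, v_system(d, s)))
    return profile

profile = scan(np.linspace(-0.6, 0.6, 13))
ts_guess = max(profile, key=lambda point: point[3])
print(ts_guess)
```

The profile reports V_system rather than V_total, so the restraint energy does not contaminate the barrier estimate; the highest point is only a first guess for the TS and should still be checked for a single negative Hessian eigenvalue.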
4.2.2.3. Methods That Require the Hessian Matrix.
Second derivatives: Only moving a core. The simplest approximation is to move only a small part of the system containing the atoms that participate directly in the reaction (the core), keeping the rest of the system (the environment) frozen. The number of moving atoms has to be small enough for their Hessian matrix to be stored and manipulated. Any of the standard methods to search for TSs in the gas phase can then be used (e.g., Newton–Raphson, RFO).
Second derivatives: Moving a core and an environment separately. An enzymatic system needs to be represented by hundreds or even thousands of moving atoms, where a residue can interact with many other residues. A movement of an atom, group, or side chain therefore provokes in turn a coupled movement of the interacting atoms, so moving only a small core region leaves the environment unrelaxed. This has forced computational chemists to develop methods able to search for a TS while moving the whole system.
FIGURE 3. Coupled optimization scheme.
As mentioned for the case of minima, coupled methods have been used to locate enzyme reaction intermediates, but these methods are especially useful for TSs.
In the core region a TS search is performed with a standard second derivative method (e.g., Newton–Raphson, RFO). The environment region is relaxed by minimizing it with a method able to handle a large number of atoms (e.g., conjugate gradient, L-BFGS). The two procedures are repeated iteratively until self-consistency, that is, until the gradient norm of both regions is lower than a convergence criterion (see Fig. 3). In this case we find a stationary point in which a small region (the core) is at a first-order saddle point describing our reaction.
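The alternation between a second derivative step in the core and a first derivative relaxation of the environment can be sketched on a toy quadratic model. This is not any of the published implementations: the energy function, the coupling matrix, the use of plain Newton–Raphson in the core and of fixed-step steepest descent in the environment, and all numerical values are assumptions made to keep the example short.

```python
import numpy as np

# Toy model: E = 0.5 c^T H_CORE c + 0.5 (e - W c - E0)^T K_ENV (e - W c - E0)
H_CORE = np.diag([-2.0, 1.0])             # core block with one negative curvature
K_ENV = np.diag([1.0, 2.0, 3.0])          # stiffness of the environment
W = 0.3 * np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # core-environment coupling
E0 = np.array([0.5, -0.2, 0.1])           # environment rest position

def gradients(core, env):
    """Return (core gradient, environment gradient) of the toy energy."""
    strain = K_ENV @ (env - W @ core - E0)
    return H_CORE @ core - W.T @ strain, strain

def core_hessian():
    """Second derivatives with respect to the core coordinates only."""
    return H_CORE + W.T @ K_ENV @ W

def relax_environment(core, env, gtol, step=0.3, max_steps=1000):
    """Steepest-descent minimization of the environment with the core frozen."""
    for _ in range(max_steps):
        g_env = gradients(core, env)[1]
        if np.linalg.norm(g_env) < gtol:
            break
        env = env - step * g_env
    return env

def coupled_search(core, env, gtol=1e-8, max_cycles=100):
    """Alternate core and environment optimizations until both gradients converge."""
    for cycle in range(max_cycles):
        g_core, g_env = gradients(core, env)
        if np.linalg.norm(g_core) < gtol and np.linalg.norm(g_env) < gtol:
            return core, env, cycle
        core = core + np.linalg.solve(core_hessian(), -g_core)   # Newton-Raphson step
        env = relax_environment(core, env, 0.1 * gtol)           # tighter inner tolerance
    raise RuntimeError("coupled optimization did not converge")

core, env, cycles = coupled_search(np.array([0.4, -0.3]), np.zeros(3))
print(core, env, cycles)
```

In this model the number of cycles grows with the strength of the coupling matrix `W`, which is the toy counterpart of the convergence problems that arise when the core and the environment are strongly coupled.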
Several research groups have used this procedure and applied it to enzymatic reactivity. All of them avoid any QM/MM energy evaluation when minimizing the environment. During the minimization the atoms of the fixed core region are replaced by electrostatic potential (ESP) fitted charges (recalculated at the beginning of the minimization), so that only the MM energy is needed. This approximation forces the environment to contain only MM atoms. However, this is not restrictive because, to date, the number of QM atoms is usually smaller than a treatable core region.
On the other hand, while in Refs. [63–65] the environment is relaxed at every step of the TS search in the core, in Refs. [66, 67] the environment is not relaxed until the TS search has converged. No comparison between these two procedures has been made yet.
Problems can arise when the two regions are strongly coupled, for example, when the core region has already converged and a small displacement in the environment region makes the core nonconverged. Of course, this coupling depends on the convergence criteria and on the choice of the two zones, but a further improvement is still possible.
Second derivatives: Moving the whole system at the same time. The next logical step in the TS search is to move the whole system at the same time. In this way the problem of coupling between the two zones is avoided. As for minima, a limited-memory procedure seems adequate: as in the L-BFGS scheme for minima, the Hessian matrix is neither stored nor diagonalized, and only the information of the last m steps is used for the update. Unfortunately, the usual L-BFGS update cannot be used for a TS search, because BFGS preserves a positive definite matrix; a Powell-type update is more convenient. The RFO technique should be used rather than Newton–Raphson because the former needs only one eigenvector to predict the next step, while the latter needs a whole Hessian diagonalization to invert it.
On the other hand, in minimization problems a unit matrix is not a bad initial Hessian because we only need to decrease the gradient norm, and a bad initial step can be improved with a line search or other available methods [52]. This is not the case in a TS search, for which we need information about the PES to know where the TS is. This requires us to calculate an approximate initial second derivative matrix. A first option would be to build a high-dimensional square Hessian. This would imply its storage, which can be problematic unless a lot of computer memory is available. Another useful solution is to take the matrix shown in Figure 4 as the initial Hessian: a square Hessian is used for the few atoms of a core, and only a vector describes the rest of the atoms in the environment [58, 68].
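The idea of never storing the Hessian while still using a Powell-type update can be sketched as a matrix-free Hessian–vector product, in the spirit of the product form given in Eq. (28). The class below is a hypothetical illustration, not the algorithm of Refs. [68–71]: the initial Hessian is reduced to a diagonal (a single vector), whereas Figure 4 also keeps a square block for the core, and only the vectors of the last m steps are kept.

```python
import numpy as np
from collections import deque

class LimitedMemoryPowell:
    """Hessian-vector products from a diagonal B_0 and the last m update vectors."""

    def __init__(self, b0_diag, m=5):
        self.b0 = np.asarray(b0_diag, dtype=float)   # diagonal initial Hessian
        self.history = deque(maxlen=m)               # tuples (j_i, u_i, j_i . dq_i)

    def dot(self, v):
        """Return B v without building B."""
        out = self.b0 * v
        for j, u, j_dq in self.history:
            uv = u @ v
            out = out + j * uv + u * (j @ v - j_dq * uv)
        return out

    def update(self, dq, g_old, g_new):
        """Store the vectors of one Powell update (M_i = I)."""
        j = (g_new - g_old) - self.dot(dq)           # j_i = D_i - B_i dq_i
        u = dq / (dq @ dq)
        self.history.append((j, u, j @ dq))

# Model surface with one negative curvature, g(q) = H_true q (illustrative numbers).
H_true = np.array([[2.0, 0.5, 0.0], [0.5, -1.0, 0.3], [0.0, 0.3, 1.5]])
approx = LimitedMemoryPowell(np.ones(3), m=5)
q = np.zeros(3)
for dq in (np.array([0.2, 0.1, 0.0]), np.array([0.0, -0.1, 0.3])):
    approx.update(dq, H_true @ q, H_true @ (q + dq))
    q = q + dq
print(approx.dot(dq), H_true @ dq)
```

An iterative eigensolver that only needs products of this kind with the augmented Hessian can then supply the single RFO eigenvector required at each step.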
After these two problems are pointed out the
optimal procedure could be applied to enzymatic
systems. The procedure can be outlined as follows.
An approximated initial Hessian is built up (see
Fig. 4). Solving RFO equations implies obtaining
one eigenvector of a large-scale matrix. An iterative
VOL. 93, NO. 3
ENZYMATIC REACTION PATHWAYS
FIGURE 4. Approximated initial Hessian for large systems.
method, which requires only a matrix–vector product, must be used avoiding a prohibitive full diagonalization of a big matrix. We need neither the
lowest root nor the highest one but the one whose
eigenvalue tends to zero [53, 54]. An algorithm
developed by Bofill et al. [69, 70] permits us to
extract the correct eigenvalue– eigenvector pair and
propose a displacement [71] of the geometry. The
update of the Hessian (in fact, the product of the
Hessian by a vector) must be done at every optimization step [68] but, in this case, only keeping the
position, gradient, and displacement information of
a limited m steps
paring the different approaches but some questions
remain opened: Do we need to parameterize specific van der Waals potentials for proteins to compute the QM/MM van der Waals interactions?
What is the influence of a protein force field charge
set on an enzymatic reaction pathway?
The answers to these questions should lead us
toward the improvement of QM/MM PES description, thus enabling QM/MM methods (in conjugation with the use of statistical sampling) to give
quantitative results in the exploration of enzymatic
reactivity [6, 72].
In the meantime, with the always increasing size
of the molecular systems studied in the literature, a
special effort has been devoted to the improvement
of optimization algorithms to locate efficiently both
minima and TS structures. These last efforts will
also be of great help to linear scaling methodologies, which could become a serious “competitor” to
QM/MM methodologies because they avoid the
problems of the QM/MM interactions as mentioned above. However, whether linear scaling
methods and full quantum calculations on enzymatic systems will one day replace QM/MM methods is something difficult to foretell at the time we
wrote this article.
冘 关j u v ⫹ u 共j v ⫺ 共j ⌬q 兲u v兲兴.
k
B k⫹1v ⫽ B0v ⫹
i
T
i
i
T
i
T
i
i
T
i
References
i⫽k⫺m
(28)
In this way, we can find TSs moving a system of
thousands of atoms. During the optimization process a large-scale matrix is never stored and full
diagonalization is avoided. Note that this last
method is also convenient for minima case.
1. Voet, D.; Voet, J. G. In: Biochemistry; Wiley & Sons: New
York, 1995.
2. Dive, G.; Dehareng, D.; Peeters, D. Int J Quantum Chem
1996, 58, 85.
3. Mulholland, A. J.; Richards, W. G. J Phys Chem B 1998, 102,
6635.
4. Warshel, A. In: Computer Modeling of Chemical Reactions
in Enzymes and Solutions; Wiley & Sons, New York, 1992.
5. Aqvist, J.; Warshel, A. Chem Rev 1993, 93, 2523.
5. Conclusions and Perspectives
6. Bentzien, J.; Muller, R. P.; Florian, J.; Warshel, A. J Phys
Chem B 1998, 102, 2293.
7. Goedecker, S. Rev Mod Phys 1999, 71, 1085.
During the last decade, the combined QM/MM
method has emerged as a powerful tool to simulate
enzymatic reactivity. A lot of work has been devoted to find solutions to the two main problems of
the QM/MM approach for macromolecular systems: The problem of the nonbonded interactions
between the quantum and the classic part and the
problem of cutting covalent bonds. Several solutions have been proposed and each of them have
their advantages and disadvantages. Recently, we
have seen in the literature studies devoted to com-
8. van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Kenneth, M.;
Merz, J. J Comput Chem 2000, 21, 1494.
9. Stewart, J. J. P. Int J Quantum Chem 1996, 58, 133.
10. van der Vaart, A.; Suárez, D.; Kenneth, M.; Merz, J. J Chem
Phys 2000, 113, 10512.
11. Daniels, A. D.; Scuseria, G. E. J Chem Phys 1999, 110, 1321.
12. Warshel, A.; Levitt, M. J Mol Biol 1976, 103, 227.
13. Singh, U.; Kollman, P. J Comput Chem 1986, 7, 718.
14. Field, M.; Bash, P.; Karplus, M. J Comput Chem 1990, 11, 700.
15. Monard, G.; Merz, K. M. Acc Chem Res 1999, 32, 904.
16. Gao, J. In: Lipkowitz, K. B.; Boyd, D. B., eds. Reviews in
INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY
243
MONARD ET AL.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
Computational Chemistry, Vol. 7; VCH Publishers: New
York, 1996; p 119 –185.
Mordasini, T.; Thiel, W. Chimia 1998, 58, 288.
Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Kenneth,
M.; Merz, J.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J Am Chem Soc 1995, 117, 5179.
Stewart, J. J. P. J Comput Aided Design 1990, 4, 1–105.
Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.;
Robb, M. A.; Cheeseman, J. R.; Zakrzewski, V. G.; Montgomery, J. A., Jr.; Stratmann, R. E.; Burant, J. C.; Dapprich, S.;
Millam, J. M.; Daniels, A. D.; Kudin, K. N.; Strain, M. C.;
Farkas, O.; Tomasi, J.; Barone, V.; Cossi, M.; Cammi, R.;
Mennucci, B.; Pomelli, C.; Adamo, C.; Clifford, S.; Ochterski,
J.; Petersson, G. A.; Ayala, P. Y.; Cui, Q.; Morokuma, K.;
Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman,
J. B.; Cioslowski, J.; Ortiz, J. V.; Baboul, A. G.; Stefanov, B. B.;
Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Gomperts,
R.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng,
C. Y.; Nanayakkara, A.; Gonzalez, C.; Challacombe, M.; Gill,
P. M. W.; Johnson, B.; Chen, W.; Wong, M. W; Andres, J. L.;
Gonzalez, C.; Head-Gordon, M.; Replogle, E. S.; Pople, J. A.
Gaussian 98, Revision A.7; Gaussian, Inc.: Pittsburgh, PA, 1999.
Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.;
Gordon, M. S.; Jensen, J. J.; Koseki, S.; Matsunaga, N.;
Nguyen, K. A.; Su, S.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J Comput Chem 1993, 14, 1347.
Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.;
Swaminathan, S.; Karplus, M. J Comput Chem 1983, 4, 187.
van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hünenberger, P. H.; Krüger, P. K.; Mark, A. E.; Scott, W. R.; Tironi,
I. G. GROMOS; Hochschuleverlag AG, Zurich 1996.
Bakowies, D.; Thiel, W. J Phys Chem 1996, 100, 10580.
Bakowies, D.; Thiel, W. J Comput Chem 1996, 17, 87.
Warshel, A.; Chu, Z. T. J Phys Chem B 2001, 105, 9857.
Chipot, C.; Millot, C.; Maigret, B.; Kollman, P. A. J Phys
Chem 1994, 98, 11362.
Luque, F. J.; Reuter, N.; Cartier, A.; Ruiz-López, M. F. J Phys
Chem A 2000, 104, 10923.
Cummins, P. L.; Gready, J. L. J Comput Chem 1999, 20, 1028.
Ranganathan, S.; Gready, J. E. J Phys Chem B 2997, 101, 5614.
Mulholland, A.; Richards, W. Proteins 1997, 27, 9.
Antonczak, S.; Monard, G.; Ruiz-López, M.; Rivail, J.-L. J Am
Chem Soc 1998, 120, 8825.
Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P.
J Am Chem Soc 1985, 107, 3902.
Stewart, J. J. P. J Comput Chem 1989, 10, 209.
Antes, I.; Thiel, W. J Phys Chem A 1999, 103, 9290.
Théry, V.; Rinaldi, D.; Rivail, J.-L.; Maigret, B.; Ferenczy, B.
J Comput Chem 1994, 15, 269.
Monard, G.; Loos, M.; Théry, V.; Baka, K.; Rivail, J.-L. Int J
Quantum Chem 1996, 58, 153.
Assfeld, X.; Rivail, J.-L. Chem Phys Lett 1996, 263, 100.
Gao, J.; Amara, P.; Alhambra, C.; Field, M. J. J Phys Chem A
1998, 102, 4714.
HyperChem Users Manual; HyperCube, Inc.: Waterloo, Ontario, Canada, 2002.
Reuter, N. Ph.D. thesis; Université Henri Poincaré Nancy I:
Vandoeuvre-lès-Nancy, France, 1999.
42. Antes, I. Ph.D. thesis; Mathematisch-naturwissenschaftliche
Fakultät, Universität Zürich: Zürich, 1998.
43. Reuter, N.; Dejaegere, A.; Maigret, B.; Karplus, M. J Phys
Chem A 2000, 104, 1720.
44. Zhang, Y. K.; Lee, T. S.; Yang, W. T. J Chem Phys 1999, 110, 46.
45. Ferré, N. Ph.D. thesis; Université Henri Poincaré Nancy I:
Vandoeuvre-lès-Nancy, France, 2001.
46. Amara, P.; Field, M. J.; Alhambra, C.; Gao, J. Theor Chem
Acc 2000, 336.
47. Monard, G. Ph.D. thesis; Université Henri Poincaré Nancy I:
Vandoeuvre-lès-Nancy, France, 1998.
48. Alhambra, C.; Corchado, J.; Sánchez, M. L.; Garcia-Viloca,
M.; Gao, J.; Truhlar, D. G. J Phys Chem B 2001, 105, 11326.
49. Truhlar, D. G.; Gao, J.; Alhambra, C.; Garcia-Viloca, M.;
Corchado, J.; Sánchez, M. L.; Villá, J. Acc Chem Res 2002, 35,
341–349.
50. Hwang, J. K.; King, G.; Warshel, A. J Am Chem Soc 1988,
110, 5297.
51. Billeter, S. R.; Webb, S. P.; Agarwal, P. K.; Iordanov, T.;
Hammes-Schiffer, S. J Am Chem Soc 2001, 123, 11262.
52. Schlegel, B. Adv Chem Phys 1987, 65, 249.
53. Simons, J.; Jorgensen, P.; Taylor, H.; Ozment, J. J Phys
Chem 1983, 87, 2745.
54. Banerjee, A.; Adams, N.; Simons, J.; Shepard, R. J Phys
Chem 1985, 89, 52.
55. Császár, P.; Pulay, P. J Mol Struct 1984, 114, 31.
56. Fletcher, R. Practical Methods of Optimization, 2nd ed.; John
Wiley & Sons: New York, 1987.
57. Liu, D. C.; Nocedal, J. Math Program 1989, 45, 503–528.
58. Prat-Resina, X.; Garcia-Viloca, M.; Monard, G.; González-Lafont, A.; Lluch, J. M.; Anglada, J. M.; Bofill, J. M. Theor
Chem Acc 2002, 107, 147.
59. Schlick, T.; Overton, M. J Comput Chem 1987, 8, 1025.
60. Derreumaux, P.; Zhang, G.; Schlick, T.; Brooks, B. J Comput
Chem 1994, 15, 532.
61. Eurenius, K. P.; Chatfield, D. C.; Brooks, B. R.; Hodoscek, M.
Int J Quantum Chem 1996, 60, 1189.
62. Fischer, S.; Karplus, M. Chem Phys Lett 1992, 194, 252.
63. Turner, A. J.; Moliner, V.; Williams, I. H. Phys Chem Chem
Phys 1999, 1, 1323.
64. Billeter, S. R.; Turner, A. J.; Thiel, W. Phys Chem Chem Phys
2000, 2, 2177.
65. Hall, R. J.; Hindle, S. A.; Burton, N. A.; Hillier, I. H. J Comput
Chem 2000, 21, 1433.
66. Zhang, Y.; Liu, H.; Yang, W. J Chem Phys 2000, 112, 3483.
67. Prat-Resina, X.; Garcia-Viloca, M.; Monard, G.; González-Lafont, A.; Lluch, J. M.; Anglada, J. M.; Bofill, J. M. Unpublished results.
68. Anglada, J. M.; Besalú, E.; Bofill, J. M.; Rubio, J. J Math Chem
1999, 25, 85.
69. Anglada, J. M.; Besalú, E.; Bofill, J. M. Theor Chem Acc 1999,
103, 163.
70. Bofill, J. M.; de Pinho Ribeiro Moreira, I.; Anglada, J. M.;
Illas, F. J Comput Chem 2000, 21, 1375.
71. Besalú, E.; Bofill, J. M. Theor Chem Acc 1998, 100, 265.
72. Li, H.; Hains, A. W.; Everts, J. E.; Robertson, A. D.; Jensen,
J. H. J Phys Chem B 2002, 106, 3486.