Determination of Enzymatic Reaction Pathways Using QM/MM Methods

GÉRALD MONARD,1 XAVIER PRAT-RESINA,2 ANGELS GONZÁLEZ-LAFONT,2 JOSÉ M. LLUCH2

1 Equipe de Chimie et Biochimie Théorique, UMR 7675 CNRS-UHP-INPL, Université Henri Poincaré Nancy I, Faculté des Sciences, B.P. 239, F-54506 Vandœuvre-lès-Nancy, France
2 Unitat de Química Física, Departament de Química, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain

Received 25 May 2002; accepted 8 January 2003
DOI 10.1002/qua.10555
Correspondence to: G. Monard; e-mail: gmonard@lctn.u-nancy.fr

ABSTRACT: Enzymes are among the most powerful known catalysts. Understanding the functions of these proteins is one of the central goals of contemporary chemistry and biochemistry. However, because these systems are large, they are difficult to handle using standard theoretical chemistry tools. In the last 10 years, we have seen the rapid development of so-called QM/MM methods, which combine quantum chemistry and molecular mechanics to elucidate the structure and functions of systems with many degrees of freedom, including enzymatic systems. In this article, we review the numerical aspects of QM/MM methods applied to enzymes: the energy definition, the special treatment of the covalent QM/MM frontiers, and the exploration of the QM/MM potential energy surface. Special emphasis is placed on the use of the local self-consistent field method and rational function optimization.

© 2003 Wiley Periodicals, Inc. Int J Quantum Chem 93: 229–244, 2003

Key words: QM/MM methods; geometry optimization; transition-state search; enzyme catalysis

1. Introduction

Life implies the ability for a complex and highly organized organism to synthesize chemicals through metabolic processes into structures that have defined purposes [1]. Most of the reactions
carried out in these organisms are mediated by a series of remarkable biologic catalysts known as enzymes. These enzymes are proteins, which differ from ordinary chemical catalysts in several important aspects: their high kinetic rates compared with the corresponding uncatalyzed reactions; the relatively mild reaction conditions under which the catalyzed reactions can occur (i.e., temperature below 100°C, atmospheric pressure, nearly neutral pH, etc.); their great reaction specificity, which enables them to select reactants and transform the latter into well-defined chemical products (i.e., enzymatic reactions rarely have side products); and their capacity for regulation in response to the concentration of substances other than their substrates.

Elucidating and mastering enzymatic reactions is one of the most thrilling challenges facing contemporary chemistry and biochemistry. A broad range of applications can follow from it, from the development of new drugs to the design of new protein-based catalysts. Contributions of computational chemistry can be a determining factor in understanding enzymatic reactivity because theoretical tools can give molecular-level insights into enzyme catalysis that can be difficult to obtain by experimental means.

The main problem a theoretical chemist faces in determining realistic enzymatic reaction pathways is choosing a proper modeling approach. For a long time, most researchers confined the study of enzyme reactivity to models containing only a few representative atoms (i.e., those believed to contribute most to the reactivity), either inserted or not in a cavity representing the electrostatic effect of the surrounding enzyme and aqueous environment [2, 3]. This drastic limitation in the size of model systems was mainly due to both the limited computing power available and the necessity of using quantum chemistry to describe the making and breaking of bonds that usually occur in enzymatic reactions.
The main advantage of this type of approach was to address what we could call the intrinsic reactivity of a system: whether putting some atoms together in a defined position translates into a possible reactivity or not (e.g., a nucleophile can react with an electrophile, whereas two nucleophiles will not react together). However, these studies were not able to account for the main specificities of enzymes described above. For example, they cannot explain the differences in activity of different enzymes that bear the same active site (i.e., the model systems are identical, but experimental results usually show different kinetics). They also cannot explain the catalytic effect of an enzyme compared to the same reaction in aqueous solution.

Several answers to these problems have been suggested in the literature. They can be divided into three main groups:

1. The empirical valence bond (EVB) method [4, 5] from Warshel and coworkers, in which a chemical reaction is described using a valence bond approach, i.e., the system wave function is represented by a linear combination of the most important ionic and covalent resonance forms and the potential energy is found by solving the related secular equation. The electronic interaction Hamiltonian is built using parameters extracted from empirical values and/or ab initio surfaces [6]. The main advantage of the EVB approach is its ability to give good quantitative results in comparison with experiment as long as the incorporated empirical terms are carefully chosen. This is mainly accomplished by first calibrating free energy surfaces from reference reactions in solution before incorporating the enzyme effects.
However, the choice of correct EVB parameters is crucial and can also be seen as a disadvantage of EVB methods: if one has not properly defined the valence bond forms (i.e., the most prevalent ionic and covalent forms), one can miss either unusual reaction pathways that can occur in reactive chemical systems or a chemical reaction not previously introduced in the valence bond forms.

2. The linear scaling approach [7, 8], which changes the way quantum calculations are done, enabling computations on large systems. Numerous examples of calculations on systems with more than 1000 atoms have been reported [9–11]. However, while this new methodology seems promising, the CPU time involved in today's calculations only allows for single-point energy calculations. Some improvements, both in linear scaling algorithms and in computing power, are still needed to make full quantum statistical simulations practical.

3. The combined quantum mechanics/molecular mechanics (QM/MM) methodology [12–17], where the small reactive part of a chemical system is described by QM, whereas the remaining large nonreactive part is described by MM. This last methodology is today the most widely used to address the reactivity of biochemical systems. The main advantage of QM/MM methods is their easy implementation in computational codes while giving good chemical results. Their main disadvantage, especially in enzymatic systems, is the difficulty of going beyond qualitative results and, thus, obtaining quantitative numbers out of QM/MM computations. This problem is mainly due to three factors: (1) the need for a good ab initio description of the QM part, whereas the usual size of the QM part mostly only allows for semiempirical calculations; (2) the need for accessing free energies through extensive sampling that is (too) computationally expensive; (3) the difficult calibration of the interaction between the QM part and the MM part, especially in biochemical systems, as discussed below. The first two factors are related to current computational bottlenecks and should be overcome in the near future.

In this article, we review QM/MM methodology as used to describe biochemical systems. We first define the QM/MM energy and emphasize the problem of the interactions between the QM and MM parts. Second, we present the problem of cutting covalent bonds in biochemical systems and its possible solutions, in particular the local self-consistent field (LSCF) method. Third, we show what can be done with a potential energy surface as defined by QM/MM methodology, and especially how to locate transition-state structures and energy minima efficiently on large molecules.

2. Energy Definition

2.1. SPLITTING THE SYSTEM

In most reactive systems, the number of atoms involved in a chemical reaction (i.e., those whose electronic properties change during the reaction) is fairly limited; the rest of the atoms may have a strong influence on the reaction, but this is usually limited to short- and long-range nonbonded interactions that can be represented through both electrostatic and van der Waals interactions. The main idea of QM/MM methodology [12, 14] is to split the chemical system into two parts: the first is small and described by quantum mechanics (it is called the quantum part); the second is the rest of the system and is described by molecular mechanics (it is called the classic part). The full Hamiltonian can therefore be expressed as

$$ H = H_{QM} + H_{MM} + H_{QM/MM}, \tag{1} $$

where $H_{QM}$ is the Hamiltonian describing the quantum atoms. In the Born–Oppenheimer approximation, it can be defined by

$$ H_{QM} = -\frac{1}{2}\sum_{i}^{\text{electrons}} \Delta_i - \sum_{i}^{\text{electrons}}\sum_{K}^{\text{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i>j}^{\text{electrons}} \frac{1}{r_{ij}} + \sum_{K>L}^{\text{nuclei}} \frac{Z_K Z_L}{R_{KL}}. \tag{2} $$

The first term defines the kinetic energy of electrons $i$, the second term expresses the electron–nuclei attraction between electrons $i$ and nuclei $K$ of charges $Z_K$, the third term is the electron–electron repulsion, and the last term defines the nuclei–nuclei repulsion.

In Eq. (1), the Hamiltonian $H_{MM}$ describes the classic part. As usually defined in molecular mechanics, a set of atoms interacting in this part can be seen as a set of point charges $\{Q_C\}$ in space interacting through a defined force field. For example, if we choose the AMBER force field [18] we have

$$ H_{MM} = \sum_{\text{bonds}} k_r (r - r_0)^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}}\sum_{n} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{(i,j)} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}}\right]. \tag{3} $$

The last term $H_{QM/MM}$ in Eq. (1) represents the interaction between the quantum part and the classic one. It can be represented as the sum of two terms: a van der Waals term describing the nonelectrostatic interactions between quantum and classic atoms, and an electrostatic term describing the interaction between the classic point charges $\{Q_C\}$ and the electrons and nuclei of the quantum part:

$$ H_{QM/MM} = V_{QM/MM}^{\text{van der Waals}} - \sum_{i}^{\text{electrons}}\sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}} + \sum_{K}^{\text{nuclei}}\sum_{C}^{\text{classical}} \frac{Z_K Q_C}{R_{KC}}. \tag{4} $$

We can group the terms in Eqs. (2), (3), and (4) depending on whether they describe electrons of the quantum part ($H_{elec}$) or not ($H_{non\text{-}elec}$). This gives

$$ H_{elec} = -\frac{1}{2}\sum_{i}^{\text{electrons}} \Delta_i - \sum_{i}^{\text{electrons}}\sum_{K}^{\text{nuclei}} \frac{Z_K}{r_{iK}} + \sum_{i>j}^{\text{electrons}} \frac{1}{r_{ij}} - \sum_{i}^{\text{electrons}}\sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}} \tag{5} $$

and

$$ H_{non\text{-}elec} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + \sum_{K}^{\text{nuclei}}\sum_{C}^{\text{classical}} \frac{Z_K Q_C}{R_{KC}} + \sum_{K>L}^{\text{nuclei}} \frac{Z_K Z_L}{R_{KL}} = H_{MM} + V_{QM/MM}^{\text{van der Waals}} + V_{QM+QM/MM}^{\text{nuclei}}. \tag{6} $$

$H_{elec}$ and $V_{QM+QM/MM}^{\text{nuclei}}$ can be computed using a standard quantum mechanics code. The term describing the electron–classic charge interaction is incorporated into the core Hamiltonian of the quantum subsystem. Examples of QM/MM studies in the literature have used popular quantum mechanics codes such as MOPAC [19], Gaussian [20], GAMESS [21], etc. $H_{MM}$ and $V_{QM/MM}^{\text{van der Waals}}$ are computed using standard molecular mechanics code and are relatively easy to implement. Examples in the literature include the use of codes like AMBER [18], CHARMM [22], GROMOS [23], etc.

2.2. QM/MM INTERACTIONS

The main problem arising from the use of QM/MM methods is the calibration of the interaction between the quantum and the classic parts. As stated previously, this interaction can be divided into two parts: an electrostatic and a nonelectrostatic interaction. The influence of the classic part on the reactive part is usually described using point charges and van der Waals potentials, which should reproduce quantitatively the interaction between the QM and MM parts as if the system were computed fully quantum mechanically [14, 24, 25].

The quantitative reproduction of the QM/MM interactions depends on three points: (1) the choice of the set of nonpolarizable point charges $\{Q_C\}$ or, more generally, of the MM force field [26]; (2) the choice of the van der Waals parameters used to describe $V_{QM/MM}^{\text{van der Waals}}$; (3) the way the classic charges polarize the quantum subsystem.

The set of point charges $\{Q_C\}$ must be chosen to reproduce the electrostatic field exerted by the MM part on the QM part. It is usually a good approximation to take the charge definition from an empirical force field and incorporate those charges into the core Hamiltonian of the quantum subsystem. This gives reliable results because the charges in molecular mechanics are defined to reproduce the electrostatic potential properly.
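As a rough illustration of the two terms of Eq. (4) that do not involve the electrons, the sketch below evaluates the nuclei–classic charge sum and a 12–6 van der Waals sum for a toy system. The function names, the data layout, the numerical values, and the AMBER-style combination rule used to build $A_{ij}$ and $B_{ij}$ are assumptions made for this example only; the electron–charge term is left to the quantum code, as described above.

```python
import math

def nuclei_charge_energy(qm_nuclei, mm_charges):
    """Sum over K and C of Z_K * Q_C / R_KC (atomic units: e, bohr, hartree).

    qm_nuclei:  list of (Z_K, (x, y, z)) for the quantum atoms
    mm_charges: list of (Q_C, (x, y, z)) for the classic point charges
    """
    energy = 0.0
    for z_k, r_k in qm_nuclei:
        for q_c, r_c in mm_charges:
            energy += z_k * q_c / math.dist(r_k, r_c)
    return energy

def vdw_energy(qm_atoms, mm_atoms):
    """12-6 sum A_ij/R^12 - B_ij/R^6 over QM/MM pairs (kcal/mol, angstrom).

    Each atom is (epsilon, r_star, (x, y, z)). Assumed combination rule:
    eps_ij = sqrt(eps_i * eps_j), R*_ij = r_star_i + r_star_j,
    A_ij = eps_ij * R*_ij**12, B_ij = 2 * eps_ij * R*_ij**6.
    """
    energy = 0.0
    for eps_i, rs_i, pos_i in qm_atoms:
        for eps_j, rs_j, pos_j in mm_atoms:
            eps_ij = math.sqrt(eps_i * eps_j)
            r_star = rs_i + rs_j
            a_ij = eps_ij * r_star ** 12
            b_ij = 2.0 * eps_ij * r_star ** 6
            r = math.dist(pos_i, pos_j)
            energy += a_ij / r ** 12 - b_ij / r ** 6
    return energy

# Toy data (hypothetical numbers, chosen only to exercise the functions)
e_coul = nuclei_charge_energy([(1.0, (0.0, 0.0, 0.0))],
                              [(-0.5, (0.0, 0.0, 2.0))])
e_vdw = vdw_energy([(0.1, 1.5, (0.0, 0.0, 0.0))],
                   [(0.1, 1.5, (3.0, 0.0, 0.0))])
print(e_coul, e_vdw)
```

With this combination rule the pair energy reaches its minimum, $-\epsilon_{ij}$, at $R = R^*_{ij}$, which is a convenient sanity check when new van der Waals parameters are introduced for the QM atoms.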
However, it is well known that the charges assigned to atoms can differ dramatically between force fields, and even between different generations of the same force field [27]. These differences between charge sets should have a nonnegligible impact on the quality of a QM/MM study. Although this point seems important, to the best of our knowledge no publication has yet examined the effect of different charge sets on the quality of a QM/MM computation in enzymatic systems.

The choice of van der Waals parameters is another crucial point in producing a good QM/MM interaction. In theory, a new set of parameters should be defined for each atom in the system exclusively to compute $V_{QM/MM}^{\text{van der Waals}}$. In practice, one uses the van der Waals parameters from the current force field definition for each atom in the MM part and, when possible, defines a new set of van der Waals parameters for the QM atoms. Such a set of parameters applies only to a defined QM level (e.g., AM1, PM3, 6-31G, etc.) used in conjunction with a defined force field. This approach has been used by Luque et al. [28] and Cummins et al. [29] for modeling reactions in solution, but to the best of our knowledge this has not been done in enzymatic systems. This is fairly understandable, as it is easier to optimize a few van der Waals parameters for a small set of atoms in solution than to optimize a full set of van der Waals parameters compatible with a given force field at a given QM level. Thus, most of the literature devoted to QM/MM studies of biochemical systems has used van der Waals parameters coming from standard force fields like CHARMM or AMBER [30–32].

A third point needing to be clarified in the description of QM/MM interactions is the way the classic set of charges $\{Q_C\}$ polarizes the electronic wave function of the QM subsystem.
This is usually done by adding to the core Hamiltonian of the QM part a perturbation describing the interaction between the QM electrons and the classic charges:

$$ H'^{\text{core}} = H^{\text{core}} - \sum_{i}^{\text{electrons}}\sum_{C}^{\text{classical}} \frac{Q_C}{r_{iC}}. \tag{7} $$

At the ab initio level, an element of the core Hamiltonian matrix is expressed as

$$ H'^{\text{core}}_{\mu\nu} = \langle \mu | H'^{\text{core}} | \nu \rangle = H^{\text{core}}_{\mu\nu} - \sum_{i}\sum_{C} \left\langle \mu \left| \frac{Q_C}{r_{iC}} \right| \nu \right\rangle. \tag{8} $$

The second term is similar to the QM electron–nuclei interaction and can be straightforwardly computed in the same way. Likewise, the nuclei–classical term $\sum_K \sum_C (Z_K Q_C / R_{KC})$ from Eq. (4) is computed in the same way as the QM nuclei–nuclei interactions.

However, with NDDO semiempirical methods, Luque et al. [28] have shown that electron–classic charge and nuclei–classic charge interactions should not be treated the same way as electron–nuclei and nuclei–nuclei interactions. In AM1 [33] or PM3 [34], the matrix element of the core Hamiltonian describing the electron–nuclei interaction between the electrons projected onto two atomic orbitals $\mu$ and $\nu$ centered on a quantum atom $K$ and all other quantum atoms $L \neq K$ is expressed as

$$ H^{\text{electrons–nuclei}}_{\mu\nu} = -\sum_{L \neq K} Z_L\,(\mu\nu | s_L s_L), \tag{9} $$

where $(\mu\nu | s_L s_L)$ is a two-center two-electron integral depending on the electronic parameters $(\rho^K_x)_{x=0,1,2}$ and $(\rho^L_x)_{x=0,1,2}$. The core–core interaction between $K$ and $L$ is expressed as

$$ V^{\text{nuclei–nuclei}} = Z_K Z_L \left[ (s_K s_K | s_L s_L)\, f(K, L) + \frac{1}{R_{KL}}\, g(K, L) \right], \tag{10} $$

with

$$ f(K, L) = 1 \pm e^{-\alpha_K R_{KL}} \pm e^{-\alpha_L R_{KL}} \tag{11} $$

and

$$ g(K, L) = \sum_i a_{i,K}\, e^{-b_{i,K}(R_{KL} - c_{i,K})^2} + \sum_j a_{j,L}\, e^{-b_{j,L}(R_{KL} - c_{j,L})^2}. \tag{12} $$

In Ref. [28], Luque et al. have shown that one of the following conditions has to be fulfilled to reproduce correctly the electrostatic interaction between a classic atom $C$ and a quantum atom $K$:

1. $(\rho^K_x)_{x=0,1,2} = (\rho^C_x)_{x=0,1,2} = 0$, $f(K, C) = 1$, and $g(K, C) = 0$ (i.e., $V_{\text{charge–nuclei}} = Z_K Q_C / R_{KC}$).
2. $(\rho^C_x)_{x=0,1,2} = 0$, $f(K, C) = 1 \pm e^{-\alpha_K R_{KC}}$, and $g(K, C) = \sum_i a_{i,K}\, e^{-b_{i,K}(R_{KC} - c_{i,K})^2}$.

As outlined above, it is also necessary with semiempirical models to reparameterize the van der Waals potentials between the classic and quantum atoms to reproduce the nonbonded interactions properly. The forms and coefficients of these new van der Waals potentials differ depending on whether solution 1 or 2 is chosen [28].

3. Cutting Covalent Bonds

3.1. DIFFERENT SOLUTIONS

The main concepts of QM/MM methods defined in the preceding section (i.e., splitting a molecular system into two parts and ensuring a proper interaction between them) have proved successful, especially in the study of chemical reactions in solution. Usually, in these systems the reactants (a set of small molecules) are described by quantum mechanics, whereas the solvent (water, methanol, etc.) is described by molecular mechanics using polarizable or nonpolarizable [15, 32] force fields. There, the delineation between the quantum and classic parts is clearly defined, as each molecule belongs exclusively to one of the subsystems.

However, in enzymatic systems composed of an enzyme, its substrate, sometimes a cofactor, and the surrounding solvent, it is not possible to include the whole protein in the quantum subsystem because of the computational cost. It is therefore necessary to define a small subset of atoms (i.e., the reactive ones) that will be incorporated into the quantum part, whereas the others will be part of the classic subsystem. Some covalent bonds are then at the frontier between the classic and quantum parts. They link what we call a quantum frontier atom, denoted X in the rest of this article, with a classic frontier atom, denoted Y (see Fig. 1). A problem occurs at this frontier because the electron of X involved in the covalent bond with Y is not paired with any other electron, since in molecular mechanics the electrons (of Y) are not explicitly represented. Thus, this electron needs a special treatment.

FIGURE 1. Example of the problem of cutting covalent bonds in QM/MM methods: a covalent C–C bond at the frontier between a classic and a quantum part.

Several solutions have been suggested in the literature. They can be divided into two main categories: those that add an atom or pseudoatom to fill the valencies of the quantum frontier atoms (e.g., the link atom method [14], the connection atom method [35], etc.), and those that deal specifically with the frontier bond orbital by trying to compute its main characteristics directly from known parameters (e.g., the local self-consistent field method [36–38] and the generalized hybrid orbitals method [39]).

3.1.1. Link Atom Method

This is the first and simplest implemented method. It consists of adding a monovalent atom, the so-called link or dummy atom, along the X–Y bond to fill the valency of the quantum frontier atom X. Usually, this link atom is a hydrogen, but some implementations use a halogen such as fluorine or chlorine [40]. There has been some debate about whether this dummy atom should interact with the classic part. Today, it seems accepted that this dummy atom should interact with the classic part like the other quantum atoms, with the notable exception of the few closest classic atoms [41–43].

Another point still in debate is whether the link atom should be free to move or should be fixed at 1 Å from X along the X–Y bond. To the best of our knowledge, there has not been any detailed study on this topic. However, it can be said that both solutions have their advantages and disadvantages: when the link atom is allowed to move during a geometry optimization, the perturbation due to this atom should be lowered, as the latter should adopt an optimal conformation, different from that of the classic frontier atom it represents. However, this solution could be a problem during molecular dynamics, where this free
But, this solution could be a problem during molecular dynamics, where this free 234 dummy atom introduce some supplementary degrees of freedom and frequencies that can be problematic when doing statistical simulations. On the contrary, the solution of fixing the link atom along the frontier bond does not put a monovalent atom in its optimal conformation and, thus, introduces a stronger perturbation, but it does not add any supplementary degree of freedom nor any new frequency in a statistical simulation. Overall, the main advantages of the link atom method is its easy implementation in current quantum chemistry code and its reliability in providing accurate answers to chemical problems as long as the frontier bonds are placed sufficiently far away from the reactive atoms. This is why it has been used so much in QM/MM computational study of enzymatic systems. However, its main disadvantages are the supplementary degrees of freedom it implies and the perturbation to the quantum calculation it adds because, for example, a COH cannot exactly replace a COC covalent bond. 3.1.2. Connection Atom Method To solve the problems arising with the link atom method, some authors have suggested to replace in the quantum calculation the classic frontier atom Y by a monovalent pseudoatom parameterized to reproduce the behavior of the XOY bond. Antes et al. called this dummy atom the connection atom and developed AM1 and PM3 semiempirical parameters for a pseudoatom mimicking the behavior of a methyl group [35, 42]. Zhang et al. [44] used an equivalent approach to develop density functional theory (DFT) pseudopotential for a monovalent atom capable of representing properly covalent frontier bond properties. The main advantage of the connection atom method is to avoid the problem of adding a supplementary atom in the system because the connection atom and the classic frontier atom are one. 
However, the main disadvantage of this approach is the need to reparameterize each type of covalent frontier bond (e.g., C–C, C–N, C–O, etc.) at each quantum level (AM1, PM3, B3LYP, etc.), which is a long and tedious task.

3.1.3. LSCF Method

To avoid the use of supplementary atoms, Rivail and coworkers developed the so-called LSCF method [36–38]. In this formalism, derived from the original work of Warshel and Levitt, the two electrons of the frontier bond are described by a strictly localized bond orbital (SLBO). By assuming this SLBO is far enough away from the reactive center of the system (i.e., at least four covalent bonds), its electronic properties (e.g., its electronic density, its hybridization, etc.) can be considered constant during the chemical reaction. Using model systems and the transferability assumption of bond properties as used in molecular mechanics, it is possible to determine the representation of the SLBO in the atomic orbital basis set of the quantum part. By freezing this representation, the molecular orbitals that describe the rest of the quantum subsystem and that are orthogonal to the SLBOs can then be generated using a local self-consistent procedure. This method has been implemented at both the semiempirical [36] and ab initio levels [38]. Its main advantage is to avoid the use of dummy atoms and to describe properly the chemical properties of the frontier bond. However, it is more difficult to implement, especially at the ab initio level [45]. The easier semiempirical implementation, its qualities, and its shortcomings are addressed in the next section.

3.1.4. Generalized Hybrid Orbitals Method

As an extension of the LSCF method, Gao and coworkers [39] developed the generalized hybrid orbitals (GHOs) method, in which the classic frontier atom is described by a set of orbitals divided into two sets of auxiliary and active orbitals.
The latter set is included in the SCF calculation, while the former generates an effective core potential for the frontier atom. Parameters for classic frontier atoms have been computed at the semiempirical level, but to the best of our knowledge no DFT or ab initio extension of the GHO method has been proposed. In our opinion, the advantages and disadvantages of the GHO approach are similar to those of the LSCF method. However, some differences between the two approaches can be noted: (1) with identical QM/MM system and partitioning, semiempirical molecular orbitals in the QM fragment are described in the LCAO approximation with more basis functions in the GHO method than in the LSCF method (two more hybrid orbitals per frontier bond); (2) the LSCF method only modifies the SCF procedure [36], whereas the GHO method introduces new semiempirical parameters to describe auxiliary and active orbitals [39, 46].

3.2. SEMIEMPIRICAL LSCF: CLOSER LOOK

In QM/MM studies of enzymatic systems, semiempirical levels are often used to describe quantum subsystems because the latter can be large, given the need to cut covalent bonds far away from the reactive center, and because semiempirical calculations need far smaller computational resources than ab initio and DFT calculations. We address, hereafter, the semiempirical LSCF [36, 37, 43] formalism as well as its performance toward reactivity on small systems.

3.2.1. Semiempirical LSCF Procedure

In the semiempirical approximation, only valence shell electrons are used and overlap integrals between atomic orbitals centered on different atoms are taken to be zero. As defined previously, the two electrons of a frontier bond X–Y are represented by an SLBO that is here described by a linear combination of two hybrid orbitals (HOs), one centered on X ($|l\rangle$) and the other centered on Y.
To generate molecular orbitals (MOs) orthogonal to the SLBOs, it is sufficient that the hybrid orbital $|l\rangle$ belongs to a set of four orthogonal hybrid orbitals centered on X. The three other HOs can, therefore, be used in conjunction with the atomic orbitals (AOs) of the other atoms of the quantum part to form a basis set of orbitals ready to build the molecular orbitals of the quantum subsystem. The $|l\rangle$ hybrid orbital can be expressed as a linear combination of the AOs of X:

$$ |l\rangle = a_{l1} |s\rangle + a_{l2} |x\rangle + a_{l3} |y\rangle + a_{l4} |z\rangle. \tag{13} $$

The parameters $a_{l1}$, $a_{l2}$, $a_{l3}$, and $a_{l4}$ must fulfill the following requirements:

▪ $|l\rangle$ is normalized.
▪ $|l\rangle$ contains a fraction of the two electrons involved in the corresponding SLBO. This introduces a parameter called $P_{ll}$, which is the electronic density of $|l\rangle$. This parameter is close to 1.0 for a covalent nonpolarized C–C bond.
▪ $|l\rangle$ must be directed toward Y.
▪ The $|s\rangle$ contribution (the hybridization) is supposed to be a transferable property of the X–Y bond. Thus, $a_{l1}$ is a precomputed parameter.

The hybrid orbital $|l\rangle$ is called the frozen orbital and does not enter the SCF calculation. It is, therefore, completely defined by its direction, the contribution $a_{l1}$ of the $|s\rangle$ AO, and its electronic density $P_{ll}$. The transformation of the four AOs $|s\rangle$, $|x\rangle$, $|y\rangle$, and $|z\rangle$ into four HOs $|i\rangle$, $|j\rangle$, $|k\rangle$, and $|l\rangle$ is made using a $T$ matrix built from the combination of the basis set transformation in the diatomic reference frame (i.e., AOs into HOs) and the orthogonal transformation from the diatomic reference frame into the laboratory reference frame.

For a quantum subsystem containing $N$ AOs and $L$ frontier bonds, the LSCF procedure is divided into the following steps:

1. Choose an initial density matrix $P$ of size $N \times N$.
2. Build the $T$ matrix.
3. Build the Fock matrix $F$ in the AO basis set.
4. Transform the Fock matrix into the hybrid orbital set: $F' = T^T F T$.
5. Remove from $F'$ the rows and columns corresponding to the $|l\rangle$ HOs (i.e., they are assumed to be zero). The size of $F'$ is then $(N - L) \times (N - L)$.
6. Compute the $N - L$ eigenvalues (and eigenvectors) of $F'$.
7. Build the density matrix $P'$ in the HO basis set.
8. Add the parameter element $P_{ll}$ to $P'$ to form an $N \times N$ matrix in the HO basis set.
9. Backtransform the density matrix into the atomic orbital set: $P = T P' T^T$.
10. Go back to step 3 until convergence is reached.

3.2.2. Influence of the Semiempirical LSCF Parameters

In the semiempirical LSCF method, two parameters, $a_{l1}$ and $P_{ll}$, are involved. Several tests have been made to evaluate the influence of these parameters on the quality of the LSCF results [47]. We show here some results on a test system composed of a histidine and an aspartic acid, as represented in Figure 2. The starting geometry for this test system has been taken from crystallographic data of the catalytic site of trypsin. QM/MM frontier bonds are located at the $C_\alpha$–$C_\beta$ covalent bond of histidine and between the methyl groups and the side peptidic bonds of aspartic acid. A proton can be transferred between histidine and aspartic acid. Table I reports the energetics of this proton transfer at the AM1 semiempirical level and its variation with the LSCF parameters $P_{ll}$ or $a_{s1}$ (the $|s\rangle$ coefficient, written $a_{l1}$ in Eq. (13)) of the histidine $C_\alpha$–$C_\beta$ bond. Here, the two peptidic backbones have been kept fixed and the number of degrees of freedom is identical in all the test calculations.

FIGURE 2. LSCF test model: histidine + aspartic acid.

TABLE I. Influence of the parameters $P_{ll}$ and $a_{s1}$ on LSCF calculations.

| $a_{s1}$ | $P_{ll}$ | $\Delta E$ (kcal/mol) |
|---|---|---|
| 0.50 | 0.70 | −27.1 |
| 0.50 | 0.80 | −24.8 |
| 0.50 | 0.90 | −22.6 |
| 0.50 | 1.00 | −20.2 |
| 0.50 | 1.10 | −17.8 |
| 0.50 | 1.20 | −15.4 |
| 0.50 | 1.30 | −13.4 |
| 0.50 | 1.40 | −10.8 |
| 0.30 | 1.00 | −20.3 |
| 0.35 | 1.00 | −20.3 |
| 0.40 | 1.00 | −20.2 |
| 0.45 | 1.00 | −20.2 |
| 0.50 | 1.00 | −20.2 |
| 0.55 | 1.00 | −20.2 |
| 0.60 | 1.00 | −20.2 |
| 0.65 | 1.00 | −20.2 |

Full quantum calculations: $\Delta E = -19.0$ kcal/mol.
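To make the role of the two LSCF parameters concrete, the ten-step procedure of Section 3.2.1 can be sketched as below for a single frontier bond. This is only a schematic illustration under stated assumptions: `toy_fock` is a made-up mean-field Fock builder rather than an AM1 or PM3 one, the six-orbital model Hamiltonian and all numerical values are hypothetical, the three active hybrids are obtained by an arbitrary orthonormal completion (a QR factorization), and a simple density damping is added to stabilize the iterations.

```python
import numpy as np

def hybrid_block(a_s, direction):
    """Orthogonal 4x4 matrix (rows: s, px, py, pz) whose last column is the
    frozen hybrid |l> = a_s|s> + sqrt(1 - a_s^2) (d . p), d pointing toward Y."""
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    l = np.concatenate(([a_s], np.sqrt(1.0 - a_s ** 2) * d))
    m = np.eye(4)
    m[:, 0] = l                      # QR keeps the direction of the first column
    q, _ = np.linalg.qr(m)
    if q[:, 0] @ l < 0.0:            # QR may return -l; undo the sign flip
        q = -q
    return np.roll(q, -1, axis=1)    # move |l> to the last column

def toy_fock(h, P, u=0.2):
    """Hypothetical density-dependent Fock matrix (NOT a semiempirical one)."""
    return h + 0.5 * u * np.diag(np.diag(P))

def lscf(h, T, frozen, p_ll, n_occ, max_iter=500, tol=1e-9):
    n = h.shape[0]
    active = [i for i in range(n) if i not in frozen]
    P = np.zeros((n, n))                                      # step 1
    for _ in range(max_iter):
        F = toy_fock(h, P)                                    # step 3
        F_ho = T.T @ F @ T                                    # step 4
        F_act = F_ho[np.ix_(active, active)]                  # step 5
        _, C = np.linalg.eigh(F_act)                          # step 6
        C_occ = C[:, :n_occ]                                  # lowest MOs
        P_ho = np.zeros((n, n))
        P_ho[np.ix_(active, active)] = 2.0 * C_occ @ C_occ.T  # step 7
        P_ho[frozen, frozen] = p_ll                           # step 8
        P_new = T @ P_ho @ T.T                                # step 9
        if np.max(np.abs(P_new - P)) < tol:                   # step 10
            return P_new
        P = 0.5 * (P + P_new)                                 # damping (assumption)
    raise RuntimeError("LSCF did not converge")

# Toy system: frontier atom X (s, px, py, pz) plus two other quantum orbitals.
h = np.diag([-1.2, 0.0, 0.0, 0.0, -0.6, -0.5])
h[0, 4] = h[4, 0] = -0.1
h[1, 5] = h[5, 1] = -0.1
T = np.eye(6)                                                 # step 2
T[:4, :4] = hybrid_block(a_s=0.5, direction=(0.0, 0.0, 1.0))  # Y along +z
P = lscf(h, T, frozen=[3], p_ll=1.0, n_occ=3)
print(np.trace(P))
```

By construction the frozen hybrid carries exactly $P_{ll}$ electrons and is decoupled from the optimized orbitals, so rerunning the sketch with other values of `a_s` or `p_ll` mimics, in spirit only, the kind of parameter scan summarized in Table I.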
TABLE II. Mulliken charges on histidine for two different values of $a_{s1}$.

  Atom    a_s1 = 0.30    a_s1 = 0.70
  Cβ      -0.170         -0.021
  Hβ1      0.148          0.088
  Hβ2      0.155          0.097
  Cγ      -0.035         -0.066
  Nδ      -0.109         -0.111
  Hδ1      0.375          0.375
  Cε       0.057          0.059
  Hε1      0.299          0.300
  Nε      -0.185         -0.185
  Hε2      0.284          0.285
  Cδ      -0.123         -0.125
  Hδ2      0.212          0.211

Table I shows that the larger the electronic population of the frozen hybrid orbital, the smaller the difference in energy between the two possible chemical states (i.e., the proton either on the histidine or on the aspartic acid). This can be easily explained by the fact that increasing the value of $P_{ll}$ at the Cα–Cβ frontier bond increases the electronic density in the histidine side chain. Thus, the imidazole ring becomes a weaker proton donor. If one performs a localization of the molecular orbitals, one finds a value of 0.99 for the $P_{ll}$ parameter. Likewise, in most covalent peptidic bonds the $P_{ll}$ parameter lies between 0.95 and 1.05, which, according to Table I, gives an uncertainty of ±1.2 kcal/mol on the energetic barrier of our test system. This is a reliable result (~5% error) compared with the uncertainty of AM1 semiempirical calculations with respect to full ab initio calculations.

Variations of the $a_{s1}$ parameter in Table I show that this parameter does not directly influence the energetics of the reaction pathway. In fact, its influence is localized on the few atoms close to the frontier bond, as shown in Table II. This phenomenon can be explained by the fact that by increasing the s character of the frozen hybrid orbital one "moves" the mean position of the frozen electron toward the quantum part and, thus, increases the interaction with it. This influence is local, as it is not noticeable beyond a distance of three covalent bonds, and therefore it does not perturb the reactivity of the system.

Overall, these results, together with other tests performed in Rivail's group, show that the influence of the $a_{s1}$ parameter is negligible along a reaction path and that the electronic population of the frozen orbitals $P_{ll}$ does not induce a clear change in energies for weakly polarized bonds ($0.98 < P_{ll} < 1.02$).

4. Using the Potential Energy Surface

4.1. DYNAMICS

Within the Born–Oppenheimer approximation, the potential energy surface (PES) of an enzymatic reaction provides the total energy of each nuclear configuration of the substrate–enzyme complex as if all the nuclei were fixed at that position. Nuclei actually are moving, and the nuclear kinetic energy has to be introduced to understand enzymatic reactivity. So, molecular dynamics simulations have to be carried out to sample the configuration space extensively, looking for new regions of the PES around minimum energy structures that represent possible reactant and product complexes. However, both energetic and entropic factors make it impossible in practice to generate, by molecular dynamics in a canonical ensemble at a given temperature, reactive trajectories going from the reactant region to the product region. This is true even for chemical reactions in the gas phase involving just half a dozen nuclei and with energy barriers of more than a few kcal/mol. Canonical rate constants $k(T)$ then have to be calculated by means of transition-state theory, a statistical approach to real dynamics. According to variational transition-state theory [48, 49], the canonical rate constant depends on the generalized free energy barrier, that is, the maximum value of the generalized free energies associated with a set of dividing surfaces built up along a suitable reaction pathway taken as a reference. The generalized free energies can be obtained, for instance, from molecular dynamics simulations using the umbrella sampling technique with an adequate biasing potential or by means of statistical perturbation theory.
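To make the link between a generalized free energy profile and a canonical rate constant concrete, the following minimal sketch applies the conventional transition-state expression $k(T) = (k_B T / h) \exp(-\Delta G^{\ddagger} / RT)$ to the maximum of a profile. It assumes a unimolecular step and a transmission coefficient of 1 (no recrossing or nuclear quantum corrections); the function name and the sample profile are hypothetical, not data from this article.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
R_KCAL = 1.987204259e-3     # gas constant, kcal/(mol*K)


def tst_rate_constant(free_energies_kcal, temperature=298.15):
    """Rate constant (s^-1) from a free energy profile along the reference path.

    free_energies_kcal[0] is taken as the reactant; the barrier is the
    maximum of the profile measured from that value.
    """
    barrier = max(free_energies_kcal) - free_energies_kcal[0]
    prefactor = K_B * temperature / H_PLANCK
    return prefactor * math.exp(-barrier / (R_KCAL * temperature))


# Hypothetical profile (kcal/mol) sampled on successive dividing surfaces.
profile = [0.0, 4.0, 11.0, 15.0, 9.0, -3.0]
rate = tst_rate_constant(profile)
print(f"k(298.15 K) = {rate:.3g} s^-1")
```

Because the barrier enters exponentially, an error of about 1.4 kcal/mol in the free energy barrier changes the rate constant by roughly one order of magnitude at room temperature.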
Once the reactant and the product have been located, a progress coordinate connecting them, based on suitable internal coordinates, can be adopted to define the reaction pathway. Another approach is the following [4, 50, 51]: Knowing the valence bond structures of both reactant and product, a mapping potential expressed as a function of the diagonal elements of an empirical valence bond Hamiltonian can be used to define a reaction pathway as a collective reaction coordinate, analogous to the solvent coordinate used in Marcus theory for electron transfer reactions. In all cases, nuclear quantum effects and corrections accounting for the recrossing of the dividing surface can be introduced in different ways.

Although the free energy approaches described above are fruitful, their practical implementation is in general based on a reference path constructed using the information extracted from the reactant and product. However, this procedure could lead to inaccurate results. Due to the complexity of enzymatic reactions, the real reaction pathway (the one joining reactant and product through the set of stationary points involved in the mechanism) can be different from the one that is apparent at first glance. The enzymatic reaction can actually take place through several parallel and kinetically competitive channels, each consisting of multiple steps and involving several intermediates in going from the reactant to the product, thus leading to a priori unexpected reaction paths. As a consequence, an exploration of the corresponding PES to locate the set of stationary points that connects the reactant and the product along the real reaction pathway is highly recommended prior to the free energy calculations. The set of dividing surfaces should then be built along a path close to that real reaction pathway.
In the rest of this section we review different methods to explore a QM/MM PES looking for the stationary points. Once the stationary points have been located, their associated reaction pathway can be built up.

4.2. STATICS

As noted above, any dynamic treatment should be preceded by a static study. Some problems arise in locating stationary points in systems where thousands of degrees of freedom must be taken into account. Beyond the high computational effort required, which will be discussed later, a more general question appears. In a flexible system, many different stationary points and reaction paths connecting them will exist. Some of them will be chemically equivalent (differing only in noncrucial configurations of the solvent or the enzymatic environment), while others will be substantially different and their study will lead to different results.

4.2.1. Algorithms for Locating Minimum Energy Structures

To identify the reactants, products, and possible intermediates, a geometry optimization (minimization of the gradient norm) first has to be performed. An important aspect is the choice of the starting structure for such a minimization. If we start from an enzyme structure obtained experimentally, usually from X-ray crystallography or NMR spectroscopy, we will fall into a minimum close to this experimental structure, but perhaps hundreds of kcal/mol higher in energy than the absolute minimum of the PES determined for our molecular model. On the contrary, if we perform this minimization after relaxing the system by molecular dynamics, simulated annealing, or other available algorithms, we will reach a structure that is more stable in our model but perhaps geometrically far from the one obtained experimentally. Both strategies have been used in the literature. However, we must keep in mind that the deepest minimum will not always be a representative structure of the reactants' configuration.
There are several procedures used for the optimization of molecular structures. Systems to which molecular quantum chemistry is usually applied have no more than 100 atoms. In this case the most common algorithms are those using second derivatives or approximated second derivatives of the energy. The most popular methods are quasi-Newton–Raphson [52], rational function optimization (RFO) [53, 54], and direct inversion in the iterative subspace (DIIS) [55].

The simplest second derivative method is Newton–Raphson. In a system involving $N$ degrees of freedom, a quadratic Taylor expansion of the potential energy about the point $q_k$ is made, where the subscript $k$ stands for the step number along the optimization:

$$E(q_k + \Delta q_k) = E(q_k) + g_k^{T} \Delta q_k + \frac{1}{2} \Delta q_k^{T} H_k \Delta q_k. \tag{14}$$

The vector $\Delta q_k = (q_{k+1} - q_k)$ describes the displacement from the reference geometry $q_k$ to the desired new geometry $q_{k+1}$, $g_k$ is the first derivative vector (gradient) at the point $q_k$, and $H_k$ is the second derivative matrix (Hessian) at the same geometry. Under the approximation of a purely quadratic PES, and imposing the stationary-point condition (zero gradient) at the new geometry $q_{k+1}$, we obtain the Newton–Raphson equation, which predicts the displacement needed to reach the stationary point in just one step:

$$\Delta q_k = -H_k^{-1} g_k. \tag{15}$$

Because real PESs are not quadratic, in practice an iterative process has to be carried out to reach the stationary point, and several steps are required. In this case the Hessian should be calculated at every step, which is computationally very demanding. A variation on the Newton–Raphson method is the family of quasi-Newton–Raphson methods, where an approximated Hessian matrix $B_k$ (or its inverse) is gradually updated using the gradient and displacement vectors of the previous steps.
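The Newton–Raphson iteration of Eq. (15) can be sketched on a small model surface. The two-dimensional energy function below (a quadratic form plus an optional quartic term) is a hypothetical stand-in for a PES, and the step is obtained by solving $H_k \Delta q_k = -g_k$ rather than by forming the inverse explicitly. With the quartic term switched off the surface is purely quadratic and the stationary point is reached in one step, as stated above; with it switched on, several steps are needed.

```python
import numpy as np

# Hypothetical model PES: E(q) = 1/2 q^T A q + c * sum(q_i^4) - b^T q
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # harmonic part (positive definite)
b = np.array([1.0, -1.0])


def gradient(q, quartic):
    """Analytic gradient g of the model PES."""
    return A @ q + 4.0 * quartic * q**3 - b


def hessian(q, quartic):
    """Analytic Hessian H of the model PES."""
    return A + np.diag(12.0 * quartic * q**2)


def newton_raphson(q0, quartic, tol=1e-10, max_steps=50):
    """Iterate Eq. (15); return the geometry and the number of steps taken."""
    q = np.array(q0, dtype=float)
    for step in range(max_steps):
        g = gradient(q, quartic)
        if np.linalg.norm(g) < tol:
            return q, step
        # Solve H dq = -g instead of inverting H explicitly.
        dq = -np.linalg.solve(hessian(q, quartic), g)
        q = q + dq
    return q, max_steps


q_quad, n_quad = newton_raphson([3.0, -2.0], quartic=0.0)
q_anh, n_anh = newton_raphson([3.0, -2.0], quartic=0.1)
print("quadratic PES: steps =", n_quad)
print("anharmonic PES: steps =", n_anh)
```

Each step requires a full Hessian and the solution of an $N$-dimensional linear system, which is the cost that the quasi-Newton and limited-memory variants try to avoid.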
While standard Newton–Raphson is based on the optimization of a quadratic model, replacing this quadratic model by a rational function approximation gives the RFO method:

$$E(q_k + \Delta q_k) - E(q_k) \cong \frac{\dfrac{1}{2} \begin{pmatrix} 1 & \Delta q_k^{T} \end{pmatrix} \begin{pmatrix} 0 & g_k^{T} \\ g_k & B_k \end{pmatrix} \begin{pmatrix} 1 \\ \Delta q_k \end{pmatrix}}{\begin{pmatrix} 1 & \Delta q_k^{T} \end{pmatrix} \begin{pmatrix} 1 & 0^{T} \\ 0 & S_k \end{pmatrix} \begin{pmatrix} 1 \\ \Delta q_k \end{pmatrix}}. \tag{16}$$

The numerator in Eq. (16) is the quadratic model of Eq. (14). The matrix in this numerator is the so-called augmented Hessian (AH). $B_k$ is the Hessian (analytic or approximated). The matrix $S_k$ is a symmetric matrix that has to be specified but is normally taken as the unit matrix $I$. The solution of the RFO equation is obtained by diagonalizing the AH matrix, that is, by solving the $(N + 1)$-dimensional eigenvalue Eq. (17)

$$\begin{pmatrix} 0 & g_k^{T} \\ g_k & B_k \end{pmatrix} v_\nu^{(k)} = \lambda_\nu^{(k)} v_\nu^{(k)}, \quad \forall \nu = 1, \ldots, N + 1 \tag{17}$$

and then the displacement vector $\Delta q_k$ for the $k$th step is evaluated as

$$\Delta q_k = \frac{1}{v_{1,\nu}^{(k)}} v_\nu'^{(k)}, \tag{18}$$

where

$$\left( v_\nu'^{(k)} \right)^{T} = \left( v_{2,\nu}^{(k)}, \ldots, v_{N+1,\nu}^{(k)} \right). \tag{19}$$

In Eq. (19), if one is interested in locating a minimum then $\nu = 1$, and for a transition structure $\nu = 2$. As the optimization process converges, $v_{1,\nu}^{(k)}$ tends to 1 and $\lambda_\nu^{(k)}$ to 0.

For quasi-Newton–Raphson and RFO methods, the approximated Hessian matrix is updated at every step from the information of the previous steps:

$$B_{k+1} = B_0 + \sum_{i=0}^{k} \left[ j_i u_i^{T} + u_i j_i^{T} - \left( j_i^{T} \Delta q_i \right) u_i u_i^{T} \right], \quad k = 0, 1, \ldots, \tag{20}$$

where $j_i = D_i - A_i$, $D_i = g_{i+1} - g_i$, $A_i = B_i \Delta q_i$, and $u_i = M_i \Delta q_i / (\Delta q_i^{T} M_i \Delta q_i)$. Different choices of the matrix $M_i$ lead to different Hessian update formulae. In particular, for the BFGS update $M_i = a_i B_{i+1} + b_i B_i$ for some selected positive scalars $a_i$ and $b_i$. For the Powell update the matrix $M_i$ is equal to the unit matrix $I$.

These methods are effective and can reach a stationary point in few steps. This is an important issue when the energy evaluation is expensive, as in systems described by a QM PES.
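A minimal sketch of one RFO step, following Eqs. (17) to (19) with $S_k = I$, is given below. The augmented Hessian is diagonalized with `numpy.linalg.eigh`, whose eigenvalues are returned in ascending order, so `root=0` corresponds to $\nu = 1$ (minimum search). The double-well test surface and the `max_step` cap on the step length are assumptions added for the illustration; they are not part of the equations above.

```python
import numpy as np


def rfo_step(g, B, root=0, max_step=0.3):
    """One RFO displacement from gradient g and (approximate) Hessian B."""
    n = g.size
    ah = np.zeros((n + 1, n + 1))        # augmented Hessian of Eq. (17)
    ah[0, 1:] = g
    ah[1:, 0] = g
    ah[1:, 1:] = B
    eigvals, eigvecs = np.linalg.eigh(ah)
    v = eigvecs[:, root]
    dq = v[1:] / v[0]                    # Eqs. (18) and (19)
    norm = np.linalg.norm(dq)
    if norm > max_step:                  # illustrative step-length cap
        dq = dq * (max_step / norm)
    return dq, eigvals[root]


# Hypothetical model PES: E = (x^2 - 1)^2 + y^2, minima at (+/-1, 0).
def gradient(q):
    return np.array([4.0 * q[0] * (q[0] ** 2 - 1.0), 2.0 * q[1]])


def hessian(q):
    return np.diag([12.0 * q[0] ** 2 - 4.0, 2.0])


def rfo_minimize(q0, tol=1e-8, max_iter=200):
    q = np.array(q0, dtype=float)
    for iteration in range(max_iter):
        g = gradient(q)
        if np.linalg.norm(g) < tol:
            break
        dq, _ = rfo_step(g, hessian(q))
        q = q + dq
    return q, iteration


# Start where the Hessian has a negative eigenvalue along x.
q_min, n_iter = rfo_minimize([0.1, 0.5])
print("minimum found at", q_min, "after", n_iter, "iterations")
```

The starting point lies in a region of negative curvature, where a plain Newton–Raphson step would head toward the saddle point at the origin; the lowest root of the augmented Hessian instead gives a downhill displacement.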
On the other hand, systems treated with a molecular mechanics potential usually have thousands of atoms, but the energy and gradient evaluation is computationally much cheaper than with quantum potentials. The limiting aspect is therefore the storage and manipulation of a Hessian for thousands of moving atoms. This is the main reason why minimizations are carried out with other algorithms that do not require the evaluation of a Hessian matrix, although they are less effective than quasi-Newton–Raphson or RFO methods. Steepest descent and conjugate gradient-like algorithms are examples of methods that only need to store the position and gradient vectors and therefore have low computer memory requirements.

In enzymatic systems treated with QM/MM potentials, thousands of atoms are moved and the energy evaluation for each nuclear configuration is CPU time demanding. As mentioned above, the size of the system prevents the use of a standard quasi-Newton–Raphson method because a high-dimensional Hessian matrix cannot be manipulated: it usually implies an $O(N^3)$ diagonalization and an $O(N^2)$ storage. In addition, we cannot use a conjugate gradient procedure because it needs too many optimization steps (i.e., energy and gradient evaluations) to converge. We therefore need a method efficient enough to reach convergence in few optimization steps while avoiding the use of a large amount of computer memory. Different kinds of methods, explained below, are used to solve this problem: limited memory, adopted basis Newton–Raphson (ABNR), truncated Newton–Raphson, and coupled methods. All of them are based on the Newton–Raphson equation and have in common that they avoid manipulating the full Hessian.

4.2.1.1. Limited Memory. A first solution is the so-called limited memory methodology. In this case a quasi-Newton–Raphson algorithm is used.
However, the inverse of the Hessian matrix is never built; instead, the product of the inverse Hessian and the gradient is formed directly, so no Hessian diagonalization is required. What makes this method powerful is that only information from the last $m$ steps is used to update this matrix product. In this way only the geometries and gradients of these last steps have to be stored. When a BFGS [56] update formula is used, this procedure is called L-BFGS [57]. The unit matrix can be used as an initial Hessian for the minima search [58]. This method, useful for minima, cannot be applied to transition-state search, as will be explained later.

4.2.1.2. ABNR. In the limited memory method described above, although the inverse of the Hessian matrix is never constructed, information on the second derivatives of the whole system is used. An alternative solution consists of constructing the Hessian matrix, or its inverse, only in a reduced basis set of the whole space. This is the so-called ABNR method [22]. This procedure still avoids the diagonalization and storage of the full Hessian. In this case the Newton–Raphson equations are solved in a subspace while conjugate gradient is applied to the rest of the directions.

4.2.1.3. Truncated Newton–Raphson Methods. The Newton–Raphson equation [Eq. (15)] can be rewritten as

$$H_k \Delta q_k = -g_k. \tag{21}$$

The truncated Newton–Raphson method [59, 60] iteratively finds an approximation to the solution $\Delta q_k$ of Eq. (21) using the preconditioned conjugate gradient method.

4.2.1.4. Coupled Methods. Although these methods are usually used in transition-state search, they have also been applied to locate minimum energy structures. They try to take advantage of applying different standard methods in different zones; that is, a quasi-Newton–Raphson or RFO scheme is used for the small core of the enzyme.
This core will include most of the quantum atoms, whereas steepest descent, a conjugate gradient-like method, or any of the more efficient methods just described is applied to the rest of the environment atoms, which are treated mostly with molecular mechanics. This last method will be explained in detail in Section 4.2.2 for transition-state search.

4.2.2. Algorithms for Locating Transition States

As we said before, locating the transition-state (TS) structure through which the system evolves from reactants to products on a PES is essential to understand the reaction dynamics. Minimization algorithms have been widely studied due to their broad utility in macromolecular chemistry (e.g., preparation of structures for molecular dynamics, docking, harmonic analysis, comparing and fitting force fields). On the other hand, because TSs are related only to chemical reactions, their search in high-dimensional systems was not studied until adequate potentials such as QM/MM became available. In this section we describe some of the methods used to find these structures.

4.2.2.1. Reaction Coordinate Method. The most intuitive strategy to find a TS structure is to identify an internal coordinate (bond distance, angle, dihedral) as a reaction coordinate and then perform several restrained energy minimizations with this coordinate kept frozen at different values. At every restrained minimization this coordinate is modified, going from reactant to product, to obtain a discrete representation of the supposed reaction path. The coordinate is usually fixed by applying a harmonic potential with a force constant big enough to keep the atoms involved in this internal coordinate unmoved:

$$V_{total} = V_{system} + k \left( x_{a,fix} - x_a \right)^2. \tag{22}$$

Here $x_{a,fix}$ is the intended fixed value at each restrained minimization and $x_a$ is the current value of the reaction coordinate along the simulation.
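The restrained scan built on Eq. (22) can be sketched on a toy surface. In the example below, `v_system` is a hypothetical two-dimensional model potential in which `x` plays the role of the reaction coordinate and `y` that of all remaining degrees of freedom; the force constant, step size, and grid are illustrative values, and a plain steepest-descent loop stands in for whatever minimizer is actually used.

```python
def v_system(x, y):
    """Hypothetical model PES: minima at (+/-1, +/-0.5), saddle point at (0, 0)."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.5 * x) ** 2


def total_gradient(x, y, x_fix, k):
    """Gradient of V_total = V_system + k (x_fix - x)^2, as in Eq. (22)."""
    dvdx = 4.0 * x * (x * x - 1.0) - 2.0 * (y - 0.5 * x) - 2.0 * k * (x_fix - x)
    dvdy = 4.0 * (y - 0.5 * x)
    return dvdx, dvdy


def restrained_minimum(x_fix, k=200.0, step=2.0e-3, n_steps=5000):
    """Steepest-descent minimization of V_total for one value of x_fix."""
    x, y = x_fix, 0.0
    for _ in range(n_steps):
        gx, gy = total_gradient(x, y, x_fix, k)
        x -= step * gx
        y -= step * gy
    return x, y


def scan(n_points=21):
    """Discrete energy profile from reactant (x = -1) to product (x = +1)."""
    profile = []
    for i in range(n_points):
        x_fix = -1.0 + 2.0 * i / (n_points - 1)
        x, y = restrained_minimum(x_fix)
        # Report V_system only: the restraint energy is not part of the PES.
        profile.append((x_fix, x, y, v_system(x, y)))
    return profile


profile = scan()
ts_guess = max(profile, key=lambda point: point[3])
print("TS estimate: x = %.3f, y = %.3f, V = %.3f" % ts_guess[1:])
```

The highest point of the profile is only a first guess of the TS: the relaxed value of `x` differs slightly from `x_fix` wherever the system exerts a force along the restrained coordinate, and the nature of the stationary point still has to be confirmed from the Hessian.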
When more than one internal coordinate is identified as the reaction coordinate, we must modify all of them in our discrete energy profile. In this case the restraining harmonic potential is built from all of these distinguished coordinates (restrained distances: RESD [61]):

$$V_{total} = V_{system} + k \left( \left( x_{a,fix} - x_a \right) + \left( x_{b,fix} - x_b \right) + \ldots \right)^2. \tag{23}$$

The RESD method is also useful when we want to discriminate between a concerted and a stepwise mechanism; in this case, $a$ and $b$ are the coordinates governing the two reaction steps. When there are two coupled internal coordinates, for instance, the breaking and forming bonds in a proton transfer reaction, there are several possibilities: choosing only the acceptor atom–transferring atom distance or only the donor atom–transferring atom distance to define the reaction coordinate, choosing both of them simultaneously, or choosing the difference between the two distances. In this last case

$$V_{total} = V_{system} + k \left( \mathrm{diff}_{fix} - \mathrm{diff} \right)^2, \tag{24}$$

where $\mathrm{diff} = r_{donor-H} - r_{acceptor-H}$. This last option is preferable because only one degree of freedom is restrained while the variation of both distances is taken into account.

The point of maximum potential energy along the reaction coordinate can be taken as a first approximation to the TS structure. However, it is not always easy or intuitive to identify an internal coordinate as the reaction coordinate. If the coordinate is not appropriate, we cannot be sure of visiting the saddle point region. In any case, even when a coordinate seems intuitive, it should be checked that the Hessian matrix has a unique negative eigenvalue, which will be associated with the transition eigenvector.

4.2.2.2. Conjugate Peak Refinement. There are some more sophisticated algorithms that still avoid any computation of second derivatives. Conjugate peak refinement (CPR) [62] has been applied to the search of TSs in enzymatic reactivity. To converge to a saddle point from a distance at which the energy can be approximated by a quadratic expansion around that saddle point, it is necessary to obtain a set of conjugate vectors with respect to the Hessian matrix. Once a direction along which the energy has a local maximum is found, this direction is called $s_0$. For instance, $s_0$ can be the vector that connects reactants and products. The rest of the conjugate basis set is then built recursively, making use of a recurrence formula that constructs a set of conjugate directions $(s_1, \ldots, s_j)$, starting with the direction $s_0$:

$$s_0: \text{given}$$

$$s_1 = -g_1 + \frac{g_1^{T} h}{s_0^{T} h} s_0, \tag{25}$$

$$s_j = -g_j + \frac{g_j^{T} h}{s_0^{T} h} s_0 + \frac{|g_j|^2}{|g_{j-1}|^2} s_{j-1}, \quad j > 1, \tag{26}$$

where $g_j$ is the gradient vector at the energy extremum $y_j$ along $s_{j-1}$ and $h$ is an estimate of $H s_0$. One cycle of maximizing the energy along $s_0$ and minimizing along the successive $s_j$ ($j > 0$) yields the saddle point on a quadratic energy surface. Because real PESs are not quadratic, in general several such maximization/minimization cycles have to be performed to locate the saddle point. These iterative maximizations and conjugate line minimizations lead to an approximation of the reaction path whose maximum is an approximation to the saddle point structure. The process stops when the gradient norm at the saddle point structure falls below a given convergence criterion. Nonetheless, we must insist that an analysis of the number of negative eigenvalues of the Hessian at this point is necessary to be sure that a true saddle point has been found.

In big molecular systems with $N$ degrees of freedom, these conjugate line minimizations cannot be done for all the $N - 1$ conjugate directions. They are interrupted at direction $j$ as soon as the quantity $\varepsilon_j$ is greater than a given tolerance $\varepsilon$:

$$\varepsilon_j \equiv N^{1/2} \frac{g_j^{T} s_0}{|g_j| |s_0|} > \varepsilon \quad (j > 1). \tag{27}$$

4.2.2.3. Methods That Require the Hessian Matrix.

Second derivatives: Only moving a core.
The easiest approximation is to move only a small part of the system containing the atoms that participate directly in the reaction (the core), keeping the rest of the system (the environment) frozen. The number of moving atoms has to be small enough for their corresponding Hessian matrix to be stored and manipulated. Any of the standard methods used to search for TSs in the gas phase can then be applied (e.g., Newton–Raphson, RFO).

Second derivatives: Moving a core and an environment separately. An enzymatic system needs to be represented by hundreds or even thousands of moving atoms, where a residue can interact with many other residues. This implies that the movement of an atom, group, or side chain provokes in turn a coupled movement of the interacting atoms. Moving only a small core region therefore leaves the environment unrelaxed. This has forced computational chemists to develop methods able to search for a TS while moving the whole system.

As mentioned for the case of minima, coupled methods have been used to locate enzyme reaction intermediates, but these methods are also especially useful for TSs. In the core region a TS search is performed with a standard second derivative method (e.g., Newton–Raphson, RFO). The environment region is relaxed by minimizing it with a method able to handle a large number of atoms (e.g., conjugate gradient, L-BFGS). The two procedures are repeated iteratively until self-consistency, that is, until the gradient norm of both regions is lower than a convergence criterion (see Fig. 3). In this case we find a stationary point in which one small region (the core) is a first-order saddle point describing our reaction.

FIGURE 3. Coupled optimization scheme.

Several research groups have used this procedure and applied it to enzymatic reactivity. All of them avoid any QM/MM energy evaluation when minimizing the environment.
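The alternation between a core TS search and an environment minimization can be illustrated with a deliberately small model. In the sketch below, one "core" coordinate `x` carries the negative curvature, two "environment" coordinates `y` are harmonically coupled to it, each cycle takes one Newton–Raphson step in the core followed by a steepest-descent relaxation of the environment, and the cycles stop when both gradient norms fall below the threshold. The energy function, the constants `A_ENV` and `C_ENV`, and all function names are hypothetical; unlike the schemes described here, the toy model evaluates the full coupling energy during the environment relaxation.

```python
import numpy as np

# Hypothetical model: E(x, y) = (x^2 - 1)^2 + sum_i A_ENV[i] * (y_i - C_ENV[i] * x)^2
A_ENV = np.array([1.0, 0.5])    # environment force constants (toy values)
C_ENV = np.array([0.5, -0.4])   # core-environment coupling (toy values)


def core_gradient(x, y):
    return 4.0 * x * (x**2 - 1.0) - 2.0 * np.sum(A_ENV * C_ENV * (y - C_ENV * x))


def core_hessian(x):
    return 12.0 * x**2 - 4.0 + 2.0 * np.sum(A_ENV * C_ENV**2)


def env_gradient(x, y):
    return 2.0 * A_ENV * (y - C_ENV * x)


def relax_environment(x, y, tol, rate=0.2, max_iter=1000):
    """Steepest-descent minimization of the environment with the core frozen."""
    for _ in range(max_iter):
        g = env_gradient(x, y)
        if np.linalg.norm(g) < tol:
            break
        y = y - rate * g
    return y


def coupled_ts_search(x, y, tol=1e-8, max_cycles=100):
    y = np.array(y, dtype=float)
    for cycle in range(max_cycles):
        x = x - core_gradient(x, y) / core_hessian(x)  # Newton step in the core
        y = relax_environment(x, y, 0.1 * tol)         # relax the environment
        if abs(core_gradient(x, y)) < tol and np.linalg.norm(env_gradient(x, y)) < tol:
            return x, y, cycle + 1
    return x, y, max_cycles


def full_hessian(x):
    """Hessian of the whole model, used only to check the saddle order."""
    h = np.diag(np.concatenate(([core_hessian(x)], 2.0 * A_ENV)))
    h[0, 1:] = h[1:, 0] = -2.0 * A_ENV * C_ENV
    return h


x_ts, y_ts, n_cycles = coupled_ts_search(0.2, [0.3, -0.2])
n_negative = int(np.sum(np.linalg.eigvalsh(full_hessian(x_ts)) < 0.0))
print("x =", x_ts, "y =", y_ts, "cycles =", n_cycles, "negative eigenvalues =", n_negative)
```

In this model each cycle reduces the distance to the saddle point by a roughly constant factor, so self-consistency is reached in a modest number of cycles; a stronger core-environment coupling would slow this convergence.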
During the environment minimization, the atoms of the fixed core region are replaced by electrostatic potential (ESP) fitted charges (recalculated at the beginning of the minimization), so that only the MM energy is necessary. This approximation forces the environment to contain only MM atoms. However, this is not restrictive because, to date, the number of QM atoms is usually smaller than a treatable core region. On the other hand, while in Refs. [63–65] the environment is relaxed at every step of the TS search in the core, in Refs. [66, 67] the environment is not relaxed until the TS search has converged. No comparison between these two procedures has been made yet.

Problems can arise when the two regions are highly coupled, for example, when the core region has already converged and a small displacement in the environment region makes the core nonconverged. Of course, this coupling depends on the convergence criteria and the choice of these two zones, but a further improvement is still possible.

Second derivatives: Moving the whole system at the same time. The next logical step in the TS search is to move the whole system at the same time. In this way we avoid the problem of coupling between the two zones. As for minima, a limited-memory procedure seems to be adequate, so that, as in the L-BFGS scheme for minima, the Hessian matrix is neither stored nor diagonalized and only the information of the last $m$ steps is used for the update. Unfortunately, the usual L-BFGS update cannot be used for TS search. A Powell-type update is more convenient because BFGS preserves a positive definite matrix, whereas a TS search requires a Hessian with one negative eigenvalue. The RFO technique should be used rather than Newton–Raphson because the former only needs one eigenvector to predict the next step, while the latter needs a whole Hessian diagonalization to invert the matrix.
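The remark on update formulas can be checked numerically with a single term of the general update of Eq. (20). The sketch below implements that term for an arbitrary symmetric matrix $M$ and uses $M = I$ (Powell update); the model Hessian `H_TRUE`, which has one negative eigenvalue, and the displacement are hypothetical values chosen for the illustration.

```python
import numpy as np


def hessian_update(B, dq, dg, M):
    """One term of Eq. (20): update B from a displacement dq and gradient change dg."""
    j = dg - B @ dq                  # j_i = D_i - A_i, with A_i = B_i dq_i
    u = M @ dq / (dq @ M @ dq)       # u_i = M_i dq_i / (dq_i^T M_i dq_i)
    return B + np.outer(j, u) + np.outer(u, j) - (j @ dq) * np.outer(u, u)


# Hypothetical quadratic surface with one negative curvature (saddle-like).
H_TRUE = np.array([[-2.0, 0.5], [0.5, 3.0]])
B0 = np.eye(2)                       # unit matrix as initial guess
dq = np.array([0.3, 0.1])            # displacement between two geometries
dg = H_TRUE @ dq                     # exact gradient change on a quadratic surface

B1 = hessian_update(B0, dq, dg, M=np.eye(2))   # Powell update: M = I

print("secant condition satisfied:", np.allclose(B1 @ dq, dg))
print("eigenvalues of B1:", np.linalg.eigvalsh(B1))
```

Starting from a positive definite guess, the Powell-updated matrix reproduces the gradient change along the displacement and acquires a negative eigenvalue, which is the kind of curvature information a TS search needs.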
On the other hand, in minimization problems a unit matrix is not a bad initial Hessian matrix because we only need to decrease the gradient norm, and a bad initial step can be improved with a line search or other available methods [52]. This is not the case in TS search, for which we need information about the PES to know where the TS is. This requires us to calculate an approximate initial second derivative matrix. A first option would be to build a full high-dimensional square Hessian. This would imply its storage, which can be problematic unless a lot of computer memory is available. Another useful solution is to use the matrix shown in Figure 4 as the initial Hessian: A square Hessian is used for the few atoms of a core and only a vector describes the rest of the atoms in the environment [58, 68].

FIGURE 4. Approximated initial Hessian for large systems.

Having pointed out these two problems, the procedure applicable to enzymatic systems can be outlined as follows. An approximated initial Hessian is built up (see Fig. 4). Solving the RFO equations implies obtaining one eigenvector of a large-scale matrix. An iterative method, which requires only a matrix–vector product, must be used to avoid a prohibitive full diagonalization of a big matrix. We need neither the lowest root nor the highest one but the one whose eigenvalue tends to zero [53, 54]. An algorithm developed by Bofill et al. [69, 70] permits us to extract the correct eigenvalue–eigenvector pair and to propose a displacement [71] of the geometry. The update of the Hessian (in fact, the product of the Hessian by a vector) must be done at every optimization step [68] but, in this case, keeping only the position, gradient, and displacement information of a limited number $m$ of steps:

$$B_{k+1} v = B_0 v + \sum_{i=k-m}^{k} \left[ j_i u_i^{T} v + u_i \left( j_i^{T} v - \left( j_i^{T} \Delta q_i \right) u_i^{T} v \right) \right]. \tag{28}$$

In this way, we can find TSs while moving a system of thousands of atoms. During the optimization process a large-scale matrix is never stored and full diagonalization is avoided. Note that this last method is also convenient for the case of minima.

5. Conclusions and Perspectives

During the last decade, the combined QM/MM method has emerged as a powerful tool to simulate enzymatic reactivity. A lot of work has been devoted to finding solutions to the two main problems of the QM/MM approach for macromolecular systems: the problem of the nonbonded interactions between the quantum and the classical part and the problem of cutting covalent bonds. Several solutions have been proposed, and each has its advantages and disadvantages. Recently, studies devoted to comparing the different approaches have appeared in the literature, but some questions remain open: Do we need to parameterize specific van der Waals potentials for proteins to compute the QM/MM van der Waals interactions? What is the influence of a protein force field charge set on an enzymatic reaction pathway? The answers to these questions should lead us toward an improved description of the QM/MM PES, thus enabling QM/MM methods (in conjunction with the use of statistical sampling) to give quantitative results in the exploration of enzymatic reactivity [6, 72].

In the meantime, with the ever increasing size of the molecular systems studied in the literature, a special effort has been devoted to the improvement of optimization algorithms to locate both minima and TS structures efficiently. These efforts will also be of great help to linear scaling methodologies, which could become a serious "competitor" to QM/MM methodologies because they avoid the problems of the QM/MM interactions mentioned above. However, whether linear scaling methods and full quantum calculations on enzymatic systems will one day replace QM/MM methods is difficult to foretell at the time of writing.

References

1. Voet, D.; Voet, J. G. In: Biochemistry; Wiley & Sons: New York, 1995.
2. Dive, G.; Dehareng, D.; Peeters, D. Int J Quantum Chem 1996, 58, 85.
3. Mulholland, A. J.; Richards, W. G. J Phys Chem B 1998, 102, 6635.
4. Warshel, A. In: Computer Modeling of Chemical Reactions in Enzymes and Solutions; Wiley & Sons: New York, 1992.
5. Aqvist, J.; Warshel, A. Chem Rev 1993, 93, 2523.
6. Bentzien, J.; Muller, R. P.; Florian, J.; Warshel, A. J Phys Chem B 1998, 102, 2293.
7. Goedecker, S. Rev Mod Phys 1999, 71, 1085.
8. van der Vaart, A.; Gogonea, V.; Dixon, S. L.; Merz, K. M., Jr. J Comput Chem 2000, 21, 1494.
9. Stewart, J. J. P. Int J Quantum Chem 1996, 58, 133.
10. van der Vaart, A.; Suárez, D.; Merz, K. M., Jr. J Chem Phys 2000, 113, 10512.
11. Daniels, A. D.; Scuseria, G. E. J Chem Phys 1999, 110, 1321.
12. Warshel, A.; Levitt, M. J Mol Biol 1976, 103, 227.
13. Singh, U.; Kollman, P. J Comput Chem 1986, 7, 718.
14. Field, M.; Bash, P.; Karplus, M. J Comput Chem 1990, 11, 700.
15. Monard, G.; Merz, K. M. Acc Chem Res 1999, 32, 904.
16. Gao, J. In: Lipkowitz, K. B.; Boyd, D. B., eds. Reviews in Computational Chemistry, Vol. 7; VCH Publishers: New York, 1996; p 119–185.
17. Mordasini, T.; Thiel, W. Chimia 1998, 58, 288.
18. Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M., Jr.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J Am Chem Soc 1995, 117, 5179.
19. Stewart, J. J. P. J Comput Aided Design 1990, 4, 1–105.
20. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Zakrzewski, V. G.; Montgomery, J. A., Jr.; Stratmann, R. E.; Burant, J. C.; Dapprich, S.; Millam, J. M.; Daniels, A. D.; Kudin, K. N.; Strain, M. C.; Farkas, O.; Tomasi, J.; Barone, V.; Cossi, M.; Cammi, R.; Mennucci, B.; Pomelli, C.; Adamo, C.; Clifford, S.; Ochterski, J.; Petersson, G. A.; Ayala, P. Y.; Cui, Q.; Morokuma, K.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Cioslowski, J.; Ortiz, J. V.; Baboul, A. G.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Gomperts, R.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C. Y.; Nanayakkara, A.; Gonzalez, C.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Andres, J. L.; Gonzalez, C.; Head-Gordon, M.; Replogle, E. S.; Pople, J. A. Gaussian 98, Revision A.7; Gaussian, Inc.: Pittsburgh, PA, 1999.
21. Schmidt, M. W.; Baldridge, K. K.; Boatz, J. A.; Elbert, S. T.; Gordon, M. S.; Jensen, J. J.; Koseki, S.; Matsunaga, N.; Nguyen, K. A.; Su, S.; Windus, T. L.; Dupuis, M.; Montgomery, J. A. J Comput Chem 1993, 14, 1347.
22. Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J Comput Chem 1983, 4, 187.
23. van Gunsteren, W. F.; Billeter, S. R.; Eising, A. A.; Hünenberger, P. H.; Krüger, P. K.; Mark, A. E.; Scott, W. R.; Tironi, I. G. GROMOS; Hochschuleverlag AG: Zurich, 1996.
24. Bakowies, D.; Thiel, W. J Phys Chem 1996, 100, 10580.
25. Bakowies, D.; Thiel, W. J Comput Chem 1996, 17, 87.
26. Warshel, A.; Chu, Z. T. J Phys Chem B 2001, 105, 9857.
27. Chipot, C.; Millot, C.; Maigret, B.; Kollman, P. A. J Phys Chem 1994, 98, 11362.
28. Luque, F. J.; Reuter, N.; Cartier, A.; Ruiz-López, M. F. J Phys Chem A 2000, 104, 10923.
29. Cummins, P. L.; Gready, J. L. J Comput Chem 1999, 20, 1028.
30. Ranganathan, S.; Gready, J. E. J Phys Chem B 1997, 101, 5614.
31. Mulholland, A.; Richards, W. Proteins 1997, 27, 9.
32. Antonczak, S.; Monard, G.; Ruiz-López, M.; Rivail, J.-L. J Am Chem Soc 1998, 120, 8825.
33. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. J Am Chem Soc 1985, 107, 3902.
34. Stewart, J. J. P. J Comput Chem 1989, 10, 209.
35. Antes, I.; Thiel, W. J Phys Chem A 1999, 103, 9290.
36. Théry, V.; Rinaldi, D.; Rivail, J.-L.; Maigret, B.; Frenczy, B. J Comput Chem 1994, 15, 269.
37. Monard, G.; Loos, M.; Théry, V.; Baka, K.; Rivail, J.-L. Int J Quantum Chem 1996, 58, 153.
38. Assfeld, X.; Rivail, J.-L. Chem Phys Lett 1996, 263, 100.
39. Gao, J.; Amara, P.; Alhambra, C.; Field, M. J. J Phys Chem A 1998, 102, 4714.
40. HyperChem Users Manual; HyperCube, Inc.: Waterloo, Ontario, Canada, 2002.
41. Reuter, N. Ph.D. thesis; Université Henri Poincaré Nancy I: Vandoeuvre-lès-Nancy, France, 1999.
42. Antes, I. Ph.D. thesis; Mathematisch-naturwissenschaftliche Fakultät, Universität Zürich: Zürich, 1998.
43. Reuter, N.; Dejaegere, A.; Maigret, B.; Karplus, M. J Phys Chem A 2000, 104, 1720.
44. Zhang, Y. K.; Lee, T. S.; Yang, W. T. J Chem Phys 1999, 110, 46.
45. Ferré, N. Ph.D. thesis; Université Henri Poincaré Nancy I: Vandoeuvre-lès-Nancy, France, 2001.
46. Amara, P.; Field, M. J.; Alhambra, C.; Gao, J. Theor Chem Acc 2000, 336.
47. Monard, G. Ph.D. thesis; Université Henri Poincaré Nancy I: Vandoeuvre-lès-Nancy, France, 1998.
48. Alhambra, C.; Corchado, J.; Sánchez, M. L.; Garcia-Viloca, M.; Gao, J.; Truhlar, D. G. J Phys Chem B 2001, 105, 11326.
49. Truhlar, D. G.; Gao, J.; Alhambra, C.; Garcia-Viloca, M.; Corchado, J.; Sánchez, M. L.; Villá, J. Acc Chem Res 2002, 35, 341–349.
50. Hwang, J. K.; King, G.; Warshel, A. J Am Chem Soc 1988, 110, 5297.
51. Billeter, S. R.; Webb, S. P.; Agarwal, P. K.; Iordanov, T.; Hammes-Schiffer, S. J Am Chem Soc 2001, 123, 11262.
52. Schlegel, B. Adv Chem Phys 1987, 65, 249.
53. Simons, J.; Jorgensen, P.; Taylor, H.; Ozment, J. J Phys Chem 1983, 87, 2745.
54. Banerjee, A.; Adams, N.; Simons, J.; Shepard, R. J Phys Chem 1985, 89, 52.
55. Császár, P.; Pulay, P. J Mol Struct 1984, 114, 31.
56. Fletcher, R. Practical Methods of Optimization, 2nd Ed.; John Wiley & Sons: New York, 1987.
57. Liu, D. C.; Nocedal, J. Math Program 1989, 45, 503–528.
58. Prat-Resina, X.; Garcia-Viloca, M.; Monard, G.; González-Lafont, A.; Lluch, J. M.; Anglada, J. M.; Bofill, J. M. Theor Chem Acc 2002, 107, 147.
59. Schlick, T.; Overton, M. J Comput Chem 1987, 8, 1025.
60. Derremaux, P.; Zhang, G.; Schlick, T.; Brooks, B. J Comput Chem 1994, 15, 532.
61. Eurenius, K. P.; Chatfield, D. C.; Brooks, B. R.; Hodoscek, M. Int J Quantum Chem 1996, 60, 1189.
62. Fischer, S.; Karplus, M. Chem Phys Lett 1992, 194, 252.
63. Turner, A. J.; Moliner, V.; Williams, I. H. Phys Chem Chem Phys 1999, 1, 1323.
64. Billeter, S. R.; Turner, A. J.; Thiel, W. Phys Chem Chem Phys 2000, 2, 2177.
65. Hall, R. J.; Hindle, S. A.; Burton, N. A.; Hillier, I. H. J Comput Chem 2000, 21, 1433.
66. Zhang, Y.; Liu, H.; Yang, W. J Chem Phys 2000, 112, 3483.
67. Prat-Resina, X.; Garcia-Viloca, M.; Monard, G.; González-Lafont, A.; Lluch, J. M.; Anglada, J. M.; Bofill, J. M. Unpublished results.
68. Anglada, J. M.; Besalú, E.; Bofill, J. M.; Rubio, J. J Math Chem 1999, 25, 85.
69. Anglada, J. M.; Besalú, E.; Bofill, J. M. Theor Chem Acc 1999, 103, 163.
70. Bofill, J. M.; de Pinho Ribeiro Moreira, I.; Anglada, J. M.; Illas, F. J Comput Chem 2000, 21, 1375.
71. Besalú, E.; Bofill, J. M. Theor Chem Acc 1998, 100, 265.
72. Li, H.; Hains, A. W.; Everts, J. E.; Robertson, A. D.; Jensen, J. H. J Phys Chem B 2002, 106, 3486.