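Milano Chemometrics and QSAR Research Group
Roberto Todeschini, Viviana Consonni, Manuela Pavan, Andrea Mauri, Davide Ballabio, Alberto Manganaro
chemometrics, molecular descriptors, QSAR, multicriteria decision making, environmetrics, experimental design, artificial neural networks, statistical process control
Department of Environmental Sciences, University of Milano - Bicocca
P.za della Scienza, 1 - 20126 Milano (Italy)
Website: michem.unimib.it/chm/

Molecular descriptors: autocorrelations, eigenvalue-based and information indices
Roberto Todeschini, Milano Chemometrics and QSAR Research Group
Iran - February 2009

Contents
- Autocorrelation descriptors
- Molecule representation by matrices
- Eigenvalue-based descriptors
- Information content
- Information indices

Autocorrelation on a molecular graph

w is the vector collecting the weights of each atom; A is the number of atoms.

- Quadratic molecular property, with dimensions (1,1) = (1,A) (A,A) (A,1):

$P = \mathbf{w}^{T} \mathbf{I} \mathbf{w} = \sum_{i=1}^{A} w_i^2$

- Quadratic molecular property with interaction terms:

$ATS = \mathbf{w}^{T} \mathbf{U} \mathbf{w} = \left( \sum_{i=1}^{A} w_i \right)^2 = \sum_{i=1}^{A} w_i^2 + 2 \sum_{i=1}^{A-1} \sum_{j>i} w_i w_j$

Here I is the identity matrix and U is the A x A matrix whose entries are all equal to 1.

Autocorrelation on a molecular graph
Moreau - Broto autocorrelation of a topological structure (1984)

$ATS_k = \sum_{i=1}^{A-1} \sum_{j>i} w_i w_j \, \delta(k; d_{ij}), \quad k > 0$

$\delta(k; d_{ij}) = 1$ if $d_{ij} = k$, and $\delta(k; d_{ij}) = 0$ if $d_{ij} \neq k$

where k is the lag and $d_{ij}$ is the topological distance between atoms i and j.

$ATS_0 = \sum_{i=1}^{A} w_i^2$

$ATS = \sum_{i=1}^{A} w_i^2 + 2 \sum_{i=1}^{A-1} \sum_{j>i} w_i w_j = ATS_0 + 2 \sum_{k=1}^{d} ATS_k$

where d is the largest topological distance in the graph.

Autocorrelation on a molecular graph
Example: 4-hydroxy-2-butanone

Atom numbering: C1-C2(=O6)-C3-C4-O5

$ATS_0 = w_1^2 + w_2^2 + w_3^2 + w_4^2 + w_5^2 + w_6^2$
$ATS_1 = w_1 w_2 + w_2 w_3 + w_3 w_4 + w_4 w_5 + w_2 w_6$
$ATS_2 = w_1 w_3 + w_2 w_4 + w_3 w_5 + w_3 w_6 + w_1 w_6$
$ATS_3 = w_1 w_4 + w_2 w_5 + w_4 w_6$

With the atomic masses as weights, $w_i = m_i$, and each sum divided by its number of terms to give the average value:

$ATS_0 = 12^2 + 12^2 + 12^2 + 12^2 + 16^2 + 16^2 = 1088$, $\overline{ATS}_0 = 1088 / 6 = 181.3$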
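$ATS_1 = 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 = 816$, $\overline{ATS}_1 = 816 / 5 = 163.2$
$ATS_2 = 12 \cdot 12 + 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 + 12 \cdot 16 = 864$, $\overline{ATS}_2 = 864 / 5 = 172.8$
$ATS_3 = 12 \cdot 12 + 12 \cdot 16 + 12 \cdot 16 = 528$, $\overline{ATS}_3 = 528 / 3 = 176$

Eigenvalue-based descriptors

Eigenvalue descriptors are obtained by diagonalizing symmetric matrices derived from a molecular graph, such as:
- Adjacency matrix
- Vertex distance matrix
- Edge adjacency matrix
- Edge distance matrix
- Detour matrix
- Geometrical distance matrix
- Covariance matrix
- ... and any weighted symmetric matrix

Eigenvalue-based descriptors
Lovasz - Pelikan index (or leading eigenvalue), 1973

The largest eigenvalue derived from the adjacency matrix:

$LP = \lambda_1$

Eigenvalue-based descriptors
General functions of eigenvalues

For a matrix M with weighting scheme w and eigenvalues $\lambda_1, \ldots, \lambda_n$ (mean $\bar{\lambda}$):

$SpSum_k(M, w) = \sum_{i=1}^{n} |\lambda_i|^k$
$SpSum_k^{+}(M, w) = \sum_{i=1}^{n^{+}} \lambda_i^k$ (sum over the positive eigenvalues)
$SpSum_k^{-}(M, w) = \sum_{i=1}^{n^{-}} |\lambda_i|^k$ (sum over the negative eigenvalues)
$SpAD(M, w) = \sum_{i=1}^{n} |\lambda_i - \bar{\lambda}|$
$SpMAD(M, w) = \sum_{i=1}^{n} |\lambda_i - \bar{\lambda}| / n$
$MinSp(M, w) = \min_i \lambda_i$
$MaxSp(M, w) = \max_i \lambda_i$
$MaxSpA(M, w) = \max_i |\lambda_i|$
$SpDiam(M, w) = MaxSp - MinSp$

Eigenvalue-based descriptors

The trace of the adjacency matrix (and of the distance matrix) is equal to zero, so its eigenvalues sum to zero:

$\mathrm{trace}(\mathbf{A}) = \sum_j \lambda_j = 0$

Eigenvalue-based descriptors
VAA indices (from adjacency matrix), Balaban et al., 1991

$VAA1 = \sum_{j}^{+} \lambda_j$ (sum of the positive eigenvalues)
$VAA2 = VAA1 / A$
$VAA3 = (A / 10) \cdot \log(VAA1)$

Eigenvector-based descriptors
VEA indices (from adjacency matrix), Balaban et al., 1991

$VEA1 = \sum_{i=1}^{A} \ell_{iA}$
$VEA2 = VEA1 / A$
$VEA3 = (A / 10) \cdot \log(VEA1)$

where $\ell_{iA}$ are the coefficients of the eigenvector associated with $\lambda_A$, the largest negative eigenvalue derived from the adjacency matrix.

Eigenvalue-based descriptors
VAD, VED and VRD indices (from distance matrix), Balaban et al., 1991

The same indices defined above are calculated on the topological distance matrix.

Molecular geometry

The geometry matrix G (or geometric distance matrix) is a square symmetric matrix whose entry $r_{st}$ is the Euclidean distance between atoms s and t:

$\mathbf{G} = \begin{pmatrix} 0 & r_{12} & \cdots & r_{1A} \\ r_{21} & 0 & \cdots & r_{2A} \\ \vdots & \vdots & \ddots & \vdots \\ r_{A1} & r_{A2} & \cdots & 0 \end{pmatrix}$

Distance / distance matrix
Distance / distance matrix (DD), Randic et al., 1994

$[\mathbf{DD}]_{ij} = [\mathbf{G}]_{ij} / [\mathbf{D}]_{ij} = r_{ij} / d_{ij}$ for $i \neq j$, with zeros on the diagonal

$[\mathbf{G}]_{ij}$: geometry matrix
$[\mathbf{D}]_{ij}$: distance matrix

Eigenvalue-based descriptors
Folding degree index, Randic et al., 1994

The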
largest eigenvalue derived from the distance/distance matrix, divided by the number of atoms A:

$\phi = \lambda_1^{DD} / A$

This quantity tends to 1 for linear molecules (of infinite length) and decreases as the molecule becomes more folded.

Conventional bond order
- single bond: $\pi^{*} = 1$
- double bond: $\pi^{*} = 2$
- triple bond: $\pi^{*} = 3$
- conjugated bond: $\pi^{*} = 1.5$

Eigenvalue-based descriptors
BCUT descriptors (Burden - CAS - University of Texas eigenvalues), 1997

The largest absolute eigenvalues $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_L$ derived from the following B matrix:

$B_{ii} = w_i$ (w: atomic properties)
$B_{ij} = \pi^{*}_{ij}$ if atoms i and j are bonded, $0$ otherwise ($\pi^{*}$: conventional bond order)

Topological information indices

Indices based on the information content and entropy measures derived from molecular graphs.

Information content

The information content of a system having n elements is a measure of the degree of diversity of the elements in the set:

$IC = \sum_{g=1}^{G} n_g \log_2 n_g$

where G is the number of different equivalence classes, $n_g$ is the number of elements in the g-th class, and

$n = \sum_{g=1}^{G} n_g$

Information content

Maximum information content:

$I_{MAX} = n \log_2 n$

Total information content:

$I_T = n \log_2 n - \sum_{g=1}^{G} n_g \log_2 n_g$

Information content

The Shannon entropy of a system having n elements is the mean information content of the set of elements:

$H = - \sum_{g=1}^{G} p_g \log_2 p_g$

where G is the number of different equivalence classes, $p_g$ is the probability of the g-th class, and

$p_g = n_g / n$, $\sum_{g=1}^{G} p_g = 1$

Information content

Maximum entropy:

$H_{MAX} = \log_2 n$

Standardized entropy:

$H^{*} = - \sum_{g=1}^{G} p_g \log_2 p_g / \log_2 n$, $0 \leq H^{*} \leq 1$

Information content ... on atoms

[Figure: structures of two example molecules with n = 9 atoms each, one containing two F atoms, the other one F and one Br atom]

For both molecules:
$I_{MAX} = 9 \log_2 9 = 28.529$
$H_{MAX} = \log_2 9 = 3.170$

Molecule with n = 9, C = 7, F = 2:
$IC = 7 \log_2 7 + 2 \log_2 2 = 19.651 + 2.000 = 21.651$
$I_T = 28.529 - 21.651 = 6.878$
$H = -(7/9) \log_2 (7/9) - (2/9) \log_2 (2/9) = 0.282 + 0.482 = 0.764$
$H^{*} = 0.764 / 3.170 = 0.241$

Molecule with n = 9, C = 7, F = 1, Br = 1:
$IC = 7 \log_2 7 + 2 \, (1 \log_2 1) = 19.651 + 0 = 19.651$
$I_T = 28.529 - 19.651 = 8.878$
$H = -(7/9) \log_2 (7/9) - 2 \, (1/9) \log_2 (1/9) = 0.282 + 2 \times 0.352 = 0.986$
$H^{*} = 0.986 / 3.170 = 0.311$

Information content ...
on vertex degrees

[Figure: example molecule with the vertex degree (1, 2 or 3) marked on each atom]

n = 9; V1 = 3, V2 = 3, V3 = 3 (number of vertices of degree 1, 2 and 3)

$H = 3 \cdot [-(3/9) \log_2 (3/9)] = 1.585$

... on vertex degree magnitudes

n = 18; SV1 = 3, SV2 = 6, SV3 = 9 (sum of the vertex degrees in each class)

$H = -(3/18) \log_2 (3/18) - (6/18) \log_2 (6/18) - (9/18) \log_2 (9/18) = 1.459$

Thank you

Milano Chemometrics and QSAR Research Group
Prof. Roberto Todeschini, Dr. Davide Ballabio, Dr. Viviana Consonni, Dr. Alberto Manganaro, Dr. Andrea Mauri
Website: michem.disat.unimib.it/chm/
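As a numerical check of the information-content slides, the following minimal Python sketch evaluates IC, I_MAX, I_T, H, H_MAX and H* from a list of equivalence-class sizes. The function name `information_indices` and the dictionary keys are illustrative choices, not part of the original slides; only the formulas are taken from them.

```python
import math


def information_indices(class_sizes):
    """Information indices for a partition given as class sizes n_g.

    Hypothetical helper written for illustration; it implements the
    formulas for IC, I_MAX, I_T, H, H_MAX and H* stated in the slides.
    """
    if not class_sizes or any(ng <= 0 for ng in class_sizes):
        raise ValueError("class sizes must be positive")
    n = sum(class_sizes)
    # IC = sum_g n_g log2 n_g
    ic = sum(ng * math.log2(ng) for ng in class_sizes)
    # I_MAX = n log2 n and I_T = I_MAX - IC
    i_max = n * math.log2(n)
    i_t = i_max - ic
    # Shannon entropy with p_g = n_g / n
    h = -sum((ng / n) * math.log2(ng / n) for ng in class_sizes)
    h_max = math.log2(n)
    # Standardized entropy; defined as 0 here when n = 1 (assumption)
    h_std = h / h_max if h_max > 0 else 0.0
    return {"IC": ic, "IMAX": i_max, "IT": i_t,
            "H": h, "HMAX": h_max, "Hstd": h_std}


if __name__ == "__main__":
    examples = {
        "atoms, C=7 F=2": [7, 2],
        "atoms, C=7 F=1 Br=1": [7, 1, 1],
        "vertex degrees": [3, 3, 3],
        "vertex degree magnitudes": [3, 6, 9],
    }
    for label, sizes in examples.items():
        values = information_indices(sizes)
        print(label, {key: round(val, 3) for key, val in values.items()})
```

For the two atom-type examples the script gives IC = 21.651 and 19.651, I_T = 6.878 and 8.878, H = 0.764 and 0.986, and H* = 0.241 and 0.311. In every case H equals I_T / n, which is a convenient consistency check between the two families of formulas.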