Supplementary material Improved model for fullerene C60 solubility in organic solvents based on quantum-chemical and topological descriptors. Tetyana Petrova1, Bakhtiyor F Rasulev1,*, Andrey A Toropov3, Danuta Leszczynska2 and Jerzy Leszczynski1 1 Interdisciplinary Center for Nanotoxicity, Department of Chemistry and Biochemistry, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS 39217, USA 2 Department of Civil and Environmental Engineering, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS 39217, USA 3 Istituto di Ricerche Farmacologiche Mario Negri, 20156, Via La Masa 19, Milano, Italy *) Corresponding author: Tel: 601-979-4114, fax: 601-979-7823 E-mail address: [email protected] Table S1. List of descriptors involved in the models 1-5. N Solvent name 1 2 3 pentane hexane octane 4 5 6 isooctane decane dodecane 7 8 9 10 11 12 13 14 15 cis-decahydronaphthalene transdecahydronaphthalene cyclopentyl bromide cyclohexyl chloride cyclohexyl bromide cyclohexyl iodide CAS No. 
X1sol J3D 109-66-0 2.414 2.914 3.914 3.770 5.467 5.753 6.151 6.582 493-01-6 4.914 5.914 4.966 6.404 6.577 3.173 493-02-7 4.966 3.154 137-43-9 3.471 3.683 3.971 4.260 4.959 3.207 3.541 3.524 3.508 3.452 3.000 3.394 3.805 3.242 3.904 4.199 2.121 3.000 2.828 3.464 2.500 2.475 2.121 2.475 3.797 2.243 1.783 2.126 1.849 2.417 2.183 3.537 3.485 3.029 Training set -0.308 0.420 -0.302 0.414 -0.293 0.406 -0.294 0.404 -0.288 0.400 -0.284 0.397 -0.274 0.388 -0.272 0.384 -0.264 0.250 -0.281 0.290 -0.262 0.248 -0.246 0.210 -0.264 0.229 -0.235 0.261 -0.286 0.403 -0.282 0.395 -0.316 0.280 -0.337 0.229 -0.289 0.225 -0.292 0.190 -0.258 0.216 -0.297 0.244 -0.271 0.254 -0.253 0.215 -0.326 0.261 3.382 3.215 4.403 4.324 4.395 4.356 4.319 4.244 4.226 4.175 4.079 3.976 4.072 -0.300 -0.316 -0.289 -0.251 -0.286 -0.266 -0.250 -0.296 -0.279 -0.272 -0.239 -0.264 -0.288 5.247 -0.287 110-54-3 111-65-9 2663564-3 124-18-5 112-40-3 542-18-7 108-85-0 626-62-0 5401-627 110-83-8 16 17 18 19 20 21 22 23 24 1,2-dibromocyclohexane cyclohexene methylcyclohexane trans-1,2dimethylcyclohexane dichloromethane carbon tetrachloride dibromomethane bromoform iodomethane bromochloromethane bromoethane iodoethane 25 26 27 28 29 30 31 32 33 34 35 36 37 1,1,2,2-tetrachloroethane 1,2-dichloroethane 1,1,1-trichloroethane 1-chloropropane 1-iodopropane 2-chloropropane 2-bromopropane 2-iodopropane 1,2-dichloropropane 1,3-dichloropropane 1,2-dibromopropane 1,3-diiodopropane 1,2,3-tribromopropane 79-34-5 38 1,2,3-trichloropropane 96-18-4 2.621 2.750 2.268 2.975 2.021 2.309 2.598 2.912 3.121 3.555 4.536 4.800 3.804 513-36-0 2.624 39 1-chloro-2-methylpropane 108-87-2 6876-239 75-09-2 56-23-5 74-95-3 75-25-2 74-88-4 74-97-5 74-96-4 75-03-6 107-06-2 71-55-6 540-54-5 107-08-4 75-29-6 75-26-3 75-30-9 78-87-5 142-28-9 78-75-1 627-31-6 96-11-7 HOMO, eV HOMOLUMO gap 0.283 0.248 0.297 0.215 0.294 0.250 0.211 0.280 0.280 0.234 0.197 0.212 0.257 0.297 TI2 FDI H052 X3 AMW nHAcc 2.094 2.488 3.284 2.999 0.956 0.970 0.993 0.971 0 0 0 0 
0.707 0.957 1.457 1.385 4.250 4.310 4.390 4.390 0 0 0 0 4.086 4.891 1.047 1.000 1.000 0.972 0 0 0 1.957 2.457 3.466 4.450 4.480 4.940 0 0 0 1.047 0.977 0 3.466 4.940 0 0.956 0.975 0.975 0.975 0.854 1.000 0.986 0.993 1.000 1.000 4 4 4 4 4 1.644 1.894 1.894 1.894 2.54 9.940 6.590 9.060 11.67 13.44 0 0 0 0 0 0.667 0.975 0.854 0.979 0.951 0.947 0 0 0 1.5 1.894 2.54 5.140 4.680 4.680 0 0 0 1.333 0.800 1.333 1.000 1.000 1.333 1.333 1.333 1.521 1.000 1.000 1.000 1.000 1.000 1.000 0.983 1.000 1.000 0 0 0 0 0 0 3 3 0 0 0 0 0 0 0 0 0 1.333 16.99 30.76 34.77 50.54 28.39 25.88 13.62 19.50 20.98 0 0 0 0 0 0 0 0 0 1.707 0.800 1.707 1.707 1.000 1.000 1.000 1.542 2.094 1.542 2.094 1.745 1.745 1.000 1.000 0.964 0.990 0.968 0.982 0.996 1.000 1.000 1.000 1.000 1.000 1.000 0 0 2 2 6 6 6 3 0 3 0 0 0 0.5 0 0.5 0.5 0 0 0 0.816 0.707 0.816 0.707 1.394 1.394 12.37 16.67 7.140 15.45 7.140 11.180 15.45 10.27 10.27 18.35 26.900 25.53 13.40 0 0 0 0 0 0 0 0 0 0 0 0 0 1.542 0.961 1 0.816 6.610 0 40 1-iodo-2-methylpropane 513-38-2 3.331 2.977 5.178 5.214 -0.250 -0.267 3.328 3.797 2.624 2.301 2.060 4.376 -0.263 -0.283 -0.268 526-73-8 3.000 3.805 3.788 4.215 2.527 3.063 2.940 3.319 -0.253 -0.234 -0.233 -0.231 95-63-6 4.198 3.192 -0.224 108-67-8 4.182 3.148 -0.230 527-53-7 4.609 3.465 -0.221 119-64-2 694-80-4 4.966 4.432 4.305 4.932 4.605 2.816 3.683 3.971 4.382 4.942 4.671 2.546 3.277 3.318 3.244 3.689 2.501 2.458 2.443 2.391 2.362 2.377 -0.233 -0.240 -0.240 -0.239 -0.240 -0.260 -0.257 -0.251 -0.264 -0.259 -0.259 108-37-2 4.654 2.378 -0.262 507-19-7 41 42 43 2-bromo-2-methylpropane 1,2-dibromoethylene tetrachloroethylene 540-49-8 127-18-4 513-37-1 44 45 46 47 1-chloro-2-methylpropene benzene 1,2-dimethylbenzene 1,3-dimethylbenzene 48 1,2,3-trimethylbenzene 49 1,2,4-trimethylbenzene 50 63 1,3,5-trimethylbenzene 1,2,3,5tetramethylbenzene tetralin n-propylbenzene iso-propylbenzene n-butylbenzene tert-butylbenzene fluorobenzene chlorobenzene bromobenzene 1,2-dichlorobenzene 1,3-dibromobenzene 
1-bromo-2-chlorobenzene 1-bromo-3-chlorobenzene 64 65 66 67 68 69 70 71 72 73 1,2,4-trichlorobenzene styrene nitrobenzene benzonitrile anisole benzaldehyde phenyl isocyanate 3-nitrotoluene thiophenol benzyl bromide 74 75 trichlorotoluene 1-methylnaphthalene 51 52 53 54 55 56 57 58 59 60 61 62 76 77 78 79 80 81 dimethylnaphthalene 1-phenylnaphthalene ethanol 1-butanol 1-pentanol acetone 82 N,N-dimethylformamide 71-43-2 95-47-6 108-38-3 103-65-1 98-82-8 104-51-8 98-06-6 462-06-6 108-90-7 108-86-1 95-50-1 108-36-1 0.215 0.254 0.207 0.226 0.233 0.244 0.234 0.231 0.232 0.226 0.230 0.228 0.233 0.234 0.235 0.234 0.235 0.233 0.230 0.225 0.225 0.219 1.542 1.542 0.980 0.970 1 1 0.816 0.816 13.14 9.790 0 0 1.707 1.521 1.542 1.000 1.000 0.976 0 0 0 0.5 1.333 0.816 30.97 27.64 7.550 0 0 0 0.667 0.854 1.069 0.950 1.000 0.991 1.000 0.987 0 0 0 0 1.5 2.54 2.199 3.114 6.510 5.900 5.900 5.720 0 0 0 0 1.060 0.998 0 2.86 5.720 0 0.950 1.000 0 2.414 5.720 0 0.962 0.993 0 3.343 5.590 0 1.047 2.036 1.659 2.593 1.726 0.975 0.975 0.975 0.854 1.069 0.854 1.000 0.965 0.981 0.994 0.973 1.000 1.000 1.000 1.000 1.000 1.000 0 0 0 0 0 0 0 0 0 0 0 3.466 2.422 2.593 2.691 2.83 1.894 1.894 1.894 2.54 2.199 2.54 6.010 5.720 5.720 5.590 5.590 8.010 9.380 13.08 12.25 19.66 15.95 0 0 0 0 0 1 0 0 0 0 0 1.069 1.000 0 2.199 15.95 0 1.060 1.000 0 2.86 15.12 0 1.481 1.659 1.481 1.481 1.481 2.036 1.686 0.975 1.481 0.962 1.000 1.000 1.000 0.988 1.000 1.000 1.000 1.000 1.000 1.000 0 0 0 0 0 0 0 0 0 0 2.302 2.593 2.302 2.302 2.302 2.422 2.92 1.894 2.302 3.343 6.510 8.790 7.930 6.760 7.580 8.510 8.070 8.480 11.40 13.03 0 3 1 1 1 2 3 0 0 0 1.075 1.164 1.000 1.000 0 0 3.933 4.534 6.770 6.510 0 0 2.180 1.333 2.094 2.488 1.000 1.542 1.000 0.911 0.953 0.967 0.935 0.930 0 3 2 2 6 0 5.886 0 0.707 0.957 0 0.816 7.300 5.120 4.940 4.900 5.810 6.090 0 1 1 1 1 2 0.221 0.221 120-82-1 5.064 2.329 -0.270 100-42-5 3.932 4.305 3.932 3.932 3.932 4.432 4.698 3.683 4.639 5.475 2.704 2.614 2.502 2.712 2.613 2.473 2.831 2.484 2.690 
2.591 -0.228 -0.291 -0.275 -0.223 -0.263 -0.249 -0.278 -0.240 -0.255 -0.270 5.377 5.788 2.311 2.439 -0.215 -0.210 7.949 1.414 2.414 2.914 1.732 2.270 1.973 3.932 5.117 5.488 3.997 4.553 -0.214 -0.266 -0.266 -0.266 -0.253 -0.246 98-95-3 100-47-0 100-66-3 100-52-7 103-71-9 99-08-1 108-98-5 100-39-0 3058333-6 90-12-0 2880488-8 605-02-7 64-17-5 71-36-3 71-41-0 67-64-1 68-12-2 0.215 0.186 0.172 0.212 0.213 0.183 0.207 0.162 0.215 0.200 0.220 0.172 0.170 0.165 0.339 0.336 0.337 0.224 0.255 83 84 85 86 87 88 89 90 91 tetrahydrothiophene thiophene 2-methylthiophene N-methyl-2-pyrrolidone pyridine quinoline aniline N-methylaniline N,N-dimethylaniline 92 1,5,9-cyclododecatriene 110-01-0 110-02-1 554-14-3 872-50-4 110-86-1 91-22-5 62-53-3 100-61-8 121-69-7 4904-614 3.000 3.000 3.348 3.305 3.000 4.966 3.394 3.932 4.305 6.000 2.887 2.129 2.329 3.108 2.413 2.079 2.651 2.840 3.157 4.049 -0.213 -0.245 -0.231 -0.238 -0.251 -0.240 -0.196 -0.189 -0.184 -0.232 0.245 0.219 0.213 0.260 0.214 0.177 0.196 0.191 0.188 0.247 0.579 0.579 0.956 0.918 0.667 1.047 0.975 1.481 1.659 1.244 1.000 1.000 1.000 0.993 1.000 1.000 1.000 0.989 0.976 0.948 4 0 3 2 0 0 0 0 0 0 1.25 1.25 1.644 2.29 1.5 3.466 1.894 2.302 2.593 3 6.780 9.350 8.180 6.200 7.190 7.600 6.650 6.300 6.060 5.410 0 0 0 2 1 1 1 1 1 0 Test set 1 2 3 4 5 6 7 8 9 tetradecane cyclohexane 1-methyl-1-cyclohexene 629-59-4 cis-1,2dimethylcyclohexane ethylcyclohexane 6.914 3.000 3.394 6.699 3.605 3.399 -0.281 -0.290 -0.224 0.394 0.425 0.258 5.698 0.667 0.975 1.000 0.963 0.982 0 0 0 2.957 1.500 1.894 4.510 4.680 5.060 0 0 0 2207-014 3.805 4.154 -0.282 0.395 0.854 0.953 0 2.540 4.680 0 1678-917 67-66-3 3.932 3.936 -0.283 0.400 1.481 0.966 0 2.302 4.680 0 78-77-3 2.598 3.328 2.621 3.828 2.977 1.993 3.292 4.366 4.155 5.219 -0.326 -0.276 -0.269 -0.257 -0.268 0.253 0.237 0.254 0.236 0.255 1.000 1.707 1.707 2.094 1.542 1.000 1.000 0.976 1.000 0.972 0 0 2 0 1 0.000 0.500 0.500 0.707 0.816 23.87 23.48 11.18 18.35 9.790 0 0 0 0 0 
2-chloro-2-methylpropane 507-20-0 2.250 5.237 -0.282 0.285 0.800 0.966 9 0.000 6.610 0 2-iodo-2-methylpropane 558-17-8 2.750 5.175 -0.246 0.204 0.800 0.987 9 0.000 13.14 0 trichloroethylene toluene 1,4-dimethylbenzene 1,2,3,4tetramethylbenzene ethylbenzene sec-butylbenzene iodobenzene 1,3-dichlorobenzene 1,2-dibromobenzene 2-nitrotoluene benzyl chloride 1-chloronaphthalene 1-bromo-2methylnapthalene 1-propanol 1-hexanol 1-octanol acrylonitrile 2-methoxyethyl ether 79-01-6 3.201 3.394 3.788 4.626 2.213 2.746 2.895 3.530 -0.280 -0.241 -0.230 -0.222 0.233 0.234 0.225 0.227 1.542 0.975 1.140 0.984 1.000 1.000 1.000 0.985 0 0 0 0 0.816 1.894 2.305 3.702 21.90 6.140 5.900 5.590 0 0 0 0 3.932 4.843 4.260 4.365 4.959 4.715 4.285 5.666 6.365 2.991 3.514 2.425 2.392 2.388 2.927 2.721 2.137 2.208 -0.241 -0.240 -0.243 -0.267 -0.246 -0.268 -0.253 -0.220 -0.215 0.235 0.234 0.206 0.225 0.221 0.183 0.225 0.172 0.170 1.481 1.964 0.975 1.069 0.854 1.563 1.481 1.075 1.164 0.990 0.979 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0 0 0 0 0 0 0 0 0 2.302 3.099 1.894 2.199 2.540 3.034 2.302 3.933 4.534 5.900 5.590 17.00 12.25 19.66 8.070 8.440 9.030 10.53 0 0 0 0 0 3 0 0 0 1.914 3.414 4.414 1.914 4.414 4.733 5.867 6.244 2.828 5.224 -0.262 -0.262 -0.261 -0.290 -0.251 0.338 0.338 0.338 0.233 0.329 1.707 2.885 3.685 1.707 3.685 0.919 0.965 0.987 0.944 0.967 2 2 2 0 0 0.500 1.207 1.707 0.500 1.707 5.010 4.870 4.820 7.580 5.830 1 1 1 1 3 chloroform 1,2-dibromoethane 1-bromopropane 1,3-dibromopropane 1-bromo-2-methylpropane 110-82-7 591-49-1 106-93-4 106-94-5 109-64-8 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 108-88-3 106-42-3 488-23-3 100-41-4 135-98-8 591-50-4 541-73-1 583-53-9 88-72-2 100-44-7 90-13-1 2586-621 71-23-8 111-27-3 111-87-5 107-13-1 111-96-6 Plot of correlation coefficients for all models 1 Correlation coefficient, r2 0.9 0.8 0.7 0.6 0.5 0 1 2 3 4 5 Number of variables in the model Figure S1. Comparative plot of correlation coefficient values for each model. 
■ - values for training set; ▲ - values for test set.

Definitions of the descriptors used in the selected models.

1. X1sol is a topological descriptor, the first-order solvation connectivity index (chi-1), which encodes the solvation properties of the compound (Todeschini and Consonni 2003). It was defined to model solvation entropy and dispersion interactions in solution, and it relates the characteristic dimension of the molecule to atomic parameters (quantum numbers, bond indices, etc.). This bidimensional descriptor was proposed in 1991 by the group of Zefirov and Palyulin (Antipin 1991) to treat the enthalpies of non-specific solvation. When the characteristic dimensions of the molecules are expressed through atomic parameters, the solvation connectivity index of order m is defined as

$$ {}^{m}\chi_{q}^{s} = \frac{1}{2^{m+1}} \sum_{k=1}^{K} \frac{\prod_{a=1}^{n} L_a}{\left( \prod_{a=1}^{n} \delta_a \right)_k^{1/2}} $$

where $L_a$ is the principal quantum number (2 for C, N, O atoms; 3 for Si, S, Cl; etc.) of the a-th atom in the k-th subgraph, $\delta_a$ is the corresponding vertex degree, $K$ is the total number of m-th order subgraphs, $n$ is the number of vertices in the subgraph, and $q$ denotes the subgraph type. The normalization factor $1/2^{m+1}$ is chosen so that the indices $X^m$ and $X^m_{sol}$ coincide for compounds that contain only second-row atoms. The first-order solvation connectivity index is

$$ X1sol = {}^{1}\chi^{s} = \frac{1}{4} \sum_{b=1}^{B} \frac{L_i L_j}{\left( \delta_i \delta_j \right)_b^{1/2}} $$

where b runs over all B bonds, and $L_i$ and $L_j$ are the principal quantum numbers of the two vertices joined by the bond considered. The positive coefficient of X1sol indicates that an increase in the descriptor value results in an increase in the solubility of C60 in the solvent considered.

2. J3D is the 3D-Balaban index, a geometrical descriptor (Todeschini and Consonni 2003). The Balaban index describes the distance connectivity of the molecule, that is, the average distance sum connectivity.
This index is derived from the geometry (distance) matrix, which makes it a 3D descriptor. The geometry matrix G is a square symmetric matrix whose ij-th entry $r_{ij}$ is the Euclidean distance between the i-th and the j-th atoms. The geometric distance degree of atom i is the i-th row sum of G, that is,

$$ {}^{G}\sigma_i = \sum_{j} r_{ij} $$

J3D can then be defined as follows:

$$ J3D = \frac{B}{C+1} \sum_{b} \left( {}^{G}\sigma_i \cdot {}^{G}\sigma_j \right)_b^{-1/2} $$

where ${}^{G}\sigma_i$ and ${}^{G}\sigma_j$ are the geometric distance degrees of two adjacent atoms i and j connected by the bond b, the sum runs over all bonds b in the molecule, B is the total number of bonds in the molecule, and C is the cyclomatic number.

3. HOMO, LUMO and HOMO-LUMO gap are quantum-chemical descriptors: the energies of the Highest Occupied Molecular Orbital (HOMO) and the Lowest Unoccupied Molecular Orbital (LUMO), and the energy gap between them. These orbitals are called the frontier orbitals, and they determine the way the molecule interacts with other species. The HOMO can act as an electron donor, since it is the outermost (highest-energy) orbital containing electrons. The LUMO can act as an electron acceptor, since it is the lowest-energy orbital that has room to accept electrons. The HOMO descriptor therefore describes the nucleophilic properties of the solvent, and the LUMO descriptor describes its electrophilic properties. The HOMO-LUMO gap reflects the reactivity of the compound: a smaller value of the descriptor corresponds to a more reactive compound. These descriptors can be calculated by various quantum-chemical methods.

4. TI2 is a topological descriptor, the second Mohar index. The Mohar indices are derived from the Laplacian matrix (Todeschini and Consonni 2003; Mohar 1989). The descriptors TI1 and TI2 are defined on the basis of the Laplacian spectrum as a measure of molecular branching:

$$ TI1 = 2 N \log\!\left( \frac{Q}{N} \right) \sum_{i=1}^{N-1} \frac{1}{\lambda_i}, \qquad TI2 = \frac{4}{N \lambda_{N-1}} $$

where $\lambda_i$ are the eigenvalues of the Laplacian matrix (the vertex degree matrix minus the adjacency matrix) of a molecular graph with N vertices and Q edges, and $\lambda_{N-1}$ is the smallest non-zero eigenvalue.

5. FDI is a geometrical descriptor representing the folding degree index.
The FDI descriptor is defined as the largest eigenvalue obtained by diagonalization of the distance/distance matrix, normalized by dividing by the number of atoms (Todeschini and Consonni 2003; Randic et al. 1994; Randic and Krilov 1999). The values of the descriptor lie in the range 0 < FDI ≤ 1. The descriptor converges to one for linear molecules (of infinite length) and decreases as the molecule becomes more folded. FDI can therefore be used as an indicator of how far a molecule departs from strict linearity.

6. H-052 is an atom-centered fragment descriptor that describes H (hydrogen) attached to C(sp3) with 1X (heteroatom) attached to the next C (Todeschini and Consonni 2003).

7. nHAcc is the number of acceptor atoms for H-bonds (N, O, F, etc.). This descriptor belongs to the functional group descriptors.

8. X3 is the connectivity index chi-3, a topological descriptor (Todeschini and Consonni 2003). It belongs to the Kier-Hall connectivity indices, which are calculated from the hydrogen-depleted molecular graph (Kier & Hall, 1986). These indices comprise four families: the connectivity indices Chi-0 through Chi-5, the average connectivity indices Chi-0 through Chi-5, the valence connectivity indices Chi-0 through Chi-5, and the average valence connectivity indices Chi-0 through Chi-5. The connectivity indices Chi-0 through Chi-5 are defined as follows.

Chi-0: the connectivity index Chi-0 is defined as

$$ {}^{0}\chi = \sum_{i=1}^{n} \delta_i^{-1/2} $$

where n is the number of nodes in the hydrogen-depleted graph and $\delta_i$ is the vertex degree of the i-th atom, defined as the number of non-hydrogen neighbours in the molecular graph. The average connectivity index Chi-0 is

$$ {}^{0}\chi_{A} = \frac{{}^{0}\chi}{n} $$

Chi-1: the connectivity index Chi-1 is defined as

$$ {}^{1}\chi = \sum_{k=1}^{b} \left( \delta_i \delta_j \right)_k^{-1/2} $$

where b is the number of bonds, the sum runs over all bonds in the hydrogen-depleted molecule, and for each bond $\delta_i \delta_j$ is the product of the vertex degrees of the end atoms i and j.
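As an illustration of the Chi-0 and Chi-1 definitions, the following Python sketch evaluates both indices for the hydrogen-depleted graph of n-pentane. It also evaluates the first-order solvation index X1sol from item 1, taking the principal quantum number as 2 for every carbon atom. The function names and the bond-list representation of the molecule are assumptions made for this example and are not taken from the original work.

```python
import math


def vertex_degrees(n_atoms, bonds):
    """Number of non-hydrogen neighbours of each atom in the H-depleted graph."""
    degrees = [0] * n_atoms
    for i, j in bonds:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


def chi0(n_atoms, bonds):
    """Connectivity index Chi-0: sum over atoms of delta_i^(-1/2).

    Assumes every atom has at least one non-hydrogen neighbour.
    """
    return sum(d ** -0.5 for d in vertex_degrees(n_atoms, bonds))


def chi1(n_atoms, bonds):
    """Connectivity index Chi-1: sum over bonds of (delta_i * delta_j)^(-1/2)."""
    deg = vertex_degrees(n_atoms, bonds)
    return sum((deg[i] * deg[j]) ** -0.5 for i, j in bonds)


def x1sol(quantum_numbers, bonds):
    """First-order solvation index: (1/4) * sum over bonds of L_i*L_j / sqrt(delta_i*delta_j)."""
    deg = vertex_degrees(len(quantum_numbers), bonds)
    return 0.25 * sum(
        quantum_numbers[i] * quantum_numbers[j] / math.sqrt(deg[i] * deg[j])
        for i, j in bonds
    )


# n-Pentane as a chain of five carbon atoms (hydrogens removed).
pentane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
pentane_chi0 = chi0(5, pentane_bonds)
pentane_chi1 = chi1(5, pentane_bonds)
pentane_x1sol = x1sol([2] * 5, pentane_bonds)  # L = 2 for every carbon

print(round(pentane_chi0, 3))   # 4.121
print(round(pentane_chi1, 3))   # 2.414
print(round(pentane_x1sol, 3))  # 2.414
```

For n-pentane, Chi-1 and X1sol are equal, as expected for a compound containing only second-row atoms, and the value 2.414 appears to agree with the X1sol entry listed for pentane in Table S1.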
The average connectivity index Chi-1 is

$$ {}^{1}\chi_{A} = \frac{{}^{1}\chi}{b} $$

Higher indices: the connectivity indices Chi-m for 2 ≤ m ≤ 5 are defined as

$$ {}^{m}\chi = \sum_{k=1}^{K} \left( \prod_{i} \delta_i \right)_k^{-1/2} $$

where $\left( \prod_i \delta_i \right)_k$ is the product of the vertex degrees of the atoms that form a connected subgraph with m edges, and K is the total number of such distinct connected subgraphs of the H-depleted molecular graph, each having m edges.

For any m, 0 ≤ m ≤ 5, replacing the vertex degree $\delta_i$ by the valence vertex degree $\delta_i^{v}$ for each atom i in the connectivity index Chi-m gives the valence connectivity indices Chi-m (Kier & Hall, 1981; Kier & Hall, 1983). That is,

$$ {}^{m}\chi^{v} = \sum_{k=1}^{K} \left( \prod_{i} \delta_i^{v} \right)_k^{-1/2} $$

where $\left( \prod_i \delta_i^{v} \right)_k$ is the product of the valence vertex degrees of the atoms that form a connected subgraph with m edges, and K is the total number of such distinct connected subgraphs of the H-depleted molecular graph, each having m edges. The valence connectivity indices account for the presence of heteroatoms and of double and triple bonds. The average valence connectivity index Chi-1 is defined similarly:

$$ {}^{1}\chi^{v}_{A} = \frac{{}^{1}\chi^{v}}{b} $$

9. AMW is a constitutional descriptor, the average molecular weight.

References

Kier LB, Hall LH (1981) J. Pharm. Sci. 70:583
Kier LB, Hall LH (1983) General definition of valence delta-values for molecular connectivity. J. Pharm. Sci. 72:1170–1173
Kier LB, Hall LH (1986) Molecular Connectivity in Structure-Activity Analysis. J. Wiley & Sons, New York