Building QSAR models using Shannon entropy and a genetic algorithm
Jörg K. Wegner and Andreas Zell
Zentrum für Bioinformatik Tübingen (ZBIT)
Universität Tübingen
Sand 1
D-72076 Tübingen
{wegnerj,zell}@informatik.uni-tuebingen.de
We describe a fast and flexible descriptor selection method based on a genetic algorithm variant (GA-SEC). Descriptor relevance is measured using Shannon Entropy (SE)
and Differential Shannon Entropy (DSE) [1], which have very low memory requirements and allow the processing of huge data sets. A small number of the most important
descriptors is then used automatically to build a value prediction model. The selected descriptors are not linear combinations of other descriptors, but transparent, pure
descriptors. We used an artificial neural network (ANN) model [2] to predict logS/logP
values and obtained $R^2_{\mathrm{ANN,Test}} = 0.93$ for the Huuskonen test set [3] and $R^2_{\mathrm{ANN,Test}} = 0.92$
for the Wang test set [4]. The SE and DSE values are used to initialise the GA
and speed up the descriptor selection process dramatically. The value prediction model
(e.g. an ANN) is calculated using the selected descriptors. The fitness function rewards a
small number of descriptors and penalizes a large number. This modified
fitness function avoids implicitly overfitted models with poor predictive ability.
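The two GA ingredients named above, entropy-based initialisation and a fitness function that penalizes large descriptor sets, can be illustrated with a minimal Java sketch. It is not the GA-SEC implementation: the class name GaSecSketch, the method names, the proportional seeding rule, the linear penalty term and the penaltyWeight parameter are all assumptions made for illustration, and the relevance scores are assumed to be non-negative.

```java
import java.util.Random;

/** Illustrative sketch only; not the GA-SEC code. All names are hypothetical. */
final class GaSecSketch {

    /**
     * Seeds one chromosome (one boolean per descriptor). Descriptor i is switched on
     * with probability proportional to its relevance score (e.g. an SE or DSE value),
     * scaled so that about expectedSelected descriptors are chosen on average.
     * Assumption: relevance scores are non-negative.
     */
    static boolean[] seedChromosome(double[] relevance, int expectedSelected, Random rng) {
        double sum = 0.0;
        for (double r : relevance) {
            sum += r;
        }
        boolean[] genes = new boolean[relevance.length];
        if (sum <= 0.0) {
            return genes; // no relevance information: nothing is selected
        }
        for (int i = 0; i < relevance.length; i++) {
            double p = Math.min(1.0, expectedSelected * relevance[i] / sum);
            genes[i] = rng.nextDouble() < p;
        }
        return genes;
    }

    /**
     * Fitness = model quality minus a penalty that grows with the fraction of
     * descriptors in use. rSquared would come from the prediction model (e.g. an ANN)
     * trained on the selected descriptors; here it is simply passed in.
     */
    static double penalisedFitness(double rSquared, boolean[] genes, double penaltyWeight) {
        int selected = 0;
        for (boolean g : genes) {
            if (g) {
                selected++;
            }
        }
        if (selected == 0) {
            return Double.NEGATIVE_INFINITY; // an empty descriptor set cannot build a model
        }
        return rSquared - penaltyWeight * selected / genes.length;
    }
}
```

With this kind of penalty, two descriptor subsets that reach the same model quality are ranked by size, so the search is pushed towards the smaller one.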
The figure shows the mean values over 8 experiments. The GA-SEC algorithms are faster
than standard GAs. In addition, they are more stable and in general lead to
models with a smaller standard deviation.
The GA-SEC algorithms and the free descriptor calculation library JOELib [5] are completely written in Java.
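As an illustration of why the entropy measures need so little memory, the following Java sketch computes SE from a fixed-size histogram of one descriptor, so only the bin counts have to be kept. It is not JOELib code. The class and method names are hypothetical, the number of bins is a free parameter, and the DSE formula used here, DSE = SE(A and B pooled) - (SE(A) + SE(B)) / 2, is our reading of the definition in [1] and should be checked against that reference.

```java
import java.util.Arrays;

/** Illustrative sketch only; not part of JOELib. All names are hypothetical. */
final class ShannonEntropySketch {

    /** Counts how many values fall into each of `bins` equal-width bins on [min, max]. */
    static int[] histogram(double[] values, double min, double max, int bins) {
        int[] counts = new int[bins];
        double width = (max - min) / bins;
        for (double v : values) {
            int bin = width > 0 ? (int) ((v - min) / width) : 0;
            bin = Math.max(0, Math.min(bins - 1, bin)); // clamp v == max and outliers
            counts[bin]++;
        }
        return counts;
    }

    /** Shannon entropy in bits: SE = -sum_i p_i * log2(p_i) over the non-empty bins. */
    static double shannonEntropy(double[] values, double min, double max, int bins) {
        int[] counts = histogram(values, min, max, bins);
        double se = 0.0;
        for (int c : counts) {
            if (c == 0) {
                continue; // empty bins contribute nothing
            }
            double p = (double) c / values.length;
            se -= p * Math.log(p) / Math.log(2.0);
        }
        return se;
    }

    /**
     * Differential Shannon entropy of one descriptor between data sets a and b,
     * assuming DSE = SE(pooled) - (SE(a) + SE(b)) / 2 with a shared binning.
     */
    static double differentialShannonEntropy(double[] a, double[] b, int bins) {
        double[] pooled = new double[a.length + b.length];
        System.arraycopy(a, 0, pooled, 0, a.length);
        System.arraycopy(b, 0, pooled, a.length, b.length);
        double min = Arrays.stream(pooled).min().orElse(0.0);
        double max = Arrays.stream(pooled).max().orElse(0.0);
        double seA = shannonEntropy(a, min, max, bins);
        double seB = shannonEntropy(b, min, max, bins);
        double sePooled = shannonEntropy(pooled, min, max, bins);
        return sePooled - (seA + seB) / 2.0;
    }
}
```

A descriptor whose values spread evenly over the bins gets a high SE, while a descriptor that is constant across the data set gets an SE of zero and carries no information for selection.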
References
[1] Godden, W., Bajorath, J., J. Chem. Inf. Comput. Sci., 2001, 41, 1060.
[2] Zell, A., Simulation neuronaler Netze, Oldenbourg Verlag, München, 1997.
[3] Huuskonen, J., J. Chem. Inf. Comput. Sci., 2000, 40, 773.
[4] Wang, R., Gao, Y., Lai, L., Perspectives in Drug Discovery and Design, 2000, 19, 47.
[5] JOELib, http://www-ra.informatik.uni-tuebingen.de/software/joelib.