Chapter 14
Learning Methodologies for Detection and Classification of Mutagens

Huma Lodhi
Imperial College London, UK

DOI: 10.4018/978-1-61520-911-8.ch014
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

ABSTRACT
Predicting mutagenicity is a complex and challenging problem in chemoinformatics. The Ames test is a biological method for assessing the mutagenicity of molecules. The rapid growth of molecule repositories creates a need to develop and apply effective and efficient computational techniques for chemoinformatics problems such as the identification and classification of mutagens. Machine learning methods provide effective solutions to such problems. This chapter presents an overview of the learning techniques that have been developed and applied to the problem of identification and classification of mutagens.

INTRODUCTION
Mutagenicity is an unfavorable characteristic of drugs that can cause adverse effects. In chemoinformatics, it is crucial to develop and design effective and efficient computational tools to identify toxic and mutagenic molecules. Accurate prediction of mutagenicity will not only accelerate the process of finding quality lead molecules but will also decrease potential drug attrition. In recent years considerable effort has been devoted to developing, analyzing and applying statistical and relational learning techniques to identify undesirable biological effects such as mutagenicity.

Mutagens produce mutations in DNA and may or may not cause cancer. However, the use of drugs that are characterized by mutagenicity but not carcinogenicity is not recommended (Debnath, Compadre, Debnath, Schusterman, & Hansch, 1991). The Ames test (Ames, Lee, & Durston, 1973) is viewed as a biological means of identifying mutagenic molecules. In this test a bacterium, generally Salmonella typhimurium, is used to distinguish mutagens from non-mutagens. Novel molecules are exposed to a strain of the bacterium that lacks the ability to produce the amino acid histidine. Growth of the bacterial culture demonstrates mutations in DNA, and hence the molecule is classified as a mutagen. Figure 1 shows a mutagenic molecule.

Figure 1. An example of a mutagenic molecule

Machine learning methods and techniques provide an accurate, useful and efficient means of classifying mutagens. In this chapter we present an overview of a number of techniques that have been developed and applied to the problem of predicting mutagenicity. The review is not exhaustive; it outlines recent research and seminal work.

BACKGROUND
In machine learning the problem of recognizing and identifying mutagens is generally solved by viewing it as a classification problem. Methods ranging from Inductive Logic Programming (ILP) techniques to kernel based methods (KMs) have been developed and applied to mutagenicity classification. The mutagenesis dataset presented by Debnath et al. (1991) is a benchmark dataset on which the efficacy of learning methods has been evaluated. We therefore present an overview of the techniques that have been applied to this dataset.

The mutagenesis dataset comprises 230 molecules tested for mutagenicity on Salmonella typhimurium. Debnath et al. (1991) showed that a subset of 188 molecules is learnable using linear regression. This subset was later termed the “regression friendly” dataset (hereafter referred to as the mutagenesis dataset). The remaining 42 molecules are named the “regression unfriendly” subset. Of the 188 molecules, 125 have positive log mutagenicity whereas 63 have zero or negative log mutagenicity. Debnath et al. identified two chemical features, C, and two structural (indicator) variables, I, to predict mutagenicity.
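The class split described above can be made concrete with a short sketch. It assumes that the binary label is taken from the sign of log mutagenicity (positive means mutagen), which is how the 125/63 split is stated in the text. The function name and the example values are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def label_from_log_mutagenicity(log_mut: float) -> str:
    """Assumed labelling rule: positive log mutagenicity -> 'mutagen'."""
    return "mutagen" if log_mut > 0 else "non-mutagen"


# Hypothetical log mutagenicity values, for illustration only.
example_log_mutagenicity = [2.11, -0.5, 0.0, 1.3, -1.7]
labels = [label_from_log_mutagenicity(v) for v in example_log_mutagenicity]
counts = Counter(labels)

# Dataset sizes quoted in the text.
TOTAL, FRIENDLY, UNFRIENDLY = 230, 188, 42
POSITIVE, NON_POSITIVE = 125, 63

# Consistency checks on the quoted sizes.
assert FRIENDLY + UNFRIENDLY == TOTAL
assert POSITIVE + NON_POSITIVE == FRIENDLY

# Accuracy of always predicting the majority class on the 188 molecules.
majority_baseline = POSITIVE / FRIENDLY
```

The majority-class figure is a useful reference point: any classifier evaluated on the 188-molecule subset should be compared against the accuracy obtained by always predicting "mutagen".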
The chemical features are the energy of the lowest unoccupied molecular orbital (LUMO) and the logarithm of the octanol/water partition coefficient (LOGP). The two indicator variables are IN1, based on the number of fused rings, and IN2, which marks examples of acenthrylenes. Both are binary structural variables: IN1 is set to 1 if a molecule has 3 or more fused rings and to 0 if it has fewer than 3 fused rings. Similarly, IN2 is set to 1 for the 5 examples of acenthrylenes and to 0 otherwise. On the basis of a linear regression based quantitative structure activity relationship analysis, Debnath et al. suggested that the mutagenicity of aromatic nitro compounds is characterized by hydrophobicity, nitro groups in conjunction with electron attracting elements, and 3 or more fused rings.

Srinivasan, Muggleton, King, and Sternberg (1996) introduced more features for the mutagenesis dataset by exploiting atom and bond connectivities and using first order logic. The key information is given in the form of an atom and bond, AB, description. Furthermore, the atom and bond description is used to define functional groups, FG, including
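The four attributes of Debnath et al. described above (LUMO, LOGP, IN1, IN2) can be sketched as a feature vector. This is only an illustration under stated assumptions: the `Molecule` class, its field names, and all numeric values are hypothetical and do not come from the dataset. Only the rules for IN1 (3 or more fused rings) and IN2 (acenthrylene examples) follow the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Molecule:
    """Hypothetical record; field names are assumptions, not the dataset's."""
    name: str
    lumo: float            # energy of the lowest unoccupied molecular orbital
    logp: float            # log of the octanol/water partition coefficient
    fused_rings: int       # number of fused rings in the molecule
    is_acenthrylene: bool  # True for the acenthrylene examples


def indicator_in1(mol: Molecule) -> int:
    """IN1 = 1 if the molecule has 3 or more fused rings, else 0."""
    return 1 if mol.fused_rings >= 3 else 0


def indicator_in2(mol: Molecule) -> int:
    """IN2 = 1 for acenthrylene examples, else 0."""
    return 1 if mol.is_acenthrylene else 0


def feature_vector(mol: Molecule) -> List[float]:
    """Return [LUMO, LOGP, IN1, IN2] for use by a regression or classifier."""
    return [mol.lumo, mol.logp, indicator_in1(mol), indicator_in2(mol)]


# Made-up molecules for illustration only.
mol_a = Molecule("example_a", lumo=-1.5, logp=4.2, fused_rings=4, is_acenthrylene=False)
mol_b = Molecule("example_b", lumo=-1.0, logp=2.1, fused_rings=2, is_acenthrylene=False)

features = [feature_vector(m) for m in (mol_a, mol_b)]
```

Vectors of this form are what a propositional learner such as linear regression would consume, in contrast to the relational atom and bond description, which is expressed in first order logic.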