Application of rule induction techniques for detecting the possible impact of endocrine disruptors on the North Sea ecosystems

Tim Verslycke (1), Peter Goethals (1,2), Gert Vandenbergh (1), Karen Callebaut (3) & Colin Janssen (1)
(1) Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University
(2) Institute for Forestry and Game Management
(3) Ecolas n.v.

Outline
- Introduction on endocrine disruptors
- ED North project
- Database set-up
- Data mining and rule induction
- Practical application on ED North database
- Conclusions

Endocrine disruptors ??
- Endocrine disruptors, pseudo-hormones, endocrine modulators, xeno-hormones, …
- Compounds that interfere with the endocrine system, resulting in (negative) effects on health and/or reproduction of organisms
- Since the 1990s: one of the fastest-growing research domains in environmental toxicology
- Dozens of lists, hundreds of compounds
- Worldwide implication: industry - government - academics

Endocrine disruption in marine environments ??
- Sea: final sink for many chemicals
- North Sea and its estuaries are under a heavy pollution load
- Indications of potential endocrine disruption in these ecosystems
- Need for a better overview of potential endocrine disruption in the North Sea and Scheldt estuary
ED-NORTH project

ED-North project ~ Goals
- Critical evaluation of the literature on endocrine disruptors
- Build a reference list and database of chemicals with (potential) endocrine disruptive activity
- Evaluation of the described and suspected effects of endocrine disruptors on marine organisms
- Prioritize the selected chemicals
- If enough information: preliminary risk assessment
- Formulation of the research needs and policy actions (overview of the Belgian expertise)

ED-North project ~ Methods
Literature study
- electronic databases: Poltox, Medline, Current Contents, CAB abstracts, Agris, Agricola, Web of Science, …
- world wide web: USEPA, OECD, WWF, CEFIC, IEH, …
- grey literature
Database
- MS Access (relational database)

ED-North project ~ Results
- General overview of
endocrine disruption in humans and other mammals, birds, reptiles, fish and invertebrates
- Situation in Belgium and The Netherlands
- Expertise in Belgium
- Emission of synthetic and natural hormones in Belgium
- Sources, effects and occurrence of endocrine disruptors in the North Sea + prioritization
- Database of (potential) endocrine disruptors for the North Sea ecosystem

Relational database: anthropogenic (potential) endocrine disruptors
CHEMICALS (765): Chemical ID, Chemical Name Nl, Chemical Name E, CAS, UN, Chemical Formula, Molecular Weight, Boiling Point, Melting Point, Density, Pressure, Solubility, Log Kow, Phase, Notes
ENDOCRINE: Endocrine ID, Chemical ID, Reference ID, Group Name, Organism, Tissue, Age, In vivo, Lab, Flow, Duration, Route, Temperature, Concentration, Notes
EFFECT (3516): Effect ID, Hormone Name, Endocrine ID, Effect Code, Effect description
HORMONE: Hormone Name
EFFECT CODE: Effect Code
REFERENCES (423): Reference ID, Authors, Year, Title, Source
GROUP: Group Name

Relational database
Table: References
  RefID: 26
  Authors: Soto, A.M., Chung, K.L., Sonnenschein, C.
  Year: 1994
  Source: Environ. Health Perspect., 102:380-383
Table: Endocrine
  Endocrine ID: 2598
  Chem ID: 240
  Ref ID: 26
  Group: mammalian
  Organism: Human
  Tissue: MCF-7 cells
  Age: (blank)
  In vivo: In vitro
  Lab: Laboratory
  Duration: 6 days
  Concentration: 10 µM
  Notes: Technical grade; Escreen
Table: Chemicals
  Chem ID: 240
  ChemNameNl: DDT
  CAS: 50-29-3
  Chem Form: C14H9Cl5
  Mol weight: 354.49
  BP: 260°C
  MP: 108°C
  Pressure: 1.9E-7 mm Hg at 20°C
  Solubility: 3.1-3.4 µg/l
  Log Kow: 6.19
  Phase: Solid

Rule induction techniques
Data mining (analysis) techniques:
1) Clustering methods (which data are related or 'similar'), e.g. cluster analysis
2) Classification methods (how variables are related, using only classes (numerical or not) = rules among variables), e.g. decision trees
3) Regression methods (quantitative description of the relation between two variables), e.g.
multivariate regression

Rule induction techniques
Classification and decision trees: induction of rules from datasets
- which variables are related
  e.g. which variables are mainly related to endocrine disruptive effects in animals
- how variables are related (quantitative rules making use of threshold values or classes)
  e.g. when the hormone concentration is higher than value A, estrogenic effects of type X will occur

Rule induction techniques
WEKA data mining software: DOS command window but also visual Java interface
- Induced rule set
- Rule set performance indicators

Applications on ED-North database
Example on crustacean data
1) Prediction of endocrine disruptive effects based on physical/chemical properties of chemicals
2) Prediction of estrogenic effect of chemicals on the crustaceans in the database
3) Which factors (flow, concentration, duration, ...) affect this estrogenicity

1) Which molecular characteristics are related to estrogenic effects
Estrogenic effects in crustaceans (89 cases)
Tested variables: effects, molecular weight, boiling point, temperature, Log Kow, solubility
Induced rule set:
LogKow <= 3.74: Estrogenic effect
LogKow > 3.74
|   Solubility <= 0.00033: No Estrogenic effect
|   Solubility > 0.00033: Estrogenic effect
Reliability (CCI): 63 %

2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, compounds
Induced rule set (23 rules, one for each compound):
CHEMID = 4-nonylphenol (p-nonylphenol): Estrogenic effect
CHEMID = ...
...
CHEMID = 20-hydroxyecdysone: No Estrogenic effect
Reliability (CCI): 60 %

2) Which estrogenic effects are related to particular compounds in the environment
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds
Induced rule set (13 rules, one for each organism):
Organism = Balanus amphitrite: No estrogenic effect
Organism = Daphnia magna: Estrogenic effect
...
Reliability (CCI): 74 %

3) Which factors affect the estrogenic effects
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds, age, flow, in vitro/in vivo, duration
Induced rule set (16 rules, one for each age class and, for larval, also one for each organism type):
Age = Juvenile: No estrogenic effect
Age = Larval
|   Organism = Balanus amphitrite: Estrogenic effect
|   Organism = ...
Age = Adult: Estrogenic effect
Age = Egg: Estrogenic effect
Reliability (CCI): 78 %

General discussion
This exercise on the ED North database illustrated that data mining can help to find relations between:
- Compounds and their structure
- Estrogenic effects
- Type of organisms
- Test and environmental conditions

General discussion
Data mining helps to find errors and outliers in the data set, and creates insights to improve further data collection and the development of databases.
Interaction between data miners and domain experts (ecologist, ecotoxicologist) is very important:
1) it is easy to find 'reliable nonsense' rules by excluding important variables during the analysis (need for the expertise of the ecotoxicologist)
2) the parameter settings, and the insight needed to tune them, have a very important impact on the richness of the outcome of the data mining exercise (need for data mining expertise)

General discussion
The collected data set itself influences the outcome of the analysis to an important extent:
1) it is important to collect data that cover the whole range (variables and their values/classes), and stratification of the instances is necessary
2) the selection of variable classes can affect the results to a high extent (e.g. larval-adult problem, number of effect classes, ...)
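As an illustration of how an induced rule set and its reliability figure relate, the sketch below applies the Log Kow / solubility rule set from application 1 to a handful of records and computes CCI, the percentage of correctly classified instances. It is only a sketch under stated assumptions: the four records are made up for illustration and do not come from the ED-North database, the names `Chemical`, `predict_estrogenic` and `cci` are hypothetical, the lower branches are assumed to use "<=" as in WEKA-style tree output, and the unit of the solubility threshold is not stated in the slides. WEKA would normally report CCI from a test set or cross-validation rather than from the records shown here.

```python
from typing import NamedTuple, Sequence


class Chemical(NamedTuple):
    name: str                  # hypothetical label, not an ED-North record
    log_kow: float
    solubility: float          # same (unstated) unit as the rule threshold
    observed_estrogenic: bool  # effect reported for this record


def predict_estrogenic(log_kow: float, solubility: float) -> bool:
    """Apply the rule set from application 1 (assumes '<=' on lower branches)."""
    if log_kow <= 3.74:
        return True
    # LogKow > 3.74: the outcome depends on solubility
    return solubility > 0.00033


def cci(records: Sequence[Chemical]) -> float:
    """Percentage of records whose predicted class equals the observed class."""
    hits = sum(
        predict_estrogenic(r.log_kow, r.solubility) == r.observed_estrogenic
        for r in records
    )
    return 100.0 * hits / len(records)


# Made-up records, for illustration only.
EXAMPLE_RECORDS = [
    Chemical("compound A", 2.5, 0.5, True),
    Chemical("compound B", 5.0, 0.0001, False),
    Chemical("compound C", 6.0, 0.002, False),
    Chemical("compound D", 4.2, 0.01, True),
]

if __name__ == "__main__":
    print(f"CCI: {cci(EXAMPLE_RECORDS):.0f} %")
```

A rule set with a high CCI on the records it was induced from can still be a 'reliable nonsense' rule, which is why the figure should be read together with expert judgement and independent validation data.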
Conclusions
- Data mining helps to find which gaps exist in the database and delivers information for sustainable data collection and management
- Data mining delivers insight into the dataset: generation of knowledge from data
- Highly unpredictable parts of the dataset are useful targets for further research
- General, reliable rules are promising for decision support in environmental management
- Important to be aware that these analyses explore correlations, not causal relations! Control by experts or further research (validation) is always necessary
- Data mining adds more colour to our data

Acknowledgements
- Federal Office for Scientific, Technical and Cultural Affairs (OSTC)
- Thesis students
- Ward Vanden Berghe (VLIZ)
- The Flemish Institute for the Promotion of Scientific and Technological Research in Industry (IWT)