Application of rule induction
techniques for detecting the possible
impact of endocrine disruptors on
the North Sea ecosystems
Tim Verslycke1, Peter Goethals1,2, Gert Vandenbergh1,
Karen Callebaut3 & Colin Janssen1
1 Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University
2 Institute for Forestry and Game Management
3 Ecolas n.v.
Outline
Introduction on endocrine disruptors
ED North project
Database set-up
Data mining and rule induction
Practical application on ED North database
Conclusions
Endocrine disruptors ??
Endocrine disruptors, pseudo-hormones,
endocrine modulators, xeno-hormones, …
Compounds that interfere with the
endocrine system, resulting in (negative)
effects on health and/or reproduction of
organisms
Since the 1990s: one of the fastest-growing
research domains in environmental
toxicology
Dozens of lists, hundreds of compounds
Worldwide involvement: industry, government, academia
Endocrine disruption in marine
environments ??
Sea: final sink for many chemicals
North Sea and its estuaries are under
a heavy pollution load
Indications of potential endocrine
disruption in these ecosystems
Need to have better overview of
potential endocrine disruption in
North Sea and Scheldt estuary
→ ED-North project
ED-North project ~ Goals
Critical evaluation of the literature on endocrine disruptors
Build a reference list and database of chemicals with
(potential) endocrine disruptive activity
Evaluation of the described and suspected effects of
endocrine disruptors on marine organisms
Prioritize the selected chemicals
If enough information: preliminary risk assessment
Formulation of the research needs and policy actions
(overview of the Belgian expertise)
ED-North project ~ Methods
Literature study
- electronic databases: Poltox, Medline, Current Contents,
CAB abstracts, Agris, Agricola, Web of Science,…
- world wide web: USEPA, OECD, WWF, CEFIC, IEH,…
- grey literature
Database
MS Access (relational database)
ED-North project ~ Results
General overview of endocrine disruption in humans and
other mammals, birds, reptiles, fish and invertebrates
Situation in Belgium and The Netherlands
Expertise in Belgium
Emission of synthetic and natural hormones in Belgium
Sources, effects and occurrence of endocrine disruptors in
the North Sea + prioritization
Database of (potential) endocrine disruptors for the
North Sea ecosystem
Relational database: anthropogenic
(potential) endocrine disruptors
CHEMICALS (765): Chemical ID, Chemical Name Nl, Chemical Name E, CAS, UN, Chemical Formula, Molecular Weight, Boiling Point, Melting Point, Density, Pressure, Solubility, Log Kow, Phase, Notes
ENDOCRINE: Endocrine ID, Chemical ID, Reference ID, Group Name, Organism, Tissue, Age, In vivo, Lab, Flow, Duration, Route, Temperature, Concentration, Notes
EFFECT (3516): Effect ID, Hormone Name, Endocrine ID, Effect Code, Effect description
HORMONE: Hormone Name
EFFECT CODE: Effect Code
REFERENCES (423): Reference ID, Authors, Year, Title, Source
GROUP: Group Name
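The table layout above can be sketched outside MS Access as well. The following is a minimal illustration using Python's built-in sqlite3 module, not the project's actual database: the column names follow the slide, while the SQL types, the key constraints, the reduced column set for CHEMICALS and the table name refs (REFERENCES is a reserved word in SQL) are assumptions.

```python
import sqlite3

# Column names follow the slide; SQL types and key constraints are assumptions.
SCHEMA = """
CREATE TABLE chemicals (
    chemical_id      INTEGER PRIMARY KEY,
    chemical_name_nl TEXT,
    chemical_name_e  TEXT,
    cas              TEXT,
    log_kow          REAL,
    solubility       TEXT
);
CREATE TABLE refs (
    reference_id INTEGER PRIMARY KEY,
    authors TEXT, year INTEGER, title TEXT, source TEXT
);
CREATE TABLE endocrine (
    endocrine_id INTEGER PRIMARY KEY,
    chemical_id  INTEGER REFERENCES chemicals(chemical_id),
    reference_id INTEGER REFERENCES refs(reference_id),
    group_name TEXT, organism TEXT, tissue TEXT, age TEXT,
    in_vivo TEXT, lab TEXT, flow TEXT, duration TEXT, route TEXT,
    temperature TEXT, concentration TEXT, notes TEXT
);
CREATE TABLE effect (
    effect_id    INTEGER PRIMARY KEY,
    hormone_name TEXT,
    endocrine_id INTEGER REFERENCES endocrine(endocrine_id),
    effect_code  TEXT,
    effect_description TEXT
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketched tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```

Each ENDOCRINE record points to one chemical and one reference, and each EFFECT record points to one ENDOCRINE record, which is what allows a single chemical to carry many test results and effects.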
Relational database
Table: References
RefID: 26
Authors: Soto, A.M., Chung, K.L., Sonnenschein, C.
Year: 1994
Source: Environ. Health Perspect., 102:380-383
Table: Endocrine
Endocrine ID: 2598
Chem ID: 240
Ref ID: 26
Group: mammalian
Organism: Human
Tissue: MCF-7 cells
Age: (not given)
In vivo: In vitro
Lab: Laboratory
Duration: 6 days
Concentration: 10 µM
Notes: Technical grade; E-screen
Table: Chemicals
Chem ID: 240
ChemNameNl: DDT
CAS: 50-29-3
Chem Form: C14H9Cl5
Mol weight: 354.49
BP: 260°C
MP: 108°C
Pressure: 1.9E-7 mm Hg at 20°C
Solubility: 3.1-3.4 µg/l
Log Kow: 6.19
Phase: Solid
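To show how the three example records link together through their IDs, here is a small sketch in Python with sqlite3. The table and column names are simplified assumptions, only a few of the fields are loaded, and reading 2598 as the Endocrine ID (with 240 and 26 as the chemical and reference IDs) is an interpretation of the flattened slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE refs (ref_id INTEGER PRIMARY KEY, authors TEXT, year INTEGER, source TEXT);
CREATE TABLE chemicals (chem_id INTEGER PRIMARY KEY, name TEXT, cas TEXT, log_kow REAL);
CREATE TABLE endocrine (endocrine_id INTEGER PRIMARY KEY, chem_id INTEGER, ref_id INTEGER,
                        organism TEXT, tissue TEXT, duration TEXT, concentration TEXT);
""")

# Values taken from the example record on the slide.
conn.execute(
    "INSERT INTO refs VALUES (?, ?, ?, ?)",
    (26, "Soto, A.M., Chung, K.L., Sonnenschein, C.", 1994,
     "Environ. Health Perspect., 102:380-383"),
)
conn.execute("INSERT INTO chemicals VALUES (?, ?, ?, ?)", (240, "DDT", "50-29-3", 6.19))
conn.execute(
    "INSERT INTO endocrine VALUES (?, ?, ?, ?, ?, ?, ?)",
    (2598, 240, 26, "Human", "MCF-7 cells", "6 days", "10 µM"),
)

# Follow the foreign keys from the test record to its chemical and reference.
row = conn.execute("""
    SELECT c.name, c.cas, e.organism, e.tissue, r.authors, r.year
    FROM endocrine AS e
    JOIN chemicals AS c ON c.chem_id = e.chem_id
    JOIN refs AS r ON r.ref_id = e.ref_id
    WHERE e.endocrine_id = 2598
""").fetchone()
print(row)
```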
Rule induction techniques
Data mining (analysis) techniques:
1) Clustering methods (which data are related
or 'similar')
e.g. cluster analysis
2) Classification methods (how variables are
related, using only classes, numerical or not;
the result is a set of rules among variables)
e.g. decision trees
3) Regression methods (quantitative description
of the relation between two variables)
e.g. multivariate regression
Rule induction techniques
Classification and decision trees: induction of rules
from datasets
• which variables are related
e.g. which variables are mainly related to endocrine
disruptive effects in animals
• how variables are related (quantitative rules that use
threshold values or classes)
e.g. when the hormone concentration is higher than value A,
estrogenic effects of type X will occur
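The idea of a threshold rule ("when the concentration is higher than value A, then an effect occurs") can be illustrated with a one-level decision tree, often called a decision stump. The sketch below is a simplification and not the algorithm used in WEKA: it picks the threshold that classifies the most cases correctly, whereas tree learners such as C4.5 typically rank splits by an information-based criterion. The function name and the toy data are hypothetical.

```python
def best_threshold(values, labels):
    """Find the threshold t for the rule 'value > t -> effect'.

    Candidate thresholds are the midpoints between consecutive distinct
    values; the one with the highest fraction of correct cases wins.
    Returns (threshold, accuracy).
    """
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:]) if a != b]
    best = (None, 0.0)
    for t in candidates:
        correct = sum((v > t) == label for v, label in pairs)
        accuracy = correct / len(pairs)
        if accuracy > best[1]:
            best = (t, accuracy)
    return best


# Hypothetical toy data: test concentrations and whether an effect was seen.
concentrations = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
effect_seen = [False, False, False, True, True, True]

threshold, accuracy = best_threshold(concentrations, effect_seen)
print(f"effect if concentration > {threshold} (accuracy {accuracy:.0%})")
```

A full decision tree repeats this kind of split recursively within each branch, which is how nested rule sets arise.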
Rule induction techniques
WEKA data mining software: runs from a DOS command window, but also
has a visual Java interface
[Screenshot of WEKA output, annotated: induced rule set; rule set performance indicators]
Applications on ED-North database
Example on crustacean data
1) Prediction of endocrine disruptive effects based on
physical/chemical properties of chemicals
2) Prediction of estrogenic effect of chemicals to the
crustaceans in the database
3) Which factors (flow, concentration, duration, ...) affect this
estrogenicity
1) Which molecular characteristics are
related to estrogenic effects
Estrogenic effects in crustaceans (89 cases)
Tested variables: effects, molecular weight, boiling
point, temperature, Log Kow, solubility
Induced rule set:
LogKow ≤ 3.74: Estrogenic effect
LogKow > 3.74
| Solubility ≤ 0.00033: No Estrogenic effect
| Solubility > 0.00033: Estrogenic effect
Reliability (CCI, correctly classified instances): 63 %
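The induced rule set can be read as a small nested if/else. The sketch below writes it out in Python and shows how a CCI percentage is computed; it assumes that the first branch of each pair is a "less than or equal" test, that solubility is given in the same (unstated) unit as in the database, and it uses made-up test cases rather than the 89 crustacean cases.

```python
def predict_estrogenic(log_kow, solubility):
    """Apply the rule set from the slide (comparison operators assumed)."""
    if log_kow <= 3.74:
        return "Estrogenic effect"
    if solubility <= 0.00033:
        return "No Estrogenic effect"
    return "Estrogenic effect"


def cci(predicted, observed):
    """Percentage of correctly classified instances."""
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical cases: (log Kow, solubility, observed class).
cases = [
    (2.5, 0.01, "Estrogenic effect"),
    (5.0, 0.0001, "No Estrogenic effect"),
    (6.0, 0.01, "No Estrogenic effect"),  # the rule set gets this one wrong
    (4.2, 0.5, "Estrogenic effect"),
]

predictions = [predict_estrogenic(k, s) for k, s, _ in cases]
score = cci(predictions, [obs for _, _, obs in cases])
print(f"CCI: {score:.0f} %")
```

A CCI of 63 % on a two-class problem means the rule set is right in roughly five of every eight cases, so it is better read as a hint about which properties matter than as a dependable predictor.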
2) Which estrogenic effects are related
to particular compounds in the
environment
Estrogenic effects in crustaceans
Tested variables: effects, compounds
Induced rule set (23 rules, one for each compound):
CHEMID = 4-nonylphenol (p-nonylphenol): Estrogenic
effect
CHEMID = ...
...
CHEMID = 20-hydroxyecdysone: No Estrogenic effect
Reliability (CCI): 60 %
2) Which estrogenic effects are related
to particular compounds in the
environment
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds
Induced rule set (13 rules, one for each organism):
Organism = Balanus amphitrite: No estrogenic effect
Organism = Daphnia magna: Estrogenic effect
...
Reliability (CCI): 74 %
3) Which factors affect the estrogenic
effects
Estrogenic effects in crustaceans
Tested variables: effects, organisms, compounds, age, flow,
in vitro/in vivo, duration
Induced rule set (16 rules: one for each age class and, within the
larval class, one for each organism):
Age = Juvenile: No estrogenic effect
Age = Larval
| Organism = Balanus amphitrite : Estrogenic effect
| Organism = ...
Age = Adult: Estrogenic effect
Age = Egg: Estrogenic effect
Reliability (CCI): 78 %
General discussion
This exercise on the ED-North database illustrated
that data mining can help to find relations between:
- compounds and their structure
- estrogenic effects
- type of organisms
- test and environmental conditions
General discussion
Data mining helps to find errors and outliers in the
data set, and gives insights that improve further data
collection and the development of databases
Interaction between data miners and domain experts
(ecologists, ecotoxicologists) is very important:
1) it is easy to find 'reliable nonsense' rules when
important variables are excluded from the analysis (need for
the expertise of ecotoxicologists)
2) the parameter settings, and insight into how to tune
them, strongly affect the richness of
the outcome of the data mining exercise (need for data
mining expertise)
General discussion
The collected data set itself influences the outcome
of the analysis to an important extent:
1) it is important to collect data that cover the whole
range (variables and their values/classes), and
stratification of the instances is necessary
2) the selection of variable classes can affect the results
to a high extent (e.g. larval-adult problem, number
of effect classes, ...)
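One common reading of "stratification of the instances" is that every class should be represented in both the data used to induce the rules and the data used to test them. The sketch below shows a simple stratified split in plain Python under that assumption; the function name, the 25 % test fraction and the toy age classes are hypothetical and not taken from the project.

```python
import random
from collections import defaultdict


def stratified_split(instances, label_of, test_fraction=0.25, seed=0):
    """Split instances so each class contributes to the test set.

    Instances are grouped by class, shuffled within each group, and
    roughly test_fraction of each group (at least one instance) is
    moved to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for inst in instances:
        by_class[label_of(inst)].append(inst)

    train, test = [], []
    for group in by_class.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical, unbalanced toy data: 8 adult cases and 4 larval cases.
cases = [("Adult", i) for i in range(8)] + [("Larval", i) for i in range(4)]
train, test = stratified_split(cases, label_of=lambda c: c[0])
print(len(train), len(test), sorted({c[0] for c in test}))
```

Without stratification, a rare class such as a sparsely tested life stage can end up entirely in one part of the split, and the reported reliability then says little about that class.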
Conclusions
Data mining reveals which gaps exist in the database
and delivers information for sustainable data collection and
management
Data mining delivers insight into the dataset: generation of
knowledge from data
Highly unpredictable parts of the dataset are useful for focusing
further research
General, reliable rules are promising for decision support in
environmental management
It is important to be aware that data mining explores correlations,
not causal relations! Control by experts or further research
(validation) is always necessary
Data mining adds more colour to our data
Acknowledgements
Federal Office for Scientific, Technical
and Cultural Affairs (OSTC)
Thesis students
Ward Vanden Berghe (VLIZ)
The Flemish Institute for the Promotion of
Scientific and Technological Research in
Industry (IWT)