Active subgroup mining for descriptive induction tasks
Dragan Gamberger
Rudjer Bošković Institute, Zagreb
Zdenko Sonicki
University of Zagreb
Talk overview:
- descriptive induction
- active subgroup mining
- subgroup discovery
- data mining server
- a real medical example
Descriptive induction is aimed at
generating (inducing) knowledge that is
understandable (interpretable) by humans.
It differs from classification-oriented
induction, where the main goal is high
classification quality (but the induced
classification schemes are typically too
complex for human interpretation).
Main properties of descriptive induction:
- simple rules
- reasonable prediction quality (both on
available and future cases)
Main problem: overfitting
(e.g. a functional genomics domain has 150 examples with 16000
measured attribute values)
Active subgroup mining is a data analysis
approach developed specifically for medical
applications (but also applicable to other
domains).
It is based on the observation that expert
knowledge (in medical domains, the knowledge and
experience of medical doctors) is very important for
the quality of the obtained results.
In active subgroup mining the expert is
positioned at the center of the process, and
machine learning (subgroup discovery) is only a
tool that supports the expert in the data analysis
process.
[Figure: the subgroup discovery process around the expert: definition of task(s), induction of models, selection of models, statistical evaluation, presentation, visualization, integration]
Classical versus subgroup discovery induction
[Figure: positive examples (+), with a very specific subgroup and a very sensitive subgroup marked]
generality – the main parameter of
the subgroup induction process
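The trade-off between a very specific and a very sensitive subgroup can be illustrated with a small sketch. The slides do not state how generality enters the search, so the rule quality TP / (FP + g) used here, with g as the generality parameter, is an assumption, and the function names and the two subgroups with their counts are invented purely for illustration.

```python
def sensitivity(tp, pos):
    """Fraction of all positive cases that the subgroup covers."""
    return tp / pos


def specificity(fp, neg):
    """Fraction of all negative cases that the subgroup leaves out."""
    return 1 - fp / neg


def quality(tp, fp, g):
    """Assumed rule quality: covered positives over covered negatives plus g."""
    return tp / (fp + g)


# Two hypothetical subgroups on a data set with 50 positive and 100 negative cases.
specific = {"tp": 10, "fp": 0}    # few positives covered, no negatives
sensitive = {"tp": 40, "fp": 15}  # many positives covered, some negatives


def preferred(g):
    """Name of the hypothetical subgroup that scores higher for a given g."""
    q_spec = quality(**specific, g=g)
    q_sens = quality(**sensitive, g=g)
    return "specific" if q_spec > q_sens else "sensitive"


for g in (1, 20):
    print(g, preferred(g))
```

Under this assumed measure, a small g penalizes every covered negative case heavily and favors the specific subgroup, while a large g tolerates false positives and favors the sensitive one.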
Subgroup discovery is a beam search
algorithm which generates short rules in the
form of conjunctions of conditions.
Conditions are based on the values of
available attributes.
example:
CHD <- age > 53 AND T.CH > 6.1 AND BMI < 30
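A minimal sketch of a beam search over conjunctions of conditions is given here to make the idea concrete. It is not the authors' implementation: the quality measure TP / (FP + g), the beam width, the maximum rule length, the fixed list of candidate conditions and the toy patient records are all assumptions made for illustration, and the records are invented, not data from the talk.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}


def covers(rule, record):
    """A rule is a tuple of (attribute, op, value) conditions joined by AND."""
    return all(OPS[op](record[attr], value) for attr, op, value in rule)


def quality(rule, data, target, g):
    """Assumed quality measure: TP / (FP + g), with g the generality parameter."""
    tp = sum(1 for r in data if r[target] and covers(rule, r))
    fp = sum(1 for r in data if not r[target] and covers(rule, r))
    return tp / (fp + g)


def beam_search(data, target, conditions, g=1.0, beam_width=3, max_len=3):
    """Grow rules one condition at a time, keeping the best beam_width rules."""
    beam = [()]  # start from the empty rule, which covers every record
    best, best_q = (), quality((), data, target, g)
    for _ in range(max_len):
        candidates = set()
        for rule in beam:
            for cond in conditions:
                if cond not in rule:
                    # sort so the same conjunction is generated only once
                    candidates.add(tuple(sorted(rule + (cond,))))
        if not candidates:
            break
        ranked = sorted(candidates,
                        key=lambda r: (-quality(r, data, target, g), r))
        beam = ranked[:beam_width]
        top_q = quality(beam[0], data, target, g)
        if top_q > best_q:
            best, best_q = beam[0], top_q
    return best, best_q


# Invented toy records (hypothetical patients), only to exercise the code.
patients = [
    {"age": 60, "T.CH": 6.5, "BMI": 27, "CHD": True},
    {"age": 58, "T.CH": 7.0, "BMI": 25, "CHD": True},
    {"age": 65, "T.CH": 6.3, "BMI": 29, "CHD": True},
    {"age": 50, "T.CH": 5.5, "BMI": 31, "CHD": True},
    {"age": 45, "T.CH": 6.8, "BMI": 26, "CHD": False},
    {"age": 62, "T.CH": 5.2, "BMI": 24, "CHD": False},
    {"age": 59, "T.CH": 6.9, "BMI": 33, "CHD": False},
    {"age": 40, "T.CH": 4.9, "BMI": 22, "CHD": False},
]
conditions = [("age", ">", 53), ("T.CH", ">", 6.1), ("BMI", "<", 30)]

rule, q = beam_search(patients, "CHD", conditions, g=1.0)
print("CHD <- " + " AND ".join(f"{a} {op} {v}" for a, op, v in rule))
```

In this sketch the candidate conditions are supplied by hand; a real system would derive them from the attribute values in the data.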
dms.irb.hr
meningoencephalitis domain
subgroup describing the bacterial, in
contrast to the viral, type of the disease
Conclusions:
- descriptive induction and active subgroup
mining are novel concepts that are potentially very
interesting for data analysis and knowledge
induction in medical applications
- an active and central role for medical experts is
essential
- we have extensive and positive experience
with this methodology in different medical
domains, but no experience in constructing
medical guidelines. For such applications
the following might be useful:
  - detection of decision points for numerical
  attributes
  - detection of apparent but significant
  contradictions
  - explicit noise detection