StatMaster – An Update
Kartik Vishwanath
Chintan Patel
Yugyung Lee
William Drake
Richard Stroup
Steve Simon
UMKC
Children's Mercy Hospital,
Kansas City, MO
07 June 2004
Defining Data Mining

The automated extraction of hidden predictive information from (large) databases.

Three key words:
• Automated
• Hidden
• Predictive

Implicit is a statistical methodology.

Data mining lets you be proactive: prospective rather than retrospective.
Kinds of Data Mining Problems
• Classification: finding a set of models that describe or distinguish data classes.
• Clustering: grouping objects by minimizing interclass similarity and maximizing intraclass similarity.
• Association: discovering association rules that show attribute-value conditions that occur frequently.
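
As a minimal sketch of the association idea, the Python below counts how often one attribute-value condition holds and how often a second condition holds alongside it. The toy records, the helper names `matches`, `support`, and `confidence`, and the choice of these two standard measures are illustrative assumptions; nothing here is taken from the StatMaster implementation.

```python
# Hypothetical toy records; each maps an attribute name to a value.
records = [
    {"chest_pain": "Asymptomatic", "thal": "Rev", "cls": "Sick"},
    {"chest_pain": "Asymptomatic", "thal": "Rev", "cls": "Sick"},
    {"chest_pain": "Asymptomatic", "thal": "Normal", "cls": "Healthy"},
    {"chest_pain": "NoTang", "thal": "Normal", "cls": "Healthy"},
]


def matches(record, conditions):
    """True if the record satisfies every attribute-value condition."""
    return all(record.get(attr) == value for attr, value in conditions.items())


def support(records, conditions):
    """Fraction of all records that satisfy the conditions."""
    return sum(matches(r, conditions) for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """Among records satisfying the antecedent, the fraction that also
    satisfy the consequent (0.0 if the antecedent never holds)."""
    covered = [r for r in records if matches(r, antecedent)]
    if not covered:
        return 0.0
    return sum(matches(r, consequent) for r in covered) / len(covered)


pair_support = support(records, {"chest_pain": "Asymptomatic", "thal": "Rev"})
pair_confidence = confidence(
    records, {"chest_pain": "Asymptomatic"}, {"thal": "Rev"}
)
```

A condition pair is usually reported as a rule only when both measures exceed thresholds chosen by the analyst.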

Examples of Data Mining
Table 2.1 • Cardiology Patient Data

Attribute Name | Mixed Values | Numeric Values | Comments
Age | Numeric | Numeric | Age in years
Sex | Male, Female | 1, 0 | Patient gender
Chest Pain Type | Angina, Abnormal Angina, NoTang, Asymptomatic | 1–4 | NoTang = Nonanginal pain
Blood Pressure | Numeric | Numeric | Resting blood pressure upon hospital admission
Cholesterol | Numeric | Numeric | Serum cholesterol
Fasting Blood Sugar < 120 | True, False | 1, 0 | Is fasting blood sugar less than 120?
Resting ECG | Normal, Abnormal, Hyp | 0, 1, 2 | Hyp = Left ventricular hypertrophy
Maximum Heart Rate | Numeric | Numeric | Maximum heart rate achieved
Induced Angina? | True, False | 1, 0 | Does the patient experience angina as a result of exercise?
Old Peak | Numeric | Numeric | ST depression induced by exercise relative to rest
Slope | Up, flat, down | 1–3 | Slope of the peak exercise ST segment
Number Colored Vessels | 0, 1, 2, 3 | 0, 1, 2, 3 | Number of major vessels colored by fluoroscopy
Thal | Normal, fix, rev | 3, 6, 7 | Normal, fixed defect, reversible defect
Concept Class | Healthy, Sick | 1, 0 | Angiographic disease status
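
The mixed-to-numeric mapping in Table 2.1 can be sketched as a lookup table. The names `ENCODING` and `encode` are hypothetical, and pairing each category with a code in the order the table lists them (for example Angina = 1 through Asymptomatic = 4) is an assumption, since the table gives only the value lists.

```python
# Assumed category-to-code pairing, following the listing order in Table 2.1.
ENCODING = {
    "Sex": {"Male": 1, "Female": 0},
    "Chest Pain Type": {
        "Angina": 1, "Abnormal Angina": 2, "NoTang": 3, "Asymptomatic": 4,
    },
    "Fasting Blood Sugar < 120": {"True": 1, "False": 0},
    "Resting ECG": {"Normal": 0, "Abnormal": 1, "Hyp": 2},
    "Induced Angina?": {"True": 1, "False": 0},
    "Slope": {"Up": 1, "Flat": 2, "Down": 3},
    "Thal": {"Normal": 3, "Fix": 6, "Rev": 7},
    "Concept Class": {"Healthy": 1, "Sick": 0},
}


def encode(instance):
    """Replace categorical values with their numeric codes.
    Attributes without an entry in ENCODING (already numeric) pass through."""
    encoded = {}
    for attribute, value in instance.items():
        codes = ENCODING.get(attribute)
        encoded[attribute] = codes[value] if codes else value
    return encoded


example = {"Age": 52, "Sex": "Male", "Chest Pain Type": "NoTang", "Thal": "Normal"}
encoded_example = encode(example)
```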
Table 2.2 • Most and Least Typical Instances from the Cardiology Domain

Attribute Name | Most Typical Healthy Class | Least Typical Healthy Class | Most Typical Sick Class | Least Typical Sick Class
Age | 52 | 63 | 60 | 62
Sex | Male | Male | Male | Female
Chest Pain Type | NoTang | Angina | Asymptomatic | Asymptomatic
Blood Pressure | 138 | 145 | 125 | 160
Cholesterol | 223 | 233 | 258 | 164
Fasting Blood Sugar < 120 | False | True | False | False
Resting ECG | Normal | Hyp | Hyp | Hyp
Maximum Heart Rate | 169 | 150 | 141 | 145
Induced Angina? | False | False | True | False
Old Peak | 0 | 2.3 | 2.8 | 6.2
Slope | Up | Down | Flat | Down
Number of Colored Vessels | 0 | 0 | 1 | 3
Thal | Normal | Fix | Rev | Rev
A Healthy Class Rule for the Cardiology Patient Dataset

IF 169 <= Maximum Heart Rate <= 202
THEN Concept Class = Healthy
Rule accuracy: 85.07%
Rule coverage: 34.55%
• Accuracy: the rule works correctly about 85% of the time.
• Coverage: about 34.5% of all healthy patients meet the conditions specified in this rule.
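
The two figures can be computed as in the sketch below. It assumes accuracy means the fraction of instances meeting the rule's condition that actually belong to the target class, and coverage means the fraction of target-class instances that meet the condition. The function name, field names, and toy instances are hypothetical and do not reproduce the cardiology dataset.

```python
def rule_accuracy_and_coverage(instances, condition, target_class):
    """Return (accuracy, coverage) of an IF-condition-THEN-class rule."""
    fired = [x for x in instances if condition(x)]
    in_class = [x for x in instances if x["cls"] == target_class]
    correct = [x for x in fired if x["cls"] == target_class]
    accuracy = len(correct) / len(fired) if fired else 0.0
    coverage = len(correct) / len(in_class) if in_class else 0.0
    return accuracy, coverage


# Hypothetical toy instances, not the real patient records.
toy = [
    {"max_hr": 172, "cls": "Healthy"},
    {"max_hr": 180, "cls": "Healthy"},
    {"max_hr": 150, "cls": "Healthy"},
    {"max_hr": 140, "cls": "Healthy"},
    {"max_hr": 170, "cls": "Sick"},
    {"max_hr": 120, "cls": "Sick"},
]


def healthy_rule(x):
    # IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
    return 169 <= x["max_hr"] <= 202


accuracy, coverage = rule_accuracy_and_coverage(toy, healthy_rule, "Healthy")
```

On the toy data the rule fires for three instances, two of them healthy, and two of the four healthy instances are covered.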
A Sick Class Rule for the Cardiology Patient Dataset

IF Thal = Rev & Chest Pain Type = Asymptomatic
THEN Concept Class = Sick
Rule accuracy: 91.14%
Rule coverage: 52.17%
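
As an illustration, the healthy-class and sick-class rules can be applied to the four instances of Table 2.2. The attribute values are copied from that table; the dictionary keys and function names are hypothetical.

```python
# Values taken from Table 2.2; only the attributes the two rules use.
instances = {
    "most typical healthy": {"max_hr": 169, "chest_pain": "NoTang", "thal": "Normal"},
    "least typical healthy": {"max_hr": 150, "chest_pain": "Angina", "thal": "Fix"},
    "most typical sick": {"max_hr": 141, "chest_pain": "Asymptomatic", "thal": "Rev"},
    "least typical sick": {"max_hr": 145, "chest_pain": "Asymptomatic", "thal": "Rev"},
}


def healthy_rule(x):
    # IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
    return 169 <= x["max_hr"] <= 202


def sick_rule(x):
    # IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick
    return x["thal"] == "Rev" and x["chest_pain"] == "Asymptomatic"


# For each instance: (healthy rule fires?, sick rule fires?)
fired = {name: (healthy_rule(x), sick_rule(x)) for name, x in instances.items()}
```

The least typical healthy instance satisfies neither rule, which is consistent with both rules covering only part of their class.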

Drawing Conclusions

Recall the rule:
IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy

Possible interpretations:
• If a patient's maximum heart rate is low, might he or she have a heart attack?
• If a patient had a heart attack, would his or her maximum heart rate decrease?
• Does a low maximum heart rate cause a heart attack?

Only a medical expert can tell.
Another Example

Hypoplastic Left Heart Syndrome Case Study
• Affects infants and is uniformly fatal without surgery.
• Extremely complex relationships among physiologic parameters in a given patient.
• Temporal datasets
• Parameters continuously measured
• Parameters intermittently measured, and the interventions
• Some rules extracted by mining
Results
• Wellness score predicted with an accuracy of 94.57%.
• Incorrect predictions for 1.60% of new cases (with unknown value of the wellness score).
• For 2.22% of new cases, the decision rules could not make any prediction.

Discussion