StatMaster – An Update
Kartik Vishwanath, Chintan Patel, Yugyung Lee, William Drake, Richard Stroup, Steve Simon
UMKC, Children's Mercy Hospital, Kansas City, MO
07 June 2004

Defining Data Mining
The automated extraction of hidden predictive information from (large) databases.
Three key words:
• Automated
• Hidden
• Predictive
Implicit in this definition is a statistical methodology.
Data mining lets you be proactive: prospective rather than retrospective.

Kinds of Data Mining Problems
• Classification: finding a set of models that describe or distinguish data classes.
• Clustering: grouping objects by minimizing interclass similarity and maximizing intraclass similarity.
• Association: discovering association rules that show attribute-value conditions occurring frequently together.

Examples of Data Mining

Table 2.1 • Cardiology Patient Data
| Attribute Name | Mixed Values | Numeric Values | Comments |
|---|---|---|---|
| Age | Numeric | Numeric | Age in years |
| Sex | Male, Female | 1, 0 | Patient gender |
| Chest Pain Type | Angina, Abnormal Angina, NoTang, Asymptomatic | 1–4 | NoTang = Nonanginal pain |
| Blood Pressure | Numeric | Numeric | Resting blood pressure upon hospital admission |
| Cholesterol | Numeric | Numeric | Serum cholesterol |
| Fasting Blood Sugar < 120 | True, False | 1, 0 | Is fasting blood sugar less than 120? |
| Resting ECG | Normal, Abnormal, Hyp | 0, 1, 2 | Hyp = Left ventricular hypertrophy |
| Maximum Heart Rate | Numeric | Numeric | Maximum heart rate achieved |
| Induced Angina? | True, False | 1, 0 | Does the patient experience angina as a result of exercise? |
| Old Peak | Numeric | Numeric | ST depression induced by exercise relative to rest |
| Slope | Up, Flat, Down | 1–3 | Slope of the peak exercise ST segment |
| Number Colored Vessels | 0, 1, 2, 3 | 0, 1, 2, 3 | Number of major vessels colored by fluoroscopy |
| Thal | Normal, Fix, Rev | 3, 6, 7 | Normal, fixed defect, reversible defect |
| Concept Class | Healthy, Sick | 1, 0 | Angiographic disease status |

Table 2.2 • Most and Least Typical Instances from the Cardiology Domain
| Attribute Name | Most Typical Healthy Class | Least Typical Healthy Class | Most Typical Sick Class | Least Typical Sick Class |
|---|---|---|---|---|
| Age | 52 | 63 | 60 | 62 |
| Sex | Male | Male | Male | Female |
| Chest Pain Type | NoTang | Angina | Asymptomatic | Asymptomatic |
| Blood Pressure | 138 | 145 | 125 | 160 |
| Cholesterol | 223 | 233 | 258 | 164 |
| Fasting Blood Sugar < 120 | False | True | False | False |
| Resting ECG | Normal | Hyp | Hyp | Hyp |
| Maximum Heart Rate | 169 | 150 | 141 | 145 |
| Induced Angina? | False | False | True | False |
| Old Peak | 0 | 2.3 | 2.8 | 6.2 |
| Slope | Up | Down | Flat | Down |
| Number of Colored Vessels | 0 | 0 | 1 | 3 |
| Thal | Normal | Fix | Rev | Rev |

A Healthy Class Rule for the Cardiology Patient Dataset
IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
Rule accuracy: 85.07%
Rule coverage: 34.55%
The rule is correct 85.07% of the time: of the patients whose maximum heart rate falls in this range, 85.07% are healthy. In addition, 34.55% of all healthy patients meet the conditions specified in this rule.

A Sick Class Rule for the Cardiology Patient Dataset
IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick
Rule accuracy: 91.14%
Rule coverage: 52.17%

Drawing Conclusions
Recall the rule: IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
Possible interpretations:
• If a patient's maximum heart rate is low, might s/he have a heart attack?
• If a patient had a heart attack, would his or her maximum heart rate decrease?
• Does a low maximum heart rate cause a heart attack?
Only a medical expert can tell.

Another Example: Hypoplastic Left Heart Syndrome Case Study
• Affects infants and is uniformly fatal without surgery.
• Involves extremely complex relationships among the physiologic parameters of a given patient.
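The accuracy and coverage figures quoted for the two cardiology rules can be computed directly from labeled data. The sketch below assumes the usual definitions (accuracy: the share of patients matching a rule's conditions who belong to the predicted class; coverage: the share of that class's patients who match the conditions). Each patient is represented as a Python dict with hypothetical keys (`max_heart_rate`, `thal`, `chest_pain_type`, `concept_class`), and `TOY_PATIENTS` is a made-up sample, not the cardiology dataset, so it does not reproduce the 85.07% / 34.55% or 91.14% / 52.17% figures.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record layout: one dict per patient; key names are illustrative only.
Patient = Dict[str, Any]


def rule_accuracy_and_coverage(
    patients: List[Patient],
    condition: Callable[[Patient], bool],
    target_class: str,
    class_key: str = "concept_class",
) -> Tuple[float, float]:
    """Score the rule 'IF condition THEN class = target_class'.

    accuracy = matching patients in target_class / all matching patients
    coverage = matching patients in target_class / all patients in target_class
    """
    matched = [p for p in patients if condition(p)]
    in_class = [p for p in patients if p[class_key] == target_class]
    hits = [p for p in matched if p[class_key] == target_class]
    accuracy = len(hits) / len(matched) if matched else 0.0
    coverage = len(hits) / len(in_class) if in_class else 0.0
    return accuracy, coverage


def healthy_rule(p: Patient) -> bool:
    # IF 169 <= Maximum Heart Rate <= 202 THEN Concept Class = Healthy
    return 169 <= p["max_heart_rate"] <= 202


def sick_rule(p: Patient) -> bool:
    # IF Thal = Rev & Chest Pain Type = Asymptomatic THEN Concept Class = Sick
    return p["thal"] == "Rev" and p["chest_pain_type"] == "Asymptomatic"


# Made-up toy sample (NOT the cardiology dataset), for illustration only.
TOY_PATIENTS: List[Patient] = [
    {"max_heart_rate": 172, "thal": "Normal", "chest_pain_type": "NoTang", "concept_class": "Healthy"},
    {"max_heart_rate": 180, "thal": "Normal", "chest_pain_type": "Angina", "concept_class": "Healthy"},
    {"max_heart_rate": 175, "thal": "Rev", "chest_pain_type": "Asymptomatic", "concept_class": "Sick"},
    {"max_heart_rate": 150, "thal": "Fix", "chest_pain_type": "Angina", "concept_class": "Healthy"},
    {"max_heart_rate": 160, "thal": "Normal", "chest_pain_type": "NoTang", "concept_class": "Healthy"},
    {"max_heart_rate": 140, "thal": "Rev", "chest_pain_type": "Asymptomatic", "concept_class": "Sick"},
    {"max_heart_rate": 130, "thal": "Rev", "chest_pain_type": "Angina", "concept_class": "Sick"},
]

if __name__ == "__main__":
    for name, rule, cls in [("healthy", healthy_rule, "Healthy"), ("sick", sick_rule, "Sick")]:
        acc, cov = rule_accuracy_and_coverage(TOY_PATIENTS, rule, cls)
        print(f"{name} rule: accuracy={acc:.2%}, coverage={cov:.2%}")
```

On the toy sample, the healthy rule matches three patients, two of them healthy (accuracy 66.67%), and catches two of the four healthy patients (coverage 50.00%). The two measures pull in different directions: widening the heart-rate interval tends to raise coverage while letting in more sick patients, which lowers accuracy.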
Temporal datasets
• Parameters continuously measured
• Parameters intermittently measured, and the interventions

Some rules extracted by mining

Results
• Wellness score predicted with an accuracy of 94.57%.
• Incorrect predictions for 1.60% of new cases (with unknown value of the wellness score).
• For 2.22% of new cases, the decision rules could not make any prediction.

Discussion
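The Results slide reports three outcomes for new cases: correct predictions, incorrect predictions, and cases where the decision rules made no prediction at all. That third outcome arises when no rule's conditions match a case. A minimal Python sketch of this bookkeeping follows, assuming a first-match rule list (the presentation does not say how the extracted rules are combined). The parameter `param`, the thresholds, the labels `good`/`poor`, and the `wellness` key are all hypothetical, and the toy cases do not reproduce the reported percentages.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Case = Dict[str, Any]
# A rule pairs a condition with the label it predicts.
Rule = Tuple[Callable[[Case], bool], str]


def predict(rules: List[Rule], case: Case) -> Optional[str]:
    """First-match rule list: return the label of the first rule that fires, else None."""
    for condition, label in rules:
        if condition(case):
            return label
    return None  # no rule applies, so the rule set abstains


def outcome_rates(rules: List[Rule], cases: List[Case], label_key: str = "wellness") -> Dict[str, float]:
    """Fraction of cases predicted correctly, predicted incorrectly, or left unpredicted."""
    counts = {"correct": 0, "incorrect": 0, "no_prediction": 0}
    for case in cases:
        predicted = predict(rules, case)
        if predicted is None:
            counts["no_prediction"] += 1
        elif predicted == case[label_key]:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    total = len(cases)
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical parameter, thresholds, and labels; illustrative only, not clinical values.
RULES: List[Rule] = [
    (lambda c: c["param"] >= 75, "good"),
    (lambda c: c["param"] < 65, "poor"),
]
CASES: List[Case] = [
    {"param": 80, "wellness": "good"},  # rule 1 fires, correct
    {"param": 78, "wellness": "poor"},  # rule 1 fires, incorrect
    {"param": 60, "wellness": "poor"},  # rule 2 fires, correct
    {"param": 70, "wellness": "good"},  # no rule fires
]

if __name__ == "__main__":
    print(outcome_rates(RULES, CASES))
```

Running it prints `{'correct': 0.5, 'incorrect': 0.25, 'no_prediction': 0.25}`. Reporting abstentions separately, rather than folding them into the error rate, keeps "the rules were wrong" distinct from "the rules had nothing to say", which is the distinction the Results slide draws.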