National Yunlin University of Science and Technology
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Comparing Association Rules and Decision Trees for Disease Prediction
Advisor: Dr. Hsu
Presenter: Yu-San Hsieh
Author: Carlos Ordonez
2006, CIKM, pp. 17-24

Outline
─ Motivation
─ Objective
─ Method
─ Experiments
─ Conclusions

Motivation
Mining association rules in a medical data set raises several problems:
─ Many rules are irrelevant
─ Most relevant rules appear only at low support
─ The number of discovered rules becomes large at low support
The number of rules makes search slow and makes interpretation by the domain expert difficult.

Objective
We propose search constraints to find only medically significant association rules and to make search more efficient.

Method
Overview (diagram): medical data set; transforming; search constraints (Phase 1); association rules (Phase 2); decision tree
Phase 1: search constraints
─ Support, confidence
─ User-specified maximum itemset size κ
─ group: A → g, with group(A_j) = g_j
   group(AGE) = 0: AGE is not group-constrained
   group(AL) = 1: AL is constrained to belong to group 1
   group(attribute(a)) ≠ group(attribute(b)) for any two items a, b in the same itemset
   Example: (-1.0 <= IL < 0.2) and (-1.0 <= LA < 0.2) are not in the same itemset
─ ac: A → C, with ac(A_j) = c_j
   ac(AGE) = 1: AGE is in the antecedent
   ac(LAD) = 2: LAD is in the consequent

Experiments
The medical data set
─ 655 patients and 25 attributes (numeric and categorical)
─ Three basic elements for analysis: perfusion defect, coronary stenosis, risk factor
Default parameter setting
─ Maximal itemset size κ = 4
─ Minimum support = 1%
─ Minimum confidence = 70%
─ Negation, ac and group constraints
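
The group and ac constraints, together with the default thresholds on the Experiments slide, can be illustrated with a minimal Python sketch. This is not the author's algorithm: it enumerates itemsets by brute force, and the toy transactions, item names, and the helper names `satisfies_group`, `support`, and `constrained_rules` are all hypothetical. It also assumes every attribute has ac equal to 1 or 2, so each itemset yields at most one rule.

```python
from itertools import combinations

# Hypothetical toy data; item names and mappings are illustrative only.
ATTRIBUTE = {"AGE>=60": "AGE", "IL<0.2": "IL", "LA<0.2": "LA", "LAD>=50": "LAD"}
GROUP = {"AGE": 0, "IL": 1, "LA": 1, "LAD": 0}  # 0 = not group-constrained
AC = {"AGE": 1, "IL": 1, "LA": 1, "LAD": 2}     # 1 = antecedent, 2 = consequent

TRANSACTIONS = [
    {"AGE>=60", "IL<0.2", "LAD>=50"},
    {"AGE>=60", "LAD>=50"},
    {"AGE>=60", "LA<0.2"},
    {"IL<0.2", "LA<0.2", "LAD>=50"},
]


def satisfies_group(itemset):
    """True if no two items come from attributes sharing a nonzero group."""
    constrained = [GROUP[ATTRIBUTE[i]] for i in itemset if GROUP[ATTRIBUTE[i]] != 0]
    return len(constrained) == len(set(constrained))


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def constrained_rules(transactions, kappa=4, min_support=0.01, min_confidence=0.70):
    """Brute-force search for rules that satisfy the kappa, group and ac constraints."""
    rules = []
    for size in range(2, kappa + 1):
        for combo in combinations(sorted(ATTRIBUTE), size):
            itemset = frozenset(combo)
            if not satisfies_group(itemset):
                continue
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Split the itemset by the ac constraint of each item's attribute.
            antecedent = frozenset(i for i in itemset if AC[ATTRIBUTE[i]] == 1)
            consequent = itemset - antecedent
            if not antecedent or not consequent:
                continue
            confidence = s / support(antecedent, transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, s, confidence))
    return rules


rules = constrained_rules(TRANSACTIONS)
```

On this toy data, itemsets that contain both IL and LA items are never formed because the two attributes share group 1, and LAD items can only appear in a consequent.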
Conclusions
Decision trees are less effective than constrained association rules:
─ Predicting disease with several related target attributes
─ Low confidence factor
─ Slight overfitting
─ Rule complexity
─ Data set fragmentation

My opinion
Advantage
─ Produces medically useful rules, reduces the number of discovered rules and improves running time
Drawback
─ Lack of quantitative evaluation
─ Evaluation consists mostly of rule analysis
Application
─ Prediction
─ Classification

Method
Transformation to binary dimensions
─ Numerical data (age): 0 < age <= 40 and 40 < age <= 60
─ Categorical data (sex): sex = Male and sex = Female
First constraint: negation
─ An attribute can have a negation
─ Additional items are created, corresponding to each negated categorical value or each negated interval
─ Example: not(0 <= LM < 30), not(0 <= LAD < 50), not(0 <= LCX < 50), ...

Experiments
Predictive association rules (figure; labels: healthy, diseased; LCX, LAD, RCA)

Experiments
Predictive decision tree
─ Using the CN4.5 decision tree algorithm
─ Focused on predicting LAD disease (LAD >= 50 as the target class)
─ Result: maximal height = 3
Trees were built with numeric dimensions and automatic splits, and with manually binned variables.
Confidence decreases (↓); the rules are not useful.
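
The transformation to binary dimensions and the negation constraint from the Method slide can be sketched in Python as follows. This is an illustrative sketch only: the function name `to_items`, its parameters, and the sample record are hypothetical, and the age intervals are the ones shown on the slide.

```python
def to_items(record, intervals, categories, negate=()):
    """Map one record to a set of binary items.

    intervals:  attribute -> list of (low, high) pairs, read as low < value <= high
    categories: attribute -> list of allowed categorical values
    negate:     attributes for which negated items are also created
    """
    items = set()
    for attr, bounds in intervals.items():
        value = record[attr]
        for low, high in bounds:
            label = f"{low} < {attr} <= {high}"
            if low < value <= high:
                items.add(label)
            elif attr in negate:
                # Negated interval item, e.g. not(0 < age <= 40)
                items.add(f"not({label})")
    for attr, values in categories.items():
        for v in values:
            label = f"{attr} = {v}"
            if record[attr] == v:
                items.add(label)
            elif attr in negate:
                # Negated categorical item, e.g. not(sex = Male)
                items.add(f"not({label})")
    return items


# Hypothetical patient record, for illustration only.
patient = {"age": 55, "sex": "Female"}
items = to_items(
    patient,
    intervals={"age": [(0, 40), (40, 60)]},
    categories={"sex": ["Male", "Female"]},
    negate=("age",),
)
```

Here only `age` is marked for negation, so the record produces one positive interval item, one negated interval item, and one categorical item.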