國立雲林科技大學 National Yunlin University of Science and Technology
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Knowledge discovery with classification rules in a cardiovascular dataset

Advisor: Dr. Hsu
Presenter: Zih-Hui Lin
Authors: Vili Podgorelec, Peter Kokol, Milojka Molan Štiglic, Marjan Heričko, Ivan Rozman
Source: Computer Methods and Programs in Biomedicine, Volume 80, Supplement 1, December 2005, Pages S39-S49

Outline
• Motivation
• Objective
• Introduction
• The AREX algorithm
• Experiment
• Conclusions

Motivation
• Modern medicine generates huge amounts of data, and there is an acute and widening gap between data collection and data comprehension.
• It is very difficult for a human to make use of such an amount of information (e.g. hundreds of attributes, thousands of images, several channels of 24 hours of ECG or EEG signals).

Objective
• Enable searching for new facts, which should reveal some new interesting patterns and possibly improve the existing medical knowledge.

Introduction
Decision tree
• Advantage: transparency of the classification process, which one can easily interpret, understand and criticize.
• Disadvantages: poor processing of incomplete, noisy data; inability to build several trees for the same dataset; inability to use the preferred attributes; etc.

The AREX algorithm
1. Multi-population self-adapting genetic algorithm for the induction of decision trees.
2. Evolution of programs in an arbitrary programming language, which is used to evolve classification rules.

Steps (from the flow diagram):
1.1 Build N decision trees upon the objects from S.
1.2 Classify each object Oi with nt randomly chosen trees from all N trees.
1.3 Check whether the frequency of the most frequent decision class assigned by the nt trees is > nt - ct (initially ct = nt/2).
1.4 From all N decision trees, create M initial classification rules.
2.1 Create M/2+1 rules (randomly).
2.2 S* (label from the flow diagram)
2.3 If S is not empty:
    • Add |S| randomly chosen objects from S* to S
    • ct = ct + 1
    • Repeat from 1.1
2.4 An optimal set of classification rules is determined with a simple genetic algorithm.

Genetic algorithm for the construction of decision trees
1. Choose the number of attribute nodes M that will be in the tree.
2. Select an attribute Xi.
3. Select an empty node (the deeper the node is in the tree, the lower its probability of being selected).
4. Randomly select an attribute Xi (attributes that have not been selected yet have a higher probability).
   (1) Continuous attributes → split constant
   (2) Discrete attributes → two randomly defined disjoint sets
• For each empty leaf, the following algorithm determines the appropriate decision class.
[Diagram labels: M attributes, population, root, Xi, null, unused leaf nodes]

proGenesys system & Finding the optimal set of rules
Rule criteria:
• The rule covers many objects (otherwise it tends to be too specific).
• Most of the objects covered by the rule should fall into the same decision class.

Dataset
• Contains data of 100 patients from Maribor Hospital.
• The attributes include:
  • general data (age, sex, etc.)
  • health status (data from family history and the child's previous illnesses)
  • general cardiovascular data (blood pressure, pulse, chest pain, etc.)
  • more specialized cardiovascular data: data from the child's cardiac history and clinical examinations (with findings of ultrasound, ECG, etc.)
• In the dataset, five different diagnoses are possible:
  • innocent heart murmur
  • congenital heart disease with left-to-right shunt
  • aortic valve disease with aorta coarctation
  • arrhythmias
  • chest pain

[Slide: Classification result, training set. Note on slide: overfitting]

[Slide: Classification result, testing set]

[Slide: Classification result]

Conclusions
• One of the most evident advantages of AREX is that it achieves very good generalization and specialization simultaneously:
  • Generalization → high and similar overall accuracy on both the training set and the test set
  • Specialization → high and very similar accuracy for all decision classes, including the least frequent ones
• It equips physicians with a powerful technique to:
  (1) confirm their existing knowledge about some medical problem
  (2) search for new facts, which should reveal some new interesting patterns and possibly improve the existing medical knowledge

My opinion
• Advantage: weights are assigned according to class
• Disadvantage:
• Apply: practical use in clinical settings
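
Note on the AREX slides: the voting test in step 1.3 and the two rule criteria (a rule should cover many objects, and most covered objects should share one decision class) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the representation of trees and rules as plain callables, and the "purity" measure are assumptions made for illustration only; the slides do not say how the criteria are combined into a fitness value, so none is computed here.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

# Assumption: a "tree" is any callable that maps an object to a class label.
Tree = Callable[[dict], Hashable]
# Assumption: a "rule" is any callable that says whether it covers an object.
Rule = Callable[[dict], bool]


def consensus_reached(obj: dict, trees: Sequence[Tree], nt: int, ct: int,
                      rng: random.Random) -> Tuple[bool, Hashable]:
    """Step 1.3 test: classify obj with nt randomly chosen trees and check
    whether the most frequent class was voted more than (nt - ct) times."""
    chosen = rng.sample(list(trees), nt)           # nt distinct trees out of N
    votes = Counter(tree(obj) for tree in chosen)  # class label -> vote count
    label, freq = votes.most_common(1)[0]
    return freq > nt - ct, label


def rule_coverage_and_purity(rule: Rule, objects: Sequence[dict],
                             labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (coverage, purity) for one rule.

    coverage: fraction of all objects the rule covers.
    purity:   fraction of covered objects in the most common covered class.
    Both names are illustrative, not taken from the slides.
    """
    covered = [lab for obj, lab in zip(objects, labels) if rule(obj)]
    if not covered:
        return 0.0, 0.0
    coverage = len(covered) / len(objects)
    purity = Counter(covered).most_common(1)[0][1] / len(covered)
    return coverage, purity
```

With nt = 4 and the initial ct = nt/2 = 2, three agreeing votes out of four pass the test (3 > 2). What happens to an object after the test is not stated on the slide, so the sketch only returns the outcome and the winning label.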