Artificial Intelligence Project #3: Analysis of Decision Tree Learning Using WEKA
May 23, 2006

Introduction
- Decision tree learning is a method for approximating discrete-valued target functions.
- The learned function is represented by a decision tree.
- A decision tree can also be re-represented as if-then rules to improve human readability.

An Example of a Decision Tree

                Outlook
             /     |     \
         Sunny  Overcast  Rain
           |       |        |
       Humidity   Yes     Wind
        /    \           /    \
      High  Normal   Strong  Weak
       |      |        |       |
       No    Yes       No     Yes

Decision Tree Representation (1/2)
- A decision tree classifies instances by sorting them down the tree from the root to some leaf node.
- Node: specifies a test of some attribute.
- Branch: corresponds to one of the possible values of this attribute.

Decision Tree Representation (2/2)
- Each path corresponds to a conjunction of attribute tests. For example, the instance (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) satisfies (Outlook=Sunny ∧ Humidity=High), so it is classified No.
- Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. The tree above corresponds to:
  (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
- What is the merit of the tree representation?

Appropriate Problems for Decision Tree Learning
- Instances are represented by attribute-value pairs.
- The target function has discrete output values.
- Disjunctive descriptions may be required.
- The training data may contain errors: both errors in the classification of the training examples and errors in the attribute values.
- The training data may contain missing attribute values.
- Suitable for classification.

Study
- "Treatment-specific changes in gene expression discriminate in vivo drug response in human leukemia cells", M. H. Cheok et al., Nature Genetics 35, 2003.
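ID3 chooses the attribute to test at each node by information gain, the reduction in entropy from splitting on that attribute. A minimal sketch (not part of the project materials; the attribute and class names are illustrative, following the PlayTennis example):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr, target="PlayTennis"):
    """Entropy reduction from splitting `examples` (list of dicts) on `attr`."""
    total = entropy([e[target] for e in examples])
    n = len(examples)
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e[target] for e in examples if e[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder
```

At each node, ID3 computes this gain for every remaining attribute and splits on the one with the largest value, recursing until the examples at a node share one class.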
- 60 leukemia patients; bone marrow samples; Affymetrix GeneChip arrays; gene expression data.

Gene Expression Data
- Number of data examples: 120 (60 before treatment, 60 after treatment)
- Number of genes measured: 12600 (Affymetrix HG-U95A array)
- Task: classification between "before treatment" and "after treatment" based on gene expression pattern.

Affymetrix GeneChip Arrays
- Use short oligos to detect gene expression level; each gene is probed by a set of short oligos.
- Each gene expression level is summarized by:
  - Signal: a numerical value describing the abundance of mRNA
  - A/P call: denotes the statistical significance of the signal

Preprocessing
- Remove the genes having more than 60 'A' calls: number of genes 12600 -> 3190.
- Discretization of gene expression level. Criterion: the median gene expression value of each sample; levels become 0 (low) or 1 (high).

Gene Filtering
- Using mutual information:
  I(G; C) = Σ_{G,C} P(G, C) log [ P(G, C) / (P(G) P(C)) ]
- Estimated probabilities were used.
- Number of genes: 3190 -> 1000.

Final Dataset
- Number of attributes: 1001 (one for the class)
- Class: 0 (after treatment), 1 (before treatment)
- Number of data examples: 120
- Matrix size: 1000 genes × 120 samples

Materials for the Project
- Given: preprocessed microarray data file data2.txt
- Downloadable: WEKA (http://www.cs.waikato.ac.nz/ml/weka/)

Analysis of Decision Tree Learning: Submission
- Due date: June 15 (Thu.), 12:00 (noon)
- Report: hard copy (301-419) and e-mail.
- Run ID3, J48, and one other decision tree algorithm with learning parameters.
- Show the experimental results of each algorithm.
- Except for ID3, you should try to find better performance by changing the learning parameters.
- Analyze what makes the difference between the selected algorithms.
- E-mail: [email protected]
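The median discretization and mutual-information filtering steps described above can be sketched as follows. This is a plug-in estimate (empirical counts as probabilities) with a base-2 logarithm; the tie-handling at the median (>= maps to 1) is an assumption, not something the slides specify:

```python
import math
from collections import Counter

def median_binarize(values):
    """Discretize expression levels: 0 (below median) / 1 (at or above median).

    The slides discretize against the median of each sample; whether the
    median itself maps to 0 or 1 is an assumption here.
    """
    m = sorted(values)[len(values) // 2]
    return [1 if v >= m else 0 for v in values]

def mutual_information(gene_vals, class_vals):
    """I(G; C) = sum over (g, c) of P(g, c) * log2[ P(g, c) / (P(g) P(c)) ],
    with probabilities estimated from counts."""
    n = len(gene_vals)
    joint = Counter(zip(gene_vals, class_vals))
    pg = Counter(gene_vals)
    pc = Counter(class_vals)
    mi = 0.0
    for (g, c), cnt in joint.items():
        # cnt * n / (pg[g] * pc[c]) equals P(g,c) / (P(g) P(c))
        mi += (cnt / n) * math.log2(cnt * n / (pg[g] * pc[c]))
    return mi
```

Gene filtering then amounts to scoring every discretized gene against the class vector with `mutual_information` and keeping the 1000 highest-scoring genes.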