1 Evaluating Induced Models
with Daniel L. Silver
Copyright (c), 2004 All Rights Reserved
CogNova Technologies

2 Agenda
• Interpretation and Evaluation Phase
• Model accuracy (fitness) and confidence
• Testing the difference between two models
• Testing the difference between two DM methods (e.g. IDT versus ANN)

3 The KDD Process
[Diagram: Data Sources → Data Consolidation → Consolidated Data (Data Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models (e.g. p(x)=0.02) → Interpretation and Evaluation → Knowledge]

4 Inductive Modeling = Data Mining
Basic Framework for Inductive Learning
[Diagram: the Environment supplies Training Examples (x, f(x)) and Testing Examples; the Inductive Learning System produces an Induced Model of Classifier; Output Classification (x, h(x)); h(x) ≈ f(x)?]
Focus is on developing models that can accurately classify new examples.

5 Model Accuracy and Confidence
• Preferably a separate verification set is used to judge fitness or accuracy
• Statistical confidence in the accuracy of a model can be expressed as an interval
[Diagram: interval around h1 on a Mean Error or Error Rate axis]

6 The Normal Curve and Confidence Intervals
• Consider a class of 30 persons
• True mean (average) mark of 75%
• How can we estimate this from the marks of only 10 sample persons?
• Let's do an example using Excel

7 Model Accuracy and Confidence
Approach #1: Large Sample
When the amount of available data is large ...
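The large-sample approach amounts to one random split followed by one error measurement. Below is a minimal Python sketch, assuming the 70/30 proportions from the slide's diagram. The names `holdout_split` and `error_rate` are hypothetical helpers, not part of the deck or of WEKA.

```python
import random


def holdout_split(examples, train_fraction=0.7, seed=0):
    """Shuffle the examples and split them into training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def error_rate(model, test_set):
    """Fraction of (x, f(x)) test pairs that the model h misclassifies."""
    wrong = sum(1 for x, fx in test_set if model(x) != fx)
    return wrong / len(test_set)


if __name__ == "__main__":
    # Toy data (assumption): label is 1 when x is odd, 0 otherwise.
    data = [(x, x % 2) for x in range(100)]
    train, test = holdout_split(data)
    # Stand-in "model" that always predicts 0; a real h(x) would be
    # induced from the training set.
    print(len(train), len(test), error_rate(lambda x: 0, test))
```

The test set is never shown to the learner, so the error measured on it estimates how well the model generalizes.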
[Diagram: Available Examples are divided randomly: 70% Training Set (used to develop one model), 30% Test Set / Verify Set. Generalization = test/verify fit. Compute test error.]

8 Model Accuracy and Confidence
Generalization statistic (fit, error or accuracy) is provided by the learning system
Confidence interval must be computed:
• Continuous target variable: compute the mean error over n examples and its confidence interval using Excel (evaluate_models.xls)
• Nominal (binary) target variable: given an error rate of P from a sample of n examples, the 95% confidence interval = P ± 1.96 sqrt(P(1-P)/n) = P ± 1.96 stdev
  o P = number incorrect / n
• Strictly speaking this is for n >= 30

9 Testing the Difference Between Two Models
Which of the following two hypotheses is the better? … h1 or h2?
[Diagram: h1, h2, h3 plotted on a Fitness or Error Rate axis]

10 Testing the Difference Between Two Models
• Assumption: if some measurable characteristic of the models is statistically different, then we will consider the models different
• We will focus on two characteristics, mean error and error rate (proportion incorrect), which can be computed from the test results

11 Testing the Difference Between Two Models
Continuous target variable
• Use a Difference of Means Test
Nominal (binary) target variable
• Use a Difference of Proportions Test
For 95% confidence in a difference, the p-value statistic must be <= 0.05 (see Excel spreadsheet example)

12 Testing the Difference Between Two DM Methods
• Cross-Validation must be performed
• Requires generating several models with different train, test and verify sets
• With WEKA use the accuracy or error rate on the test sets

13 Network Training
Approach #2: Cross-validation
Provides a sense of confidence in model ...
[Diagram: Available Examples are split into a 90% Training Set (used to develop 10 different models) and a 10% Test Set / Verify Set. Repeat 10 times. Accumulate test errors.]
Generalization determined by mean test fit and stddev

14 Testing the Difference Between Two DM Methods
• A Difference of Means T-test can be used to determine a p-value statistic
• For 95% confidence in a difference, the p-value statistic must be <= 0.05 (see Excel spreadsheet example)

15 Example: Using Census Data
Problem: To identify males given census data
Performance measure:
• Accuracy = Goodness of fit
Model generation: IDT and ANN

16 Example: Using Census Data
Record results: Goodness of fit stats on test set for 10 different models
• Mean fitness: ANN = 26.6, IDT = 31.8
Test difference between models: Use a difference of means T-test (see evaluate_models.xls)
• p-value = 0.00124
• Since p-value < 0.05, the two models are significantly different

17 THE END
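The interval on slide 8 and the tests on slides 11 and 14 can also be sketched in Python. This is an illustration under stated assumptions, not a port of evaluate_models.xls. The function names are hypothetical, the t-test uses the pooled equal-variance form (the deck does not say which variant its spreadsheet applies), and all numbers in the demo are made up rather than taken from the census example.

```python
import math
from statistics import NormalDist, mean, variance


def error_rate_interval(p, n, z=1.96):
    """95% interval P +/- 1.96*sqrt(P(1-P)/n) for an error rate P on n examples."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def difference_of_proportions(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two error rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def pooled_t_statistic(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    # Hypothetical single-model results on a 100-example test set.
    print(error_rate_interval(0.20, 100))
    print(difference_of_proportions(0.20, 100, 0.30, 100))

    # Hypothetical per-fold fitness values from 10-fold cross-validation.
    method_a = [71, 69, 74, 70, 72, 68, 73, 71, 70, 72]
    method_b = [75, 77, 74, 78, 76, 75, 79, 77, 76, 78]
    t, df = pooled_t_statistic(method_a, method_b)
    # Two-sided 5% critical value of Student's t for df = 18 is 2.101.
    print(t, df, abs(t) > 2.101)
```

The standard library has no Student's t distribution, so the sketch compares the statistic with a tabulated critical value instead of computing a p-value. A spreadsheet t-test function or a statistics package would return the p-value directly.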