Download Empirical Methods in Computer Science

Statistical Methods in Computer Science Hypothesis Testing II: Single-Factor Experiments Ido Dagan 2 Single-Factor Experiments   A generalization of treatment experiments Determine effect of independent variable values (nominal)  Effect: On the dependent variable treatment1 treatment2 control   Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1 Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2 Ex1 & Ex2 & .... & Exn ==> Dep3 Compare performance of algorithm A to B to C .... Control condition: Optional (e.g., to establish baseline) Empirical Methods in Computer Science © 2006-now Gal Kaminka 3 Single-Factor Experiments   An generalization of treatment experiments Determine effect of independent variable values (nominal)  Effect: On the dependent variable treatment1 treatment2 control   Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1 Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2 Ex1 & Ex2 & .... & Exn ==> Dep3 Values of independent variable Values of dependent variable Compare performance of algorithm A to B to C .... Control condition: Optional (e.g., to establish baseline) Empirical Methods in Computer Science © 2006-now Gal Kaminka Single-Factor Experiments: Definitions   The independent variable is called the factor Its values (being tested) are called levels Our goal:  Determine whether there is an effect of levels  Null hypothesis: There is no effect  Alternative hypothesis: At least one level causes an effect  Tool: One-way ANOVA  A simple special case of general Analysis of Variance Empirical Methods in Computer Science © 2006-now Gal Kaminka 4 The case for Single-factor ANOVA (one-way ANOVA)  We have k samples (k levels of the factor)   5 Each with its own sample mean, sample std. deviation for the dependent variable value We want to determine whether one (at least) is different treatment1 … treatment2 control Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1 Indk & Ex1 & Ex2 & .... & Exn ==> Depk Ex1 & Ex2 & .... & Exn ==> Dep3 Values of independent variable = levels of the factor Values of dependent variable Cannot use the tests we learned: Why? Empirical Methods in Computer Science © 2006-now Gal Kaminka The case for Single-factor ANOVA (one-way ANOVA)  We have k samples (k levels of the factor)   Each with its own sample mean, sample std. deviation We want to determine whether one (at least) is different Level ( sam ple) 1 2 3 4   S. m ean S. st dev. Mi Si 4.24 0.91 3.75 1.38 2.85 1.38 2.63 1.41 H0: M1=M2=M3=M4 H1: There exist i,j such that Mi <> Mj Empirical Methods in Computer Science © 2006-now Gal Kaminka N 29 120 59 59 6 The case for Single-factor ANOVA (one-way ANOVA)  We have k samples (k levels of the factor)   Each with its own sample mean, sample std. deviation We want to determine whether one (at least) is different Level ( sam ple) 1 2 3 4   S. m ean S. st dev. Mi Si 4.24 0.91 3.75 1.38 2.85 1.38 2.63 1.41 N 29 120 59 59 Why not use t-test H0: M1=M2=M3=M4 to compare every H1: There exist i,j such that Mi <> Mj M, M? i Empirical Methods in Computer Science © 2006-now Gal Kaminka j 7 8 Multiple paired comparisons  Let ac be the probability of an error in a single comparison     alpha = the probability of incorrectly rejecting null hypothesis 1-ac: probability of making no error in a single comparison (1-ac)m: probability of no error in m comparisons (experiment) ae = 1-(1-ac)m: probability of an error in the experiment  Under assumption of independent comparisons ae quickly becomes large as m increases Empirical Methods in Computer Science © 2006-now Gal Kaminka 9 Example  Suppose we want to contrast 15 levels of the factor   Total number of pairwise comparisons (m) : 105    15 groups, k=15 15 X (15-1) / 2 = 105 Suppose ac = 0.05 Then ae = 1-(1-ac)m = 1-(1-0.05)105 = 0.9954 We are very likely to make a type I error! Empirical Methods in Computer Science © 2006-now Gal Kaminka 10 Possible solutions?  Reduce ac until overall ae level is 0.05 (or as needed)   Risk: comparison alpha target may become unobtainable Ignore experiment null hypothesis, focus on comparisons    Carry out m comparisons # of errors in m experiments: m X ac e.g., m=105, ac=0.05, # of errors = 5.25. But which? Empirical Methods in Computer Science © 2006-now Gal Kaminka 11 One-way ANOVA  A method for testing the experiment null hypothesis  H0: all levels' sample means are equal to each other Key idea:  Estimate a variance B under the assumption H0 is true  Estimate a “real” variance W (regardless of H0)  Use F-test to test hypothesis that B=W  Assumes variance of all groups is the same Empirical Methods in Computer Science © 2006-now Gal Kaminka 12 Some preliminaries    Let xi,j be the jth element in sample i Let Mi be the sample mean of sample i Let Vi be the sample variance of sample i For example: x1,2 x3,4 Mi Vi Empirical Methods in Computer Science Class 1 14.9 15.2 17.9 15.6 10.7 14.86 6.8 Class 2 11.1 9.5 10.9 11.7 11.8 11 0.85 © 2006-now Gal Kaminka Class 3 5.7 6.6 6.7 6.8 6.9 6.54 0.23 13 Some preliminaries      Let xi,j be the jth element in sample i Let Mi be the sample mean of sample i Let Vi be the sample variance of sample i Let M be the grand sample mean (all elements, all samples) Let V be the grand sample variance Empirical Methods in Computer Science © 2006-now Gal Kaminka 14 The variance contributing to a value  Every element xi,j can be re-written as: xi,j = M + ei,j where ei,j is some error component  We can focus on the error component ei,j = xi,j – M which we will rewrite as: ei,j = (xi,j - Mi ) + (Mi - M) Empirical Methods in Computer Science © 2006-now Gal Kaminka 15 Within-group and between-group  The re-written form of the error component has two parts ei,j = (xi,j - Mi ) + (Mi - M)    Within-group component: variance w.r.t group mean Between-group component: variance w.r.t grand mean For example, in the table:   x1,1 = 14.9, M1 = 14.86, M = 10.8 e1,1 = (14.9-14.86) + (14.86 – 10.8) = 0.04 + 4.06 = 4.1 Empirical Methods in Computer Science © 2006-now Gal Kaminka 16 Within-group and between-group  The re-written form of the error component has two parts ei,j = (xi,j - Mi ) + (Mi - M)    For example, in the table:    Within-group component: variance w.r.t group mean Between-group component: variance w.r.t grand mean x1,1 = 14.9, M1 = 14.86, M = 10.8 e1,1 = (14.9-14.86) + (14.86 – 10.8) = 0.04 + 4.06 = 4.1 Note within-group and between-group components:  Most of the error (variance) is due to the between group! Can we use this in more general fashion? Empirical Methods in Computer Science © 2006-now Gal Kaminka 17 No within-group variance M 10.67 V 14.52 Mi Vi Class 1 15 15 15 15 15 15 0 Class 2 11 11 11 11 11 11 0 Class 3 6 6 6 6 6 6 0 No variance within group, in any element Empirical Methods in Computer Science © 2006-now Gal Kaminka 18 No between-group variance M 15 V 24.86 Mi Vi Class 1 17 26 9 11 12 15 46.5 Class 2 11 13 18 18 15 15 9.5 Class 3 22 14 12 8 19 15 31 No variance between groups, in any group Empirical Methods in Computer Science © 2006-now Gal Kaminka 19 Comparing within-group and between-groups components  The error component of a single element is: ei,j =(xi,j - M) = (xi,j - Mi ) + (Mi - M)  Let us relate this to the sample and grand sums-of-squares  It can be shown that:  x i, j  M  =  xi, j  M i  +  M i  M  i  2 j 2 i j 2 i Let us rewrite this as SStotal = SSwithin + SSbetween Empirical Methods in Computer Science © 2006-now Gal Kaminka j 20 From Sums of Squares (SS) to variances  We know SStotal = SSwithin + SSbetween  ... and convert to Mean Squares (as variance estimates): MS within = SSwithin = df within  x i, j  M i  i  x i, j  M i  2 j I  N i  1 = i j NI 2 ` k =1 MS between = SSbetween = df between Empirical Methods in Computer Science 2 2   M  M   N M  M  i  i i i j I 1 = i © 2006-now Gal Kaminka I 1 21 From Sums of Squares (SS) to variances  We know  ... and convert to variances: MS within = SStotal = SSwithin + SSbetween SSwithin = df within  x i, j  M i  i  x i, j  M i  2 j I  N i  1 = i 2 j NI k =1 Degrees of freedom SSbetween MS between = = df between Empirical Methods in Computer Science 2 2   M  M   N M  M  i  i i i j I 1 = i I 1 © 2006-now Gal Kaminka 22 From Sums of Squares (SS) to variances   We know SStotal = SSwithin + SSbetween ... and convert to variances: SSwithin MS within = = df within  x i, j  M i  i j I  N i  1  x i, j  M i  2 = i 2 j NI k =1 # of levels (samples) MS between = SSbetween = df between Empirical Methods in Computer Science 2 2   M  M   N M  M  i  i i i j I 1 = i © 2006-now Gal Kaminka I 1 23 From Sums of Squares (SS) to variances  We know SStotal = SSwithin + SSbetween  ... and convert to variances: MS within = SSwithin = df within  x i, j  M i  i  x i, j  M i  2 j I  N i  1 = i j NI 2 ` k =1 MS between = SSbetween = df between Empirical Methods in Computer Science 2 2   M  M   N M  M  i  i i i j I 1 = i © 2006-now Gal Kaminka I 1 24 Determining final alpha level  MSwithin is an estimate of the (inherent) population variance  Which does not depend on the null hypothesis (M1=M2=... MI)   Intuition: It’s an “average” of variances in the individual groups MSbetween estimates the population variance + the treatment effect  It does depend on the null hypothesis  Intuition: It’s similar to an estimate for the variance of the samples means, where each component is multiplied by Ni     N · sample mean variance = population variance If the null hypothesis is true – the two values estimate the inherent variance, and should be equal up to the sampling variation So now we have two variance estimates for testing Use F-test    Recall: F = Msbetween / MSwithin Compare to F-distribution with dfbetween, dfwithin Determine alpha level (significance) Empirical Methods in Computer Science © 2006-now Gal Kaminka 25 Example M 10.8 V 14.64 Mi Vi Class 1 14.9 15.2 17.9 15.6 10.7 14.86 6.8 Class 2 11.1 9.5 10.9 11.7 11.8 11 0.85 Class 3 5.7 6.6 6.7 6.8 6.9 6.54 0.23 SSbetween = 514.86  10.8 + 511  10.8 + 56.54  10.8 = 173.3 2 2 SSbetween 173.3 173.3 MS between = = = = 86.7 df between 3  1 2 Empirical Methods in Computer Science © 2006-now Gal Kaminka 2 26 Example Class 1 14.9 15.2 17.9 15.6 10.7 14.86 6.8 M 10.8 V 14.64 Mi Vi Class 2 11.1 9.5 10.9 11.7 11.8 11 0.85 Class 3 5.7 6.6 6.7 6.8 6.9 6.54 0.23 SSwithin = 14.9  14.86 +... + 10.7  14.86 2 2 + 11.1  11 + .... + 11.8  11 + ... + 6.9  6.54  = 31.5 2 MS within = 2 2 SSwithin 31.5 31.5 = = = 2.6 df within 15  3 12 Empirical Methods in Computer Science © 2006-now Gal Kaminka 27 Example M 10.8 V 14.64 Mi Vi Class 1 14.9 15.2 17.9 15.6 10.7 14.86 6.8 Class 2 11.1 9.5 10.9 11.7 11.8 11 0.85 M S b e t w e e n 8 6 .7 F= = = 3 2 .9 7 M S w it h in 2 .6 Empirical Methods in Computer Science Class 3 5.7 6.6 6.7 6.8 6.9 6.54 0.23 Check F distribution(2,12): Significant! © 2006-now Gal Kaminka 28 Reading the results from statistics software   You can use a statistics software to run one-way ANOVA It will give out something like this: Source between within total df 2 14 16 SS 173.3 31.5 204.9 MS F 86.7 32.97 2.6 p p<0.001 You should have no problem reading this, now. Empirical Methods in Computer Science © 2006-now Gal Kaminka 29 Analogy to linear regression Analogy to linear regression – where: • the variance of observation is composed of: – the variance of the predictions – plus the variance of the deviations from the corresponding predictions: • that is – explained variance (according to the prediction) vs. unexplained variance (due to deviations from prediction) Empirical Methods in Computer Science © 2006-now Gal Kaminka 30 Summary  Treatment and single-factor experiments   Independent variable: categorical Dependent variable: “numerical” (ratio/interval)  Multiple comparisons: A problem for experiment hypotheses  Run one-way ANOVA instead     Assumes:  populations are normal  have equal variances  independent random samples (with replacement) Moderate deviation from normal, particularly with large samples, is still fine Somewhat different variances are fine for roughly equal samples If significant, run additional tests for details:     Tukey's procedure (T method) LSD Scheffe ... Empirical Methods in Computer Science © 2006-now Gal Kaminka

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Empirical Methods in Computer Science