Download Empirical Methods in Computer Science

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Foundations of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Analysis of variance wikipedia , lookup

Transcript
Statistical Methods in
Computer Science
Hypothesis Testing II:
Single-Factor Experiments
Ido Dagan
2
Single-Factor Experiments


A generalization of treatment experiments
Determine effect of independent variable values (nominal)

Effect: On the dependent variable
treatment1
treatment2
control


Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2
Ex1 & Ex2 & .... & Exn ==> Dep3
Compare performance of algorithm A to B to C ....
Control condition: Optional (e.g., to establish baseline)
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
3
Single-Factor Experiments


An generalization of treatment experiments
Determine effect of independent variable values (nominal)

Effect: On the dependent variable
treatment1
treatment2
control


Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2
Ex1 & Ex2 & .... & Exn ==> Dep3
Values of independent
variable
Values of dependent variable
Compare performance of algorithm A to B to C ....
Control condition: Optional (e.g., to establish baseline)
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
Single-Factor Experiments:
Definitions


The independent variable is called the factor
Its values (being tested) are called levels
Our goal:
 Determine whether there is an effect of levels
 Null hypothesis: There is no effect
 Alternative hypothesis: At least one level causes an effect

Tool: One-way ANOVA

A simple special case of general Analysis of Variance
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
4
The case for Single-factor ANOVA
(one-way ANOVA)

We have k samples (k levels of the factor)


5
Each with its own sample mean, sample std. deviation for the
dependent variable value
We want to determine whether one (at least) is different
treatment1
…
treatment2
control
Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
Indk & Ex1 & Ex2 & .... & Exn ==> Depk
Ex1 & Ex2 & .... & Exn ==> Dep3
Values of independent
variable = levels of the factor
Values of dependent variable
Cannot use the tests we learned: Why?
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
The case for Single-factor ANOVA
(one-way ANOVA)

We have k samples (k levels of the factor)


Each with its own sample mean, sample std. deviation
We want to determine whether one (at least) is different
Level
( sam ple)
1
2
3
4


S. m ean S. st dev.
Mi
Si
4.24
0.91
3.75
1.38
2.85
1.38
2.63
1.41
H0: M1=M2=M3=M4
H1: There exist i,j such that Mi <> Mj
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
N
29
120
59
59
6
The case for Single-factor ANOVA
(one-way ANOVA)

We have k samples (k levels of the factor)


Each with its own sample mean, sample std. deviation
We want to determine whether one (at least) is different
Level
( sam ple)
1
2
3
4


S. m ean S. st dev.
Mi
Si
4.24
0.91
3.75
1.38
2.85
1.38
2.63
1.41
N
29
120
59
59
Why not use t-test
H0: M1=M2=M3=M4
to compare every
H1: There exist i,j such that Mi <> Mj
M, M?
i
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
j
7
8
Multiple paired comparisons

Let ac be the probability of an error in a single comparison




alpha = the probability of incorrectly rejecting null hypothesis
1-ac: probability of making no error in a single comparison
(1-ac)m: probability of no error in m comparisons
(experiment)
ae = 1-(1-ac)m: probability of an error in the experiment

Under assumption of independent comparisons
ae quickly becomes large as m increases
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
9
Example

Suppose we want to contrast 15 levels of the factor


Total number of pairwise comparisons (m) : 105



15 groups, k=15
15 X (15-1) / 2 = 105
Suppose ac = 0.05
Then ae = 1-(1-ac)m = 1-(1-0.05)105 = 0.9954
We are very likely to make a type I error!
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
10
Possible solutions?

Reduce ac until overall ae level is 0.05 (or as needed)


Risk: comparison alpha target may become unobtainable
Ignore experiment null hypothesis, focus on comparisons



Carry out m comparisons
# of errors in m experiments: m X ac
e.g., m=105, ac=0.05, # of errors = 5.25. But which?
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
11
One-way ANOVA

A method for testing the experiment null hypothesis

H0: all levels' sample means are equal to each other
Key idea:
 Estimate a variance B under the assumption H0 is true
 Estimate a “real” variance W (regardless of H0)
 Use F-test to test hypothesis that B=W

Assumes variance of all groups is the same
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
12
Some preliminaries



Let xi,j be the jth element in sample i
Let Mi be the sample mean of sample i
Let Vi be the sample variance of sample i
For example:
x1,2
x3,4
Mi
Vi
Empirical Methods in Computer Science
Class 1
14.9
15.2
17.9
15.6
10.7
14.86
6.8
Class 2
11.1
9.5
10.9
11.7
11.8
11
0.85
© 2006-now Gal Kaminka
Class 3
5.7
6.6
6.7
6.8
6.9
6.54
0.23
13
Some preliminaries





Let xi,j be the jth element in sample i
Let Mi be the sample mean of sample i
Let Vi be the sample variance of sample i
Let M be the grand sample mean (all elements, all samples)
Let V be the grand sample variance
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
14
The variance contributing to a value

Every element xi,j can be re-written as:
xi,j = M + ei,j
where ei,j is some error component

We can focus on the error component
ei,j = xi,j – M
which we will rewrite as:
ei,j = (xi,j - Mi ) + (Mi - M)
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
15
Within-group and between-group

The re-written form of the error component has two parts
ei,j = (xi,j - Mi ) + (Mi - M)



Within-group component: variance w.r.t group mean
Between-group component: variance w.r.t grand mean
For example, in the table:


x1,1 = 14.9,
M1 = 14.86,
M = 10.8
e1,1 = (14.9-14.86) + (14.86 – 10.8) = 0.04 + 4.06 = 4.1
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
16
Within-group and between-group

The re-written form of the error component has two parts
ei,j = (xi,j - Mi ) + (Mi - M)



For example, in the table:



Within-group component: variance w.r.t group mean
Between-group component: variance w.r.t grand mean
x1,1 = 14.9, M1 = 14.86, M = 10.8
e1,1 = (14.9-14.86) + (14.86 – 10.8) = 0.04 + 4.06 = 4.1
Note within-group and between-group components:

Most of the error (variance) is due to the between group!
Can we use this in more general fashion?
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
17
No within-group variance
M
10.67
V
14.52
Mi
Vi
Class 1
15
15
15
15
15
15
0
Class 2
11
11
11
11
11
11
0
Class 3
6
6
6
6
6
6
0
No variance within group, in any element
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
18
No between-group variance
M
15
V
24.86
Mi
Vi
Class 1
17
26
9
11
12
15
46.5
Class 2
11
13
18
18
15
15
9.5
Class 3
22
14
12
8
19
15
31
No variance between groups, in any group
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
19
Comparing within-group and
between-groups components

The error component of a single element is:
ei,j =(xi,j - M) = (xi,j - Mi ) + (Mi - M)

Let us relate this to the sample and grand sums-of-squares

It can be shown that:
 x
i, j  M  =  xi, j  M i  +  M i  M 
i

2
j
2
i
j
2
i
Let us rewrite this as
SStotal = SSwithin + SSbetween
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
j
20
From Sums of Squares (SS) to variances

We know
SStotal = SSwithin + SSbetween

... and convert to Mean Squares (as variance estimates):
MS within =
SSwithin
=
df within
 x
i, j  M i 
i
 x
i, j  M i 
2
j
I
 N i  1
=
i
j
NI
2
`
k =1
MS between =
SSbetween
=
df between
Empirical Methods in Computer Science
2
2


M

M


N
M

M

i

i
i
i
j
I 1
=
i
© 2006-now Gal Kaminka
I 1
21
From Sums of Squares (SS) to variances

We know

... and convert to variances:
MS within =
SStotal = SSwithin + SSbetween
SSwithin
=
df within
 x
i, j  M i 
i
 x
i, j  M i 
2
j
I
 N i  1
=
i
2
j
NI
k =1
Degrees of
freedom
SSbetween
MS between =
=
df between
Empirical Methods in Computer Science
2
2


M

M


N
M

M

i
 i i
i
j
I 1
=
i
I 1
© 2006-now Gal Kaminka
22
From Sums of Squares (SS) to variances


We know
SStotal = SSwithin + SSbetween
... and convert to variances:
SSwithin
MS within =
=
df within
 x
i, j  M i 
i
j
I
 N i  1
 x
i, j  M i 
2
=
i
2
j
NI
k =1
# of levels
(samples)
MS between =
SSbetween
=
df between
Empirical Methods in Computer Science
2
2


M

M


N
M

M

i

i
i
i
j
I 1
=
i
© 2006-now Gal Kaminka
I 1
23
From Sums of Squares (SS) to variances

We know
SStotal = SSwithin + SSbetween

... and convert to variances:
MS within =
SSwithin
=
df within
 x
i, j  M i 
i
 x
i, j  M i 
2
j
I
 N i  1
=
i
j
NI
2
`
k =1
MS between =
SSbetween
=
df between
Empirical Methods in Computer Science
2
2


M

M


N
M

M

i

i
i
i
j
I 1
=
i
© 2006-now Gal Kaminka
I 1
24
Determining final alpha level

MSwithin is an estimate of the (inherent) population variance

Which does not depend on the null hypothesis (M1=M2=... MI)


Intuition: It’s an “average” of variances in the individual groups
MSbetween estimates the population variance + the treatment effect

It does depend on the null hypothesis

Intuition: It’s similar to an estimate for the variance of the samples means, where each
component is multiplied by Ni




N · sample mean variance = population variance
If the null hypothesis is true – the two values estimate the inherent
variance, and should be equal up to the sampling variation
So now we have two variance estimates for testing
Use F-test



Recall:
F = Msbetween / MSwithin
Compare to F-distribution with dfbetween, dfwithin
Determine alpha level (significance)
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
25
Example
M
10.8
V
14.64
Mi
Vi
Class 1
14.9
15.2
17.9
15.6
10.7
14.86
6.8
Class 2
11.1
9.5
10.9
11.7
11.8
11
0.85
Class 3
5.7
6.6
6.7
6.8
6.9
6.54
0.23
SSbetween = 514.86  10.8 + 511  10.8 + 56.54  10.8 = 173.3
2
2
SSbetween 173.3 173.3
MS between =
=
=
= 86.7
df between 3  1
2
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
2
26
Example
Class 1
14.9
15.2
17.9
15.6
10.7
14.86
6.8
M
10.8
V
14.64
Mi
Vi
Class 2
11.1
9.5
10.9
11.7
11.8
11
0.85
Class 3
5.7
6.6
6.7
6.8
6.9
6.54
0.23
SSwithin = 14.9  14.86 +... + 10.7  14.86
2
2
+ 11.1  11 + .... + 11.8  11 + ... + 6.9  6.54  = 31.5
2
MS within =
2
2
SSwithin
31.5
31.5
=
=
= 2.6
df within 15  3 12
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
27
Example
M
10.8
V
14.64
Mi
Vi
Class 1
14.9
15.2
17.9
15.6
10.7
14.86
6.8
Class 2
11.1
9.5
10.9
11.7
11.8
11
0.85
M S b e t w e e n 8 6 .7
F=
=
= 3 2 .9 7
M S w it h in
2 .6
Empirical Methods in Computer Science
Class 3
5.7
6.6
6.7
6.8
6.9
6.54
0.23
Check F distribution(2,12):
Significant!
© 2006-now Gal Kaminka
28
Reading the results from statistics software


You can use a statistics software to run one-way ANOVA
It will give out something like this:
Source
between
within
total
df
2
14
16
SS
173.3
31.5
204.9
MS F
86.7 32.97
2.6
p
p<0.001
You should have no problem reading this, now.
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
29
Analogy to linear regression
Analogy to linear regression – where:
• the variance of observation is composed of:
– the variance of the predictions
– plus the variance of the deviations from the corresponding
predictions:
• that is – explained variance (according to the prediction) vs.
unexplained variance (due to deviations from prediction)
Empirical Methods in Computer Science
© 2006-now Gal Kaminka
30
Summary

Treatment and single-factor experiments


Independent variable: categorical
Dependent variable: “numerical” (ratio/interval)

Multiple comparisons: A problem for experiment hypotheses

Run one-way ANOVA instead




Assumes:
 populations are normal
 have equal variances
 independent random samples (with replacement)
Moderate deviation from normal, particularly with large samples, is still fine
Somewhat different variances are fine for roughly equal samples
If significant, run additional tests for details:




Tukey's procedure (T method)
LSD
Scheffe
...
Empirical Methods in Computer Science
© 2006-now Gal Kaminka