ANOVA
(Analysis of Variance)
Martina Litschmannová
[email protected]
K210
The basic ANOVA situation
• Two variables: 1 categorical, 1 quantitative
• Main question: Does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
• If the categorical variable has only 2 values:
  • null hypothesis: μ1 − μ2 = D
• ANOVA allows for 3 or more groups
An example ANOVA situation
• Subjects: 25 patients with blisters
• Treatments: Treatment A, Treatment B, Placebo
• Measurement: # of days until blisters heal
• Data [and means]:
  A: 5, 6, 6, 7, 7, 8, 9, 10          [7.25]
  B: 7, 7, 8, 9, 9, 10, 10, 11        [8.875]
  P: 7, 9, 9, 10, 10, 10, 11, 12, 13  [10.11]
• Are these differences significant?
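The group means in brackets can be reproduced directly from the listed data; a minimal sketch:

```python
# Blister-healing data from the slide; compute each group's sample mean.
data = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

for name, grp in data.items():
    mean = sum(grp) / len(grp)
    print(name, len(grp), round(mean, 3))  # A 8 7.25 / B 8 8.875 / P 9 10.111
```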
Informal Investigation
• Graphical investigation:
  • side-by-side box plots
  • multiple histograms
• Whether the differences between the groups are significant depends on
  • the difference in the means
  • the standard deviations of each group
  • the sample sizes
• ANOVA determines the p-value from the F statistic
Side-by-Side Boxplots
[Figure: side-by-side boxplots of healing time in days (scale 5–13) for treatments A, B, and P.]
What does ANOVA do?
At its simplest (there are extensions) ANOVA tests the following
hypotheses:
H0: The means of all the groups are equal.
HA: Not all the means are equal.
• Doesn't say how or which ones differ.
• Can follow up with "multiple comparisons".
Note: We usually refer to the sub-populations as "groups" when doing ANOVA.
Assumptions of ANOVA
• each group is approximately normal
  ✓ check this by looking at histograms and/or normal quantile plots, or use assumptions
  ✓ can handle some nonnormality, but not severe outliers
  ✓ test of normality
• standard deviations of each group are approximately equal
  ✓ rule of thumb: ratio of largest to smallest sample std. dev. must be less than 2:1
  ✓ test of homoscedasticity
Normality Check
We should check for normality using:
• assumptions about the population
• histograms for each group
• a normal quantile plot for each group
• a test of normality (Shapiro-Wilk, Lilliefors, Anderson-Darling, …)
With such small data sets, there isn't a really good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.
Shapiro-Wilk test
• One of the most powerful tests of normality. [Shapiro, Wilk]
An online applet (Simon Ditto, 2009) for this test can be found here.
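As an illustration (assuming SciPy is available), the Shapiro-Wilk test can be applied to one of the treatment groups from the blister example:

```python
# Shapiro-Wilk test of normality for one treatment group (illustrative sketch).
from scipy import stats

group_a = [5, 6, 6, 7, 7, 8, 9, 10]
w_stat, p_value = stats.shapiro(group_a)

# A p-value above the chosen significance level (e.g. 0.05) gives no
# evidence against the normality assumption for this group.
print(round(w_stat, 3), round(p_value, 3))
```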
Standard Deviation Check

Variable: days

treatment   N    Mean     Median   StDev
A           8     7.250    7.000   1.669
B           8     8.875    9.000   1.458
P           9    10.111   10.000   1.764

Compare the largest and smallest standard deviations:
• largest: 1.764
• smallest: 1.458
• 1.764 / 1.458 = 1.210 < 2  OK
Note: A std. dev. ratio greater than 2 suggests heteroscedasticity.
ANOVA Notation

Group                   1               2               …    k
Sample                  X11, …, Xn1,1   X12, …, Xn2,2   …    X1k, …, Xnk,k
Sample size             n1              n2              …    nk
Sample average          X̄1              X̄2              …    X̄k
Sample std. deviation   s1              s2              …    sk

Number of individuals altogether:  n = Σ_{i=1}^{k} n_i
Sample means:                      X̄_i = (1/n_i) Σ_{j=1}^{n_i} X_ij
Grand mean:                        X̄ = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{n_i} X_ij
Sample variances:                  s_i² = (1/(n_i − 1)) Σ_{j=1}^{n_i} (X_ij − X̄_i)²
Levene Test
Null and alternative hypothesis:
H0: σ1² = σ2² = … = σk²,  HA: ¬H0
Test statistic:
F = (SS_ZB / (k − 1)) / (SS_Ze / (n − k)),
where Z_ij = |X_ij − X̄_i|,  Z̄_i = (1/n_i) Σ_{j=1}^{n_i} Z_ij,  Z̄ = (1/n) Σ_{i=1}^{k} Σ_{j=1}^{n_i} Z_ij,
SS_ZB = Σ_{i=1}^{k} n_i (Z̄_i − Z̄)²,  SS_Ze = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (Z_ij − Z̄_i)².
p-value: p-value = 1 − F0(x_OBS), where F0(x) is the CDF of the Fisher-Snedecor distribution with (k − 1, n − k) degrees of freedom.
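A sketch of Levene's test on the blister data (SciPy assumed available); note the `center` argument, which selects the classical mean-based statistic defined above:

```python
# Levene's test of homoscedasticity for the three blister-treatment groups.
from scipy import stats

a = [5, 6, 6, 7, 7, 8, 9, 10]
b = [7, 7, 8, 9, 9, 10, 10, 11]
p = [7, 9, 9, 10, 10, 10, 11, 12, 13]

# center='mean' matches the classical Levene statistic with Z_ij = |X_ij - mean|;
# SciPy's default, center='median', is the Brown-Forsythe variant.
stat, p_value = stats.levene(a, b, p, center='mean')
print(round(stat, 3), round(p_value, 3))
```

A large p-value here is consistent with the earlier rule-of-thumb check (std. dev. ratio 1.210 < 2).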
How ANOVA works (outline)
ANOVA measures two sources of variation in the data and compares their relative sizes.

• Sum of Squares between groups (difference between means):
  SSB = Σ_{i=1}^{k} n_i (X̄_i − X̄)²,
  resp. Mean of Squares between groups:
  MSB = SSB / (k − 1),
  where k − 1 is the degrees of freedom df_B.

• Sum of Squares of errors (difference within groups):
  SSe = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄_i)² = Σ_{i=1}^{k} (n_i − 1) s_i²,
  resp. Mean of Squares of errors:
  MSe = SSe / (n − k),
  where n − k is the degrees of freedom df_e.

The ANOVA F statistic is the ratio of the between-group variation to the within-group variation:
F = MSB / MSe
A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
ANOVA Output

Source of Variation     SS      DF    MS      F      p-value
Between Groups          34.74    2    17.37   6.45   0.006
Within Groups           59.26   22     2.69
Total                   94.00   24
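The table can be checked directly on the raw blister data (SciPy assumed available):

```python
# One-way ANOVA on the blister data; reproduces the table's F and p-value
# up to rounding.
from scipy import stats

a = [5, 6, 6, 7, 7, 8, 9, 10]
b = [7, 7, 8, 9, 9, 10, 10, 11]
p = [7, 9, 9, 10, 10, 10, 11, 12, 13]

f_stat, p_value = stats.f_oneway(a, b, p)
print(round(f_stat, 2), round(p_value, 3))  # 6.45 0.006
```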
How are these computations made?

Source of       Sum of Squares                           Degrees of      Mean of Squares     F           p-value
Variation                                                Freedom
Between Groups  SSB = Σ_{i=1}^{k} n_i (X̄_i − X̄)²         df_B = k − 1    MSB = SSB / df_B    MSB / MSe   1 − F0(x_OBS)
Within Groups   SSe = Σ_{i=1}^{k} (n_i − 1) s_i²         df_e = n − k    MSe = SSe / df_e    ---         ---
Total           SST = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (X_ij − X̄)²   df_T = n − 1    ---            ---         ---
Using the results of exploratory analysis and an ANOVA test, verify whether age has a statistically significant effect on BMI.

Group          Count   Average   Variance
<35 years         53   25.0796   10.3825
35-50 years      123   25.9492   16.2775
>50 years         76   26.0982   12.3393
Total            252   25.8113   13.8971

[Figure: boxplots of BMI (scale 18–58) for the groups "under 35 years", "35 to 50 years", and "over 50 years".]
Assumptions:
1. Normality
2. Homoscedasticity
   H0: σ1² = σ2² = σ3²,  HA: ¬H0,  p-value = 0.129 (Levene test)
Null and alternative hypothesis:
H0: μ1 = μ2 = μ3,  HA: ¬H0
Calculating the p-value:
SSB = Σ_{i=1}^{k} n_i (X̄_i − X̄)² = 53·(25.1 − 25.8)² + 123·(25.9 − 25.8)² + 76·(26.1 − 25.8)² = 34.0
SSe = Σ_{i=1}^{k} (n_i − 1) s_i² = 52·10.4 + 122·16.3 + 75·12.3 = 3451.9
Filling in the ANOVA table step by step:
k … number of samples, k = 3
n … total sample size, n = 252
df_B = k − 1 = 2,  df_e = n − k = 249,  df_T = n − 1 = 251
MSB = SSB / df_B = 34.0 / 2 = 17.0
MSe = SSe / df_e = 3451.9 / 249 = 13.9
F = MSB / MSe = 1.23

Source of       Sum of Squares   Degrees of   Mean of Squares   F      p-value
Variation                        Freedom
Between Groups  34.0             2            17.0              1.23   0.294
Within Groups   3451.9           249          13.9              ---    ---
Total           3485.9           251          ---               ---    ---

p-value = 1 − F(1.23) = 0.294,
where F(x) is the CDF of the Fisher-Snedecor distribution with (2, 249) degrees of freedom.

Result:
We do not reject the null hypothesis at the significance level 0.05. There is not a statistically significant difference between the means of BMI depending on age.
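The whole table can be reconstructed from the summary statistics alone. The sketch below follows the slide in rounding the group means and variances to one decimal place before computing (SciPy assumed available for the p-value); results agree with the table up to rounding:

```python
# Reconstruct the BMI one-way ANOVA from the published summary statistics,
# using the same rounded intermediate values as the slide.
from scipy import stats

counts = [53, 123, 76]
means = [25.1, 25.9, 26.1]        # group means rounded as on the slide
variances = [10.4, 16.3, 12.3]    # group variances rounded as on the slide
grand_mean = 25.8                 # grand mean rounded as on the slide

k = len(counts)
n = sum(counts)                   # 252
ssb = sum(c * (m - grand_mean) ** 2 for c, m in zip(counts, means))
sse = sum((c - 1) * v for c, v in zip(counts, variances))
msb = ssb / (k - 1)
mse = sse / (n - k)
f = msb / mse
p_value = stats.f.sf(f, k - 1, n - k)
print(round(ssb, 1), round(sse, 1), round(f, 2), round(p_value, 3))
```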
Where's the Difference?
Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Analysis of Variance for days
Source       DF     SS      MS      F      P
treatment     2   34.74   17.37   6.45   0.006
Error        22   59.26    2.69
Total        24   94.00

Level   N    Mean    StDev
A       8    7.250   1.669
B       8    8.875   1.458
P       9   10.111   1.764
Pooled StDev = 1.641

[Figure: individual 95% CIs for each mean, based on the pooled StDev, plotted on a scale from 7.5 to 10.5; the intervals for A and P do not overlap.]

Clearest difference: P is worse than A (CIs don't overlap)
Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same
means, we can compare them two by two using the 2-sample t
test.
๏‚ง We need to adjust our p-value threshold because we are doing
multiple tests with the same data.
๏‚ง There are several methods for doing this.
๏‚ง If we really just want to test the difference between one pair of
treatments, we should set the study up that way.
Bonferroni method - post hoc analysis
We reject the null hypothesis if
|x̄_I − x̄_J| ≥ t_{1−α*/2}(n − k) · √(MSe · (1/n_I + 1/n_J)),
where α* is the corrected significance level, α* = α / C(k, 2), and t_{1−α*/2}(n − k) is the (1 − α*/2) quantile of the Student distribution with n − k degrees of freedom.
Kruskal-Wallis test
• The Kruskal-Wallis test is most commonly used when there is one nominal variable and one measurement variable, and the measurement variable does not meet the normality assumption of an ANOVA.
• It is the non-parametric analogue of a one-way ANOVA.
• Like most non-parametric tests, it is performed on ranked data, so the measurement observations are converted to their ranks in the overall data set: the smallest value gets a rank of 1, the next smallest gets a rank of 2, and so on. The loss of information involved in substituting ranks for the original values can make this a less powerful test than an ANOVA, so the ANOVA should be used if the data meet the assumptions.
• If the original data set actually consists of one nominal variable and one ranked variable, you cannot do an ANOVA and must use the Kruskal-Wallis test.
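For illustration (SciPy assumed available), the Kruskal-Wallis test applied to the blister data:

```python
# Kruskal-Wallis test on the blister data: the rank-based analogue of the
# one-way ANOVA used earlier.
from scipy import stats

a = [5, 6, 6, 7, 7, 8, 9, 10]
b = [7, 7, 8, 9, 9, 10, 10, 11]
p = [7, 9, 9, 10, 10, 10, 11, 12, 13]

h_stat, p_value = stats.kruskal(a, b, p)
# A small p-value again points to a difference among the groups.
print(round(h_stat, 2), round(p_value, 3))
```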
1. A farm bred three breeds of rabbits. An experiment was carried out (rabbits.xls) whose objective was to determine whether there is a statistically significant (conclusive) difference in weight between the breeds of rabbits. Verify.
2. The effects of three drugs on blood clotting were studied. Among other indicators, the thrombin time was determined. Information about the 45 monitored patients is recorded in the file thrombin.xls. Does the magnitude of the thrombin time depend on the preparation used?
Study materials:
• http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf (p. 142 - p. 154)
• Shapiro, S.S., Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 1965, Vol. 52, No. 3/4, pp. 591-611. Available at: http://www.jstor.org/stable/2333709.