Download Lecture 19 - Wharton Statistics

Statistics 111 - Lecture 19
One-Way Analysis of Variance
ANOVA
• A statistical method for comparing several population means.
• It generalizes the two-sample t-test to more than two groups.
• Data
We obtain a simple random sample (SRS) from each of the k populations.
• Null hypothesis
All the population means are the same.
Example: Workplace safety
• Workers were asked to rate various elements of safety.
• A composite score called the Safety Climate Index was calculated. Its values range from 0 to 100.
• The workers were classified according to their job category as unskilled, skilled, or supervisor.
Job                  n     Mean    SD
Unskilled workers    448   70.42   18.27
Skilled workers      91    71.21   18.83
Supervisors          51    80.51   14.58
Example: Workplace safety
• The purpose of ANOVA is to assess whether the observed differences among the sample means are statistically significant.
• Is the variation among the sample means due to chance, or is it good evidence for a difference among the population means?
Example: Workplace safety
• Just looking at the means is not enough!
• We need to look at the standard error which depends on
the standard deviations of each group and their sizes.
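To make the role of group size concrete, here is a minimal sketch that computes the standard error of each sample mean from the summary table, assuming the usual formula SE = SD / sqrt(n). The names `groups`, `standard_error` and `std_errors` are illustrative and do not come from the lecture.

```python
import math

# Summary statistics from the workplace safety table: (n, mean, SD).
groups = {
    "Unskilled workers": (448, 70.42, 18.27),
    "Skilled workers": (91, 71.21, 18.83),
    "Supervisors": (51, 80.51, 14.58),
}


def standard_error(n, sd):
    """Standard error of a sample mean: SD divided by sqrt(n)."""
    return sd / math.sqrt(n)


# One standard error per job category.
std_errors = {job: standard_error(n, sd) for job, (n, _mean, sd) in groups.items()}

for job, se in std_errors.items():
    print(f"{job}: SE = {se:.2f}")
```

Although the supervisors have the smallest standard deviation, their mean is the least precisely estimated because their group is the smallest.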
Example: Workplace safety
• Within-group variation
• Between-group variation
Example: Workplace safety
• If the between-group variation is large and the within-group variation is small, this implies that the population means are likely to be different.
The ANOVA model
The one-way ANOVA model assumptions
1. The observations in group i are generated from a normal distribution with mean μ_i.
2. The population standard deviations of the groups are equal.
The ANOVA model
Estimation
1. Estimating the population means
\[ \hat{\mu}_i = \bar{x}_i \]
2. Estimating the standard deviations by the pooled estimator
\[ s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \cdots + (n_k - 1) s_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} \]
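As a rough illustration, the pooled estimator can be evaluated from the group sizes and standard deviations in the workplace safety table. This is only a sketch: the function name `pooled_sd` is made up here, and the inputs are the rounded values from the table, so the result is approximate.

```python
import math


def pooled_sd(sizes, sds):
    """Pooled standard deviation: weight each group variance by (n_i - 1)."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    denominator = sum(n - 1 for n in sizes)
    return math.sqrt(numerator / denominator)


# Group sizes and sample SDs (unskilled, skilled, supervisors).
sizes = [448, 91, 51]
sds = [18.27, 18.83, 14.58]

sp = pooled_sd(sizes, sds)
print(f"pooled SD = {sp:.2f}")
```

Because the unskilled group is by far the largest, the pooled value sits close to that group's standard deviation.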
The ANOVA model for the worker safety
Is it reasonable to assume normality in our case?
Is it reasonable to assume equal standard deviation?
The ANOVA model for the worker safety
Rule for examining standard deviations in ANOVA
If the largest standard deviation is less than twice the smallest standard deviation, we can use methods based on the assumption of equal standard deviations, and our results will still be approximately correct.
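The rule above is easy to check for the worker safety data. The sketch below uses a helper named `equal_sd_rule_ok`, which is an illustrative name rather than anything from the lecture.

```python
def equal_sd_rule_ok(sds):
    """True if the largest SD is less than twice the smallest SD."""
    return max(sds) < 2 * min(sds)


# Sample SDs for unskilled workers, skilled workers, and supervisors.
sds = [18.27, 18.83, 14.58]

ratio = max(sds) / min(sds)
rule_ok = equal_sd_rule_ok(sds)
print(f"largest/smallest SD ratio = {ratio:.2f}, rule satisfied: {rule_ok}")
```

Here the ratio is well below 2, so by this rule the equal standard deviation assumption is acceptable for these data.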
The ANOVA model for the worker safety
Testing hypotheses in one-way ANOVA
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \]
\[ H_a: \text{not all of the } \mu_i \text{ are equal} \]
Note: rejecting the null hypothesis DOES NOT imply that ALL of the means are different from each other.
It could be that only two differ and the rest are the same!
The ANOVA model for the worker safety
The information for testing the null hypothesis is organized in an ANOVA table.

Source            Sum of Squares   df     Mean squares       F
Between groups    SSG              K-1    MSG = SSG/(K-1)    MSG/MSE
Within groups     SSE              N-K    MSE = SSE/(N-K)
Total             SST              N-1

N is the total number of observations in the data set.
K is the number of categories (groups).
SSG is the estimated total variation between the group means.
SSE is the estimated total variation within the groups.
SST = SSG + SSE
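The decomposition SST = SSG + SSE can be checked numerically. The sketch below assumes the standard definitions: SSG sums n_i times the squared distance of each group mean from the grand mean, SSE sums squared deviations of observations from their own group mean, and SST sums squared deviations from the grand mean. The scores in `toy_groups` are hypothetical and invented purely for illustration; they are not the lecture's data.

```python
def sums_of_squares(groups):
    """Return (SSG, SSE, SST) for a list of groups of raw observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group: distance of each group mean from the grand mean.
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group: distance of each observation from its own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total: distance of each observation from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    return ssg, sse, sst


# Hypothetical scores for three groups (made up for this example).
toy_groups = [
    [62.0, 70.0, 75.0],
    [68.0, 71.0, 77.0, 80.0],
    [79.0, 85.0, 88.0],
]

ssg, sse, sst = sums_of_squares(toy_groups)
print(f"SSG = {ssg:.3f}, SSE = {sse:.3f}, SST = {sst:.3f}")
print(f"SSG + SSE = {ssg + sse:.3f}")
```

Up to floating-point rounding, the last two printed totals agree, which is the identity stated above.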
MSG is the estimated average variation between the group means.
MSE is the estimated average variation within the groups.
The ANOVA model for the worker safety
The information for testing the null hypothesis is organized in
an ANOVA table.
Source            Sum of Squares   df     Mean squares   F
Between groups    4662.2           2      2331.116       7.137
Within groups     191729.2         587    326.626
Total             196391.4         589
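The table entries can be approximately reproduced from the summary statistics alone (n, mean, SD per group). The sketch below assumes the standard formulas SSG = sum of n_i (x̄_i - x̄)^2 and SSE = sum of (n_i - 1) s_i^2. The function name `anova_from_summary` and the dictionary keys are illustrative choices. Since the means and SDs in the summary table are rounded, the results differ slightly from the table.

```python
def anova_from_summary(sizes, means, sds):
    """Build one-way ANOVA table entries from per-group n, mean, and SD."""
    N = sum(sizes)   # total number of observations
    K = len(sizes)   # number of groups
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / N

    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    msg = ssg / (K - 1)
    mse = sse / (N - K)
    return {
        "SSG": ssg, "SSE": sse, "SST": ssg + sse,
        "df_between": K - 1, "df_within": N - K, "df_total": N - 1,
        "MSG": msg, "MSE": mse, "F": msg / mse,
    }


# Worker safety data: unskilled, skilled, supervisors.
table = anova_from_summary(
    sizes=[448, 91, 51],
    means=[70.42, 71.21, 80.51],
    sds=[18.27, 18.83, 14.58],
)

for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

The degrees of freedom match the table exactly (2, 587 and 589), and the F statistic comes out close to the reported 7.137.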
• Under the null hypothesis, the F-statistic from the ANOVA follows a new distribution called the F distribution.
• The F distribution has two parameters:
1. Numerator df (K-1)
2. Denominator df (N-K)
The ANOVA model for the worker safety
• The P-value turns out to be about 0.001.
• Conclusion
We reject the null hypothesis of equal population means.
This implies that at least some of the group means differ from each other.
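The reported P-value can be checked without statistical software in this particular case. When the numerator degrees of freedom equal 2 (here K-1 = 2), the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The sketch below relies on that special case only; for other numerator degrees of freedom one would normally use a library routine such as `scipy.stats.f.sf`. The function name `f_pvalue_numerator_df2` is illustrative.

```python
def f_pvalue_numerator_df2(f_stat, denom_df):
    """Upper-tail probability of an F(2, denom_df) distribution.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / denom_df) ** (-denom_df / 2)


# F statistic and denominator df from the worker safety ANOVA table.
p_value = f_pvalue_numerator_df2(7.137, 587)
print(f"P-value = {p_value:.4f}")
```

Rounded to three decimals this agrees with the P-value of 0.001 quoted above.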
ANOVA in JMP
Jmp!