One-Way Analysis of Variance (ANOVA)
One-way ANOVA is used to compare the means of three or more groups defined by a single
variable, or factor. The null hypothesis is that all the population group means are equal, versus
the alternative that at least one of the population means differs from the others. It may seem
confusing that we call it Analysis of Variance even though we are comparing means. The reason
is that the test statistic combines evidence about two types of variability. We will focus on the
reasoning behind the test rather than the formula used to calculate it.
The dot plots below show the number of minutes callers remained on hold before hanging up,
for three types of recorded message. Suppose one person gathered the sample data in case (a)
and someone else gathered the data in case (b). Which case do you think gives stronger
evidence that the mean wait times for the three recording types are not all the same? That is,
which gives stronger evidence against Ho: μ1 = μ2 = μ3? In both sets of data the group means
are the same: in both case (a) and case (b) the mean wait time for Advertisement is 5.4 minutes,
for Classical 10.4 minutes, and for Muzak 2.8 minutes. What is the difference, then? The
variability between pairs of sample means is the same in each case because the sample means
are the same. However, the variability within each sample is much smaller for (b) than for (a):
the sample SD is about 1 for each sample in (b), but in (a) the SDs range from about 2.4 to 4.2.
We will see that the evidence against Ho is stronger when the variability within each sample is
smaller. The evidence is also stronger when the variability between sample means increases,
i.e. when the sample means are farther apart. A small numerical sketch of this idea follows the
figure.
[Figure: dot plots of telephone holding time in minutes (0 to 14) by recorded message type
(Advertisement, Classical, Muzak). Panel (a) shows samples with large within-group spread;
panel (b) shows samples with the same group means but small within-group spread.]
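To make the idea concrete, here is a minimal sketch in Python using scipy's f_oneway. The wait
times below are hypothetical and only mimic the group means in the figure; they are not the
handout's data.

from scipy.stats import f_oneway

# Case (a): same group means, large spread within each sample (hypothetical values)
ad_a        = [1.4, 3.4, 5.4, 7.4, 9.4]         # mean 5.4
classical_a = [6.4, 8.4, 10.4, 12.4, 14.4]      # mean 10.4
muzak_a     = [0.0, 1.0, 2.8, 4.0, 6.2]         # mean 2.8

# Case (b): identical group means, much smaller spread within each sample
ad_b        = [4.4, 5.0, 5.4, 5.8, 6.4]         # mean 5.4
classical_b = [9.4, 10.0, 10.4, 10.8, 11.4]     # mean 10.4
muzak_b     = [1.8, 2.4, 2.8, 3.2, 3.8]         # mean 2.8

print(f_oneway(ad_a, classical_a, muzak_a))     # smaller F, larger p-value
print(f_oneway(ad_b, classical_b, muzak_b))     # larger F, much smaller p-value

Even though the group means are identical in the two cases, the second call gives a much
larger F statistic and a much smaller p-value, because the within-group variability is smaller.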
EXAMPLE
For instance, say we were interested in studying whether mean salaries for the MLB, NFL, and
NBA were the same. To answer this question we took a random sample of 30 players from each
of the three sports leagues. This gives us three levels, or sub-categories, for the overall variable
Sports League. Since we want to compare means, the hypotheses of interest are:
Ho: The mean salaries for players across the three leagues are equal
Ha: At least one of the mean salaries differs from the others (i.e. not all mean salaries are the
same)
One-way ANOVA: Salary versus Sport

Source  DF      SS     MS     F      P
Sport    2   202.7  101.4  6.08  0.003
Error   87  1450.1   16.7
Total   89  1652.8

Level        N   Mean  StDev
Baseball    30  3.765  4.861
Basketball  30  5.204  4.914
Football    30  1.554  1.490

[Minitab also displays individual 95% CIs for each mean based on the pooled StDev; in that
plot each interval is drawn as (----*----) with * at the sample mean. The plot is not
reproduced here.]

Pooled StDev = 4.083
Interpreting this output:
1. A one-way analysis is used to compare populations for one variable, or factor. In this
instance the variable is Sports League and there are 3 populations, also called group or
factor levels, being compared: baseball, basketball, and football.
2. DF stands for degrees of freedom. The DF for the variable (e.g. Sports League) is found by
taking the number of group levels (called k) minus 1 (i.e. 3 − 1 = 2). The DF for Error is
found by taking the total sample size, N, minus k (i.e. 90 − 3 = 87). The DF for Total is
N − 1 (i.e. 90 − 1 = 89).
3. SS stands for Sum of Squares. The first SS measures the variation in the data between the
groups; its Source row lists the variable name (e.g. Sport) used in the analysis. This is
sometimes referred to as SSB, for "Sum of Squares Between" groups. The next value is the
sum of squares for the error, often called SSE, or SSW for "Sum of Squares Within". Lastly,
the value for Total is called SST (or sometimes SSTO), for "Sum of Squares Total". These
values are additive, meaning SST = SSB + SSW.
4. The test statistic used for ANOVA is the F-statistic, calculated by dividing the Mean Square
(MS) for the variable by the MS of the error (called the Mean Square Error, or MSE). The
F-statistic is always nonnegative, since it is a ratio of the variability between groups to the
variability within the groups. If this ratio is large, then the p-value is small, producing a
statistically significant result (i.e. rejection of the null hypothesis). The arithmetic is
reproduced in the sketch after this list.
5. The p-value is the probability of being greater than the F-statistic, or simply the area to the
right of the F-statistic, with the corresponding degrees of freedom for the group (number of
group levels minus 1, or here 3 − 1 = 2) and for the error (total sample size minus the
number of group levels, or here 90 − 3 = 87). The F-distribution is skewed to the right (i.e.
positively skewed), so there is no symmetry relationship such as those found with the Z or t
distributions. This p-value is used to test the null hypothesis that all the group population
means are equal versus the alternative that at least one is not equal. The alternative is not
"all of the means differ from one another."
6. The individual 95% confidence intervals provide one-sample t intervals that estimate the
mean response for each group level. For example, the interval for Baseball estimates the
population mean salary for major league baseball players. In Minitab's interval plot, the *
marks the sample mean value (e.g. 3.765 for Baseball).
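As referenced in item 4, here is a quick sketch (Python with scipy; not part of the original
Minitab workflow) that reproduces the arithmetic behind the ANOVA table from the printed
sums of squares:

from scipy.stats import f

k, N = 3, 90                          # group levels and total sample size
ss_between, ss_error = 202.7, 1450.1  # SS for Sport and for Error

df_between = k - 1                    # 2
df_error = N - k                      # 87
ms_between = ss_between / df_between  # about 101.4
ms_error = ss_error / df_error        # about 16.7
F_stat = ms_between / ms_error        # about 6.08
p_value = f.sf(F_stat, df_between, df_error)   # area to the right, about 0.003

print(round(F_stat, 2), round(p_value, 3))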
Hypothesis Statements and Assumptions for One-Way ANOVA
The hypothesis test for analysis of variance for g populations:
Ho: μ1 = μ2 = ... = μg
Ha: not all μi (i = 1, ... g) are equal
Recall that when we compare the means of two populations for independent samples, we
use a 2-sample t-test with pooled variance when the population variances can be assumed
equal. For more than two populations, the test statistic is the ratio of the between-group
sample variance to the within-group sample variance. Under the null hypothesis, both
quantities estimate the variance of the random error, and thus the ratio should be close to 1.
If the ratio is large, then we reject the null hypothesis.
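Written with the quantities defined earlier (SSB, SSE, k group levels, and total sample size N),
this ratio is

F = MSB / MSE = [SSB / (k − 1)] / [SSE / (N − k)],

which is large when the sample means are far apart relative to the variability within the groups.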
Assumptions: To apply a One-Way ANOVA test, certain assumptions (or conditions) need to
hold. If any of the conditions is not satisfied, the results from the use of ANOVA techniques
may be unreliable. The assumptions are:
1. Each sample is an independent random sample
2. The distribution of the response variable follows a normal distribution (keep in mind the
central limit theorem; for ANOVA we would want a sample size of at least 30)
3. The population variances are equal across responses for the group levels. This can be
evaluated using the following rule of thumb: if the largest sample standard deviation divided
by the smallest sample standard deviation is not greater than two, then assume that the
population variances are equal. (For this sports salary data the largest SD is 4.914 and the
smallest is 1.490, a ratio of about 3.3, which is greater than 2. However, since the group
sample sizes are equal this is not of great concern. A quick check of this ratio appears in the
sketch after this list.)
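As referenced in assumption 3, a quick check of the rule of thumb for the sports salary data
(sample SDs taken from the Minitab output above; Python is used here only as a convenience):

# Ratio of largest to smallest sample standard deviation
sds = {"Baseball": 4.861, "Basketball": 4.914, "Football": 1.490}
ratio = max(sds.values()) / min(sds.values())
print(round(ratio, 2))   # about 3.3, which exceeds the rule-of-thumb cutoff of 2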
If we reject the null hypothesis, how do we determine which mean(s) are statistically
different? If the null hypothesis is rejected we would want to look at comparisons of all
possible pairs of means, i.e. all possible two-mean tests. This can be accomplished in Minitab
by clicking the Comparisons button and then selecting the radio button for Tukey. Note that the
theory behind why we choose an adjusted level of alpha to conduct these multiple comparisons,
and why we select the Tukey method over the other options provided, is beyond the scope of
this course. But to get a general understanding, think of this: if you were throwing a dart at a
dart board and hit the bull's-eye on the first toss, you would have the right to brag. However, if
it took you 10, 15, 20, or even more throws to finally hit the center, would anyone really be
impressed? Probably not. Similar logic applies here. If we just started testing for a difference by
doing all possible two-mean comparisons, then, just like throwing many darts, we shouldn't be
surprised to eventually find two means that appear different. To avoid this, we adjust the alpha
level using some adjustment technique, e.g. the one used by the Tukey method. That is all you
need to know about why we do it. The interpretation, though, is quite straightforward: the
Minitab output provides the intervals for all possible two-mean comparisons, and the decision
process is simply to inspect each interval to see whether it contains 0 (i.e. no difference between
the means) or does not contain 0 (i.e. a statistically significant difference between the means).
Since our sports salary data are significant (the p-value of 0.003 is less than 0.05), we then look
at these multiple comparisons, which are:
Sport = Baseball subtracted from:

Sport        Lower  Center  Upper
Basketball  -1.074   1.438  3.950
Football    -4.723  -2.211  0.301

Sport = Basketball subtracted from:

Sport        Lower  Center   Upper
Football    -6.161  -3.649  -1.137
From these we see that the only interval NOT containing 0 is the one comparing Basketball to
Football, where the interval runs from -6.161 to -1.137.
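For the curious, the interval endpoints above can be reconstructed approximately from the
summary statistics alone. A rough sketch, assuming scipy 1.7 or later for the studentized range
distribution; small discrepancies come from rounding in the printed means:

from math import sqrt
from scipy.stats import studentized_range

k, df_error, n = 3, 87, 30            # group levels, error DF, common group size
pooled_sd = 4.083
means = {"Baseball": 3.765, "Basketball": 5.204, "Football": 1.554}

q = studentized_range.ppf(0.95, k, df_error)          # Tukey critical value
half_width = (q / sqrt(2)) * pooled_sd * sqrt(1/n + 1/n)

diff = means["Basketball"] - means["Baseball"]        # about 1.44, the "Center" above
print(round(diff - half_width, 2), round(diff + half_width, 2))
# roughly (-1.08, 3.96), close to the (-1.074, 3.950) interval above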
Another Example:
The Centers for Disease Control was interested in comparing the tar content of four cigarette
brands, A, B, C, and D. Since tar content is a measured quantity, the interest is in comparing
means. With four levels, i.e. four cigarette brands, ANOVA methods are more appropriate than
repeated two-mean methods. The output is:
One-way ANOVA: Tar Content versus Brand

Source  DF     SS    MS     F      P
Brand    3  24.00  8.00  6.95  0.001
Error   44  50.65  1.15
Total   47  74.65

S = 1.073   R-Sq = 32.15%   R-Sq(adj) = 27.53%

Level   N    Mean  StDev
A      12  10.000  0.949
B      12  11.000  0.915
C      12  12.000  1.602
D      12  11.000  0.547

[Minitab also displays individual 95% CIs for each mean based on the pooled StDev; the
interval plot is not reproduced here.]

Pooled StDev = 1.073
Ho: All means of tar content are equal
Ha: At least one mean differs from another
To the right of the one-way ANOVA table, under the column headed P, is the P-value. In this
example, P = 0.001, which is less than the usual alpha level of 0.05 and also less than 0.01,
another often used level of significance. Setting alpha prior to conducting one's analysis is still
important: it avoids the temptation to choose a significance level only after the analysis is
completed. The P-value is for testing the hypotheses above: the mean tar content for the 4
brands is the same vs. not all the means are the same. Because the P-value of 0.001 is less than
the specified significance level of 0.05, we reject H0. The data provide sufficient evidence to
conclude that the mean tar contents for the four brands are not all the same.
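As an aside, for readers who want to produce a comparable table outside Minitab, here is a
minimal sketch with statsmodels. The tar measurements below are hypothetical (four values per
brand, for illustration only), since the raw data are not reproduced in this handout:

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical tar measurements (mg), four per brand, for illustration only
data = pd.DataFrame({
    "brand": ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4,
    "tar":   [9.1, 10.2, 9.8, 10.9,
              10.5, 11.4, 10.8, 11.3,
              11.0, 12.5, 13.1, 11.4,
              10.6, 11.2, 11.1, 10.9],
})

model = smf.ols("tar ~ C(brand)", data=data).fit()
print(anova_lm(model))   # columns: df, sum_sq, mean_sq, F, PR(>F)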
Below the ANOVA table we can see another table that provides the sample sizes, sample means,
and sample standard deviations of the 4 samples. Beneath that table, the pooled StDev is given;
this is an estimate of the common standard deviation of the 4 populations. Here the largest SD
(1.602) is about 2.9 times the smallest (0.547), so the rule of thumb for equal variances is not
strictly met; however, as noted below, the equal group sample sizes help protect against this.
With the total sample size of 48 exceeding 30, we can appeal to the Central Limit Theorem for
approximate normality. Please note, however, that equal sample sizes can be very helpful in the
design of ANOVA studies.
Having equal sample sizes helps to protect against violations of the equal variance assumption.
Also, the lower right side of the Minitab output gives individual 95% confidence intervals for
the population means of the 4 brands. This example provides an excellent illustration of why
2-sample t-tests can be misleading when more than two groups are being compared. In the
confidence interval portion of the output all of the intervals overlap, which suggests that none
of the means would be declared different if we conducted a series of 2-sample tests of means.
However, the ANOVA test does result in at least one mean being significantly different from
the others. Given this significance, we look at the multiple comparisons, which indicate that
there is only one pair of means that differ: Brands A and C.
Brand = A subtracted from:

Brand   Lower  Center  Upper
B      -0.171   1.000  2.171
C       0.829   2.000  3.171
D      -0.171   1.000  2.171

Brand = B subtracted from:

Brand   Lower  Center  Upper
C      -0.171   1.000  2.171
D      -1.171   0.000  1.171

Brand = C subtracted from:

Brand   Lower  Center  Upper
D      -2.171  -1.000  0.171
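If the raw measurements were available, the Tukey comparisons themselves could be generated
with statsmodels. A sketch using the same hypothetical tar values as the earlier sketch (again,
not the handout's actual data):

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same hypothetical tar values as in the earlier sketch
tar   = [9.1, 10.2, 9.8, 10.9, 10.5, 11.4, 10.8, 11.3,
         11.0, 12.5, 13.1, 11.4, 10.6, 11.2, 11.1, 10.9]
brand = ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + ["D"] * 4

result = pairwise_tukeyhsd(endog=tar, groups=brand, alpha=0.05)
print(result)   # one row per pair: mean difference, CI bounds, reject yes/no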