Hypothesis Testing (continued)
One-Way ANOVAs
We have already discussed the t-test, which is used to compare the means of two groups to
determine whether there is a statistically significant difference between them. The t-test allows us to determine
the probability that we are making a Type I error (rejecting the null hypothesis when it is true). If that
probability (the p-value) is less than alpha (generally set at .05), then we reject the null hypothesis (conclude
that there is a difference) with at least 95% confidence that the difference between the two conditions is not
due simply to chance.
Take a detour with me. Imagine your instructor stated that she could obtain a 6 on the throw of a single
die – and then she did it! Would that be impressive? What would the probability of doing that be?
_____________.
Now suppose that she said she could obtain at least one 6 on the throw of a pair of dice. Is this more or less
impressive than using a single die? What is the probability that she could get at least one 6 with two dice?
_______________ What if she said that she could get at least one 6 by throwing 3 dice? What is the
probability now? ___________________.
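A hint for checking your answers once you have tried them: the chance of at least one 6 is easiest to compute as the complement of getting no 6s at all:

P(at least one 6 in n throws) = 1 − (5/6)^n

With each die added, success by chance alone becomes more likely, so the feat becomes less impressive.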
We have the same problem when we want to make more than one comparison of condition means within a
study. For example, we may want to include 3 conditions in our study: A, B, and C. I am interested in
looking at the difference between the means of conditions A and B, between the means of conditions A and C,
and between the means of conditions B and C. Each new comparison we make is like adding an additional
die to the example above. If we set alpha at .05, we have a 5% chance of making a Type I error with every
comparison we make. The probability of making at least one Type I error somewhere in this study is therefore
not 5%; it is roughly 15% (more precisely, 1 − .95^3 ≈ 14.3%).
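A quick way to convince yourself of this inflation is to simulate it. Below is a minimal sketch in Python (not part of the original course materials; the group sizes and population values are invented) that repeatedly draws three groups from the same population and counts how often at least one of the three pairwise t-tests comes out "significant" even though the null hypothesis is true every time:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 10_000, 20, 0.05
false_alarms = 0

for _ in range(n_sims):
    # All three groups are drawn from the SAME population, so the null is true
    # for every comparison; any "significant" result is a Type I error.
    a, b, c = (rng.normal(loc=5.7, scale=1.8, size=n_per_group) for _ in range(3))
    pvals = [stats.ttest_ind(x, y).pvalue for x, y in ((a, b), (a, c), (b, c))]
    if min(pvals) < alpha:
        false_alarms += 1

# Familywise error rate across the set of three tests.
print(false_alarms / n_sims)  # comes out near .12, far above the nominal .05

Because the three tests share samples, the simulated rate comes out a bit below the 1 − .95^3 figure, but it is still far above 5%.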
Let's continue with the example we have been using on Marital Status and Happiness ratings. The
dependent variable in this example is the happiness rating. Assume that I interviewed 20 married, 20 single,
and 20 divorced persons from Grant County for this study and obtained their responses to the Happiness
Questionnaire. My independent variable is Marital Status, and it has three levels: single, married, and
divorced. Below are the descriptive statistics for this study. They are presented in the same format that
would be produced if you analyzed the results using a computer statistical analysis program called the
Statistical Package for the Social Sciences (SPSS). As we discuss ANOVAs, you will need to learn to
interpret results from tables presented in the format that SPSS produces. The Descriptive
Statistics table below contains the means, standard deviations, and sample sizes (N) for the total sample as
well as for each of the conditions.
Descriptive Statistics
Dependent Variable: Happiness rating

Marital Status    Mean      Std. Deviation    N
Single            5.4000    1.6670            20
Married           6.6500    1.7252            20
Divorced          4.9500    1.7614            20
Total             5.6667    1.8381            60
To determine whether there are statistically significant differences among the mean happiness ratings for these
three conditions, I could do three t-tests to compare all possible pairs of these groups (i.e., I could
compare single to married, single to divorced, and married to divorced). But remember, for each t-test we
have a 5% chance of making a Type I error. If I do three t-tests, I roughly triple that chance: across
the entire set of comparisons, the chance of at least one Type I error is about 15%. If I found a significant
difference for any or all of these comparisons, I could not say that I was 95% confident that the
difference is not due to chance. I could only be about 85% confident. In science, that is not good enough!
Fortunately, there is a method for comparing more than two means. The procedure is called an Analysis
of Variance (ANOVA). When we have one independent variable (with 3 or more levels), we use a
procedure called a One-way ANOVA. There are other variations that can be used for factorial designs.
For example, if we did a study with two independent variables, we would use a Two-way ANOVA to
analyze the results. What do you think they would call the analysis used for a study with five independent
variables?
An ANOVA can be used to compare any number of group means and still maintain the probability of
making a Type I error at 5%. An ANOVA is called an omnibus test because it looks at how much the
whole set of means differ from each other and determines whether that pattern of differences would have
occurred less than 5% of the time by chance alone. If the omnibus test is not significant (p greater
than .05), then no individual comparison can be considered significant either.
Recall that when hypothesis testing we are deciding which of two possible hypotheses to accept as the most
likely conclusion to our study. The two hypotheses for the ANOVA are:
Scientific Hypothesis – there are differences between at least 2 of the groups.
Null Hypothesis – there are no differences among the groups.
The logic of this test is really simple (the mathematics is more complex – but the computer does it). Here
is the basic idea.
Review: When we discussed measures of dispersion much earlier in the term, we talked about the range,
deviation scores (the score minus the mean), the variance, and the standard deviation. The range is not a good
estimate of the dispersion of scores because it is greatly influenced by extreme scores. Deviation scores,
when summed, equal zero, so they too are useless as a measure of the dispersion of scores in a sample
distribution. To get around this problem we can square all the deviation scores (this makes them all
positive numbers), then sum them and divide by the sample size to obtain the mean squared amount by which
scores in the distribution deviate from the mean. This mean squared dispersion is called the variance of the
distribution. Since most of us have difficulty thinking in squared amounts, it is generally more useful to
think about the standard deviation of a distribution. The standard deviation is the square root of the
variance and can be thought of as, roughly, the average amount the scores in the distribution vary from the
distribution mean.
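In symbols, for a sample of N scores X with mean M:

Variance = Σ(X − M)² / N
Standard deviation = √Variance

(Some texts divide by N − 1 rather than N when estimating a population variance from a sample; the logic that follows is the same either way.)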
Why are we talking about variances instead of standard deviations? The problem with the standard
deviation is that it is a square root. We cannot simply add or subtract square roots: for example,
√9 + √9 = 6, which is not √18. Since variances are easier to work with
mathematically, we use them for this analysis. Keep in mind, however, that a variance is just a measure of
the dispersion of scores. The larger the variance, the more spread out the scores are in the distribution.
1) We start with the assumption that people in general differ from each other in happiness. I expect that
married people show the same variability in happiness that single people and divorced people do.
In other words, I might expect that marriage shifts the happiness ratings of the entire group, but does not
affect how much variability in happiness there is within the group. If there are differences between my
groups that are due to the independent variable (Marital Status), I expect the means of the groups to differ
– but not the variances.
We refer to the amount of variation that is associated, in general, with the dependent variable as Within
Groups Variance. It is the amount that scores of individuals within the same condition would be
expected to vary from their condition mean. It does not have anything to do with variation due to the level of the IV.
It can be thought of as variance that is due to random variation between individuals, or what statisticians
call Error. Having three groups (conditions), I have three estimates of this Within Groups Variance.
Using the average of these three Within Groups Variance estimates gives a better estimate of the general
variation of happiness in the population. We make a second assumption: that we can use Within Groups
Variance to estimate the amount of variation we would expect to find Between Groups if the
independent variable has no effect (if the null is true). So we assume that Within Groups Variance gives a
good estimate of Error variance. This estimate of Error variance is called the Mean Squared Error
(MSE).
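In symbols (using standard ANOVA notation rather than anything from the SPSS output itself): with k groups and N subjects in total,

MSE = Within Groups Sum of Squares / (N − k)

For our study, N − k = 60 − 3 = 57, which is where the Error degrees of freedom in the SPSS table later in this handout come from.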
That leaves just one last step: measuring the amount of variance between groups. The way this is
computed is fairly complex, but for this class we do not need to worry about that (SPSS will do it for us). If
we repeatedly measured samples drawn from the same population, over and over again, we would not
expect to get exactly the same mean each time. (Remember the distribution of sample means we discussed
when we discussed t-tests.) Similarly, if we sample three levels of our IV, even if they do not differ from
each other on the DV, we would not expect to get exactly the same means for each condition. The means
will vary simply due to random variation. But they might also vary due to differences in the level of the
IV. What is important is that you understand that Between Groups Variance is due to both random
variation and to variation due to the independent variable. Between Groups Variance is based on how much
the condition means vary around the overall (grand) mean of the study.
Between Groups Variance = Random Variance + Variance Due to the Independent Variable
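Its between-groups counterpart is computed from how far the k condition means fall from the grand mean:

MS Between = Between Groups Sum of Squares / (k − 1)

Here k − 1 = 3 − 1 = 2, the between-groups degrees of freedom reported in the SPSS table below.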
Assume for a moment that there is no effect of the independent variable. This means that the independent
variable adds zero variance to what we would expect to find due to chance. If there are no differences
between our groups, then we would expect our Within Groups Variance and our Between Groups Variance to
be equal. What would we expect the ratio of these two variance estimates to be if there were no effect of
the IV?

(Random Variance + Variance Due to the Independent Variable) / Random Variance
= (Random Variance + 0) / Random Variance = 1
We would expect to find a ratio of one. The amount that the ratio differs from 1 can be attributed to
variation due to the IV. So, if the IV has an effect, the ratio of Between Groups Variance to Within
Groups Variance will be greater than one (i.e., the numerator will be greater than the denominator). This ratio is
called an F ratio. Because we are only using estimates of variances, we do not expect that the ratio will
always be exactly one even when the IV does not have an effect. How much the F ratio needs to be above 1 for us
to be 95% sure that there really is an effect of the independent variable can be determined using probability
theory. Again (lucky us!) this is done for us by the computer program. SPSS provides the following type
of output.
Tests of Between-Subjects Effects
Dependent Variable: Happiness rating

Source            Sum of Squares    df    Mean Square    F        Sig.
Marital Status    31.033            2     15.517         5.255    .008
Error             168.300           57    2.953
Total             2126.000          60
F is the ratio of the Mean Square due to Marital Status (Between Groups Variance) to the Mean
Squared Error (Within Groups Variance). Sig. (in the final column) is the probability (p-value) of making
a Type I error. Because p is less than .05, we reject the null hypothesis and conclude that there is a
significant difference between at least two of the means. We would report this result by stating that “A
One-way ANOVA determined that happiness ratings significantly differ among marital status groups
(F(2,57) = 5.26, p = .008).”
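You can verify the reported F directly from the table: each Mean Square is its Sum of Squares divided by its degrees of freedom, and F is the ratio of the two Mean Squares:

MS Marital Status = 31.033 / 2 = 15.517
MSE = 168.300 / 57 = 2.953
F = 15.517 / 2.953 ≈ 5.26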
The numbers in the parentheses following the letter F are the degrees of freedom (df) associated with this
analysis. The first is the degrees of freedom between groups and the second is the degrees of freedom
within groups. They are related to the number of conditions and the number of subjects. They must be
reported in APA-style reports. They are always reported in parentheses after the letter F, in the order
between-groups df, then within-groups df, separated by a comma.
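If you ever want to check an SPSS result outside of SPSS, a one-way ANOVA is a one-liner in most statistics packages. Below is a minimal sketch in Python using scipy.stats.f_oneway. The happiness scores are invented placeholders (the handout reports only group means and SDs, not the raw data), so the output will not exactly match the table above:

from scipy import stats

# Hypothetical raw happiness ratings for the three marital-status groups.
# These are illustrative placeholders, not the actual study data.
single   = [5, 6, 4, 7, 5, 6, 3, 5, 7, 6, 4, 5, 6, 5, 7, 4, 6, 5, 8, 4]
married  = [7, 8, 6, 9, 7, 6, 8, 5, 7, 6, 8, 7, 5, 6, 9, 7, 4, 6, 7, 6]
divorced = [4, 5, 6, 3, 5, 4, 6, 7, 5, 4, 3, 6, 5, 4, 7, 5, 6, 4, 5, 5]

# f_oneway runs the omnibus one-way ANOVA and returns F and its p-value.
result = stats.f_oneway(single, married, divorced)
print(f"F(2,57) = {result.statistic:.3f}, p = {result.pvalue:.3f}")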
A significant result for a One-way ANOVA allows us to reject the null hypothesis and conclude that the
scientific hypothesis is a valid conclusion. (Remember, the scientific hypothesis is that there are
differences between at least 2 of the groups.) The One-way ANOVA does not tell us which groups differ
from each other. To determine that, we need to make individual comparisons.
When the F ratio is significant, SPSS continues the analysis by running Multiple Comparisons of the sets
of means to determine which means are significantly different from each other. These are very much like
doing t-tests between all combinations of the means. Why can we do them now? Having found a
significant F ratio from the One-way ANOVA, we know that the level of Type I error is limited to 5%
for the entire set of group comparisons. So, it is safe to go ahead and do the three separate comparisons.
Since the pattern of differences we found was unlikely to occur (p < .05) if we had just selected three samples
randomly from one population, we can be assured that any significant differences we find in the multiple
comparisons are not due to having done multiple tests (like throwing the dice three times) but are actually
due to the effects of the IV.
There are several ways of doing these Multiple Comparisons. I have had SPSS do a Least Significant
Difference (LSD) test. Looking at the Multiple Comparisons table below, the difference between each
pair of means is listed in the third column, and the p-value in the last column. The LSD multiple
comparisons analysis determined that married people rate themselves as significantly happier than
single people (p = .025) and than divorced people (p = .003), whereas single and divorced people do not
differ on happiness ratings.
Multiple Comparisons
Dependent Variable: Happiness rating
LSD

(I) Marital Status    (J) Marital Status    Mean Difference (I-J)    Std. Error    Sig.
Single                Married               -1.2500                  .5434         .025
Single                Divorced              .4500                    .5434         .411
Married               Single                1.2500                   .5434         .025
Married               Divorced              1.7000                   .5434         .003
Divorced              Single                -.4500                   .5434         .411
Divorced              Married               -1.7000                  .5434         .003
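Since LSD comparisons are essentially unadjusted pairwise t-tests carried out after a significant omnibus F, you can approximate a table like this yourself. Below is a minimal sketch in Python with groups simulated to match the reported means and SDs (placeholder data, not the study's, so the p-values will differ):

import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(1)
# Placeholder groups simulated to match the reported means and SDs.
groups = {
    "Single":   rng.normal(5.40, 1.67, 20),
    "Married":  rng.normal(6.65, 1.73, 20),
    "Divorced": rng.normal(4.95, 1.76, 20),
}

# LSD comparisons are unadjusted pairwise tests. (True LSD pools the
# within-groups error term across all groups; an ordinary two-sample
# t-test per pair is a close approximation.)
for (name_i, x), (name_j, y) in combinations(groups.items(), 2):
    result = stats.ttest_ind(x, y)
    print(f"{name_i} vs {name_j}: mean difference (I-J) = {x.mean() - y.mean():+.4f}, "
          f"p = {result.pvalue:.3f}")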
If you were answering an exam question, I would be looking for the following.
A statement about the one-way ANOVA:
The One-way ANOVA was significant (F(2,57) = 5.255, p = .008).
If the One-way ANOVA is significant, then you need to give a FULL interpretation of the Least
Significant Difference multiple comparisons.
If there are three conditions you need to make 3 comparisons.
If there are four conditions you need to make 6 comparisons.
If there are five conditions you need to make 10 comparisons.
(In general, k conditions require k(k − 1)/2 pairwise comparisons.)
In our case:
The LSD multiple comparisons analysis determined that married people rate themselves as significantly
happier than single people (p = .025) and than divorced people (p = .003), whereas single and divorced
people do not differ on happiness ratings.
Within Subject Designs
When the design of the study is within subjects, the post hoc multiple comparisons would be paired t-tests instead of LSD tests. We will see an example of this in the concept checks.
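For completeness, a paired t-test is just as easy to run outside SPSS. Below is a minimal sketch with invented within-subjects scores (the condition names and data are placeholders, not from any concept check):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Placeholder within-subjects data: the same 20 subjects measured in two
# conditions, so the scores in the two conditions are paired.
condition_a = rng.normal(5.5, 1.8, 20)
condition_b = condition_a + rng.normal(0.8, 1.0, 20)

# ttest_rel performs the paired (related-samples) t-test used for
# post hoc comparisons in within-subjects designs.
result = stats.ttest_rel(condition_a, condition_b)
print(f"t({len(condition_a) - 1}) = {result.statistic:.3f}, p = {result.pvalue:.3f}")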