Testing the Differences between Means
Statistics for Political Science
Levin and Fox
Chapter Seven
What is hypothesis testing?
Hypothesis testing means evaluating sample data collected about a particular
population to see how likely the sample results are, given our hypothesis
about the population.
If the sample results are plausible under the hypothesis about the
population, we retain the hypothesis and attribute any departure from our
expected results to pure chance based on sampling error.
If the sample results are unlikely (fewer than 5 chances in 100), we reject
the hypothesis.
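A rough sketch of this decision rule in Python (the function name and the example P values are illustrative, not from the text):

```python
# Minimal sketch of the decision rule described above: retain the hypothesis
# when the sample result is plausible under it, reject it when the result's
# probability falls below the conventional .05 cutoff.
def decide(p_value, alpha=0.05):
    """Return a decision given the probability of the sample result under
    the hypothesis being tested (p_value) and the cutoff (alpha)."""
    if p_value <= alpha:
        return "reject the hypothesis"
    return "retain the hypothesis"

print(decide(0.20))   # plausible result -> retain the hypothesis
print(decide(0.01))   # unlikely result  -> reject the hypothesis
```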
The Null Hypothesis
Null Hypothesis:
It is the hypothesis that says that two samples have been drawn from equivalent
populations. Any observed difference between samples is a result of chance
occurrence resulting from sampling error alone. The difference in sample
means does not imply a difference in population means.
To conclude that sampling error is responsible for obtaining a difference
between sample means is to retain the null hypothesis:
µ1 = µ2
Where µ1 = mean of the first population
µ2 = mean of the second population
To Retain: Retaining the null hypothesis does not imply that we have proven the
population means are equal, but rather that we lack sufficient evidence to say
otherwise (that is, to say that there is a difference between the populations).
The Research Hypothesis for Means Difference
Research Hypothesis:
If we reject the null hypothesis, then we automatically accept the research
hypothesis that a true population difference does exist. The difference
between sample means is too large to be accounted for by sampling error.
The research hypothesis for mean differences is symbolized by (the population
means are not equal):
µ1 ≠ µ2
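Putting the two hypotheses side by side (a LaTeX rendering; the H_0 and H_1 labels are standard notation rather than symbols used on these slides):

```latex
% Null and research hypotheses for the difference between two population means.
% H_0 / H_1 labels are conventional notation, not taken from the slides.
\[ H_0 \colon \mu_1 = \mu_2 \qquad\qquad H_1 \colon \mu_1 \neq \mu_2 \]
```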
Null and Research Hypothesis
Hypothesis: Example
Men are more permissive than women with regards to disciplining children.
Null Hypothesis:
The null hypothesis holds that there is no difference between Men and Women (as
populations) when it comes to disciplining children. Any observed difference is the
result of sampling error (rather than actual difference).
Null and Research Hypothesis
Hypothesis: Example
Men are more permissive than women with regards to disciplining children.
Research Hypothesis:
The research hypothesis holds that there IS a difference between Men and
Women (as populations) when it comes to disciplining children.
Sampling Distribution of Differences between Means
Sampling Distribution of Differences between Means:
Recall from our long-distance phone-calling example that if a researcher were to
take multiple samples, he or she could build a sampling distribution of means
(rather than of raw scores).
Paired Samples:
What if the researcher, while gathering samples, studies or compares two
samples at a time?
Sampling Distribution of Differences between Means
Example: Child Rearing: Comparing Males and Females
To test the difference, a researcher constructs a scale of permissiveness
from 1 (Strict: not very permissive) to 100 (very permissive). Then they
study two random samples of 30 men and 30 women.
Results:
Women: (sample mean) = 58.0 (more permissive)
Men: (sample mean) = 54.0 (less permissive)
Difference Between Means: (58.0 – 54.0) = + 4.0
Is the difference the result of chance alone/sampling error (Null hypothesis)? Or is there
a difference between men and women (as populations) (Research Hypothesis)?
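Before asking that question, note that the difference itself is simple arithmetic; here is a minimal sketch, where the individual scores are invented placeholders chosen only so the sample means match the slide's 58.0 and 54.0:

```python
from statistics import mean

def mean_difference(scores_first, scores_second):
    """Difference between sample means: first sample minus second sample."""
    return mean(scores_first) - mean(scores_second)

# Hypothetical permissiveness scores (1-100 scale), invented for illustration.
women = [58.0] * 30   # stand-in for 30 women's scores averaging 58.0
men = [54.0] * 30     # stand-in for 30 men's scores averaging 54.0

print(mean_difference(women, men))   # 4.0 -> the slide's +4.0
```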
Sampling Distribution of Differences between Means
Example: Child Rearing: Comparing Males and Females
What if the researcher continued to take samples, collecting 70 additional pairs
of samples, each containing 30 women and 30 men? Each pair would give us, as
the first pair did, a difference between the means.
Sampling Distribution of Differences between Means:
And, just as we did first with raw scores and then with sample means, once we
have a distribution of mean differences we can construct a Sampling
Distribution of Differences between Means.
Sampling Distribution of Differences between Means
Example: Child Rearing: Comparing Males and Females
What if the researcher continued to take samples, collecting 70 additional pairs
of samples, each containing 30 women and 30 men?
Population: µ = ?
Note: You always subtract the second sample mean (men) from the first sample
mean (women).
Pair   Women (mean of 30)   Men (mean of 30)   Difference
1      57                   54                 +3
2      55                   56                 −1
3      59                   57                 +2
…
70
The Purpose and Function of a Sampling Distribution of
Differences between Means
Child Rearing: Males and Females
Here is what it looks like as a
frequency distribution.
Mean Difference     f
+3                  1
+2                  5
+1                  7
 0                 13
−1                  8
−2                  4
−3                  1
               N = 35
Testing Hypotheses with the Distribution of Differences
between Means
Sampling Distribution of Differences between Means:
1) It assumes that all sample pairs differ only by virtue of sampling error
and not as a function of true population differences.
2) The mean of the differences between means equals zero (this is so
because the positive and negative differences tend to cancel each
other out).
3) It approximates the normal curve (most of the mean differences fall near
zero, which is expected, since any difference between means is a product
of sampling error). A simulation sketch after this list illustrates points 2 and 3.
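These properties can be checked with a small simulation sketch (the population mean of 56 and standard deviation of 10 are hypothetical values, not from the textbook example): draw many pairs of samples of 30 from the same population, so that the null hypothesis is true by construction, and examine the resulting mean differences.

```python
import random
from statistics import mean, stdev

random.seed(1)

# Simulate many pairs of samples drawn from the SAME population, so any
# difference between sample means reflects sampling error alone.
def sample_mean(n=30, mu=56, sigma=10):
    return mean(random.gauss(mu, sigma) for _ in range(n))

differences = [sample_mean() - sample_mean() for _ in range(10_000)]

print(round(mean(differences), 2))    # close to 0, as property 2 states
print(round(stdev(differences), 2))   # spread of the distribution of differences

# A crude frequency table, echoing the slide's table of mean differences:
for d in range(-3, 4):
    count = sum(1 for x in differences if d - 0.5 <= x < d + 0.5)
    print(f"{d:+d}: {count}")
```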
Testing Hypotheses with the Distribution of Differences
between Means
Probability and Sampling Distribution of Differences between Means:
Since the sampling distribution of differences between means approximates the
normal curve, we can use the properties of the normal curve to make
statements of probability about mean differences, specifically whether a
given mean difference is likely to be a result of chance/sampling error or
of a true population difference.
Testing Hypotheses with the Distribution of Differences
between Means
Figure: The sampling distribution of differences between means. Differences
closer to zero are more likely to be sampling error (Null); differences farther
from zero are less likely to be sampling error (Research).
Probability and Sampling Distribution of Differences between
Means:
If the obtained difference between means lies so far from a difference of zero
that it has only a small probability of occurrence in the sampling distribution
of differences between means, we reject the null hypothesis.
If our sample mean difference falls so close to zero that its probability of
occurrence is large, we must retain the null hypothesis and treat the
obtained difference as a sampling error.
Testing Hypotheses with the Distribution of Differences
between Means
Example: Child Rearing: Comparing Males and Females
What if the researcher examines just one pair (as opposed to 70 pairs),
containing 30 men and 30 women? (Subtract the second mean from the first.)
Results:
Women: (sample mean) = 45.0
Men: (sample mean) = 40.0
Difference Between Means: (45.0 – 40.0) = + 5.0
How far does + 5.0 fall from the mean of zero?
Child Rearing: Comparing Males and Females
So, to determine how far our obtained difference between means lies from the
mean difference of zero, we must translate our obtained difference into units
of standard deviation.
Step 1: Recall the formula for translating a raw score into units of standard
deviation:
z = (X − µ) / σ
where
X = raw score
µ = mean of the distribution of raw scores
σ = standard deviation of the distribution of raw scores
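As a quick worked example of this formula (the raw score of 70, mean of 60, and standard deviation of 10 are hypothetical values, not from the slides):

```latex
% Hypothetical worked example: raw score 70, distribution mean 60, SD 10.
\[ z = \frac{X - \mu}{\sigma} = \frac{70 - 60}{10} = +1.0 \]
```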
Child Rearing: Comparing Males and Females
Step 2a: Use this formula as a step in translating the mean scores in a distribution
of sample means into units of standard deviation:
z = (X̄ − µ) / σ_X̄
where
X̄ = sample mean
µ = population mean (mean of means)
σ_X̄ = standard error of the mean (standard deviation of the distribution of means)
Child Rearing: Comparing Males and Females
Step 2b: Translate our sample mean difference into units of standard deviation:
z = ((X̄1 − X̄2) − 0) / σ_(X̄1 − X̄2)
where
X̄1 = mean of the first sample
X̄2 = mean of the second sample
0 = the mean of the sampling distribution of differences between means
(we assume that µ1 − µ2 = 0)
σ_(X̄1 − X̄2) = standard error of the difference between means (standard deviation
of the sampling distribution of differences between means)
We can reduce this equation to:
z = (X̄1 − X̄2) / σ_(X̄1 − X̄2)
Child Rearing: Comparing Males and Females
Result (assuming σ_(X̄1 − X̄2) equals 2):
z = (45 − 40) / 2 = +2.5
Thus, a difference of 5 between the means of the two samples (women and
men) falls 2.5 standard deviations from a mean of zero.
What is the probability that a difference of 5 between sample means
could be caused by sampling error?
The probability of getting a difference of 5 or more (above or below the mean
of zero) because of sampling error alone is roughly P = .01 (1 in 100). For a
difference of 5 or more in one direction only, P = .006 (about 0.6 in 100).
Figure: Normal curve with z = 2.50 marked. The area between zero and z = 2.50
is 49.38% (P = .4938); the area beyond z = 2.50 in one tail is 0.62% (P = .006);
the area beyond ±2.50 in both tails is 1.24% (P = .012).
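These tail areas can be reproduced with the standard normal distribution in Python's standard library (Python 3.8+); the standard error of 2 is the value the slides assume:

```python
from statistics import NormalDist

# Slide values: sample means of 45 (women) and 40 (men), and an assumed
# standard error of the difference between means of 2.
mean_women, mean_men, std_error_diff = 45.0, 40.0, 2.0

z = (mean_women - mean_men) / std_error_diff   # standard deviations from zero
one_tail = 1 - NormalDist().cdf(z)             # area beyond z = 2.50 in one tail
two_tail = 2 * one_tail                        # area beyond +/-2.50 in both tails

print(round(z, 2))          # 2.5
print(round(one_tail, 4))   # about 0.0062 -> the slide's P = .006
print(round(two_tail, 4))   # about 0.0124 -> the slide's P = .012 (roughly 1 in 100)
```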
Levels of Significance
Is a mean difference of 5, which has only a P = .01 chance of resulting from
sampling error, statistically significant? That is, does it reflect a population
difference rather than chance?
Levels of Significance: We need to establish a level of significance to determine
whether or not our obtained sample difference is statistically significant.
The α (alpha) value is the level of probability at which the null hypothesis can be
rejected with confidence and the research hypothesis accepted with
confidence.
We decide to reject the null hypothesis if the probability is very small. This is
symbolized as
P ≤ .05
(P is less than or equal to .05)
Things to Know about Levels of Significance:
A small probability is symbolized by
– P ≤ .05
Alpha is generally set at the .05 level of significance (corresponding to 95% confidence)
– α = .05
This means that we are willing to reject the null hypothesis if an obtained
sample difference occurs by chance less than 5 times out of 100.
Thus, a mean difference of 5 between men and women with regard to their
approach to child-rearing is statistically significant; it is not the result of
sampling error but of differences between the populations.
Critical Values
In this case, the z scores are called critical values.
With α = .05, the z score ±1.96 is a critical value.
If we obtain a z score that exceeds 1.96 (z>1.96 or z<-1.96), it is statistically
significant.
Critical or rejection regions are the areas beyond the critical value, toward the
tails of the normal curve; scores that fall within these areas lead us to reject
the null hypothesis.
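A minimal sketch of this decision rule in Python, using the z score of 2.50 obtained earlier:

```python
# Two-tailed decision rule at alpha = .05: the critical value is 1.96.
critical_value = 1.96
z = 2.50   # obtained z score from the child-rearing example

if abs(z) > critical_value:
    print("statistically significant: reject the null hypothesis")
else:
    print("not significant: retain the null hypothesis")
```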
Critical Values: Z Score
Figure: Normal curve showing critical values at z = −1.96 and z = +1.96. The
middle 95% of the area (47.5% on each side of zero) lies between the critical
values; 2.50% lies in each tail beyond them. A z score that exceeds ±1.96 is
statistically significant: reject the null hypothesis.
Critical Values: Z Score
Figure: The same curve with the decision regions labeled. Scores between
z = −1.96 and z = +1.96 are statistically insignificant (accept the null
hypothesis); scores in the 2.50% tails beyond ±1.96 are statistically
significant (reject the null hypothesis).
Significance levels
Significance levels can be set at any degree of probability.
NOTE: Levels of significance do not give us an absolute statement about the
correctness of the null hypothesis. Whether we accept or reject the null
hypothesis, we may still be wrong.
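The two-tailed critical value that matches any chosen significance level can be looked up from the standard normal distribution; a sketch using Python's standard library:

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-tailed critical value: the z score leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_z(0.05), 2))   # 1.96, the value used above
print(round(critical_z(0.01), 2))   # 2.58, used with the .01 level below
```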
Type I Errors
Type I Error: Rejecting the null hypothesis when it should have been retained.
For example, if we reject the null hypothesis at the .05 level of significance
and conclude that there are gender differences in child-rearing attitudes,
then there are 5 chances out of 100 that we are wrong; that is, P = .05 that we
committed a Type I error and that gender actually has no effect.
The more stringent our level of significance (the farther out in the tail it lies),
the less likely we are to make a Type I error.
The probability of a Type I error is represented by α or alpha.
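A simulation sketch of this point (the population mean of 56 and standard deviation of 10 are hypothetical): when two samples really do come from the same population, rejections at the .05 level occur in roughly 5 out of 100 repeated studies, which is the Type I error rate α.

```python
import random
from statistics import mean

random.seed(2)

def z_for_one_study(n=30, mu=56, sigma=10):
    """z for the difference between two sample means drawn from the SAME
    population, using the known standard error sigma * sqrt(2 / n)."""
    m1 = mean(random.gauss(mu, sigma) for _ in range(n))
    m2 = mean(random.gauss(mu, sigma) for _ in range(n))
    std_error_diff = sigma * (2 / n) ** 0.5
    return (m1 - m2) / std_error_diff

trials = 10_000
type_i = sum(1 for _ in range(trials) if abs(z_for_one_study()) > 1.96)
print(type_i / trials)   # roughly 0.05: about 5 rejections per 100 true nulls
```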
Type II Errors
Type II Error: Accepting the null hypothesis when it should have been
rejected
The farther out in the tail of the curve that our critical value falls, the
greater the risk of a Type II error.
The research hypothesis may still be correct, despite the decision to reject
it and retain the null hypothesis.
One method for reducing the risk of a Type II error is to increase the size
of the sample so that the true population difference is more likely to be
represented.
The probability of a Type II error is β or beta.
Error Types: Type I
Type I: Rejecting the null hypothesis when we should have retained it.
Example: 95% confidence level, α = .05, critical values z = ±1.96.
The larger the significance level (and thus the larger the percentage in the
tails), the more likely we are to mistakenly reject the null hypothesis.
Figure: Normal curve with 2.50% rejection regions in each tail, beyond
z = −1.96 and z = +1.96.
Error Types: Type I
Type I: Rejecting the null hypothesis when we should have retained it.
Example: 99% confidence level, α = .01, critical values z = ±2.58.
The smaller the significance level (and thus the smaller the percentage in the
tails), the less likely we are to mistakenly reject the null hypothesis.
Figure: Normal curve with 0.5% rejection regions in each tail, beyond
z = −2.58 and z = +2.58.
Error Types: Type II
Type II: Accepting the null hypothesis when we should have rejected it.
Example: 99% confidence level, α = .01, critical values z = ±2.58.
The smaller the significance level (and thus the smaller the percentage in the
tails), the more likely we are to mistakenly accept the null hypothesis.
Figure: Normal curve with 0.5% rejection regions in each tail, beyond
z = −2.58 and z = +2.58.
Some notes on Type I and Type II Errors
The probabilities of Type I and Type II errors are inversely related.
The larger the level of significance, the larger the chance of a Type I error.
We predetermine our level of significance for a hypothesis test depending on
which type of error would be more damaging or costly.
If it would be far worse to reject a true null hypothesis (that is, to suggest
statistical significance, or different populations, where there is none: a Type
I error) than to retain a false null hypothesis (to suggest there is no
population difference where there is one: a Type II error), we should use a
smaller level of significance, such as α = .01.
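A simulation sketch of this trade-off (the population difference of 5, baseline mean of 56, and standard deviation of 10 are hypothetical): with a real difference built in, tightening α from .05 to .01 increases the share of studies that mistakenly retain the null, which is the Type II error rate.

```python
import random
from statistics import mean, NormalDist

random.seed(3)

def z_one_study(true_diff, n=30, sigma=10):
    """z for a pair of samples whose populations really differ by true_diff."""
    m1 = mean(random.gauss(56 + true_diff, sigma) for _ in range(n))
    m2 = mean(random.gauss(56, sigma) for _ in range(n))
    return (m1 - m2) / (sigma * (2 / n) ** 0.5)

trials = 5_000
for alpha in (0.05, 0.01):
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    misses = sum(1 for _ in range(trials) if abs(z_one_study(true_diff=5)) <= crit)
    print(alpha, round(misses / trials, 3))   # Type II rate grows as alpha shrinks
```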
The Difference between P and α
P is the exact probability of obtaining our sample result (or one even more
extreme) if the null hypothesis is true.
Alpha is the threshold below which a P value is considered so small that we
decide to reject the null hypothesis.
We reject the null hypothesis if the P value is less than (or equal to) the alpha value.
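A minimal sketch comparing the two quantities for the child-rearing example:

```python
from statistics import NormalDist

# P for the obtained z of 2.50 (two tails), compared with alpha = .05.
p_value = 2 * (1 - NormalDist().cdf(2.50))
alpha = 0.05

print(round(p_value, 4))   # about 0.0124
print(p_value <= alpha)    # True -> reject the null hypothesis
```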
The Difference between P and α
Example: a mean difference of 5 has a one-tailed P of .006, or .006 × 2 = roughly
.01 in both tails (about 1 chance in 100), whereas α = .05 cuts off the rejection
region at .025 in each tail (.025 × 2 = .05, or 5 chances in 100).
Figure: Normal curve comparing the two quantities. For the obtained z = 2.50,
the area between zero and z is 49.38% (P = .4938) and the area beyond z in one
tail is 0.62% (P = .006). For α = .05, 95.0% of the area lies between the
critical values of z = ±1.96, with 2.50% (P = .025) in each tail.
The Difference between P and α
Example: a mean difference of 5 has a one-tailed P of .006, or .006 × 2 = roughly
.01 in both tails (about 1 chance in 100), whereas α = .05 cuts off the rejection
region at .025 in each tail (.025 × 2 = .05, or 5 chances in 100).
Any mean difference whose probability under the null hypothesis is below 5
chances in 100 is statistically significant and supports the research hypothesis.
Figure: Normal curve showing the obtained difference at z = 2.50 (0.62% beyond
it in one tail, P = .006) falling farther out than the α = .05 critical value of
z = 1.96 (2.50% beyond it in each tail, P = .025).