Testing the Differences between Means
Statistics for Political Science
Levin and Fox
Chapter Seven
Exam 4 Review Topics
1. Null hypothesis
2. Research hypothesis
3. Standard error of the difference between means
4. t score/ratio
5. Degrees of freedom for the t score/ratio
6. t chart (Table C)
7. Sum of squares (total, within, between)
8. Mean square
9. Between-groups and within-groups degrees of freedom
10. Between-groups and within-groups mean square
11. F ratio
Standard Error of the Difference between Means?
Are the two populations the same (µ1 = µ2)? If so, retain the null hypothesis.
Are the two populations different (µ1 ≠ µ2)? If so, reject the null hypothesis (and accept the research hypothesis).
What is hypothesis testing?
Hypothesis testing is when we evaluate sample data collected about a particular population and see how likely the sample results are, given our hypothesis about the population.
If the sample results are plausible under the hypothesis about the population, we retain the hypothesis and attribute any departure from our expected results to pure chance based on sampling error.
If the sample results are unlikely (less than 5 chances in 100), we reject the hypothesis.
The Null Hypothesis
Null Hypothesis:
The hypothesis that two samples have been drawn from equivalent populations. Any observed difference between samples is a chance occurrence resulting from sampling error alone; the difference in sample means does not imply a difference in population means.
To conclude that sampling error is responsible for an obtained difference between sample means is to retain the null hypothesis:
µ1 = µ2
where µ1 = mean of the first population and µ2 = mean of the second population.
To retain: this does not imply that we have proven the population means are equal, but rather that we lack sufficient evidence to say otherwise (that is, to say that there is a difference between the populations).
The Research Hypothesis for Mean Differences
Research Hypothesis:
If we reject the null hypothesis, then we automatically accept the research hypothesis that a true population difference does exist: the difference between sample means is too large to be accounted for by sampling error.
The research hypothesis for mean differences is symbolized by (the population means are not equal):
µ1 ≠ µ2
Sampling Distribution of Differences between Means
Sampling Distribution of Differences between Means:
Recall from our long-distance phone calling example that if a researcher were to take multiple samples, he/she could build a sampling distribution of means (rather than of raw scores).
Paired Samples:
What if the researcher, while gathering samples, studies or compares two samples at a time? The differences between each pair of sample means then form a sampling distribution of differences between means.
Testing Hypotheses with the Distribution of Differences between Means
Sampling Distribution of Differences between Means:
1) It assumes that all sample pairs differ only by virtue of sampling error and not as a function of true population differences.
2) The mean of the differences between means equals zero (the resulting positive and negative differences tend to cancel each other out).
3) It approximates the normal curve (most of the mean differences fall near zero, which is expected, since any difference between means is a product of sampling error).
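These three properties can be made concrete with a small simulation. The following is a minimal sketch (assuming NumPy is available; the population parameters and sample sizes are made up for illustration): drawing many pairs of samples from the same population yields differences between means that center on zero and approximate the normal curve.

```python
import numpy as np

rng = np.random.default_rng(42)

# One population; both samples in each pair are drawn from it,
# so any difference between their means is pure sampling error.
population = rng.normal(loc=50, scale=10, size=100_000)

diffs = []
for _ in range(10_000):
    sample1 = rng.choice(population, size=30)
    sample2 = rng.choice(population, size=30)
    diffs.append(sample1.mean() - sample2.mean())

diffs = np.array(diffs)
print(f"mean of differences: {diffs.mean():.3f}")  # close to zero
print(f"std of differences:  {diffs.std():.3f}")   # the standard error
```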
Testing Hypotheses with the Distribution of Differences between Means
Probability and the Sampling Distribution of Differences between Means:
Since the sampling distribution of differences between means approximates the normal curve, we can use the properties of the normal curve to make statements of probability about mean differences, specifically whether a given mean difference is likely the result of chance/sampling error or of true population differences.
Testing Hypotheses with the Distribution of Differences between Means
[Figure: the closer a mean difference falls to zero, the more likely it is to be sampling error (null hypothesis); the further from zero, the less likely it is to be sampling error (research hypothesis).]
Probability and the Sampling Distribution of Differences between Means:
If the obtained difference between means lies so far from a difference of zero that it has only a small probability of occurrence in the sampling distribution of differences between means, we reject the null hypothesis.
If our sample mean difference falls so close to zero that its probability of occurrence is large, we must retain the null hypothesis and treat the obtained difference as sampling error.
Testing Hypotheses with the Distribution of Differences between Means
Example: Child Rearing: Comparing Males and Females
What if the researcher examines one sample pair (as opposed to 70 pairs), containing 30 men and 30 women? (Subtract the second mean from the first.)
Results:
Women: (sample mean) = 45.0
Men: (sample mean) = 40.0
Difference between means: (45.0 − 40.0) = +5.0
How far does +5.0 fall from the mean of zero?
Child Rearing: Comparing Males and Females
Step 2b: Translate our sample mean difference into units of standard deviation:

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

where
$\bar{X}_1$ = mean of the first sample
$\bar{X}_2$ = mean of the second sample
0 = the mean of the sampling distribution of differences between means (we assume that µ1 − µ2 = 0)
$\sigma_{\bar{X}_1 - \bar{X}_2}$ = standard error of the difference between means (the standard deviation of the distribution of differences between means)

We can reduce this equation to:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
Child Rearing: Comparing Males and Females
Result (assuming $\sigma_{\bar{X}_1 - \bar{X}_2}$ equals 2):

$$z = \frac{45 - 40}{2} = +2.5$$

Thus, a difference of 5 between the means of the two samples (women and men) falls 2.5 standard deviations from a mean of zero.
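As a quick check of this arithmetic (a sketch assuming SciPy is available), norm.sf gives the area beyond z in one tail; doubling it gives the two-tailed probability used on the next slide.

```python
from scipy.stats import norm

mean_diff = 45.0 - 40.0        # difference between sample means
se_diff = 2.0                  # assumed standard error from the example

z = (mean_diff - 0) / se_diff  # distance from the null value of zero
p_two_tailed = 2 * norm.sf(z)  # area beyond +/-z in both tails

print(f"z = {z:.2f}")                        # 2.50
print(f"two-tailed P = {p_two_tailed:.4f}")  # about .0124, roughly .01
```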
What is the probability that a difference of 5 between sample means could be caused by sampling error?
The probability of getting a difference of 5 or more (above or below the mean) because of sampling error is roughly P = .01 (1 in 100).
[Figure: normal curve with the obtained difference at z = 2.50; the area between 0 and 2.50 is .4938 (49.38%), leaving .0062 (.62%) in each tail, or about 1.24% (P = .012) in both tails combined.]
Levels of Significance
Is a mean difference of 5, which has only a P = .01 chance of resulting from sampling error, statistically significant? That is, does it reflect a population difference?
Levels of Significance: We need to establish a level of significance to determine whether or not our obtained sample difference is statistically significant.
The α (alpha) value is the level of probability at which the null hypothesis can be rejected with confidence and the research hypothesis accepted with confidence.
We decide to reject the null hypothesis if the probability is very small. This is symbolized as
P ≤ .05
Things to Know about Levels of Significance:
A small probability is symbolized by
– P ≤ .05
Alpha is generally defined as (corresponding to a 95% confidence level)
– α = .05 level of significance
This means that we are willing to reject the null hypothesis if an obtained sample difference would occur by chance less than 5 times out of 100.
Thus, a mean difference of 5 between men and women with regard to their approach to child rearing is statistically significant; it is not the result of sampling error but of differences between the populations.
Critical Values
In this case, the z scores are called critical values.
With α = .05, the z score ±1.96 is a critical value.
If we obtain a z score that exceeds the critical value (z > 1.96 or z < −1.96), it is statistically significant.
Critical or rejection regions are the areas beyond the critical z score toward the tails of the normal curve; scores within these areas lead us to reject the null hypothesis.
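The ±1.96 cutoff can itself be recovered from the normal curve. A minimal sketch (SciPy assumed) showing where the critical value comes from and how the rejection rule is applied:

```python
from scipy.stats import norm

alpha = 0.05
# Two-tailed test: split alpha evenly across the two tails.
z_critical = norm.ppf(1 - alpha / 2)
print(f"critical value: {z_critical:.2f}")  # 1.96

z_obtained = 2.50
if abs(z_obtained) > z_critical:
    print("statistically significant: reject the null hypothesis")
else:
    print("not significant: retain the null hypothesis")
```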
The Difference between P and α
P is the exact probability of obtaining our sample data (or data even more extreme) if the null hypothesis is true.
Alpha is the threshold below which P is considered so small that we decide to reject the null hypothesis.
We reject the null hypothesis if the P value is less than the alpha value.
Critical Values: z Score
[Figure: normal curve with critical values at z = −1.96 and z = +1.96; 47.5% of the area lies between the mean and each critical value (95% in total), with 2.50% in each tail. Scores beyond ±1.96 fall in the rejection regions (statistically significant: reject the null hypothesis); scores between them are not significant (retain the null hypothesis).]
If we obtain a z score that exceeds 1.96 in either direction, it is called statistically significant.
The Difference between P and α
Example: a mean difference of 5 has a P of .006 × 2 = roughly .01 (1 chance in 100), whereas α = .05 cuts off the null hypothesis at .025 × 2 = .05 (5 chances in 100).
[Figure: normal curve showing the obtained difference at z = 2.50 (.006 in the tail, .62%) falling beyond the α = .05 critical value of z = 1.96 (.025 in the tail, 2.50%).]
Any mean difference whose probability falls below 5 chances in 100 supports the research hypothesis and is statistically significant.
Testing Differences between Means, continued
Statistics for Political Science
Levin and Fox
Chapter Seven
Testing Differences between Means
To test the significance of a mean difference, we need the standard deviation of the distribution of mean differences.
However, we rarely know this standard deviation, since we rarely have population data. Fortunately, it can be estimated based on the two samples that we draw from the same population.
Testing Differences between Means
Steps for testing a mean difference using two sample means (sample data):
1. Calculate the standard error of the difference between means
2. Calculate the t score
3. Calculate the degrees of freedom
4. Determine the alpha (.05)
5. Consult the t chart (Table C)
Standard Error of the Difference between Means
Step One: Calculate the standard error of the difference between means:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$

The formula for $s_{\bar{X}_1 - \bar{X}_2}$ combines the information from the two samples.
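The formula translates directly into Python. This is a sketch only (the function and variable names are mine, not the text's); note that s1 and s2 here are sample standard deviations computed with N in the denominator, which is why the numerator uses N·s² rather than (N−1)·s².

```python
import math

def se_diff_between_means(n1, s1, n2, s2):
    """Estimated standard error of the difference between means,
    pooling the two sample variances as in the formula above."""
    pooled_variance = (n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2)
    size_factor = (n1 + n2) / (n1 * n2)
    return math.sqrt(pooled_variance * size_factor)

# Using the liberals/conservatives example that follows:
print(se_diff_between_means(25, 12, 35, 14))  # about 3.52
```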
Step Two: Calculate the t score:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$

REMEMBER: We use t instead of z because we do not know the true population standard deviation.
We aren't finished yet!
Step 3: Calculate the degrees of freedom:
df = N1 + N2 − 2
Step 4: Determine the alpha (.05)
Step 5: Consult the t chart (Table C):

df     .20    .10    .05    .02    .01    .001
40     1.303  1.684  2.021  2.423  2.704  3.551
Testing the Difference between Means
Example: Let's say that we have the following information about two samples, one of liberals and one of conservatives, on the progressive scale:

Liberals: $N_1 = 25$, $\bar{X}_1 = 60$, $s_1 = 12$
Conservatives: $N_2 = 35$, $\bar{X}_2 = 49$, $s_2 = 14$
We can use this information to calculate the estimate of the standard error of the difference between means. We start with our formula:

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$

$$= \sqrt{\left(\frac{(25)(12)^2 + (35)(14)^2}{25 + 35 - 2}\right)\left(\frac{25 + 35}{(25)(35)}\right)}$$

$$= \sqrt{\left(\frac{3{,}600 + 6{,}860}{58}\right)\left(\frac{60}{875}\right)}$$

$$= \sqrt{(180.3448)(.0686)} = \sqrt{12.3717} = 3.52$$

The standard error of the difference between means is 3.52.
We can now use our standard error result to translate the difference between sample means into a t ratio:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{60 - 49}{3.52} = \frac{11}{3.52} = 3.13$$

REMEMBER: We use t instead of z because we do not know the true population standard deviation.
We aren’t finished yet!
Turn to Table C.
1) df = N1 + N2 − 2.
2) For each standard deviation that we estimate, we lose 1 degree of freedom from the total number of cases.
N = 60
df = (25 + 35 − 2) = 58
In Table C, use the row for df = 40, since 58 is not given.
We see that our t value of 3.13 exceeds all the standard critical values except the one for the .001 level.
df     .20    .10    .05    .02    .01    .001
40     1.303  1.684  2.021  2.423  2.704  3.551
Therefore, based on the level of significance we established BEFORE our study, we reject the null hypothesis; our t exceeds the critical values at the .10, .05, .02, and .01 levels (though not at .001).
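The whole five-step procedure for this example can be verified in a few lines (a sketch; SciPy is assumed, and where Table C forced us to fall back to the df = 40 row, scipy.stats.t gives the exact df = 58 cutoff):

```python
import math
from scipy.stats import t as t_dist

n1, mean1, s1 = 25, 60, 12  # liberals
n2, mean2, s2 = 35, 49, 14  # conservatives

# Step 1: standard error of the difference between means
se = math.sqrt(((n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2))
               * ((n1 + n2) / (n1 * n2)))  # about 3.52

t_ratio = (mean1 - mean2) / se             # Step 2: about 3.13
df = n1 + n2 - 2                           # Step 3: 58

alpha = 0.05                               # Step 4
critical = t_dist.ppf(1 - alpha / 2, df)   # Step 5: about 2.00

print(f"se = {se:.2f}, t = {t_ratio:.2f}, df = {df}, critical = {critical:.2f}")
print("reject H0" if abs(t_ratio) > critical else "retain H0")
```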
Analysis of Variance
Statistics for Political Science
Levin and Fox
Chapter Seven
Analysis of Variance
Sometimes it is necessary to make comparisons across three or more groups.
The analysis of variance yields an F ratio (which we will cover a little later) that indicates the size of the difference between groups relative to the size of the variation within each group.
The larger the F ratio, the greater the probability of rejecting the null hypothesis and accepting the research hypothesis.
A Note on Process:
The analysis of variance is a multi-step process:
1. Sum of Squares
2. Mean Square
3. F Ratio
The F ratio (compared against Table D) is the final step in the analysis of variance.
Sum of Squares
Sum of Squares:
The sum of squares is found by:
1. squaring the deviations from the mean of the distribution, and
2. adding them up.
Now we will work to understand the components of the analysis of variance.
The Sum of Squares: found by squaring the deviations from the mean of a distribution and adding these squared deviations together:

$$\sum (X - \bar{X})^2$$

This is the general equation you must know to calculate the different types of sums of squares.
Sum of Squares
Comparing Groups:
When groups are compared, there is more than one type of sum of squares:
Total Sum of Squares (SS total)
Between-Groups Sum of Squares (SS between)
Within-Groups Sum of Squares (SS within)
Each type represents the sum of squared deviations from a mean.
We will use THESE formulas for computation:
The Computational Formulas for the Sums of Squares:

$$SS_{\text{total}} = \sum X_{\text{total}}^2 - N_{\text{total}}\,\bar{X}_{\text{total}}^2$$

$$SS_{\text{within}} = \sum X_{\text{total}}^2 - \sum N_{\text{group}}\,\bar{X}_{\text{group}}^2$$

$$SS_{\text{between}} = \sum N_{\text{group}}\,\bar{X}_{\text{group}}^2 - N_{\text{total}}\,\bar{X}_{\text{total}}^2$$

where
$\sum X_{\text{total}}^2$ = all the scores squared and then summed
$\bar{X}_{\text{total}}$ = total mean of all groups combined
$\bar{X}_{\text{group}}$ = mean of any group
$N_{\text{total}}$ = total number of scores in all groups combined
$N_{\text{group}}$ = number of scores in any group
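A sketch of these computational formulas in Python (NumPy assumed; the three groups are toy data invented for illustration). It also confirms the identity SS total = SS between + SS within:

```python
import numpy as np

# Toy data: three groups of scores (illustrative only).
groups = [np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0]),
          np.array([8.0, 9.0, 10.0])]

all_scores = np.concatenate(groups)
n_total = all_scores.size
grand_mean = all_scores.mean()

sum_sq_scores = np.sum(all_scores**2)                   # all scores squared, then summed
group_term = sum(g.size * g.mean()**2 for g in groups)  # sum of N_group * (group mean)^2

ss_total = sum_sq_scores - n_total * grand_mean**2      # 60.0
ss_within = sum_sq_scores - group_term                  # 6.0
ss_between = group_term - n_total * grand_mean**2       # 54.0

assert np.isclose(ss_total, ss_between + ss_within)
print(ss_total, ss_between, ss_within)
```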
Analysis of variance is a multi-step process:
1) Sum of Squares:
a. sum of scores
b. sum of squared scores
c. number of scores
d. mean
e. SS total
f. SS within
g. SS between
2) Mean Square:
a. MS between
b. MS within
c. df between
d. df within
3) F Ratio (Table D)
Before applying the formulas, we have to find the (1) sum of scores, (2) sum of squared scores, (3) number of scores, and (4) mean. Next we calculate the sums of squares.
Mean Square (MS)
The value of the sum of squares becomes larger as variation increases.
The sum of squares also increases with sample size. Because of this, the SS cannot be viewed as a true measure of variation.
Another measure of variation that we can use is the mean square.
Calculating the mean square for within and between groups:

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$$

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$$

where
MS between = between-groups mean square
SS between = between-groups sum of squares
df between = between-groups degrees of freedom
MS within = within-groups mean square
SS within = within-groups sum of squares
df within = within-groups degrees of freedom

Use the following equations to obtain the correct degrees of freedom:

$$df_{\text{between}} = k - 1$$
$$df_{\text{within}} = N_{\text{total}} - k$$

where k = number of groups.
Calculating the mean square (using Table 8.2 data):
df between = k (number of groups) − 1
df within = N total (number of cases) − k (number of groups)
MS between = SS between / df between
MS within = SS within / df within
The F Ratio
The analysis of variance yields an F ratio, which compares the variation between groups with the variation within groups:

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}$$

The larger our calculated F ratio, the greater the likelihood of a statistically significant result.
1. Go to Table D in Appendix B.
2. Use df between (the numerator) across the top of the table.
3. Use df within (the denominator) along the side of the table.
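Putting the pieces together (a sketch; SciPy assumed; the sums of squares are the toy figures from the sum-of-squares sketch above), we can go from sums of squares to mean squares to the F ratio, and check it against the Table D cutoff:

```python
from scipy.stats import f as f_dist

# Toy inputs carried over from the sum-of-squares sketch above.
ss_between, ss_within = 54.0, 6.0
k, n_total = 3, 9              # 3 groups, 9 scores in all

df_between = k - 1             # 2
df_within = n_total - k        # 6

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

critical = f_dist.ppf(0.95, df_between, df_within)  # Table D, alpha = .05
print(f"F = {f_ratio:.2f}, critical = {critical:.2f}")
print("reject H0" if f_ratio > critical else "retain H0")
```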
Example: Does family size vary by religious affiliation?
Step 1: Find the mean for each sample.
Step 2: Calculate the (1) sum of scores, (2) sum of squared scores, (3) number of subjects, and (4) mean.
Finding: Reject the null hypothesis: family size does vary by religion.
To reject the null hypothesis at the .05 significance level with 2 and 12 degrees of freedom, our calculated F ratio must exceed 3.88. Since our obtained F ratio is 8.24, we reject the null hypothesis.
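The quoted cutoff and the strength of this result can be checked directly (a sketch; SciPy assumed): f.ppf recovers the Table D critical value, and f.sf gives the exact probability of an F ratio this large if the null hypothesis were true.

```python
from scipy.stats import f as f_dist

df_between, df_within = 2, 12

critical = f_dist.ppf(0.95, df_between, df_within)
p_value = f_dist.sf(8.24, df_between, df_within)

print(f"critical F(.05; 2, 12) = {critical:.2f}")  # about 3.89
print(f"P for F = 8.24: {p_value:.4f}")            # well below .05
```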
Requirements for using the F ratio:
1) There must be a comparison between three or more means.
2) We must be working with interval data.
3) Our samples must have been collected randomly from the research population.
4) We must assume that the sample characteristics are normally distributed.
5) We must assume that the variances of the samples are all equal.