Testing Differences between Means, continued
Statistics for Political Science
Levin and Fox
Chapter Seven
Testing Differences between Means
To test the significance of a mean difference we need to find the standard
deviation for any obtained mean difference.
However, we rarely know the standard deviation of the distribution of mean
differences since we rarely have population data. Fortunately, it can be
estimated based on two samples that we draw from the same population.
Remember that the z formula requires the standard deviation of the distribution of
mean differences.
Step 2b: Translate our sample mean difference into units of standard deviation.

$$z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$

Where
$\bar{X}_1$ = mean of the first sample
$\bar{X}_2$ = mean of the second sample
0 = zero, the value of the mean of the sampling distribution of
differences between means (we assume that $\mu_1 - \mu_2 = 0$)
$\sigma_{\bar{X}_1 - \bar{X}_2}$ = standard error of the difference between means
(standard deviation of the distribution of the difference between means)

We can reduce this equation down to the following:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$$
Child Rearing: Comparing Males and Females
Result (assuming $\sigma_{\bar{X}_1 - \bar{X}_2}$ equals 2):

$$z = \frac{45 - 40}{2} = +2.5$$

Thus, a difference of 5 between the means of the two samples (women and
men) falls 2.5 standard deviations from a mean of zero.
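As a quick check of the arithmetic above, here is a minimal Python sketch of the reduced z formula. The function name `z_score` and its parameter names are illustrative choices, not part of the original slides; the standard error of 2 is the value assumed in the example.

```python
def z_score(mean1, mean2, se_diff):
    """Translate a difference between two sample means into z units.

    The mean of the sampling distribution of differences is assumed
    to be zero (mu1 - mu2 = 0), as in the slides.
    """
    return ((mean1 - mean2) - 0.0) / se_diff


# Child-rearing example: means of 45 and 40, assumed standard error of 2.
z = z_score(45, 40, 2)
print(z)  # 2.5
```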
Standard Error of the Difference between Means
Here is how the standard error of the difference between means can be
calculated.

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$

Where

$$s_1^2 = \frac{\sum X_1^2}{N_1} - \bar{X}_1^2 \qquad s_2^2 = \frac{\sum X_2^2}{N_2} - \bar{X}_2^2$$

The formula for $s_{\bar{X}_1 - \bar{X}_2}$ combines the information from the two
samples.
A large difference between Xbar1 and Xbar2 can result if (1) one mean is
very small, (2) one mean is very large, or (3) one mean is moderately
small and the other is moderately large.
Variance: Weeks on Unemployment
Step 1: Calculate the mean.
Step 2: Calculate each deviation (raw score from the mean).
Step 3: Calculate the sum of squared deviations (raw score from the mean, squared).

X (weeks)    Deviation (X − X̄)    Squared deviation (X − X̄)²
9            9 − 5 = 4            4² = 16
8            8 − 5 = 3            3² = 9
6            6 − 5 = 1            1² = 1
4            4 − 5 = −1           (−1)² = 1
2            2 − 5 = −3           (−3)² = 9
1            1 − 5 = −4           (−4)² = 16
ΣX = 30                           Σ(X − X̄)² = 52

N = 6
X̄ = ΣX / N = 30 / 6 = 5

Step 4: Calculate the mean of the squared deviations.
Variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{52}{6} = 8.67 \text{ (weeks squared)}$$
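The four steps of the variance calculation can be sketched in Python as follows. This is an illustration only: the helper name `variance_steps` is hypothetical, and it divides by N (not N − 1) because that is the formula the slides use.

```python
def variance_steps(scores):
    """Return the mean, the sum of squared deviations, and the variance.

    Follows the slide's steps: mean, deviations, squared deviations,
    then the mean of the squared deviations (dividing by N).
    """
    n = len(scores)
    mean = sum(scores) / n                                   # Step 1
    deviations = [x - mean for x in scores]                  # Step 2
    sum_sq = sum(d ** 2 for d in deviations)                 # Step 3
    return mean, sum_sq, sum_sq / n                          # Step 4


weeks = [9, 8, 6, 4, 2, 1]  # weeks on unemployment, from the table
mean, sum_sq, variance = variance_steps(weeks)
print(mean, sum_sq, round(variance, 2))  # 5.0 52.0 8.67
```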
Testing the Difference between Means
Let’s say that we have the following information about two samples, one of
liberals and one of conservatives, on the progressive scale:
                      Liberals      Conservatives
Sample size           N1 = 25       N2 = 35
Mean                  X̄1 = 60       X̄2 = 49
Standard deviation    s1 = 12       s2 = 14
We can use this information to calculate the estimate of the standard
error of the difference between means:
We start with our formula:

$$
\begin{aligned}
s_{\bar{X}_1 - \bar{X}_2} &= \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)} \\
&= \sqrt{\left(\frac{(25)(12)^2 + (35)(14)^2}{25 + 35 - 2}\right)\left(\frac{25 + 35}{(25)(35)}\right)} \\
&= \sqrt{\left(\frac{3{,}600 + 6{,}860}{58}\right)\left(\frac{60}{875}\right)} \\
&= \sqrt{(180.3448)(.0686)} \\
&= \sqrt{12.3717} \\
&= 3.52
\end{aligned}
$$

The standard error of the difference between means is 3.52.
We can now use our standard error to translate the difference between
sample means into a t ratio:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{60 - 49}{3.52} = \frac{11}{3.52} = 3.13$$
REMEMBER: We use t
instead of z because we do
not know the true population
standard deviation.
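The liberal and conservative example can be reproduced with a short Python sketch working from the summary statistics alone. The function names `se_diff_means` and `t_ratio` are illustrative, not from the slides, and the formula is the one given above (N1 and N2 weight the sample variances, with N1 + N2 − 2 in the denominator).

```python
import math


def se_diff_means(n1, s1, n2, s2):
    """Estimated standard error of the difference between two means."""
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled * (n1 + n2) / (n1 * n2))


def t_ratio(mean1, mean2, se_diff):
    """Difference between sample means in standard error units."""
    return (mean1 - mean2) / se_diff


# Liberals: N1 = 25, mean 60, s1 = 12. Conservatives: N2 = 35, mean 49, s2 = 14.
se = se_diff_means(25, 12, 35, 14)
t = t_ratio(60, 49, se)
print(round(se, 2), round(t, 2))  # 3.52 3.13
```

Carrying full precision through the calculation gives the same rounded results as the hand computation.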
We aren’t finished yet!
Turn to Table C.
1) Because we are estimating both σ1 and σ2 from s1 and s2, we use a
wider t distribution, with degrees of freedom N1 + N2 – 2.
2) For each standard deviation that we estimate, we lose 1 degree of
freedom from the total number of cases.
N = 60
df = 25 + 35 – 2 = 58
In Table C, use the row for 40 degrees of freedom, the nearest lower value
listed, since 58 is not given.
We see that our t-value of 3.13 exceeds all the standard critical points except
for the .001 level.
df     .20      .10      .05      .02      .01      .001
40     1.303    1.684    2.021    2.423    2.704    3.551
Therefore, based on what we established BEFORE our study, we reject the
null hypothesis at the .10, .05, or .01 level.
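The table lookup can be expressed as a small Python sketch. The dictionary holds only the df = 40 row quoted from Table C in the slides; the names `TABLE_C_DF40` and `levels_exceeded` are hypothetical and chosen for illustration.

```python
# Critical values of t for df = 40, copied from the Table C row above.
TABLE_C_DF40 = {
    0.20: 1.303,
    0.10: 1.684,
    0.05: 2.021,
    0.02: 2.423,
    0.01: 2.704,
    0.001: 3.551,
}


def levels_exceeded(t_obtained, critical_values):
    """List the significance levels whose critical value t exceeds."""
    return [alpha for alpha, critical in critical_values.items()
            if abs(t_obtained) > critical]


levels = levels_exceeded(3.13, TABLE_C_DF40)
print(levels)  # [0.2, 0.1, 0.05, 0.02, 0.01]
```

The .001 level is absent from the result because 3.13 falls short of 3.551.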
Comparing the Same Sample Measured Twice
Some research employs a panel design or before and after test (testing the
same sample at two points in time).
In these types of studies, the same sample is tested twice. It is not two
samples from the same population; it is the same group of people
measured twice.
CRITICAL POINTS TO NOTE:
1. The same sample measured twice uses the t-test of difference
between means for the same sample measured twice.
2. Different samples from the same population selected at two points in
time use the t-test of difference between means for independent
groups.
Example Problem of Test of Difference Between Means for Same Sample Measured Twice
Null Hypothesis (µ1 = µ2): The degree of neighborliness does not differ before and
after relocation.
Research Hypothesis (µ1 ≠ µ2): The degree of neighborliness differs before and
after relocation.
Where µ1 is the mean score of neighborliness at time 1
Where µ2 is the mean score of neighborliness at time 2

Respondent    Before (X1)    After (X2)    Difference (D = X1 – X2)    Difference² (D²)
Johnson       2              1             1                           1
Robinson      1              2             -1                          1
Brown         3              1             2                           4
Thomas        3              1             2                           4
Smith         1              2             -1                          1
Holmes        4              1             3                           9
              ∑X1 = 14       ∑X2 = 8                                   ∑D² = 20
The formula for obtaining the standard deviation for the distribution of
before-after difference scores:

$$s_D = \sqrt{\frac{\sum D^2}{N} - (\bar{X}_1 - \bar{X}_2)^2}$$

$s_D$ = standard deviation of the distribution of before-after difference scores
D = after-move raw score subtracted from before-move raw score
N = number of cases or respondents in sample

From this, we get the formula for the standard error of the difference
between the means:

$$s_{\bar{D}} = \frac{s_D}{\sqrt{N - 1}}$$
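Step 1: Find the mean for each point in time

$$\bar{X}_1 = \frac{\sum X_1}{n} = \frac{14}{6} = 2.33 \qquad \bar{X}_2 = \frac{\sum X_2}{n} = \frac{8}{6} = 1.33$$

Step 2: Find the standard deviation for the difference between the times

$$s_D = \sqrt{\frac{20}{6} - (2.33 - 1.33)^2} = 1.53$$

Step 3: Find the standard error for the difference between the times

$$s_{\bar{D}} = \frac{1.53}{\sqrt{6 - 1}} = .68$$

Step 4: Translate the mean difference into a t score

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{D}}} = \frac{2.33 - 1.33}{.68} = 1.47$$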
Step 5: Calculate the degrees of freedom
df = n – 1 = 6 – 1 = 5
Step 6: Compare the obtained t ratio with the t ratio in Table C
Obtained t = 1.47
Table t = 2.571
df = 5
α = .05

df     .20      .10      .05      .02      .01      .001
5      1.476    2.015    2.571    3.365    4.032    6.859
In order to reject the null hypothesis at the .05 significance level with five
degrees of freedom, we must obtain a calculated t ratio of at least 2.571.
Because our t ratio is only 1.47, we retain the null hypothesis.
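The whole before-after procedure can be sketched in Python from the raw neighborliness scores. The function name `paired_t` is an illustrative choice, and the critical value 2.571 is the Table C entry for df = 5 at the .05 level. Note that keeping full precision gives a t of about 1.46; the 1.47 in the hand calculation comes from rounding the standard error to .68 first. The decision is the same either way.

```python
import math


def paired_t(before, after):
    """t ratio and df for the same sample measured twice.

    Uses the slide's formulas: s_D = sqrt(sum(D^2)/N - (mean1 - mean2)^2)
    and standard error = s_D / sqrt(N - 1).
    """
    n = len(before)
    diffs = [b - a for b, a in zip(before, after)]        # D = X1 - X2
    mean_diff = sum(before) / n - sum(after) / n
    s_d = math.sqrt(sum(d ** 2 for d in diffs) / n - mean_diff ** 2)
    se = s_d / math.sqrt(n - 1)
    return mean_diff / se, n - 1


before = [2, 1, 3, 3, 1, 4]  # Johnson, Robinson, Brown, Thomas, Smith, Holmes
after = [1, 2, 1, 1, 2, 1]
t, df = paired_t(before, after)
print(round(t, 2), df)       # 1.46 5
print(abs(t) >= 2.571)       # False, so the null hypothesis is retained
```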
Two Sample Test of Proportions

$$z = \frac{P_1 - P_2}{s_{P_1 - P_2}}$$

Where P1 and P2 are respective sample proportions.

The standard error of the difference in proportions is:

$$s_{P_1 - P_2} = \sqrt{P^*(1 - P^*)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$$

Where P* is the combined sample proportion:

$$P^* = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$
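The slides give no worked example for the test of proportions, so the following Python sketch uses made-up numbers purely for illustration: two hypothetical samples of 100 cases each, with sample proportions of .60 and .50. The function name `two_proportion_z` is also an assumption.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z for the difference between two sample proportions."""
    p_star = (n1 * p1 + n2 * p2) / (n1 + n2)              # combined proportion
    se = math.sqrt(p_star * (1 - p_star) * (n1 + n2) / (n1 * n2))
    return (p1 - p2) / se


# Hypothetical data, not from the slides.
z = two_proportion_z(0.60, 100, 0.50, 100)
print(round(z, 2))  # 1.42
```

With these invented values, z falls below 1.96, so the difference would not be significant at the .05 level in a two-tailed test.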
Requirements when considering the appropriateness of the t ratio as a test of
significance (for testing the difference between means):
1. The t ratio is used to make comparisons between two means.
2. The assumption is that we are working with interval level data.
3. We used a random sampling process.
4. The sample characteristic is normally distributed.
5. The t ratio for independent samples assumes that the population
variances are equal.
So how do you interpret the results and state them for
inclusion in your research?
“Since the observed value of t (state the test statistic) exceeds
the critical value (state the critical value), the null hypothesis
is rejected in favor of the directional alternative hypothesis.
The probability that the observed difference (state the
difference between means) would have occurred by chance,
if in fact the null hypothesis is true, is less than .05.”