Comparing Means Between Groups
Michael Ash
Lecture 6
Summary of Main Points
- Comparing means between groups is an important method for
  program evaluation by policy analysts and public
  administrators.
- The question “Does a program work?” is often answered in
  terms of the program’s effect on the mean of an important
  outcome variable, by comparing the mean of a treated group
  with the mean of a comparison group.
- Comparing means between groups is an important method for
  identifying discrimination and other social problems.
- Examples: income by white or non-white; drop-out risk by
  single-parent or two-parent household; body mass index (BMI)
  by urban or suburban residence.
- The treated group and the comparison group are samples
  from two different populations. Sampling variation (rather
  than true underlying differences in the populations) may
  account for differences in the sample means between groups.
- Statistical methods apply.
Caveats
Outcome: The outcome must measure something worth knowing.
Confounding factors and selection into treatment: The treatment
and comparison groups may differ in ways other than the receipt
of treatment.
Mean: The population mean does not fully describe the
distribution of outcomes. For example, two groups with equal
population mean income could have different probabilities of
extreme poverty.
Means for Different Populations
Example of two populations
1. population of women recently graduated from college, mean
earnings µw
2. population of men recently graduated from college, mean
earnings µm
Hypothesis Test for the Difference Between Two Means
The null hypothesis is that the difference is some amount $d_0$
specified by the researcher.
$$H_0: \mu_m - \mu_w = d_0$$
$$H_1: \mu_m - \mu_w \neq d_0$$
For example, d0 = 0 would set up the test that there is no
difference in mean earnings between recent male and female
college graduates.
Procedure to Test a Null about Differences
1. $\bar{Y}_w$ is a good estimate of $\mu_w$, and $\bar{Y}_m$ is a
good estimate of $\mu_m$.
2. $\bar{Y}_m - \bar{Y}_w$ is a good estimate of the difference in
population means, $\mu_m - \mu_w$.
3. $\bar{Y}_w$ and $\bar{Y}_m$ are subject to sampling variation, as
is the difference $\bar{Y}_m - \bar{Y}_w$. We will need an estimate
of the standard deviation of $\bar{Y}_m - \bar{Y}_w$.
4. We want to know whether, under the null hypothesis, the random
variable $(\bar{Y}_m - \bar{Y}_w) - d_0$ (the gap between the
difference in sample means and the null-hypothesized difference in
population means) is likely to be as large as the gap actually
observed in our particular sample.
The Hypothesis Test
- A test statistic for the difference between the difference in
  sample means and the null-hypothesized difference in
  population means:
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)}$$
- Under the null hypothesis, this test statistic is approximately
  distributed N(0, 1) if the two samples are reasonably large. If
  the test statistic is “large” (bigger than 1.96 in absolute
  value), then we reject the null hypothesis at the 5 percent
  level. Why? A difference in sample means this far from $d_0$
  would be unlikely if the null were true.
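Standard error of the difference in sample means
$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$$
- $s_m^2$, sample variance for men’s earnings:
$$s_m^2 = \frac{1}{n_m - 1} \sum_{i=1}^{n_m} \left(Y_i - \bar{Y}_m\right)^2$$
- $s_w^2$, sample variance for women’s earnings:
$$s_w^2 = \frac{1}{n_w - 1} \sum_{j=1}^{n_w} \left(Y_j - \bar{Y}_w\right)^2$$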
Real-world data
- Table 3.1 presents summary statistics from real-world data.
- A useful exercise is to think about the underlying data.
- What is the unit of observation?
- What variables are reported for each observation?
. use cps_ch3
. list in 1/7

     +-------------------------+
     | a_sex   year      ahe98 |
     |-------------------------|
  1. |     1   1992   12.99912 |
  2. |     1   1992   11.61796 |
  3. |     1   1992   17.37729 |
  4. |     2   1992   10.06127 |
  5. |     1   1992   16.75668 |
     |-------------------------|
  6. |     2   1992   9.216171 |
  7. |     2   1992   15.95874 |
     +-------------------------+
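One way to start answering these questions is to ask Stata to describe the file. This is only a minimal sketch: it assumes the `cps_ch3` dataset from the listing above is in the working directory, and it uses nothing beyond the variable names shown there.

```stata
* Load the lecture's dataset (assumed to be in the working directory)
use cps_ch3, clear

* Variable names, storage types, and the number of observations
describe

* Number of observations in each a_sex-by-year cell
tabulate a_sex year
```

Judging from the listing, each row appears to be one individual in one survey year, with a sex code, the year, and an hourly earnings figure.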
Comparing means with Stata
Stata can tabulate and summarize data for us.
. tabulate a_sex if year==1992, summarize(ahe98)

            |          Summary of ahe98
      a_sex |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   17.574572   7.4964888        1591
          2 |   15.220472   5.9732026        1371
------------+------------------------------------
      Total |   16.484946    6.932766        2962
With just one command, we have moved from “raw” individual
data to the summary statistics in the first line of Table 3.1. (Think
about how long it would take to do this in Excel, or by hand.)
In fact, we can now test a null of equality ($d_0 = 0$) of mean hourly
earnings for men and women in 1992, or $H_0: \mu_m - \mu_w = 0$.
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{SE(\bar{Y}_m - \bar{Y}_w)}$$
$$SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}} = \sqrt{\frac{7.50^2}{1591} + \frac{5.97^2}{1371}} = 0.25 \text{ or 25 cents per hour}$$
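The standard error arithmetic can be checked in Stata itself. The sketch below plugs in the standard deviations and group sizes from the `tabulate` output; the scalar names (`sd_m`, `n_m`, and so on) are made up for this illustration.

```stata
* Summary statistics copied from the tabulate output for 1992
scalar sd_m = 7.4964888
scalar n_m  = 1591
scalar sd_w = 5.9732026
scalar n_w  = 1371

* Standard error of the difference in sample means
scalar se_diff = sqrt(sd_m^2/n_m + sd_w^2/n_w)
display "SE of the difference = " %6.4f se_diff
```

The displayed value should round to the 0.25 used in the hand calculation.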
Aside: is this SE of 25 cents plausible?
The SE for men’s earnings is $s_m/\sqrt{n_m} = 7.50/\sqrt{1591} \approx 0.19$.
The SE for women’s earnings is $s_w/\sqrt{n_w} = 5.97/\sqrt{1371} \approx 0.16$.
The SE for the difference should not be tremendously different
from the SE for each group. (If you computed an SE of 7, you
should be worried.)
Returning to our test statistic
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{17.57 - 15.22 - 0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{2.35}{0.25} = 9.4$$
This is a very large t-statistic (a t-statistic of about 2 in
absolute value is all that is required to reject the null
hypothesis at the 5 percent level). So we reject the null
hypothesis of equal wages with very high confidence: a difference
in sample means this large would very rarely arise from sampling
variation alone.
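Stata can also run this test directly. The sketch below shows two routes, and both are assumptions about how one might reproduce the slide rather than the lecture's own commands: `ttesti` works from the summary statistics in the `tabulate` output, and `ttest` works from the individual-level data in `cps_ch3`. The `unequal` option asks Stata not to pool the two variances, which matches the standard error formula used here.

```stata
* Two-sample test from summary statistics:
* ttesti n1 mean1 sd1 n2 mean2 sd2, unequal
ttesti 1591 17.574572 7.4964888 1371 15.220472 5.9732026, unequal

* The same test run on the individual-level data
* (assumes cps_ch3 is in the working directory)
use cps_ch3, clear
ttest ahe98 if year==1992, by(a_sex) unequal
```

Stata reports p-values from a t distribution with approximate degrees of freedom rather than from N(0, 1); with samples this large the two are nearly identical. Because these commands use unrounded inputs, the reported t-statistic can differ a little from the hand calculation with rounded numbers.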
Applying the method to a different null
Is the difference between male and female earnings $1.50?
$$H_0: \mu_m - \mu_w = 1.50$$
$$t = \frac{(\bar{Y}_m - \bar{Y}_w) - d_0}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{(17.57 - 15.22) - 1.50}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.85}{SE(\bar{Y}_m - \bar{Y}_w)} = \frac{0.85}{0.25} = 3.4$$
We can reject this null hypothesis as well ($\Pr(|t| > 3.4) < 0.001$).
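A test against a nonzero null is easy to compute by hand in Stata. The following is a minimal sketch using the 1992 summary statistics; the scalar names are hypothetical, and the p-value uses the large-sample N(0, 1) approximation described earlier.

```stata
* Difference in sample means and its standard error (1992 data)
scalar diff    = 17.574572 - 15.220472
scalar se_diff = sqrt(7.4964888^2/1591 + 5.9732026^2/1371)

* Null-hypothesized difference of 1.50 dollars per hour
scalar d0 = 1.50

* Test statistic and two-sided p-value from the standard normal
scalar t_stat = (diff - d0)/se_diff
display "t = " %5.2f t_stat
display "two-sided p-value = " %6.4f 2*(1 - normal(abs(t_stat)))
```

Changing `d0` is all that is needed to test a different null; setting it to 0 gives the test of equal means.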
Young men’s earnings over time
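$$H_0: \mu_{m,1998} - \mu_{m,1992} = 0$$
$$t = \frac{(\bar{Y}_{m,1998} - \bar{Y}_{m,1992}) - d_0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{(17.94 - 17.57) - 0}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{0.37}{SE(\bar{Y}_{m,1998} - \bar{Y}_{m,1992})} = \frac{0.37}{0.28} = 1.31$$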
N.B. We are looking at two different samples of young men from
two different cohorts.
1. This t-statistic is well below 1.96.
2. Pr(|t| > 1.31) = 0.19, or 19 percent of the time the sample
means will differ by at least this much if there is no true
difference in the population means.
3. We cannot reject the null hypothesis with 95 percent
confidence: there is no statistically significant evidence that
the wages of recent male college graduates were higher in the
late 1990s than they had been in the early 1990s.
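The probability in point 2 comes from the standard normal distribution. As a small sketch, Stata's `normal()` function (the standard normal cumulative distribution) and `invnormal()` (its inverse) reproduce both the p-value and the 1.96 cutoff.

```stata
* Two-sided p-value for a test statistic of 1.31
display 2*(1 - normal(abs(1.31)))

* Critical value for a two-sided test at the 5 percent level
display invnormal(0.975)
```

The first line should display a value close to 0.19 and the second a value close to 1.96.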
Bernoulli outcomes
Very common application.
1. What is the percent of positive outcomes (Y = 1) in the
population?
2. Does the percent of positive outcomes (Y = 1) differ between
two groups?
Methods are identical to the method for continuous variables, but
the interpretation and computations differ slightly.
Do you approve of the job . . . is doing as your President?
An individual’s response is yes ($Y_i = 1$) or no ($Y_i = 0$).
Call $p_x$ the mean population approval of President x and $\hat{p}_x$
the mean sample approval of President x. Note that
$$\hat{p}_x = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

                Sample Size   Percent “yes”
President I         250           0.54
President II        300           0.44

(Think about the underlying data.)
Is approval different from 50 percent?
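$$H_0: p_I = 0.5$$
$$t = \frac{\hat{p}_I - p_{I,0}}{SE(\hat{p}_I)} = \frac{\hat{p}_I - 0.5}{SE(\hat{p}_I)} = \frac{0.54 - 0.5}{SE(\hat{p}_I)}$$
$$SE(\hat{p}_I) = \sqrt{\frac{s_Y^2}{n}} \quad \text{(no difference so far)}$$
$$= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \quad \text{(special } s_Y^2 \text{ for a Bernoulli variable)}$$
$$= \sqrt{\frac{0.54 \cdot 0.46}{250}} \approx 0.0315$$
$$t = \frac{0.54 - 0.5}{0.0315} = 1.27$$
The t statistic is smaller than 1.96, so we cannot reject the null
hypothesis.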
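The same calculation can be sketched in Stata with a few scalars. The names below (`p_hat`, `n_I`, `p0`) are invented for this illustration; the numbers are the sample proportion and sample size for President I.

```stata
* Sample proportion, sample size, and null-hypothesized proportion
scalar p_hat = 0.54
scalar n_I   = 250
scalar p0    = 0.5

* Standard error for a Bernoulli outcome, then the test statistic
scalar se_p   = sqrt(p_hat*(1 - p_hat)/n_I)
scalar t_stat = (p_hat - p0)/se_p
display "SE = " %6.4f se_p "   t = " %5.2f t_stat
```

Only the standard error formula differs from the continuous case; the test statistic is built in exactly the same way.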
Polling: margin of error
By the way, poll results are often expressed with a “margin of
error” that is, in fact, the half-width of the 95 percent
confidence interval.
$$\Pr(\hat{p} - 1.96\,SE(\hat{p}) \le p \le \hat{p} + 1.96\,SE(\hat{p})) = 0.95$$
$$\Pr(0.54 - 1.96 \times 0.0315 \le p \le 0.54 + 1.96 \times 0.0315) = 0.95$$
$$\Pr(0.54 - 0.06 \le p \le 0.54 + 0.06) = 0.95$$
$$\Pr(0.48 \le p \le 0.60) = 0.95$$
The margin of error would be reported as $\pm 1.96 \times SE(\hat{p}) = \pm 0.06$.
Note the importance of sample size for determining standard error
and the margin of error of a poll:
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
You can push down the SE , and the margin of error, by increasing
the sample size.
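To see how quickly the margin of error shrinks, one can recompute it for several sample sizes. The loop below is only an illustration: it holds the sample proportion at 0.54, and the sample sizes other than 250 are hypothetical.

```stata
* Margin of error (1.96 x SE) for a proportion of 0.54
* at several sample sizes; 1000 and 4000 are hypothetical
foreach n of numlist 250 1000 4000 {
    display "n = `n'   margin of error = " %5.3f 1.96*sqrt(0.54*0.46/`n')
}
```

Because n enters under the square root, quadrupling the sample size only halves the margin of error.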
Approval rating for two presidents
Is approval for President I different from approval for President II?
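$$H_0: p_I - p_{II} = 0$$
$$t = \frac{(\hat{p}_I - \hat{p}_{II}) - d_0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{(\hat{p}_I - \hat{p}_{II}) - 0}{SE(\hat{p}_I - \hat{p}_{II})} = \frac{0.54 - 0.44}{SE(\hat{p}_I - \hat{p}_{II})}$$
$$SE(\hat{p}_I - \hat{p}_{II}) = \sqrt{\frac{s_{Y_I}^2}{n_I} + \frac{s_{Y_{II}}^2}{n_{II}}} = \sqrt{\frac{\hat{p}_I(1 - \hat{p}_I)}{n_I} + \frac{\hat{p}_{II}(1 - \hat{p}_{II})}{n_{II}}}$$
$$= \sqrt{\frac{0.54 \cdot 0.46}{250} + \frac{0.44 \cdot 0.56}{300}} \approx 0.0426$$
$$t = \frac{0.54 - 0.44}{0.0426} = 2.35$$
We can reject the null that approval for the two presidents is
equal.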
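The two-group comparison can be sketched the same way. The scalar names are again hypothetical, and the p-value relies on the large-sample N(0, 1) approximation.

```stata
* Sample proportions and sample sizes for the two presidents
scalar p_I  = 0.54
scalar n_I  = 250
scalar p_II = 0.44
scalar n_II = 300

* Standard error of the difference, using the unpooled formula
scalar se_diff = sqrt(p_I*(1 - p_I)/n_I + p_II*(1 - p_II)/n_II)

* Test statistic for the null of no difference, with its p-value
scalar t_stat = (p_I - p_II)/se_diff
display "SE = " %6.4f se_diff "   t = " %5.2f t_stat
display "two-sided p-value = " %6.4f 2*(1 - normal(abs(t_stat)))
```

Stata's built-in proportion tests may pool the two samples when computing the standard error under the null of equality, so their output need not match this unpooled calculation exactly.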