Comparing Two Groups
In this chapter (Chapter 9), we consider problems of estimation
or hypothesis testing when we have to compare two groups of
observations, instead of comparing one group with a predetermined standard.
Comparing Two Proportions
Dermatology example: among “light runners”, 5 out of 78 subjects had to be referred for a skin lesion condition.
Among “heavy runners”, the number was 6 out of 31.
Is this significant evidence that the two groups are different?
Suppose the true proportions of individuals who have the critical
skin condition are p1 for the light runners group, and p2 for the
heavy runners group.
I: Form a confidence interval for p2 − p1.
II: Test the hypothesis H0 : p1 = p2 against Ha : p1 ≠ p2.
Confidence interval calculation:
p̂1 = 5/78 = 0.0641,
standard error SE1 = √(p̂1(1 − p̂1)/n1) = √(0.0641 × 0.9359/78) = .0277.
p̂2 = 6/31 = 0.1935,
standard error SE2 = √(p̂2(1 − p̂2)/n2) = √(0.1935 × 0.8065/31) = .0710.
The combined standard error is defined to be
SE = √(SE1² + SE2²) = .0762.
Note that p̂2 − p̂1 = 0.129.
The 95% confidence interval is 0.129 ± 1.96 × .0762 = (−0.020, 0.278).
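As a check, the interval calculation can be reproduced with Python's standard library (a sketch; the variable names are mine):

```python
from math import sqrt

# Light runners: 5 of 78 referred; heavy runners: 6 of 31
p1_hat = 5 / 78    # 0.0641
p2_hat = 6 / 31    # 0.1935

se1 = sqrt(p1_hat * (1 - p1_hat) / 78)   # .0277
se2 = sqrt(p2_hat * (1 - p2_hat) / 31)   # .0710
se = sqrt(se1**2 + se2**2)               # combined SE, .0762

diff = p2_hat - p1_hat                   # 0.129
ci = (diff - 1.96 * se, diff + 1.96 * se)  # 95% CI for p2 - p1
```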
Now let’s look at a hypothesis test of H0 : p1 = p2.
We have p̂1 and p̂2 as before, but also a pooled estimate computed under the assumption that H0 is correct:
p̂ = (5 + 6)/(78 + 31) = .1009.
The pooled standard error is defined to be
SE = √(p̂(1 − p̂)(1/n1 + 1/n2)) = √(.1009 × .8991 × (1/78 + 1/31)) = 0.0639.
Note that this is similar to, but not the same as, the SE used in
the confidence interval calculation.
Calculate the z score:
z = (p̂2 − p̂1)/SE = (.1935 − .0641)/.0639 = 2.03.
The one-sided probability associated with this (Table A) is .0212.
Therefore, the two-sided P-value is .0424.
Just significant at the .05 level.
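The same test can be reproduced step by step in Python (a sketch; the normal CDF comes from math.erf rather than Table A, so the last digits differ slightly from the table-based values):

```python
from math import sqrt, erf

p1_hat, n1 = 5 / 78, 78
p2_hat, n2 = 6 / 31, 31

# Pooled estimate under H0: p1 = p2
p_pool = (5 + 6) / (78 + 31)                       # .1009
se = sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))   # .0639

z = (p2_hat - p1_hat) / se                         # about 2.02 (2.03 with rounded inputs)

def norm_cdf(x):
    # Standard normal CDF via the error function (replaces Table A)
    return 0.5 * (1 + erf(x / sqrt(2)))

p_two_sided = 2 * (1 - norm_cdf(z))                # about .043
```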
A caveat: This again uses the normal distribution in a situation
where it is not strictly justified. It is possible to make the calculation without using a normal approximation (the method is
called Fisher’s exact test), but we shall not do that.
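Although we do not pursue it, Fisher’s exact test is short enough to sketch with the standard library; the function below is my own illustration, summing hypergeometric probabilities of all tables no more probable than the observed one:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Conditioning on the margins, the count in cell (1,1) follows a
    hypergeometric distribution under H0; the two-sided P-value sums
    the probabilities of all tables no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # P(first cell = k) under the hypergeometric distribution
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Small tolerance guards against floating-point ties
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Dermatology table: rows = light/heavy runners, columns = lesion yes/no
p = fisher_exact_two_sided(5, 73, 6, 25)
# p is somewhat larger than the .0424 from the normal approximation
```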
General method
Sample proportions p̂1, p̂2 are based on sample sizes n1, n2.
The standard error for a confidence interval is
SE = √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2).
The confidence interval for p2 − p1 is p̂2 − p̂1 ± z × SE, where z is the
z value appropriate to the confidence level, e.g. z = 1.96 for
a 95% CI or z = 2.58 for a 99% CI.
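The general interval can be wrapped in a small helper (a sketch; the function name and the 1.96 default are mine):

```python
from math import sqrt

def two_prop_ci(p1_hat, n1, p2_hat, n2, z=1.96):
    """Confidence interval for p2 - p1 (z = 1.96 gives a 95% CI)."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p2_hat - p1_hat
    return diff - z * se, diff + z * se

# Dermatology example from earlier in the chapter
lo, hi = two_prop_ci(5/78, 78, 6/31, 31)
```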
For a hypothesis test, compute the pooled proportion
p̂ = (n1p̂1 + n2p̂2)/(n1 + n2).
Standard error and z score:
SE = √(p̂(1 − p̂)(1/n1 + 1/n2)),
z = (p̂2 − p̂1)/SE.
Compute the one-sided or two-sided P-value associated with z.
We usually interpret this as rejecting the null hypothesis if P < .05; otherwise we do not reject H0.
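The test recipe above can likewise be written as a helper (a sketch; the function name is mine, and the normal CDF uses math.erf in place of a table):

```python
from math import sqrt, erf

def two_prop_test(p1_hat, n1, p2_hat, n2):
    """Pooled two-proportion z test; returns (z, two-sided P-value)."""
    p_pool = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2_hat - p1_hat) / se
    # Two-sided P-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Dermatology example: reproduces the z of about 2.02-2.03 found above
z, p = two_prop_test(5/78, 78, 6/31, 31)
```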
Example: Problem 9.3, page 438.
In two surveys of drinking habits, students were asked whether
“to get drunk” is an important reason for drinking. In 1993,
39.9% of 12708 students said yes; in 2001, 48.2% of 8783 students said yes.
(a) Estimate the difference between the proportions in 1993 and
2001, and interpret.
(b) Find the standard error of the difference, and interpret.
(c) Construct and interpret a 95% confidence interval for the
true change, explaining how your interpretation reflects whether
the interval includes 0. Why is the confidence interval so narrow?
(d) State and check the assumptions for the interval in (c) to be
valid.
(a) 0.482 − 0.399 = .083; the proportion has increased.
(b) SE = √(.399 × .601/12708 + .482 × .518/8783) = .00688.
(c) .083 ± 1.96 × .00688 = (.070, .096). This does not include 0;
therefore the change is statistically significant. The interval is
narrow mainly because of the large sample sizes.
(d) The main assumptions are independence of the two samples
and random sampling; these would appear to be satisfied.
Also, the conditions np > 15 and n(1 − p) > 15 should be satisfied
for each sample; that is obviously true here.
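The arithmetic in (a)–(c) can be checked directly (a sketch; the variable names are mine):

```python
from math import sqrt

p_1993, n_1993 = 0.399, 12708
p_2001, n_2001 = 0.482, 8783

diff = p_2001 - p_1993                        # (a) .083
se = sqrt(p_1993 * (1 - p_1993) / n_1993
          + p_2001 * (1 - p_2001) / n_2001)   # (b) .00688
ci = (diff - 1.96 * se, diff + 1.96 * se)     # (c) (.070, .096)
```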