Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Comparing Two Groups In this chapter (Chapter 9), we consider problems of estimation or hypothesis testing when we have to compare two groups of observations, instead of compatring one group with a predetermined standard. 1 Comparing Two Proportions Dermatology example: among “light runners”, 5 out of 78 subjects had to be referred for a skin lesion conditions. Among “heavy runners”, the number was 6 out of 31. Is this significant evidence that the two groups are different? 2 Suppose the true proportions of individuals who have the critical skin condition are p1 for the light runners group, and p2 for the heavy runners group. I: Form a confidence interval for p2 − p1. II: Test the hypothesis H0 : p1 = p2 against Ha : p1 6= p2. 3 Confidence interval calculation: 5 = 0.0641, p̂1 = 78 r q p̂1 (1−p̂1 ) 0.0641×0.9359 = .0277. = standard error SE1 = n 78 1 6 = 0.1935, p̂2 = 31 r q p̂2 (1−p̂2 ) 0.1935×0.8065 = .0710. standard error SE2 = = n 31 2 The combined standard error is defined to be SE = q SE12 + SE22 = .0762. Note that p̂2 − p̂1 = 0.129. 95% confidence interval is 0.129 ± 1.96 × .0762 = (0.043, 0.341). 4 Now let’s look at a hypothesis test H0 : p1 = p2. We have p̂1 and p̂2 as before, but also a pooled estimate com5+6 puted under the assumption that H0 is correct: p̂ = 78+31 = .1009. The pooled standard error is defined to be v u u SE = tp̂(1 − p̂) s = 1 1 + n1 n2 ! 1 1 .1009 × .8991 × + 78 31 = 0.0639. Note that this is similar to, but not the same as, the SE used in the confidence interval calculation. 5 Calculate the z score: p̂2 − p̂1 SE .1935 − .0641 = 2.03. = .0639 z = The one-sided probability associated with this (Table A) is .0212. Therefore, the two-sided P-value is .0424. Just significant at the .05 level. 6 A caveat: This again uses the normal distribution in a situation where it is not strictly justified. It is possible to make the calculation without using a normal approximation (the method is called Fisher’s exact test), but we shall not do that. 7 General method Sample proportions p̂1, p̂2 based on sample sizes n1, n2. Standard error for a confidence interval is s SE = p̂ (1 − p̂2) p̂1(1 − p̂1) + 2 . n1 n2 Confidence interval for p2 − p1 is p̂2 − p̂1 ± z × SE where z is the z statistic appropriate to the confidence level, e.g. z = 1.96 for a 95% CI or z = 2.58 for a 99% CI. 8 +n2 p̂2 For a hypothesis test, compute pooled proportion p̂ = n1np̂1+n . 1 2 Standard error and z score v u u SE = tp̂(1 − p̂) z = ! 1 1 + , n1 n2 p̂2 − p̂1 . SE Compute one-sided or two-sided P-value associated with z. We usually interpret this as rejecting the null hypothesis if P< .05, otherwise do not reject H0. 9 Example: Problem 9.3, page 438. In two surveys of drinking habits, students were asked whether “to get drunk” is an important reason for drinking. In 1993, 39.9% of 12708 students said yes; in 2001, 48.2% of 8783 students said yes. (a) Estimate the difference between the proportions in 1993 and 2001, and interpret. (b) Find the standard error of the difference, and interpret. (c) Construct and interpret a 95% confidence interval for the true change, explaining how your interpretation reflects whether the interval includes 0. Why is the confidence interval so narrow? (d) State and check the assumptions for the interval in (c) to be valid. 10 (a) 0.482 − 0.399 = .083; proportion has increased (b) SE = q .399×.601 + .482×.518 = .00688. 12708 8783 (c) .083 ± 1.96 × .00688 = (.070, .096). Does not include 0; therefore change is statistically significant. The interval is narrow, mainly because of the large sample sizes. (d) The main assumptions are independence of the two samples, and random sampling; these would appear to be satisfied. Also, the np > 15, n(1 − p) > 15 should be satisfied for each sample; obviously true here. 11