Comparing Two Proportions

Example: caries incidence
Clinical trial of a caries intervention in infants:

                  N    developed caries by age two
controls          36   27.8%
intervention      68    8.8%

Is this strong evidence that the experimental intervention is effective?

Comparison of two proportions: two independent samples
These are called "two-sample" tests. Our goal is usually to estimate p1 − p2, compute the corresponding confidence interval, and perform hypothesis tests of H0: p1 − p2 = 0.

The obvious statistic for comparing the two population proportions is p̂1 − p̂2, where p̂i = (number of successes in group i) / (sample size of group i).

Probability theory tells us that:
1. p̂1 − p̂2 is the best estimate of p1 − p2.
2. Its standard error is
$$SE(\hat p_1 - \hat p_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
3. If n1p1(1 − p1) > 5 and n2p2(1 − p2) > 5, then approximately
$$\hat p_1 - \hat p_2 \sim N\!\left(p_1 - p_2,\; \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right),$$
i.e. p̂1 − p̂2 is approximately Normal with mean p1 − p2 and variance equal to the squared standard error in 2.

Large-sample confidence interval for p1 − p2
$$\hat p_1 - \hat p_2 \pm Z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

Large-sample Z-test of H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0
Test statistic:
$$Z = \frac{\hat p_1 - \hat p_2}{SE_{H_0}(\hat p_1 - \hat p_2)}$$
where SE_H0(p̂1 − p̂2) denotes the standard error estimated under the null hypothesis, p1 = p2. Estimate the common p using
$$\hat p = \frac{x_1 + x_2}{n_1 + n_2},$$
where x1 and x2 are the numbers of successes in groups 1 and 2, respectively. Then
$$SE_{H_0}(\hat p_1 - \hat p_2) = \sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Compare Z to a standard Normal distribution.

Example: caries incidence

                       caries by age two
                  N    number    percent
controls          36   10        27.8
intervention      68    6         8.8

95% confidence interval:
p̂1 − p̂2 = .278 − .088 = 0.19
$$SE = \sqrt{\frac{.278(1-.278)}{36} + \frac{.088(1-.088)}{68}} = .082$$
So the 95% confidence interval is
$$0.19 \pm 1.96 \times 0.082 = (0.029,\ 0.351)$$

Test: H0: p1 − p2 = 0 vs. H1: p1 − p2 ≠ 0
$$\hat p = \frac{10 + 6}{36 + 68} = .154$$
$$SE_{H_0}(\hat p_1 - \hat p_2) = \sqrt{.154(1-.154)\left(\frac{1}{36} + \frac{1}{68}\right)} = .074$$
$$Z = \frac{0.19}{.074} = 2.57$$
Since |Z| = 2.57 > 1.96, reject H0 at the α = .05 level. P-value = 2 × P(Z > 2.57) = .0102.

Chi-squared Test (χ² test)
The chi-square test generalizes the two-sample Z-test to situations with more than two proportions.
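The caries calculations above can be reproduced with a short script. This is a minimal sketch in Python using only the standard library (`statistics.NormalDist` for Normal quantiles and probabilities); the function name `two_proportion_summary` is hypothetical, chosen for illustration. It keeps the proportions unrounded, so its results differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 and pooled Z-test of H0: p1 = p2.

    Hypothetical helper for illustration only.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Unpooled SE (each group keeps its own proportion): used for the CI.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Pooled SE, computed assuming H0 (p1 = p2): used for the test.
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value
    return diff, se, ci, p_pool, se0, z, p_value


# Caries example: controls 10/36, intervention 6/68.
diff, se, ci, p_pool, se0, z, p_value = two_proportion_summary(10, 36, 6, 68)
print(f"diff = {diff:.3f}  SE = {se:.3f}  95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"pooled p = {p_pool:.3f}  SE_H0 = {se0:.3f}  Z = {z:.2f}  P = {p_value:.4f}")
```

Because it works from unrounded proportions, the sketch gives a lower confidence limit of about 0.028 (rather than 0.029), Z ≈ 2.55 and P ≈ 0.011 (rather than 2.57 and .0102). The differences come only from rounding intermediate values, and H0 is still rejected at α = .05. Note the design choice carried over from the formulas: the interval uses the unpooled SE, while the test uses the pooled SE computed under H0.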
Example: perio by gender (NHANES I data)
Evaluate whether periodontitis is independent of gender by checking whether the proportion of males is the same in each group defined by periodontal status. The χ² test uses "contingency" tables.

Observed data (counts)

                 periodontal status
GENDER      healthy   gingivitis   perio   Total
male           1143          929     937    3009
female         2607         1490     921    5018
Total          3750         2419    1858    8027

The null hypothesis is that all proportions are equal, H0: p1 = p2 = p3, where pj is the proportion of males in periodontal-status group j.

Expected frequencies (under the assumption of equal proportions)
Each expected count is (column total) × (row total / grand total):

            healthy                        gingivitis                    perio                          Total
male        3750 × (3009/8027) = 1405.7    2419 × (3009/8027) = 906.8    1858 × (3009/8027) = 696.5     3,009
female      3750 × (5018/8027) = 2344.3    2419 × (5018/8027) = 1512.2   1858 × (5018/8027) = 1161.5    5,018
Total       3,750                          2,419                         1,858                          8,027

Chi-squared statistic:
$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
$$
\begin{aligned}
X^2 &= \frac{(1143-1405.7)^2}{1405.7} + \frac{(929-906.8)^2}{906.8} + \frac{(937-696.5)^2}{696.5} \\
&\quad + \frac{(2607-2344.3)^2}{2344.3} + \frac{(1490-1512.2)^2}{1512.2} + \frac{(921-1161.5)^2}{1161.5} \\
&= 212.3
\end{aligned}
$$

Large (positive) values of X² indicate evidence against the null hypothesis. If H0 is true, then the X² statistic from a contingency table with R rows and C columns has approximately a chi-square distribution with (R − 1) × (C − 1) degrees of freedom. The P-value is the probability that a random variable with a χ²(R−1)×(C−1) distribution exceeds the observed statistic. Note that all the probability in the P-value (and the rejection region) is on one side, since only large values of X² contradict H0.

Here R = 2 and C = 3, so df = (2 − 1) × (3 − 1) = 2. Our statistic, 212.3, is larger than 15.20, the 99.95th percentile of the χ²₂ distribution, so p < 0.0005. Table 6 in the coursepack has χ² percentiles.
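The same chi-squared computation for the perio-by-gender table can be sketched in a few lines of Python, standard library only; the helper name `chi_square_test` is hypothetical. The sketch takes a shortcut that works only because df = 2 here: for two degrees of freedom the upper tail of the χ² distribution is exactly exp(−x/2), so no table lookup is needed. Other degrees of freedom would require a χ² table (such as Table 6) or a statistics library.

```python
from math import exp, log

# Observed counts from the perio-by-gender table
# (rows: male, female; columns: healthy, gingivitis, perio).
observed = [
    [1143, 929, 937],
    [2607, 1490, 921],
]


def chi_square_test(table):
    """Pearson X^2, degrees of freedom and expected counts for an R x C table.

    Hypothetical helper for illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under H0: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected


x2, df, expected = chi_square_test(observed)

if df != 2:
    raise ValueError("the exp(-x/2) tail formula below is exact only for df = 2")

# Upper-tail probability of chi-square(2): P(X > x) = exp(-x / 2).
p_value = exp(-x2 / 2)
# Inverting the same formula gives the 99.95th percentile: -2 ln(0.0005).
critical_9995 = -2 * log(0.0005)

print(f"X2 = {x2:.1f}, df = {df}, 99.95th percentile = {critical_9995:.2f}")
print(f"P-value = {p_value:.3g}")
```

The statistic comes out at about 212.3, matching the hand calculation, and the P-value is many orders of magnitude below 0.0005. The expected counts reproduce the table above, with the smallest (male, perio) at about 696.5.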
SPSS output for the chi-square test

GENDER * periodontal status Crosstabulation

                                   periodontal status
GENDER                      healthy   gingivitis    perio     Total
male     Count                 1143          929      937      3009
         Expected Count      1405.7        906.8    696.5    3009.0
female   Count                 2607         1490      921      5018
         Expected Count      2344.3       1512.2   1161.5    5018.0
Total    Count                 3750         2419     1858      8027
         Expected Count      3750.0       2419.0   1858.0    8027.0

Chi-Square Tests
                                  Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square                212.271a   2   .000
Likelihood Ratio                  210.264    2   .000
Linear-by-Linear Association      209.324    1   .000
N of Valid Cases                  8027

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 696.49.

SPSS rounds P-values to three decimals, so ".000" means P < .0005.

Notes on the chi-squared test:
1. Chi-square test P-values rely on large-sample (Normal) approximations, so they are not valid for small samples (any expected frequency < 5).
2. The rejection region for a chi-square test with significance level α is the region above the 100(1 − α)th percentile of the chi-square distribution (i.e. not α/2).
3. The null hypothesis for the chi-square test can be equivalently formulated as "X1 is independent of X2", where X1 and X2 are the two categorical variables being compared (gender and periodontal status in our example).
4. When comparing two proportions, the chi-square test is equivalent to the Z-test: for a 2 × 2 table, X² = Z² (with Z computed using the pooled standard error), and the two P-values are identical.
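Note 4 can be checked numerically on the caries data from the two-sample example, arranged as a 2 × 2 table (rows: controls, intervention; columns: caries, no caries, where the "no caries" counts 26 and 62 are the sample sizes minus the cases). This Python sketch uses only the standard library; the helper names `expected_counts`, `pearson_x2` and `pooled_z` are hypothetical. It computes Pearson's X² (without continuity correction) and the pooled Z statistic from unrounded proportions, and also reports the smallest expected count so that note 1 can be checked.

```python
from math import sqrt
from statistics import NormalDist

# Caries data as a 2 x 2 table: rows = controls, intervention;
# columns = caries, no caries (26 = 36 - 10, 62 = 68 - 6).
table = [[10, 26], [6, 62]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_x2(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def pooled_z(x1, n1, x2, n2):
    """Two-sample Z statistic using the pooled SE under H0: p1 = p2."""
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se0


chi2_stat = pearson_x2(table)
z = pooled_z(10, 36, 6, 68)
min_expected = min(min(row) for row in expected_counts(table))

# A chi-square variable with 1 df is the square of a standard Normal,
# so its upper-tail probability equals the two-sided Normal P-value.
p_chi2 = 2 * (1 - NormalDist().cdf(sqrt(chi2_stat)))
p_z = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"X2 = {chi2_stat:.3f}  Z^2 = {z * z:.3f}  min expected = {min_expected:.2f}")
print(f"P (chi-square) = {p_chi2:.4f}  P (Z) = {p_z:.4f}")
```

X² and Z² agree to floating-point precision (about 6.50, i.e. Z ≈ 2.55 with unrounded proportions), and the smallest expected count is 36 × 16/104 ≈ 5.5, so by the rule in note 1 the large-sample approximation is acceptable here, though only just.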