Inference for two-way tables

General R x C tables
• Tests of homogeneity of a factor across groups or independence of two factors rely on Pearson's $X^2$ statistic.
• $X^2$ is compared to a $\chi^2((r-1)\times(c-1))$ distribution.
• Expected cell counts should be larger than 5.

2 x 2 tables
• Cohort (prospective) data (H0: relative risk for incidence = 1)
• Case-control (retrospective) data (H0: odds ratio = 1)
• Cross-sectional data (H0: relative risk for prevalence = 1)
• Paired binary data: McNemar's test (H0: odds ratio = 1)
• For rare disease, OR ≈ RR
• Fisher's exact test

Fall 2002 Biostat 511

Categorical Data

Types of categorical data
• Nominal
• Ordinal

Often we wish to assess whether two factors are related. To do so we construct an R x C table that cross-classifies the observations according to the two factors. Such a table is called a contingency table. We can test whether the factors are "related" using a $\chi^2$ test. We will consider the special case of 2 x 2 tables in detail.

Categorical Data

Contingency tables arise from two different, but related, situations:

1) We sample members of 2 (or more) groups (e.g. lung cancer vs control) and classify each member according to some qualitative characteristic (e.g. cigarette smoking).

Number of cigarettes/day:

| | None | <5 | 5-14 | 15-24 | 25-49 | 50+ |
|---|---|---|---|---|---|---|
| Cancer | $p_{11}$ | $p_{12}$ | … | | | |
| Control | $p_{21}$ | $p_{22}$ | … | | | |

The hypothesis is
H0: groups are homogeneous ($p_{1j} = p_{2j}$ for all $j$)
HA: groups are not homogeneous

2) We sample members of a population and cross-classify each member according to two qualitative characteristics (e.g. willingness to participate in vaccine study vs education level).

| | definitely not | probably not | probably | definitely | |
|---|---|---|---|---|---|
| < HS | $p_{11}$ | $p_{12}$ | $p_{13}$ | $p_{14}$ | $p_{1\cdot}$ |
| high school | $p_{21}$ | … | | | |
| > HS | ⋮ | | | | |
| | $p_{\cdot 1}$ | | | | |

The hypothesis is
H0: factors are independent ($p_{ij} = p_{i\cdot}\,p_{\cdot j}$)
HA: factors are not independent

Categorical Data

Example 1.
Education versus willingness to participate in a study of a vaccine to prevent HIV infection if the study was to start tomorrow. Counts, row percents and row totals are given.

| | definitely not | probably not | probably | definitely | Total |
|---|---|---|---|---|---|
| < high school | 52 (7.4%) | 79 (11.3%) | 342 (48.9%) | 226 (32.3%) | 699 |
| high school | 62 (6.9%) | 153 (17.1%) | 417 (46.6%) | 262 (29.3%) | 894 |
| some college | 53 (4.2%) | 213 (16.8%) | 629 (49.5%) | 375 (29.5%) | 1270 |
| college | 54 (4.9%) | 231 (21.0%) | 571 (51.9%) | 244 (22.2%) | 1100 |
| some post college | 18 (6.5%) | 46 (16.6%) | 139 (50.2%) | 74 (26.7%) | 277 |
| graduate/prof | 25 (4.1%) | 139 (22.8%) | 330 (54.1%) | 116 (19.0%) | 610 |
| Total | 264 (5.4%) | 861 (17.8%) | 2428 (50.1%) | 1297 (26.7%) | 4850 |

Categorical Data

Example 2. From the 1984 General Social Survey: income versus job satisfaction.

| Income | Very dissatisfied | Somewhat dissatisfied | Moderately satisfied | Very satisfied |
|---|---|---|---|---|
| < 6000 | 20 | 24 | 80 | 82 |
| 6000-15000 | 22 | 38 | 104 | 125 |
| 15000-25000 | 13 | 28 | 81 | 113 |
| > 25000 | 7 | 18 | 54 | 92 |

Categorical Data

Example 3: From Doll and Hill (1952), retrospective assessment of smoking frequency. The table displays the daily average number of cigarettes for lung cancer patients and control patients.

Daily # cigarettes:

| | None | <5 | 5-14 | 15-24 | 25-49 | 50+ | Total |
|---|---|---|---|---|---|---|---|
| Cancer | 7 (0.5%) | 55 (4.1%) | 489 (36.0%) | 475 (35.0%) | 293 (21.6%) | 38 (2.8%) | 1357 |
| Control | 61 (4.5%) | 129 (9.5%) | 570 (42.0%) | 431 (31.8%) | 154 (11.3%) | 12 (0.9%) | 1357 |
| Total | 68 | 184 | 1059 | 906 | 447 | 50 | 2714 |

Test of Homogeneity

In example 3 we want to test whether the smoking frequency is the same for each of the populations sampled. We want to test whether the groups are homogeneous with respect to a characteristic. The concept is similar to a t-test, but the response is categorical.

H0: smoking frequency same in both groups
HA: smoking frequency not the same

Q: What does H0 predict we would observe if all we knew were the marginal totals?
Daily # cigarettes:

| | None | <5 | 5-14 | 15-24 | 25-49 | 50+ | Total |
|---|---|---|---|---|---|---|---|
| Cancer | | | | | | | 1357 |
| Control | | | | | | | 1357 |
| Total | 68 | 184 | 1059 | 906 | 447 | 50 | 2714 |

Test of Homogeneity

A: H0 predicts the following expectations:

| | None | <5 | 5-14 | 15-24 | 25-49 | 50+ | Total |
|---|---|---|---|---|---|---|---|
| Cancer | 34 | 92 | 529.5 | 453 | 223.5 | 25 | 1357 |
| Control | 34 | 92 | 529.5 | 453 | 223.5 | 25 | 1357 |
| Total | 68 | 184 | 1059 | 906 | 447 | 50 | 2714 |

Each group has the same proportion in each cell as the overall marginal proportion. The "equal" expected number for each group is the result of the equal sample size in each group (what would change if there were half as many cases as controls?)

Test of Homogeneity

Recall, we often use the Poisson distribution to model counts. Suppose the observed counts in each cell, $O_{ij}$, are Poisson random variables with means $\mu_{ij}$. Then

$$Z = \frac{O_{ij} - \mu_{ij}}{\sqrt{\mu_{ij}}}$$

would be approximately normal. It turns out that $Z^2$ has a known distribution: it follows a "chi-squared ($\chi^2$) distribution with 1 degree of freedom" (MM table F).

Further, the sum of $n$ squared independent standard normal random variables follows a chi-square distribution with $n$ degrees of freedom. Let $Z_i$ be independent standard normals, $N(0,1)$, and let

$$X = Z_1^2 + Z_2^2 + \cdots + Z_n^2 = \sum_{i=1}^{n} Z_i^2 .$$

Then $X$ has a $\chi^2(n)$ distribution.

Test of Homogeneity

Therefore,

$$Z^2 = \frac{(O_{ij} - \mu_{ij})^2}{\mu_{ij}} \sim \chi^2(1).$$

We don't know the $\mu_{ij}$, but, under H0, we can estimate them based on the margins. We call these the expected counts, $E_{ij}$. Summing the squared differences between the observed and expected counts, each divided by the expected count, provides an overall assessment of H0:

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2\big((r-1)\times(c-1)\big).$$

$X^2$ is known as Pearson's chi-square statistic.

Test of Homogeneity

In example 3 the contributions to the $X^2$ statistic are:

| | None | <5 | 5-14 through 50+ |
|---|---|---|---|
| Cancer | $(7-34)^2/34$ | $(55-92)^2/92$ | etc. |
| Control | $(61-34)^2/34$ | etc. | etc. |

| | None | <5 | 5-14 | 15-24 | 25-49 | 50+ |
|---|---|---|---|---|---|---|
| Cancer | 21.44 | 14.88 | 3.10 | 1.07 | 21.61 | 6.76 |
| Control | 21.44 | 14.88 | 3.10 | 1.07 | 21.61 | 6.76 |

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 137.7$$

Looking in MM table F, we find that the 0.95 quantile of the $\chi^2(5)$ distribution is $Q^{(.95)}_{\chi^2(5)} = 11.07$. Conclusion?

Test of Independence

The chi-squared test of independence is mechanically the same as the test for homogeneity. The only difference is that the R x C table is formed based on the levels of 2 factors that are cross-classified. Therefore, the null and alternative hypotheses are different:

H0: The two factors are independent
HA: The two factors are not independent

Independence implies that each row has the same relative frequencies (or each column has the same relative frequencies).

Example 1 is a situation where individuals are classified according to two factors. In this example, the assumption of independence implies that willingness to participate doesn't depend on the level of education.

| | definitely not | probably not | probably | definitely | Total |
|---|---|---|---|---|---|
| < high school | 52 (7.4%) | 79 (11.3%) | 342 (48.9%) | 226 (32.3%) | 699 |
| high school | 62 (6.9%) | 153 (17.1%) | 417 (46.6%) | 262 (29.3%) | 894 |
| some college | 53 (4.2%) | 213 (16.8%) | 629 (49.5%) | 375 (29.5%) | 1270 |
| college | 54 (4.9%) | 231 (21.0%) | 571 (51.9%) | 244 (22.2%) | 1100 |
| some post college | 18 (6.5%) | 46 (16.6%) | 139 (50.2%) | 74 (26.7%) | 277 |
| graduate/prof | 25 (4.1%) | 139 (22.8%) | 330 (54.1%) | 116 (19.0%) | 610 |
| Total | 264 (5.4%) | 861 (17.8%) | 2428 (50.1%) | 1297 (26.7%) | 4850 |

Q: Based on the observed row proportions, how does the independence hypothesis look?
Q: How would the expected cell frequencies be calculated?
Q: How many degrees of freedom would the chi-square have?

The expected counts under independence are ...
| | definitely not | probably not | probably | definitely | Total |
|---|---|---|---|---|---|
| < high school | 38.1 | 124.1 | 349.9 | 186.9 | 699 |
| high school | 48.7 | 158.7 | 447.6 | 239.1 | 894 |
| some college | 69.1 | 225.5 | 635.8 | 339.6 | 1270 |
| college | 59.9 | 195.3 | 550.7 | 294.2 | 1100 |
| some post college | 15.1 | 49.2 | 138.7 | 74.1 | 277 |
| graduate/prof | 33.2 | 108.3 | 305.4 | 163.1 | 610 |
| Total | 264 (5.4%) | 861 (17.8%) | 2428 (50.1%) | 1297 (26.7%) | 4850 |

$X^2 = 89.7$, 15 df, p < .0001

Summary: $\chi^2$ Tests for R x C Tables

1. Tests of homogeneity of a factor across groups or independence of two factors rely on Pearson's $X^2$ statistic.
2. $X^2$ is compared to a $\chi^2((r-1)\times(c-1))$ distribution (MM, table F or display chiprob(df,X2)).
3. Expected cell counts should be larger than 5.
4. We have considered a global test without using possible factor ordering. Ordered factors permit a test for trend (see Agresti, 1990).

2 x 2 Tables

Example 1: Pauling (1971)
Patients are randomized to either receive Vitamin C or placebo. Patients are followed up to ascertain the development of a cold.

| | Cold - Y | Cold - N | Total |
|---|---|---|---|
| Vitamin C | 17 | 122 | 139 |
| Placebo | 31 | 109 | 140 |
| Total | 48 | 231 | 279 |

Q: Is treatment with Vitamin C associated with a reduced probability of getting a cold?
Q: If Vitamin C is associated with reducing colds, then what is the magnitude of the effect?

2 x 2 Tables

Example 2: Keller (AJPH, 1965)
Patients with (cases) and without (controls) oral cancer were surveyed regarding their smoking frequency (this table collapses over the smoking frequency categories).

| | Case | Control | Total |
|---|---|---|---|
| Smoker | 484 | 385 | 869 |
| Non-Smoker | 27 | 90 | 117 |
| Total | 511 | 475 | 986 |

Q: Is oral cancer associated with smoking?
Q: If smoking is associated with oral cancer, then what is the magnitude of the risk?
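Looking back at the R x C examples, the expected counts and Pearson statistics quoted for the Doll and Hill smoking table and for the education table can be reproduced in a few lines. The following is a minimal sketch in plain Python; the function name `pearson_chi2` and the variable names are illustrative choices, not part of the course materials. It stops at the statistic and degrees of freedom, leaving the p-value lookup to MM table F as the slides do.

```python
def pearson_chi2(table):
    """Return (X2, df, expected) for an R x C table of observed counts.

    Expected counts are (row total * column total) / grand total,
    which is what both homogeneity and independence predict.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# Doll and Hill (1952): rows = cancer, control; columns = daily cigarettes
doll_hill = [
    [7, 55, 489, 475, 293, 38],
    [61, 129, 570, 431, 154, 12],
]

# Education (rows) by willingness to participate (columns)
education = [
    [52, 79, 342, 226],
    [62, 153, 417, 262],
    [53, 213, 629, 375],
    [54, 231, 571, 244],
    [18, 46, 139, 74],
    [25, 139, 330, 116],
]

x2_smoke, df_smoke, exp_smoke = pearson_chi2(doll_hill)
x2_edu, df_edu, exp_edu = pearson_chi2(education)

print(f"Smoking table:   X2 = {x2_smoke:.1f} on {df_smoke} df")
print(f"Education table: X2 = {x2_edu:.1f} on {df_edu} df")
```

The printed statistics should agree with the 137.7 (5 df) and 89.7 (15 df) reported on the slides, and `exp_smoke` and `exp_edu` hold the expected-count tables shown there.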
2 x 2 Tables

Example 3: Norusis (1988)
In 1984, a random sample of US adults were cross-classified based on their income and reported job satisfaction:

| | Dissatisfied | Satisfied | Total |
|---|---|---|---|
| < $15,000 | 104 | 391 | 495 |
| ≥ $15,000 | 66 | 340 | 406 |
| Total | 170 | 731 | 901 |

Q: Is salary associated with job satisfaction?
Q: If salary is associated with satisfaction, then what is the magnitude of the effect?

2 x 2 Tables

Example 4: HIVNET (1995)
Subjects were surveyed regarding their knowledge of vaccine trial concepts both at baseline and at month 3 after an informed consent process. The following table shows the subjects cross-classified according to the two responses.

| | Month 3: Incorrect | Month 3: Correct | Total |
|---|---|---|---|
| Baseline: Incorrect | 251 | 178 | 429 |
| Baseline: Correct | 68 | 98 | 166 |
| Total | 319 | 276 | 595 |

Q: Did the informed consent process improve knowledge?
Q: If informed consent improved knowledge then what is the magnitude of the effect?

2 x 2 Tables

Each of these tables can be represented as follows:

| | D | not D | Total |
|---|---|---|---|
| E | a | b | (a + b) = $n_1$ |
| not E | c | d | (c + d) = $n_2$ |
| Total | (a + c) = $m_1$ | (b + d) = $m_2$ | N |

The question of association can be addressed with Pearson's $X^2$ (except for example 4). We compute the expected cell counts as follows:

Expected:

| | D | not D | Total |
|---|---|---|---|
| E | $n_1 m_1/N$ | $n_1 m_2/N$ | (a + b) = $n_1$ |
| not E | $n_2 m_1/N$ | $n_2 m_2/N$ | (c + d) = $n_2$ |
| Total | (a + c) = $m_1$ | (b + d) = $m_2$ | N |

2 x 2 Tables

Pearson's chi-square is given by:

$$X^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
= \frac{\left(a - \frac{n_1 m_1}{N}\right)^2}{\frac{n_1 m_1}{N}}
+ \frac{\left(b - \frac{n_1 m_2}{N}\right)^2}{\frac{n_1 m_2}{N}}
+ \frac{\left(c - \frac{n_2 m_1}{N}\right)^2}{\frac{n_2 m_1}{N}}
+ \frac{\left(d - \frac{n_2 m_2}{N}\right)^2}{\frac{n_2 m_2}{N}}
= \frac{N(ad - bc)^2}{n_1 n_2 m_1 m_2}$$

Q: How does this $X^2$ test compare in Example 1 to simply using the 2 sample binomial test of $H_0: P(D \mid E) = P(D \mid \bar{E})$?
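One way to explore that question numerically is to compute the statistic three ways on the same table: cell by cell, with the shortcut formula, and as the square of the pooled two-sample Z statistic. The sketch below uses the Vitamin C counts from Example 1 and only the Python standard library; the function names are hypothetical, and the p-value helper relies on the fact that a $\chi^2(1)$ variable is a squared standard normal, so $P(\chi^2(1) > x) = \mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def chi2_cellwise(a, b, c, d):
    """Pearson X2 as the sum of (O - E)^2 / E over the four cells."""
    n1, n2 = a + b, c + d
    m1, m2 = a + c, b + d
    N = n1 + n2
    observed = [a, b, c, d]
    expected = [n1 * m1 / N, n1 * m2 / N, n2 * m1 / N, n2 * m2 / N]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_shortcut(a, b, c, d):
    """Pearson X2 via N (ad - bc)^2 / (n1 n2 m1 m2)."""
    n1, n2 = a + b, c + d
    m1, m2 = a + c, b + d
    N = n1 + n2
    return N * (a * d - b * c) ** 2 / (n1 * n2 * m1 * m2)


def two_sample_z(a, b, c, d):
    """Two-sample Z for binomial proportions with pooled variance."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    p0 = (a + c) / (n1 + n2)  # pooled proportion under H0
    return (p1 - p2) / math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))


def chi2_1df_pvalue(x2):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2))


# Pauling (1971): a, b = Vitamin C (cold yes, cold no); c, d = placebo
a, b, c, d = 17, 122, 31, 109
x2_cells = chi2_cellwise(a, b, c, d)
x2_short = chi2_shortcut(a, b, c, d)
z = two_sample_z(a, b, c, d)
p_value = chi2_1df_pvalue(x2_short)

print(f"cellwise X2 = {x2_cells:.3f}, shortcut X2 = {x2_short:.3f}")
print(f"Z = {z:.3f}, Z^2 = {z * z:.3f}, p = {p_value:.3f}")
```

The three versions of the statistic agree up to floating-point rounding, which is the algebraic identity the slide's formula expresses.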
2 x 2 Tables

Example 1: Pauling (1971)

| | Cold - Y | Cold - N | Total |
|---|---|---|---|
| Vitamin C | 17 | 122 | 139 |
| Placebo | 31 | 109 | 140 |
| Total | 48 | 231 | 279 |

H0: probability of disease does not depend on treatment
HA: probability of disease does depend on treatment

$$X^2 = \frac{N(ad - bc)^2}{n_1 n_2 m_1 m_2} = \frac{279\,(17 \times 109 - 31 \times 122)^2}{139 \times 140 \times 48 \times 231} = 4.81$$

For the p-value we compute $P(\chi^2(1) > 4.81) = 0.028$. Therefore, we reject the independence of treatment and disease.

Two sample test of binomial proportions (same table):

$p_1$ = P(cold | Vitamin C)
$p_2$ = P(cold | placebo)

H0: $p_1 = p_2$
HA: $p_1 \neq p_2$

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
= \frac{17/139 - 31/140}{\sqrt{\frac{48}{279} \cdot \frac{231}{279}\left(\frac{1}{139} + \frac{1}{140}\right)}} = -2.193$$

For the 2-sided p-value we compute $2 \times P(|Z| > 2.193) = 0.028$. Therefore, we reject H0 with the exact same result as the $\chi^2$ test ($Z^2 = X^2$).

2 x 2 Tables: Applications in Epidemiology

Example 1 fixed the number of E and not E, then evaluated the disease status after a fixed period of time. This is a prospective study. Given this design we can estimate the relative risk:

$$RR = \frac{P(D \mid E)}{P(D \mid \bar{E})}$$

The range of RR is $[0, \infty)$.
By taking the logarithm, we have $(-\infty, +\infty)$ as the range for ln(RR) and a better approximation to normality for the estimated $\ln \widehat{RR}$:

$$\ln \widehat{RR} = \ln\!\left(\frac{\hat{P}(D \mid E)}{\hat{P}(D \mid \bar{E})}\right) = \ln\!\left(\frac{a/n_1}{c/n_2}\right)$$

$$\ln \widehat{RR} \sim N\!\left(\ln(p_1/p_2),\; \frac{1 - p_1}{p_1 n_1} + \frac{1 - p_2}{p_2 n_2}\right)$$

| | Cold - Y | Cold - N | Total |
|---|---|---|---|
| Vitamin C | 17 | 122 | 139 |
| Placebo | 31 | 109 | 140 |
| Total | 48 | 231 | 279 |

The estimated relative risk is:

$$\widehat{RR} = \frac{\hat{P}(D \mid E)}{\hat{P}(D \mid \bar{E})} = \frac{17/139}{31/140} = 0.55$$

We can obtain a confidence interval for the relative risk by first obtaining a confidence interval for the log RR:

$$\ln \widehat{RR} \pm Q_Z^{(1 - \alpha/2)} \sqrt{\frac{1 - p_1}{p_1 n_1} + \frac{1 - p_2}{p_2 n_2}}$$

For Example 1, a 95% confidence interval for the log relative risk is given by:

$$\ln \widehat{RR} \pm 1.96 \sqrt{\frac{1 - \hat{p}_1}{\hat{p}_1 n_1} + \frac{1 - \hat{p}_2}{\hat{p}_2 n_2}}
= \ln(0.55) \pm 1.96 \sqrt{\frac{122}{17 \times 139} + \frac{109}{31 \times 140}}$$

$$= -0.593 \pm 1.96 \times 0.277 = -0.593 \pm 0.543, \quad \text{i.e. } (-1.136,\ -0.050)$$

To obtain a 95% confidence interval for the relative risk we exponentiate the end-points of the interval for the log relative risk. Therefore,

( exp(-1.136), exp(-0.050) ) = ( 0.32, 0.95 )

is a 95% confidence interval for the relative risk.

2 x 2 Tables: Applications in Epidemiology

In Example 2 we fixed the number of cases and controls then ascertained exposure status. Such a design is known as a case-control study. Based on this we are able to directly estimate:

$$P(E \mid D) \quad \text{and} \quad P(E \mid \bar{D})$$

However, we generally are interested in the relative risk, which is not estimable from these data alone because we've fixed the number of diseased and disease-free subjects.
Instead of the relative risk we can estimate the exposure odds ratio, which Cornfield (1951) showed to be equivalent to the disease odds ratio:

$$\frac{P(E \mid D) / (1 - P(E \mid D))}{P(E \mid \bar{D}) / (1 - P(E \mid \bar{D}))}
= \frac{P(D \mid E) / (1 - P(D \mid E))}{P(D \mid \bar{E}) / (1 - P(D \mid \bar{E}))}$$

Odds Ratio

Furthermore, for rare diseases, $P(D \mid E) \approx 0$, so that the disease odds ratio approximates the relative risk:

$$\frac{P(D \mid E) / (1 - P(D \mid E))}{P(D \mid \bar{E}) / (1 - P(D \mid \bar{E}))} \approx \frac{P(D \mid E)}{P(D \mid \bar{E})}$$

Since case-control data let us estimate the exposure odds ratio, we can equivalently estimate the disease odds ratio, which for rare diseases approximates the relative risk.

2 x 2 Tables: Applications in Epidemiology

Like the relative risk, the odds ratio has $[0, \infty)$ as its range. The log odds ratio has $(-\infty, +\infty)$ as its range, and the normal approximation works better for the estimated log odds ratio.

$$OR = \frac{p_1/q_1}{p_2/q_2}, \qquad \widehat{OR} = \frac{\hat{p}_1/\hat{q}_1}{\hat{p}_2/\hat{q}_2} = \frac{ad}{bc}$$

Confidence intervals are based upon:

$$\ln \widehat{OR} \sim N\!\left(\ln(OR),\; \frac{1}{n_1 p_1} + \frac{1}{n_1 q_1} + \frac{1}{n_2 p_2} + \frac{1}{n_2 q_2}\right)$$

Therefore, a $(1 - \alpha)$ confidence interval for the log odds ratio is given by:

$$\ln\!\left(\frac{ad}{bc}\right) \pm Q_Z^{(1 - \alpha/2)} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Example 2:

| | Case | Control | Total |
|---|---|---|---|
| Smoker | 484 | 385 | 869 |
| Non-Smoker | 27 | 90 | 117 |
| Total | 511 | 475 | 986 |

The estimated odds ratio (odds of cancer for smokers relative to the odds of cancer for non-smokers) is given by:

$$\widehat{OR} = \frac{484 \times 90}{27 \times 385} = 4.19$$

A 95% confidence interval for the log odds ratio is given by:

$$\ln(4.19) \pm 1.96 \sqrt{\frac{1}{484} + \frac{1}{385} + \frac{1}{27} + \frac{1}{90}}
= 1.433 \pm 1.96 \times 0.230 = 1.433 \pm 0.450, \quad \text{i.e. } (0.983,\ 1.883)$$

To obtain a 95% confidence interval for the odds ratio we simply exponentiate the end-points of the interval for the log odds ratio. Therefore,

( exp(0.983), exp(1.883) ) or ( 2.672, 6.573 )

is a 95% confidence interval for the odds ratio.

2 x 2 Tables: Applications in Epidemiology

Example 3 is an example of a cross-sectional study since only the total for the table is fixed in advance.
The row totals or column totals are not fixed in advance. In epidemiological studies, the relative risk or odds ratio may be used to summarize the association when using a cross-sectional design.

The major distinction from a prospective study is that a cross-sectional study will reveal the number of cases currently in the sample. These are known as prevalent cases. In a prospective study we count the number of new cases, or incident cases.

| Study | Probability | Description |
|---|---|---|
| Cohort | incidence | probability of obtaining the disease |
| Cross-sectional | prevalence | probability of having the disease |

Paired Binary Data

Example 4 measured a binary response pre and post treatment. This is an example of paired binary data. One way to display these data is the following:

| | Baseline | Month 3 | Total |
|---|---|---|---|
| Correct | 166 | 276 | 442 |
| Incorrect | 429 | 319 | 748 |
| Total | 595 | 595 | 1190 |

Q: Can't we simply use the $X^2$ test of homogeneity to assess whether there is evidence for an increase in knowledge?

A: NO!!! The $X^2$ tests assume that the rows are independent samples. In this design it is the same 595 people at Baseline and at 3 months.

Paired Binary Data

For paired binary data we display the results as follows:

| | Time 2: 0 | Time 2: 1 |
|---|---|---|
| Time 1: 0 | $n_{00}$ | $n_{01}$ |
| Time 1: 1 | $n_{10}$ | $n_{11}$ |

This analysis explicitly recognizes the heterogeneity of subjects. Thus, those that score (0,0) and (1,1) provide no information about the effectiveness of the treatment since they may be "weak" or "strong" individuals. These are known as the concordant pairs. The information regarding treatment is in the discordant pairs, (0,1) and (1,0).

$p_1$ = success probability at Time 1
$p_2$ = success probability at Time 2

H0: $p_1 = p_2$
HA: $p_1 \neq p_2$

Paired Binary Data: McNemar's Test

Under the null hypothesis, H0: $p_1 = p_2$, we expect equal numbers to change from 0 to 1 and from 1 to 0 ($E[n_{01}] = E[n_{10}]$).
Specifically, under the null:

$$M = n_{01} + n_{10}, \qquad n_{10} \mid M \sim \mathrm{Bin}\!\left(M, \tfrac{1}{2}\right)$$

$$Z = \frac{n_{10} - M \cdot \tfrac{1}{2}}{\sqrt{M \cdot \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right)}}$$

Under H0, $Z^2 \sim \chi^2(1)$, and this forms the basis for McNemar's Test for Paired Binary Responses.

The odds ratio comparing the odds of success at Time 2 to Time 1 is estimated by:

$$\widehat{OR} = \frac{n_{01}}{n_{10}}$$

Confidence intervals can be obtained as described in Breslow and Day (1981), section 5.2, or in Armitage and Berry (1987), chapter 16.

Paired Binary Data

A common epidemiological design is to match cases and controls regarding certain factors (e.g. age, gender…) then ascertain the exposure history (e.g. smoking) for each member of the pair. The results for all pairs can be summarized by:

| | Control: E- | Control: E+ |
|---|---|---|
| Case: E- | $n_{00}$ | $n_{10}$ |
| Case: E+ | $n_{01}$ | $n_{11}$ |

Given this design we can use McNemar's Test to test the hypotheses

H0: $P(D \mid E) = P(D \mid \bar{E})$ (OR = 1)
HA: $P(D \mid E) \neq P(D \mid \bar{E})$ (OR ≠ 1)

Example 4:

| | Month 3: Incorrect | Month 3: Correct | Total |
|---|---|---|---|
| Baseline: Incorrect | 251 | 178 | 429 |
| Baseline: Correct | 68 | 98 | 166 |
| Total | 319 | 276 | 595 |

We can test H0: $p_1 = p_2$ using McNemar's Test:

$$Z = \frac{n_{01} - M \cdot \tfrac{1}{2}}{\sqrt{M \cdot \tfrac{1}{2} \cdot \tfrac{1}{2}}} = \frac{178 - (178 + 68)/2}{\sqrt{(178 + 68)/4}} = 7.01$$

Comparing $7.01^2$ to a $\chi^2(1)$ we find that p < 0.001. Therefore we reject the null hypothesis of equal success probabilities for Time 1 and Time 2. We estimate the odds ratio as $\widehat{OR} = 178/68 = 2.62$.

Summary for 2 x 2 Tables

• Cohort Analysis (Prospective)
1. H0: $P(D \mid E) = P(D \mid \bar{E})$
2. RR for incident disease
3. $\chi^2$ test

• Case Control Analysis (Retrospective)
1. H0: $P(E \mid D) = P(E \mid \bar{D})$
2. OR (≈ RR for rare disease)
3. $\chi^2$ test

• Cross-sectional Analysis
1. H0: $P(D \mid E) = P(D \mid \bar{E})$
2. RR for prevalent disease
3. $\chi^2$ test

• Paired Binary Data
1. H0: $P(D \mid E) = P(D \mid \bar{E})$
2. OR
3. McNemar's test

Fisher's Exact Test

Motivation: When a 2 x 2 table contains cells that have fewer than 5 expected observations, the normal approximation to the distribution of the log odds ratio (or other summary statistics) is known to be poor.
This can lead to incorrect inference since the p-values based on this approximation are not valid.

Solution: Use Fisher's Exact Test

| | D+ | D- | Total |
|---|---|---|---|
| E+ | a | b | $n_1$ |
| E- | c | d | $n_2$ |
| Total | $m_1$ | $m_2$ | N |

Fisher's Exact Test

Example: (Rosner, p. 370) Cardiovascular disease. A retrospective study is done among men aged 50-54 who died over a 1-month period. The investigators tried to include equal numbers of men who died from CVD and those that did not. Then, asking a close relative, the dietary habits were ascertained.

| | High Salt | Low Salt | Total |
|---|---|---|---|
| non-CVD | 2 | 23 | 25 |
| CVD | 5 | 30 | 35 |
| Total | 7 | 53 | 60 |

A calculation of the odds ratio yields:

$$OR = \frac{2 \times 30}{5 \times 23} = 0.522$$

Interpret.

Fisher's Exact Test

If we fix all of the margins ($n_1$, $n_2$, $m_1$, $m_2$) then any one cell of the table will allow the remaining cells to be filled. Note that a must be a non-negative integer no larger than either $n_1$ or $m_1$. Thus there are relatively few possible table configurations if either $n_1$ or $m_1$ is small.

Under the null hypothesis, H0: OR = 1, we can use the hypergeometric distribution (a probability distribution for discrete random variables) to compute the probability of any given configuration. Since we have the distribution of a statistic (a) under the null, we can use this to compute p-values.

Fisher's Exact Test

Example: (Rosner, p. 370) Cardiovascular disease.

| | High Salt | Low Salt | Total |
|---|---|---|---|
| non-CVD | 2 | 23 | 25 |
| CVD | 5 | 30 | 35 |
| Total | 7 | 53 | 60 |

$$E[a \mid H_0] = \frac{n_1 m_1}{N} = \frac{7 \times 25}{60} = 2.92$$

Possible tables (all with margins 25, 35, 7, 53 and N = 60), indexed by the non-CVD / High Salt cell a:

| a | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Probability | .017 | .105 | .252 | .312 | .214 | .082 | .016 | .001 |

Fisher's Exact Test

Using the hypergeometric distribution we can compute the exact probability of each of these tables (under H0: $p_1 = p_2$) (Rosner, p. 370).

To compute a p-value we then use the usual approach of summing the probability of all events (tables) as extreme or more extreme than the observed data.

• For a one-tailed test of $p_1 < p_2$ ($p_1 > p_2$) we sum the probabilities of all tables with a less than or equal to (greater than or equal to) the observed a.
• For a two-tailed test of $p_1 = p_2$ we compute the two one-tailed p-values and double the smaller of the two.

You will never do this by hand ….

Categorical data: summary

Decision chart (reconstructed from the flowchart on the slide):

• 2 x 2 table?
  • Yes: Samples independent?
    • Yes: Expected counts > 5?
      • Yes: 2 sample Z test for proportions or $\chi^2$ test
      • No: Fisher's exact test
    • No: McNemar's test
  • No: 2 x k table?
    • Yes: Test for trend in proportions?
      • Yes: $\chi^2$ test for trend
      • No: $\chi^2$ test for R x C table
    • No: $\chi^2$ test for R x C table
  • For the R x C table: Expected counts > 5?
    • Yes: $\chi^2$ test
    • No: Exact test
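The hypergeometric calculation behind Fisher's exact test is short enough to sketch directly. The code below is an illustration using only Python's standard library (`math.comb`, available from Python 3.8); the function names are hypothetical. It fixes the margins of the salt/CVD table, lists the probability of every feasible value of a, and applies the slide's rule of doubling the smaller one-tailed p-value. Other software may define the two-sided p-value differently, so results need not match exactly.

```python
from math import comb


def table_probabilities(n1, n2, m1):
    """Hypergeometric probability of each feasible a with all margins fixed.

    n1, n2 are the row totals and m1 is the first column total;
    a is the count in the first row, first column.
    """
    N = n1 + n2
    lowest = max(0, m1 - n2)   # a cannot be so small that row 2 overflows
    highest = min(n1, m1)      # a cannot exceed its row or column total
    denom = comb(N, m1)
    return {
        a: comb(n1, a) * comb(n2, m1 - a) / denom
        for a in range(lowest, highest + 1)
    }


def fisher_exact_p(a_obs, n1, n2, m1):
    """One-tailed p-values and the 'double the smaller tail' two-sided p."""
    probs = table_probabilities(n1, n2, m1)
    lower = sum(p for a, p in probs.items() if a <= a_obs)
    upper = sum(p for a, p in probs.items() if a >= a_obs)
    two_sided = min(1.0, 2 * min(lower, upper))
    return lower, upper, two_sided


# Rosner example: rows non-CVD (25) and CVD (35); 7 men on a high salt diet
probs = table_probabilities(n1=25, n2=35, m1=7)
lower, upper, two_sided = fisher_exact_p(a_obs=2, n1=25, n2=35, m1=7)

for a, p in probs.items():
    print(f"a = {a}: {p:.3f}")
print(f"P(a <= 2) = {lower:.3f}, P(a >= 2) = {upper:.3f}, "
      f"two-sided p = {two_sided:.3f}")
```

The per-table probabilities should match the values listed under "Possible tables" to three decimals, and the large two-sided p-value indicates that these data give no evidence against OR = 1.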