Contingency Tables
• 1. Explain χ² Test of Independence
• 2. Measure of Association

Contingency Tables
• Tables representing all combinations of levels of the explanatory and response variables
• Numbers in the table are counts of the number of cases in each cell
• Row and column totals are called marginal counts

2x2 Tables
• Each variable has 2 levels
– Explanatory variable: groups (typically based on demographics or exposure)
– Response variable: outcome (typically presence or absence of a characteristic)

2x2 Tables - Notation
                 Outcome Present   Outcome Absent   Group Total
Group 1          n11               n12              n1.
Group 2          n21               n22              n2.
Outcome Total    n.1               n.2              n..

χ² Test of Independence
• 1. Shows whether a relationship exists between 2 qualitative variables
– One sample is drawn
– Does not show causality
• 2. Assumptions
– Multinomial experiment
– All expected counts ≥ 5
• 3. Uses a two-way contingency table

χ² Test of Independence: Contingency Table
• 1. Shows the number of observations from 1 sample classified jointly by 2 qualitative variables

                 House Location
House Style      Urban   Rural   Total
Split-Level         63      49     112
Ranch               15      33      48
Total               78      82     160
(Rows are the levels of variable 1; columns are the levels of variable 2.)

χ² Test of Independence: Hypotheses & Statistic
• 1. Hypotheses
– H0: Variables are independent
– Ha: Variables are related (dependent)
• 2. Test statistic
$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})}$
where $n_{ij}$ is the observed count and $E(n_{ij})$ is the expected count in cell (i, j)
• Degrees of freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns

χ² Test of Independence: Expected Counts
• 1.
Statistical independence means the joint probability equals the product of the marginal probabilities
• 2. Compute the marginal probabilities and multiply them to get the joint probability
• 3. Expected count is the sample size times the joint probability

Expected Count Example
                 Location
House Style      Urban   Rural   Total
Split-Level         63      49     112
Ranch               15      33      48
Total               78      82     160

• Marginal probability (Split-Level row) = 112/160
• Marginal probability (Urban column) = 78/160
• Joint probability under independence = (112/160)(78/160)
• Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation
$\text{Expected count} = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$

                 House Location
                 Urban          Rural
House Style      Obs.   Exp.    Obs.   Exp.    Total
Split-Level        63   54.6      49   57.4      112
Ranch              15   23.4      33   24.6       48
Total              78   78        82   82        160

Expected counts: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6

χ² Test of Independence: Example
• You're a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?
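Before the survey data for this example, the expected-count rule (row total × column total / sample size) can be checked against the house-style table. The following is a minimal Python sketch, not part of the original slides; the function names `expected_counts` and `chi_square` are illustrative assumptions.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Pearson chi-square statistic and its degrees of freedom (r - 1)(c - 1)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# House style (rows: Split-Level, Ranch) by location (columns: Urban, Rural)
house = [[63, 49], [15, 33]]

for row in expected_counts(house):
    print([round(value, 1) for value in row])
# [54.6, 57.4]
# [23.4, 24.6]
```

The same two functions apply unchanged to any r x c table of counts, since the totals are computed from the table itself.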
                 Diet Pepsi
Diet Coke        No     Yes    Total
No               84      32      116
Yes              48     122      170
Total           132     154      286

χ² Test of Independence: Solution
• H0: No relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical value: χ² = 3.841 (reject H0 if the test statistic exceeds it)
• Assumption check: E(nij) ≥ 5 in all cells

                 Diet Pepsi
                 No             Yes
Diet Coke        Obs.   Exp.    Obs.   Exp.    Total
No                 84   53.5      32   62.5      116
Yes                48   78.5     122   91.5      170
Total             132    132     154    154      286

Expected counts: 116·132/286 ≈ 53.5, 116·154/286 ≈ 62.5, 170·132/286 ≈ 78.5, 170·154/286 ≈ 91.5

• Test statistic:
$\chi^2 = \sum_{\text{all cells}} \frac{[n_{ij} - E(n_{ij})]^2}{E(n_{ij})} = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} = 54.29$
• Decision: Reject H0 at α = .05 (54.29 > 3.841)
• Conclusion: There is evidence of a relationship

Siskel and Ebert (observed counts, with expected counts beneath)
           |          Ebert
Siskel     |    Con    Mix    Pro |  Total
-----------+----------------------+-------
Con        |     24      8     13 |     45
           |   11.8    8.4   24.8 |   45.0
-----------+----------------------+-------
Mix        |      8     13     11 |     32
           |    8.4    6.0   17.6 |   32.0
-----------+----------------------+-------
Pro        |     10      9     64 |     83
           |   21.8   15.6   45.6 |   83.0
-----------+----------------------+-------
Total      |     42     30     88 |    160
           |   42.0   30.0   88.0 |  160.0
Pearson chi2(4) = 45.3569   p < 0.001

Yates's Statistic
• Method of testing for association in 2x2 tables when the sample size is moderate (total observations between 6 and 25)
$\chi^2 = \sum_i \sum_j \frac{(|O_{ij} - e_{ij}| - 0.5)^2}{e_{ij}}$

Measures of Association
– Relative Risk
– Odds Ratio
– Absolute Risk

Relative Risk
• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with the characteristic from groups 1 and 2:
$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$

Relative Risk
• Estimated relative risk:
$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$
• 95% confidence interval for the population relative risk:
$\left( RR \cdot e^{-1.96\sqrt{v}} \,,\; RR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$

Relative Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1

Example - Coccidioidomycosis and TNF-antagonists
• Research Question: Is there a risk of developing coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor (TNF) antagonists versus patients not receiving them (all patients arthritic)

           COC   No COC   Total
TNF          7      240     247
Other        4      734     738
Total       11      974     985
Source: Bergstrom, et al (2004)

Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$
$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24$
$v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$
$95\%\ CI: \left( 5.24\, e^{-1.96\sqrt{.3874}} \,,\; 5.24\, e^{1.96\sqrt{.3874}} \right) = (1.55 \,,\; 17.76)$
• The entire CI is above 1: conclude higher risk if on TNF

Odds Ratio
• The odds of an event is the probability that it occurs divided by the probability that it does not occur
• The odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:
$odds_1 = \frac{n_{11}/n_{1.}}{n_{12}/n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$

Odds Ratio
• Estimated odds ratio:
$OR = \frac{odds_1}{odds_2} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}$
• 95% confidence interval for the population odds ratio:
$\left( OR \cdot e^{-1.96\sqrt{v}} \,,\; OR \cdot e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$

Odds Ratio
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is above 1
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is below 1
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 1

Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 self-reporting patients with Glioblastoma Multiforme (GBM)
– Controls: 401 population-based individuals matched to cases with respect to demographic factors

                  GBM Present   GBM Absent   Total
NSAID User                 32          138     170
NSAID Non-User            105          263     368
Total                     137          401     538
Source: Sivak-Sears, et al (2004)

Example - NSAIDs and GBM
$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$
$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$
$95\%\ CI: \left( 0.58\, e^{-1.96\sqrt{0.0518}} \,,\; 0.58\, e^{1.96\sqrt{0.0518}} \right) = (0.37 \,,\; 0.91)$
• The interval is entirely below 1: NSAID use appears to be lower among cases
than controls.

Absolute Risk
• Difference between the proportions of cases with an outcome characteristic for 2 groups
• Sample proportions with the characteristic from groups 1 and 2:
$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$

Absolute Risk
• Estimated absolute risk:
$AR = \hat{\pi}_1 - \hat{\pi}_2$
• 95% confidence interval for the population absolute risk:
$AR \pm 1.96 \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}}}$

Absolute Risk
• Interpretation
– Conclude that the probability that the outcome is present is higher (in the population) for group 1 if the entire interval is positive
– Conclude that the probability that the outcome is present is lower (in the population) for group 1 if the entire interval is negative
– Do not conclude that the probability of the outcome differs for the two groups if the interval contains 0

Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF
$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$
$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$
$95\%\ CI: .0229 \pm 1.96 \sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016 \,,\; 0.0442)$
• The interval is entirely positive: TNF is associated with higher risk

Ordinal Explanatory and Response Variables
• Pearson's chi-square test can be used to test associations among ordinal variables, but more powerful methods exist
• When theories exist that the association is directional (positive or negative), measures exist to describe and test for these specific alternatives to independence:
– Gamma
– Kendall's τb

Concordant and Discordant Pairs
• Concordant pairs: pairs of individuals where one individual scores "higher" on both ordered variables than the other individual
• Discordant pairs: pairs of individuals where one individual scores "higher" on one ordered variable and the other individual scores "higher" on the other
• C = # concordant pairs, D = # discordant pairs
– Under positive association, expect C > D
– Under negative association, expect C < D
– Under no association, expect C ≈ D

Example - Alcohol Use and Sick Days
• Alcohol Risk (Without Risk, Hardly any Risk,
Some to Considerable Risk)
• Sick Days (0, 1-6, ≥7)
• Concordant pairs: pairs of respondents where one scores higher on both alcohol risk and sick days than the other
• Discordant pairs: pairs of respondents where one scores higher on alcohol risk and the other scores higher on sick days
Source: Hermansson, et al (2003)

Example - Alcohol Use and Sick Days
                              Sick Days
Alcohol Risk                    0    1-6     ≥7   Total
Without Risk                  347    113    145     605
Hardly any Risk               154     63     56     273
Some to Considerable Risk      52     25     34     111
Total                         553    201    235     989

• Concordant pairs: each individual in a given cell is concordant with each individual in cells "southeast" of theirs
• Discordant pairs: each individual in a given cell is discordant with each individual in cells "southwest" of theirs

$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$
$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$

Measures of Association
• Goodman and Kruskal's Gamma:
$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$
• Kendall's τb:
$\hat{\tau}_b = \frac{C - D}{0.5\sqrt{\left(n^2 - \sum_i n_{i.}^2\right)\left(n^2 - \sum_j n_{.j}^2\right)}}$
• When there is no association between the ordinal variables, the population values of these measures are 0. Statistical software packages provide these tests.

Example - Alcohol Use and Sick Days
$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$

[Software output table: Kendall's tau-b and Gamma for this example; values not recoverable]
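The "southeast"/"southwest" counting rule and the Gamma estimate for the alcohol-use example can be reproduced with a short Python sketch. This is an illustration rather than the slides' own method; the function names are assumptions, and the cell counts are the ones appearing in the C and D sums of the example.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows below cell (i, j)
                for l in range(n_cols):
                    if l > j:    # "southeast" cell: concordant pairs
                        c += table[i][j] * table[k][l]
                    elif l < j:  # "southwest" cell: discordant pairs
                        d += table[i][j] * table[k][l]
    return c, d


def gamma(table):
    """Goodman and Kruskal's Gamma: (C - D) / (C + D)."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Rows: alcohol risk (without, hardly any, some to considerable)
# Columns: sick days (0, 1-6, 7 or more)
alcohol_by_sick_days = [
    [347, 113, 145],
    [154, 63, 56],
    [52, 25, 34],
]

C, D = concordant_discordant(alcohol_by_sick_days)
print(C, D, round(gamma(alcohol_by_sick_days), 4))  # 83164 73496 0.0617
```

Pairs tied on either variable (same row or same column) are skipped, which is why Gamma uses only C and D while Kendall's τb adjusts its denominator through the marginal totals.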