Sociology 5811: Lecture 16: Crosstabs 2
Measures of Association Plus Differences in Proportions
Copyright © 2005 by Evan Schofer
Do not copy or distribute without permission

Announcements
• Final project proposals due Nov 15
• Get started now!!!
  – Find a dataset
  – Figure out what hypotheses you might test
• Today: Wrap up Crosstabs
• If time remains, we'll discuss project ideas…

Review: Chi-square Test
• The Chi-square test is a test of independence
• Null hypothesis: the two categorical variables are statistically independent
  – There is no relationship between them
  – H0: Gender and political party are independent
• Alternate hypothesis: the variables are related, not independent of each other
  – H1: Gender and political party are not independent
• The test is based on comparing the observed cell values with the values you'd expect if there were no relationship between the variables.

Review: Expected Cell Values
• If two variables are independent, cell values will depend only on the row and column marginals
  – Marginals reflect frequencies… and if a marginal frequency is high, all cells in that row (or column) should be high
• The formula for the expected value in a cell is:
  $\hat{f}_{ij} = \frac{(f_i)(f_j)}{N}$
  – $f_i$ and $f_j$ are the row and column marginals
  – N is the total sample size

Review: Chi-square Test
• The Chi-square formula:
  $\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$
• Where:
  – R = total number of rows in the table
  – C = total number of columns in the table
  – $E_{ij}$ = the expected frequency in row i, column j
  – $O_{ij}$ = the observed frequency in row i, column j
• Assumption for the test: Large N (>100)
• Degrees of freedom for the critical value: (R − 1)(C − 1)
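The expected-value and Chi-square formulas above can be sketched in Python as follows. This is a minimal illustration, assuming the table is given as a list of rows of observed counts; the function names `expected_counts`, `chi_square`, and `degrees_of_freedom` are hypothetical helpers, not part of any library. It is applied to the gender and political party table used in this lecture.

```python
def expected_counts(table):
    """Expected frequencies under independence: (row marginal)(column marginal) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (E - O)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum(
        (e - o) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(R - 1)(C - 1) for an R x C table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Rows: Democrat, Republican; columns: Women, Men
observed = [[27, 10], [16, 15]]
for row in expected_counts(observed):
    print([round(e, 1) for e in row])
print(round(chi_square(observed), 2), degrees_of_freedom(observed))
```

For the example table the expected counts round to 23.4, 13.6, 19.6, and 11.4, and computing from the unrounded expected values gives χ² ≈ 3.31 with 1 degree of freedom, which is below the critical value of 3.84 quoted from Knoke.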
Chi-square Test of Independence
• Example: Gender and Political Views
  – Let's pretend that an N of 68 is sufficient

               Women                  Men
  Democrat     O11 = 27, E11 = 23.4   O12 = 10, E12 = 13.6
  Republican   O21 = 16, E21 = 19.6   O22 = 15, E22 = 11.4

Chi-square Test of Independence
• Compute (E − O)²/E for each cell:

               Women                      Men
  Democrat     (23.4 − 27)²/23.4 = .55    (13.6 − 10)²/13.6 = .95
  Republican   (19.6 − 16)²/19.6 = .66    (11.4 − 15)²/11.4 = 1.14

Chi-square Test of Independence
• Finally, sum up to compute the Chi-square:
  χ² = .55 + .95 + .66 + 1.14 = 3.30
• What is the critical value for α = .05?
  – Degrees of freedom: (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1
  – According to Knoke, p. 509: the critical value is 3.84
• Question: Can we reject H0?
  – No. A χ² of 3.30 is less than the critical value
  – We cannot conclude that there is a relationship between gender and political party affiliation.

Chi-square Test of Independence
• Weaknesses of chi-square tests:
• 1. If the sample is very large, we almost always reject H0
  – Even tiny covariations are statistically significant
  – But they may not be socially meaningful differences
• 2. It doesn't tell us how strong the relationship is
  – It doesn't tell us whether it is a large, meaningful difference or a very small one
  – It is only a test of "independence" vs. "dependence"
• Measures of Association address this shortcoming.

Measures of Association
• Separate from the issue of independence, statisticians have created measures of association
  – They tell us how strong the relationship between two variables is

  Weak Association             Strong Association
          Women   Men                  Women   Men
  Dem.      51     49          Dem.     100      0
  Rep.      49     51          Rep.       0    100

Crosstab Association: Yule's Q
• #1: Yule's Q
  – Appropriate only for 2x2 tables (2 rows, 2 columns)
• Label cell frequencies a through d:
          a   b
          c   d
• Formula:
  $Q = \frac{bc - ad}{bc + ad}$
• Recall that extreme values along the "diagonal" (cells a & d) or the "off-diagonal" (b & c) indicate a strong relationship.
• Yule's Q captures that in a measure
  – 0 = no association
  – −1, +1 = strong association

Crosstab Association: Yule's Q
• Rule of thumb for interpreting Yule's Q (Bohrnstedt & Knoke, p. 150):

  Absolute value of Q    Strength of association
  0 to .24               "virtually no relationship"
  .25 to .49             "weak relationship"
  .50 to .74             "moderate relationship"
  .75 to 1.0             "strong relationship"

Crosstab Association: Yule's Q
• Example: Gender and Political Party Affiliation

          Women     Men
  Dem     a = 27    b = 10
  Rep     c = 16    d = 15

• Calculate "ad": ad = (27)(15) = 405
• Calculate "bc": bc = (10)(16) = 160
  $Q = \frac{bc - ad}{bc + ad} = \frac{160 - 405}{160 + 405} = \frac{-245}{565} = -.43$
• −.43 = "weak association"

Association: Other Measures
• Phi (φ)
  – Very similar to Yule's Q
  – Only for 2x2 tables; ranges from −1 to 1; 0 = no association
• Gamma (G)
  – Based on a very different method of calculation
  – Not limited to 2x2 tables
  – Requires ordered variables
• Tau-c (τc) and Somers' d (dyx)
  – Same basic principle as Gamma
• Several others are discussed in Knoke and Norusis.

Crosstab Association: Gamma
• Gamma, like Q, is based on comparing "diagonal" to "off-diagonal" cases
  – But it does so differently
• Jargon:
  – Concordant pairs: pairs of cases where one case is higher on both variables than the other case
  – Discordant pairs: pairs of cases for which the first case (when compared to a second) is higher on one variable but lower on the other

Crosstab Association: Gamma
• Example: Approval of candidates
  – Cases in the "Love Trees/Love Guns" cell make concordant pairs with cases lower on both

               Hate Trees   Trees = OK   Love Trees
  Love Guns       1205          603           71
  Guns = OK        659         1498          452
  Hate Guns        431          467         1120

• All 71 individuals can form a pair with everyone in the cells lower on both variables. Just multiply!
  (71)(659 + 1498 + 431 + 467) = 216,905 concordant pairs

Crosstab Association: Gamma
• More possible concordant pairs
  – The "Love Guns/Trees = OK" cell and the "Guns = OK/Love Trees" cell also form concordant pairs
  – These 603 can pair with all those who score lower on approval for both Guns and Trees:
    (603)(659 + 431) = 657,270 concordant pairs
  – These 452 can pair lower too:
    (452)(431 + 467) = 405,896 concordant pairs

Crosstab Association: Gamma
• Discordant pairs: pairs where the first person ranks higher on one dimension (e.g., approval of Trees) but lower on the other (e.g., approval of Guns)
• In the table above, the top-left cell (1205) is higher on Guns but lower on Trees than the cells to its lower right. They make pairs:
  (1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs

Crosstab Association: Gamma
• If all pairs are concordant or all pairs are discordant, the variables are strongly related
• If there are equal numbers of concordant and discordant pairs, there is no association (G = 0)
• Formula for Gamma:
  $G = \frac{n_s - n_d}{n_s + n_d}$
  – $n_s$ = number of concordant pairs
  – $n_d$ = number of discordant pairs

Crosstab Association: Gamma
• Calculation of Gamma is typically done by computer
• Zero indicates no association
  – +1 = strong positive association
  – −1 = strong negative association
• It is possible to do hypothesis tests on Gamma
  – To determine whether the population gamma differs from zero
  – Requirements: random sample, N > 50
  – See Knoke, pp. 155–6.

Crosstab Association
• Final remarks:
  – You have a variety of possible measures to assess association among variables. Which one should you use?
  – Yule's Q and Phi require a 2x2 table
  – Larger ordered tables: use Gamma, Tau-c, or Somers' d
  – Ideally, report more than one to show that your findings are robust.
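Yule's Q and Gamma can be sketched in Python as below. This is a minimal sketch under stated assumptions: `yules_q` uses this lecture's cell labels (a, b on the top row; c, d on the bottom) and its sign convention Q = (bc − ad)/(bc + ad); `interpret_q` encodes the Bohrnstedt & Knoke rule of thumb, using the lower bound of each published range as the cut point; and `pair_counts`/`gamma` assume the table's rows and columns are both ordered from the lowest to the highest category. All function names are hypothetical. Unlike the hand calculation, `pair_counts` counts every concordant and discordant pair, including cells the worked example does not list individually (for instance, the 1498 Guns = OK/Trees = OK cases also pair concordantly with the 431 Hate Guns/Hate Trees cases).

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table laid out as [[a, b], [c, d]], using Q = (bc - ad) / (bc + ad)."""
    return (b * c - a * d) / (b * c + a * d)


def interpret_q(q):
    """Rule of thumb (Bohrnstedt & Knoke) applied to the absolute value of Q."""
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


def pair_counts(table):
    """Return (concordant, discordant) pair counts.

    Assumes rows and columns are both ordered from lowest to highest category.
    """
    n_rows, n_cols = len(table), len(table[0])
    n_s = n_d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Cases higher on both variables than cell (i, j): concordant pairs
            higher_both = sum(
                table[k][l] for k in range(i + 1, n_rows) for l in range(j + 1, n_cols)
            )
            # Cases higher on the row variable but lower on the column variable: discordant pairs
            higher_row_lower_col = sum(
                table[k][l] for k in range(i + 1, n_rows) for l in range(j)
            )
            n_s += table[i][j] * higher_both
            n_d += table[i][j] * higher_row_lower_col
    return n_s, n_d


def gamma(table):
    """G = (n_s - n_d) / (n_s + n_d)."""
    n_s, n_d = pair_counts(table)
    return (n_s - n_d) / (n_s + n_d)


q = yules_q(27, 10, 16, 15)
print(round(q, 2), interpret_q(q))

# Rows: Hate Guns, Guns = OK, Love Guns; columns: Hate Trees, Trees = OK, Love Trees
approval = [
    [431, 467, 1120],
    [659, 1498, 452],
    [1205, 603, 71],
]
print(pair_counts(approval), round(gamma(approval), 2))
```

In this ordering discordant pairs far outnumber concordant ones, so Gamma is negative: higher approval of guns tends to go with lower approval of trees. Reversing the order of one variable swaps concordant and discordant pairs, which flips the sign of Gamma but not its magnitude.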
Odds Ratios
• Odds ratios are a powerful way of analyzing relationships in crosstabs
  – Many advanced categorical data analysis techniques are based on odds ratios
• Review: What is a probability?
  – p(A) = number of outcomes that are "A" divided by the total number of outcomes
  – To convert a frequency distribution to a probability distribution, simply divide each frequency by N
• The same can be done with crosstabs: cell frequency over N is a probability.

Odds Ratios
• If total N = 68, the probabilities of drawing cases from each cell are:

          Women    Men                Women   Men
  Dem     27/68    10/68      Dem     .397    .147
  Rep     16/68    15/68      Rep     .235    .221

Odds Ratios
• Odds are similar to probability… but not quite
• Odds of A = number of outcomes that are A, divided by number of outcomes that are not A
  – Note: the denominator differs from that of a probability
• Ex: Probability of rolling a 1 on a six-sided die = 1/6
  – Odds of rolling a 1 on a six-sided die = 1/5
• Odds can also be calculated from probabilities:
  $\text{odds}_i = \frac{p_i}{1 - p_i}$

Odds Ratios
• Conditional odds = odds of being in one category of a variable within a specific category of another variable
  – Example: For women, what are the odds of being a Democrat?
  – Instead of the overall odds of being a Democrat, conditional odds are about a particular subgroup in a table

          Women   Men
  Dem      27      10
  Rep      16      15

• Conditional odds of being a Democrat for women: 27/16 = 1.69
  – Note: The odds for women differ from the odds for men (10/15 = .67)

Odds Ratios
• If variables in a crosstab are independent, their conditional odds are equal
  – The odds of falling into one category or another are the same for all values of the other variable
• If variables in a crosstab are associated, conditional odds differ
• Odds can be compared by making a ratio
  – The ratio equals 1 if the odds are the same for two groups
  – Ratios much greater or less than 1 indicate very different odds.
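The probability, odds, and conditional-odds definitions above can be sketched in Python like this, assuming the same 2x2 gender and party table (rows Democrat, Republican; columns Women, Men). The helper names are hypothetical. The last step compares the two conditional odds by taking their ratio, as the text describes.

```python
def cell_probabilities(table):
    """Divide every cell frequency by N to get a probability distribution."""
    n = sum(sum(row) for row in table)
    return [[f / n for f in row] for row in table]


def odds_from_probability(p):
    """odds = p / (1 - p)."""
    return p / (1 - p)


def conditional_odds(table, col):
    """Odds of being in row 0 (Democrat) rather than row 1 (Republican) within column col."""
    return table[0][col] / table[1][col]


# Rows: Democrat, Republican; columns: Women, Men
party = [[27, 10], [16, 15]]

probs = cell_probabilities(party)
print([[round(p, 3) for p in row] for row in probs])

# Rolling a 1 on a six-sided die: probability 1/6, odds 1/5
print(round(odds_from_probability(1 / 6), 3))

odds_women = conditional_odds(party, 0)
odds_men = conditional_odds(party, 1)
print(round(odds_women, 4), round(odds_men, 4))

# Ratio of conditional odds: 1 would mean the same odds for both groups
print(round(odds_men / odds_women, 3))
```

Here the ratio of men's to women's conditional odds is about .395, far enough below 1 to suggest different odds for the two groups.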
Odds Ratios
• Formula for the Odds Ratio in a 2x2 table:
  $OR_{XY} = \frac{b/d}{a/c} = \frac{bc}{ad}$

          Women     Men
  Dem     a = 27    b = 10
  Rep     c = 16    d = 15

• Ex: OR = (10)(16)/(27)(15) = 160/405 = .395
• Interpretation: men have .395 times the odds of being a Democrat compared to women
• The inverted value (1/.395 = 2.53) indicates that women's odds of being a Democrat are 2.53 times men's odds

Odds Ratios: Final Remarks
• 1. Cells with zeros cause problems for odds ratios
  – Ratios with zero in the denominator are undefined
  – Thus, you need to have full cells
• 2. Odds ratios can be used to measure association
  – Indeed, Yule's Q is based on them
• 3. Odds ratios form the basis for most advanced categorical data analysis techniques
  – For now it may be easier to use Yule's Q, etc. But if you need to do advanced techniques, you will use odds ratios.

Tests for Difference in Proportions
• Another approach to small (2x2) tables:
  – Instead of making a crosstab, you can just think about the proportion of people in a given category
  – More similar to a T-test than a Chi-square test
• Ex: Do you approve of Pres. Bush? (Yes/No)
  – Sample: N = 86 women, 80 men
  – Proportion of women who approve: PW = .70
  – Proportion of men who approve: PM = .78
• Issue: Do the populations of men and women differ?
  – Or are the differences just due to sampling variability?

Tests for Difference in Proportions
• Hypotheses:
  – Again, the typical null hypothesis is that there are no differences between groups
  – Which is equivalent to statistical independence
• H0: Proportion of women = proportion of men
• H1: Proportion of women ≠ proportion of men
  – Note: One-tailed directional hypotheses can also be used.

Tests for Difference in Proportions
• Strategy: Figure out the sampling distribution for differences in proportions
• Statisticians have determined the relevant information:
• 1. If samples are "large", the sampling distribution of the difference in proportions is normal
  – The Z-distribution can be used for hypothesis tests
• 2. A Z-value can be calculated using the formula:
  $Z = \frac{P_1 - P_2}{\hat{\sigma}_{P_1 - P_2}}$

Tests for Difference in Proportions
• The standard error can be estimated as:
  $\hat{\sigma}_{P_1 - P_2} = \sqrt{\frac{N_1 + N_2}{N_1 N_2} P_{both}(1 - P_{both})}$
• Where:
  $P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$

Difference in Proportions: Example
• Q: Do you approve of Pres. Bush? (Yes/No)
• Sample: N = 86 women, 80 men
  – Women: N = 86, PW = .70
  – Men: N = 80, PM = .78
• Total N is "large": 166 people
  – So, we can use a Z-test
• Use α = .05; two-tailed critical Z = 1.96

Difference in Proportions: Example
• Use the formula to calculate the Z-value:
  $Z = \frac{P_1 - P_2}{\hat{\sigma}_{P_1 - P_2}} = \frac{.70 - .78}{\hat{\sigma}_{P_1 - P_2}} = \frac{-.08}{\hat{\sigma}_{P_1 - P_2}}$
• And estimate the standard error as:
  $\hat{\sigma}_{P_1 - P_2} = \sqrt{\frac{N_1 + N_2}{N_1 N_2} P_{both}(1 - P_{both})}$

Difference in Proportions: Example
• First, calculate Pboth:
  $P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{86(.70) + 80(.78)}{86 + 80} = \frac{60.2 + 62.4}{166} = .739$

Difference in Proportions: Example
• Plug in Pboth = .739:
  $\hat{\sigma}_{P_1 - P_2} = \sqrt{\frac{86 + 80}{(86)(80)} (.739)(1 - .739)} = \sqrt{\frac{166}{6880} (.193)} = \sqrt{.0241 \times .193} \approx .0682$

Difference in Proportions: Example
• Finally, plug in the S.E. and calculate Z:
  $Z = \frac{P_1 - P_2}{\hat{\sigma}_{P_1 - P_2}} = \frac{-.08}{.0682} \approx -1.17$

Difference in Proportions: Example
• Results:
  – Critical Z = 1.96
  – Observed Z = −1.17 (|Z| = 1.17)
• Conclusion: We can't reject the null hypothesis
  – Women and men do not clearly differ in approval of Bush
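A minimal Python sketch of the pooled two-proportion Z-test described above, using only the standard library. The function name `two_proportion_z` is hypothetical, and using `statistics.NormalDist` (Python 3.8+) to get the two-tailed critical value and a p-value in place of a Z table is an addition not found in the slides.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Pooled Z-test for H0: P1 = P2. Returns (Pboth, standard error, Z)."""
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt((n1 + n2) / (n1 * n2) * p_both * (1 - p_both))
    z = (p1 - p2) / se
    return p_both, se, z


alpha = 0.05
p_both, se, z = two_proportion_z(0.70, 86, 0.78, 80)  # women, men
critical_z = NormalDist().inv_cdf(1 - alpha / 2)       # two-tailed cutoff, about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"Pboth = {p_both:.3f}, SE = {se:.4f}, Z = {z:.2f}")
print(f"critical Z = {critical_z:.2f}, p = {p_value:.3f}")
print("reject H0" if abs(z) > critical_z else "cannot reject H0")
```

For the example data it yields Pboth ≈ .739 and Z ≈ −1.17, with a two-tailed p-value well above .05, so H0 is not rejected.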