Statistics for Business and Economics
Chapter 9: Categorical Data Analysis

Learning Objectives
1. Explain the $\chi^2$ test for proportions
2. Explain the $\chi^2$ test of independence
3. Solve hypothesis testing problems
• More than two population proportions
• Independence

Data Types
• Quantitative
– Discrete
– Continuous
• Qualitative

Qualitative Data
• Qualitative random variables yield responses that classify
– Example: gender (male, female)
• Measurement reflects the number of observations in each category
• Nominal or ordinal scale
• Examples
– What make of car do you drive?
– Do you live on-campus or off-campus?

Hypothesis Tests: Qualitative Data
• Proportion
– 1 pop.: Z test
– 2 pop.: Z test
– More than 2 pop.: $\chi^2$ test
• Independence: $\chi^2$ test

Chi-Square ($\chi^2$) Test for k Proportions

Multinomial Experiment
• n identical trials
• k outcomes to each trial
• Constant outcome probability, $p_k$
• Independent trials
• Random variable is count, $n_k$
• Example: ask 100 people (n) which of 3 candidates (k) they will vote for

Chi-Square ($\chi^2$) Test for k Proportions
• Tests equality (=) of proportions only
– Example: $p_1 = .2$, $p_2 = .3$, $p_3 = .5$
• One variable with several levels
• Uses one-way contingency table

One-Way Contingency Table
Shows number of observations in k independent groups (outcomes or variable levels)

Outcomes (k = 3):

| Candidate | Tom | Bill | Mary | Total |
|---|---|---|---|---|
| Number of responses | 35 | 20 | 45 | 100 |

Conditions Required for a Valid Test: One-Way Table
1. A multinomial experiment has been conducted
2. The sample size n is large: $E(n_i) \ge 5$ for every cell

$\chi^2$ Test for k Proportions: Hypotheses & Statistic
1. Hypotheses
• H0: $p_1 = p_{1,0},\ p_2 = p_{2,0},\ \ldots,\ p_k = p_{k,0}$, where $p_{i,0}$ is the hypothesized probability
• Ha: At least one $p_i$ differs from its hypothesized value
2. Test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$

where $n_i$ is the observed count and $E(n_i) = n p_{i,0}$ is the expected count
3. Degrees of freedom: $k - 1$, where k is the number of outcomes

$\chi^2$ Test: Basic Idea
1. Compares the observed counts to the counts expected if the null hypothesis is true
2. The closer the observed counts are to the expected counts, the more plausible H0 is
• Measured by the squared difference relative to the expected count
– Reject H0 for large values of $\chi^2$

Finding Critical Value Example
What is the critical $\chi^2$ value if k = 3 and $\alpha = .05$?
• If $n_i = E(n_i)$ in every cell, $\chi^2 = 0$: do not reject H0
• df = k - 1 = 2; the rejection region is the upper tail with area $\alpha = .05$

$\chi^2$ Table (Portion), Upper Tail Area:

| DF | .995 | … | .95 | … | .05 |
|---|---|---|---|---|---|
| 1 | ... | … | 0.004 | … | 3.841 |
| 2 | 0.010 | … | 0.103 | … | 5.991 |

Critical value: $\chi^2_{.05} = 5.991$

$\chi^2$ Test for k Proportions: Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?

$\chi^2$ Test for k Proportions: Solution
• H0: $p_1 = p_2 = p_3 = 1/3$
• Ha: At least 1 is different
• $\alpha = .05$
• $n_1 = 63$, $n_2 = 45$, $n_3 = 72$
• Critical value: 5.991 (df = 2); reject H0 if $\chi^2 > 5.991$

Expected counts:

$$E(n_i) = n p_{i,0}, \qquad E(n_1) = E(n_2) = E(n_3) = 180 \cdot \tfrac{1}{3} = 60$$

Test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{[n_i - E(n_i)]^2}{E(n_i)} = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = 6.3$$

• Test statistic: $\chi^2 = 6.3$
• Decision: Reject H0 at $\alpha = .05$, since 6.3 > 5.991
• Conclusion: There is evidence of a difference in proportions

Contingency Tables
• Useful in situations involving multiple population proportions
• Used to classify sample observations according to two or more characteristics
• Also called a cross-classification table

Contingency Table Example
Left-Handed vs. Gender
• Dominant hand: left vs. right
• Gender: male vs. female
• 2 categories for each variable, so this is called a 2 x 2 table
• Suppose we examine a sample of 300 children

Contingency Table Example (continued)
Sample results organized in a contingency table (sample size n = 300):
• 120 females, 12 were left-handed
• 180 males, 24 were left-handed

| Gender | Hand Preference: Left | Hand Preference: Right | Total |
|---|---|---|---|
| Female | 12 | 108 | 120 |
| Male | 24 | 156 | 180 |
| Total | 36 | 264 | 300 |

$\chi^2$ Test for the Difference Between Two Proportions
• H0: $\pi_1 = \pi_2$ (the proportion of females who are left-handed is equal to the proportion of males who are left-handed)
• H1: $\pi_1 \ne \pi_2$ (the two proportions are not the same; hand preference is not independent of gender)
• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males
• The two proportions above should be the same as the proportion of left-handed people overall

The Chi-Square Test Statistic
The chi-square test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

• where:
– $f_o$ = observed frequency in a particular cell
– $f_e$ = expected frequency in a particular cell if H0 is true
• $\chi^2_{STAT}$ for the 2 x 2 case has 1 degree of freedom
(Assumed: each cell in the contingency table has expected frequency of at least 5)

Decision Rule
The $\chi^2_{STAT}$ test statistic approximately follows a chi-squared distribution with one degree of freedom.
Decision rule: If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0; otherwise, do not reject H0. The rejection region is the upper tail of the distribution, to the right of $\chi^2_{\alpha}$.

Computing the Average Proportion
The average proportion is:

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$$

Here, with 12 of 120 females and 24 of 180 males left-handed:

$$\bar{p} = \frac{12 + 24}{120 + 180} = \frac{36}{300} = 0.12$$

i.e., of all the children, the proportion of left-handers is 0.12, that is, 12%.

Finding Expected Frequencies
• To obtain the expected frequency for left-handed females, multiply the average proportion left-handed ($\bar{p}$) by the total number of females
• To obtain the expected frequency for left-handed males, multiply the average proportion left-handed ($\bar{p}$) by the total number of males
• If the two proportions are equal, then P(Left-Handed | Female) = P(Left-Handed | Male) = .12, i.e., we would expect
– (.12)(120) = 14.4 females to be left-handed
– (.12)(180) = 21.6 males to be left-handed

Observed vs. Expected Frequencies

| Gender | Hand Preference: Left | Hand Preference: Right | Total |
|---|---|---|---|
| Female | Observed = 12, Expected = 14.4 | Observed = 108, Expected = 105.6 | 120 |
| Male | Observed = 24, Expected = 21.6 | Observed = 156, Expected = 158.4 | 180 |
| Total | 36 | 264 | 300 |

The Chi-Square Test Statistic
The test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.7576$$

Decision Rule
The test statistic is $\chi^2_{STAT} = 0.7576$; $\chi^2_{0.05}$ with 1 d.f. = 3.841.
Decision rule: If $\chi^2_{STAT} > 3.841$, reject H0; otherwise, do not reject H0.
Here, $\chi^2_{STAT} = 0.7576 < \chi^2_{0.05} = 3.841$, so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at $\alpha = 0.05$.

$\chi^2$ Test for Differences Among More Than Two Proportions
• Extend the $\chi^2$ test to the case with more than two independent populations:
– H0: $\pi_1 = \pi_2 = \ldots = \pi_c$
– H1: Not all of the $\pi_j$ are equal (j = 1, 2, …, c)

The Chi-Square Test Statistic
The chi-square test statistic is:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

• where:
– $f_o$ = observed frequency in a particular cell of the 2 x c table
– $f_e$ = expected frequency in a particular cell if H0 is true
• $\chi^2_{STAT}$ for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom
(Assumed: each cell in the contingency table has expected frequency of at least 1)

Computing the Overall Proportion
The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$

• Expected cell frequencies for the c categories are calculated as in the 2 x 2 case, and the decision rule is the same
Decision rule: If $\chi^2_{STAT} > \chi^2_{\alpha}$, reject H0; otherwise, do not reject H0, where $\chi^2_{\alpha}$ is from the chi-squared distribution with c - 1 degrees of freedom.

The Marascuilo Procedure
• Used when the null hypothesis of equal proportions is rejected
• Enables you to make comparisons between all pairs
• Start with the observed differences, $p_j - p_{j'}$, for all pairs ($j \ne j'$), then compare each absolute difference to a calculated critical range

$\chi^2$ Test of Independence
• Shows if a relationship exists between two qualitative variables
– One sample is drawn
– Does not show causality
• Uses two-way contingency table

$\chi^2$ Test of Independence: Contingency Table
Shows number of observations from 1 sample jointly in 2 qualitative variables. Rows are the levels of variable 1 (House Style); columns are the levels of variable 2 (House Location).

| House Style | House Location: Urban | House Location: Rural | Total |
|---|---|---|---|
| Split-Level | 63 | 49 | 112 |
| Ranch | 15 | 33 | 48 |
| Total | 78 | 82 | 160 |

Conditions Required for a Valid $\chi^2$ Test: Independence
1. A multinomial experiment has been conducted
2. The sample size, n, is large: $E_{ij} \ge 5$ for every cell

$\chi^2$ Test of Independence: Hypotheses & Statistic
1. Hypotheses
• H0: Variables are independent
• Ha: Variables are related (dependent)
2. Test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$$

where $n_{ij}$ is the observed count and $E_{ij}$ is the expected count
3. Degrees of freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns

$\chi^2$ Test of Independence: Expected Counts
1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities and multiply for joint probability
3. Expected count is sample size times joint probability

Expected Count Example
For the Split-Level, Urban cell of the table above:
• Marginal probability of Split-Level = 112/160
• Marginal probability of Urban = 78/160
• Joint probability under independence = (112/160)(78/160)
• Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation

$$E_{ij} = \frac{R_i C_j}{n}$$

where $R_i$ is the total of row i and $C_j$ is the total of column j:
• $E_{11} = 112 \cdot 78 / 160 = 54.6$
• $E_{12} = 112 \cdot 82 / 160 = 57.4$
• $E_{21} = 48 \cdot 78 / 160 = 23.4$
• $E_{22} = 48 \cdot 82 / 160 = 24.6$

| House Style | Urban Obs. | Urban Exp. | Rural Obs. | Rural Exp. | Total |
|---|---|---|---|---|---|
| Split-Level | 63 | 54.6 | 49 | 57.4 | 112 |
| Ranch | 15 | 23.4 | 33 | 24.6 | 48 |
| Total | 78 | 78 | 82 | 82 | 160 |

$\chi^2$ Test of Independence: Example
As a realtor, you want to determine if house style and house location are related. Using the house style and house location table above, at the .05 level of significance, is there evidence of a relationship?

$\chi^2$ Test of Independence: Solution
• H0: No relationship
• Ha: Relationship
• $\alpha = .05$
• df = (2 - 1)(2 - 1) = 1
• Critical value: 3.841; reject H0 if $\chi^2 > 3.841$
• $E_{ij} \ge 5$ in all cells (see the expected counts above)

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} \approx 8.41$$

• Test statistic: $\chi^2 = 8.41$
• Decision: Reject H0 at $\alpha = .05$, since 8.41 > 3.841
• Conclusion: There is evidence of a relationship

$\chi^2$ Test of Independence: Thinking Challenge
You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 84 | 32 | 116 |
| Yes | 48 | 122 | 170 |
| Total | 132 | 154 | 286 |

$\chi^2$ Test of Independence: Solution*
• H0: No relationship
• Ha: Relationship
• $\alpha = .05$
• df = (2 - 1)(2 - 1) = 1
• Critical value: 3.841; reject H0 if $\chi^2 > 3.841$

Expected counts (row total times column total, divided by n), with $E_{ij} \ge 5$ in all cells:
• $E_{11} = 116 \cdot 132 / 286 \approx 53.5$
• $E_{12} = 116 \cdot 154 / 286 \approx 62.5$
• $E_{21} = 170 \cdot 132 / 286 \approx 78.5$
• $E_{22} = 170 \cdot 154 / 286 \approx 91.5$

| Diet Coke | Diet Pepsi No: Obs. | Diet Pepsi No: Exp. | Diet Pepsi Yes: Obs. | Diet Pepsi Yes: Exp. | Total |
|---|---|---|---|---|---|
| No | 84 | 53.5 | 32 | 62.5 | 116 |
| Yes | 48 | 78.5 | 122 | 91.5 | 170 |
| Total | 132 | 132 | 154 | 154 | 286 |

$$\chi^2 = \sum_{\text{all cells}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} \approx 54.29$$

(This value uses the expected counts rounded to one decimal; unrounded expected counts give about 54.15, and the decision is the same.)

• Test statistic: $\chi^2 = 54.29$
• Decision: Reject H0 at $\alpha = .05$, since 54.29 > 3.841
• Conclusion: There is evidence of a relationship

$\chi^2$ Test of Independence: Thinking Challenge 2
There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?

You Re-Analyze the Data

High Income

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 4 | 30 | 34 |
| Yes | 40 | 2 | 42 |
| Total | 44 | 32 | 76 |

Low Income

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 80 | 2 | 82 |
| Yes | 8 | 120 | 128 |
| Total | 88 | 122 | 210 |

True Relationships*
[Diagram: a control or intervening variable (the true cause) has an underlying causal relation with both Diet Coke and Diet Pepsi; the relation between Diet Coke and Diet Pepsi is only an apparent relation.]

Moral of the Story*
Numbers don’t think. People do!
© 1984-1994 T/Maker Co.

Conclusion
1. Explained the $\chi^2$ test for proportions
2. Explained the $\chi^2$ test of independence
3. Solved hypothesis testing problems
• More than two population proportions
• Independence
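
The worked examples in this chapter can be checked numerically. The following is a minimal Python sketch and not part of the original slides: the helper names (`chi_square_one_way`, `expected_counts`, `chi_square_two_way`) are hypothetical, introduced here for illustration only, and the critical values are the ones quoted from the $\chi^2$ table in the slides rather than computed.

```python
from typing import List, Sequence

# Critical values quoted in the slides (alpha = .05), not computed here.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square_one_way(observed: Sequence[float], probs: Sequence[float]) -> float:
    """Chi-square statistic for a one-way table with hypothesized probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E(n_i) = n * p_{i,0}
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_two_way(table: Sequence[Sequence[float]]) -> float:
    """Chi-square statistic for a two-way table, summed over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Observed counts taken from the examples in the slides.
methods = chi_square_one_way([63, 45, 72], [1 / 3, 1 / 3, 1 / 3])  # df = 2
handedness = chi_square_two_way([[12, 108], [24, 156]])            # df = 1
houses = chi_square_two_way([[63, 49], [15, 33]])                  # df = 1
cola = chi_square_two_way([[84, 32], [48, 122]])                   # df = 1

for name, stat, df in [
    ("evaluation methods", methods, 2),
    ("hand preference", handedness, 1),
    ("house style vs. location", houses, 1),
    ("Diet Coke vs. Diet Pepsi", cola, 1),
]:
    decision = "reject H0" if stat > CRITICAL_05[df] else "do not reject H0"
    print(f"{name}: chi-square = {stat:.4f}, df = {df}, {decision}")
```

Because this sketch keeps the expected counts unrounded, the last two statistics come out at about 8.405 and 54.15 rather than the 8.41 and 54.29 shown on the slides, which round the expected counts to one decimal first. The decisions at $\alpha = .05$ are unchanged.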