χ² Testing for 2 Categorical Variables

Chi-Square Testing gives us a way to test COUNTS of categorical data. The test measures how far the observed counts deviate from the expected counts (values) of the situation. The chi-square test statistic, denoted χ², combined with the degrees of freedom, denoted df, is used to calculate the probability of a difference between observed and expected counts at least as extreme as the one observed.

We have practiced using the Goodness-of-Fit Chi-Square Test for a single categorical variable with three or more outcomes. Now we will investigate how to use the Chi-Square Test on two categorical variables.

When we sample from ONE population regarding two categorical variables and write the data into a two-way (contingency) table, we test whether or not an association exists between the 2 categorical variables (no association means the variables are independent). This is called a Chi-Squared Test of Independence.

If we sample from TWO populations regarding two categorical variables and write the data into a two-way (contingency) table, we test whether the population distributions of proportions are the same for the specific categories. This is called a Chi-Squared Test of Homogeneity.

Example: In a survey reported in a special issue of Newsweek magazine (Special Edition: Health for Life, Spring/Summer 1999), n = 747 randomly selected women were asked, “How satisfied are you with your overall appearance?” There were four possible responses to this question, and the following table shows the distribution of counts for the possible responses for each of three age groups. (Note: The counts were estimated from percents given in Newsweek.) (MOS, p. 482)

How Satisfied Are You with Your Overall Appearance?

                            Age
               Under 30    30-49    Over 50    Total
Very               45        73       106       224
Somewhat           82       168       153       403
Not Too            10        47        41        98
Not at All          4         6        12        22
Total             141       294       312       747

We must first calculate the EXPECTED VALUES using the row total, column total, and grand total for each cell.

$$\text{Expected Value} = \frac{(\text{row total})(\text{column total})}{\text{Grand Total}}$$

$$E_{\text{Under 30, Very}} = \frac{(141)(224)}{747} = 42.281$$

...

$$E_{\text{30-49, Not Too}} = \frac{(294)(98)}{747} = 38.570$$

...

$$E_{\text{Over 50, Not at All}} = \frac{(312)(22)}{747} = 9.189$$

Normally in a computer output the expected values are calculated and listed under the OBSERVED COUNT (EXPECTED VALUES usually in parentheses), as seen in the following output:

Age           Very        Somewhat    Not Too     Not at All    Total
Under 30       45           82          10           4           141
             (42.281)    (76.068)    (18.498)     (4.153)
30-49          73          168          47           6           294
             (88.161)    (158.61)    (38.57)      (8.659)
Over 50       106          153          41          12           312
             (93.558)    (168.32)    (40.932)     (9.189)
Total         224          403          98          22           747

ASSUMPTIONS
1. All expected values are greater than five OR [all expected values are greater than one AND no more than 20% of expected values are less than 5]
2. IO: Independent observations
3. ME: Mutually exclusive
4. RS: Random sample

VERIFY ASSUMPTIONS (Includes calculations of expected values; see the table above as an example)
1. All expected values are greater than 1 AND 8.3% (1 of 12 cells, less than 20%) of the expected values are less than 5.
2. The 747 women's responses are independent observations.
3. Each of the 747 women's responses is mutually exclusive (meaning each response falls strictly into one cell).
4. We assume the 747 women were randomly sampled.

STATE HYPOTHESES
Ho: There is NO ASSOCIATION between age group and satisfaction level of overall appearance.
Ha: There is an ASSOCIATION between age group and satisfaction level of overall appearance.

SIGNIFICANCE LEVEL: α = .05

CALCULATE CHI-SQUARE TEST STATISTIC

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2 = \frac{(45 - 42.281)^2}{42.281} + \frac{(82 - 76.068)^2}{76.068} + \frac{(10 - 18.498)^2}{18.498} + \cdots + \frac{(41 - 40.932)^2}{40.932} + \frac{(12 - 9.189)^2}{9.189} = 14.27799784$$

CALCULATE DF: df = (number of row categories - 1)(number of column categories - 1) = (4 - 1)(3 - 1) = 6

CALCULATE P-VALUE: Use a chi-square table to approximate OR the TI-83+ for a value to 4 decimal places.
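The calculations in this example can be checked with a short script. The following is a minimal Python sketch, not the handout's method (which uses a chi-square table or the TI-83+). The function names are made up for illustration, and the p-value helper relies on a closed-form upper-tail formula that holds only when df is even, which happens to be the case here (df = 6).

```python
import math

# Observed counts from the Newsweek example.
# Rows: Under 30, 30-49, Over 50. Columns: Very, Somewhat, Not Too, Not at All.
observed = [
    [45, 82, 10, 4],
    [73, 168, 47, 6],
    [106, 153, 41, 12],
]


def expected_counts(table):
    """Expected value per cell: (row total)(column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(number of rows - 1)(number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


def upper_tail_even_df(x, df):
    """P(chi-square >= x), valid for EVEN df only (illustrative shortcut)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to even, positive df")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )


expected = expected_counts(observed)
chi2 = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
p_value = upper_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Run as written, this should print a chi-square of about 14.278 with df = 6 and a p-value of about .0267, matching the hand calculation. Transposing the table (satisfaction as rows, age as columns) gives the same statistic and df. For a table with odd df, a chi-square table, the TI-83+, or a statistics library would be needed instead of `upper_tail_even_df`.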
WRITE PROBABILITY STATEMENT: P(χ² ≥ 14.278) = .0267

INTERPRETATION
Since the p-value (.0267) is less than alpha (.05), we reject the null hypothesis that there is no association between age group and satisfaction level of overall appearance. The responses of the 747 women provide evidence of an association.

EXAMPLE: The data on drinking behavior for independently chosen random samples of male and female students is similar to data that appeared in the article “Relationship of Health Behaviors to Alcohol and Cigarette Use by College Students” (J. of College Student Development (1992): 163-170). Does there appear to be a gender difference with respect to drinking behavior?

                                Drinking Level
                         Low (1-7        Moderate (8-24   High (25+       Row Marginal
Gender         None      drinks/week)    drinks/week)     drinks/week)    Total
Men             140         478             300               63             981
              (158.6)     (554.0)         (230.1)           (38.4)
Women           186         661             173               16            1036
              (167.4)     (585.0)         (242.9)           (40.6)
Column
Marginal Total  326        1139             473               79            2017

1. Are you working with categorical data?
2. How many variables are there, 1 or 2?
3. If one variable, use the Goodness-of-Fit Chi-Square test. OR If two variables, use the Association (Independence) Chi-Square test.
4. Calculate the expected values.
5. State and VERIFY assumptions (requirements) to perform the test.
6. State null and alternative hypotheses.
7. Define alpha value.
8. Calculate chi-square statistic.
9. Calculate degrees of freedom.
10. Using table or TI-83+, determine p-value.
11. Sketch and shade distribution.
12. Write probability statement.
13. Interpret results in context of problem.

ASSIGNMENT:

1. The article “Factors Associated with Sexual Risk-Taking Behaviors Among Adolescents” (J. Marriage and Family (1994): 663-632) examined the relationship between gender and contraceptive use by sexually active teens. Each person in a random sample of sexually active teens was classified according to gender and contraceptive use (with three categories: rarely or never use, use sometimes or most of the time, and always use). Data consistent with percentages in the article is given in the table.
Is there evidence of an association between gender and contraceptive use among sexually active teens?

                              Gender
Contraceptive Use          Female    Male    Row Marginal Total
Rarely/Never                 210      350          560
Sometimes/Most Times         190      320          510
Always                       400      530          930
Column Marginal Total        800     1200         2000

Remember to enter Observed Counts in Matrix A ("RC Cola", Row then Column), then run the χ²-Test.

2. Do women have different patterns of work behavior than men? The article “Workaholism in Organizations: Gender Differences” (Sex Roles: A Journal of Research (1999): 333-346) attempts to answer this question. Each person in a random sample of 423 graduates of a business school in Canada was polled and classified by gender and workaholism type.

                                  Gender
Workaholism Types              Female    Male
Work Enthusiasts                 20       41
Workaholics                      32       37
Enthusiastic Workaholics         34       46
Unengaged Workers                43       52
Relaxed Workers                  24       27
Disenchanted Workers             37       30

a. Test the hypothesis that gender and workaholism type are independent.
b. The author writes “women and men fell into each of the six workaholism types to a similar degree.” Does the outcome of the test you performed in part (a) support this conclusion? EXPLAIN.

3. Reference: Keppel, R. D., and Weis, J. G., in their article "Time and Distance as Solvability Factor in Murder Cases," Journal of Forensic Science, Vol. 39, No. 2, March 1994. Below is tabled information from a sample of single victim--single offender cases in the state of Washington from January 1981 through December 1986. Assume it is reasonable to regard this sample as a random sample of such murders in the United States (a debatable and almost certainly false assumption!).

Time Elapsed and Distance between Victim Last Seen and Body Recovery

                            0-24 hours    24 hours - 1 month    Greater than 1 month
0-199 feet                     505               52                      9
200 feet to 1.5 miles           28               10                      4
More than 1.5 miles             55               60                     47

a) Test the hypothesis that the distance and the elapsed time between when the victim was last seen and when the body was recovered are independent.
b) Notice which individual cells are "greater than expected" and which are "less than expected." Do you see any pattern? If so, describe it in a few sentences.

4. In the summer of 1846--July 31, in fact--what was to become the famous Donner party left Fort Bridger in Wyoming, headed for California. What with one thing and another they ran a bit late and on November 1 found themselves only just west of the present-day California-Nevada border. After an accumulation of snow in late October, a fierce snowstorm blew up and trapped them at Truckee Lake--now renamed Donner Lake. To make a long story short by leaving out the gruesome details, many of the Donner party did not make it through the winter. Below you will find the breakdowns of who lived and who died by age and sex. You are to test the hypotheses that living and dying were independent of (a) age, and independent of (b) sex.

(a) Age by Fate Data

           1-4 years old    5-45 years old    46+ years old
Lived            6                37                3
Died             9                23                7

(b) Gender by Fate Data

           Male    Female
Lived       24       22
Died        29       10