Inference for Categorical Data
William P. Wattles, Ph.D.
Francis Marion University

Continuous vs. Categorical
• Continuous (measurement) variables have many values
• Categorical variables have only certain values representing different categories
• Ordinal: a type of categorical variable with a natural order (e.g., year of college)
• Nominal: a type of categorical variable with no order (e.g., brand of cola)

Categorical Data
• Tells which category an individual is in rather than telling how much.
• Sex, race, and occupation are naturally categorical
• A quantitative variable can be grouped to form a categorical variable.
• Analyze with counts or percents.

Describing relationships in categorical data
• No single graph portrays the relationship
• No single number summarizes the relationship either
• Convert counts to proportions or percents

Prediction

Moving from descriptive to Inferential
• Chi-square inference involves a test of independence.
• If variables are independent, knowledge of one variable tells you nothing about the other.

Moving from descriptive to Inferential
• Inference involves expected counts.
  – Expected count = the count that would occur if the variables were independent

Inference for two-way tables
• Chi-square test of independence.
• For more than two groups
• Cannot compare multiple groups one at a time.

To Analyze Categorical Data
• First obtain counts
• In Excel, this can be done with a pivot table
• Put the data in a matrix or two-way table

Matrix or two-way table
          Republican  Democrat  Independent
Male      18          43        14
Female    39          23        18

Inference for two-way tables
• Expected count
• The count that would occur if the variables are independent

Matrix or two-way table
• Rows
• Columns
• Distribution: how often each outcome occurred
• Marginal distribution: count for all entries in a row or column

Row and column totals
          Republican  Democrat  Independent  Total
Male      18          43        14           75
Female    39          23        18           80
Total     57          66        32           155

Marginal totals as percents
          Republican  Democrat  Independent  Total
Male                                         75 (48%)
Female                                       80 (52%)
Total     57 (37%)    66 (43%)  32 (21%)     155

Expected counts
• 37% of all subjects are Republicans (57 of 155)
• If independent, 37% of females should be Republican (expected value)
• 37% of 80 ≈ 29 (57/155 × 80 = 29.4)
• 37% of 75 ≈ 28 (57/155 × 75 = 27.6)

Expected counts rounded
          Republican  Democrat  Independent  Total
Male      28          32        15           75
Female    29          34        17           80
Total     57          66        32           155

Observed vs. Expected
Observed
          Republican  Democrat  Independent  Total
Male      18          43        14           75
Female    39          23        18           80
Total     57          66        32           155
Expected
          Republican  Democrat  Independent  Total
Male      28          32        15           75
Female    29          34        17           80
Total     57          66        32           155

Chi-Square
• Chi-square: a measure of how far the observed counts are from the expected counts

Chi-square test of independence
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Chi Square test of independence with SPSS

Chi Square

Chi-square test of independence
• Degrees of freedom
• df = (number of rows − 1) × (number of columns − 1)
• Compare the observed and expected counts.
• The P-value comes from comparing the chi-square statistic with critical values for a chi-square distribution

Example
• Have the percent of majors changed by school?
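
The expected-count and chi-square steps described above can be sketched in Python for the party-by-sex table. This is a minimal illustration using only the standard library, not the SPSS or Excel procedure shown in the slides; the function name `chi_square_independence` is a hypothetical helper introduced here.

```python
def chi_square_independence(observed):
    """Return (expected counts, chi-square statistic, df) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (observed - expected)^2 / expected over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Observed counts from the slides: rows = Male, Female;
# columns = Republican, Democrat, Independent
observed = [[18, 43, 14], [39, 23, 18]]
expected, chi2, df = chi_square_independence(observed)
print([[round(e) for e in row] for row in expected])
print(round(chi2, 2), df)
```

Rounding the expected counts this way gives the same values as the "Expected counts rounded" slide (28, 32, 15 and 29, 34, 17), with df = (2 − 1) × (3 − 1) = 2.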

Data collection
http://www.fmarion.edu/about/FactBook 2004/2005
Fall 2004 Graduates by Major

Chi Square

Marital Status, page 543
Job grade  Single  Married  Divorced  Widowed
1          58      874      15        8
2          222     3927     70        20
3          50      2396     34        10
4          7       533      7         4

Marital Status, page 543
Test Statistics      Value   df  p-value
Pearson Chi-Square   67.491  9   0.0000

Olive Oil, page 578
Olive oil     Low   Medium  High
Colon cancer  398   397     430
Rectal        250   241     217
Controls      1368  1377    1409

Olive Oil, page 578
Test Statistics                  Value  df  p-value
Pearson Chi-Square               1.552  4   0.817
Continuity Adjusted Chi-Square   1.396  4   0.845
Likelihood Ratio Chi-Square      1.549  4   0.818

Business Majors, page 563
                Female  Male
Accounting      68      56
Administration  91      40
Economics       5       6
Finance         61      59

Business Majors, page 563
Test Statistics      Value   df  p-value
Pearson Chi-Square   10.827  3   0.013

Exam Three
• 37 multiple choice questions, 4 short answer
• T-tests and chi square on Excel
• General questions about analyzing categorical data and t-tests
• Review from earlier this term

Inference as a decision
• We must decide if the null hypothesis is true.
• We cannot know for sure.
• We choose an arbitrary standard that is conservative and set alpha at .05
• Our decision will be either correct or incorrect.

Type I and Type II errors
               Ho is really True            Ho is really False
We reject Ho   Type I Error (false alarm)   Correct decision
We accept Ho   Correct decision             Type II Error (miss)

Type I error
• If we reject Ho when in fact Ho is true, this is a Type I error
• Statistical procedures are designed to keep the probability of a Type I error small, because a Type I error is more serious for science.
• With a Type I error we erroneously conclude that an independent variable works.

Type II error
• If we accept Ho when in fact Ho is false, this is a Type II error.
• A Type II error is serious to the researcher.
• The power of a test is the probability that Ho will be rejected when it is, in fact, false.
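
As a rough illustration of power as defined above, the sketch below computes the power of a one-sided z-test with a known standard deviation. The z-test setting, the function name `z_test_power`, and the numbers passed to it are assumptions chosen for illustration; none of them come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of Ho: mu = mu0 against Ha: mu > mu0.

    effect is the true difference mu - mu0, sigma is the known
    population standard deviation, and n is the sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)      # rejection cutoff under Ho
    shift = effect / (sigma / sqrt(n))          # true difference in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)   # P(reject Ho | Ho is false)


# Hypothetical numbers, for illustration only
print(z_test_power(effect=5, sigma=15, n=25))
print(z_test_power(effect=5, sigma=15, n=100))  # larger sample, more power
```

When `effect` is 0 the function returns alpha, which is the Type I error rate: the chance of rejecting Ho when Ho is really true.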

Probability
               Ho is really True   Ho is really False
We reject Ho   p = α               p = 1 − β
We accept Ho   p = 1 − α           p = β

Power
• The goal of any scientific research is to reject Ho when Ho is false.
• To increase power:
  – a. increase sample size
  – b. increase alpha
  – c. decrease sample variability
  – d. increase the difference between the means

Categorical data example
• African-American students more likely to register via the web.

Table
Variable: Students University-Wide
                             White           African-American
                             n     Percent   n     Percent
Register on the Web          447   34%       284   44%
Register with other method   876   66%       356   56%
Total                        1323            640

[Chart: Web Registration by Race, by year (2000, 2001); series: African-American, White; data labels: 44%, 34%, 29%, 25%]

Categorical Data Example
• African-American students university-wide (44%) were more likely than white students (34%) to use web registration, χ²(1, N = 1963) = 20.7, p < .001.

Smoking among French Men
• Do these data show a relationship between education and smoking in French men?

The End

Benford’s Law page 550
• Faking data?

Problem 20.14
Digit  Ratio  Expected  Observed  (O − E)²/E
1      0.301  13.545    6         4.20280731
2      0.176  7.92      4         1.94020202
3      0.125  5.625     6         0.025
4      0.097  4.365     7         1.59065865
5      0.079  3.555     3         0.08664557
6      0.067  3.015     5         1.30687396
7      0.058  2.61      6         4.40310345
8      0.051  2.295     4         1.26667756
9      0.046  2.07      4         1.7994686
Sum                               16.6214371

Significance test
chitest p = 0.03430

Example
• Survey2 Berk & Carey page 261
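
The Problem 20.14 calculation can be sketched in Python. This is a minimal illustration, not the Excel `chitest` worksheet from the slides. The helper `chi2_sf_even_df` is a hypothetical name introduced here, and it relies on a closed form for the chi-square upper-tail probability that holds only when the degrees of freedom are even (here df = 9 − 1 = 8).

```python
from math import exp, factorial


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with EVEN df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


# Benford ratios for leading digits 1-9 and the observed counts from the slides
ratios = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [6, 4, 6, 7, 3, 5, 6, 4, 4]

n = sum(observed)                     # 45 observations in total
expected = [n * r for r in ratios]    # expected count for each digit
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # 9 categories give 8 degrees of freedom
p_value = chi2_sf_even_df(chi2, df)

print(round(chi2, 4), round(p_value, 4))
```

The statistic and p-value agree with the slide values of 16.6214 and 0.0343.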