CHAPTER 11 THE CHI-SQUARE DISTRIBUTION

This presentation is based on material and graphs from OpenStax and is copyrighted by OpenStax and Georgia Highlands College.

11.1 FACTS ABOUT THE CHI-SQUARE DISTRIBUTION

For qualitative data (hint: data in categories)

CHI-SQUARE NOTATION

The notation for the chi-square distribution is $\chi^2 \sim \chi^2_{df}$, where df = degrees of freedom, which depends on how chi-square is being used. If you want to practice calculating chi-square probabilities, use df = (number of categories – 1).

CHI-SQUARE

For the chi-square distribution, the population mean is $\mu = df$ and the population standard deviation is $\sigma = \sqrt{2(df)}$. The random variable is shown as $\chi^2$.

RULES FOR CHI-SQUARE DISTRIBUTION

1. The curve is nonsymmetrical and skewed to the right.
2. There is a different chi-square curve for each df.
3. The test statistic for any test is always greater than or equal to zero.
4. When df > 90, the chi-square curve approximates the normal distribution. For a chi-square distribution with df = 1,000, the mean is $\mu = df = 1{,}000$ and the standard deviation is $\sigma = \sqrt{2(1{,}000)} = 44.7$. Therefore, X ~ N(1,000, 44.7), approximately.
5. The mean, μ, is located just to the right of the peak.

GOODNESS-OF-FIT TEST

In this type of hypothesis test, you determine whether the data "fit" a particular distribution or not. For example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternative hypotheses for this test may be written in sentences or may be stated as equations or inequalities.
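As a quick numerical check of the mean and standard deviation facts listed in the rules earlier in this section, here is a minimal Python sketch. The helper name `chi_square_mean_sd` is hypothetical; it assumes only the formulas μ = df and σ = √(2(df)).

```python
import math

def chi_square_mean_sd(df):
    """Return (mean, standard deviation) of a chi-square distribution with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    # Mean is df; standard deviation is sqrt(2 * df).
    return df, math.sqrt(2 * df)

mean, sd = chi_square_mean_sd(1000)
print(mean, round(sd, 1))  # 1000 44.7
```

These are the two parameters used in the normal approximation for df = 1,000.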
CHI SQUARE GOF TEST STATISTIC

The test statistic for a goodness-of-fit test is:

$$\chi^2 = \sum_{k} \frac{(O - E)^2}{E}$$

where:
• O = observed values (data)
• E = expected values (from theory)
• k = the number of different data cells or categories

The observed values are the data values, and the expected values are the values you would expect to get if the null hypothesis were true. The expected value for each cell needs to be at least five in order for you to use this test.

CHI-SQUARE GOF TEST

The number of degrees of freedom is df = (number of categories – 1). The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be far out in the right tail of the chi-square curve.

EXAMPLE OF CHI SQUARE GOF TEST

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as in Table 11.6. For the population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week? Test at a 5% significance level.

Below are the observed counts for each day:

Day                 Monday  Tuesday  Wednesday  Thursday  Friday
Number of absences  15      12       9          9         15

HYPOTHESES

The null and alternative hypotheses are:
• H0: The absent days occur with equal frequencies, that is, they fit a uniform distribution.
• Ha: The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.
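The test statistic, the degrees-of-freedom rule, and the expected-count condition described above can be sketched in Python. The function name `gof_statistic` and its error messages are illustrative assumptions, not part of the original material. The data are the observed absences from the table, with expected counts taken from the uniform distribution in H0.

```python
def gof_statistic(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e < 5 for e in expected):
        raise ValueError("every expected count must be at least 5 to use this test")
    # Sum (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, df

observed = [15, 12, 9, 9, 15]  # Monday through Friday
expected = [sum(observed) / len(observed)] * len(observed)  # 60 / 5 = 12 per day
statistic, df = gof_statistic(observed, expected)
print(statistic, df)  # 3.0 4
```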
SOLVE FOR TEST STATISTICS #1

Find the expected value: find the total number of absences and then divide by the number of categories.

Day        Observed  Expected
Monday     15        60 / 5 = 12
Tuesday    12        60 / 5 = 12
Wednesday  9         60 / 5 = 12
Thursday   9         60 / 5 = 12
Friday     15        60 / 5 = 12

SOLVE FOR TEST STATISTICS #2

Next, subtract the expected from the observed.

Day        Observed  Expected  Observed – Expected
Monday     15        12        15 – 12 = 3
Tuesday    12        12        12 – 12 = 0
Wednesday  9         12        9 – 12 = –3
Thursday   9         12        9 – 12 = –3
Friday     15        12        15 – 12 = 3

SOLVE FOR TEST STATISTICS #3

Next, square the results from observed – expected.

Day    Obs.  Exp.  Obs – Exp      (Obs – Exp)²
Mon    15    12    15 – 12 = 3    (3)² = 9
Tues   12    12    12 – 12 = 0    (0)² = 0
Wed    9     12    9 – 12 = –3    (–3)² = 9
Thurs  9     12    9 – 12 = –3    (–3)² = 9
Fri    15    12    15 – 12 = 3    (3)² = 9

SOLVE FOR TEST STATISTICS #4

Next, divide (Obs – Exp)² by the expected value.

Day    Obs  Exp  Obs – Exp  (Obs – Exp)²  (Obs – Exp)²/Exp
Mon    15   12   3          9             9/12 = 0.75
Tues   12   12   0          0             0/12 = 0
Wed    9    12   –3         9             9/12 = 0.75
Thurs  9    12   –3         9             9/12 = 0.75
Fri    15   12   3          9             9/12 = 0.75

SOLVE FOR TEST STATISTICS #5

To get the chi-square ($\chi^2$) test statistic, add the (Obs – Exp)²/Exp values together:

0.75 + 0 + 0.75 + 0.75 + 0.75 = 3.00

So the $\chi^2$ test statistic is 3.00.

ANSWER

The χ² test statistic is 3 and d.f. is 5 – 1 = 4.

To get the p-value, press 2nd DISTR. Arrow down to χ2cdf. Press ENTER. Enter (3,10^99,4). Rounded to four decimal places, you should see 0.5578, which is the p-value.

Since p (0.5578) is greater than α (0.05), you decide not to reject the null hypothesis.

BUT WHAT DOES IT MEAN?

Deciding not to reject the null hypothesis does not mean we found the null hypothesis to be true. It means the sample does not give enough evidence against it. So we need to state the conclusion properly:

At the 5% level of significance, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
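The calculator step χ2cdf(3, 10^99, 4) can also be reproduced in Python. The sketch below is an alternative under stated assumptions, not the method used in these slides: it relies on the closed-form right-tail probability of the chi-square distribution, which holds only when df is even. The function name `chi2_right_tail_even_df` is hypothetical.

```python
import math

def chi2_right_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2
    # For df = 2m: tail = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

alpha = 0.05
p_value = chi2_right_tail_even_df(3.0, 4)
print(round(p_value, 4))  # 0.5578
print("reject H0" if p_value < alpha else "do not reject H0")  # do not reject H0
```

If SciPy is available, `scipy.stats.chi2.sf(3, 4)` should give the same right-tail probability and works for odd df as well.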
In the conclusion for the absences example:
• We stated the level of significance.
• Since we failed to reject the null hypothesis, we word our conclusion in terms of the alternative hypothesis: there is not sufficient evidence to support it.

DIFFERENT PERCENTAGES

Some chi-square problems ask for the data to be tested at different percentages for the different categories. To determine the expected values, first find the sample size (add all the observed values). Then, for each category, convert its percentage to a decimal and multiply the decimal by the sample size.

EX: For the last example, the sample size was 60. What if the manager believed that 40% of absences occurred on Monday? 60 × 0.40 = 24 is the expected number for Monday.

BUT we can use a website to help us so we don’t have to do it by hand.

CHI-SQUARE GOODNESS OF FIT
Using Internet Website

WEBSITE

http://www.socscistatistics.com/tests/goodnessoffit/Default2.aspx

EXAMPLE: “EQUAL PROPORTIONS”

WHICH FLAVOR OF SODA IS PREFERRED?

Claim: There is no preference for flavors. Let α = 0.05.

A sample of 100 people provides the data in the table below:

Cherry  Strawberry  Orange  Lime  Grape
32      28          14      10    16

STEP #1 ENTER THE QUALITATIVE CATEGORIES

From the example above (the flavors are the qualitative categories):

SODA FLAVOR  LIKES
Cherry       32
Strawberry   28
Orange       16
Lime         14
Grape        10

Enter the categories.

STEP #2 & #3

Click Next. Select “Frequencies”.

STEP #4 & #5

Enter “Observed Values” from the given table. Calculate the expected value for each category:

Expected value = (sample size)(proportion), or $E = np$

For this example:
Sample size, n = 100
“Equal” proportions = 1/5
So, $E = (100)\left(\tfrac{1}{5}\right) = 20$ expected likes per soda flavor.

STEP #6 & #7

Enter “Expected Value”. Select the significance level given in the problem (example).

STEP #8 & #9

Click Calculate Chi^2. Results are displayed on screen.

EXAMPLE #2 SPECIFIED PROPORTIONS

A statistics teacher claims that, on average, 20% of her students get a grade of A, 35% get a B, 25% get a C, 10% get a D, and 10% get an F. The grades of a random sample of 100 students were recorded.
Test the claim that the grades follow the distribution claimed by the teacher. Use α = 0.05.

The following table presents the results:

A   B   C   D  F
29  42  20  5  4

STEP #1 ENTER THE QUALITATIVE CATEGORIES

From the example above (the grades are the qualitative categories):

Grade  Number of Students
A      29
B      42
C      20
D      5
F      4

Enter the categories.

STEP #2 & #3

Click Next. Select “Frequencies”.

STEP #4 & #5

Enter “Observed Values” from the given table. Calculate the expected value for each category:

Expected value = (sample size)(proportion), or $E = np$

The expected value will be different for each category.

STEP #5

Grade  Total number of students in SAMPLE  Percentage           Total number in sample  EXPECTED VALUE
       (not observed values)               (stated in problem)  × percentage
A      100                                 20%                  (100)(0.20)             20
B      100                                 35%                  (100)(0.35)             35
C      100                                 25%                  (100)(0.25)             25
D      100                                 10%                  (100)(0.10)             10
F      100                                 10%                  (100)(0.10)             10

STEP #6 & #7

Enter “Expected Value”. Select the significance level given in the problem (example).

STEP #8 & #9

Click Calculate Chi^2. Results are displayed on screen.
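The expected values E = np used in both website examples can also be computed directly. The following is a sketch under stated assumptions: the function names are hypothetical, the soda counts are the ones entered in Step #1 of the first example, and the p-value uses a closed form that is valid only for even df rather than whatever method the website uses.

```python
import math

def expected_counts(n, proportions):
    """Expected value for each category: E = n * p."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]

def gof_test(observed, proportions):
    """Return (chi-square statistic, df, p-value) for a goodness-of-fit test."""
    expected = expected_counts(sum(observed), proportions)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    if df % 2 != 0:
        raise ValueError("the closed-form p-value used here requires even df")
    half = statistic / 2
    # Right-tail probability for df = 2m: exp(-x/2) * sum_{j<m} (x/2)^j / j!
    tail_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return statistic, df, math.exp(-half) * tail_sum

# Equal proportions (soda flavors) and specified proportions (grades).
soda = gof_test([32, 28, 16, 14, 10], [1 / 5] * 5)
grades = gof_test([29, 42, 20, 5, 4], [0.20, 0.35, 0.25, 0.10, 0.10])
for name, (statistic, df, p_value) in [("soda", soda), ("grades", grades)]:
    print(name, round(statistic, 2), df, round(p_value, 4))
```

Under these assumptions the statistic should come out to about 18 for the soda data and about 12.55 for the grades, each with df = 4. Both p-values fall below α = 0.05, which would lead to rejecting the claimed distribution in each example.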