χ² Tests for Goodness of Fit
• General notion: we often wish to know whether a sample could have come from a population with a particular distribution
• Example: to use t tests, we must assume that the population is normally distributed
• If a sample is drawn from, say, a normal distribution, the sample values should reflect the population distribution
• This lets us state how many of the sample values should fall in a particular range
• Example: 68% of a normal distribution lies within ±1 standard deviation of the mean, so about 68% of the values in a sample from a normal distribution should lie within ±1 standard deviation of the mean
• Comparing actual and expected counts is the province of the χ² distribution
• Let $O_j$ be the number observed in the sample in range j
• Let $E_j$ be the number that would be expected if the population had a given distribution, such as uniform, Poisson, or normal
• Then
  $$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
  where k is the number of categories
• Degrees of freedom = k – 1 – m, where m is the number of parameters estimated from the sample and used in the calculation
• Example: Are the answers to Dr. Dinwiddie’s multiple-choice tests random? If so, the answers should conform to a uniform distribution, with P(A) = P(B) = P(C) = P(D) = ¼. (In a uniform distribution, each of the n possible values has probability 1/n.) On a recent exam there were sixty questions, with correct answers A: 20, B: 5, C: 17, and D: 18.
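The statistic and the degrees-of-freedom rule above translate directly into a few lines of code. The following is a minimal plain-Python sketch (the function names are illustrative, not part of the course workbook), applied to the Dinwiddie counts, where a uniform null hypothesis gives 60/4 = 15 expected answers per letter.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_j - E_j)^2 / E_j over the k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(k, m=0):
    """df = k - 1 - m, where m is the number of parameters estimated from the sample."""
    return k - 1 - m


# Dinwiddie example: 60 questions under a uniform null -> 60 / 4 = 15 expected per answer
observed = [20, 5, 17, 18]  # counts of A, B, C, D
n = sum(observed)
expected = [n / len(observed)] * len(observed)

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))  # no parameters estimated, so m = 0
print(f"chi-square = {stat:.3f}, df = {df}")
```

Keeping m as an explicit argument makes the later normality tests easy to handle: pass m = 2 when the mean and standard deviation are estimated from the sample.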
• H0: the distribution of answers is uniform
  H1: the distribution is not uniform

Correct Answer | Observed | Expected | Squared Difference | (O – E)²/E
A | 20 | 15 | 25 | 1.667
B | 5 | 15 | 100 | 6.667
C | 17 | 15 | 4 | 0.267
D | 18 | 15 | 9 | 0.600

• Then χ² = (25 + 100 + 4 + 9)/15 = 138/15 = 9.20, and no parameters were estimated, so degrees of freedom = 4 – 1 = 3
• Excel and the chi-square distribution
  – CHIDIST(x value, df) returns the area in the right-hand tail of the chi-square distribution
    • goodness-of-fit tests are all upper one-tail tests, so CHIDIST gives the p-value of the test
  – CHIINV(probability, df) gives the chi-square value that cuts off the given probability in the upper tail
    • use it to find the critical value for a chi-square test
  – For the Dinwiddie problem, CHIDIST(9.2, 3) gives the p-value of the test
• Example: Hamish suspects that the dice at Black Bart’s are not fair, so he spirits one out of the casino one night. After rolling the stolen die 120 times, he has the following result:

No. of Dots | No. of Times
1 | 27
2 | 24
3 | 18
4 | 11
5 | 27
6 | 13

  Compute $\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$ for these counts. What are the null and alternative hypotheses? Is Hamish right to be suspicious of Black Bart?
• Testing for normality
  – Suppose that nationally, auto insurance has a mean price of $700 with a standard deviation of $135. We have a sample of 80 NC drivers, and we’d like to know whether their insurance bills are normally distributed with the national parameters.
  – How many would we expect in the range $700 to $835?
  – Hint: how many standard deviations is that range? What proportion of a normal distribution lies within that many standard deviations?
    • Answer: in a normal distribution, 0.34 of the values lie between the mean and +1 standard deviation, so we’d expect to find 0.34 × 80 = 27.2 in that range
• Setting up a spreadsheet: use NORMSDIST
  – NORMSDIST(-2) gives the proportion more than two standard deviations below the mean
  – NORMSDIST(-1) – NORMSDIST(-2) gives the proportion between 1 and 2 standard deviations below the mean
  – Continuing in that fashion, we’d have the following:

St Devs | Range ($) | Prop. | Expected freq.
< -2 | < 430 | 0.02275 | 1.82
-2 to -1 | 430–565 | 0.1359 | 10.87
-1 to 0 | 565–700 | 0.3413 | 27.31
0 to 1 | 700–835 | 0.3413 | 27.31
1 to 2 | 835–970 | 0.1359 | 10.87
> 2 | > 970 | 0.02275 | 1.82

• To find the observed values in the sample, use the Histogram tool
• An elaborated solution appears under “Study Aids” on my web site. Click on the link to normaltest.xls
• Issue: how many degrees of freedom does the χ² statistic have?
  – df = k – 1 – m = 6 – 1 – 0 = 5
• Alternate technique: determine whether the sample was drawn from some normal population, with the mean and standard deviation not specified in advance
  – First, calculate the sample mean and standard deviation and use those numbers in the calculation
  – Issue: how many degrees of freedom does the χ² statistic have?
    • df = k – 1 – m = 6 – 1 – 2 = 3, since two parameters (mean and standard deviation) were estimated
• A problem and an alternate solution
  – Each cell should have an expected frequency of at least 5; otherwise the χ² test is unreliable (note the two tail cells above, each with expected frequency 1.82)
  – One solution: choose ranges with equal expected frequencies
  – Divide the data into, say, 10 ranges, each expected to contain 80/10 = 8 observations
  – So we define ranges that each contain 1/10 of the distribution
  – Remember: NORMINV(probability, mean, standard deviation) returns the upper boundary of the given lower-tail probability for the specified mean and standard deviation
  – Example: NORMINV(.1, 300, 20) = 274.37, so 10% of this distribution is ≤ 274.37
  – NORMINV(1/10, X̄, s) finds the boundary of the lowest 10% of the distribution, where X̄ and s are the sample mean and standard deviation
  – NORMINV(4/10, X̄, s) finds the boundary of the lowest 40%, and so on
  – Look carefully at sheet 2 of the workbook normaltest.xls as posted
  – The boundaries thus found are the bin range
  – Each range will have expected frequency n/c, where n is the number of observations and c the number of categories
• Testing for conformity to an observed distribution:
  – The national distribution of pets is as follows:

Number of Pets | Percentage of Households
0 | 55
1 | 25
2 | 10
3 | 5
4 | 3
5 or more | 2

  – A marketing company wants to know whether Boone conforms to the national pattern. In a sample of 300 Boone households, they found the following:

No. of Pets | No. of Households
0 | 128
1 | 75
2 | 50
3 | 20
4 | 18
5 or more | 9

  – Complete the columns Expected No. and (O – E)²/E, then compute $\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$.
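The spreadsheet steps in these examples rely on CHIDIST and CHIINV. Python's standard library has no chi-square distribution, so the sketch below hand-rolls both, assuming integer degrees of freedom (all these examples need): the right tail uses the standard closed forms for even and odd df, and the critical value is found by bisection. The function names are hypothetical, not from the course workbook. It runs the Dinwiddie, Hamish, and Boone pets data through the same steps, taking the pets expected counts as the national percentages applied to 300 households.

```python
import math
from statistics import NormalDist


def chi_square_statistic(observed, expected):
    """Sum of (O_j - E_j)^2 / E_j over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_right_tail(x, df):
    """Right-tail area of the chi-square distribution, the quantity CHIDIST(x, df) reports.

    Assumes df is a positive integer, so the closed-form expressions apply.
    """
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: 2 * P(Z > sqrt(x)) + sqrt(2/pi) * e^(-x/2) * sum_i x^(i - 1/2) / (1*3*...*(2i-1))
    tail = 2 * NormalDist().cdf(-math.sqrt(x))
    term, total = math.sqrt(x), 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return tail + math.sqrt(2 / math.pi) * math.exp(-half) * total


def chi2_critical_value(alpha, df):
    """Value whose right-tail area is alpha, the quantity CHIINV(alpha, df) reports (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:  # widen the bracket until the tail area drops below alpha
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def goodness_of_fit(observed, expected, m=0, alpha=0.05):
    """Return (chi-square, df, p-value, critical value), with df = k - 1 - m."""
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1 - m
    return stat, df, chi2_right_tail(stat, df), chi2_critical_value(alpha, df)


# Dinwiddie: uniform answers, 60 questions -> 15 expected per letter
dinwiddie = goodness_of_fit([20, 5, 17, 18], [15] * 4)

# Hamish: a fair die rolled 120 times -> 20 expected per face
hamish = goodness_of_fit([27, 24, 18, 11, 27, 13], [20] * 6)

# Boone pets: national percentages applied to 300 households
national_pct = [55, 25, 10, 5, 3, 2]
pets_expected = [300 * p / 100 for p in national_pct]
pets = goodness_of_fit([128, 75, 50, 20, 18, 9], pets_expected)

for name, (stat, df, p, crit) in [("Dinwiddie", dinwiddie), ("Hamish", hamish), ("Pets", pets)]:
    print(f"{name}: chi-square = {stat:.3f}, df = {df}, p = {p:.4g}, 5% critical value = {crit:.3f}")
```

Bisection is used for the critical value because the right-tail area decreases steadily as x grows, which keeps the search simple and reliable. In current versions of Excel, CHIDIST and CHIINV are kept for compatibility alongside CHISQ.DIST.RT and CHISQ.INV.RT.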
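The normality test can be sketched the same way, with Python's statistics.NormalDist standing in for the spreadsheet functions: NormalDist().cdf(z) plays the role of NORMSDIST(z), and NormalDist(mean, sd).inv_cdf(q) plays the role of NORMINV(q, mean, sd). The sketch rebuilds the expected-frequency column of the insurance table (mean 700, standard deviation 135, 80 drivers) and then computes equal-probability bin boundaries. Because the sample itself lives in normaltest.xls and is not reproduced here, the national mean and standard deviation stand in for the sample mean and standard deviation in the second part; that substitution is an assumption of this sketch.

```python
from statistics import NormalDist

MEAN, SD, N = 700, 135, 80  # national mean, standard deviation, and sample size from the example

# NormalDist().cdf(z) plays the role of Excel's NORMSDIST(z)
std_normal = NormalDist()
z_edges = [-2, -1, 0, 1, 2]  # range boundaries in standard deviations
cum = [0.0] + [std_normal.cdf(z) for z in z_edges] + [1.0]
proportions = [hi - lo for lo, hi in zip(cum, cum[1:])]  # area of each of the six ranges
expected = [N * p for p in proportions]
dollar_edges = [MEAN + z * SD for z in z_edges]  # 430, 565, 700, 835, 970

for p, e in zip(proportions, expected):
    print(f"proportion {p:.5f}  expected {e:.2f}")


def equal_probability_edges(mean, sd, c):
    """Upper boundaries of the lowest c - 1 of c equal-probability ranges.

    NormalDist(mean, sd).inv_cdf(q) plays the role of Excel's NORMINV(q, mean, sd).
    """
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf(i / c) for i in range(1, c)]


print(round(NormalDist(300, 20).inv_cdf(0.1), 2))  # the NORMINV(.1, 300, 20) example

# Assumption: with the real data, replace MEAN and SD by the sample mean and standard deviation
bin_edges = equal_probability_edges(MEAN, SD, 10)
expected_per_bin = N / 10  # n / c = 8 observations expected in each range
```

The nine boundaries define ten ranges (the last one open-ended), which is why the list comprehension stops at c - 1.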