χ2 Applications (pronounced “kie-square”; text Ch. 15)
• Tests for Independence – is characteristic R the same or different in different populations?
• Tests for Equality of Population Proportions, Three or More Populations
• Tests for “Goodness of Fit” – did this sample come from a population with ____ distribution?
• Confidence Intervals and Hypothesis Tests for Variances and Standard Deviations

χ2 Contingency Tables (Tests for Independence)

| Medication / Outcome | Recovered | Didn’t Recover | Total |
|---|---|---|---|
| Prozan | 150 | 50 | |
| Drug X | 200 | 100 | |
| Total | | | |

Research Question: Was the proportion who recovered on Prozan the same as the proportion that recovered on Drug X? Some difference in proportions is NOT sufficient evidence – the difference could be due to sampling variability.
• So we ask: how probable is the observed difference if there is no actual difference between these populations?
• That probability is governed by the χ2 distribution
• Chi-Square Calculation:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

r is the number of rows; k is the number of columns. The expression directs us to calculate the expected number for each row and column intersection, then calculate the required squares. The expected number is the proportion in the total sample with characteristic R multiplied by the size of the subsample. The chi-square statistic has (r − 1) × (k − 1) degrees of freedom.

How many would we expect to recover in the Prozan group? Of the total sample of 500, 350 recovered, so P(R) = 350/500 = 0.7. There are 200 in the Prozan sample, so E11 = 0.7 × 200 = 140.

Formula: Eij = (Cj / n) × Ri, where Cj is the total in the j-th column, Ri is the total in the i-th row, and n is the size of the total sample.
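The expected-count formula and the double sum can be checked with a short script. This is only a minimal sketch in plain Python; the function names `expected_counts` and `chi_square_independence` are made up for illustration and do not come from the slides or from any library. It returns the statistic and the degrees of freedom but not a p-value.

```python
def expected_counts(observed):
    """Return the expected counts E_ij = (C_j / n) * R_i for an r x k table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[col_total / n * row_total for col_total in col_totals]
            for row_total in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x k table."""
    expected = expected_counts(observed)
    # Sum (O_ij - E_ij)^2 / E_ij over every cell of the table.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: Prozan, Drug X. Columns: Recovered, Didn't Recover.
medication_table = [[150, 50], [200, 100]]
expected = expected_counts(medication_table)
statistic, df = chi_square_independence(medication_table)
print(expected)
print(round(statistic, 4), df)
```

The same two functions accept any table with more rows or columns, since the row and column totals are computed from the data rather than typed in.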
| Medication / Outcome | Recovered | Didn’t Recover | Total |
|---|---|---|---|
| Prozan | O11: 150, E11: 140 | O12: 50, E12: 60 | 200 |
| Drug X | O21: 200, E21: 210 | O22: 100, E22: 90 | 300 |
| Total | 350 | 150 | 500 |

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{10^2}{140} + \frac{10^2}{210} + \frac{10^2}{60} + \frac{10^2}{90} = 3.9683$$

| Major / Start Sal | < 40,000 | ≥ 40,000 | Total |
|---|---|---|---|
| Business | 120 | 280 | |
| Other | 160 | 340 | |
| Total | | | |

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

The expected number of business grads with salary < 40,000 is
1) 120
2) 124.44
3) 400
4) none of the above

The squared value for “Other < 40,000” is
1) 160
2) 6.25
3) 0.1267
4) 155.56

The χ2 value is
1) 12.35
2) 900
3) 0.4139
4) none of the above

| Method / Level | Not Proficient | Proficient | Highly Proficient | Total |
|---|---|---|---|---|
| Sight | 64 | 86 | 31 | |
| Phonics | 51 | 78 | 45 | |
| Write-to-Read | 33 | 99 | 51 | |
| Total | | | | |

• Excel Procedures:
– Enter the data as a table in Excel
– Create a table in Excel showing the expected frequencies, as below
– When the formula in cell B9 is copied to B9:D11, it will create the required expected frequencies (the formula takes the column totals from row 5 and the row totals from column E)
– Use chitest(B2:D4,B9:D11); the result is the p-value of the test

| | B | C | D |
|---|---|---|---|
| 9 | =(B$5/$E$5)*$E2 | | |
| 10 | | | |
| 11 | | | |

Comic Relief
• Achieving energy independence!
• Your tax dollars at work!
• The photo on the right is of the Great Howard’s Knob Wind Turbine Generator, a joint project of the Department of Energy and the National Aeronautics and Space Administration. It cost 30 million 1980 dollars – about $75,000,000 in today’s money. During its lifetime this machine produced
A. zero useable watts of power
B. a cult that worshiped the machine and called themselves “Whooshies”
C. significant interference with TV signals
D. several auto accidents due to brake failure as flatlanders descended from the Knob
E. all of the above

Testing for equality of three or more population proportions
• Purpose: to avoid a series of pair-wise hypothesis tests
• Very similar to tests for independence
• Example: the following table gives the percentage of pet owners in samples from four towns. Is the proportion the same or different in the four locations?
| City | New York | Atlanta | Boone | Los Angeles |
|---|---|---|---|---|
| Percentage own pet | 12 | 36 | 52 | 39 |
| n = | 200 | 300 | 200 | 300 |

• To use χ2 we must have numbers in each cell, so begin by calculating the number in each category:

| City / Pet | New York | Atlanta | Boone | LA | Totals |
|---|---|---|---|---|---|
| Own | 0.12 × 200 = 24 | 108 | 104 | 117 | 353 |
| Don’t Own | 176 | 192 | 96 | 183 | 647 |
| Totals | 200 | 300 | 200 | 300 | 1000 |

χ2 Tests for Goodness of Fit:
• General Notion: We often wish to know whether a particular distribution fits a general definition
• Example: To use t tests, we must suppose that the population is normally distributed
• If a sample is drawn from, say, a normal distribution, the sample values should reflect the population distribution
• This allows us to state the number in the sample that should be in a particular range
• Example: 68% of a normal distribution is within +/- 1 standard deviation of the mean. About 68% of the values in a sample from a normal distribution should be within +/- 1 standard deviation of the mean
• Comparison of actual and expected numbers is the province of the χ2 distribution
• Let Oj be the number observed in the sample in range j
• Let Ej be the number that would be expected if the population had a given distribution, such as uniform, Poisson, normal, etc.
• Then

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

• where k is the number of categories
• degrees of freedom = k – 1 – m, where m is the number of parameter estimates used in the calculation
• Example: Are the answers to Dr. Dinwiddie’s multiple-choice tests random? If so, the answers should conform to a uniform distribution and P(A) = P(B) = P(C) = P(D) = ¼. (For the uniform distribution P(E) = 1/n, where n is the number of possible values.)
• On a recent exam there were sixty questions with correct answers: A-20, B-5, C-17, and D-18.
H0: the distribution of answers is uniform
H1: the distribution is not uniform

| Correct Answer | Observed | Expected | Squared Difference (Oj − Ej)² / Ej |
|---|---|---|---|
| A | 20 | 15 | 1.667 |
| B | 5 | 15 | 6.667 |
| C | 17 | 15 | 0.267 |
| D | 18 | 15 | 0.600 |

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$

Then χ2 = 138/15 = 9.2, and no parameters were estimated, so degrees of freedom = 4 – 1 = 3

• Excel and the chi-square distribution
– CHIDIST(x value, df) returns the area in the right-hand tail of the chi-square distribution
• goodness of fit tests are all upper one-tail tests, so chidist gives the p-value of the test
– CHIINV(probability, df) gives the chi-square value that cuts off an upper tail with the probability entered
• use to find the critical value for a chi-square test
• For the Dinwiddie problem: CHIDIST(9.2, 3) gives the p-value of the test

• EXAMPLE: Hamish suspects that the dice at Black Bart’s are not fair, so he spirits one out of the casino one night. After rolling the stolen die 120 times, he has the following result:

| No. of Dots | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| No. of Times | 27 | 24 | 18 | 11 | 27 | 13 |

Calculate the appropriate statistic.

• Testing for normality
– suppose that nationally auto insurance has a mean price of $700 with standard deviation $135. We have a sample of 80 NC drivers, and we’d like to know whether their insurance bills are normally distributed with the national parameters.
– how many would we expect in the range 700 to 835?
– HINT: how many standard deviations? What proportion are within that range of standard deviations?
• answer: on a normal distribution, 0.34 are between the mean and +1 st dev, so we’d expect to find 0.34 * 80 = 27.2 in that range
• Setting up a spreadsheet: use normsdist
• Continuing in that fashion, we’d have the following

| St Devs | Range | Prop. | Expected freq |
|---|---|---|---|
| < -2 | < 430 | 0.02275 | 1.82 |
| -2 to -1 | 430-565 | 0.1359 | 10.87 |
| -1 to 0 | 565-700 | 0.3413 | 27.31 |
| 0 to 1 | 700-835 | 0.3413 | 27.31 |
| 1 to 2 | 835-970 | 0.1359 | 10.87 |
| > 2 | > 970 | 0.02275 | 1.82 |

• To find the observed values in the sample, use the HISTOGRAM tool
• An elaborated solution appears under “Study Aids” on my web site. Click on the link to normaltest.xls
• Issue: how many degrees of freedom does the χ2 statistic have?
– df = k – 1 – m = 6 – 1 – 0 = 5
• Alternate technique: determine whether the sample was drawn from a normal population
• First, calculate sample mean and standard deviation and use those numbers in the calculation
• Issue: how many degrees of freedom does the χ2 statistic have?
– df = k – 1 – m = 6 – 1 – 2 = 3

• Testing for conformity to an observed distribution:
– The national distribution of pets is as follows:

| Number of Pets | Percentage of Households |
|---|---|
| 0 | 55 |
| 1 | 25 |
| 2 | 10 |
| 3 | 5 |
| 4 | 3 |
| 5 or more | 2 |

A marketing company wants to know whether Boone conforms to the national pattern. In a sample of 300 Boone households, they found the following:

| No. of Pets | No. of Households | Expected No. | Squares |
|---|---|---|---|
| 0 | 128 | | |
| 1 | 75 | | |
| 2 | 50 | | |
| 3 | 20 | | |
| 4 | 18 | | |
| 5 or more | 9 | | |

$$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$$
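The Boone pet comparison can be worked through with a few lines of code. The following is a rough sketch in plain Python, not the course's Excel workflow; `chi_square_goodness_of_fit` and its `estimated_params` argument (the m in df = k – 1 – m) are illustrative names chosen here. It stops at the statistic and degrees of freedom, so a p-value would still come from something like CHIDIST.

```python
def chi_square_goodness_of_fit(observed, proportions, estimated_params=0):
    """Return (expected counts, chi-square statistic, degrees of freedom).

    observed         -- observed count O_j in each of the k categories
    proportions      -- hypothesized probability of each category (sums to 1)
    estimated_params -- m, the number of parameters estimated from the sample
    """
    n = sum(observed)
    # E_j is the hypothesized proportion times the sample size.
    expected = [p * n for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - estimated_params
    return expected, statistic, df


# National percentages for 0, 1, 2, 3, 4, and 5 or more pets.
national_proportions = [0.55, 0.25, 0.10, 0.05, 0.03, 0.02]
# Boone sample of 300 households, same category order.
boone_households = [128, 75, 50, 20, 18, 9]

expected, statistic, df = chi_square_goodness_of_fit(
    boone_households, national_proportions
)
for count, exp in zip(boone_households, expected):
    print(count, round(exp, 2), round((count - exp) ** 2 / exp, 4))
print(round(statistic, 3), df)
```

Passing `estimated_params=2` would cover the alternate normality test, where the sample mean and standard deviation are estimated before the expected frequencies are computed.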