Exercise Set 10
STAT111
Sondre Hølleland
Auditorium π 4
25 April 2016

Exercises
• Section 13.1: 7, 10
• Section 13.2: 12, 13
• Section 13.3: 29

The exercise text follows below.

13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified

... plant ovary). The article “A Genetic and Biochemical Study on Pericarp Pigments in a Cross Between Two Cultivars of Grain Sorghum, Sorghum Bicolor” (Heredity, 1976: 413–416) reports on an experiment that involved an initial cross between CK60 sorghum (an American variety with white seeds) and Abu Taima (an Ethiopian variety with yellow seeds) to produce plants with red seeds and then a self-cross of the red-seeded plants. According to genetic theory, this F2 cross should produce plants with red, yellow, or white seeds in the ratio 9:3:4. The data from the experiment follows; does the data confirm or contradict the genetic theory? Test at level .05 using the P-value approach.

Seed Color            Red   Yellow   White
Observed Frequency    195       73     100

7. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. The author of the article “Is There a Season for Homicide?” (Criminology, 1988: 287–296) classified 1361 homicides according to season, resulting in the accompanying data. Test the null hypothesis of equal proportions using $\alpha = .01$ by using the chi-squared table to say as much as possible about the P-value.

Winter   Spring   Summer   Fall
   328      334      372    327

8. The article “Psychiatric and Alcoholic Admissions Do Not Occur Disproportionately Close to Patients’ Birthdays” (Psych. Rep., 1992: 944–946) focuses on the existence of any relationship between date of patient admission for treatment of alcoholism and patient’s birthday. Assuming a 365-day year (i.e., excluding leap year), in the absence of any relation, a patient’s admission date is equally likely to be any one of the 365 possible days. The investigators established four different admission categories: (1) within 7 days of birthday, (2) between 8 and 30 days, inclusive, from the birthday, (3) between 31 and 90 days, inclusive, from the birthday, and (4) more than 90 days from the birthday. A sample of 200 patients gave observed frequencies of 11, 24, 69, and 96 for categories 1, 2, 3, and 4, respectively. State and test the relevant hypotheses using a significance level of .01.

9. The response time of a computer system to a request for a certain type of information is hypothesized to have an exponential distribution with parameter $\lambda = 1$ [so if $X$ = response time, the pdf of $X$ under $H_0$ is $f_0(x) = e^{-x}$ for $x \ge 0$].
a. If you had observed $X_1, X_2, \ldots, X_n$ and wanted to use the chi-squared test with five class intervals having equal probability under $H_0$, what would be the resulting class intervals?
b. Carry out the chi-squared test using the following data resulting from a random sample of 40 response times:

 .10   .99  1.14  1.26  3.24   .12   .26   .80   .79  1.16
1.76   .41   .59   .27  2.22   .66   .71  2.21   .68   .43
 .11   .46   .69   .38   .91   .55   .81  2.51  2.77   .16
1.11   .02  2.13   .19  1.21  1.13  2.93  2.14   .34   .44

10. a. Show that another expression for the chi-squared statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{N_i^2}{n p_{i0}} - n$$

Why is it more efficient to compute $\chi^2$ using this formula?
b. When the null hypothesis is $H_0: p_1 = p_2 = \cdots = p_k = 1/k$ (i.e., $p_{i0} = 1/k$ for all $i$), how does the formula of part (a) simplify? Use the simplified expression to calculate $\chi^2$ for the pigeon/direction data in Exercise 4.

11. a. Having obtained a random sample from a population, you wish to use a chi-squared test to decide whether the population distribution is standard normal. If you base the test on six class intervals having equal probability under $H_0$, what should the class intervals be?
b.
If you wish to use a chi-squared test to test $H_0$: the population distribution is normal with $\mu = .5$, $\sigma = .002$ and the test is to be based on six equiprobable (under $H_0$) class intervals, what should these intervals be?
c. Use the chi-squared test with the intervals of part (b) to decide, based on the following 45 bolt diameters, whether bolt diameter is a normally distributed variable with $\mu = .5$ in., $\sigma = .002$ in.

.4974  .4994  .5017  .4972  .4990  .4992  .5021  .5006  .4976
.5010  .4984  .5047  .4974  .5007  .4959  .4987  .4991  .4997
.4967  .5069  .5008  .4975  .5015  .4968  .5014  .4993  .5028
.4977  .5000  .4998  .5012  .5008  .5013  .4975  .4961  .4967
.5000  .5056  .4993  .5000  .5013  .4987  .4977  .5008  .4991

CHAPTER 13 Goodness-of-Fit Tests and Categorical Data Analysis

Exercises Section 13.2 (12–22)

12. Consider a large population of families in which each family has exactly three children. If the genders of the three children in any family are independent of one another, the number of male children in a randomly selected family will have a binomial distribution based on three trials.
a. Suppose a random sample of 160 families yields the following results. Test the relevant hypotheses by proceeding as in Example 13.5.

Number of Male Children     0    1    2    3
Frequency                  14   66   64   16

b. Suppose a random sample of families in a nonhuman population resulted in observed frequencies of 15, 20, 12, and 3, respectively. Would the chi-squared test be based on the same number of degrees of freedom as the test in part (a)? Explain.

13. A study of sterility in the fruit fly (“Hybrid Dysgenesis in Drosophila melanogaster: The Biology of Female and Male Sterility,” Genetics, 1979: 161–174) reports the following data on the number of ovaries developed for each female fly in a sample of size 1,388. One model for unilateral sterility states that each ovary develops with some probability $p$ independently of the other ovary. Test the fit of this model using $\chi^2$.
$x$ = Number of Ovaries Developed      0     1    2
Observed Count                     1212   118   58

14. The article “Feeding Ecology of the Red-Eyed Vireo and Associated Foliage-Gleaning Birds” (Ecol. Monogr., 1971: 129–152) presents the accompanying data on the variable $X$ = the number of hops before the first flight and preceded by a flight. The author then proposed and fit a geometric probability distribution [$p(x) = P(X = x) = p^{x-1} \cdot q$ for $x = 1, 2, \ldots$, where $q = 1 - p$] to the data. The total sample size was $n = 130$.

$x$                            1    2    3    4    5    6    7    8    9   10   11   12
Number of Times $x$ Observed  48   31   20    9    6    5    4    2    1    1    2    1

a. The likelihood is $(p^{x_1 - 1} q) \cdots (p^{x_n - 1} q) = p^{\sum x_i - n} q^n$. Show that the mle of $p$ is given by $\hat{p} = (\sum x_i - n) / \sum x_i$, and compute $\hat{p}$ for the given data.
b. Estimate the expected cell counts using $\hat{p}$ of part (a) [expected cell counts = $n \hat{p}^{x-1} \hat{q}$ for $x = 1, 2, \ldots$], and test the fit of the model using a $\chi^2$ test by combining the counts for $x = 7, 8, \ldots,$ and 12 into one cell ($x \ge 7$).

15. A certain type of flashlight is sold with the four batteries included. A random sample of 150 flashlights is obtained, and the number of defective batteries in each is determined, resulting in the following data:

Number Defective    0    1    2    3    4
Frequency          26   51   47   16   10

Let $X$ be the number of defective batteries in a randomly selected flashlight. Test the null hypothesis that the distribution of $X$ is Bin(4, $\theta$). That is, with $p_i$ = P($i$ defectives), test

$$H_0: p_i = \binom{4}{i} \theta^i (1 - \theta)^{4-i}, \quad i = 0, 1, 2, 3, 4$$

[Hint: To obtain the mle of $\theta$, write the likelihood (the function to be maximized) as $\theta^u (1 - \theta)^v$, where the exponents $u$ and $v$ are linear functions of the cell counts. Then take the natural log, differentiate with respect to $\theta$, equate the result to 0, and solve for $\hat{\theta}$.]

16. In a genetics experiment, investigators looked at 300 chromosomes of a particular type and counted the number of sister-chromatid exchanges on each (“On the Nature of Sister-Chromatid Exchanges in 5-Bromodeoxyuridine-Substituted Chromosomes,” Genetics, 1979: 1251–1264).
A Poisson model was hypothesized for the distribution of the number of exchanges. Test the fit of a Poisson distribution to the data by first estimating $\lambda$ and then combining the counts for $x = 8$ and $x = 9$ into one cell.

$x$ = Number of Exchanges    0    1    2    3    4    5    6    7    8    9
Observed Counts              6   24   42   59   62   44   41   14    6    2

17. An article in Annals of Mathematical Statistics reports the following data on the number of borers in each of 120 groups of borers. Does the Poisson pmf provide a plausible model for the distribution of the number of borers in a group? [Hint: Add the frequencies for 7, 8, . . ., 12 to establish a single category “≥ 7.”]

13.3 Two-Way Contingency Tables

                   Sex Combination
Male Genotype    M/M    M/F    F/F
      1           35     80     39
      2           41     84     45
      3           33     87     31
      4            8     26      8
      5            5     11      6
      6           30     65     20

29. Each individual in a random sample of high school and college students was cross-classified with respect to both political views and marijuana usage, resulting in the data displayed in the accompanying two-way table (“Attitudes About Marijuana and Political Views,” Psych. Rep., 1973: 1,051–1,054). Does the data support the hypothesis that political views and marijuana usage level are independent within the population? Test the appropriate hypotheses using level of significance .01.

30. Show that the chi-squared statistic for the test of independence can be written in the form

$$\chi^2 = \left( \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{N_{ij}^2}{\hat{E}_{ij}} \right) - n$$

Why is this formula more efficient computationally than the defining formula for $\chi^2$?

31. Suppose that in Exercise 29 each student had been categorized with respect to political views, marijuana usage, and religious preference, with the categories of this latter factor being Protestant, Catholic, and other. The data could be displayed in three different two-way tables, one corresponding to each category of the third factor. With $p_{ijk}$ = P(political category $i$, marijuana category $j$, and religious category $k$), the null hypothesis of independence of all three factors states that $p_{ijk} = p_{i \cdot \cdot} \, p_{\cdot j \cdot} \, p_{\cdot \cdot k}$. Let $n_{ijk}$ denote the observed frequency in cell $(i, j, k)$. Show how to estimate the expected cell counts assuming that $H_0$ is true ($\hat{e}_{ijk} = n \hat{p}_{ijk}$, so the $\hat{p}_{ijk}$’s must be determined). Then use the general rule of thumb to determine the number of degrees of freedom for the chi-squared statistic.

32. Suppose that in a particular state consisting of four distinct regions, a random sample of $n_k$ voters is obtained from the $k$th region for $k = 1, 2, 3, 4$. Each voter is then classified according to which candidate (1, 2, or 3) he or she prefers and according to voter registration (1 = Dem., 2 = Rep., 3 = Indep.). Let $p_{ijk}$ denote the proportion of voters in region $k$ who belong in candidate category $i$ and registration category $j$. The null hypothesis of homogeneous regions is $H_0: p_{ij1} = p_{ij2} = p_{ij3} = p_{ij4}$ for all $i, j$ (i.e., the proportion within each candidate/registration combination is the same for all four regions). Assuming that $H_0$ is true, determine $\hat{p}_{ijk}$ and $\hat{e}_{ijk}$ as functions of the observed $n_{ijk}$’s, and use the general rule of thumb to obtain the number of degrees of freedom for the chi-squared test.

33. Consider the accompanying 2 × 3 table displaying the sample proportions that fell in the various combinations of categories (e.g., 13% of those in the sample were in the first category of both factors).
a. Suppose the sample consisted of $n = 100$ people. Use the chi-squared test for independence with significance level .10.
b. Repeat part (a) assuming that the sample size was $n = 1000$.
c. What is the smallest sample size $n$ for which these observed proportions would result in rejection of the independence hypothesis?

34. Use logistic regression to test the relationship between leaf removal and fruit growth in Exercise 25. Compare the P-value with what was found in Exercise 25. (Remember that $\chi^2_1 = z^2$.)
Explain why you expected the logistic regression to give a smaller P-value.

35. A random sample of 100 faculty at a university gives the results shown below for professorial rank versus gender.
a. Test for a relationship at the 5% level using a chi-squared statistic.
b. Test for a relationship at the 5% level using logistic regression.
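
As a rough illustration of the Section 13.1 exercises, the sketch below computes the chi-squared goodness-of-fit statistic for the seasonal homicide counts of Exercise 7, once from the defining formula and once from the alternative expression of Exercise 10. The function names, the variable names, and the hard-coded table value 11.345 (the upper .01 critical value of the chi-squared distribution with 3 degrees of freedom) are assumptions made for this illustration and are not part of the exercise text.

```python
# Sketch: chi-squared goodness-of-fit statistic with fully specified
# category probabilities (hypothetical helper names, standard library only).


def chi2_defining(counts, probs):
    """Sum of (observed - expected)^2 / expected, with expected = n * p_i0."""
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, probs))


def chi2_shortcut(counts, probs):
    """Alternative form from Exercise 10: sum of N_i^2 / (n * p_i0), minus n."""
    n = sum(counts)
    return sum(obs ** 2 / (n * p) for obs, p in zip(counts, probs)) - n


# Exercise 7 data: homicides by season (winter, spring, summer, fall).
counts = [328, 334, 372, 327]
# H0: equal proportions, so p_i0 = 1/k for each of the k = 4 seasons.
probs = [1 / len(counts)] * len(counts)

stat_defining = chi2_defining(counts, probs)
stat_shortcut = chi2_shortcut(counts, probs)
df = len(counts) - 1  # k - 1, because no parameters are estimated

# Assumed table value: upper .01 critical value of chi-squared with 3 df.
CRITICAL_01_DF3 = 11.345
reject_h0 = stat_defining >= CRITICAL_01_DF3

print(f"defining formula: {stat_defining:.4f}")
print(f"shortcut formula: {stat_shortcut:.4f}")
print(f"df = {df}, reject H0 at .01: {reject_h0}")
```

Both functions should return the same value, which gives a quick check on a hand calculation. The same two helpers can be reused for any exercise in Section 13.1 by replacing `counts` and `probs`.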