Exercise Set 10
STAT111
Sondre Hølleland
Auditorium π 4
25 April 2016
Exercises
• Section 13.1: 7, 10
• Section 13.2: 12, 13
• Section 13.3: 29
The exercise text follows below.
13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified
plant ovary). The article “A Genetic and Biochemical Study on Pericarp Pigments in a Cross Between Two Cultivars of Grain Sorghum, Sorghum Bicolor” (Heredity, 1976: 413–416) reports on an experiment that involved an initial cross between CK60 sorghum (an American variety with white seeds) and Abu Taima (an Ethiopian variety with yellow seeds) to produce plants with red seeds and then a self-cross of the red-seeded plants. According to genetic theory, this F2 cross should produce plants with red, yellow, or white seeds in the ratio 9:3:4. The data from the experiment follows; does the data confirm or contradict the genetic theory? Test at level .05 using the P-value approach.

Seed Color           Red   Yellow   White
Observed Frequency   195   73       100

7. Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. The author of the article “Is There a Season for Homicide?” (Criminology, 1988: 287–296) classified 1361 homicides according to season, resulting in the accompanying data. Test the null hypothesis of equal proportions using $\alpha = .01$ by using the chi-squared table to say as much as possible about the P-value.

Season      Winter   Spring   Summer   Fall
Frequency   328      334      372      327

8. The article “Psychiatric and Alcoholic Admissions Do Not Occur Disproportionately Close to Patients’ Birthdays” (Psych. Rep., 1992: 944–946) focuses on the existence of any relationship between date of patient admission for treatment of alcoholism and patient’s birthday. Assuming a 365-day year (i.e., excluding leap year), in the absence of any relation, a patient’s admission date is equally likely to be any one of the 365 possible days. The investigators established four different admission categories: (1) within 7 days of birthday, (2) between 8 and 30 days, inclusive, from the birthday, (3) between 31 and 90 days, inclusive, from the birthday, and (4) more than 90 days from the birthday. A sample of 200 patients gave observed frequencies of 11, 24, 69, and 96 for categories 1, 2, 3, and 4, respectively. State and test the relevant hypotheses using a significance level of .01.

9. The response time of a computer system to a request for a certain type of information is hypothesized to have an exponential distribution with parameter $\lambda = 1$ [so if $X$ = response time, the pdf of $X$ under $H_0$ is $f_0(x) = e^{-x}$ for $x \ge 0$].
a. If you had observed $X_1, X_2, \ldots, X_n$ and wanted to use the chi-squared test with five class intervals having equal probability under $H_0$, what would be the resulting class intervals?
b. Carry out the chi-squared test using the following data resulting from a random sample of 40 response times:

 .10   .99  1.14  1.26  3.24   .12   .26   .80
 .79  1.16  1.76   .41   .59   .27  2.22   .66
 .71  2.21   .68   .43   .11   .46   .69   .38
 .91   .55   .81  2.51  2.77   .16  1.11   .02
2.13   .19  1.21  1.13  2.93  2.14   .34   .44

10. a. Show that another expression for the chi-squared statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{N_i^2}{n p_{i0}} - n$$

Why is it more efficient to compute $\chi^2$ using this formula?
b. When the null hypothesis is $H_0$: $p_1 = p_2 = \cdots = p_k = 1/k$ (i.e., $p_{i0} = 1/k$ for all $i$), how does the formula of part (a) simplify? Use the simplified expression to calculate $\chi^2$ for the pigeon/direction data in Exercise 4.
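As a numerical check on the identity in Exercise 10, the following sketch evaluates the statistic for the seasonal homicide counts of Exercise 7 (328, 334, 372, 327; n = 1361) under equal proportions in three ways: the defining sum, the form from part (a), and one possible equal-probability simplification. The function names are illustrative and do not come from the textbook.

```python
def chi2_defining(observed, probs):
    """Sum of (N_i - n*p_i0)^2 / (n*p_i0) over all cells."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def chi2_shortcut(observed, probs):
    """Sum of N_i^2 / (n*p_i0), minus n (the form in Exercise 10a)."""
    n = sum(observed)
    return sum(o * o / (n * p) for o, p in zip(observed, probs)) - n


def chi2_equal_cells(observed):
    """Special case p_i0 = 1/k: (k/n) * (sum of N_i^2), minus n."""
    n, k = sum(observed), len(observed)
    return k / n * sum(o * o for o in observed) - n


homicides = [328, 334, 372, 327]  # winter, spring, summer, fall (Exercise 7)
equal = [1 / 4] * 4

stat_def = chi2_defining(homicides, equal)
stat_short = chi2_shortcut(homicides, equal)
stat_equal = chi2_equal_cells(homicides)
print(round(stat_def, 3), round(stat_short, 3), round(stat_equal, 3))
```

All three values agree at roughly 4.03 on 3 degrees of freedom. The shortcut needs only the squared counts, with no per-cell deviations, which is the kind of saving the exercise asks about.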
11. a. Having obtained a random sample from a population, you wish to use a chi-squared test to decide whether the population distribution is standard normal. If you base the test on six class intervals having equal probability under $H_0$, what should the class intervals be?
b. If you wish to use a chi-squared test to test $H_0$: the population distribution is normal with $\mu = .5$, $\sigma = .002$ and the test is to be based on six equiprobable (under $H_0$) class intervals, what should these intervals be?
c. Use the chi-squared test with the intervals of part (b) to decide, based on the following 45 bolt diameters, whether bolt diameter is a normally distributed variable with $\mu = .5$ in., $\sigma = .002$ in.

.4974  .4994  .5017  .4972  .4990  .4992  .5021  .5006  .4976
.5010  .4984  .5047  .4974  .5007  .4959  .4987  .4991  .4997
.4967  .5069  .5008  .4975  .5015  .4968  .5014  .4993  .5028
.4977  .5000  .4998  .5012  .5008  .5013  .4975  .4961  .4967
.5000  .5056  .4993  .5000  .5013  .4987  .4977  .5008  .4991
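Exercises 9(a), 11(a), and 11(b) all ask for class intervals with equal probability under the null distribution. A minimal sketch of one way to obtain the interior boundaries is to invert the null cdf at i/k for i = 1, ..., k − 1. The helper names below are hypothetical, and the standard-library `statistics.NormalDist` is used for the normal quantiles.

```python
import math
from statistics import NormalDist


def exponential_cutpoints(k, rate=1.0):
    """Interior boundaries of k equal-probability cells for Exp(rate)."""
    # Solve 1 - exp(-rate * x) = i/k for x.
    return [-math.log(1 - i / k) / rate for i in range(1, k)]


def normal_cutpoints(k, mu=0.0, sigma=1.0):
    """Interior boundaries of k equal-probability cells for N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf(i / k) for i in range(1, k)]


exp_cuts = exponential_cutpoints(5)           # Exercise 9(a)
z_cuts = normal_cutpoints(6)                  # Exercise 11(a)
bolt_cuts = normal_cutpoints(6, 0.5, 0.002)   # Exercise 11(b)

print([round(c, 4) for c in exp_cuts])
print([round(c, 4) for c in z_cuts])
print([round(c, 6) for c in bolt_cuts])
```

The k − 1 boundaries split the support into k cells, the first and last of which are unbounded (or, for the exponential case, the first starts at 0). With equal cell probabilities every expected count is n/k.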
Chapter 13: Goodness-of-Fit Tests and Categorical Data Analysis
Exercises Section 13.2 (12–22)
12. Consider a large population of families in which
each family has exactly three children. If the
genders of the three children in any family are
independent of one another, the number of male
children in a randomly selected family will have
a binomial distribution based on three trials.
a. Suppose a random sample of 160 families
yields the following results. Test the relevant
hypotheses by proceeding as in Example 13.5.

Number of Male Children   0    1    2    3
Frequency                 14   66   64   16

b. Suppose a random sample of families in a
nonhuman population resulted in observed
frequencies of 15, 20, 12, and 3, respectively.
Would the chi-squared test be based on the
same number of degrees of freedom as the test
in part (a)? Explain.
13. A study of sterility in the fruit fly (“Hybrid Dysgenesis in Drosophila melanogaster: The Biology of Female and Male Sterility,” Genetics, 1979: 161–174) reports the following data on the number of ovaries developed for each female fly in a sample of size 1,388. One model for unilateral sterility states that each ovary develops with some probability $p$ independently of the other ovary. Test the fit of this model using $\chi^2$.

x = Number of Ovaries Developed   0      1     2
Observed Count                    1212   118   58
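Exercises 12(a) and 13 both fit a binomial model whose success probability must be estimated from the data. The sketch below is one way to organize that computation, assuming the maximum likelihood estimate (total successes divided by total trials) and losing one degree of freedom for the estimated parameter. The function `binomial_fit` is a hypothetical helper, not something defined in the textbook.

```python
import math


def binomial_fit(counts):
    """Chi-squared fit of Bin(m, p) with p estimated by maximum likelihood.

    counts[x] is the number of sampled units with x successes, x = 0..m.
    Returns (p_hat, expected counts, chi-squared statistic, degrees of freedom).
    """
    m = len(counts) - 1
    n = sum(counts)
    p_hat = sum(x * c for x, c in enumerate(counts)) / (m * n)
    expected = [
        n * math.comb(m, x) * p_hat**x * (1 - p_hat) ** (m - x)
        for x in range(m + 1)
    ]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = len(counts) - 1 - 1  # cells - 1 - one estimated parameter
    return p_hat, expected, chi2, df


families = binomial_fit([14, 66, 64, 16])  # Exercise 12(a): m = 3 children
ovaries = binomial_fit([1212, 118, 58])    # Exercise 13: m = 2 ovaries

for label, (p_hat, expected, chi2, df) in [("12(a)", families), ("13", ovaries)]:
    print(label, round(p_hat, 4), [round(e, 2) for e in expected], round(chi2, 2), df)
```

The sketch does not pool cells with small estimated expected counts, so it should not be applied blindly to data such as those in Exercise 12(b), where pooling would change the number of cells.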
14. The article “Feeding Ecology of the Red-Eyed Vireo and Associated Foliage-Gleaning Birds” (Ecol. Monogr., 1971: 129–152) presents the accompanying data on the variable $X$ = the number of hops before the first flight and preceded by a flight. The author then proposed and fit a geometric probability distribution [$p(x) = P(X = x) = p^{x-1} \cdot q$ for $x = 1, 2, \ldots$, where $q = 1 - p$] to the data. The total sample size was $n = 130$.

x                            1    2    3    4   5   6   7   8   9   10   11   12
Number of Times x Observed   48   31   20   9   6   5   4   2   1   1    2    1

a. The likelihood is $(p^{x_1 - 1} \cdot q) \cdots (p^{x_n - 1} \cdot q) = p^{\sum x_i - n} \cdot q^n$. Show that the mle of $p$ is given by $\hat{p} = (\sum x_i - n) / \sum x_i$, and compute $\hat{p}$ for the given data.
b. Estimate the expected cell counts using $\hat{p}$ of part (a) [expected cell counts $= n \cdot \hat{p}^{x-1} \cdot \hat{q}$ for $x = 1, 2, \ldots$], and test the fit of the model using a $\chi^2$ test by combining the counts for $x = 7, 8, \ldots$, and 12 into one cell ($x \ge 7$).
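The computation described in Exercise 14 can be sketched as follows. The sketch assumes the pooled cell is assigned the full upper-tail probability P(X ≥ 7) = p⁶ rather than a tail truncated at 12; that choice, and the variable names, are assumptions made for illustration.

```python
counts = {1: 48, 2: 31, 3: 20, 4: 9, 5: 6, 6: 5,
          7: 4, 8: 2, 9: 1, 10: 1, 11: 2, 12: 1}

n = sum(counts.values())                       # number of observations
total = sum(x * c for x, c in counts.items())  # sum of the x_i
p_hat = (total - n) / total                    # mle from part (a)
q_hat = 1 - p_hat

# Cells x = 1..6, then one pooled cell for x >= 7.
observed = [counts[x] for x in range(1, 7)]
observed.append(sum(c for x, c in counts.items() if x >= 7))
expected = [n * p_hat ** (x - 1) * q_hat for x in range(1, 7)]
expected.append(n * p_hat**6)                  # assumed: P(X >= 7) = p^6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                     # one estimated parameter
print(n, total, round(p_hat, 4), round(chi2, 3), df)
```

With seven cells and one estimated parameter the statistic is referred to a chi-squared distribution with 5 degrees of freedom.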
15. A certain type of flashlight is sold with the four batteries included. A random sample of 150 flashlights is obtained, and the number of defective batteries in each is determined, resulting in the following data:

Number Defective   0    1    2    3    4
Frequency          26   51   47   16   10

Let $X$ be the number of defective batteries in a randomly selected flashlight. Test the null hypothesis that the distribution of $X$ is Bin(4, $\theta$). That is, with $p_i = P(i \text{ defectives})$, test

$$H_0: p_i = \binom{4}{i} \theta^i (1 - \theta)^{4-i}, \quad i = 0, 1, 2, 3, 4$$

[Hint: To obtain the mle of $\theta$, write the likelihood (the function to be maximized) as $\theta^u (1 - \theta)^v$, where the exponents $u$ and $v$ are linear functions of the cell counts. Then take the natural log, differentiate with respect to $\theta$, equate the result to 0, and solve for $\hat{\theta}$.]
16. In a genetics experiment, investigators looked at 300 chromosomes of a particular type and counted the number of sister-chromatid exchanges on each (“On the Nature of Sister-Chromatid Exchanges in 5-Bromodeoxyuridine-Substituted Chromosomes,” Genetics, 1979: 1251–1264). A Poisson model was hypothesized for the distribution of the number of exchanges. Test the fit of a Poisson distribution to the data by first estimating $\lambda$ and then combining the counts for $x = 8$ and $x = 9$ into one cell.

x = Number of Exchanges   0   1    2    3    4    5    6    7    8   9
Observed Counts           6   24   42   59   62   44   41   14   6   2
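A minimal sketch of the procedure in Exercise 16, assuming λ is estimated by the sample mean and that the pooled last cell is given the Poisson upper-tail probability P(X ≥ 8). Both choices and the helper `poisson_pmf` are assumptions for illustration, not the textbook's stated method.

```python
import math

counts = [6, 24, 42, 59, 62, 44, 41, 14, 6, 2]  # x = 0, 1, ..., 9

n = sum(counts)
lam = sum(x * c for x, c in enumerate(counts)) / n  # sample mean as estimate


def poisson_pmf(x, lam):
    """Poisson probability mass function at x."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# Cells x = 0..7, then one pooled cell for x >= 8.
observed = counts[:8] + [sum(counts[8:])]
probs = [poisson_pmf(x, lam) for x in range(8)]
probs.append(1 - sum(probs))                        # assumed: P(X >= 8)
expected = [n * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                          # one estimated parameter
print(n, round(lam, 4), [round(e, 2) for e in expected], round(chi2, 2), df)
```

Nine cells and one estimated parameter give 7 degrees of freedom, so under these assumptions the statistic would be compared with a tabulated value such as $\chi^2_{.05,7} = 14.067$.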
17. An article in Annals of Mathematical Statistics
reports the following data on the number of
borers in each of 120 groups of borers. Does
the Poisson pmf provide a plausible model for
the distribution of the number of borers in a
group? [Hint: Add the frequencies for 7, 8, . . ., 12 to establish a single category “≥ 7.”]
13.3 Two-Way Contingency Tables
                 Sex Combination
Male Genotype    M/M   M/F   F/F
1                35    80    39
2                41    84    45
3                33    87    31
4                8     26    8
5                5     11    6
6                30    65    20

29. Each individual in a random sample of high school and college students was cross-classified with respect to both political views and marijuana usage, resulting in the data displayed in the accompanying two-way table (“Attitudes About Marijuana and Political Views,” Psych. Rep., 1973: 1051–1054). Does the data support the hypothesis that political views and marijuana usage level are independent within the population? Test the appropriate hypotheses using level of significance .01.

30. Show that the chi-squared statistic for the test of independence can be written in the form

$$\chi^2 = \left( \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{N_{ij}^2}{\hat{E}_{ij}} \right) - n$$

Why is this formula more efficient computationally than the defining formula for $\chi^2$?

31. Suppose that in Exercise 29 each student had been categorized with respect to political views, marijuana usage, and religious preference, with the categories of this latter factor being Protestant, Catholic, and other. The data could be displayed in three different two-way tables, one corresponding to each category of the third factor. With $p_{ijk} = P(\text{political category } i, \text{ marijuana category } j, \text{ and religious category } k)$, the null hypothesis of independence of all three factors states that $p_{ijk} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}\, p_{\cdot\cdot k}$. Let $n_{ijk}$ denote the observed frequency in cell $(i, j, k)$. Show how to estimate the expected cell counts assuming that $H_0$ is true ($\hat{e}_{ijk} = n \hat{p}_{ijk}$, so the $\hat{p}_{ijk}$’s must be determined). Then use the general rule of thumb to determine the number of degrees of freedom for the chi-squared statistic.

32. Suppose that in a particular state consisting of four distinct regions, a random sample of $n_k$ voters is obtained from the $k$th region for $k = 1, 2, 3, 4$. Each voter is then classified according to which candidate (1, 2, or 3) he or she prefers and according to voter registration (1 = Dem., 2 = Rep., 3 = Indep.). Let $p_{ijk}$ denote the proportion of voters in region $k$ who belong in candidate category $i$ and registration category $j$. The null hypothesis of homogeneous regions is $H_0$: $p_{ij1} = p_{ij2} = p_{ij3} = p_{ij4}$ for all $i, j$ (i.e., the proportion within each candidate/registration combination is the same for all four regions). Assuming that $H_0$ is true, determine $\hat{p}_{ijk}$ and $\hat{e}_{ijk}$ as functions of the observed $n_{ijk}$’s, and use the general rule of thumb to obtain the number of degrees of freedom for the chi-squared test.

33. Consider the accompanying 2 × 3 table displaying the sample proportions that fell in the various combinations of categories (e.g., 13% of those in the sample were in the first category of both factors).
a. Suppose the sample consisted of $n = 100$ people. Use the chi-squared test for independence with significance level .10.
b. Repeat part (a) assuming that the sample size was $n = 1000$.
c. What is the smallest sample size $n$ for which these observed proportions would result in rejection of the independence hypothesis?
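The identity in Exercise 30 can be checked numerically on the Sex Combination by Male Genotype counts given under the 13.3 heading. The sketch assumes that table is read with the six genotypes as rows and M/M, M/F, F/F as columns, and it uses the usual estimated expected counts (row total × column total / n). The variable names are illustrative.

```python
# Assumed layout: rows = male genotype 1..6; columns = M/M, M/F, F/F.
table = [
    [35, 80, 39],
    [41, 84, 45],
    [33, 87, 31],
    [8, 26, 8],
    [5, 11, 6],
    [30, 65, 20],
]

n = sum(map(sum, table))
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Estimated expected counts under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2_defining = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
chi2_shortcut = sum(
    o * o / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
) - n

df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(n, round(chi2_defining, 3), round(chi2_shortcut, 3), df)
```

The two forms agree up to floating-point rounding. The second avoids forming a deviation for every cell, which matters mostly for hand computation.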
34. Use logistic regression to test the relationship
between leaf removal and fruit growth in Exercise 25. Compare the P-value with what was
found in Exercise 25. (Remember that $\chi^2_1 = z^2$.)
Explain why you expected the logistic regression
to give a smaller P-value.
35. A random sample of 100 faculty at a university
gives the results shown below for professorial
rank versus gender.
a. Test for a relationship at the 5% level using a
chi-squared statistic.
b. Test for a relationship at the 5% level using
logistic regression.