University of California, Berkeley, Statistics 135: Concepts of Statistics
Michael Lugo, Summer 2011
Midterm exam solutions
1. [14 points: 3 + 5 + 2 + 4] Consider a population consisting of five values: 1, 2, 4, 4, 7.
(a) Find the exact distribution of the sample mean X̄ for a simple random sample (without replacement) of size 3 from this population.
Answer. Writing 4 and 4′ for the two copies of 4 in the population, the possible samples are
(1, 2, 4), (1, 2, 4′), (1, 2, 7), (1, 4, 4′), (1, 4, 7),
(1, 4′, 7), (2, 4, 4′), (2, 4, 7), (2, 4′, 7), (4, 4′, 7).
The samples have sum 7, 7, 10, 9, 12, 12, 10, 13, 13, 15. The distribution of the sample
mean is therefore
P (X̄ = 7/3) = P (X̄ = 10/3) = P (X̄ = 4) = P (X̄ = 13/3) = .2, P (X̄ = 3) = P (X̄ = 5) = .1.
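The enumeration in (a) can be checked mechanically. The following is a minimal sketch, not part of the original solutions; the variable names are illustrative. It lists all 10 samples and tabulates the exact distribution of the sample mean with rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 4, 4, 7]
n = 3

# combinations() treats the two 4s as distinct items, so each of the
# 10 equally likely samples appears exactly once.
samples = list(combinations(population, n))

# Count how often each value of the sample mean occurs.
mean_counts = Counter(Fraction(sum(s), n) for s in samples)
mean_dist = {m: Fraction(c, len(samples)) for m, c in sorted(mean_counts.items())}

for m, p in mean_dist.items():
    print(f"P(mean = {m}) = {p}")
```

Using Fraction keeps values such as 7/3 and 13/3 exact, so the probabilities can be compared with the answer above without rounding.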
(b) What are the expectation and the standard deviation of X̄?
Answer. Two solutions are possible. The brute-force solution is to find
E(X̄) = (.2)(7/3) + (.2)(10/3) + (.2)(4) + (.2)(13/3) + (.1)(3) + (.1)(5) = 3.6
and
$E(\bar X^2) = (.2)(7/3)^2 + (.2)(10/3)^2 + (.2)(4)^2 + (.2)(13/3)^2 + (.1)(3)^2 + (.1)(5)^2 = 13.66667$
so the standard deviation is
\[ \mathrm{SD}(\bar X) = \sqrt{E(\bar X^2) - E(\bar X)^2} = \sqrt{13.66667 - 3.6^2} = 0.8406. \]
Alternatively, we can compute $E(X) = (1 + 2 + 4 + 4 + 7)/5 = 18/5 = 3.6$ and $E(X^2) = (1^2 + 2^2 + 4^2 + 4^2 + 7^2)/5 = 17.2$; so $\mathrm{Var}(X) = 17.2 - 3.6^2 = 4.24$. Then $E(\bar X) = E(X)$ (by linearity, or by the fact that $\bar X$ is an unbiased estimator of $E(X)$). And
\[ \mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n} \cdot \frac{N - n}{N - 1} \]
where n is the sample size and N is the population size. So n = 3, N = 5 giving $\mathrm{Var}(\bar X) = \mathrm{Var}(X)/6 = .706667$; $\mathrm{SD}(\bar X) = \sqrt{\mathrm{Var}(\bar X)} = 0.8406$.
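As a cross-check on (b), here is a small sketch (variable names are illustrative, not from the original) that computes the variance of the sample mean both ways: by brute force over the 10 samples, and from the population variance with the finite population correction.

```python
from itertools import combinations
from math import sqrt

population = [1, 2, 4, 4, 7]
N, n = len(population), 3

# Brute force: average over all equally likely samples of size n.
means = [sum(s) / n for s in combinations(population, n)]
e_mean = sum(means) / len(means)
var_mean_brute = sum(m * m for m in means) / len(means) - e_mean ** 2

# Formula: Var(X)/n times the finite population correction (N - n)/(N - 1).
mu = sum(population) / N
var_pop = sum(x * x for x in population) / N - mu ** 2
var_mean_formula = (var_pop / n) * (N - n) / (N - 1)

print(e_mean, sqrt(var_mean_brute), sqrt(var_mean_formula))
```

Both routes should agree up to floating-point rounding, which is the point of the two solutions given above.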
(c) What is the sampling distribution of the sample median of a simple random sample
of size 3?
Answer. The sample median of three numbers is just the middle number. Looking at
our list of samples, we see three 2s and seven 4s; so P (η̂ = 2) = .3, P (η̂ = 4) = .7.
(d) Is the sample median an unbiased estimator of the population median? Why or why
not?
Answer. The sample median has expectation (2)(0.3)+(4)(0.7) = 3.4 but the population
median is 4. Since these are not equal the sample median is not an unbiased estimator of
the population median.
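Parts (c) and (d) can be verified the same way. This is a minimal sketch with illustrative names; it tabulates the sample median over all 10 samples and compares its expectation with the population median.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

population = [1, 2, 4, 4, 7]
samples = list(combinations(population, 3))

# Sampling distribution of the median of a size-3 sample.
median_counts = Counter(median(s) for s in samples)
median_dist = {m: Fraction(c, len(samples)) for m, c in median_counts.items()}

# Bias = E(sample median) - population median.
expected_median = sum(m * p for m, p in median_dist.items())
bias = expected_median - median(population)

print(median_dist, expected_median, bias)
```

A nonzero bias confirms that the sample median is not unbiased for the population median in this example.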
Comments on this question. A lot of people didn’t bother to write out the exact
distribution in part (a) but went straight on to computing the mean and variance. Also, the
variance and the standard deviation are not the same thing; this gave people trouble in (b).
14 points possible. Mean was 10.7, SD 3.5. Quartiles 9, 12, 14. 12 perfect scores out of
39; 34 got at least half of the possible points.
2. [14 points: 2 + 3 + 3 + 6] There are two types of people in the world, Berkeley students
and Stanford students. We would like to estimate the average difference in intelligence of
Berkeley students and Stanford students, by administering an intelligence test. The mean
intelligence of Berkeley students is µB and the mean intelligence of Stanford students is
µS . We believe from previous experiments that the standard deviation of the intelligence of
Berkeley students is σ and the standard deviation of the intelligence of Stanford students is
2σ.
Now, Stanford students tend to be richer than Berkeley students, so we have to pay them
more to take our test. Assume that a Berkeley student will be willing to take our intelligence
test for ten dollars and a Stanford student will be willing to take our intelligence test for
forty dollars. We have ten thousand dollars to spend on paying students to be tested.
(a) Say we test nS Stanford students. How many Berkeley students can we afford to test?
(Call this number nB .)
Answer. If we test nS Stanford students we have 10000 − 40nS dollars left and can
afford to test 1000 − 4nS Berkeley students.
(b) Let B̄ be the mean score of the Berkeley students that we test and let S̄ be the mean
score of the Stanford students that we test. Show that B̄ − S̄ is an unbiased estimator of
µB − µS .
Answer. We know that B̄ is an unbiased estimator of the population mean of B, and
similarly for S̄ and S. So E(B̄ − S̄) = E(B̄) − E(S̄) = µB − µS .
(c) What is the variance of B̄ − S̄, as a function of nS ? (Ignore the finite population
correction.)
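Answer. The two samples are independent, with $\mathrm{Var}(\bar B) = \sigma^2/n_B$ and $\mathrm{Var}(\bar S) = (2\sigma)^2/n_S$, so
\[ \mathrm{Var}(\bar B - \bar S) = \mathrm{Var}(\bar B) + \mathrm{Var}(\bar S) = \frac{\sigma^2}{1000 - 4n_S} + \frac{4\sigma^2}{n_S}. \]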
(d) For which nS is this variance smallest? What is the corresponding value of nB ? What
is the minimal variance? (Ignore the fact that your formula might not give an integer.)
Answer. Differentiating with respect to $n_S$ gives
\[ \sigma^2 \left( \frac{4}{(1000 - 4n_S)^2} - \frac{4}{n_S^2} \right). \]
This is zero when $1000 - 4n_S = n_S$, i.e. when $n_S = 200$. Thus we should test 200 Stanford students and 200 Berkeley students. We get $\mathrm{Var}(\bar B - \bar S) = \sigma^2/200 + 4\sigma^2/200 = \sigma^2/40$.
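The calculus answer in (d) can be sanity-checked numerically. The sketch below is an illustration only; the function name is made up here, and the search is restricted to integer values of the number of Stanford students that leave at least one Berkeley student.

```python
def variance_over_sigma2(n_s):
    """Var(B-bar - S-bar) divided by sigma^2 when n_s Stanford students are tested."""
    n_b = 1000 - 4 * n_s  # Berkeley students affordable with the remaining budget
    return 1 / n_b + 4 / n_s

# Integer n_s from 1 to 249 keeps both group sizes positive.
best_n_s = min(range(1, 250), key=variance_over_sigma2)
best_n_b = 1000 - 4 * best_n_s

print(best_n_s, best_n_b, variance_over_sigma2(best_n_s))
```

A grid search is used instead of symbolic differentiation so that the check is independent of the derivative computed above.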
Comments on this question. Some people overthought (a) and (b), perhaps because
there was a lot of space to do them in. (d) is really a calculus problem.
14 points possible. Mean was 8.5, SD 4.0. Quartiles 5, 9, 11. 6 perfect scores out of 39;
26 got at least half the possible points.
3. [12 points: 2 + 2 + 2 + 2 + 4] On the following page are four Q-Q plots
with unlabeled axes. In each case $X_1, \ldots, X_{1000}$ are a sample from some distribution;
$Y_1, \ldots, Y_{1000}$ are a sample from some other distribution; and we have plotted the points
$(X_{(1)}, Y_{(1)}), (X_{(2)}, Y_{(2)}), \ldots, (X_{(1000)}, Y_{(1000)})$.
We write normal$(\mu, \sigma^2)$ for the normal distribution with mean $\mu$ and variance $\sigma^2$.
We write gamma$(\alpha, \lambda)$ for the gamma distribution with shape parameter $\alpha$ and rate
parameter $\lambda$.
For each pair circle the number of the plot that it corresponds to.
(a) X from normal(2,1) distribution, Y from a gamma(4,2) distribution
plot 1    plot 2    plot 3    plot 4
(b) X from continuous uniform(-1,1) distribution, Y from a triangular distribution, which
has density
\[ f(x) = \begin{cases} (1/\sqrt{2}) - x/2 & \text{if } 0 \le x \le \sqrt{2} \\ (1/\sqrt{2}) + x/2 & \text{if } -\sqrt{2} \le x \le 0 \\ 0 & \text{if } |x| > \sqrt{2}. \end{cases} \]
plot 1    plot 2    plot 3    plot 4
(c) X from exponential(1) distribution, Y from uniform(0,1) distribution
plot 1    plot 2    plot 3    plot 4
(d) X from normal(1,4) distribution, Y from normal(2,9) distribution.
plot 1    plot 2    plot 3    plot 4
You may explain your answers here. For grading purposes we will only take into account
answers corresponding to plots that you’ve identified incorrectly, so if you give correct answers
and incorrect or no explanation you’ll still get full credit.
Answers: 3, 1, 2, 4. There are multiple ways to see this. One solution is as follows: first,
the two distributions in (d) differ from each other by a linear function, so their probability
plot should be a straight line. This is plot 4. In (a), the gamma distribution has a shorter
left tail and longer right tail than the normal distribution; this right-skewness corresponds
to a convex probability plot like plot 3. Both of the distributions in (b) are symmetric, so
the probability plot should be symmetric; this is plot 1. This leaves only plot 2 for (c).
(e) One of the four plots on the following page is very close to being a straight line. What
is the equation of that straight line, in the form y = mx + b?
Answer: Plot 4, which corresponds to (d), is the closest to being a straight line. A
typical point on this line will have x-coordinate given by the pth quantile of normal(1, 4)
and y-coordinate the pth quantile of normal(2, 9); thus it will be $(1 + 2\Phi^{-1}(p),\ 2 + 3\Phi^{-1}(p))$.
The line therefore passes through (1, 2) and has slope 3/2, so it’s y = 1.5x + 0.5.
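The quantile argument in (e) can be checked with theoretical quantiles. This is a minimal sketch using Python's standard library; the chosen probabilities are arbitrary illustrative values.

```python
from statistics import NormalDist

# NormalDist takes the standard deviation, not the variance.
x_dist = NormalDist(mu=1, sigma=2)  # normal(1, 4)
y_dist = NormalDist(mu=2, sigma=3)  # normal(2, 9)

probs = [0.05, 0.25, 0.5, 0.75, 0.95]
points = [(x_dist.inv_cdf(p), y_dist.inv_cdf(p)) for p in probs]

# Vertical distance of each quantile pair from the line y = 1.5x + 0.5.
residuals = [y - (1.5 * x + 0.5) for x, y in points]
print(residuals)
```

The residuals should all be zero up to rounding. Repeating the check with slope 9/4 would leave clearly nonzero residuals, which is the common error mentioned in the comments.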
Comments. More people seemed to get (d) right than any other. A lot of people
confused (a) and (c), perhaps because if you interchange X and Y they do have quite similar-looking plots. In part (e), a surprising number of people answered y = x; this is only true for
Q-Q plots where both samples come from the same distribution. A common error in (e) was
to make the slope 9/4, the ratio of the variances, instead of 3/2, the ratio of the standard
deviations.
12 points possible. Mean was 5.9, SD 3.5. Quartiles 3, 5, 8. 4 perfect scores out of 39;
19 got at least half the possible points.
Rather surprisingly, the distribution of scores on this problem was very
close to being uniform. Also, the correlation of the scores on this problem with the scores on
the remainder of the exam was 0.02; that is, how you did on this problem is almost unrelated
to how you did on the rest of the exam. I suspect this is because this problem is largely
graphical and it was the only such problem.
Plots for question 3: four Q-Q plots, labeled plot 1, plot 2, plot 3, plot 4.
4. [12 points: 4 + 4 + 4] Suppose that X is an exponential random variable with rate θ,
which is unknown to us. That is, $f(x \mid \theta) = \theta e^{-\theta x}$.
The prior distribution of Θ is a gamma distribution with its parameters chosen to have
mean 2 and variance 1.
We then make four independent observations taken from the distribution of X; they are
1.2, 3.2, 0.8, 1.6.
(a) What are the parameters α, λ of the prior distribution?
Answer. The gamma(α, λ) distribution has mean $\alpha/\lambda$ and variance $\alpha/\lambda^2$. Thus we
have $2 = \alpha/\lambda$ and $1 = \alpha/\lambda^2$, so λ = 2, α = 4.
(b) What is the likelihood of the observed data, as a function of θ?
Answer. The likelihood is
\[ f(1.2, 3.2, 0.8, 1.6 \mid \theta) = (\theta e^{-1.2\theta})(\theta e^{-3.2\theta})(\theta e^{-0.8\theta})(\theta e^{-1.6\theta}) = \theta^4 e^{-6.8\theta}. \]
(c) What is the posterior density of θ? You may give the density explicitly or answer
with a well-known named distribution.
If you could not answer (a), use the values α = 3, λ = 8 (which are incorrect) in answering
this question.
Answer. The prior gamma(4, 2) distribution has density proportional to $\theta^{4-1} e^{-2\theta}$. The
posterior density is therefore proportional to
\[ \theta^{4-1} e^{-2\theta} \times \theta^4 e^{-6.8\theta} = \theta^{8-1} e^{-8.8\theta}. \]
Therefore the posterior density is the gamma(8, 8.8) density.
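The calculation in (a) through (c) follows the usual conjugate pattern: with a gamma(α, λ) prior on an exponential rate, the posterior adds the number of observations to α and their sum to λ. A minimal sketch, with a helper name chosen here for illustration:

```python
def gamma_posterior(alpha, lam, observations):
    """Posterior (shape, rate) for an exponential rate with a gamma(alpha, lam) prior."""
    return alpha + len(observations), lam + sum(observations)

# Part (a): solve mean = alpha/lam and variance = alpha/lam^2 for the prior.
prior_mean, prior_var = 2.0, 1.0
lam0 = prior_mean / prior_var
alpha0 = prior_mean * lam0

data = [1.2, 3.2, 0.8, 1.6]
alpha_post, lam_post = gamma_posterior(alpha0, lam0, data)
print(alpha0, lam0, alpha_post, lam_post)
```

Working with the (shape, rate) pair directly sidesteps the normalizing constants, which is where the comments below note that explicit computations got bogged down.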
Comments: A lot of people got (a) and (b) but couldn’t put them together to get (c),
or tried to write out the computations explicitly and got bogged down by the notation.
12 points possible. Mean was 7.2, SD 4.3. Quartiles 4, 9, 10.5. 7 perfect scores out of
39; 25 got at least half the possible points.
5. [19 points: 4 + 5 + 5 + 5] Let $X_1, X_2, \ldots, X_n$ be iid random variables with density $\alpha x^{\alpha-1}$ on the interval $0 \le x \le 1$, with α unknown.
(a) What is the method of moments estimate of α?
Answer. X has density $\alpha x^{\alpha-1}$ and so $E(X) = \int_0^1 \alpha x^{\alpha}\,dx = \alpha/(\alpha + 1)$. The MOM
estimator comes from solving $\bar X = \hat\alpha/(\hat\alpha + 1)$ for $\hat\alpha$. Solving gives $\hat\alpha = \bar X/(1 - \bar X)$.
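(b) What is the maximum likelihood estimator of α?
Answer. $\mathrm{lik}(\alpha) = \prod_i \alpha X_i^{\alpha-1}$. Taking logs gives
\[ l(\alpha) = \sum_i \left( \log\alpha + (\alpha - 1)\log X_i \right) \]
and differentiating gives
\[ l'(\alpha) = \sum_i \left( \frac{1}{\alpha} + \log X_i \right). \]
$\hat\alpha$ is the solution to $l'(\hat\alpha) = 0$; that is,
\[ \frac{n}{\hat\alpha} = -\sum_i \log X_i \]
and solving for $\hat\alpha$ gives
\[ \hat\alpha = \frac{-n}{\sum_i \log X_i} = \frac{-n}{\log \prod_i X_i}. \]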
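To see the two estimators from (a) and (b) in action, here is a rough simulation sketch. It is not part of the original solutions: the true value α = 3, the seed, and the sample size are arbitrary assumptions, and the function names are illustrative. It relies on the fact that if U is uniform on (0, 1] then U^(1/α) has CDF x^α, hence density αx^(α−1).

```python
import math
import random

def mom_estimate(xs):
    """Method of moments: solve xbar = a / (a + 1) for a."""
    xbar = sum(xs) / len(xs)
    return xbar / (1 - xbar)

def mle_estimate(xs):
    """Maximum likelihood: -n / sum(log x_i)."""
    return -len(xs) / sum(math.log(x) for x in xs)

rng = random.Random(0)   # fixed seed, chosen arbitrarily
true_alpha = 3.0         # assumed value for the simulation

# 1 - rng.random() lies in (0, 1], which avoids log(0).
sample = [(1 - rng.random()) ** (1 / true_alpha) for _ in range(20000)]

print(mom_estimate(sample), mle_estimate(sample))
```

With a sample this large both estimates should land near the assumed α, although they are different functions of the data and will not coincide exactly.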
(c) What is the asymptotic variance of the MLE?
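Answer. We have the formula
\[ I(\alpha) = -E\left[ \frac{\partial^2}{\partial\alpha^2} \log f(X \mid \alpha) \right] = -E\left[ \frac{\partial^2}{\partial\alpha^2} \left( \log\alpha + (\alpha - 1)\log X \right) \right]. \]
Differentiating gives $-E[-1/\alpha^2]$, or $1/\alpha^2$. The asymptotic variance is therefore $1/(nI(\alpha)) = \alpha^2/n$.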
(d) We make 100 observations from this distribution and compute the MLE α̂ = 3.08.
Give an approximate 90 percent confidence interval for α.
Answer. From (c) we can estimate the standard error of $\hat\alpha$ as
\[ s_{\hat\alpha} = \frac{\hat\alpha}{\sqrt{n}}. \]
With $\hat\alpha = 3.08$ and $n = 100$ we get $s_{\hat\alpha} = 3.08/10 = 0.308$. The sampling distribution of
the MLE is approximately normal and so the confidence interval is $\hat\alpha \pm s_{\hat\alpha}\Phi^{-1}(1 - 0.10/2)$.
$\Phi^{-1}(0.95) = 1.64$ and so we get the interval [2.57, 3.59].
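The interval in (d) can be reproduced in a few lines. This is a sketch with illustrative variable names; it uses the standard library's normal quantile function rather than a table, so the endpoints differ from the table-based answer only in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

alpha_hat, n, level = 3.08, 100, 0.90

# Estimated standard error from the asymptotic variance alpha^2 / n.
std_err = alpha_hat / sqrt(n)

# Two-sided normal quantile for the requested confidence level.
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

lower, upper = alpha_hat - z * std_err, alpha_hat + z * std_err
print(round(lower, 2), round(upper, 2))
```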
Comments. A lot of people wrote something like
\[ \hat\alpha = \frac{\frac{1}{n}\sum_{i=1}^n X_i}{1 - \frac{1}{n}\sum_{i=1}^n X_i} \]
for (a); this is correct but not necessary. In (c), note that the definition of Fisher information
refers to a single draw X from the distribution; in particular n should not appear in your
formula for $I(\alpha)$. Many people got $n/\alpha^2$ for $I(\alpha)$ (computing the Fisher information of n
draws) and got the variance $\alpha^2/n^2$.
19 points possible. Mean was 14.6, SD 4.4. Quartiles 12.5, 16, 18. 7 perfect scores out
of 39; 34 got at least half the possible points. Probably the easiest problem on the exam; to
be honest I didn’t expect this going in.
6. [21 points: 7 + 6 + 4 + 4] The first sixty digits of π, after the decimal point, are
14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944
The number of times that this sequence contains each digit is:
3 0s, 5 1s, 6 2s, 8 3s, 7 4s, 6 5s, 4 6s, 5 7s, 6 8s, 10 9s.
Now, we know that these digits are not chosen at random. But it is believed that π is a
normal number; this means that each of the ten digits 0, 1, 2, . . . , 9 is equally likely to occur
and the digits are independent. Test this hypothesis using the following test statistics.
(a) The sum of the digits. Give a p-value.
Answer. We want to test the null hypothesis that the numbers are chosen uniformly
at random against the alternative hypothesis that they are not. Under the null hypothesis
that the digits are chosen uniformly at random, their sum is the sum of 60 discrete uniform
(0,9) random variables. Call such a variable X. We have $E(X) = 9/2$ and $E(X^2) = (0^2 + 1^2 + \cdots + 9^2)/10 = 28.5$, so $\mathrm{Var}(X) = 8.25$. The sum therefore has expectation
$(9/2)(60) = 270$ and standard deviation $\sqrt{(8.25)(60)} = 22.2$.
The actual sum of the digits is (3)(0) + (5)(1) + (6)(2) + · · · + (10)(9) = 296.
This corresponds to a z-score of (296 − 270)/(22.2) = 1.17 and therefore a p-value (two-tailed) of 2(1 − Φ(1.17)) = 0.24, so we accept the hypothesis that π is a normal number.
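The arithmetic in (a) can be reproduced directly from the digit string. This is a minimal sketch with illustrative names; the digits are copied from the problem statement and the spaces are stripped in code to avoid transcription slips.

```python
from math import sqrt
from statistics import NormalDist

digits = (
    "14159 26535 89793 23846 26433 83279 "
    "50288 41971 69399 37510 58209 74944"
).replace(" ", "")
values = [int(d) for d in digits]
n = len(values)

# Null moments of a single uniform digit on 0..9.
mean_single = sum(range(10)) / 10
var_single = sum(k * k for k in range(10)) / 10 - mean_single ** 2

total = sum(values)
z = (total - n * mean_single) / sqrt(n * var_single)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

print(total, round(z, 2), round(p_two_tailed, 2))
```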
(b) The number of even digits (0, 2, 4, 6, 8). Give a p-value.
Answer. There are 3 + 6 + 7 + 4 + 6 = 26 even digits and 60 − 26 = 34 odd digits.
Under the null hypothesis, the distribution of the number of even digits is Bin(60, 1/2). The
probability of observing an imbalance at least as severe as the one we see is
\[ \frac{2}{2^{60}} \sum_{k=0}^{26} \binom{60}{k} \]
which is hard to compute with a calculator. It’s easier to compute
\[ 1 - \sum_{k=27}^{33} \binom{60}{k} \Big/ 2^{60} \]
which works out to 0.36629. Alternatively, use the normal approximation: the null distribution of the number of even digits is approximately normal with mean 30 and variance 15.
The probability that it’s not in the range [26.5, 33.5] is $2(1 - \Phi(3.5/\sqrt{15})) \approx 0.36616$.
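With a computer the exact binomial tail in (b) is easy to evaluate. The sketch below, with illustrative names, computes the two-tailed p-value both as twice the lower tail and as one minus the middle block, then compares it with the continuity-corrected normal approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

n, observed_even = 60, 26

# Twice the lower tail P(K <= 26) under Bin(60, 1/2), using exact integers.
lower_tail = sum(comb(n, k) for k in range(observed_even + 1))
p_exact = 2 * lower_tail / 2 ** n

# Equivalent form: one minus the probability of 27..33 even digits.
middle = sum(comb(n, k) for k in range(observed_even + 1, n - observed_even))
p_exact_alt = 1 - middle / 2 ** n

# Normal approximation with continuity correction (mean 30, variance 15).
p_normal = 2 * (1 - NormalDist().cdf(3.5 / sqrt(15)))

print(p_exact, p_exact_alt, p_normal)
```

The two exact forms agree because Bin(60, 1/2) is symmetric about 30, so the upper tail from 34 to 60 equals the lower tail from 0 to 26.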
(c) An appropriate test statistic that has the chi-squared distribution as its approximate
null distribution.
Give the best possible range for p that can be determined from the table provided. For
example you might write 0.01 < p < 0.025.
Answer. There are two choices here. One is the generalized likelihood ratio test, which
gives the test statistic
\[ 2 \sum_i O_i \log\frac{O_i}{E_i}. \]
Under the null hypothesis $E_i = 6$ for each i; the test statistic is then
\[ 2\left( 3\log\frac{3}{6} + 5\log\frac{5}{6} + \cdots + 10\log\frac{10}{6} \right) = 5.9285. \]
Alternatively, compute Pearson’s $\chi^2$
\[ \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(3 - 6)^2}{6} + \cdots + \frac{(10 - 6)^2}{6} = 6 \text{ (exactly)}. \]
In either case the null distribution is approximately $\chi^2_9$. The 10th and 90th percentiles of
this distribution are 4.17 and 14.68 (from tables) and so we have .1 ≤ p ≤ .9.
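Both statistics in (c) are short sums over the ten digit counts. A minimal sketch, with illustrative variable names and the counts taken from the problem statement:

```python
from math import log

# Observed counts of the digits 0 through 9 among the first sixty digits.
observed = [3, 5, 6, 8, 7, 6, 4, 5, 6, 10]
expected = sum(observed) / len(observed)  # 6 per digit under the null

# Generalized likelihood ratio statistic: 2 * sum O * log(O / E).
g_stat = 2 * sum(o * log(o / expected) for o in observed)

# Pearson's chi-squared statistic: sum (O - E)^2 / E.
pearson = sum((o - expected) ** 2 / expected for o in observed)

degrees_of_freedom = len(observed) - 1
print(round(g_stat, 4), pearson, degrees_of_freedom)
```

The two statistics are close, as expected when the observed counts are not far from the expected ones.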
(d) Hard. Do this only if time allows. Estimate the true value of p from part (c)
using a normal approximation to the $\chi^2$.
Answer. $\chi^2_9$ has mean 9 and variance 18. If we substitute the normal with the same
mean and variance we find
\[ P(N(9, 18) > 5.9285) = 1 - \Phi\left( \frac{5.9285 - 9}{\sqrt{18}} \right) = 1 - \Phi(-.724) = 0.765. \]
In fact $P(\chi^2_9 > 5.9285) = .747$.
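The comparison in (d) between the normal approximation and the exact tail can be reproduced without a statistics package. The sketch below integrates the chi-squared density numerically with Simpson's rule; the function names and the step count are choices made here for illustration, not part of the original solution.

```python
from math import exp, gamma, sqrt
from statistics import NormalDist

def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom."""
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

def chi2_upper_tail(x, k, steps=2000):
    """P(chi2_k > x) via composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = chi2_pdf(0.0, k) + chi2_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return 1 - total * h / 3

stat, k = 5.9285, 9

# Normal approximation: match the mean k and variance 2k.
p_normal = 1 - NormalDist(mu=k, sigma=sqrt(2 * k)).cdf(stat)
p_exact = chi2_upper_tail(stat, k)

print(round(p_normal, 3), round(p_exact, 3))
```

The gap between the two values reflects the right-skewness of the chi-squared distribution at 9 degrees of freedom, which a symmetric normal curve cannot capture.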
Comments. Some people gave one-tailed p-values in (a) and (b); since either a very low
or very high sum of digits or number of even digits would be evidence that π is not normal,
a two-tailed test is appropriate. In (b), it’s also possible to do a $\chi^2$-test for goodness of fit
with 1 df, but you can’t get a p-value at all that way from the table provided. (d) is not so
much “hard” as reliant on the somewhat obscure fact that the mean and variance of a $\chi^2_k$
are k and 2k, but this is a fact we’ve used a few times.
21 points possible. Mean was 5.1, SD 4.7. Quartiles 1, 4, 8.5. 0 perfect scores out of 39
(high score was 15!); 4 got at least half the possible points.
Some general comments. The exam was difficult; I’m aware of that and you should
know that your grades will be calculated accordingly. The overall mean was 52 (out of a
possible 92) with a standard deviation of 15.5. The distribution of scores was as follows,
where for example the pair (60, 6) means that six people got between 60 and 64:
Score (lower bound):  80  75  70  65  60  55  50  45  40  35  30  25  20  15  10
Number of students:    1   1   2   5   6   4   4   1   7   3   1   3   0   0   1
One problem I noticed a lot was people forgetting which variable they were differentiating with respect to; for example in 5(c) one typical error was differentiating with respect
to X instead of with respect to α. This may be connected to expressions I saw like dl(θ)/dα
in 5(b); l(θ) only really makes sense if the parameter is called θ!