University of California, Berkeley, Statistics 135: Concepts of Statistics
Michael Lugo, Summer 2011
Midterm exam solutions

1. [14 points: 3 + 5 + 2 + 4] Consider a population consisting of five values: 1, 2, 4, 4, 7.

(a) Find the exact distribution of the sample mean $\bar X$ for a simple random sample (without replacement) of size 3 from this population.

Answer. Write 4' for the second 4. The possible samples are (1, 2, 4), (1, 2, 4'), (1, 2, 7), (1, 4, 4'), (1, 4, 7), (1, 4', 7), (2, 4, 4'), (2, 4, 7), (2, 4', 7), (4, 4', 7), each with probability 1/10. The samples have sums 7, 7, 10, 9, 12, 12, 10, 13, 13, 15. The distribution of the sample mean is therefore
$$P(\bar X = 7/3) = P(\bar X = 10/3) = P(\bar X = 4) = P(\bar X = 13/3) = 0.2, \qquad P(\bar X = 3) = P(\bar X = 5) = 0.1.$$

(b) What are the expectation and the standard deviation of $\bar X$?

Answer. Two solutions are possible. The brute-force solution is to find
$$E(\bar X) = (0.2)(7/3) + (0.2)(10/3) + (0.2)(4) + (0.2)(13/3) + (0.1)(3) + (0.1)(5) = 3.6$$
and
$$E(\bar X^2) = (0.2)(7/3)^2 + (0.2)(10/3)^2 + (0.2)(4)^2 + (0.2)(13/3)^2 + (0.1)(3)^2 + (0.1)(5)^2 = 13.66667,$$
so the standard deviation is
$$\mathrm{SD}(\bar X) = \sqrt{E(\bar X^2) - E(\bar X)^2} = \sqrt{13.66667 - 12.96} = 0.8406.$$
Alternatively, we can compute $E(X) = (1 + 2 + 4 + 4 + 7)/5 = 18/5 = 3.6$ and $E(X^2) = (1^2 + 2^2 + 4^2 + 4^2 + 7^2)/5 = 17.2$, so $\mathrm{Var}(X) = 17.2 - 3.6^2 = 4.24$. Then $E(\bar X) = E(X)$ (by linearity, or by the fact that $\bar X$ is an unbiased estimator of $E(X)$). And
$$\mathrm{Var}(\bar X) = \frac{\mathrm{Var}(X)}{n} \cdot \frac{N - n}{N - 1},$$
where $n$ is the sample size and $N$ is the population size. Here $n = 3$ and $N = 5$, giving $\mathrm{Var}(\bar X) = \mathrm{Var}(X)/6 = 0.706667$ and $\mathrm{SD}(\bar X) = \sqrt{\mathrm{Var}(\bar X)} = 0.8406$.

(c) What is the sampling distribution of the sample median of a simple random sample of size 3?

Answer. The sample median of three numbers is just the middle number. Looking at our list of samples, we see three 2s and seven 4s, so $P(\hat\eta = 2) = 0.3$ and $P(\hat\eta = 4) = 0.7$.

(d) Is the sample median an unbiased estimator of the population median? Why or why not?

Answer. The sample median has expectation $(2)(0.3) + (4)(0.7) = 3.4$, but the population median is 4.
Since these are not equal, the sample median is not an unbiased estimator of the population median.

Comments on this question. A lot of people didn’t bother to write out the exact distribution in part (a) but went straight on to computing the mean and variance. Also, the variance and the standard deviation are not the same thing; this gave people trouble in (b). 14 points possible. Mean was 10.7, SD 3.5. Quartiles 9, 12, 14. 12 perfect scores out of 39; 34 got at least half of the possible points.

2. [14 points: 2 + 3 + 3 + 6] There are two types of people in the world, Berkeley students and Stanford students. We would like to estimate the average difference in intelligence between Berkeley students and Stanford students by administering an intelligence test. The mean intelligence of Berkeley students is $\mu_B$ and the mean intelligence of Stanford students is $\mu_S$. We believe from previous experiments that the standard deviation of the intelligence of Berkeley students is $\sigma$ and the standard deviation of the intelligence of Stanford students is $2\sigma$.

Now, Stanford students tend to be richer than Berkeley students, so we have to pay them more to take our test. Assume that a Berkeley student will be willing to take our intelligence test for ten dollars and a Stanford student will be willing to take it for forty dollars. We have ten thousand dollars to spend on paying students to be tested.

(a) Say we test $n_S$ Stanford students. How many Berkeley students can we afford to test? (Call this number $n_B$.)

Answer. If we test $n_S$ Stanford students we have $10000 - 40 n_S$ dollars left, and at ten dollars each we can afford to test $n_B = 1000 - 4 n_S$ Berkeley students.

(b) Let $\bar B$ be the mean score of the Berkeley students that we test and let $\bar S$ be the mean score of the Stanford students that we test. Show that $\bar B - \bar S$ is an unbiased estimator of $\mu_B - \mu_S$.

Answer. We know that $\bar B$ is an unbiased estimator of the population mean of $B$, and similarly for $\bar S$ and $S$. So, by linearity of expectation, $E(\bar B - \bar S) = E(\bar B) - E(\bar S) = \mu_B - \mu_S$.
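Returning briefly to problem 1: the exact distributions in parts (a) and (c), and the moments in (b), can be checked by brute-force enumeration. The Python sketch below lists all ten equally likely samples as index triples (so the two 4s count as distinct units), uses exact fractions, and cross-checks $\mathrm{Var}(\bar X)$ against the finite population correction formula. The helper name `sampling_distribution` is a hypothetical illustration, not part of the original solutions.

```python
import math
from fractions import Fraction
from itertools import combinations
from statistics import median

# Population from problem 1. The two 4s are distinct units, so we
# enumerate triples of indices rather than triples of values.
POPULATION = [1, 2, 4, 4, 7]
N, n = len(POPULATION), 3


def sampling_distribution(statistic):
    """Exact distribution of `statistic` over all C(N, n) equally likely samples."""
    samples = list(combinations(range(N), n))
    dist = {}
    for idx in samples:
        value = statistic([POPULATION[i] for i in idx])
        dist[value] = dist.get(value, 0) + Fraction(1, len(samples))
    return dist


mean_dist = sampling_distribution(lambda s: Fraction(sum(s), len(s)))
median_dist = sampling_distribution(median)

# Moments of the sample mean, computed from its exact distribution.
e_mean = sum(p * x for x, p in mean_dist.items())
e_mean_sq = sum(p * x * x for x, p in mean_dist.items())
sd_mean = math.sqrt(e_mean_sq - e_mean ** 2)

# Cross-check: Var(X-bar) = Var(X)/n * (N - n)/(N - 1), with Var(X) the
# population variance (divisor N).
mu = Fraction(sum(POPULATION), N)
var_x = Fraction(sum(v * v for v in POPULATION), N) - mu ** 2
var_mean_fpc = var_x / n * Fraction(N - n, N - 1)

print(sorted(mean_dist.items()))
print(float(e_mean), round(sd_mean, 4), float(var_mean_fpc))
print(median_dist)
```

Working with `Fraction` keeps the probabilities exact, so the brute-force variance and the finite-population-correction variance can be compared for exact equality rather than up to rounding.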
(c) What is the variance of $\bar B - \bar S$, as a function of $n_S$? (Ignore the finite population correction.)

Answer. Since the two samples are independent,
$$\mathrm{Var}(\bar B - \bar S) = \mathrm{Var}(\bar B) + \mathrm{Var}(\bar S) = \frac{\sigma^2}{1000 - 4n_S} + \frac{4\sigma^2}{n_S}.$$

(d) For which $n_S$ is this variance smallest? What is the corresponding value of $n_B$? What is the minimal variance? (Ignore the fact that your formula might not give an integer.)

Answer. Differentiating with respect to $n_S$ gives
$$\frac{4\sigma^2}{(1000 - 4n_S)^2} - \frac{4\sigma^2}{n_S^2}.$$
This is zero when $(1000 - 4n_S)^2 = n_S^2$; since both $1000 - 4n_S$ and $n_S$ are positive, this means $1000 - 4n_S = n_S$, i.e. $n_S = 200$. Thus we should test 200 Stanford students and $n_B = 1000 - 4(200) = 200$ Berkeley students. We get $\mathrm{Var}(\bar B - \bar S) = \sigma^2/200 + 4\sigma^2/200 = \sigma^2/40$.

Comments on this question. Some people overthought (a) and (b), perhaps because there was a lot of space to do them in. Part (d) is really a calculus problem. 14 points possible. Mean was 8.5, SD 4.0. Quartiles 5, 9, 11. 6 perfect scores out of 39; 26 got at least half the possible points.

3. [12 points: 2 + 2 + 2 + 2 + 4] On the following page are four Q-Q plots with unlabeled axes. In each case $X_1, \ldots, X_{1000}$ are a sample from some distribution; $Y_1, \ldots, Y_{1000}$ are a sample from some other distribution; and we have plotted the points $(X_{(1)}, Y_{(1)}), (X_{(2)}, Y_{(2)}), \ldots, (X_{(1000)}, Y_{(1000)})$. We write normal$(\mu, \sigma^2)$ for the normal distribution with mean $\mu$ and variance $\sigma^2$. We write gamma$(\alpha, \lambda)$ for the gamma distribution with shape parameter $\alpha$ and rate parameter $\lambda$. For each pair, circle the number of the plot that it corresponds to.

(a) X from a normal(2, 1) distribution, Y from a gamma(4, 2) distribution.
plot 1   plot 2   plot 3   plot 4

(b) X from a continuous uniform(−1, 1) distribution, Y from a triangular distribution, which has density
$$f(x) = \begin{cases} 1/\sqrt{2} - x/2 & \text{if } 0 \le x \le \sqrt{2}, \\ 1/\sqrt{2} + x/2 & \text{if } -\sqrt{2} \le x \le 0, \\ 0 & \text{if } |x| > \sqrt{2}. \end{cases}$$
plot 1   plot 2   plot 3   plot 4

(c) X from an exponential(1) distribution, Y from a uniform(0, 1) distribution.
plot 1   plot 2   plot 3   plot 4

(d) X from a normal(1, 4) distribution, Y from a normal(2, 9) distribution.
plot 1   plot 2   plot 3   plot 4

You may explain your answers here. For grading purposes we will only take your explanations into account for plots that you’ve identified incorrectly, so if you give correct answers with incorrect or no explanations you’ll still get full credit.

Answers: (a) plot 3, (b) plot 1, (c) plot 2, (d) plot 4. There are multiple ways to see this. One solution is as follows. First, the two distributions in (d) differ from each other by a linear transformation, so their Q-Q plot should be a straight line; this is plot 4. In (a), the gamma distribution has a shorter left tail and a longer right tail than the normal distribution; this right-skewness of Y relative to X corresponds to a convex Q-Q plot like plot 3. Both of the distributions in (b) are symmetric about 0, so the Q-Q plot should be symmetric; this is plot 1. This leaves only plot 2 for (c).

(e) One of the four plots on the following page is very close to being a straight line. What is the equation of that straight line, in the form y = mx + b?

Answer. Plot 4, which corresponds to (d), is the closest to being a straight line. A typical point on this line will have x-coordinate given by the pth quantile of normal(1, 4) and y-coordinate given by the pth quantile of normal(2, 9); thus it will be $(1 + 2\Phi^{-1}(p),\ 2 + 3\Phi^{-1}(p))$. The line therefore passes through (1, 2) and has slope 3/2, so it is $y = 1.5x + 0.5$.

Comments. More people seemed to get (d) right than any other part. A lot of people confused (a) and (c), perhaps because if you interchange X and Y they do have quite similar-looking plots. In part (e), a surprising number of people answered y = x; this is only true for Q-Q plots where both samples come from the same distribution. A common error in (e) was to make the slope 9/4, the ratio of the variances, instead of 3/2, the ratio of the standard deviations. 12 points possible. Mean was 5.9, SD 3.5. Quartiles 3, 5, 8. 4 perfect scores out of 39; 19 got at least half the possible points.
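A quick numerical check of the answer to 3(e): the sketch below computes theoretical quantile pairs for normal(1, 4) and normal(2, 9) with Python's standard-library `statistics.NormalDist` and confirms they fall on $y = 1.5x + 0.5$. The helper `qq_points` and the grid of probabilities are illustrative choices, not part of the original solution. Note that `NormalDist` is parameterized by the standard deviation, which is exactly why the slope comes out as 3/2 rather than 9/4.

```python
from statistics import NormalDist

# Problem 3(d)/(e): X ~ normal(1, 4), Y ~ normal(2, 9).
# NormalDist takes the standard deviation (2 and 3), not the variance.
X = NormalDist(mu=1, sigma=2)
Y = NormalDist(mu=2, sigma=3)


def qq_points(dist_x, dist_y, probs):
    """Theoretical Q-Q points: pth quantile of X paired with pth quantile of Y."""
    return [(dist_x.inv_cdf(p), dist_y.inv_cdf(p)) for p in probs]


probs = [k / 100 for k in range(1, 100)]
points = qq_points(X, Y, probs)

# Slope and intercept from the two end points. Every other point should lie
# on the same line, because both quantiles are linear in Phi^{-1}(p).
(x0, y0), (x1, y1) = points[0], points[-1]
slope = (y1 - y0) / (x1 - x0)
intercept = y0 - slope * x0
max_residual = max(abs(y - (slope * x + intercept)) for x, y in points)

print(f"y = {slope:.3f}x + {intercept:.3f}, max residual {max_residual:.2e}")
```

With sample data instead of theoretical quantiles, the points would scatter around this line rather than sit on it exactly.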
Rather surprisingly, the distribution of scores on this problem was very close to uniform. Also, the correlation of the scores on this problem with the scores on the remainder of the exam was 0.02; that is, how you did on this problem is almost unrelated to how you did on the rest of the exam. I suspect this is because this problem is largely graphical and it was the only such problem.

[Plots for question 3: plot 1, plot 2, plot 3, plot 4. Figure not reproduced.]

4. [12 points: 4 + 4 + 4] Suppose that X is an exponential random variable with rate $\theta$, which is unknown to us. That is, $f(x \mid \theta) = \theta e^{-\theta x}$. The prior distribution of $\Theta$ is a gamma distribution with its parameters chosen to have mean 2 and variance 1. We then make four independent observations taken from the distribution of X; they are 1.2, 3.2, 0.8, 1.6.

(a) What are the parameters $\alpha, \lambda$ of the prior distribution?

Answer. The gamma$(\alpha, \lambda)$ distribution has mean $\alpha/\lambda$ and variance $\alpha/\lambda^2$. Thus we have $2 = \alpha/\lambda$ and $1 = \alpha/\lambda^2$; dividing the first equation by the second gives $\lambda = 2$, and then $\alpha = 4$.

(b) What is the likelihood of the observed data, as a function of $\theta$?

Answer. The likelihood is
$$f(1.2, 3.2, 0.8, 1.6 \mid \theta) = (\theta e^{-1.2\theta})(\theta e^{-3.2\theta})(\theta e^{-0.8\theta})(\theta e^{-1.6\theta}) = \theta^4 e^{-6.8\theta}.$$

(c) What is the posterior density of $\theta$? You may give the density explicitly or answer with a well-known named distribution. If you could not answer (a), use the values $\alpha = 3$, $\lambda = 8$ (which are incorrect) in answering this question.

Answer. The prior gamma(4, 2) distribution has density proportional to $\theta^{4-1} e^{-2\theta}$. The posterior density is therefore proportional to
$$\theta^{4-1} e^{-2\theta} \times \theta^4 e^{-6.8\theta} = \theta^{8-1} e^{-8.8\theta},$$
so the posterior density is the gamma(8, 8.8) density.

Comments. A lot of people got (a) and (b) but couldn’t put them together to get (c), or tried to write out the computations explicitly and got bogged down by the notation. 12 points possible. Mean was 7.2, SD 4.3. Quartiles 4, 9, 10.5. 7 perfect scores out of 39; 25 got at least half the possible points.

5. [19 points: 4 + 5 + 5 + 5] Let $X_1, X_2, \ldots, X_n$ be iid random variables with density $\alpha x^{\alpha - 1}$ on the interval $0 \le x \le 1$, with $\alpha > 0$ unknown.

(a) What is the method of moments estimate of $\alpha$?

Answer. X has density $\alpha x^{\alpha - 1}$ and so $E(X) = \int_0^1 \alpha x^{\alpha} \, dx = \alpha/(\alpha + 1)$. The MOM estimator comes from solving $\bar X = \hat\alpha/(\hat\alpha + 1)$ for $\hat\alpha$. Solving gives $\hat\alpha = \bar X/(1 - \bar X)$.

(b) What is the maximum likelihood estimator of $\alpha$?

Answer. The likelihood is $\mathrm{lik}(\alpha) = \prod_i \alpha X_i^{\alpha - 1}$. Taking logs gives
$$l(\alpha) = \sum_i \left(\log \alpha + (\alpha - 1) \log X_i\right)$$
and differentiating gives
$$l'(\alpha) = \sum_i \left(\frac{1}{\alpha} + \log X_i\right) = \frac{n}{\alpha} + \sum_i \log X_i.$$
$\hat\alpha$ is the solution to $l'(\hat\alpha) = 0$; that is,
$$\frac{n}{\hat\alpha} = -\sum_i \log X_i,$$
and solving for $\hat\alpha$ gives
$$\hat\alpha = \frac{-n}{\sum_i \log X_i} = \frac{-n}{\log \prod_i X_i}.$$

(c) What is the asymptotic variance of the MLE?

Answer. We have the formula
$$I(\alpha) = -E\left[\frac{\partial^2}{\partial \alpha^2} \log f(X \mid \alpha)\right] = -E\left[\frac{\partial^2}{\partial \alpha^2} \left(\log \alpha + (\alpha - 1) \log X\right)\right].$$
Differentiating twice gives $-E[-1/\alpha^2]$, or $1/\alpha^2$. The asymptotic variance is therefore $1/(n I(\alpha)) = \alpha^2/n$.

(d) We make 100 observations from this distribution and compute the MLE $\hat\alpha = 3.08$. Give an approximate 90 percent confidence interval for $\alpha$.

Answer. From (c) we can estimate the standard error of $\hat\alpha$ as $s_{\hat\alpha} = \hat\alpha/\sqrt{n}$. With $\hat\alpha = 3.08$ and $n = 100$ we get $s_{\hat\alpha} = 3.08/10 = 0.308$. The sampling distribution of the MLE is approximately normal, so the confidence interval is $\hat\alpha \pm s_{\hat\alpha} \Phi^{-1}(1 - 0.10/2)$. Since $\Phi^{-1}(0.95) \approx 1.645$, we get the interval [2.57, 3.59].

Comments. A lot of people wrote something like
$$\hat\alpha = \frac{\frac{1}{n}\sum_{i=1}^n X_i}{1 - \frac{1}{n}\sum_{i=1}^n X_i}$$
for (a); this is correct but not necessary. In (c), note that the definition of Fisher information refers to a single draw X from the distribution; in particular, n should not appear in your formula for $I(\alpha)$. Many people got $n/\alpha^2$ for $I(\alpha)$ (computing the Fisher information of n draws) and then got the variance $\alpha^2/n^2$. 19 points possible. Mean was 14.6, SD 4.4. Quartiles 12.5, 16, 18. 7 perfect scores out of 39; 34 got at least half the possible points. Probably the easiest problem on the exam; to be honest, I didn’t expect this going in.

6.
[21 points: 7 + 6 + 4 + 4] The first sixty digits of $\pi$ after the decimal point are

14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944

The number of times that this sequence contains each digit is:

digit:   0   1   2   3   4   5   6   7   8   9
count:   3   5   6   8   7   6   4   5   6  10

Now, we know that these digits are not chosen at random. But it is believed that $\pi$ is a normal number; for our purposes this means that each of the ten digits 0, 1, 2, ..., 9 is equally likely to occur and that the digits are independent. Test this hypothesis using the following test statistics.

(a) The sum of the digits. Give a p-value.

Answer. We want to test the null hypothesis that the digits are chosen uniformly at random against the alternative hypothesis that they are not. Under the null hypothesis, their sum is the sum of 60 independent discrete uniform(0, 9) random variables. Call such a variable X. We have $E(X) = 9/2$ and $E(X^2) = (0^2 + 1^2 + \cdots + 9^2)/10 = 28.5$, so $\mathrm{Var}(X) = 28.5 - 4.5^2 = 8.25$. The sum therefore has expectation $(9/2)(60) = 270$ and standard deviation $\sqrt{(8.25)(60)} = 22.2$. The actual sum of the digits is $(3)(0) + (5)(1) + (6)(2) + \cdots + (10)(9) = 296$. This corresponds to a z-score of $(296 - 270)/22.2 = 1.17$ and therefore a two-tailed p-value of $2(1 - \Phi(1.17)) = 0.24$, so we do not reject the hypothesis that $\pi$ is a normal number.

(b) The number of even digits (0, 2, 4, 6, 8). Give a p-value.

Answer. There are 3 + 6 + 7 + 4 + 6 = 26 even digits and 60 − 26 = 34 odd digits. Under the null hypothesis, the distribution of the number of even digits is Bin(60, 1/2). The probability of observing an imbalance at least as severe as the one we see is
$$2 \sum_{k=0}^{26} \binom{60}{k} \Big/ 2^{60},$$
which is hard to compute with a calculator. It’s easier to compute
$$1 - \sum_{k=27}^{33} \binom{60}{k} \Big/ 2^{60},$$
which works out to 0.36629. Alternatively, use the normal approximation: the null distribution of the number of even digits is approximately normal with mean 30 and variance 15.
With the continuity correction, the probability that it is not in the range [26.5, 33.5] is $2(1 - \Phi(3.5/\sqrt{15})) \approx 0.36616$.

(c) An appropriate test statistic that has the chi-squared distribution as its approximate null distribution. Give the best possible range for p that can be determined from the table provided. For example, you might write 0.01 < p < 0.025.

Answer. There are two choices here. One is the generalized likelihood ratio test, which gives the test statistic
$$2 \sum_i O_i \log \frac{O_i}{E_i}.$$
Under the null hypothesis $E_i = 6$ for each $i$; the test statistic is then
$$2\left(3 \log \frac{3}{6} + 5 \log \frac{5}{6} + \cdots + 10 \log \frac{10}{6}\right) = 5.9285.$$
Alternatively, compute Pearson’s $\chi^2$ statistic:
$$\sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(3 - 6)^2}{6} + \cdots + \frac{(10 - 6)^2}{6} = 6 \text{ (exactly)}.$$
In either case the null distribution is approximately $\chi^2_9$ (ten categories, no estimated parameters). The 10th and 90th percentiles of this distribution are 4.17 and 14.68 (from tables); both statistics lie between them, so we have $0.1 < p < 0.9$.

(d) Hard. Do this only if time allows. Estimate the true value of p from part (c) using a normal approximation to the $\chi^2$.

Answer. $\chi^2_9$ has mean 9 and variance 18. If we substitute the normal distribution with the same mean and variance, we find (using the likelihood ratio statistic)
$$P(N(9, 18) > 5.9285) = 1 - \Phi\left(\frac{5.9285 - 9}{\sqrt{18}}\right) = 1 - \Phi(-0.724) = 0.765.$$
In fact $P(\chi^2_9 > 5.9285) = 0.747$.

Comments. Some people gave one-tailed p-values in (a) and (b); since either a very low or a very high sum of digits or number of even digits would be evidence that $\pi$ is not normal, a two-tailed test is appropriate. In (b), it’s also possible to do a $\chi^2$ test for goodness of fit with 1 df, but you can’t get a p-value at all that way from the table provided. Part (d) is not so much “hard” as reliant on the somewhat obscure fact that the mean and variance of a $\chi^2_k$ are k and 2k, but this is a fact we’ve used a few times. 21 points possible. Mean was 5.1, SD 4.7. Quartiles 1, 4, 8.5. 0 perfect scores out of 39 (high score was 15!); 4 got at least half the possible points.

Some general comments.
The exam was difficult; I’m aware of that, and you should know that your grades will be calculated accordingly. The overall mean was 52 (out of a possible 92) with a standard deviation of 15.5. The distribution of scores was as follows, where for example the column with score 60 and count 6 means that six people got between 60 and 64:

score:  80  75  70  65  60  55  50  45  40  35  30  25  20  15  10
count:   1   1   2   5   6   4   4   1   7   3   1   3   0   0   1

One problem I noticed a lot was people forgetting which variable they were differentiating with respect to; for example, in 5(c) one typical error was differentiating with respect to X instead of with respect to $\alpha$. This may be connected to expressions I saw like $dl(\theta)/d\alpha$ in 5(b); $l(\theta)$ only really makes sense if the parameter is called $\theta$!
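For readers who want to re-check the arithmetic in problems 4 to 6, here is a short Python sketch using only the standard library (Python 3.8 or later). All function names are hypothetical illustrations. It assumes, as in problem 4, that the gamma prior uses a rate parameter; the 90 percent interval in 5(d) uses the asymptotic standard error $\hat\alpha/\sqrt{n}$; and instead of a printed table, the $\chi^2_9$ tail probability in 6(c)/(d) is computed with the standard closed form for odd degrees of freedom (Abramowitz and Stegun, ch. 26), which expresses it through $\Phi$ and $\phi$.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


# Problem 4: gamma(alpha, lam) prior with lam a rate, exponential(theta) data.
def gamma_params_from_moments(mean, var):
    """Solve mean = alpha/lam and var = alpha/lam**2 for (alpha, lam)."""
    lam = mean / var
    return mean * lam, lam


def gamma_exponential_posterior(alpha, lam, data):
    """Conjugate update: the posterior is gamma(alpha + n, lam + sum(data))."""
    return alpha + len(data), lam + sum(data)


# Problem 5(d): normal-approximation interval with SE = alpha_hat / sqrt(n).
def mle_interval(alpha_hat, n, level=0.90):
    se = alpha_hat / math.sqrt(n)
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return alpha_hat - z * se, alpha_hat + z * se


# Problem 6: counts of digits 0..9 in the first sixty digits of pi.
COUNTS = [3, 5, 6, 8, 7, 6, 4, 5, 6, 10]
N_DIGITS = sum(COUNTS)


def digit_sum_p_value():
    """Two-tailed z-test on the digit sum (each digit uniform on 0..9)."""
    total = sum(d * c for d, c in enumerate(COUNTS))
    z = (total - 4.5 * N_DIGITS) / math.sqrt(8.25 * N_DIGITS)
    return total, 2 * (1 - STD_NORMAL.cdf(abs(z)))


def even_count_exact_p_value():
    """Exact two-tailed Bin(N_DIGITS, 1/2) p-value for the number of even digits."""
    evens = sum(COUNTS[0::2])
    dev = abs(evens - N_DIGITS / 2)
    extreme = sum(math.comb(N_DIGITS, k) for k in range(N_DIGITS + 1)
                  if abs(k - N_DIGITS / 2) >= dev)
    return evens, extreme / 2 ** N_DIGITS


def chi2_sf_odd_df(x, df):
    """P(chi^2_df > x) for odd df, via the closed form in terms of Phi and phi."""
    chi = math.sqrt(x)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        series += term           # chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return 2 * (1 - STD_NORMAL.cdf(chi)) + 2 * STD_NORMAL.pdf(chi) * series


expected = N_DIGITS / 10
g_stat = 2 * sum(o * math.log(o / expected) for o in COUNTS)
pearson = sum((o - expected) ** 2 / expected for o in COUNTS)
normal_approx_p = 1 - NormalDist(9, math.sqrt(18)).cdf(g_stat)

alpha0, lam0 = gamma_params_from_moments(2, 1)
print(gamma_exponential_posterior(alpha0, lam0, [1.2, 3.2, 0.8, 1.6]))
print(mle_interval(3.08, 100))
print(digit_sum_p_value(), even_count_exact_p_value())
print(g_stat, pearson, chi2_sf_odd_df(g_stat, 9), normal_approx_p)
```

The exact binomial sum in `even_count_exact_p_value` is the computation that is awkward on a calculator; in integer arithmetic it is immediate, and it agrees with the continuity-corrected normal approximation to about three decimal places.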