Stat 428 Midterm 1, Winter 2006
Name: SOLUTIONS

Please read each question carefully and ask me if you have any questions. You cannot get full credit for a problem unless you show your work. Partial credit will be granted based on work shown. Make sure to show formulas and calculator inputs where appropriate. There are 50 possible points on the exam.

1) (11 points) Psychiatrists are interested in how the pH levels of brains may change for patients with mental illnesses. We have the pH levels of the brains of 20 healthy individuals, which the psychiatrists hope to use as a reference point. A histogram, boxplot, and descriptive statistics for these pH levels follow.

[Figure: Histogram of pH levels (x-axis: pH levels, y-axis: Frequency)]
[Figure: Boxplot of pH levels]

Descriptive statistics:
Variable    N   Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
pH levels   20  6.3140  0.0831   0.3717  5.7300   5.9725  6.2600  6.6675  6.8700

a) (2 points) What is the population from which the sample is taken?

The population is made up of the brains of healthy individuals, or the pH levels of the brains of healthy individuals.

b) (2 points) Is the sample representative of the population you described in part a? Explain.

Since we don't know how the 20 individuals were selected, we cannot be sure that the sample is representative of the population.

c) (4 points) Describe the distribution of pH levels.

The distribution of pH levels is bimodal, yet somewhat symmetric. The center and spread are difficult to describe for this distribution because there appear to be two groups. We could consider 6.0 and 6.6 to be relative centers for the pH levels.

d) (3 points) Is the histogram or the boxplot a better graphical display for these data? Explain.

The histogram is a better graphical display for these data, since we can see the two groups. This feature is obscured in the boxplot.

2) (4 points) Suppose that a course has a capacity of at most 240 people, but that 1550 invitations are sent out.
If each person who receives an invitation has a probability of 0.135 of attending the course, independently of everybody else, what is the probability that the number of people attending the course will exceed the capacity?

Let X = number of people attending the course. Then X follows B(n, p) with n = 1550 and p = 0.135. We want to find P(X > 240). Getting the probability exactly involves a tedious calculation. Since n is very large and p is not too extreme, we can use the normal approximation to a binomial distribution for this problem; that is, B(n, p) can be approximated by N(np, np(1 − p)). (This is an application of the CLT.)

$$np = 1550(0.135) = 209.25 \quad \text{and} \quad np(1 - p) = 1550(0.135)(0.865) = 181.0$$

Using the CLT and standardizing X, we have

$$Z = \frac{X - \mu}{\sigma} = \frac{240 - 209.25}{\sqrt{181.0}} = \frac{30.75}{13.4536} = 2.29$$

$$P(X > 240) \approx P(Z > 2.29) = 1 - 0.9890 = 0.0110$$

3) (7 points) A lime kiln is a large cylinder made of metal that is used at a paper mill to extract lime from calcium carbonate by heating it to a high temperature. An engineer was interested in variations in the temperature of the kiln and took measurements of the kiln temperature every 10 minutes during a 5-hour period.

[Figure: Histogram of Temperature (x-axis: Temperature, y-axis: Frequency)]

Descriptive statistics:
Variable     N   Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Temperature  30  567.77  0.485    2.65   563.70   565.98  567.55  569.73  573.70

a. (4 points) Describe the distribution of temperatures.

The distribution of temperatures is unimodal and skewed right. The center is around 568 degrees, with a spread of about 4 degrees (569.73 − 565.98 = 3.75). There are two gaps in the distribution, one at 565 degrees and the other at 573 degrees.

b. (3 points) Is it more appropriate to use the mean and standard deviation or the median and interquartile range to describe the center and spread of the temperatures? Explain.
Since the distribution is skewed and there is a potential outlier at 574, it is more appropriate to use the median and IQR to describe the center and spread of the temperatures.

4) (18 points) We have two independent random variables $X_1$ and $X_2$. Suppose that $E(X_1) = \mu$, $Var(X_1) = 13$ and $E(X_2) = \mu$, $Var(X_2) = 9$. Consider the point estimates

$$\hat{\mu}_1 = \frac{X_1}{5} + \frac{4X_2}{5}, \qquad \hat{\mu}_2 = \frac{X_1}{2} + \frac{X_2}{3} + 1$$

a. (6 points) Calculate the bias of each point estimate. Is either of them unbiased? If so, which one?

$$\text{bias}(\hat{\mu}_1) = E[\hat{\mu}_1 - \mu] = E\left[\frac{X_1}{5} + \frac{4X_2}{5} - \mu\right] = \frac{1}{5}E[X_1] + \frac{4}{5}E[X_2] - \mu = \frac{1}{5}\mu + \frac{4}{5}\mu - \mu = 0$$

$$\text{bias}(\hat{\mu}_2) = E[\hat{\mu}_2 - \mu] = E\left[\frac{X_1}{2} + \frac{X_2}{3} + 1 - \mu\right] = \frac{1}{2}E[X_1] + \frac{1}{3}E[X_2] + 1 - \mu = \frac{1}{2}\mu + \frac{1}{3}\mu + 1 - \mu = 1 - \frac{\mu}{6}$$

$\hat{\mu}_1$ is unbiased since its bias is zero.

b. (6 points) Calculate the variance of each point estimate. Which one has the smallest variance?

$$Var(\hat{\mu}_1) = Var\left(\frac{X_1}{5} + \frac{4X_2}{5}\right) = \frac{1}{25}Var(X_1) + \frac{16}{25}Var(X_2) = \frac{1}{25}(13) + \frac{16}{25}(9) = \frac{157}{25} = 6.28$$

$$Var(\hat{\mu}_2) = Var\left(\frac{X_1}{2} + \frac{X_2}{3} + 1\right) = \frac{1}{4}Var(X_1) + \frac{1}{9}Var(X_2) = \frac{1}{4}(13) + \frac{1}{9}(9) = 4.25$$

$\hat{\mu}_2$ has the smaller variance.

c. (4 points) Calculate the mean square error of each point estimate.

$$MSE(\hat{\mu}_1) = Var(\hat{\mu}_1) + (\text{bias})^2 = 6.28 + 0^2 = 6.28$$

$$MSE(\hat{\mu}_2) = Var(\hat{\mu}_2) + (\text{bias})^2 = 4.25 + \left(1 - \frac{\mu}{6}\right)^2 = 4.25 + 1 - \frac{\mu}{3} + \frac{\mu^2}{36} = 5.25 - \frac{\mu}{3} + \frac{\mu^2}{36}$$

d. (2 points) Which point estimate would you choose to estimate $\mu$? Explain.

Answers may vary. It depends on which point estimate has the smaller MSE. $MSE(\hat{\mu}_2) < MSE(\hat{\mu}_1)$ if and only if $4.25 + (1 - \mu/6)^2 < 6.28$, that is, $|1 - \mu/6| < \sqrt{2.03} \approx 1.425$, which gives $-2.549 < \mu < 14.549$. Using the MSE criterion, choose $\hat{\mu}_2$ for $-2.549 < \mu < 14.549$; otherwise choose $\hat{\mu}_1$. However, note that this answer requires some knowledge of $\mu$, which we don't know in practice.

5) (6 points) The weights of bricks are normally distributed with mean 110.0 grams and standard deviation 0.4 grams.
If the weights of 22 randomly selected bricks are measured, what is the probability that the resulting point estimate of $\mu$ will be in the interval (109.9, 110.2)?

Suppose that we use the sample mean to estimate $\mu$. The sampling distribution of the sample mean is exactly normal with mean 110.0 and standard deviation $0.4/\sqrt{22}$.

$$P(109.9 < \bar{X} < 110.2) = P\left(\frac{109.9 - 110.0}{0.4/\sqrt{22}} < Z < \frac{110.2 - 110.0}{0.4/\sqrt{22}}\right) = P(-1.17 < Z < 2.35) = 0.9906 - 0.1210 = 0.8696$$

6) (4 points) A scientist reports that the proportion of defective items from a process is 12.6%. If the scientist's estimate is based on the examination of a random sample of 360 items from the process, what is the standard error of the scientist's estimate?

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.126(0.874)}{360}} = \sqrt{0.0003059} = 0.0175$$
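
The numerical answers to problems 2, 5, and 6 can be checked in software instead of with a normal table. The following is a minimal Python sketch, not part of the original exam. The function names are hypothetical and chosen only for illustration, and it assumes Python 3.8 or later for `statistics.NormalDist`. Because it does not round z to two decimals as the table lookup does, its results may differ from the table-based answers in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
Z = NormalDist()


def binomial_tail_normal_approx(n, p, threshold):
    """Approximate P(X > threshold) for X ~ B(n, p) by N(np, np(1-p)).

    Mirrors the solution to problem 2 (no continuity correction).
    """
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (threshold - mean) / sd
    return 1 - Z.cdf(z)


def sample_mean_interval_prob(mu, sigma, n, low, high):
    """P(low < sample mean < high) for n draws from N(mu, sigma^2).

    Mirrors problem 5: the sample mean is N(mu, sigma^2 / n).
    """
    xbar = NormalDist(mu, sigma / sqrt(n))
    return xbar.cdf(high) - xbar.cdf(low)


def proportion_standard_error(p_hat, n):
    """Standard error of a sample proportion, as in problem 6."""
    return sqrt(p_hat * (1 - p_hat) / n)


p2 = binomial_tail_normal_approx(1550, 0.135, 240)
p5 = sample_mean_interval_prob(110.0, 0.4, 22, 109.9, 110.2)
se6 = proportion_standard_error(0.126, 360)

print(f"Problem 2: P(X > 240) is approximately {p2:.4f}")
print(f"Problem 5: P(109.9 < sample mean < 110.2) = {p5:.4f}")
print(f"Problem 6: SE(p-hat) = {se6:.4f}")
```

The printed values should be close to the worked answers of 0.0110, 0.8696, and 0.0175.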