Download solutions - Department of Statistics | OSU: Statistics

Stat 428
Midterm 1
Winter 2006
Name
SOLUTIONS
Please read each question carefully and ask me if you have any questions. You cannot get
full credit for a problem unless you show your work. Partial credit will be granted based on
work shown. Make sure to show formulas and calculator inputs where appropriate. There
are 50 possible points on the exam.
1) (11 points) Psychiatrists are interested in how the pH levels of brains may change for patients
with mental illnesses. We have the pH levels of brains of 20 healthy individuals which the
psychiatrists hope to use as a reference point. A histogram, boxplot, and descriptive statistics
for these pH levels follow.
[Figure: Histogram of pH levels (x-axis: pH levels, y-axis: Frequency)]
[Figure: Boxplot of pH levels (axis: pH levels)]

Descriptive statistics:
Variable    N   Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
pH levels   20  6.3140  0.0831   0.3717  5.7300   5.9725  6.2600  6.6675  6.8700
a) (2 points) What is the population from which the sample is taken?
The population is made up of the brains of healthy individuals or the pH levels of
brains of healthy individuals.
b) (2 points) Is the sample representative of the population you described in part a? Explain.
Since we don’t know how the 20 individuals were selected, we cannot be sure that the
sample is representative of the population.
c) (4 points) Describe the distribution of pH levels.
The distribution of pH levels is bimodal, yet somewhat symmetric. The center and
spread are difficult to describe for this distribution because there appear to be two
groups. We could consider 6.0 and 6.6 to be relative centers for the pH levels.
d) (3 points) Is the histogram or the boxplot a better graphical display for these data?
Explain.
The histogram is a better graphical display for these data, since we can see the two
groups. This feature is obscured in the boxplot.
2) (4 points) Suppose that a course has a capacity of at most 240 people, but that 1550 invitations
are sent out. If each person who receives an invitation has a probability of 0.135 of attending the
course, independently of everybody else, what is the probability that the number of people
attending the course will exceed the capacity?
Let X = number of people attending the course. Then X follows B(n, p) with n = 1550 and
p = 0.135. We want to find P(X > 240). Computing this probability exactly is tedious.
Since n is very large and p is not too extreme, we can use the normal
approximation to the binomial distribution, that is, B(n, p) can be
approximated by N(np, np(1 − p)). (This is an application of the CLT.)
$$np = 1550(0.135) = 209.25 \quad\text{and}\quad np(1-p) = 1550(0.135)(0.865) = 181.0$$
Using the CLT and standardizing X, we have
$$Z = \frac{X - \mu}{\sigma} = \frac{240 - 209.25}{\sqrt{181.0}} = \frac{30.75}{13.4536} = 2.29$$
$$P(X > 240) \approx P(Z > 2.29) = 1 - 0.9890 = 0.0110$$
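As a rough check on the approximation, the sketch below recomputes the normal tail probability and compares it with the exact binomial tail. It uses only the Python standard library; the helper name `binom_log_pmf` is illustrative, and the exact sum is computed in log space (an implementation choice made here, not part of the exam solution).

```python
import math
from statistics import NormalDist

n, p, capacity = 1550, 0.135, 240

# Normal approximation: X ~ N(np, np(1-p)), without a continuity correction
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
z = (capacity - mu) / sigma
p_normal = 1 - NormalDist().cdf(z)

def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ B(n, p), computed in log space to avoid overflow."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

# Exact tail probability P(X > 240): sum the pmf over k = 241, ..., n
p_exact = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(capacity + 1, n + 1))

print(f"z = {z:.2f}, normal approx = {p_normal:.4f}, exact = {p_exact:.4f}")
```

The normal value differs slightly from the table-based 0.0110 because z is not rounded to two decimals before the lookup.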
3) (7 points) A lime kiln is a large cylinder made of metal that is used at a paper mill to
extract lime from calcium carbonate by heating it to a high temperature. An engineer was
interested in variations in the temperature of the kiln and took measurements of the kiln
temperature every 10 minutes during a 5-hour period.
[Figure: Histogram of Temperature (x-axis: Temperature, y-axis: Frequency)]

Descriptive statistics:
Variable     N   Mean    SE Mean  StDev  Minimum  Q1      Median  Q3      Maximum
Temperature  30  567.77  0.485    2.65   563.70   565.98  567.55  569.73  573.70
a. (4 points) Describe the distribution of temperatures.
The distribution of temperatures is unimodal and skewed right. The center is around
568 degrees, with a spread (IQR) of about 4 degrees (569.73 − 565.98 = 3.75). There are
two gaps in the distribution, one at 565 degrees and the other at 573 degrees.
b. (3 points) Is it more appropriate to use the mean and standard deviation or the
median and interquartile range to describe the center and spread of the
temperatures? Explain.
Since the distribution is skewed and there is a potential outlier at 574, it is more
appropriate to use the median and IQR to describe the center and spread of the
temperatures.
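The quantities behind this choice can be recomputed from the reported summary statistics. This is a small sketch using only the values in the output above; the variable names are illustrative.

```python
# Summary statistics as reported in the output for the kiln temperatures
mean, median = 567.77, 567.55
q1, q3 = 565.98, 569.73

iqr = q3 - q1                       # interquartile range: spread of the middle 50%
mean_minus_median = mean - median   # positive when the mean sits above the median

print(f"IQR = {iqr:.2f}, mean - median = {mean_minus_median:.2f}")
```

A mean above the median is consistent with the right skew seen in the histogram.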
4) (18 points) We have two independent random variables $X_1$ and $X_2$. Suppose that
$E(X_1) = \mu$, $\mathrm{Var}(X_1) = 13$ and $E(X_2) = \mu$, $\mathrm{Var}(X_2) = 9$. Consider the point estimates
$$\hat{\mu}_1 = \frac{X_1}{5} + \frac{4X_2}{5}, \qquad \hat{\mu}_2 = \frac{X_1}{2} + \frac{X_2}{3} + 1$$
a. (6 points) Calculate the bias of each point estimate. Is either of them unbiased? If
so, which one?
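$$\text{bias}(\hat{\mu}_1) = E[\hat{\mu}_1 - \mu] = E\left[\frac{X_1}{5} + \frac{4X_2}{5} - \mu\right] = \frac{1}{5}E[X_1] + \frac{4}{5}E[X_2] - \mu = \frac{1}{5}\mu + \frac{4}{5}\mu - \mu = 0$$
$$\text{bias}(\hat{\mu}_2) = E[\hat{\mu}_2 - \mu] = E\left[\frac{X_1}{2} + \frac{X_2}{3} + 1 - \mu\right] = \frac{1}{2}E[X_1] + \frac{1}{3}E[X_2] + 1 - \mu = \frac{1}{2}\mu + \frac{1}{3}\mu + 1 - \mu = 1 - \frac{\mu}{6}$$
$\hat{\mu}_1$ is unbiased since its bias is zero.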
b. (6 points) Calculate the variance of each point estimate. Which one has the
smallest variance?
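$$\mathrm{Var}(\hat{\mu}_1) = \mathrm{Var}\left(\frac{X_1}{5} + \frac{4X_2}{5}\right) = \frac{1}{25}\mathrm{Var}(X_1) + \frac{16}{25}\mathrm{Var}(X_2) = \frac{1}{25}(13) + \frac{16}{25}(9) = \frac{157}{25} = 6.28$$
$$\mathrm{Var}(\hat{\mu}_2) = \mathrm{Var}\left(\frac{X_1}{2} + \frac{X_2}{3} + 1\right) = \frac{1}{4}\mathrm{Var}(X_1) + \frac{1}{9}\mathrm{Var}(X_2) = \frac{1}{4}(13) + \frac{1}{9}(9) = 4.25$$
$\hat{\mu}_2$ has the smaller variance.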
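The bias and variance calculations in parts a and b follow one pattern for any estimator of the form a1·X1 + a2·X2 + c. The sketch below applies that pattern with exact fractions; the function name `bias_and_variance` and its return layout are illustrative choices, not notation from the exam.

```python
from fractions import Fraction as F

var_x1, var_x2 = 13, 9  # Var(X1), Var(X2) from the problem statement

def bias_and_variance(a1, a2, c):
    """For the estimator a1*X1 + a2*X2 + c, with E(X1) = E(X2) = mu and
    X1, X2 independent, return (slope, intercept, variance), where
    bias(mu) = slope * mu + intercept."""
    slope = a1 + a2 - 1                          # coefficient of mu in the bias
    variance = a1**2 * var_x1 + a2**2 * var_x2   # the constant c adds no variance
    return slope, c, variance

est1 = bias_and_variance(F(1, 5), F(4, 5), 0)  # mu-hat 1
est2 = bias_and_variance(F(1, 2), F(1, 3), 1)  # mu-hat 2
print(est1)
print(est2)
```

A slope and intercept of zero mean the estimator is unbiased for every value of μ.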
c. (4 points) Calculate the mean square error of each point estimate.
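$$\mathrm{MSE}(\hat{\mu}_1) = \mathrm{Var}(\hat{\mu}_1) + (\text{bias})^2 = 6.28 + 0^2 = 6.28$$
$$\mathrm{MSE}(\hat{\mu}_2) = \mathrm{Var}(\hat{\mu}_2) + (\text{bias})^2 = 4.25 + \left(1 - \frac{\mu}{6}\right)^2 = 4.25 + 1 - \frac{\mu}{3} + \frac{\mu^2}{36} = 5.25 - \frac{\mu}{3} + \frac{\mu^2}{36}$$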
d. (2 points) Which point estimate would you choose to estimate μ ? Explain.
Answers may vary. It depends on which point estimate has the smaller MSE.
Setting $5.25 - \frac{\mu}{3} + \frac{\mu^2}{36} < 6.28$ gives $\mu^2 - 12\mu - 37.08 < 0$, so
MSE($\hat{\mu}_2$) < MSE($\hat{\mu}_1$) if and only if $-2.549 < \mu < 14.549$. Using the MSE criterion,
choose $\hat{\mu}_2$ for $-2.549 < \mu < 14.549$. Otherwise choose $\hat{\mu}_1$. However, note that this
answer requires some knowledge of $\mu$, which we don't know in practice.
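The range of μ over which the second estimate has the smaller MSE can be checked numerically. This sketch takes the variances from part b and the bias 1 − μ/6 from part a; the function names `mse1` and `mse2` are illustrative.

```python
import math

var1, var2 = 6.28, 4.25  # variances of the two estimates from part b

def mse1(mu):
    return var1  # unbiased, so the MSE equals the variance for every mu

def mse2(mu):
    return var2 + (1 - mu / 6) ** 2  # variance plus squared bias

# mse2 < mse1  <=>  (1 - mu/6)^2 < var1 - var2  <=>  |mu - 6| < 6*sqrt(var1 - var2)
half_width = 6 * math.sqrt(var1 - var2)
lower, upper = 6 - half_width, 6 + half_width
print(f"{lower:.3f} < mu < {upper:.3f}")
```

The interval is centered at μ = 6, where the bias of the second estimate is zero.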
5) (6 points) The weights of bricks are normally distributed with mean 110.0 grams and standard
deviation 0.4 grams. If the weights of 22 randomly selected bricks are measured, what is the
probability that the resulting point estimate of μ will be in the interval (109.9, 110.2)?
Suppose that we use the sample mean to estimate μ. The sampling distribution of the
sample mean is exactly normal with mean 110.0 and standard deviation $0.4/\sqrt{22}$.
$$P(109.9 < \bar{X} < 110.2) = P\left(\frac{109.9 - 110.0}{0.4/\sqrt{22}} < Z < \frac{110.2 - 110.0}{0.4/\sqrt{22}}\right)$$
$$= P(-1.17 < Z < 2.35)$$
$$= 0.9906 - 0.1210$$
$$= 0.8696$$
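The same probability can be computed without a z-table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 110.0, 0.4, 22

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n))
xbar = NormalDist(mu, sigma / math.sqrt(n))
prob = xbar.cdf(110.2) - xbar.cdf(109.9)
print(f"{prob:.4f}")
```

The result can differ from 0.8696 in the last digits because the table lookup rounds each z-score to two decimals.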
6) (4 points) A scientist reports that the proportion of defective items from a process is
12.6%. If the scientist’s estimate is based on the examination of a random sample of 360
items from the process, what is the standard error of the scientist’s estimate?
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.126(0.874)}{360}} = \sqrt{0.0003059} = 0.0175$$
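A quick numerical check of the standard error, as a sketch using only the values given in the problem:

```python
import math

p_hat, n = 0.126, 360

# Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)
se = math.sqrt(p_hat * (1 - p_hat) / n)
print(f"{se:.4f}")
```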