Homework 3 Solution
This HW reviews the normal distribution, confidence intervals and the central
limit theorem.
(1) Suppose that X is a normally distributed random variable where X ∼
N (75, 3²) (mean 75 and standard deviation 3).
(i) Calculate P (X > 67).
We make the z-transform z = (67 − 75)/3 = −8/3 ≈ −2.67. We
look this up in the tables to get 0.0038. This is the area
to the left of −2.67; the area to the right (which matches
P (X > 67)) is 1 − 0.0038 = 0.9962.
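The table lookup can be cross-checked numerically. The following is a minimal sketch, not part of the original solution, assuming Python 3.8 or later so that statistics.NormalDist is available. Because the tables round z to two decimal places, the computed value may differ from the table value in the fourth decimal place.

```python
from statistics import NormalDist

# X ~ N(75, 3^2): mean 75, standard deviation 3
X = NormalDist(mu=75, sigma=3)

# P(X > 67) is one minus the area to the left of 67
p_right = 1 - X.cdf(67)
print(round(p_right, 4))
```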
(ii) Find the x such that P (75 − x ≤ X ≤ 75 + x) = 0.99.
This means the area in the two tails should total 1%. By
symmetry that is 0.5% in each tail. Looking up 0.5% (that is, 0.005)
inside the tables gives z = 2.57 on the margins.
This means x should be 2.57 standard deviations from the
mean. Therefore x = 2.57 × 3 = 7.71.
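The reverse lookup (from a probability back to a z-value) can also be done numerically. This is a sketch under the same assumption as before (Python 3.8 or later, statistics.NormalDist); it is not the method used in the solution. The inverse CDF returns the unrounded z-value, so the resulting x is slightly larger than the 7.71 obtained from the two-decimal tables.

```python
from statistics import NormalDist

sigma = 3

# 0.5% in each tail leaves 99.5% to the left of the upper cut-off
z = NormalDist().inv_cdf(0.995)

# x is z standard deviations away from the mean
x = z * sigma
print(round(z, 3), round(x, 2))
```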
(2) A patient is classified as having gestational diabetes if the glucose
level is above 140 milligrams per deciliter one hour after ingesting a
sugary drink. Lucy’s measured sugar level varies according to a normal
distribution with mean µ = 125mg/dl and standard deviation 10mg/dl.
Since her mean level is below 140mg/dl she does not have gestational diabetes. However, in reality the mean level is unknown; all that
is known are readings taken from blood samples. Therefore, below we
want to evaluate the chance of wrongly diagnosing gestational diabetes
based on the samples taken.
(a) Suppose one single measurement is made (one blood sample),
what is the probability that she will be misdiagnosed as having
gestational diabetes (in other words what is the chance that her
measurement will be above 140mg/dl given that a single measurement is normally distributed with mean µ = 125mg/dl and
standard deviation 10mg/dl).
We want to calculate the chance that her measurement
will be over 140. As measurements are close to normal
we use the normal distribution to calculate this. It is
easiest to understand with a picture. Draw a normal
distribution centered about 125 with standard deviation
10. We want to calculate the area to the right of 140 (this
is the probability). To do this make a z-transform of 140:
z = (140 − 125)/10 = 1.5 (remember we subtract 125
since this is her mean level). The area to the right of the
z-score 1.5 is 1 − 0.9332 = 0.0668. So the chance of her being
misdiagnosed based on just one measurement is about 6.7%.
(b) Instead suppose that on three separate days measurements are
made and the average measurement is taken over these three days.
What is the probability that she will be misdiagnosed as having
gestational diabetes (in other words what is the chance that her
average over these three measurements will be above 140mg/dl)?
Hint: What is the distribution of the sample mean based on three
measurements given that a single measurement is normally distributed with mean µ = 125mg/dl and standard deviation 10mg/dl?
This is the same as above; the main difference is that we use the average of three measurements.
The standard error has now changed
from 10 to 10/√3 = 5.77. The distribution of the sample
mean is normal with mean 125 (as before) but with standard error 5.77. As before, we want to calculate the area
to the right of 140 (but using this new standard error).
The z-transform is z = (140 − 125)/5.77 = 2.60. The area to
the right of 2.60 is 0.0047. Thus the chance of her falsely
being diagnosed using the average of three measurements
goes down to about 0.5%.
(c) Compare your solutions from part (a) and part (b). What do
you notice about the probability of false diagnosis as a larger sample is used?
As the sample size increases, the standard error of the
sample mean goes down. Thus the chance of a wrong diagnosis decreases.
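The comparison between one reading and the average of three can be sketched in a few lines of Python. This is an illustration only, not part of the original solution; the helper name prob_above is hypothetical, and the code assumes Python 3.8 or later. The computed values are unrounded, so they may differ slightly from the table-based figures.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, cutoff = 125, 10, 140

def prob_above(n):
    """P(mean of n readings > cutoff) when each reading is N(mu, sigma^2)."""
    se = sigma / sqrt(n)          # standard error of the mean of n readings
    return 1 - NormalDist(mu, se).cdf(cutoff)

p_one = prob_above(1)     # part (a): a single measurement
p_three = prob_above(3)   # part (b): average of three measurements
print(round(p_one, 4), round(p_three, 4))
```

Calling prob_above with larger n shows the probability of a false diagnosis continuing to shrink as the standard error falls.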
(3) Suppose scores on the high school ACT test have mean 19.2 and standard deviation 5.1. As we discussed in class, ACT scores are only very
approximately normally distributed.
(a) Using the normal distribution, what is the approximate probability that a single randomly selected student will score 23 or higher?
The population mean is µ = 19.2 and the standard deviation
is σ = 5.1. In order to calculate the probability we assume normality (even though this is not strictly true)
and calculate the z-transform z = (x − µ)/σ = (23 − 19.2)/5.1 = 0.75.
Thus the probability P (Z > 0.75) = 0.2266. In other words
the probability of a student scoring 23 or higher is approximately 22.66% (approximately because we assumed
normality of the distribution of scores).
(b) A simple random sample of 25 students is taken. What is the
mean and standard deviation of the average score (sample mean
x̄) of these 25 students?
The mean of the sample mean is the same as the population mean µ = 19.2. The standard deviation of the sample
mean is the standard error, which is σ/√n = 5.1/√25 =
1.02.
(c) Using the normal distribution, what is the approximate probability that the sample mean score of these 25 randomly selected
students will be 23 or higher?
Like part (a), we make a z-transform z = (x̄ − µ)/(σ/√n) = (23 − 19.2)/1.02 =
3.73. Looking this up in the tables gives P (Z > 3.73) =
0.0001.
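Parts (a) to (c) can be checked together with a short Python sketch. It is not part of the original solution and assumes Python 3.8 or later. Since z is not rounded to two decimals here, the single-student probability comes out slightly different from the table value 0.2266.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, cutoff = 19.2, 5.1, 25, 23

# (a) one student: scores treated as N(mu, sigma^2)
p_single = 1 - NormalDist(mu, sigma).cdf(cutoff)

# (b) standard error of the mean of n scores
se = sigma / sqrt(n)

# (c) sample mean of n scores treated as N(mu, se^2)
p_mean = 1 - NormalDist(mu, se).cdf(cutoff)

print(round(p_single, 4), round(se, 2), round(p_mean, 4))
```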
(d) Which of your Normal probability calculations, (a) or (c), will be
the most accurate? Give a reason for your answer.
The central limit theorem tells us the distribution of the sample mean
becomes closer to normal as the sample size grows larger. As we
have calculated both the probabilities in (a) and (c) under
the assumption of normality, the probability in part (c)
will be the more accurate estimate of the probability.
(4)
(i) 300 different samples are drawn, each sample is of size 50. For
each sample a 90% confidence interval (CI) for the mean µ is constructed.
On average, how many of the intervals will contain the mean?
300 × 0.9 = 270
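A small simulation makes the "on average" statement concrete. This is a sketch only: the population N(10, 2²) with known σ is a hypothetical choice made for illustration and does not come from the question, and the code assumes Python 3.8 or later. A single run gives a count near 270, not exactly 270.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption): N(10, 2^2), sigma treated as known
mu, sigma = 10.0, 2.0
n_samples, n, level = 300, 50, 0.90

z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% interval
half_width = z * sigma / sqrt(n)

covered = 0
for _ in range(n_samples):
    xbar = mean(random.gauss(mu, sigma) for _ in range(n))
    # Count the interval if it contains the true mean
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

print(covered, "of", n_samples, "intervals contain mu")
```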
(ii) Suppose it is known that the smallest adult is 1.5 feet tall and
the tallest known adult is 8.5 feet tall. A sample of 50 people is
drawn; the average height using this sample is 5.5 feet.
Give a 100% CI for the mean adult height.
This is a slightly trick question. 100% means we need to
be completely sure it will contain the mean. We know that
the smallest person is 1.5 and the tallest is 8.5. Therefore, the mean height must lie somewhere in this interval.
Therefore a 100% CI is [1.5, 8.5]. Some of you used the
absolute end points of the normal tables, which is a very
reasonable solution, but technically this is still not quite
100%.
(iii) Suppose a random sample of size 40 is drawn from a population
which has mean µ and variance σ². I evaluate the sample mean
X̄ = (1/40) Σ_{i=1}^{40} X_i. It is known the standard error of the sample
mean is 0.5. What is the standard deviation of the original population?
To answer this question we use the formula
standard error = standard deviation of population/√n and
solve for σ. This is s.e. = 0.5 = √(σ²/n) = s.d/√n = s.d/√40
⇒ s.d = 0.5 × √40 = 3.16
(5) A random sample of size 15 is drawn. The QQplot is given below. Suppose that the sample mean is X̄ = 0.606 and the population variance
is σ² = 1.
(a) Construct a 95% CI for the mean.
[0.606 ± 1.96 × √(1/15)] = [0.1, 1.11]
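The interval can be reproduced with a short Python sketch. It is not part of the original solution and assumes Python 3.8 or later; the multiplier is taken from the inverse normal CDF in place of the tabulated 1.96.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma2, n = 0.606, 1.0, 15

z = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
half_width = z * sqrt(sigma2 / n)      # z times the standard error
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))
```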
(b) Based on the QQplot comment on whether the 95% CI for the
mean is reliable. Give a reason for your answer.
The sample size n = 15 is small, hence for the CI to be reliable the distribution of the population should be close to
normal. Looking at the QQplot of the observations, the
points tend to be on the 45° line, suggesting that the observations have come from a distribution which does not differ
much from a normal distribution. Based on this observation the 95% confidence interval appears to be reliable.
(6) Suppose that the population mean and variance are µ and 10 respectively,
and the distribution is bimodal. A random sample of size 30 is drawn
from this population and the sample mean, X̄ = (1/30) Σ_{i=1}^{30} X_i, is evaluated.
(i) What is the approximate distribution of X̄ (give the mean and
variance)? Give a reason for your answer.
Even though the original population is bimodal (does not
look at all normal), as the sample size is relatively large
(30 observations) it is reasonable to suppose that the
sample mean is close to normal (just play with the applet to see this). Therefore roughly speaking we can say
X̄ ∼ N (µ, 10/30).
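In the same spirit as the applet, the claim can be explored by simulation. The sketch below is an illustration under stated assumptions: the bimodal population is a hypothetical equal mixture of N(−3, 1) and N(3, 1), chosen because it has mean 0 and variance 1 + 3² = 10, matching the variance in the question. The function name draw is also hypothetical.

```python
import random
from statistics import mean, variance

random.seed(1)  # fixed seed so the run is repeatable

def draw():
    """One draw from a hypothetical bimodal population:
    an equal mixture of N(-3, 1) and N(3, 1), with mean 0 and variance 10."""
    centre = random.choice((-3.0, 3.0))
    return random.gauss(centre, 1.0)

n, reps = 30, 2000
# Sample mean of n draws, repeated reps times
xbars = [mean(draw() for _ in range(n)) for _ in range(reps)]

# The mean of the sample means should be near 0 and their variance near 10/30
print(round(mean(xbars), 3), round(variance(xbars), 3))
```

A histogram of xbars would show a single bell-shaped peak, even though individual draws cluster around −3 and 3.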
(ii) Over your sketch make a sketch of the (density) distribution of X̄.
A bimodal density with a much narrower normal density superimposed over the top. They both share the same mean
µ.
(iii) Suppose that the population mean is µ = 5. Find the probability
that the sample mean X̄ is greater than 6.5.
This is a calculation using the normal distribution N (5, 10/30). Center
the distribution about 5, place 6.5 to the right of 5 and
calculate the area to the right of 6.5 using the normal tables.
P {X̄ > 6.5} = P {Z > (6.5 − 5)/√(10/30)} = P {Z > 1.5/0.577} = P {Z >
2.60} = 0.0047 ≈ 0.5%
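As a numerical cross-check of this tail area, here is a minimal Python sketch. It is not part of the original solution and assumes Python 3.8 or later; it uses the normal approximation X̄ ∼ N (5, 10/30) from part (i) without rounding the standard error.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 5, 10, 30

se = sqrt(sigma2 / n)                    # standard error, about 0.577
p = 1 - NormalDist(mu, se).cdf(6.5)      # area to the right of 6.5
print(round(se, 3), round(p, 4))
```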