Chapter 9 Exercises
1. Suppose X is a variable that follows the normal distribution with known standard
deviation σ = 0.3 but unknown mean µ.
(a) Construct a 95% confidence interval for µ if a random sample of n = 16 observations
of X has sample mean x̄ = 5.
(b) Suppose that we want the entire width of the confidence interval to be equal to 0.04.
Find the sample size n needed.
2. A sample of size n = 100 of a variable Y is taken. The sample mean of these 100
observations is found to be ȳ = 1450. Assume that the population standard deviation is
σ = 50.
(a) Construct a 95% confidence interval for µ, the population mean of Y .
(b) What sample size is needed so that the length of the interval is 10 with 95% confidence?
3. Five observations of a variable W are taken: 680, 705, 690, 783, and 702. Construct a
95% confidence interval for µ, the population mean of W . State any assumptions needed for
this confidence interval to be valid.
4. In a rural area of a developing country, a survey is conducted to estimate the proportion,
p, of households that have access to clean water. Out of the 1000 households surveyed, 650
reported they have clean water.
(a) Construct a 95% confidence interval for p and state all assumptions.
(b) Find the sample size needed so that the margin of error will be ±0.02 with confidence
level 95%.
5. A researcher wants to estimate the prevalence of a disease in a country. What sample
size should be used if she desires to be 95% confident that the final estimate is within 0.05
of the true prevalence?
6. A student recorded the duration (T in minutes) on 20 occasions when the course website
was down:
5.9, 5.1, 5.7, 8.8, 10.2, 8.3, 3.5, 9.2, 8.5, 7.3
19.6, 7.5, 0.3, 2.1, 2.0, 0.5, 0.9, 5.9, 0.4, 0.5
(a) The student wants to study the population mean duration of time that the website is
down. She assumes that the data is normally distributed. Based on this assumption,
find a 95% confidence interval of the mean duration.
(b) Suppose she is subsequently told that T actually follows an Exp(λ) distribution. Based
on this new piece of information, find another 95% confidence interval.
(c) Compare the answers in (a) and (b) and discuss their differences.
7. The Farøes is a group of islands situated about halfway between Norway and Iceland.
The islands have been a dependency of the Kingdom of Denmark since the 1300s. However,
over the past few decades, there has been an increasing desire among inhabitants of the
islands to seek independence from Denmark. A random sample of 1200 inhabitants is used
in a survey, where each person gives their opinion (X) on whether the Farøe Islands should
become an independent country. The survey results are as follows:
$$(x_1, \ldots, x_{1200}) = (\underbrace{0, 0, \ldots, 0}_{636\ \text{observations}},\ \underbrace{1, 1, \ldots, 1}_{564\ \text{observations}}),$$
where xi = 1 if the i-th person supports independence and xi = 0 otherwise. Suppose the
observations are IID Bernoulli(p), where p represents the proportion of all inhabitants who
want the islands to be independent.
(a) How many observations of X are there?
(b) Find the MLE, p̂, of p, based on the given data.
(c) Use the CLT to find a 95% confidence interval for p.
(d) What is the margin of error in your estimate?
(e) A local politician claims that there is enough evidence in the results to suggest that
50% of all inhabitants of Farøe Islands want independence. Does your analysis shed
some light on her comments?
8. The name of Farøe Islands is derived from the word Faøroyar, meaning “sheep”. Since
Viking times, wool products have been of major importance for the subsistence of the islands.
A particular farm owns a herd of sheep. The sheep are free to roam around the mountains
surrounding the farm. The sheep are natural climbers and they can scale the steepest of
slopes. However, occasionally, a sheep may be trapped and require rescue. Suppose that
on 6 out of 120 days, a sheep required rescue.
(a) Use the CLT to find a 95% confidence interval for p, the proportion of days when a
sheep requires rescue. You may use some of the results you found in Question 1 to
answer this question.
(b) What is the margin of error in your estimate?
(c) If it is desired to reduce the margin of error by a factor of 1/2, by how much does the
sample size need to be increased? (i.e., if E1 is the margin of error under the current sample,
then we want E2 = (1/2) E1 under the new sample size.)
9. Tourism accounts for a substantial part of the islands’ economy. Apart from the
spectacular scenery and landscape, many visitors to the islands want to see the northern
lights, or Aurora Borealis. Northern lights are displays of light formed from the collision
of solar clouds and the Earth’s magnetic field and are best observed at night in the northern
hemisphere. Let X be the duration (in minutes on the log-scale) of the display of lights on
any particular occasion and suppose X ∼ N(µ, σ²), with (µ, σ²) unknown. Suppose the
durations (in minutes on the log-scale) of a random sample of 30 displays are recorded and
they are:
5.3, 6.8, 5.1, 6.9, 6.2, 4.2, 5.7, 5.9, 3.7, 5.6, 6.0, 7.7, 3.4, 4.5, 5.9,
4.1, 4.5, 6.3, 5.8, 5.0, 7.1, 4.2, 5.4, 5.6, 5.3, 5.4, 7.9, 5.0, 4.9, 5.8
(a) Find the MLE, (µ̂, σ̂²), of (µ, σ²), based on the given data.
(b) Use the CLT to find a 95% confidence interval for µ.
(c) Express your 95% confidence interval from (b) in terms of minutes in its original scale.
(d) What should be the sample size if we want to reduce the margin of error (in log-scale)
to ± 0.2?
10. A group of scientists studying global warming has arrived on the islands. They took 60
observations of the time (in days) between days when the temperature on the islands exceeded
10 degrees Celsius. Suppose the data are IID Exp(λ), where 1/λ represents the mean time
between days with temperature exceeding 10 degrees Celsius. Suppose
$$\bar{x} = \frac{1}{60}\sum_{i=1}^{60} x_i = 19.8$$
(in days).
(a) Find the MLE of λ, based on the given data.
(b) Use the CLT to find a 95% confidence interval for λ. You may use the fact that var(1/x̄) ≈
λ²/n for reasonably large values of n. Note that in general var(1/x̄) does NOT equal 1/var(x̄).
(c) Fifty years ago, λ = 0.04 (or mean time= 25 days). Does your analysis give evidence
that λ has changed from 50 years ago?
11. Arguably the biggest industry on the islands is fishing and fish farming. Recently,
fishermen have been complaining that their incomes are dropping due to competition and
overfishing in the waters surrounding the islands. Data are collected to determine whether
there is enough evidence to support the fishermen’s claims. The data consist of records of
the catch X (in 1000 kg, same below) from m = 80 fishing trawlers five years ago and the
catch Y from n = 70 fishing trawlers this year. Summary statistics of the data are given below:
$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i = 175.3, \qquad \frac{1}{m}\sum_{i=1}^{m} (x_i - \bar{x})^2 = 1800.4,$$
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = 155.8, \qquad \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = 1182.8.$$
Assume X ∼ N(µX, σ²X) and Y ∼ N(µY, σ²Y), where (µX, σ²X) and (µY, σ²Y) are unknown.
Furthermore, assume all data are independent of each other.
(a) Find the MLE of µX and µY and hence find the MLE for µX − µY, using the given
data. You may use established results from Question 3.
(b) Let µ̂X−Y be the MLE of µX − µY. Show that
$$\operatorname{var}(\hat{\mu}_{X-Y}) = \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}.$$
(Hint: Use the fact that var(µ̂X) = σ²X/m, var(µ̂Y) = σ²Y/n and recall the rules of
var(X + Y) in Chapter 5.)
(c) Use the CLT to find a 95% confidence interval for µX − µY . Does your analysis give
evidence that the amount of catch has depleted compared to five years ago?
ANSWERS
(1a) A 95% confidence interval is
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 5 \pm 1.96\,\frac{0.3}{\sqrt{16}} = 5 \pm 0.147.$$
We are 95% confident that µ is between 4.853 and 5.147.
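The arithmetic in (1a) can be checked with a few lines of Python. This is a minimal sketch, not part of the original answers; the function name `z_interval` is our own, and the multiplier 1.96 is the usual 95% value from the normal table.

```python
import math

def z_interval(mean, sigma, n, z=1.96):
    """Known-sigma interval: returns (lower, upper, margin of error)."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin, margin

# Question 1(a): sigma = 0.3, n = 16, sample mean 5
lower, upper, margin = z_interval(mean=5, sigma=0.3, n=16)
print(f"{lower:.3f} to {upper:.3f} (margin {margin:.3f})")
```

The same function applies to any question where σ is known, for example with mean 1450, σ = 50 and n = 100.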
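(b) The width of the confidence interval is
$$2 \times 1.96\,\frac{0.3}{\sqrt{n}}.$$
If we want the width to be no more than 0.04, then we find n such that
$$0.04 = 2 \times 1.96\,\frac{0.3}{\sqrt{n}} \;\Rightarrow\; n = \left(2 \times 1.96\,\frac{0.3}{0.04}\right)^2 = 864.36,$$
which we round up to n = 865.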
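The sample-size calculation can be sketched as a small helper. The name `n_for_width` is hypothetical; the only idea used is solving width = 2 × 1.96 σ/√n for n and rounding up to the next whole observation.

```python
import math

def n_for_width(sigma, width, z=1.96):
    """Smallest n for which the known-sigma interval has total width <= width."""
    return math.ceil((2 * z * sigma / width) ** 2)

print(n_for_width(sigma=0.3, width=0.04))  # 865
print(n_for_width(sigma=50, width=10))     # 385
```

Rounding up rather than to the nearest integer guarantees the width requirement is actually met.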
(2a) A 95% confidence interval is
$$\bar{y} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 1450 \pm 1.96\,\frac{50}{\sqrt{100}} = 1450 \pm 9.8.$$
We are 95% confident that µ is between 1440.2 and 1459.8.
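(b) The width of the confidence interval is
$$2 \times 1.96\,\frac{50}{\sqrt{n}}.$$
If we want the width to be no more than 10, then we find n such that
$$10 = 2 \times 1.96\,\frac{50}{\sqrt{n}} \;\Rightarrow\; n = \left(2 \times 1.96\,\frac{50}{10}\right)^2 = 384.16,$$
which we round up to n = 385.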
(3) We may be able to consider a 95% confidence interval
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
if the sample comes from a normal distribution and σ is known. However, in this case, σ is
unknown and since the sample size n is small, we estimate σ using a sample estimate σ̂ and
we replace 1.96 by a number from the t-table. Since n = 5, df = n − 1 = 4, the number we
use is 2.776, hence a 95% confidence interval is
$$\bar{x} \pm 2.776\,\frac{\hat{\sigma}}{\sqrt{n}} \approx 712 \pm 2.776\,\frac{40.9}{\sqrt{5}} \approx 712 \pm 50.8,$$
where σ̂ is the sample standard deviation, $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ (this estimate is better than
the alternative estimate $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, since n = 5 is quite small).
Therefore, we are 95% confident that µ is between 661.2 and 762.8.
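The numbers in (3) can be reproduced from the raw data. The sketch below uses only the Python standard library, so the t-table value 2.776 (df = 4) is typed in by hand rather than computed; `t_interval` is our own helper name.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Return (mean, sample sd with n-1 divisor, margin of error)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # divides by n - 1
    margin = t_crit * sd / math.sqrt(n)
    return mean, sd, margin

w = [680, 705, 690, 783, 702]
mean, sd, margin = t_interval(w, t_crit=2.776)  # 2.776 from the t-table, df = 4
print(f"{mean} ± {margin:.1f}  (sd = {sd:.1f})")
```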
(4a) We assume there is a probability p that a household has access to clean water and that
households’ access to clean water is independent from one household to another. A 95%
confidence interval is
$$\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}.$$
Since p is unknown, we estimate the margin of error using p̂, giving
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{650}{1000} \pm 1.96\sqrt{\frac{\frac{650}{1000}\cdot\frac{350}{1000}}{1000}} = 0.65 \pm 0.0296.$$
We are 95% confident that p is between 0.62 and 0.68.
(b) The margin of error is
$$1.96\sqrt{\frac{p(1-p)}{n}}.$$
Since p is unknown, and the largest margin of error for a particular value of n occurs when
p = 0.5, we find n such that
$$0.02 = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} \;\Rightarrow\; n = \left(\frac{1.96}{0.02}\right)^2 (0.25) = 2401.$$
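A short sketch of the worst-case sample-size rule follows. The helper name `n_for_margin` is an assumption of ours. Exact fractions are used instead of floats because (1.96/0.02)² × 0.25 is exactly 2401, and a tiny floating-point overshoot could otherwise push the rounded-up answer to 2402.

```python
import math
from fractions import Fraction

def n_for_margin(margin, z="1.96", p="0.5"):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin; p = 0.5 is the worst case."""
    z, p, e = Fraction(z), Fraction(p), Fraction(margin)
    return math.ceil((z / e) ** 2 * p * (1 - p))

print(n_for_margin("0.02"))  # 2401
print(n_for_margin("0.05"))  # 385
```

Passing the inputs as strings keeps the decimal values exact when they are converted to fractions.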
5. Since a prevalence is a proportion p, 0 < p < 1, which is unknown, a confidence interval
estimate has the form
$$\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}},$$
meaning that we are 95% confident that p lies within the margin of error, $1.96\sqrt{p(1-p)/n}$, of p̂.
Using the same argument as in Question 4, we replace the unknown p by the value 0.5
that would lead to the largest margin of error, then we find n such that
$$0.05 = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} \;\Rightarrow\; n = \left(\frac{1.96}{0.05}\right)^2 (0.25) = 384.16,$$
which we round up to n = 385.
(6a) Based on the information, we may consider a 95% confidence interval
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
if the sample comes from a normal distribution and σ is known. However, in this case, σ is
unknown and since the sample size n is small, we estimate σ using a sample estimate σ̂ and
we replace 1.96 by a number from the t-table. For n = 20, df = n − 1 = 19, the number we
use is 2.093, hence a 95% confidence interval is
$$\bar{x} \pm 2.093\,\frac{\hat{\sigma}}{\sqrt{n}} \approx 5.61 \pm 2.093\,\frac{4.71}{\sqrt{20}} \approx 5.61 \pm 2.20,$$
where σ̂ is a sample estimate of the population standard deviation. We use
$\hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ here; alternatively, we could have used
$\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, but for small n, the former is better.
We are 95% confident that the mean is between 3.41 and 7.81.
(b) Assuming the observations follow an Exp(λ) distribution, the mean 1/λ can be
estimated by 1/λ̂ = x̄. However, for an exponential distribution the standard deviation is
also 1/λ, hence we also use x̄ to estimate the standard deviation. So as long as the sample
size is assumed to be “big”, an approximate 95% confidence interval is
$$\bar{x} \pm 1.96\,\frac{\bar{x}}{\sqrt{n}} \approx 5.61 \pm 1.96\,\frac{5.61}{\sqrt{20}} \approx 5.61 \pm 2.46.$$
We are 95% confident that the mean is between 3.15 and 8.07.
(c) Comparing (a) to (b), the main difference is the way the margin of error, 1.96σ̂/√n, is
estimated. We aim to estimate that as well as possible. The estimate using σ̂ = x̄ is the MLE
when the data follow an exponential distribution and hence (b) is better than (a) under that
assumption. Alternatively, σ̂ estimated by $\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ is a simple sample standard
deviation without any assumptions; furthermore, when the normality assumption holds, (a)
gives a confidence interval with correct level of confidence and (a) is better than (b) because
in that case σ̂ = x̄ is biased for σ.
To conclude, we choose a confidence interval that utilizes the information that is given.
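To see the two margins of error side by side, the calculation for Question 6 can be sketched from the raw data. The variable names are ours, and the multipliers 2.093 (t-table, df = 19) and 1.96 are entered by hand.

```python
import math
import statistics

t = [5.9, 5.1, 5.7, 8.8, 10.2, 8.3, 3.5, 9.2, 8.5, 7.3,
     19.6, 7.5, 0.3, 2.1, 2.0, 0.5, 0.9, 5.9, 0.4, 0.5]
n = len(t)
mean = statistics.mean(t)
sd = statistics.stdev(t)  # sample sd, n - 1 divisor

# (a) normal assumption: sd estimated by the sample sd, multiplier from the t-table
t_margin = 2.093 * sd / math.sqrt(n)
# (b) exponential assumption: sd estimated by the sample mean, multiplier 1.96
exp_margin = 1.96 * mean / math.sqrt(n)

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"(a) margin = {t_margin:.2f}, (b) margin = {exp_margin:.2f}")
```

The two margins differ only through the estimate of the standard deviation (sample sd versus sample mean) and the multiplier.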
(7a) Each xi, i = 1, ..., 1200, is an observation of X. Therefore, there are n = 1200
observations.
(b) Let p̂ be the MLE, then
$$\hat{p} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{564}{1200} = 0.47.$$
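(c) According to the CLT, in a random sample of size n, as long as n is large,
$$\hat{p} \sim N\!\left(p,\ \operatorname{var}(\hat{p}) = \frac{p(1-p)}{n}\right).$$
Therefore, using the CLT, a 95% confidence interval for p is
$$\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}} \approx \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{564}{1200} \pm 1.96\sqrt{\frac{\frac{564}{1200}\left(1-\frac{564}{1200}\right)}{1200}} = 0.47 \pm 0.0282.$$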
(d) The margin of error is 0.0282.
(e) According to the 95% confidence interval, the level of support is between
(0.47 − 0.0282, 0.47 + 0.0282) = (0.442, 0.498).
Since the upper limit is less than 0.5, we can say that we are 95% confident that the politician
is wrong.
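The CLT-based interval for a proportion can be sketched as below. The function name `wald_interval` is our own label for the p̂ ± 1.96 √(p̂(1 − p̂)/n) formula used in this answer.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Return (p_hat, margin) for the CLT-based interval p_hat ± margin."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin

p_hat, margin = wald_interval(564, 1200)
claim = 0.5
claim_inside = p_hat - margin <= claim <= p_hat + margin
print(f"{p_hat:.2f} ± {margin:.4f}; 0.5 inside interval: {claim_inside}")
```

Calling `wald_interval(6, 120)` gives the corresponding estimate and margin for a sample with 6 events in 120 trials.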
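(8a) The sample size is n = 120.
Using the CLT, a 95% confidence interval for p is
$$\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}} \approx \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \frac{6}{120} \pm 1.96\sqrt{\frac{\frac{6}{120}\left(1-\frac{6}{120}\right)}{120}} = 0.05 \pm 0.0389953.$$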
(b) The margin of error is 0.0389953.
(c) Let m be the new sample size, so we want
$$\underbrace{1.96\sqrt{\frac{p(1-p)}{m}}}_{\text{new margin of error}} = \frac{1}{2}\,\underbrace{\left(1.96\sqrt{\frac{p(1-p)}{n}}\right)}_{\text{old margin of error}}$$
$$\frac{1}{\sqrt{m}} = \frac{1}{2}\,\frac{1}{\sqrt{n}}$$
$$\frac{1}{m} = \frac{1}{4}\,\frac{1}{n}$$
$$m = 4n$$
The answer shows that the new sample size should be 480 = 4 × 120. Therefore, we need
4 times the original sample size to reduce the margin of error by a factor of 1/2. The
general rule is, for a reduction of every factor of 1/2 in the margin of error, we require a
4-fold increase in the sample size. For example, if we want to reduce the margin of error
by a factor of 1/16, then since 16 = 2 × 2 × 2 × 2, we need to increase the sample size by
4 × 4 × 4 × 4 = 256 times.
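The scaling rule can be illustrated numerically. In this sketch (our own helper `margin`, with p held fixed at the estimate 0.05), the sample size is multiplied by 4, 16 and 256 and the ratio of new to old margin of error is printed.

```python
import math

def margin(p, n, z=1.96):
    """Margin of error z * sqrt(p(1-p)/n) for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p, n = 0.05, 120
for k in (1, 4, 16, 256):
    ratio = margin(p, k * n) / margin(p, n)
    print(f"n = {k * n:6d}: margin ratio = {ratio:.4f}")
```

The ratio depends only on the factor k through 1/√k, so the same pattern holds for any p and any starting n.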
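(9a) Let x1, ..., xn be IID N(µ, σ²). The estimates (µ̂, σ̂²) are:
$$\hat{\mu} = \bar{x} = 5.506667, \qquad \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 1.202023,$$
so in terms of minutes, the mean duration is exp(5.506667), or about 246 minutes.
(Strictly, the MLE of σ² uses the divisor n rather than n − 1, which gives 34.8587/30 ≈ 1.162;
the value 1.202023 is the sample variance, and it is the value used in parts (b) to (d).)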
(b) Using the CLT, µ̂ ∼ N(µ, var(µ̂) = σ²/n).
Therefore, using the CLT, if we use the value from the t-table based on df = n − 1 = 30 − 1 = 29,
a 95% confidence interval for µ is
$$\hat{\mu} \pm 2.045\sqrt{\frac{\sigma^2}{n}} \approx \hat{\mu} \pm 2.045\sqrt{\frac{\hat{\sigma}^2}{n}} = 5.506667 \pm 2.045\sqrt{\frac{1.202023}{30}} = 5.506667 \pm 0.4093446.$$
(c) From (b), the 95% confidence interval on log-scale can be written as
(5.506667 − 0.4093446, 5.506667 + 0.4093446)
which, in terms of minutes, is
[exp(5.506667 − 0.4093446), exp(5.506667 + 0.4093446)] ≈ (163.6, 370.9)
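The back-transformation in (c) amounts to exponentiating both endpoints of the log-scale interval. A minimal sketch, using the numbers quoted in (b) as inputs:

```python
import math

mu_hat = 5.506667   # estimate on the log-scale, from part (a)
margin = 0.4093446  # margin of error on the log-scale, from part (b)

# exp is increasing, so the endpoints map directly to the original scale
lower = math.exp(mu_hat - margin)
upper = math.exp(mu_hat + margin)
print(f"({lower:.1f}, {upper:.1f}) minutes")
```

Note that the interval on the original scale is not symmetric about exp(5.506667), because the exponential function stretches the upper half more than the lower half.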
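(d) The expression for the margin of error (on log-scale) is
$$2.045\sqrt{\frac{\sigma^2}{n}} \approx 0.4093446,$$
using the sample size of n = 30 and estimating σ² by σ̂² = 1.202023 in the expression for
margin of error. To reduce the margin of error to ±0.2, we can approximate the new sample
size by setting
$$2.045\sqrt{\frac{\sigma^2}{n}} = 0.2,$$
and solving for n. The above equation gives:
$$\begin{aligned}
\sqrt{\frac{\sigma^2}{n}} &= \frac{0.2}{2.045} \\
\Rightarrow\ \frac{\sigma^2}{n} &= \left(\frac{0.2}{2.045}\right)^2 \\
\Rightarrow\ \frac{1}{n} &= \frac{\left(\frac{0.2}{2.045}\right)^2}{\sigma^2} \\
\Rightarrow\ n &= \frac{\sigma^2}{\left(\frac{0.2}{2.045}\right)^2} \\
\Rightarrow\ n &\approx \frac{\hat{\sigma}^2}{\left(\frac{0.2}{2.045}\right)^2} = \frac{1.202023}{\left(\frac{0.2}{2.045}\right)^2} = 125.7 \approx 126.
\end{aligned}$$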
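The whole of Question 9 can be recomputed from the 30 recorded values. This is a sketch with our own variable names; the t-table value 2.045 (df = 29) is typed in by hand, and both the n − 1 and the n divisor are shown for the variance.

```python
import math
import statistics

x = [5.3, 6.8, 5.1, 6.9, 6.2, 4.2, 5.7, 5.9, 3.7, 5.6, 6.0, 7.7, 3.4, 4.5, 5.9,
     4.1, 4.5, 6.3, 5.8, 5.0, 7.1, 4.2, 5.4, 5.6, 5.3, 5.4, 7.9, 5.0, 4.9, 5.8]
n = len(x)
mu_hat = statistics.mean(x)
var_sample = statistics.variance(x)  # divisor n - 1
var_mle = statistics.pvariance(x)    # divisor n

t_crit = 2.045  # t-table, df = 29
margin = t_crit * math.sqrt(var_sample / n)

# sample size needed for a log-scale margin of 0.2, keeping the same multiplier
n_needed = math.ceil(var_sample / (0.2 / t_crit) ** 2)

print(f"mean = {mu_hat:.6f}, variance (n-1) = {var_sample:.6f}, variance (n) = {var_mle:.6f}")
print(f"margin = {margin:.4f}, n needed = {n_needed}")
```

Keeping 2.045 for the larger sample is itself an approximation, since the t multiplier shrinks towards 1.96 as n grows.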
(10a) The MLE of λ is:
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}} = \frac{1}{19.8} = 0.05050.$$
(b) Using the CLT, λ̂ ∼ N(λ, var(λ̂)). Assuming n is large enough, and since we are using
an MLE,
$$\operatorname{var}(\hat{\lambda}) \approx \frac{1}{n}\lambda^2.$$
But λ² is unknown, so we estimate var(λ̂) by (1/n) λ̂². Therefore, using the CLT, a 95% confidence
interval for λ is
$$\hat{\lambda} \pm 1.96\sqrt{\frac{\lambda^2}{n}} \approx \hat{\lambda} \pm 1.96\sqrt{\frac{\hat{\lambda}^2}{n}} = \frac{1}{19.8} \pm 1.96\sqrt{\frac{1/19.8^2}{60}} = 0.05050 \pm 0.01277.$$
(c) According to the 95% confidence interval, λ is between
(0.05050 − 0.01277, 0.05050 + 0.01277) = (0.03772, 0.06327).
Since the interval includes 0.04, we cannot say that the rate is different from 50 years ago.
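A sketch of the calculation for the exponential rate, using only the summary values n = 60 and x̄ = 19.8 (variable names are ours). The last digit may differ slightly from the hand calculation because of rounding.

```python
import math

n, xbar = 60, 19.8
lam_hat = 1 / xbar                       # MLE of the rate
margin = 1.96 * lam_hat / math.sqrt(n)   # uses var(lam_hat) ≈ lam^2 / n
lower, upper = lam_hat - margin, lam_hat + margin

old_rate = 0.04  # rate fifty years ago
print(f"({lower:.4f}, {upper:.4f}); contains 0.04: {lower <= old_rate <= upper}")
```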
(11a) From Question 9, we know the MLE of (µX, σ²X) based on (x1, ..., xm) are
$$\hat{\mu}_X = \bar{x} = 175.3, \qquad \hat{\sigma}_X^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \bar{x})^2 = 1800.4.$$
Similarly, the MLE of (µY, σ²Y) based on (y1, ..., yn) are
$$\hat{\mu}_Y = \bar{y} = 155.8, \qquad \hat{\sigma}_Y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = 1182.8.$$
Therefore, an estimate for µX − µY is µ̂X − µ̂Y = 175.3 − 155.8 = 19.5.
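(b) Recall from Chapter 5 that, for independent random variables X and Y,
var(X − Y) = var(X) + var(Y). Hence
$$\begin{aligned}
\operatorname{var}(\hat{\mu}_{X-Y}) = \operatorname{var}(\hat{\mu}_X - \hat{\mu}_Y) &= \underbrace{\operatorname{var}(\hat{\mu}_X) + \operatorname{var}(\hat{\mu}_Y)}_{\text{X's and Y's are independent samples}} \\
&= \operatorname{var}(\bar{x}) + \operatorname{var}(\bar{y}) \\
&= \underbrace{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}_{\text{From Question 3}}.
\end{aligned}$$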
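(c) Using the CLT for MLE,
$$\hat{\mu}_{X-Y} \sim N\big(\mu_X - \mu_Y,\ \operatorname{var}(\hat{\mu}_{X-Y})\big) = N\!\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}\right),$$
where the last result comes from (b).
Therefore, using the CLT, a 95% confidence interval for µX − µY is
$$\begin{aligned}
\hat{\mu}_X - \hat{\mu}_Y \pm 1.96\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}} &\approx \hat{\mu}_X - \hat{\mu}_Y \pm 1.96\sqrt{\frac{\hat{\sigma}_X^2}{m} + \frac{\hat{\sigma}_Y^2}{n}} \\
&= 19.5 \pm 1.96\sqrt{\frac{1800.4}{80} + \frac{1182.8}{70}} \\
&= 19.5 \pm 12.30.
\end{aligned}$$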
According to the 95% confidence interval, the mean difference is between
(19.5 − 12.3, 19.5 + 12.3) = (7.2, 31.8).
Since the lower limit of the interval is above zero, we are 95% certain that the average catch
has decreased by more than 7200 kg from five years ago. So the claims from the fishermen
are supported.
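The two-sample interval can be checked from the summary statistics alone. In this sketch the function name `diff_interval` is our own; the variances passed in are the divisor-m and divisor-n estimates given in the question.

```python
import math

def diff_interval(mean_x, var_x, m, mean_y, var_y, n, z=1.96):
    """Return (difference in means, margin of error) for independent samples."""
    diff = mean_x - mean_y
    margin = z * math.sqrt(var_x / m + var_y / n)
    return diff, margin

diff, margin = diff_interval(175.3, 1800.4, 80, 155.8, 1182.8, 70)
lower, upper = diff - margin, diff + margin
print(f"{diff:.1f} ± {margin:.2f} -> ({lower:.1f}, {upper:.1f}); lower limit > 0: {lower > 0}")
```

Since X and Y are measured in units of 1000 kg, a lower limit of about 7.2 corresponds to roughly 7200 kg.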