Solution Exercise 14.4
A) The approximate 90% confidence interval for $\mu$ for the data from Exercise
12.9 can be calculated using the formula $\bar{x} \pm 1.645\,\hat{\sigma}/\sqrt{n}$,
where $\hat{\sigma}$ is the usual estimate of the standard deviation of the process
(the n - 1 version computed by R's sd function) and n is the sample size of the
data. Given the "x" values from Exercise 12.9 as shown below, R can be used to
find the sample average and the estimate of the standard deviation.
6.11, 1.80, 2.32, 1.17, 5.28, 0.62, 0.68, 0.43, 1.18, 2.20, 1.24, 1.92, 0.63, 1.18
Plugging in the sample average, $\bar{x} = 1.911$, and $\hat{\sigma} = 1.717$, and
using the critical value for a 90% confidence interval, 1.645, the following is
obtained:
Usual standard deviation estimator:
$1.911 \pm 1.645\,(1.717/\sqrt{14}) = 1.911 \pm 0.755$
Thus, using the usual standard deviation estimator, the 90% confidence interval is
approximately (1.16, 2.67).
__________________________________________________________________
R-Code:
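x=c(6.11,1.80,2.32,1.17,5.28,0.62,0.68,0.43,1.18,2.20,1.24,1.92,
0.63,1.18)
m=mean(x)                      # sample average
n=length(x)                    # sample size
s=sd(x)                        # usual (n-1) standard deviation estimate
left1=m-((1.645*s)/sqrt(n))    # lower 90% limit
left1
right1=m+((1.645*s)/sqrt(n))   # upper 90% limit
right1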
__________________________________________________________________
B) Based on the calculations from Exercise 14.4A, it can be said that in
repeated samples of size n = 14 from the same process, similarly constructed
intervals will have different upper and lower limits, because every sample
produces a different data set. However, approximately 90% of the intervals will
capture the true $\mu$, so we can be approximately 90% confident that the
interval (1.16, 2.67) is correct. We cannot say that $\mu$ is absolutely within
the confidence interval. There is still about a 10% chance of being incorrect
(i.e., $\mu$ not being within the confidence interval).
E) Using bootstrap sampling in R to estimate the true confidence level of the
interval procedure from 14.4A, the true confidence level is estimated to be
about 0.85, or 85%.
_________________________________________________________________
R-Code:
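nsample = 14       # size of each simulated sample
NREP = 10000       # number of simulated samples
n = nsample*NREP   # total number of simulated values
x=c(6.11,1.80,2.32,1.17,5.28,0.62,0.68,0.43,1.18,2.20,1.24,1.92,
0.63,1.18)
l=length(x)
p = rep(1/l,l)     # bootstrap distribution: probability 1/14 on each value
# Draw all values at once; no seed is set, so results vary slightly by run
sim.surv.vec = sample(x, n, replace=TRUE, prob=p)
sim.surv.matrix = matrix(sim.surv.vec, nrow=NREP, ncol = nsample,
byrow=TRUE)
ybar = rowMeans(sim.surv.matrix)          # mean of each simulated sample
stdevs = apply(sim.surv.matrix, 1, sd)    # sd of each simulated sample
lower.90.limits = ybar - 1.645*stdevs/sqrt(nsample)
upper.90.limits = ybar + 1.645*stdevs/sqrt(nsample)
m=sum(x*p)         # mean of the bootstrap distribution (the target)
# 1 if the interval captures m, 0 otherwise
correct.ci = (lower.90.limits<m)*(upper.90.limits>m)
ci.limits = cbind(ybar, stdevs, lower.90.limits, upper.90.limits,
m, correct.ci)
head(ci.limits)
mean(correct.ci)   # estimated true confidence level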
__________________________________________________________________
Solution Exercise 14.5
A) The distribution that produced these data is $p(y)$. In this case,
specifically, the distribution is iid Bernoulli.
In list form, the distribution can be described as the following:
y        p(y)
0        1 - π
1        π
Total    1.00
The mean of the distribution can be found as follows:
$\mu = \sum_y y\,p(y) = 0\,(1-\pi) + 1\,(\pi) = \pi$
The mean of the distribution is thus $\pi$.
The variance of the distribution can be found as follows:
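$\sigma^2 = \sum_y (y-\mu)^2\,p(y) = (0-\pi)^2(1-\pi) + (1-\pi)^2\,\pi$
$= \pi(1-\pi)\,[\pi + (1-\pi)] = \pi(1-\pi)$
Thus, the variance of the distribution is $\pi(1-\pi)$.
The standard deviation of the distribution is $\sqrt{\pi(1-\pi)}$.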
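As a quick numerical check of the mean and variance formulas, the following R sketch evaluates both sums directly from the list form. The value pi.val = 0.3 is purely an illustrative assumption; the true π is unknown in the exercise.

```r
# Illustrative check only: pi.val = 0.3 is an assumed value, not from the exercise
pi.val = 0.3
y = c(0, 1)                     # possible outcomes
p = c(1 - pi.val, pi.val)       # p(y) in list form
mu = sum(y * p)                 # mean: sum of y * p(y)
sigma2 = sum((y - mu)^2 * p)    # variance: sum of (y - mu)^2 * p(y)
sigma = sqrt(sigma2)            # standard deviation
c(mu, sigma2, sigma)
# The closed forms pi and pi*(1 - pi) should agree with the sums above
all.equal(mu, pi.val)
all.equal(sigma2, pi.val * (1 - pi.val))
```

Changing pi.val to any other value between 0 and 1 should leave both all.equal checks TRUE.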
B) The bootstrap distribution for the data in list form is shown below.
y        p̂(y)
0        7/20
1        13/20
Total    20/20 = 1.00
The mean of the bootstrap distribution can be found as follows:
$\hat{\mu} = \sum_y y\,\hat{p}(y) = 0\,(7/20) + 1\,(13/20) = 13/20 = 0.65 = \bar{y}$
Thus, the mean of the bootstrap distribution is 0.65.
The variance of the bootstrap distribution can be found as follows:
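$\hat{\sigma}^2 = \sum_y (y-\bar{y})^2\,\hat{p}(y) = (0-0.65)^2(7/20) + (1-0.65)^2(13/20)$
$= 0.4225\,(0.35) + 0.1225\,(0.65) = 0.2275$
Thus, the variance of the bootstrap distribution is 0.2275.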
The standard deviation of the bootstrap distribution is calculated as follows:
$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{0.2275} = 0.477$
Thus, the standard deviation of the bootstrap distribution is 0.477.
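The bootstrap calculations above can be reproduced in R. This is a minimal sketch in the style of the other R listings; the vector named data is a hypothetical reconstruction (7 zeros and 13 ones in arbitrary order), since the original observations are not listed here.

```r
y = c(0, 1)                                # possible outcomes
p.hat = c(7/20, 13/20)                     # bootstrap distribution in list form
mu.hat = sum(y * p.hat)                    # 0.65
sigma2.hat = sum((y - mu.hat)^2 * p.hat)   # 0.2275
sigma.hat = sqrt(sigma2.hat)               # about 0.477
c(mu.hat, sigma2.hat, sigma.hat)

# Hypothetical raw data consistent with the table (order is an assumption)
data = c(rep(0, 7), rep(1, 13))
mean(data)                     # same as mu.hat
mean((data - mean(data))^2)    # plug-in variance (divisor n), same as sigma2.hat
sd(data)                       # divisor n - 1, so slightly larger than sigma.hat
```

Note that the plug-in estimate divides by n, whereas R's sd divides by n - 1, so sd(data) comes out near 0.489 rather than 0.477.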
The distribution in Exercise 14.5A differs from the bootstrap distribution
because the distribution in Exercise 14.5A is the distribution of the process that
produces the observed data; it is the model that produces the data. The bootstrap
distribution, however, is the actual observed distribution of one data set that
came from the distribution in Exercise 14.5A. It is only one possible set of
values from that distribution; there could be many others. Hence, the bootstrap
distribution gives specific numbers for the mean, variance, and standard
deviation, unlike the distribution from 14.5A, where specific values cannot be
determined because the parameter $\pi$ is unknown. However, even though the
bootstrap distribution provides specific values, they are estimates rather than
the true values, and in repeated samples those estimates would be similar but not
exactly the same.
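The sample-to-sample variation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: pi.assumed = 0.65 and the seed are arbitrary choices made for illustration, since the true π is unknown.

```r
set.seed(1)            # arbitrary seed, only so the run can be repeated
pi.assumed = 0.65      # assumed "true" value, for illustration only
n = 20                 # sample size, as in the exercise
NREP = 5               # number of repeated samples to display
# Each row is one simulated sample of n Bernoulli(pi.assumed) values
y.mat = matrix(rbinom(n * NREP, 1, pi.assumed), nrow = NREP)
mu.hat = rowMeans(y.mat)                # bootstrap mean for each sample
sigma2.hat = mu.hat * (1 - mu.hat)      # plug-in variance for each sample
cbind(mu.hat, sigma2.hat, sigma.hat = sqrt(sigma2.hat))
```

Each row of the output is the set of bootstrap-distribution values one would have reported had that sample been the observed one.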
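C) The approximate 95% interval for the mean of the distribution in Exercise
14.5A using the formula $\bar{y} \pm 1.96\,\hat{\sigma}/\sqrt{n}$, where
$\hat{\sigma}$ is the plug-in estimate from Exercise 14.5B, can be found as
follows:
$0.65 \pm 1.96\,(0.477/\sqrt{20}) = 0.65 \pm 0.209$
Thus, the 95% confidence interval is approximately (0.441, 0.859).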
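The interval in part C can be checked in R with a few lines. This is a minimal sketch that takes the plug-in values from Exercise 14.5B as given; the variable names are illustrative.

```r
ybar = 13/20                  # mean of the bootstrap distribution (14.5B)
sigma.hat = sqrt(0.2275)      # plug-in standard deviation (14.5B)
n = 20                        # sample size
half.width = 1.96 * sigma.hat / sqrt(n)
lower = ybar - half.width
upper = ybar + half.width
round(c(lower, upper), 3)     # 0.441 0.859
```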
D) The interval obtained in Exercise 14.5C is identical to the Bernoulli
confidence interval $\hat{\pi} \pm 1.96\sqrt{\hat{\pi}(1-\hat{\pi})/n}$. That fact
can be shown by replacing $\hat{\pi}$ with the value 0.65, which is the proportion
of 1s in the bootstrap distribution (for a Bernoulli distribution, this
proportion equals the mean of the distribution), and replacing n with the sample
size of 20. The calculation is shown below.
$0.65 \pm 1.96\sqrt{0.65\,(1-0.65)/20} = 0.65 \pm 1.96\sqrt{0.011375} = 0.65 \pm 0.209$
Hence, the interval is approximately (0.441, 0.859), the same as in Exercise
14.5C.
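The agreement does not depend on these particular numbers. A short derivation, writing $\hat{\pi}$ for the sample proportion of 1s (so that $\bar{y} = \hat{\pi}$), suggests why the plug-in variance of 0/1 data always reduces to $\hat{\pi}(1-\hat{\pi})$:

```latex
\begin{align*}
\hat{\sigma}^2 &= \sum_y (y - \bar{y})^2\,\hat{p}(y) \\
  &= (0 - \hat{\pi})^2 (1 - \hat{\pi}) + (1 - \hat{\pi})^2\,\hat{\pi} \\
  &= \hat{\pi}(1 - \hat{\pi})\,[\hat{\pi} + (1 - \hat{\pi})] \\
  &= \hat{\pi}(1 - \hat{\pi})
\end{align*}
```

Consequently $\hat{\sigma}/\sqrt{n} = \sqrt{\hat{\pi}(1-\hat{\pi})/n}$, so the two intervals have the same center and the same half-width.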