Solution Exercise 14.4

A) The approximate 90% confidence interval for $\mu$ for the data from Exercise 12.9 can be calculated using the formula

$$\bar{x} \pm 1.645\,\frac{\hat{\sigma}}{\sqrt{n}},$$

where $\hat{\sigma}$ is the usual version (divisor $n-1$) of the estimate of the standard deviation of the process and $n$ is the sample size of the data. Given the “x” values from Exercise 12.9 as shown below, R can be used to find the sample average and the estimate of the standard deviation.

6.11, 1.80, 2.32, 1.17, 5.28, 0.62, 0.68, 0.43, 1.18, 2.20, 1.24, 1.92, 0.63, 1.18

Plugging in the sample average, $\bar{x} = 1.911$, the estimate $\hat{\sigma} = 1.717$, the sample size $n = 14$, and the critical value for a 90% confidence interval, 1.645, the following is obtained:

Usual standard deviation estimator:

$$1.911 \pm 1.645\,\frac{1.717}{\sqrt{14}} = 1.911 \pm 0.755$$

Thus, using the usual standard deviation estimator, the 90% confidence interval is approximately (1.16, 2.67).
__________________________________________________________________
R-Code:
x = c(6.11, 1.80, 2.32, 1.17, 5.28, 0.62, 0.68, 0.43, 1.18, 2.20, 1.24, 1.92, 0.63, 1.18)
m = mean(x)        # sample average
n = length(x)      # sample size
s = sd(x)          # usual (n - 1) standard deviation estimate; named s so sd() is not masked
left1 = m - (1.645 * s) / sqrt(n)     # lower 90% limit
left1
right1 = m + (1.645 * s) / sqrt(n)    # upper 90% limit
right1
__________________________________________________________________

B) In repeated samples of size n = 14 from the same process, similarly constructed intervals will have different upper and lower limits, because every sample produces a different data set. However, approximately 90% of those intervals will capture the true $\mu$, so we can be approximately 90% confident that the interval calculated in Exercise 14.4A is correct. We cannot say that $\mu$ is definitely within the confidence interval. There is still roughly a 10% chance of being incorrect (i.e., of $\mu$ not being within the confidence interval).

E) Using bootstrap sampling in R to estimate the true confidence level of the interval procedure used in 14.4A, the true confidence level is estimated to be about 0.85, or 85%.
__________________________________________________________________
R-Code:
nsample = 14       # size of each bootstrap sample
NREP = 10000       # number of bootstrap samples
n = nsample * NREP
x = c(6.11, 1.80, 2.32, 1.17, 5.28, 0.62, 0.68, 0.43, 1.18, 2.20, 1.24, 1.92, 0.63, 1.18)
l = length(x)
p = rep(1/l, l)    # bootstrap distribution: probability 1/l on each observed value
# Draw all bootstrap values at once, then arrange them one sample per row
sim.surv.vec = sample(x, n, replace = TRUE, prob = p)
sim.surv.matrix = matrix(sim.surv.vec, nrow = NREP, ncol = nsample, byrow = TRUE)
ybar = rowMeans(sim.surv.matrix)
stdevs = apply(sim.surv.matrix, 1, sd)
lower.90.limits = ybar - 1.645 * stdevs / sqrt(nsample)
upper.90.limits = ybar + 1.645 * stdevs / sqrt(nsample)
m = sum(x * p)     # mean of the bootstrap distribution (the "true" mean here)
correct.ci = (lower.90.limits < m) * (upper.90.limits > m)   # 1 if the interval covers m
ci.limits = cbind(ybar, stdevs, lower.90.limits, upper.90.limits, m, correct.ci)
head(ci.limits)
mean(correct.ci)   # estimated true confidence level
__________________________________________________________________

Solution Exercise 14.5

A) The distribution that produced these data is $p(y)$. In this case, specifically, the data are iid Bernoulli($\pi$). In list form, the distribution can be described as the following:

y        p(y)
0        1 – π
1        π
Total    1.00

The mean of the distribution can be found as follows:

$$\mu = \sum_y y\,p(y) = (0)(1-\pi) + (1)(\pi) = \pi$$

The mean of the distribution is thus $\pi$. The variance of the distribution can be found as follows:

$$\sigma^2 = \sum_y (y-\mu)^2\,p(y) = (0-\pi)^2(1-\pi) + (1-\pi)^2\,\pi = \pi(1-\pi)\,[\pi + (1-\pi)] = \pi(1-\pi)$$

Thus, the variance of the distribution is $\pi(1-\pi)$. The standard deviation of the distribution is $\sqrt{\pi(1-\pi)}$.

B) The bootstrap distribution for the data in list form is shown below.

y        p̂(y)
0        7/20
1        13/20
Total    20/20 = 1.00

The mean of the bootstrap distribution can be found as follows:

$$\hat{\mu} = \sum_y y\,\hat{p}(y) = (0)\left(\tfrac{7}{20}\right) + (1)\left(\tfrac{13}{20}\right) = \tfrac{13}{20} = 0.65 = \bar{y}$$

Thus, the mean of the bootstrap distribution is 0.65. The variance of the bootstrap distribution can be found as follows:

$$\hat{\sigma}^2 = \sum_y (y-\bar{y})^2\,\hat{p}(y) = (0-0.65)^2\left(\tfrac{7}{20}\right) + (1-0.65)^2\left(\tfrac{13}{20}\right) = 0.2275$$

Thus, the variance of the bootstrap distribution is 0.2275. The standard deviation of the bootstrap distribution is calculated as follows:

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{0.2275} = 0.477$$

Thus, the standard deviation of the bootstrap distribution is 0.477.
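The bootstrap-distribution values for Exercise 14.5B can be checked numerically. The sketch below assumes the data consist of 13 ones and 7 zeros, as implied by the bootstrap distribution; the vector `y` and its ordering are constructed here for illustration and are not copied from the exercise. Because R's `var()` and `sd()` use the n − 1 divisor, the plug-in (divisor n) versions are computed directly.

```r
# Hypothetical data vector: 13 ones and 7 zeros, matching
# p-hat(0) = 7/20 and p-hat(1) = 13/20 (the ordering is an assumption).
y <- c(rep(1, 13), rep(0, 7))
n <- length(y)

mu.hat    <- mean(y)                 # plug-in mean
var.hat   <- mean((y - mu.hat)^2)    # plug-in variance (divisor n, not n - 1)
sigma.hat <- sqrt(var.hat)           # plug-in standard deviation

# Approximately 0.65, 0.2275, and 0.477
round(c(mu.hat, var.hat, sigma.hat), 4)

# For comparison, the usual (n - 1) estimate from sd() is slightly larger
sd(y)
```

The plug-in variance equals `var(y) * (n - 1) / n`, which is why the two standard deviation estimates differ slightly for a sample of size 20.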
The distribution in Exercise 14.5A is different from the bootstrap distribution because the distribution in Exercise 14.5A is the distribution for the process that produces the observed data; it is the model that produces the data. The bootstrap distribution, however, is an actual observed distribution of data that has come from the distribution in Exercise 14.5A. It is only one possible set of values from that distribution; there could be many others. Hence, the bootstrap distribution offers specific numbers for the mean, variance, and standard deviation, unlike the distribution from 14.5A, where we are unable to determine specific values because the parameter $\pi$ is unknown. However, even though the bootstrap distribution provides specific values, they are not the true parameter values, and in repeated samples those values would be similar but not exactly the same.

C) The approximate 95% interval for the mean of the distribution in Exercise 14.5A, using the formula

$$\bar{y} \pm 1.96\,\frac{\hat{\sigma}}{\sqrt{n}},$$

where $\hat{\sigma}$ is the plug-in estimate from Exercise 14.5B, can be found as follows:

$$0.65 \pm 1.96\left(\frac{0.477}{\sqrt{20}}\right) = 0.65 \pm 0.209$$

Thus, the 95% confidence interval is approximately (0.441, 0.859).

D) The interval obtained in Exercise 14.5C is identical to the Bernoulli confidence interval

$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}.$$

That fact can be shown by replacing $\hat{\pi}$ with the value 0.65, which is the proportion of 1s in the bootstrap distribution (for a Bernoulli distribution, this proportion is equal to the mean of the distribution), and replacing $n$ with the sample size of 20. The calculation is shown below.

$$0.65 \pm 1.96\sqrt{\frac{0.65(1-0.65)}{20}} = 0.65 \pm 0.209$$

Hence, the interval is again approximately (0.441, 0.859).
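The equivalence claimed in Exercise 14.5D can also be checked in R. This is a minimal sketch, assuming n = 20 observations with 13 ones and the conventional 1.96 critical value for an approximate 95% interval; the variable names are illustrative only.

```r
# Assumed inputs: n = 20, 13 ones, and the 1.96 critical value
n <- 20
pi.hat <- 13 / n
sigma.hat <- sqrt(pi.hat * (1 - pi.hat))   # plug-in SD for 0/1 data

# Interval in the form used in 14.5C: mean +/- 1.96 * sigma.hat / sqrt(n)
ci.mean <- pi.hat + c(-1, 1) * 1.96 * sigma.hat / sqrt(n)

# Interval in the Bernoulli form used in 14.5D
ci.bern <- pi.hat + c(-1, 1) * 1.96 * sqrt(pi.hat * (1 - pi.hat) / n)

round(ci.mean, 3)             # approximately 0.441 and 0.859
all.equal(ci.mean, ci.bern)   # TRUE: the two forms agree
```

The two forms agree because, for 0/1 data, the plug-in standard deviation is exactly the square root of $\hat{\pi}(1-\hat{\pi})$. The agreement would not be exact if the usual n − 1 standard deviation were used in the first form.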