Download Notes 19 - Wharton Statistics

Statistics 475 Notes 19
Reading: Lohr, Chapter 7.5, 9.3.3
Schedule:
The due date for Homework 5 (final homework) is
postponed to Monday, December 8th to allow adequate time
for working on project presentations.
Presentations will be December 1st and December 3rd (see
handout with schedule and guidelines).
Final report on your project due Wed., Dec. 17th, 5 p.m.
For complex surveys, it is nearly impossible to develop a
closed-form expression for the variance of many
estimators. An alternative approach to estimating variances
(i.e., finding standard errors) and forming approximate
confidence intervals is the bootstrap method. The
bootstrap depends on extensive resampling from the
original data, but it does not depend on any formula for
calculating a variance.
I. The Bootstrap Method
Basic idea: We would like to estimate the variance of a
sample statistic from a population:
Population -> Sample -> Statistic
We estimate the variance by assuming the population is
equal to the population estimated from the sample:
Estimated Population (from Sample) -> Sample -> Statistic
For a simple random sample, we estimate the population by
the empirical distribution from the sample. We take
samples from this estimated population by taking samples
of size n with replacement from the original sample of size
n (if the resampling were done without replacement, we
would get the same sample every time).
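To see why the resampling must be done with replacement, here is a minimal sketch. The vector x is a made-up toy sample used only for illustration (it is not data from these notes), and the seed is arbitrary.

```r
# Hypothetical toy sample, used only for illustration
x = c(3, 8, 15, 21, 40)
n = length(x)

set.seed(1)  # arbitrary seed so the run is repeatable

# Without replacement: every draw of size n is just a reordering of x,
# so the resampled mean never changes
means_norepl = replicate(5, mean(sample(x, size = n, replace = FALSE)))

# With replacement: values can repeat or be left out, so the mean can vary
means_repl = replicate(5, mean(sample(x, size = n, replace = TRUE)))

print(means_norepl)
print(means_repl)
```

The first vector repeats mean(x) five times, so it carries no information about sampling variability. Only the second scheme produces a spread of values whose standard deviation can serve as a standard error.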
Example 1: Consider the Agent Orange study from Notes 2.
The study consisted of a simple random sample of size 50
of Vietnam veterans’ dioxin levels. The sample was
dioxinsample=c(132,527,542,110,475,354,458,586,370,389,544,626,538,324,
159,359,272,233,213,422,238,522,596,182,329,37,74,480,563,528,623,38,
81,610,111,543,577,525,344,335,4,198,83,452,145,249,455,418,540,175)
Here we take 1000 bootstrap samples:
boots=1000; # number of bootstrap samples
bootmeans=rep(0,boots); # vector which will store the mean from each
# bootstrap sample
# Loop in which we take a bootstrap sample and calculate its mean
for(i in 1:boots){
  # Take bootstrap sample: sample with replacement from original sample
  bootsample=sample(dioxinsample,size=50,replace=TRUE);
  # Calculate mean of bootstrap sample
  bootmeans[i]=mean(bootsample);
}
# Bootstrap estimate of standard error = standard deviation of bootstrap
# means
sd(bootmeans);
[1] 26.69789
# Bootstrap 95% confidence interval = 2.5th quantile of bootstrap means,
# 97.5th quantile of bootstrap means
quantile(bootmeans,.025);
300.4165
quantile(bootmeans,.975);
405.4905
The bootstrap standard error and confidence interval are
similar to the usual standard error and confidence interval
from Chapter 2:
# Usual design based standard errors and confidence intervals
> designse
[1] 25.55921
> lowerci
[1] 303.6639
> upperci
[1] 403.8561
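The code that produced designse, lowerci, and upperci is not shown in this section. As a rough sketch, assuming the usual simple-random-sample formula SE = sqrt((1 - n/N) s^2 / n) with a normal-approximation interval, the calculation might look like the following. The function name srs_mean_ci and its arguments are illustrative. The population size N from Notes 2 is not given here, so the default N = Inf drops the finite population correction and should not be expected to reproduce the numbers above exactly.

```r
dioxinsample = c(132,527,542,110,475,354,458,586,370,389,544,626,538,324,
159,359,272,233,213,422,238,522,596,182,329,37,74,480,563,528,623,38,
81,610,111,543,577,525,344,335,4,198,83,452,145,249,455,418,540,175)

# Illustrative helper (not from the notes): design-based SE and CI for the
# mean of a simple random sample of size n from a population of size N
srs_mean_ci = function(y, N = Inf, level = 0.95) {
  n = length(y)
  fpc = 1 - n / N                 # finite population correction; 1 when N = Inf
  se = sqrt(fpc * var(y) / n)     # var() uses the n - 1 divisor
  z = qnorm(1 - (1 - level) / 2)  # normal critical value
  c(mean = mean(y), se = se,
    lower = mean(y) - z * se, upper = mean(y) + z * se)
}

# N = Inf is an assumption; supply the actual population size if known
srs_mean_ci(dioxinsample)
```

Passing a finite N shrinks the standard error by the factor sqrt(1 - n/N). The bootstrap code above has no such correction, which is one reason the two standard errors need not agree exactly.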
An advantage of the bootstrap is that it can easily be used
to form confidence intervals for statistics other than the
mean such as the median:
# Bootstrap confidence interval for median
# Sample median
median(dioxinsample)
boots=1000; # number of bootstrap samples
bootmedians=rep(0,boots); # vector which will store the median from each
# bootstrap sample
# Loop in which we take a bootstrap sample and calculate its median
for(i in 1:boots){
  # Take bootstrap sample: sample with replacement from original sample
  bootsample=sample(dioxinsample,size=50,replace=TRUE);
  # Calculate median of bootstrap sample
  bootmedians[i]=median(bootsample);
}
# Bootstrap 95% confidence interval = 2.5th quantile of bootstrap medians,
# 97.5th quantile of bootstrap medians
quantile(bootmedians,.025);
quantile(bootmedians,.975);
> median(dioxinsample)
[1] 364.5
> quantile(bootmedians,.025);
2.5%
272
> quantile(bootmedians,.975);
97.5%
465
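The two loops above differ only in the statistic that is computed, so they can be wrapped in a single function. The following is a minimal sketch, assuming a simple random sample and the percentile interval used in these notes. The function name boot_percentile_ci, its arguments, and the seed are illustrative choices, not taken from Lohr or from the notes.

```r
# Illustrative helper: bootstrap SE and percentile CI for any statistic
boot_percentile_ci = function(y, stat, boots = 1000, level = 0.95) {
  n = length(y)
  # Each replicate: resample n values with replacement, then apply stat
  bootstats = replicate(boots, stat(sample(y, size = n, replace = TRUE)))
  alpha = 1 - level
  list(se = sd(bootstats),
       ci = quantile(bootstats, c(alpha / 2, 1 - alpha / 2)))
}

dioxinsample = c(132,527,542,110,475,354,458,586,370,389,544,626,538,324,
159,359,272,233,213,422,238,522,596,182,329,37,74,480,563,528,623,38,
81,610,111,543,577,525,344,335,4,198,83,452,145,249,455,418,540,175)

set.seed(475)  # arbitrary seed so the run is repeatable

# Median, as in the example above
boot_percentile_ci(dioxinsample, median)

# Any other statistic works the same way, e.g. a 10% trimmed mean
boot_percentile_ci(dioxinsample, function(v) mean(v, trim = 0.1))
```

Because the resampling is random, the interval endpoints change slightly from run to run unless a seed is fixed, so results will not match the output above digit for digit.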