Download Notes 19 - Wharton Statistics

Statistics 475 Notes 19

Reading: Lohr, Chapter 7.5, 9.3.3

Schedule: The due date for Homework 5 (final homework) is postponed to Monday, December 8th to allow adequate time for working on project presentations. Presentations will be December 1st and December 3rd (see handout with schedule and guidelines). Final report on your project due Wed., Dec. 17th, 5 p.m.

For complex surveys, it is nearly impossible to develop a closed-form expression for the variance of many estimators. An alternative approach to estimating variances (i.e., finding standard errors) and forming approximate confidence intervals is the bootstrap method. The bootstrap depends on extensive resampling from the original data, but it does not depend on any formula for calculating a variance.

I. The Bootstrap Method

Basic idea: We would like to estimate the variance of a sample statistic from a population:

    Population -> Sample -> Statistic

We estimate the variance by assuming the population is equal to the population estimated from the sample:

    Estimated Population from Sample -> Sample -> Statistic

For a simple random sample, we estimate the population by the empirical distribution from the sample. We take samples from this estimated population by taking samples of size n with replacement from the original sample of size n (if the resampling were done without replacement, we would get the same sample every time).

Example 1: Consider the Agent Orange study from Notes 2. The study consisted of a simple random sample of size 50 of Vietnam veterans’ dioxin levels.
The sample was

    dioxinsample=c(132,527,542,110,475,354,458,586,370,389,
                   544,626,538,324,159,359,272,233,213,422,
                   238,522,596,182,329,37,74,480,563,528,
                   623,38,81,610,111,543,577,525,344,335,
                   4,198,83,452,145,249,455,418,540,175)

Here we take 1000 bootstrap samples:

    boots=1000; # number of bootstrap samples
    bootmeans=rep(0,boots); # vector which will store the mean from each
                            # bootstrap sample
    # Loop in which we take a bootstrap sample and calculate its mean
    for(i in 1:boots){
      # Take bootstrap sample, sample with replacement from original sample
      bootsample=sample(dioxinsample,size=50,replace=TRUE);
      # Calculate mean of bootstrap sample
      bootmeans[i]=mean(bootsample);
    }

    # Bootstrap estimate of standard error = standard deviation of bootstrap
    # means
    sd(bootmeans);
    [1] 26.69789

    # Bootstrap 95% confidence interval = 2.5th quantile of bootstrap means,
    # 97.5th quantile of bootstrap means
    quantile(bootmeans,.025);
    300.4165
    quantile(bootmeans,.975);
    405.4905

The bootstrap standard error and confidence interval are similar to the usual standard error and confidence interval from Chapter 2:

    # Usual design-based standard error and confidence interval
    > designse
    [1] 25.55921
    > lowerci
    [1] 303.6639
    > upperci
    [1] 403.8561

An advantage of the bootstrap is that it can easily be used to form confidence intervals for statistics other than the mean, such as the median:

    # Bootstrap confidence interval for median
    # Sample median
    median(dioxinsample)

    boots=1000; # number of bootstrap samples
    bootmedians=rep(0,boots); # vector which will store the median from each
                              # bootstrap sample
    # Loop in which we take a bootstrap sample and calculate its median
    for(i in 1:boots){
      # Take bootstrap sample, sample with replacement from original sample
      bootsample=sample(dioxinsample,size=50,replace=TRUE);
      # Calculate median of bootstrap sample
      bootmedians[i]=median(bootsample);
    }

    # Bootstrap 95% confidence interval = 2.5th quantile of bootstrap medians,
    # 97.5th quantile of bootstrap medians
    quantile(bootmedians,.025);
    quantile(bootmedians,.975);

    > median(dioxinsample)
    [1] 364.5
    > quantile(bootmedians,.025);
    2.5%
     272
    > quantile(bootmedians,.975);
    97.5%
      465
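The two loops above differ only in the statistic computed on each bootstrap sample, so the procedure can be wrapped in one function. The following is a minimal sketch under that reading, not code from the notes: the function name bootstrap_srs, its arguments, and the seed value are hypothetical choices made for illustration. Because the resampling is random, the numbers it produces will differ slightly from those printed above.

```r
# Hypothetical helper (not part of the original notes): bootstrap standard
# error and percentile confidence interval for any statistic computed from
# a simple random sample y.
bootstrap_srs <- function(y, statistic, boots = 1000, level = 0.95) {
  n <- length(y)
  # Each replicate draws n values with replacement from y and applies
  # the statistic to the resampled values
  reps <- replicate(boots, statistic(sample(y, size = n, replace = TRUE)))
  alpha <- 1 - level
  list(
    se = sd(reps),  # bootstrap estimate of the standard error
    ci = unname(quantile(reps, c(alpha / 2, 1 - alpha / 2)))  # percentile CI
  )
}

dioxinsample <- c(132,527,542,110,475,354,458,586,370,389,
                  544,626,538,324,159,359,272,233,213,422,
                  238,522,596,182,329,37,74,480,563,528,
                  623,38,81,610,111,543,577,525,344,335,
                  4,198,83,452,145,249,455,418,540,175)

set.seed(475)  # arbitrary seed, only so that reruns give the same numbers
mean_boot   <- bootstrap_srs(dioxinsample, mean)
median_boot <- bootstrap_srs(dioxinsample, median)

print(mean_boot)
print(median_boot)
```

Passing the statistic as a function argument means the same resampling code serves for the mean, the median, or any other function of the sample, which is the advantage of the bootstrap noted above.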
