Bootstrapping - Rossman/Chance

Fall, 2013
Wednesday, Dec. 4
Stat 301 – Brief Overview of Bootstrapping
Suppose we did not believe the population distribution was approximately normal. How could we
construct a confidence interval for the population mean? We would still want to start with x̄ as our
estimate, but then what about the margin-of-error? Or what if we wanted to use a statistic other than x̄?
(a) What does the margin-of-error measure?
Central Limit Theorem for a sample mean: If we take repeated samples of size n from a population
with mean μ and standard deviation σ, then, if n is large, the “what if” distribution of x̄ will be
approximately normal with mean μ and standard deviation σ/√n.
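The theorem can also be checked by simulation in R. The sketch below is only an illustration: it does not use the penny data, and the right-skewed population generated with rexp, the seed, and the helper name sample_means are assumptions made for this example. It compares the standard deviation of many sample means to σ/√n for n = 5 and n = 50.

```r
# Hypothetical right-skewed population of 1000 values (NOT the penny ages).
set.seed(301)
population <- rexp(1000, rate = 1 / 12)

mu <- mean(population)
# Population SD: divide by N (not N - 1) because this is the whole population.
sigma <- sqrt(mean((population - mu)^2))

# Draw `reps` random samples of size n and return the mean of each one.
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(population, n)))
}

for (n in c(5, 50)) {
  xbar <- sample_means(n)
  cat("n =", n,
      "| mean of sample means =", round(mean(xbar), 2),
      "| SD of sample means =", round(sd(xbar), 2),
      "| sigma/sqrt(n) =", round(sigma / sqrt(n), 2), "\n")
}

# The shape can be inspected with, e.g., hist(sample_means(5)) and hist(sample_means(50)).
```

Because the samples are drawn without replacement from a finite population, the simulated SD should come out slightly below σ/√n, more noticeably for the larger sample size.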
(b) Remind yourself of what this theorem claims by opening the Sampling Pennies applet. These are
the ages in a population of 1000 pennies (actually collected and recorded by a statistics professor).
0. Uncheck the Animate box.
1. Take 1000 samples of 5 pennies each and describe the center, spread, and shape of the distribution.
2. Repeat for 1000 samples of 50 pennies each.
          Mean      SD        Shape
n = 5     ______    ______    ______
n = 50    ______    ______    ______
(c) Do these sampling distributions behave as expected/predicted by the CLT?
Proposal: To assess the variability of our statistic from random sample to random sample, we can
sample with replacement from our sample. The steps are:
1. Take a random sample of n observations from the population.
2. Take a random sample of n observations from the sample with replacement (think of this as
repeating the first sample infinitely many times and using that as the population).
3. Calculate the “bootstrap statistic” from this new sample.
4. Repeat steps 2 and 3 a large number of times to create a bootstrap sampling distribution.
The claim is that the standard deviation of the bootstrap sampling distribution will be a reasonable
approximation of the standard deviation of the statistic.
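The steps above can be sketched as a small R function. This is a minimal illustration, not part of the original handout: the name bootstrap_se, its arguments, and the simulated stand-in sample are all assumptions. For the sample mean, the result can be compared with the usual s/√n.

```r
# Hypothetical helper: bootstrap estimate of the standard error of `stat`.
bootstrap_se <- function(x, stat = mean, B = 1000) {
  n <- length(x)
  boot_stats <- numeric(B)
  for (b in seq_len(B)) {
    resample <- sample(x, n, replace = TRUE)  # step 2: resample with replacement
    boot_stats[b] <- stat(resample)           # step 3: bootstrap statistic
  }
  sd(boot_stats)                              # spread of the bootstrap distribution
}

set.seed(301)
# Step 1 stand-in: a simulated sample of 50 values (NOT the penny data).
original_sample <- rexp(50, rate = 1 / 12)

se_boot   <- bootstrap_se(original_sample, mean)
se_theory <- sd(original_sample) / sqrt(length(original_sample))
cat("bootstrap SE =", round(se_boot, 2),
    "| s/sqrt(n) =", round(se_theory, 2), "\n")
```

Passing a different function, for example bootstrap_se(original_sample, median), applies the same procedure to a statistic for which no simple formula like s/√n is available.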
(d) Consider this claim using the Bootstrap Sampling Change applet.
1. Draw a sample of 5 pennies.
2. Press Bootstrap Population a few times. (This creates the bootstrap population.)
3. Now set the Number of samples to 1000, uncheck the Animate box, and press Draw Bootstrap
Samples. Record the characteristics of the distribution below.
4. Repeat for an initial sample of 50 pennies. Compare the results to those above.
          Mean      SD        Shape
n = 5     ______    ______    ______
n = 50    ______    ______    ______
The real advantage of bootstrapping is that it gives an estimate of the standard error for
more interesting statistics, where the “normal theory” (CLT) does not apply.
Simulation in R – Using the penny ages data with variable name “pennyages”
> iscamsummary(pennyages)
      n     Min      Q1  Median      Q3     Max    Mean      SD
1000.00    0.00    4.00   11.00   19.00   59.00   12.30    9.61
Suppose we take a random sample of 50 pennies:
> pennysample=sample(pennyages, 50)
> iscamsummary(pennysample)
   n    Min     Q1  Median     Q3    Max   Mean     SD
50.0    0.0    4.0    10.0   20.0   40.0   13.2   10.9
Now we are going to take “bootstrap samples” from this sample.
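> I=1000                      # number of bootstrap samples
> bootstrapmean=numeric(I)    # storage for the bootstrap means
> for (i in 1:I){
+   bootstrapsample=sample(pennysample, 50, replace=TRUE)   # resample with replacement
+   bootstrapmean[i]=mean(bootstrapsample)                   # bootstrap statistic
+ }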
> iscamsummary(bootstrapmean)
      n     Min      Q1  Median      Q3     Max    Mean      SD
1000.00    8.86   12.20   13.10   14.10   17.80   13.10    1.51
(e) How could you construct a 95% confidence interval from this bootstrap sampling distribution?
(f) How could you decide if this is a 95% confidence interval procedure?
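One possible way to explore questions (e) and (f) in R is sketched below. It is an illustration under stated assumptions rather than the intended answer: the population is simulated with rexp instead of using the penny data, and the helper names boot_means, percentile_ci, and se_ci are hypothetical. It shows two common interval constructions (the middle 95% of the bootstrap distribution, and the statistic plus or minus 2 bootstrap standard errors) and then estimates how often the percentile interval captures the population mean.

```r
set.seed(301)
# Hypothetical stand-in population (NOT the penny ages).
population <- rexp(1000, rate = 1 / 12)
mu <- mean(population)

# Bootstrap means from one observed sample x.
boot_means <- function(x, B = 1000) {
  replicate(B, mean(sample(x, length(x), replace = TRUE)))
}

# (e) Option 1: middle 95% of the bootstrap distribution.
percentile_ci <- function(x, B = 1000) {
  unname(quantile(boot_means(x, B), c(0.025, 0.975)))
}

# (e) Option 2: observed statistic +/- 2 bootstrap standard errors.
se_ci <- function(x, B = 1000) {
  mean(x) + c(-2, 2) * sd(boot_means(x, B))
}

one_sample <- sample(population, 50)
print(percentile_ci(one_sample))
print(se_ci(one_sample))

# (f) Repeat the whole procedure on many new samples from the population
# and record how often the interval contains the population mean.
covers <- replicate(200, {
  ci <- percentile_ci(sample(population, 50), B = 500)
  ci[1] <= mu && mu <= ci[2]
})
cat("Estimated coverage:", mean(covers), "\n")
```

If the procedure really is a 95% confidence interval procedure, the estimated coverage should be close to 0.95; with only 200 repetitions the estimate itself is fairly rough.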
Other Statistics: What about the median?
Inside the loop, calculate the median of each bootstrap sample instead:
+   bootstrapmedian[i]=median(bootstrapsample)
> iscamsummary(bootstrapmedian)
     n     Min      Q1  Median      Q3     Max    Mean      SD
989.00    4.00    9.50   11.00   12.00   17.50   10.70    2.21
> hist(bootstrapmedian, nclass=20)
Two groups: To compare two groups, we can find the standard error of the difference by sampling
with replacement from each group, calculating the statistic comparing the two samples, and building the
bootstrap distribution of the statistic. This approach does not require you to use a hypothesized
parameter value in the simulation (there is no assumption that the samples are coming from the same
population). It also allows you to model random sampling instead of random assignment.
[Investigation 3.9 required assuming particular populations to sample from.]
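A minimal R sketch of the two-group idea, using the difference in sample means as the statistic. The two groups here are simulated stand-ins and the variable names are hypothetical; nothing below comes from the course data.

```r
set.seed(301)
# Hypothetical samples from two groups (simulated for illustration only).
group1 <- rexp(40, rate = 1 / 12)
group2 <- rexp(35, rate = 1 / 15)

B <- 1000
boot_diff <- numeric(B)
for (b in seq_len(B)) {
  # Resample each group separately, keeping its own sample size.
  resample1 <- sample(group1, length(group1), replace = TRUE)
  resample2 <- sample(group2, length(group2), replace = TRUE)
  boot_diff[b] <- mean(resample1) - mean(resample2)
}

observed_diff <- mean(group1) - mean(group2)
se_diff <- sd(boot_diff)   # bootstrap standard error of the difference
cat("observed difference =", round(observed_diff, 2),
    "| bootstrap SE =", round(se_diff, 2), "\n")
```

Because each group is resampled from its own data, the simulation never pools the groups or assumes a common population, which is what separates it from a randomization test.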
Simulation in Minitab – Using the penny data with the sample of 50 ages in column 1
Build a .mac file:
Do k1=1:1000
  sample 50 c1 c2;
    replace.
  let c3(k1)=mean(c2)
Enddo
> Describe c3