Comparison of Bootstrap Methods and t-methods:
Capture Rates of Confidence Intervals and Probability of Type I Errors in Hypothesis Tests
Jeff Kollath
Oregon State University
[email protected]
Introductory Statistics courses at Oregon State University

ST 201/202:
  3 credits – 3 hours of lecture and 1 hour of recitation each week
  Traditional approach to teaching introductory statistics

ST 351/352 (what I teach):
  4 credits – 3 hours of lecture and one 80-minute lab each week
  Use Minitab during labs
  Inference is introduced using bootstrap methods

Motivation:

Inference is introduced using bootstrap and randomization methods
  Text used: Unlocking the Power of Data by the 5 Locks (Wiley, 1st edition)

Provide a better understanding of sampling distributions and how they
are used in inference

Methods courses: we use Minitab macros to generate confidence
intervals (using percentile methods) and p-values using bootstrap and
randomization methods.

Students also learn “traditional normal-based” methods (t-methods, for
example)

Students often ask which procedure they should use: bootstrap
methods or t-methods.
Comparison of capture rates for confidence intervals


Two different populations were created

a population that was normally distributed

a population that was heavily right skewed
For each population:

random samples of sizes 15, 100, and 500 were taken from the population.

From the sample data, several 95% confidence intervals were constructed:


A 95% confidence interval for the population mean using the t-methods

A 95% confidence interval for the population mean from a bootstrap distribution of
2000 sample means using the percentile method

A 95% confidence interval for the population median from a bootstrap distribution of
2000 sample medians using the percentile method

For each, it was noted whether the population mean (or median) fell between the
bounds of the confidence interval.
All simulations were done using Minitab
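The procedure above can be sketched in Python. This is only an illustration, not the Minitab macros used for the slides. The population parameters, population size, seed, and function names are assumptions made for the example, since the slides do not state them. The t critical values are standard table values for the three sample sizes.

```python
import math
import random
import statistics

# Two-sided 95% t critical values for df = n - 1 (standard t-table values).
T_CRIT_95 = {15: 2.1448, 100: 1.9842, 500: 1.9647}


def t_interval(sample):
    """95% t-interval for the mean: xbar +/- t* times s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    margin = T_CRIT_95[n] * statistics.stdev(sample) / math.sqrt(n)
    return xbar - margin, xbar + margin


def percentile_interval(sample, stat, rng, n_boot=2000):
    """95% percentile interval from n_boot bootstrap values of stat."""
    n = len(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    k = round(0.025 * n_boot)  # number of values cut from each tail
    return boot[k], boot[n_boot - 1 - k]


def capture_rates(population, n, n_sims, rng, n_boot=2000):
    """Share of simulated samples whose interval contains the true parameter."""
    true_mean = statistics.fmean(population)
    true_median = statistics.median(population)
    hits = {"t, mean": 0, "bootstrap, mean": 0, "bootstrap, median": 0}
    for _ in range(n_sims):
        sample = rng.sample(population, n)
        lo, hi = t_interval(sample)
        hits["t, mean"] += lo <= true_mean <= hi
        lo, hi = percentile_interval(sample, statistics.fmean, rng, n_boot)
        hits["bootstrap, mean"] += lo <= true_mean <= hi
        lo, hi = percentile_interval(sample, statistics.median, rng, n_boot)
        hits["bootstrap, median"] += lo <= true_median <= hi
    return {name: count / n_sims for name, count in hits.items()}


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical populations; the slides do not give their parameters.
    normal_pop = [rng.gauss(100, 15) for _ in range(20_000)]
    skewed_pop = [rng.lognormvariate(8, 1.2) for _ in range(20_000)]
    for name, pop in [("normal", normal_pop), ("skewed", skewed_pop)]:
        # Small n_sims and n_boot keep the demo quick; the slides used 2000 resamples.
        print(name, capture_rates(pop, n=15, n_sims=100, rng=rng, n_boot=500))
```

Raising `n_sims` to several thousand and `n_boot` to 2000 would match the setup described above more closely, at the cost of a longer run time.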
[Figure: frequency histograms of the two populations. Panel "Normal population": horizontal axis "data" with ticks from 66 to 143, vertical axis "Frequency". Panel "Skewed population": horizontal axis "data" with ticks from 5500 to 38500, vertical axis "Frequency".]
Results
From several thousand simulations, the percentage of simulations in which the
confidence interval captured the population parameter is given in the table
below:
Comparison of probability of Type I Errors in hypothesis tests


Two different populations were created

a population that was normally distributed

a population that was heavily right skewed
For each population:

random samples of sizes 15 and 100 were taken from the population.

A hypothesis test was performed where the hypothesized parameter (either the mean or
the median) was equal to the true population value of the mean or median.


For the same sample, a hypothesis test was performed on both the mean and the
median

The significance level for each hypothesis test was set at 5%

The two-tailed p-value for each hypothesis test was determined using the t-methods,
bootstrap methods on the mean, and bootstrap methods on the median.

Whether or not the null hypothesis was rejected at the 5% significance level was
recorded.
All simulations were done using Minitab
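A rough Python sketch of this simulation follows. The slides do not say how the bootstrap p-value was computed, so the method here is an assumption: shift the sample so that its mean (or median) equals the hypothesized value, resample from the shifted data, and double the smaller tail proportion. The population parameters and function names are also hypothetical.

```python
import math
import random
import statistics

ALPHA = 0.05
# Two-sided 5% t critical values for df = n - 1 (standard t-table values).
T_CRIT_95 = {15: 2.1448, 100: 1.9842}


def t_test_rejects(sample, mu0):
    """Reject H0: mean = mu0 at the 5% level; |t| > t* is the same as p < 0.05."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return abs((statistics.fmean(sample) - mu0) / se) > T_CRIT_95[n]


def bootstrap_p_value(sample, stat, null_value, rng, n_boot=2000):
    """Two-tailed p-value from resampling a sample shifted to satisfy H0.

    Assumed method: the slides do not describe the exact algorithm.
    """
    observed = stat(sample)
    shifted = [x + (null_value - observed) for x in sample]
    boot = [stat(rng.choices(shifted, k=len(sample))) for _ in range(n_boot)]
    lower = sum(b <= observed for b in boot) / n_boot
    upper = sum(b >= observed for b in boot) / n_boot
    return min(1.0, 2 * min(lower, upper))


def type_one_error_rates(population, n, n_sims, rng, n_boot=2000):
    """Share of samples in which each test rejects a true null hypothesis."""
    true_mean = statistics.fmean(population)
    true_median = statistics.median(population)
    rejections = {"t, mean": 0, "bootstrap, mean": 0, "bootstrap, median": 0}
    for _ in range(n_sims):
        sample = rng.sample(population, n)
        rejections["t, mean"] += t_test_rejects(sample, true_mean)
        p = bootstrap_p_value(sample, statistics.fmean, true_mean, rng, n_boot)
        rejections["bootstrap, mean"] += p < ALPHA
        p = bootstrap_p_value(sample, statistics.median, true_median, rng, n_boot)
        rejections["bootstrap, median"] += p < ALPHA
    return {name: count / n_sims for name, count in rejections.items()}


if __name__ == "__main__":
    rng = random.Random(2)
    # Hypothetical skewed population; the slides do not give its parameters.
    skewed_pop = [rng.lognormvariate(8, 1.2) for _ in range(20_000)]
    # Small n_sims and n_boot keep the demo quick.
    print(type_one_error_rates(skewed_pop, n=15, n_sims=100, rng=rng, n_boot=500))
```

Because the null hypothesis is true in every simulated test, each rejection counts as a Type I error, and a well-calibrated test should reject in about 5% of samples.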
Results
After around 750 simulations, the percentage of simulations that had a Type I
Error is given in the table below:
Population  Test                                    n = 15   n = 100
Skewed      Test on mean using t-methods            27.02%   13.14%
Skewed      Test on mean using bootstrap methods    30.73%   13.81%
Skewed      Test on median using bootstrap methods   6.86%    4.56%
Normal      Test on mean using t-methods             4.72%    6.56%
Normal      Test on mean using bootstrap methods     8.19%    6.83%
Normal      Test on median using bootstrap methods   5.17%    5.44%
Percent of simulations where p-value from t-test was higher
than p-value from bootstrap methods:


normal population

n = 15: 92.50%

n = 100: 65.13%
skewed population

n = 15: 84.77%

n = 100: 62.06%
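One way such a comparison could be computed is sketched below in Python. It is an illustration under assumptions, not the Minitab procedure behind the percentages above: the bootstrap p-value uses a shift-and-resample method with the smaller tail doubled, the t p-value is obtained by numerically integrating the t density, and the population parameters and function names are hypothetical.

```python
import math
import random
import statistics


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided t p-value via Simpson's rule on the t density over [0, |t|]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t_stat)
    h = b / steps
    total = density(0.0) + density(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def bootstrap_p_value(sample, mu0, rng, n_boot=2000):
    """Two-tailed bootstrap p-value for H0: mean = mu0 (assumed method)."""
    observed = statistics.fmean(sample)
    shifted = [x + (mu0 - observed) for x in sample]
    n = len(sample)
    boot = [statistics.fmean(rng.choices(shifted, k=n)) for _ in range(n_boot)]
    lower = sum(b <= observed for b in boot) / n_boot
    upper = sum(b >= observed for b in boot) / n_boot
    return min(1.0, 2 * min(lower, upper))


def fraction_t_p_higher(population, n, n_sims, rng, n_boot=2000):
    """Share of samples where the t-test p-value exceeds the bootstrap p-value."""
    mu0 = statistics.fmean(population)  # the null hypothesis is true
    higher = 0
    for _ in range(n_sims):
        sample = rng.sample(population, n)
        se = statistics.stdev(sample) / math.sqrt(n)
        p_t = t_two_sided_p((statistics.fmean(sample) - mu0) / se, n - 1)
        higher += p_t > bootstrap_p_value(sample, mu0, rng, n_boot)
    return higher / n_sims


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical populations; the slides do not give their parameters.
    normal_pop = [rng.gauss(100, 15) for _ in range(20_000)]
    skewed_pop = [rng.lognormvariate(8, 1.2) for _ in range(20_000)]
    for name, pop in [("normal", normal_pop), ("skewed", skewed_pop)]:
        print(name, fraction_t_p_higher(pop, n=15, n_sims=100, rng=rng, n_boot=500))
```

A fraction above 50% would mean the t-test tends to be the more conservative of the two on the same sample, which is one way to read the percentages reported above.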
What I’d say to the student who asks, “Which method should I use?”

The t-methods perform slightly better than the bootstrap methods
for inference about a population mean.

However, the difference is slight, and the bootstrap methods offer
an alternative for students who prefer and understand
simulation better than formula-based methods.

Do not use either method for inference about a population mean
when data are skewed and the sample size is not “large enough”.
How large the sample needs to be depends on how skewed the
population data are.

When it is not appropriate to use either method for inference about
a population mean, inference on the population median using the
bootstrap methods is an option.
Contact Information:
Jeff Kollath
Department of Statistics
Oregon State University
Corvallis, OR 97331
541-737-3585
[email protected]