732A26 Computational statistics
Department of Computer and Information Science
Computer lab 2: The bootstrap
Learning objectives
The main objective of this computer lab is to make the student acquainted with
parametric and nonparametric bootstrap techniques.
After completing the lab the student shall be able to use bootstrap techniques to compute
a confidence interval for an unknown parameter of a sampled probability distribution.
Programming
All tasks in this lab can be carried out using Excel worksheets. However, you are free to
use Minitab macros, MATLAB or R.
Assignment 1: Assessing the standard deviation of a normal distribution using the bootstrap
The Excel file ‘bootstrap.xls’ contains worksheet formulae that can be used to compute a
confidence interval for an unknown standard deviation $\sigma$ of a normal distribution. Your
task is to paste your own data onto appropriate worksheets and carry out a bootstrap
analysis of how accurately $\sigma$ can be estimated from a given sample.
Use the random number generator in Excel to generate a sample of size 20 from a normal
distribution with mean $\mu = 0$ and standard deviation $\sigma = 10$. Save the generated numbers
in worksheet ‘Empirical cdf’ (cells B2:B21) and make a scatter chart of the empirical
cumulative distribution function.
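For students working outside Excel, the sampling step and the empirical cdf can be sketched in Python as follows. This is only an illustration: the function names and the seed are assumptions and have no counterpart in ‘bootstrap.xls’.

```python
import random

def normal_sample(n, mu, sigma, seed=None):
    """Draw n independent values from a normal distribution N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def empirical_cdf(sample):
    """Return (x, F_n(x)) pairs; F_n jumps by 1/n at each sorted observation."""
    n = len(sample)
    return [(x, (i + 1) / n) for i, x in enumerate(sorted(sample))]

# Sample of size 20 with mean 0 and standard deviation 10, as in the assignment.
sample = normal_sample(20, mu=0.0, sigma=10.0, seed=1)  # seed chosen arbitrarily
ecdf_points = empirical_cdf(sample)
for x, p in ecdf_points:
    print(f"{x:8.3f}  {p:.2f}")
```

Plotting the printed pairs as a scatter chart gives the same picture as the Excel worksheet.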
Copy your sample to the worksheet ‘bootstrap (2)’ and use the formulae in this worksheet
to resample the original observations. Compute the sample standard deviation s for each
of the bootstrap samples and use that information to construct a confidence interval for
the standard deviation ($\sigma$) of the probability distribution from which the original sample
was generated. (A histogram of the bootstrap standard deviations can easily be drawn in
Minitab.)
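The resampling procedure described above can be sketched in Python as follows. The lab does not prescribe how the interval should be formed from the bootstrap standard deviations; the percentile method used here is one common choice and is an assumption, as are the function names, the number of bootstrap replicates (2000) and the seeds.

```python
import random
import statistics

def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted list, 0 <= p <= 1."""
    k = p * (len(sorted_values) - 1)
    lo = int(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (k - lo) * (sorted_values[hi] - sorted_values[lo])

def bootstrap_sd_interval(sample, n_boot=2000, level=0.95, seed=None):
    """Nonparametric bootstrap percentile interval for the standard deviation."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample n observations with replacement and record the sample sd each time.
    boot_sds = sorted(
        statistics.stdev(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lower = percentile(boot_sds, alpha / 2)
    upper = percentile(boot_sds, 1 - alpha / 2)
    return lower, upper, boot_sds

# Illustrative data: 20 draws from N(0, 10^2), as in the assignment.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 10.0) for _ in range(20)]
lower, upper, boot_sds = bootstrap_sd_interval(sample, seed=2)
print(f"s = {statistics.stdev(sample):.2f}, 95% interval: ({lower:.2f}, {upper:.2f})")
```

The sorted list boot_sds can be passed to any plotting tool to draw the histogram.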
Construct an ordinary confidence interval for the unknown standard deviation $\sigma$ using the
fact that $(n-1)s^2/\sigma^2$ has a $\chi^2$ distribution with $n-1$ degrees of freedom. Compare this
confidence interval with the bootstrap interval and comment on the results. Does the
nonparametric bootstrap always produce longer confidence intervals than conventional
parametric inference?
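As a reminder of how the $\chi^2$ result leads to an interval, the derivation can be written out as below. The notation $\chi^2_{p,\,n-1}$ for the $p$-quantile of the $\chi^2$ distribution with $n-1$ degrees of freedom is introduced here for illustration and is not taken from the lab text.

```latex
P\!\left( \chi^2_{\alpha/2,\,n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\,n-1} \right) = 1 - \alpha
\quad\Longrightarrow\quad
\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}} \;\le\; \sigma \;\le\; \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}}
```

For $n = 20$ and a 95% interval, the tabulated quantiles with 19 degrees of freedom are approximately 8.91 and 32.85, so the interval is roughly $0.76\,s \le \sigma \le 1.46\,s$.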
Assignment 2: Parametric and nonparametric bootstrap
Let $X_1, \dots, X_n$ denote a sample from an exponential distribution with density $f(x) = \lambda \exp(-\lambda x)$, $x \ge 0$. Your task is to use parametric and nonparametric bootstrap techniques to compute
confidence intervals for the standard deviation $\sigma = 1/\lambda$ of X.
Parametric bootstrap implies that:
(i) the parameter $\lambda$ is estimated from the original sample;
(ii) new samples are drawn from an exponential distribution with the
estimated parameter $\hat\lambda$;
(iii) the standard deviation of the bootstrap samples is estimated using the
relationship $\sigma = 1/\lambda$.
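Steps (i) to (iii) can be sketched in Python as follows. Several details are assumptions made for illustration only: $\lambda$ is estimated by the maximum likelihood estimate 1/mean, the interval is formed with the percentile method, and the sample size (30), the true rate (0.5), the number of replicates and the seeds are arbitrary.

```python
import random
import statistics

def parametric_bootstrap_sd(sample, n_boot=2000, level=0.95, seed=None):
    """Parametric bootstrap percentile interval for sigma = 1/lambda."""
    rng = random.Random(seed)
    n = len(sample)
    # Step (i): estimate lambda from the original sample (ML estimate, assumed).
    lam_hat = 1.0 / statistics.fmean(sample)
    boot_sds = []
    for _ in range(n_boot):
        # Step (ii): draw a new sample from the fitted exponential distribution.
        boot = [rng.expovariate(lam_hat) for _ in range(n)]
        # Step (iii): re-estimate lambda and convert it to a standard deviation.
        lam_star = 1.0 / statistics.fmean(boot)
        boot_sds.append(1.0 / lam_star)
    boot_sds.sort()
    alpha = 1.0 - level
    lower = boot_sds[int((alpha / 2) * (n_boot - 1))]
    upper = boot_sds[int((1 - alpha / 2) * (n_boot - 1))]
    return lam_hat, lower, upper

# Illustrative data: 30 draws from an exponential distribution with rate 0.5.
data_rng = random.Random(1)
sample = [data_rng.expovariate(0.5) for _ in range(30)]
lam_hat, lower, upper = parametric_bootstrap_sd(sample, seed=2)
print(f"lambda_hat = {lam_hat:.3f}, sigma_hat = {1 / lam_hat:.3f}")
print(f"95% interval for sigma: ({lower:.3f}, {upper:.3f})")
```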
Nonparametric bootstrap is undertaken as in assignment 1 without using any prior
information about the underlying distribution of the observed data.
Compare and comment on the lengths of the confidence intervals obtained with the
parametric and nonparametric bootstrap. Does the nonparametric bootstrap always
produce longer confidence intervals than the parametric bootstrap?
Assignment 3: Regression and residual resampling
Consider a bivariate normally distributed vector (X, Y) in which:
(i) X has a normal distribution with mean $\mu$ and standard deviation $\sigma_1$;
(ii) the conditional distribution of Y given X = x has mean $\beta_0 + \beta_1 x$ and
standard deviation $\sigma_2$.
Your task is to show how a bootstrap technique can be used to assess the uncertainty
(standard error) of the ordinary slope estimator
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
and compute a confidence interval for $\beta_1$. You are advised to use R for the computations,
but it is feasible to solve your task also in Excel or Minitab.
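The slope estimator is a ratio of two sums and can be computed directly. A minimal Python sketch follows; the function name ols_slope and the toy data are illustrative only.

```python
def ols_slope(x, y):
    """Ordinary least squares slope: S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Toy data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
print(ols_slope(x, y))  # prints 2.0
```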
Bootstrap samples can be generated by using residual resampling. This means that
bootstrap samples are generated according to
$$Y_i^* = \hat\beta_0 + \hat\beta_1 x_i + e_i^*, \quad i = 1, \dots, n$$
where $e_i^*$, $i = 1, \dots, n$, is a bootstrap sample from the residuals
$$e_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i, \quad i = 1, \dots, n$$
obtained when a simple linear regression model is fitted to your original sample of (X, Y).
Note that the set of x-values will be the same in the original sample and in all bootstrap
samples.
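Residual resampling as defined above can be sketched in Python as follows. The function names, the toy data, the number of replicates (1000) and the use of a percentile interval are assumptions made for illustration; the lab itself recommends R.

```python
import random
import statistics

def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def residual_bootstrap_slopes(x, y, n_boot=1000, seed=None):
    """Return the fitted slope and a list of bootstrap slope estimates."""
    rng = random.Random(seed)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Resample residuals with replacement; the x-values stay fixed.
        e_star = rng.choices(residuals, k=len(x))
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
        slopes.append(fit_line(x, y_star)[1])
    return b1, slopes

# Toy data (hypothetical): y = 2 + 0.5 x + noise with standard deviation 1.
data_rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

b1_hat, slopes = residual_bootstrap_slopes(x, y, seed=2)
se = statistics.stdev(slopes)            # bootstrap standard error of the slope
ordered = sorted(slopes)
lower, upper = ordered[24], ordered[974]  # approximate 95% percentile interval
print(f"slope = {b1_hat:.3f}, bootstrap SE = {se:.3f}")
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```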
Start by deciding what model parameters you would like to have in your bivariate normal
distribution and generate a sample of size 50 from that distribution.
Fit a regression model to your data and extract the residuals.
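The first two steps can be sketched in Python as follows. The parameter values ($\mu = 5$, $\sigma_1 = 2$, $\beta_0 = 1$, $\beta_1 = 0.5$, $\sigma_2 = 1$), the seed and the function names are arbitrary illustrative choices; you are expected to pick your own parameters.

```python
import random

def simulate_bivariate_normal(n, mu, sigma1, beta0, beta1, sigma2, seed=None):
    """Draw X ~ N(mu, sigma1^2), then Y | X = x ~ N(beta0 + beta1 * x, sigma2^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma1) for _ in range(n)]
    y = [rng.gauss(beta0 + beta1 * xi, sigma2) for xi in x]
    return x, y

def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Sample of size 50 with hypothetical parameter values.
x, y = simulate_bivariate_normal(
    50, mu=5.0, sigma1=2.0, beta0=1.0, beta1=0.5, sigma2=1.0, seed=3
)
b0_hat, b1_hat = fit_line(x, y)
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
print(f"sum of residuals = {sum(residuals):.2e}")  # close to zero by construction
```

Sampling X first and then Y conditionally on X yields a bivariate normal pair, because the conditional mean is linear in x and the conditional standard deviation is constant.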
Then, proceed by generating bootstrap samples and computing the slope estimator for
these samples. Finally, make a histogram of your slope estimates and compute a
confidence interval for $\beta_1$.
Compare the bootstrap interval with the ordinary confidence interval for $\beta_1$ in a simple
linear regression model.
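For reference, the ordinary interval in a simple linear regression model with normally distributed errors has the standard textbook form below. The symbols $s_e$ (residual standard deviation) and $t_{1-\alpha/2,\,n-2}$ (quantile of the t distribution with $n-2$ degrees of freedom) are introduced here and do not appear in the lab text.

```latex
\hat\beta_1 \;\pm\; t_{1-\alpha/2,\,n-2} \, \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s_e^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2
```

With $n = 50$ and a 95% interval, the t quantile with 48 degrees of freedom is approximately 2.01.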
To hand in
A short report with the results of your simulations including appropriate histograms of
bootstrap estimates and requested confidence intervals. If you use Minitab macros,
MATLAB or R, please also include the code you have used.