732A26 Computational statistics
Department of Computer and Information Science

Computer lab 2: The bootstrap

Learning objectives

The main objective of this computer lab is to acquaint the student with parametric and nonparametric bootstrap techniques. After completing the lab, the student shall be able to use bootstrap techniques to compute a confidence interval for an unknown parameter of a sampled probability distribution.

Programming

All tasks in this lab can be carried out using Excel worksheets. However, you are free to use Minitab macros, MATLAB or R.

Assignment 1: Assessing the standard deviation of a normal distribution using the bootstrap

The Excel file ‘bootstrap.xls’ contains worksheet formulae that can be used to compute a confidence interval for an unknown standard deviation $\sigma$ of a normal distribution. Your task is to paste your own data onto the appropriate worksheets and carry out a bootstrap analysis of how accurately $\sigma$ can be estimated from a given sample.

Use the random number generator in Excel to generate a sample of size 20 from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 10$. Save the generated numbers in worksheet ‘Empirical cdf’ (cells B2:B21) and make a scatter chart of the empirical cumulative distribution function.

Copy your sample to the worksheet ‘bootstrap (2)’ and use the formulae in this worksheet to resample the original observations. Compute the sample standard deviation $s$ for each of the bootstrap samples and use that information to construct a confidence interval for the standard deviation ($\sigma$) of the probability distribution from which the original sample was generated. (A histogram of the bootstrap standard deviations can easily be drawn in Minitab.)

Construct an ordinary confidence interval for the unknown standard deviation using the fact that $(n-1)s^2/\sigma^2$ has a $\chi^2$ distribution with $n-1$ degrees of freedom. Compare this confidence interval with the bootstrap interval and comment on the results.
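For readers working outside Excel, the resampling step of Assignment 1 can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses Python's standard library (a language the handout does not mention), it builds the interval with the percentile method (the handout does not prescribe a particular construction), and the function name `bootstrap_sd_interval`, the number of bootstrap samples and the seed are all hypothetical choices. The ordinary $\chi^2$-based interval is not computed here.

```python
import random
import statistics


def bootstrap_sd_interval(sample, n_boot=2000, level=0.95, rng=None):
    """Percentile bootstrap interval for the standard deviation (illustrative)."""
    rng = rng or random.Random()
    n = len(sample)
    # Draw n observations with replacement and record the sample SD each time.
    boot_sds = sorted(
        statistics.stdev(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    alpha = 1.0 - level
    # Assumption: percentile method, i.e. empirical quantiles of the bootstrap SDs.
    lower = boot_sds[int((alpha / 2) * n_boot)]
    upper = boot_sds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper, boot_sds


# Sample of size 20 from N(0, 10^2), as in the assignment; the seed is arbitrary.
rng = random.Random(1)
sample = [rng.gauss(0.0, 10.0) for _ in range(20)]
s = statistics.stdev(sample)
lower, upper, boot_sds = bootstrap_sd_interval(sample, rng=rng)
print(f"s = {s:.2f}, 95% percentile bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

The sorted list `boot_sds` can be passed to any plotting tool to draw the histogram of bootstrap standard deviations.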
Does the nonparametric bootstrap always produce longer confidence intervals than conventional parametric inference?

Assignment 2: Parametric and nonparametric bootstrap

Let $X_1, \ldots, X_n$ denote a sample from an exponential distribution with density $f(x) = \lambda \exp(-\lambda x)$, $x \ge 0$. Your task is to use parametric and nonparametric bootstrap techniques to compute confidence intervals for the standard deviation $\sigma = 1/\lambda$ of $X$.

Parametric bootstrap implies that:
(i) the parameter $\lambda$ is estimated from the original sample;
(ii) new samples are drawn from an exponential distribution with the estimated parameter $\hat{\lambda}$;
(iii) the standard deviation of the bootstrap samples is estimated using the relationship $\sigma = 1/\lambda$.

Nonparametric bootstrap is undertaken as in Assignment 1, without using any prior information about the underlying distribution of the observed data.

Compare and comment on the lengths of the confidence intervals obtained with the parametric and nonparametric bootstrap. Does the nonparametric bootstrap always produce longer confidence intervals than the parametric bootstrap?

Assignment 3: Regression and residual resampling

Consider a bivariate normally distributed vector $(X, Y)$ in which:
(i) $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma_1$;
(ii) the conditional distribution of $Y$ given $X = x$ has mean $\beta_0 + \beta_1 x$ and standard deviation $\sigma_2$.

Your task is to show how a bootstrap technique can be used to assess the uncertainty (standard error) of the ordinary slope estimator

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

and to compute a confidence interval for $\beta_1$. You are advised to use R for the computations, but it is also feasible to solve the task in Excel or Minitab.

Bootstrap samples can be generated by using residual resampling.
This means that bootstrap samples are generated according to

$$Y_i^* = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i^*, \quad i = 1, \ldots, n,$$

where $e_i^*$, $i = 1, \ldots, n$, is a bootstrap sample from the residuals

$$e_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i, \quad i = 1, \ldots, n,$$

obtained when a simple linear regression model is fitted to your original sample of $(X, Y)$. Note that the set of x-values will be the same in the original sample and in all bootstrap samples.

Start by deciding what model parameters you would like to have in your bivariate normal distribution and generate a sample of size 50 from that distribution. Fit a regression model to your data and extract the residuals. Then, proceed by generating bootstrap samples and computing the slope estimator for these samples. Finally, make a histogram of your slope estimates and compute a confidence interval for $\beta_1$. Compare the bootstrap interval with the ordinary confidence interval for $\beta_1$ in a simple linear regression model.

To hand in

A short report with the results of your simulations, including appropriate histograms of bootstrap estimates and the requested confidence intervals. If you use Minitab macros, MATLAB or R, please also include the code you have used.
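As an illustration of the residual resampling scheme in Assignment 3, the following sketch walks through the steps: simulate a sample of size 50, fit the line, resample the residuals while keeping the x-values fixed, and refit. It is a minimal sketch and not a reference solution. It is written in Python's standard library rather than the recommended R, the parameter values, seed and function names (`fit_line`, `residual_bootstrap_slopes`) are hypothetical, and the percentile interval is one possible construction that the handout does not prescribe.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares intercept and slope for simple linear regression."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_bootstrap_slopes(x, y, n_boot=2000, rng=None):
    """Return the fitted slope and the slopes from residual-resampled data."""
    rng = rng or random.Random()
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # Resample residuals with replacement; the x-values stay fixed.
        e_star = rng.choices(residuals, k=len(x))
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
        slopes.append(fit_line(x, y_star)[1])
    return b1, slopes


# Hypothetical model parameters chosen only for this illustration.
rng = random.Random(2)
mu, sigma1, beta0, beta1, sigma2 = 0.0, 1.0, 1.0, 2.0, 0.5
x = [rng.gauss(mu, sigma1) for _ in range(50)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma2) for xi in x]

b1_hat, slopes = residual_bootstrap_slopes(x, y, rng=rng)
slopes.sort()
se = statistics.stdev(slopes)  # bootstrap standard error of the slope
# Assumption: 95% percentile interval from the sorted bootstrap slopes.
lower = slopes[int(0.025 * len(slopes))]
upper = slopes[int(0.975 * len(slopes)) - 1]
print(f"slope = {b1_hat:.3f}, SE = {se:.3f}, 95% interval: ({lower:.3f}, {upper:.3f})")
```

The sorted `slopes` list is what the requested histogram would be drawn from; the comparison with the ordinary regression interval is left to the report.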