Stat 280 Lab 9: Law of Large Numbers and Central Limit Theorem

Objectives: This lab is designed to introduce the Law of Large Numbers and the Central Limit Theorem.

Directions: Follow the instructions below, answering all questions. Your answers should be in the form of a brief report (MS Word), to be handed in to the instructor before you leave. Please include plots and descriptive statistics in your report.

Background:

Law of Large Numbers: Suppose you draw independent observations at random from any population with finite mean μ. Decide how accurately you want to estimate μ. As the number of observations drawn increases, the mean, xbar, of the observed values eventually comes that close to the population mean μ and stays that close. This is called the law of large numbers. The law of large numbers says broadly that the average results of many independent observations are stable and predictable. A grocery store deciding how many gallons of milk to stock and a fast-food restaurant deciding how many beef patties to prepare can predict demand even though their many customers make independent decisions. The law of large numbers says that these many individual decisions will produce a stable result.

Central Limit Theorem: Draw a simple random sample of size n from any population with mean μ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean is approximately normal, with mean μ and standard deviation σ/√n:

xbar ~ N(μ, σ/√n)

More generally, the central limit theorem says that the distribution of a sum or average of many small random quantities is close to normal. The central limit theorem allows us to use normal probability calculations to answer questions about sample means from many observations even when the population distribution is not normal.

In this lab you will examine the distribution of sample averages for sample sizes of 1, 5, and 30 from a non-normal distribution, the continuous uniform distribution. The mean and variance of a uniform(a,b) random variable are (a + b)/2 and (b - a)^2/12, respectively.

1. Start MINITAB and create thirty sets of 300 observations from the Uniform(0,1) distribution. You can do this by selecting Calc -> Random Data -> Uniform, specifying 300 rows, and storing the results in columns c1-c30. Set the Lower Endpoint to 0.0 and the Upper Endpoint to 1.0.
a) What is the population mean?
b) What are the population variance and standard deviation?

Now suppose that you want to estimate the mean of this population, but that you didn't know that the distribution was Uniform(0,1). To estimate the mean you could:
i) estimate with Xbar from a sample of size 1
ii) estimate with Xbar from a sample of size 5
iii) estimate with Xbar from a sample of size 30

Based on the Law of Large Numbers and the Central Limit Theorem, we expect to see:
i) as n gets larger, the standard deviation of Xbar gets smaller, and
ii) as n gets larger, the distribution of Xbar begins to approach the normal distribution.

To check these distributional properties, we will need to repeat the calculation of Xbar many times. We will repeat the calculation of Xbar 300 times for each sample size (note: keep this number of replications distinct in your mind from the sample size, 1, 5, or 30, on which the Xbar is based).

2. Computing Xbar1, Xbar5, Xbar30.
Create a new column labeled Xbar5 at the top of the worksheet in column 32, but do not enter any values. Instead, select Calc -> Row Statistics from the top menu, and select the button for Mean (that is, the sample mean). Set the input variables to c1-c5 and store the result in 'Xbar5'. Repeat this process, creating Xbar30 in column 33, with input variables c1-c30. Create a column labeled Xbar1 in column 31; you can use any one of the columns c1 to c30 to compute Xbar1. Each row now shows a single replication of computing sample means for samples of size 1, 5, and 30, along with the original data.
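For readers who want to check steps 1 and 2 outside MINITAB, the same worksheet can be sketched in Python. This is not part of the lab itself: the names `data` and `xbar`, the seed, and the use of Python's standard library are assumptions made only for illustration.

```python
import random
import statistics

random.seed(280)  # arbitrary seed (an assumption) so the run is repeatable

N_REPS = 300  # number of replications, i.e. worksheet rows
N_COLS = 30   # columns c1-c30

# Step 1: 300 rows by 30 columns of Uniform(0,1) draws.
data = [[random.random() for _ in range(N_COLS)] for _ in range(N_REPS)]

# Step 2: row means over the first 1, 5, and 30 columns of each row.
xbar = {n: [statistics.fmean(row[:n]) for row in data] for n in (1, 5, 30)}

# Population SD of Uniform(0,1), from the variance formula (b - a)^2/12.
sigma = (1 / 12) ** 0.5

for n, values in xbar.items():
    print(f"n={n:2d}  mean={statistics.fmean(values):.4f}  "
          f"sd={statistics.stdev(values):.4f}  "
          f"sigma/sqrt(n)={sigma / n ** 0.5:.4f}")
```

Each entry of `xbar[n]` plays the role of one row of the Xbar1, Xbar5, or Xbar30 column, so the printed standard deviations can be compared with σ/√n from the Background section.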
Before plotting the histograms for the observed values of Xbar1, Xbar5, and Xbar30, create a column in MINITAB for the histogram bin boundaries. Otherwise MINITAB will choose its own bin sizes for each of the distributions (Xbar1, Xbar5, and Xbar30), and the differing bins will make the results hard to compare. Select Calc -> Make Patterned Data -> Simple Set of Numbers and store the result in c34. From first value: 0, to last value: 1, in steps of: .025. Leave the values for List Each Value and List the Whole Sequence set to 1.

3. Looking at Xbar1, Xbar5, Xbar30.
To see the action of the central limit theorem (CLT) and law of large numbers (LLN), we will look at histograms and quantile-quantile plots of repeated samples of each of these. Select Graph -> Histogram. You can specify all three histograms in the same command. Click on the Options button, set the type of histogram to Percent, and under Define Intervals set Midpoint/Cutpoint positions to c34. Look at the histograms for the Xbar values. Comment on the changing shape as n goes from 1 to 5 to 30. What happens to the spread?

4. Normal Probability Plots
Histograms are not always the best way to check distributional shape. In fact, the pattern can change quite a bit when you change the histogram boundaries just a little. To augment this view, we will use quantile-quantile plots. We will plot the empirical quantiles of Xbar1, Xbar5, or Xbar30 against the theoretical quantiles of Z, a Normal(0,1) random variable. That is, we will plot Xbar1(p) against Z(p) for various values of p.

How do we tell whether the fit is good? We will use our experience with plotting truly normal data. To gain some experience, we will first generate some normally distributed values using MINITAB, and look at plots for those data sets. If the data are normal, the quantile plot should be close to a straight line.
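The quantile-quantile construction just described can be sketched in Python to show what such a plot is built from. This is an illustrative assumption, not MINITAB's own procedure: the function names `normal_qq_points` and `qq_correlation` are hypothetical, and the plotting positions p = (i - 0.5)/n are one common convention that may differ slightly from the one MINITAB uses.

```python
import random
from statistics import NormalDist, fmean


def normal_qq_points(sample):
    """Return (theoretical Z(p), empirical quantile) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # Normal(0,1)
    # Assumed plotting positions: p = (i - 0.5)/n for the i-th smallest value.
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def qq_correlation(sample):
    """Correlation of the Q-Q points; values near 1 indicate a nearly straight line."""
    points = normal_qq_points(sample)
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    mz, mx = fmean(zs), fmean(xs)
    szx = sum((z - mz) * (x - mx) for z, x in points)
    szz = sum((z - mz) ** 2 for z in zs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return szx / (szz * sxx) ** 0.5


random.seed(280)  # arbitrary seed (an assumption) for repeatability
normal_sample = [random.gauss(0, 1) for _ in range(300)]
uniform_sample = [random.random() for _ in range(300)]

print("normal sample :", round(qq_correlation(normal_sample), 4))
print("uniform sample:", round(qq_correlation(uniform_sample), 4))
```

The correlation is only a rough numeric stand-in for judging straightness by eye. The lab's approach of comparing against plots of truly normal data remains the intended check.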
Create four new columns with 300 normally distributed (μ = 0, σ^2 = 1) values: select Calc -> Random Data -> Normal, specify 300 rows, and store the results in columns c35-c38. Set the Mean to 0.0 and the Standard deviation to 1.0. Now select Graph -> Probability Plot. Under variables select C35 as the column and look at the resulting plot. Repeat the procedure to plot C36, C37, and C38. All four plots can be on the screen at the same time, giving you a feel for how much deviation from a straight line you might expect when the data truly are normal.

Now you are ready to look at normal plots of the Xbar data. Create normal probability plots for Xbar1, Xbar5, and Xbar30. Do each graph separately and compare. What do you see? Do you see the CLT in action?

Finally, look at the average and standard deviation values printed in the normal probability plot window for Xbar1, Xbar5, and Xbar30. Jot them down in the following table:

            Xbar1     Xbar5     Xbar30
Average     ______    ______    ______
Std Dev     ______    ______    ______

Compare them with μ = 0.5 and σ = sqrt(1/12), recalling that the standard deviation of Xbar for samples of size n is σ/√n. Do you see the LLN in action?

Sources: www.me.psu.edu/lamancusa/Quality/Module3/L3PREVU.doc and Introduction to the Practice of Statistics (Moore, McCabe)
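As an optional supplement to the final LLN question, the running mean of a single long sequence of Uniform(0,1) draws can be tracked directly. This sketch is not part of the lab; the sequence length of 10,000, the seed, and the name `running_mean` are assumptions chosen for illustration.

```python
import random
from itertools import accumulate

random.seed(280)  # arbitrary seed (an assumption) for repeatability

draws = [random.random() for _ in range(10_000)]  # Uniform(0,1), mean 0.5

# running_mean[k - 1] is the average of the first k draws.
running_mean = [total / k
                for k, total in enumerate(accumulate(draws), start=1)]

for k in (1, 5, 30, 300, 10_000):
    mean_k = running_mean[k - 1]
    print(f"n={k:6d}  running mean={mean_k:.4f}  error={abs(mean_k - 0.5):.4f}")
```

The error at any particular n is random, so it need not shrink at every step, but it tends toward zero as n grows, which is the behavior the law of large numbers describes.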