Statistics MINITAB - Lab 8

Central Limit Theorem

The Central Limit Theorem is one of the most important theorems in statistics. The statement of the theorem (without proof) is:

Consider a random sample of size n selected from a population (any population) with mean $\mu$ and standard deviation $\sigma$. Then, when n is sufficiently large, the sampling distribution of $\bar{x}$ will be approximately a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. The larger the sample size, the better will be the normal approximation to the sampling distribution of $\bar{x}$.

What this means is that for any population distribution, regardless of shape, if we repeatedly take samples of a large enough size and record the sample means, then the distribution of those means will approach a normal distribution with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of the sample size.

We will now look at the sampling distribution of the mean from a skewed distribution (the exponential).

1. The exponential distribution is an asymmetric continuous probability distribution. The probability density function for the exponential is:

$f(x) = \frac{1}{\theta} e^{-x/\theta}, \quad x \ge 0$

where the mean = $\theta$ and the standard deviation = $\theta$.

Using MINITAB (release 13.20), generate random variables from an exponential population with parameter $\theta$ = 20. Go to CALC > RANDOM DATA > EXPONENTIAL, generate 1000 rows of data and store them in columns C1-C50, with a mean of 20. The dialog box asks for:

- How many rows of data you want in each column (1000).
- Which columns you want the data stored in (C1-C50); this also sets how many columns are generated.
- The mean of the distribution you want to simulate (20).

2. Have a look at the shape of the distribution of these random variables and their descriptive statistics. Go to STAT > BASIC STATISTICS > DISPLAY DESCRIPTIVE STATISTICS. Then select column C1 and click on GRAPHS > HISTOGRAM OF DATA, WITH NORMAL CURVE.
This will give the usual descriptive statistics in the session window and also draw a histogram of the data with a normal curve superimposed on it.

What is the mean of C1? ____________

What is the standard deviation of C1? ____________

Describe the shape of the histogram (could we consider this a normal distribution, and why?)

_____________________________________________________________________
_____________________________________________________________________

Have a quick look at C2, C3 and C4 as well.

3. First we will get a sampling distribution of means for a sample size of n=2. Go to the CALC menu and click on ROW STATISTICS. Select the mean as the statistic of interest and use columns C1-C2 as the input variables. Store the result in C51 (the next available column). Name column C51 n=2. Each row of C51 is the mean of that row's values in C1 and C2, so C51 now contains 1000 means from samples of size n=2 from an exponential population with mean 20. These 1000 means are a sampling distribution of the mean.

We want to look at the sampling distribution of the mean for different sample sizes. Repeat this step 3 more times using the following:

Input variables    Store result in    Name for column
C1-C5              C52                n=5
C1-C15             C53                n=15
C1-C50             C54                n=50

Columns C51-C54 contain sampling distributions of the mean for samples of sizes 2, 5, 15 and 50 respectively from an exponential population with mean 20.

Get descriptive statistics and a histogram with a normal distribution curve superimposed on it for each of the columns C51-C54. Fill in the following table:

Sample Size    Mean      Median    Skew?     Standard Deviation    Expected Standard Deviation by C.L.T.*
n = 2          ______    ______    ______    ______                ______
n = 5          ______    ______    ______    ______                ______
n = 15         ______    ______    ______    ______                ______
n = 50         ______    ______    ______    ______                ______

* C.L.T. = Central Limit Theorem

What is the expected mean for each case (by the C.L.T.)? _______________

What pattern are you seeing with regard to shape (skew, symmetry, bell shape etc.) as the sample size increases?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

What pattern can you see in the standard deviation of the sampling distributions as the sample size increases? Why might this be important? (NB: the standard deviation of a sampling distribution of the mean is often called the standard error of the mean.)

______________________________________________________________________________
______________________________________________________________________________

REVISION SUMMARY

After this lab you should be able to:

- Understand the Central Limit Theorem
- Simulate data with a specific distribution
- Generate a histogram with the normal curve superimposed on it
- Create a sampling distribution of means
- Work out the expected mean/standard deviation by the CLT

END
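For readers without access to MINITAB, the simulation in this lab can be approximated in Python. The following is a minimal sketch, not part of the original lab. It uses only the standard library, and the function names (`simulate`, `row_means`, `summarize`) and the fixed random seed are assumptions made for illustration. It generates 1000 rows by 50 columns of exponential data with mean 20, averages the first n columns of each row for n = 2, 5, 15 and 50, and compares the observed standard deviation of those means with the value the C.L.T. predicts.

```python
import math
import random
import statistics

MEAN = 20.0                   # exponential mean (also its standard deviation)
ROWS, COLS = 1000, 50         # analogue of 1000 rows stored in C1-C50
SAMPLE_SIZES = (2, 5, 15, 50)


def simulate(seed=1):
    """Return ROWS rows, each holding COLS exponential draws with mean MEAN."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    # expovariate takes the rate parameter, which is 1 / mean
    return [[rng.expovariate(1.0 / MEAN) for _ in range(COLS)]
            for _ in range(ROWS)]


def row_means(data, n):
    """Mean of the first n values in each row (analogue of Row Statistics)."""
    return [statistics.fmean(row[:n]) for row in data]


def summarize(data):
    """Descriptive statistics of the sampling distribution for each n."""
    results = {}
    for n in SAMPLE_SIZES:
        means = row_means(data, n)
        results[n] = {
            "mean": statistics.fmean(means),
            "median": statistics.median(means),
            "sd": statistics.stdev(means),
            "expected_sd": MEAN / math.sqrt(n),  # sigma / sqrt(n) by the C.L.T.
        }
    return results


if __name__ == "__main__":
    for n, s in summarize(simulate()).items():
        print(f"n={n:2d}  mean={s['mean']:6.2f}  median={s['median']:6.2f}  "
              f"sd={s['sd']:6.2f}  expected sd={s['expected_sd']:6.2f}")
```

The printed values will differ from those MINITAB produces because the random draws differ, but the pattern should match. A mean larger than the median suggests right skew, so comparing the two columns gives a rough check on the "Skew?" column of the table.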