Project 3: The Central Limit Theorem
Due October 18

Our book uses a common rule of thumb: if the sample size is at least 30, the distribution of the sample mean is approximately normal. But this is an oversimplification. For distributions that are heavily skewed or that have significant outliers, the approximation by a normal distribution for samples of size 30 might not be very good.

One of the most important facts that we use about the distribution of $\bar{X}$ is that 95% of the possible values of the sample mean are within $1.96\sigma/\sqrt{n}$ of the population mean. (Here $\sigma$ is the population standard deviation.) We use this to construct 95% confidence intervals.

In this project, you are going to check the n = 30 rule by comparing the prediction of the central limit theorem to simulations in some cases where we have the whole population. We are going to look at two populations: one that is moderately skewed and one that is very skewed.

Part I - A moderately skewed distribution

The moderately skewed distribution you will work with is a Weibull distribution. To determine which one, you will use the last two digits of your student number.

To choose $\alpha$, multiply the last digit of your student number by 0.1 and add the result to 1.1. For example, if the last digit of your student number is 6, then $\alpha = 1.1 + 0.6 = 1.7$.

To choose $\beta$, add the second-to-last digit of your student number to 30. For example, if the second-to-last digit of your student number is 8, then $\beta = 30 + 8 = 38$.

The mean and the standard deviation of the Weibull distribution are given by the following formulas (the gamma function, denoted by $\Gamma(x)$, can be computed in R by gamma):

$$\mu = \beta\,\Gamma(1 + 1/\alpha)$$

$$\sigma = \beta\sqrt{\Gamma(1 + 2/\alpha) - \left(\Gamma(1 + 1/\alpha)\right)^2}$$

For each of three sample sizes, n = 10, n = 30 and n = 50, you will take 10,000 samples and find out how many of the 10,000 sample means are in the interval from $\mu - 1.96\,\sigma/\sqrt{n}$ to $\mu + 1.96\,\sigma/\sqrt{n}$.

If the central limit theorem applies, about 5% of the 10,000 sample means, or about 500, should fall outside this interval. Since we hope that the distribution of $\bar{x}$ is symmetric, we expect about 250 of them to be outside of the interval on each side. You will count the number of sample means outside the interval on each side.

Part II - A highly skewed distribution

There are 3,141 counties in the United States. The (counties) dataset in the M241 package has several variables defined on each county. In this part, we are concerned with the variable Population. (So the population is counties and the variable is population!)

Just as in Part I, for each sample size of n = 10, n = 30 and n = 50, take 10,000 samples of n counties and determine how many of the 10,000 samples have a mean in the interval from $\mu - 1.96\,\sigma/\sqrt{n}$ to $\mu + 1.96\,\sigma/\sqrt{n}$.

Your report for this project consists of completing the sheet on the next page.

Project 3 Report Sheet

Last two digits of student number:

Part I:

Weibull distribution used
$\alpha$:
$\beta$:

Mean and standard deviation of population
$\mu$:
$\sigma$:

For each n, in how many of the 10,000 samples of size n is $\bar{x}$ outside of the interval on the left or the right? Complete the table:

| n | $l = \mu - 1.96\,\sigma/\sqrt{n}$ | $r = \mu + 1.96\,\sigma/\sqrt{n}$ | No. of $\bar{x} < l$ | No. of $\bar{x} > r$ |
|----|----|----|----|----|
| 10 | | | | |
| 30 | | | | |
| 50 | | | | |

For this distribution and these sample sizes, does it appear that using the Central Limit Theorem to generate a 95% confidence interval is appropriate?

Part II:

Population mean:
Population standard deviation:

For each n, in how many of the 10,000 samples of size n is $\bar{x}$ outside of the interval on the left or the right? Complete the table:

| n | $l = \mu - 1.96\,\sigma/\sqrt{n}$ | $r = \mu + 1.96\,\sigma/\sqrt{n}$ | No. of $\bar{x} < l$ | No. of $\bar{x} > r$ |
|----|----|----|----|----|
| 10 | | | | |
| 30 | | | | |
| 50 | | | | |

For this distribution and these sample sizes, does it appear that using the Central Limit Theorem to generate a 95% confidence interval is appropriate?
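The sampling procedure described for Part I could be sketched in R roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: the helper name count_outside, the seed, and the example digits (6 and 8, borrowed from the worked examples in Part I) are illustrative, and treating α as the shape and β as the scale in rweibull is inferred from the formula for µ rather than stated in the handout.

```r
# Sketch only: count_outside() and draw_weibull() are illustrative names,
# not part of the assignment or of any package.
set.seed(1)  # arbitrary seed, used only so the run is repeatable

# Example digits from the text: last digit 6, second-to-last digit 8.
alpha <- 1.1 + 0.1 * 6   # assumed to be the Weibull shape
beta  <- 30 + 8          # assumed to be the Weibull scale

# Population mean and standard deviation from the Weibull formulas.
mu    <- beta * gamma(1 + 1 / alpha)
sigma <- beta * sqrt(gamma(1 + 2 / alpha) - gamma(1 + 1 / alpha)^2)

# Draw `reps` samples of size n using draw(n), then count the sample
# means that fall below l or above r.
count_outside <- function(draw, n, mu, sigma, reps = 10000) {
  xbar <- replicate(reps, mean(draw(n)))
  l <- mu - 1.96 * sigma / sqrt(n)
  r <- mu + 1.96 * sigma / sqrt(n)
  c(n = n, l = l, r = r, below = sum(xbar < l), above = sum(xbar > r))
}

# Part I: samples from the Weibull distribution, one row per sample size.
draw_weibull <- function(n) rweibull(n, shape = alpha, scale = beta)
part1 <- t(sapply(c(10, 30, 50), count_outside,
                  draw = draw_weibull, mu = mu, sigma = sigma))
print(part1)
```

For Part II, the same helper could be reused by passing draw = function(n) sample(pop, n) with pop <- counties$Population, assuming that loading the M241 package makes the counties data frame available as the handout describes. Note that sample() draws without replacement by default and that R's sd() divides by N - 1 rather than N; with 3,141 counties and n at most 50, both differences are small, but which convention the course intends is not stated here.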