Project 3
The Central Limit Theorem
Due, October 18
Our book uses a common rule of thumb: if the sample size is at least 30, the distribution of the sample mean is
approximately normal. But this is an oversimplification. For distributions that are heavily skewed or that have
significant outliers, the normal approximation for samples of size 30 might not be very good.
One of the most important facts about the distribution of X̄ that we use is that, when this distribution is
approximately normal, about 95% of the possible values of the sample mean are within $1.96\sigma/\sqrt{n}$ of the
population mean. (Here σ is the population standard deviation.) We use this to construct 95% confidence intervals.
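The Python sketch below is a quick check on where the 95% and 1.96 figures come from. The course itself works in R; Python is used here only for illustration. It uses the standard library's `statistics.NormalDist` to compute three things: the probability that a standard normal value lies within ±1.96, the probability in each tail, and the expected count per tail out of 10,000 sample means. It also prints a binomial standard deviation, which is an added yardstick and not part of the assignment. That figure assumes the normal approximation is exact, so that each of the 10,000 independent sample means falls below the interval with probability about 0.025.

```python
from math import sqrt
from statistics import NormalDist

z = 1.96
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability that a standard normal value lies in [-1.96, 1.96].
coverage = std_normal.cdf(z) - std_normal.cdf(-z)

# Probability of landing in one tail (below -1.96; by symmetry, same as above 1.96).
tail = std_normal.cdf(-z)

reps = 10_000
expected_per_side = reps * tail                # expected count of means in one tail
sd_per_side = sqrt(reps * tail * (1 - tail))   # binomial SD of that count

print(f"coverage          = {coverage:.4f}")
print(f"one-tail prob     = {tail:.4f}")
print(f"expected per side = {expected_per_side:.1f}")
print(f"binomial SD       = {sd_per_side:.1f}")
```

If the approximation is good, each tail count should be near 250, give or take about 16. A count more than roughly 30 away from 250 (about two standard deviations) suggests the normal approximation is off in that tail.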
In this project, you are going to check the n = 30 rule by comparing the prediction of the central limit theorem
to simulations in some cases where we have the whole population. We are going to look at two populations: one
that is moderately skewed and one that is very skewed.
Part I - A moderately skewed distribution
The moderately skewed distribution you will work with is a Weibull distribution with shape parameter α and scale
parameter β. To determine α and β, you will use the last two digits of your student number.
To choose α, multiply the last digit of your student number by 0.1 and add the result to 1.1. For example, if the
last digit of the student number is 6, then α = 1.1 + 0.6 = 1.7.
To choose β, add the second-to-last digit of your student number to 30. For example, if the second-to-last
digit of your student number is 8, then β = 30 + 8 = 38.
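The two parameter rules can be written as a small function. `weibull_params` is a hypothetical helper and is not part of the course materials. This Python sketch assumes the last two digits are passed as a two-character string.

```python
def weibull_params(last_two_digits):
    """Return (alpha, beta) from the last two digits of a student number.

    alpha = 1.1 + 0.1 * (last digit)
    beta  = 30 + (second-to-last digit)
    """
    if len(last_two_digits) != 2 or not last_two_digits.isdigit():
        raise ValueError("expected exactly two digits, e.g. '86'")
    second_to_last = int(last_two_digits[0])
    last = int(last_two_digits[1])
    # Integer arithmetic plus one division gives the float nearest to the
    # exact decimal value, avoiding rounding error in 1.1 + 0.1 * last.
    alpha = (11 + last) / 10
    beta = float(30 + second_to_last)
    return alpha, beta


# Example from the text: second-to-last digit 8, last digit 6.
print(weibull_params("86"))  # -> (1.7, 38.0)
```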
The mean and the standard deviation of this Weibull distribution are given by the following formulas (the
gamma function, denoted by Γ(x), can be computed in R by gamma):
$$\mu = \beta\,\Gamma(1 + 1/\alpha)$$
$$\sigma = \beta\sqrt{\Gamma(1 + 2/\alpha) - \left(\Gamma(1 + 1/\alpha)\right)^2}$$
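The two formulas translate directly into code. In this Python sketch, `math.gamma` plays the role of R's `gamma`, and `weibull_mean_sd` is a hypothetical helper name. As a sanity check, with α = 1 the Weibull distribution reduces to an exponential distribution, for which µ = σ = β.

```python
from math import gamma, sqrt


def weibull_mean_sd(alpha, beta):
    """Mean and standard deviation of a Weibull(shape=alpha, scale=beta)."""
    g1 = gamma(1 + 1 / alpha)   # Γ(1 + 1/α)
    g2 = gamma(1 + 2 / alpha)   # Γ(1 + 2/α)
    mu = beta * g1
    sigma = beta * sqrt(g2 - g1 ** 2)
    return mu, sigma


# Parameters from the worked example in the text.
mu, sigma = weibull_mean_sd(1.7, 38)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
```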
For each of three sample sizes, n = 10, n = 30 and n = 50, you will take 10,000 samples of size n and find out how
many of the 10,000 sample means are in the interval from $\mu - 1.96\sigma/\sqrt{n}$ to $\mu + 1.96\sigma/\sqrt{n}$.
If the normal approximation is good, about 95% of the sample means (roughly 9,500) fall inside the interval, leaving
about 500 outside. Since we hope that the distribution of x̄ is symmetric, we expect about 250 sample means to be
outside of the interval on each side. You will count the number of sample means outside the interval on each side.
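The Python sketch below shows one way to run the Part I simulation using only the standard library. The course presumably expects R, and the function name `weibull_tail_counts` and its `seed` argument are illustrative assumptions. One detail is worth flagging: Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second. That is the reverse of this project's α (shape) and β (scale), so the call below passes `(beta, alpha)`.

```python
import random
from math import gamma, sqrt


def weibull_tail_counts(alpha, beta, n, reps=10_000, seed=None):
    """Draw `reps` samples of size n from Weibull(shape=alpha, scale=beta)
    and count how many sample means fall below l and above r, where
    l = mu - 1.96*sigma/sqrt(n) and r = mu + 1.96*sigma/sqrt(n)."""
    rng = random.Random(seed)
    g1 = gamma(1 + 1 / alpha)
    mu = beta * g1
    sigma = beta * sqrt(gamma(1 + 2 / alpha) - g1 ** 2)
    half_width = 1.96 * sigma / sqrt(n)
    l_bound, r_bound = mu - half_width, mu + half_width

    below = above = 0
    for _ in range(reps):
        # random.weibullvariate(alpha, beta) takes the SCALE first and the
        # SHAPE second, the reverse of this project's alpha/beta naming.
        xbar = sum(rng.weibullvariate(beta, alpha) for _ in range(n)) / n
        if xbar < l_bound:
            below += 1
        elif xbar > r_bound:
            above += 1
    return l_bound, r_bound, below, above


if __name__ == "__main__":
    for n in (10, 30, 50):
        l_b, r_b, below, above = weibull_tail_counts(1.7, 38, n, seed=1)
        print(f"n={n:2d}  l={l_b:.3f}  r={r_b:.3f}  below={below}  above={above}")
```

Each tail count is random, so rerunning with a different `seed` will change the counts slightly. The comparison of interest is how far each count sits from 250.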
Part II - A highly skewed distribution
There are 3,141 counties in the United States. The counties dataset in the M241 package has several variables
defined for each county. In this part, we are concerned with the variable Population. (So the population is the
set of counties, and the variable is Population!)
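Just as in Part I, for each sample size n = 10, n = 30 and n = 50, take 10,000 samples of n counties and
determine how many of the 10,000 samples have a mean in the interval from $\mu - 1.96\sigma/\sqrt{n}$ to
$\mu + 1.96\sigma/\sqrt{n}$, and how many fall outside it on each side. Here µ and σ are the mean and standard
deviation of Population over all 3,141 counties.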
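Part II can be sketched the same way, except that samples are drawn from a fixed list of 3,141 population values instead of from a formula. In this Python sketch, `finite_population_tail_counts` is a hypothetical helper. Two of its choices are assumptions that the assignment does not spell out:

- Counties are sampled without replacement within each sample. With n ≤ 50 out of 3,141 this makes little difference.
- σ is the population standard deviation of all counties (`statistics.pstdev`, dividing by N rather than N − 1).

The demo at the bottom uses a synthetic log-normal stand-in, not real county data. To reproduce the assignment, substitute the Population values from the counties dataset in the M241 package.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def finite_population_tail_counts(values, n, reps=10_000, seed=None):
    """Take `reps` samples of n items (without replacement) from the list
    `values` and count sample means below l and above r, where
    l, r = mu -/+ 1.96*sigma/sqrt(n) and mu, sigma describe all of `values`."""
    rng = random.Random(seed)
    mu = fmean(values)
    sigma = pstdev(values, mu)   # population SD: divides by N, not N - 1
    half_width = 1.96 * sigma / sqrt(n)
    l_bound, r_bound = mu - half_width, mu + half_width

    below = above = 0
    for _ in range(reps):
        xbar = fmean(rng.sample(values, n))   # n distinct items from the population
        if xbar < l_bound:
            below += 1
        elif xbar > r_bound:
            above += 1
    return mu, sigma, l_bound, r_bound, below, above


if __name__ == "__main__":
    # SYNTHETIC stand-in only: 3,141 log-normal values, NOT real county
    # populations. Replace with the Population column of the counties data.
    fake_rng = random.Random(0)
    populations = [fake_rng.lognormvariate(10, 1.5) for _ in range(3141)]
    for n in (10, 30, 50):
        mu, sigma, l_b, r_b, below, above = finite_population_tail_counts(
            populations, n, seed=1)
        print(f"n={n:2d}  below={below}  above={above}")
```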
Your report for this project consists of completing the sheet on the next page.
Project 3 Report Sheet
Last two digits of student number:
Part I:
Weibull distribution used
α:
β:
Mean and standard deviation of population
µ:
σ:
For each n, in how many of the 10,000 samples of size n is x̄ outside of the interval on the left or the right?
Complete the table:
n  | l = $\mu - 1.96\sigma/\sqrt{n}$ | r = $\mu + 1.96\sigma/\sqrt{n}$ | No. of x̄ < l | No. of x̄ > r
10 |  |  |  |
30 |  |  |  |
50 |  |  |  |
For this distribution and these sample sizes, does it appear that using the Central Limit Theorem to generate
a 95% confidence interval is appropriate?
Part II:
Population mean:
Population standard deviation:
For each n, in how many of the 10,000 samples of size n is x̄ outside of the interval on the left or the right?
Complete the table:
n  | l = $\mu - 1.96\sigma/\sqrt{n}$ | r = $\mu + 1.96\sigma/\sqrt{n}$ | No. of x̄ < l | No. of x̄ > r
10 |  |  |  |
30 |  |  |  |
50 |  |  |  |
For this distribution and these sample sizes, does it appear that using the Central Limit Theorem to generate
a 95% confidence interval is appropriate?