Central limit theorem: X̄ is the sample mean of a random sample (from some population) of size n. If n is large and the
population has a finite variance, then X̄ has an approximately normal distribution.
• If the population is normal, X̄ has a normal distribution for any sample size.
• If the population has mean µ and variance σ², then the mean of X̄ is µ and its variance is σ²/n.
• If s² is the sample variance, then E(s²) = σ².
• One consequence of the central limit theorem is that the interval $\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$ has
probability approximately 1-α of containing µ.
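As a small illustration of the interval in the last bullet, the following R sketch computes it for one sample. The normal population with µ = 10 and σ = 2, the sample size, the seed, and all variable names are assumptions made only for this example.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n     <- 25                        # sample size (illustrative choice)
alpha <- 0.05                      # 1 - alpha = 0.95 confidence level
x     <- rnorm(n, mean = 10, sd = 2)   # hypothetical normal population, mu = 10

xbar  <- mean(x)                   # sample mean
s     <- sd(x)                     # sample standard deviation
tcrit <- qt(1 - alpha / 2, df = n - 1) # upper alpha/2 point of t with n-1 d.f.

ci <- c(lower = xbar - tcrit * s / sqrt(n),
        upper = xbar + tcrit * s / sqrt(n))
ci

# Cross-check against the interval reported by base R's t.test()
same_as_t_test <- all.equal(unname(ci),
                            as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int))
same_as_t_test
```

The cross-check with t.test() is there only to confirm that the hand-computed interval matches the one R reports.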
Statistics has two kinds of populations: finite populations and infinite populations.
Sampling from an infinite population: Infinite populations are defined by probability density functions f(x).
A random sample of size n from f(x) is a collection of n random variables X1, ..., Xn such that X1, ..., Xn are
independent and each Xi has the probability distribution defined by f(x). In R, we can generate random samples from
many frequently used f(x).
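For instance, base R provides a random number generator for each of several common densities. The sketch below draws one small sample from a few of them; the choice of distributions, the sample size n = 5, and the seed are arbitrary assumptions for illustration.

```r
set.seed(2)                  # arbitrary seed, only for reproducibility
n <- 5                       # illustrative sample size

x_norm <- rnorm(n)           # standard normal, mean 0 and sd 1
x_t4   <- rt(n, df = 4)      # t distribution with 4 d.f.
x_unif <- runif(n)           # uniform on (0, 1)
x_exp  <- rexp(n, rate = 1)  # exponential with rate 1

# Each call returns n independent draws; the sample mean is one number per sample
sample_means <- c(normal = mean(x_norm), t4 = mean(x_t4),
                  uniform = mean(x_unif), exponential = mean(x_exp))
sample_means
```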
Our Monte Carlo simulation will proceed as follows:
• Pick an interesting f(x) (this is analogous to choosing a finite population U).
• Single iteration of Monte Carlo simulation: using an R random number generator, generate a sample X1, ..., Xn
from f(x) and calculate the sample mean X̄.
• Now we do this a large number K of times. This yields K values of X̄.
• The central limit theorem predicts that if n is large enough, X̄ will have close to a normal distribution. We can
check to see if the K values of X̄ appear to come from a normal distribution.
• We can also check to see if the mean of the K values of X̄ is close to µ and their variance is close to σ²/n. Similarly
we can calculate the sample variance s² for each sample and check if the average of the K values of s² is close to σ².
• For each sample we can compute the confidence interval $\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$ and see whether
it contains µ. About a fraction 1-α of the K intervals should.
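The steps above can be sketched in R roughly as follows. This is a minimal sketch, not the course's own script: it assumes the t distribution with 4 d.f. as f(x) (for which µ = 0 and σ² = 4/(4-2) = 2), n = 25, K = 5000, and α = 0.05, and the seed and variable names are arbitrary.

```r
set.seed(3)                            # arbitrary seed, only for reproducibility
K     <- 5000                          # number of simulated samples
n     <- 25                            # size of each sample
df    <- 4                             # assumed population: t with 4 d.f.
mu    <- 0                             # population mean of the t-4
sig2  <- df / (df - 2)                 # population variance of the t-4, equal to 2
alpha <- 0.05

# K samples of size n, stored one sample per row
samples <- matrix(rt(K * n, df = df), nrow = K, ncol = n)

xbar <- rowMeans(samples)              # K sample means
s2   <- apply(samples, 1, var)         # K sample variances

# Half-width of the t interval for each sample, then check coverage of mu
half    <- qt(1 - alpha / 2, df = n - 1) * sqrt(s2 / n)
covered <- (xbar - half <= mu) & (mu <= xbar + half)

mean(xbar)       # compare with mu
var(xbar)        # compare with sig2 / n
mean(s2)         # compare with sig2
mean(covered)    # compare with 1 - alpha

# Visual check of normality of the K sample means
qqnorm(xbar); qqline(xbar)
```

Storing the samples in a K by n matrix avoids an explicit loop; a for loop over K iterations would work just as well. Because the t-4 has very long tails, the simulated values of var(xbar) and mean(s2) can be noticeably noisy even with K = 5000.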
We will use 3 populations:
• t distribution with 20 d.f. This population is very close to normal. We will use n=3.
• t distribution with 4 d.f. This population has long tails with respect to the normal distribution. We will use n=3
and n=25.
• t distribution with 1 d.f. This population has humongous tails and produces extreme outliers. We will use n=1000.
These populations are all symmetric and bell shaped. This could be part of a study on the effect of tail behavior on
the convergence rate of the central limit theorem. We will use K=5000 simulated samples per population.
The central limit theorem is also affected by skewness. This will be the subject of a homework exercise.
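To put rough numbers on the tail behavior of the three populations, the following R sketch computes the probability of landing more than 3 units from the center for each, with the standard normal for comparison, along with the population variance where it exists. The cutoff of 3 and the variable names are assumptions chosen for illustration.

```r
dfs <- c(20, 4, 1)                       # degrees of freedom of the three populations

# P(|X| > 3) for the standard normal and for each t distribution
tail_prob <- c(2 * pnorm(-3), 2 * pt(-3, df = dfs))
names(tail_prob) <- c("normal", "t-20", "t-4", "t-1")
round(tail_prob, 4)

# Variance of a t distribution is df / (df - 2) when df > 2;
# it is left as NA here for df = 1, where no variance exists
t_var <- ifelse(dfs > 2, dfs / (dfs - 2), NA)
names(t_var) <- c("t-20", "t-4", "t-1")
t_var
```

The tail probability grows as the degrees of freedom decrease, which is the sense in which the t-4 has long tails and the t-1 has very heavy ones.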
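[Figure: density curves of the normal, t-20 df, t-4 df, and t-1 df distributions. Horizontal axis: x, from -3 to 3. Vertical axis: density (labeled ynorm), from 0.0 to 0.4.]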
This is a plot of the densities of the normal, t-20, t-4, and t-1 distributions. We see that the t-20 is quite close to the normal,
but as the number of degrees of freedom decreases, more probability is put into the tails (and less near the center).
These 4 distributions are all symmetric, centered around µ=0, and bell shaped. For the normal, t-20, and t-4 distributions µ=0 is the mean
of the distribution. The t-1 distribution, also known as the Cauchy, has neither a mean nor a variance because the integrals
fail to converge for the t-1 density function:
$\mu = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx$, where
$f(x) = \frac{1}{\pi (1 + x^2)}$.
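One practical consequence can be checked by simulation: the mean of n independent standard Cauchy variables is itself standard Cauchy, so averaging does not tighten the distribution of the sample mean. The R sketch below compares the interquartile range of simulated sample means for several n. The values of n, K = 2000, the seed, and the variable names are assumptions for illustration.

```r
set.seed(4)                          # arbitrary seed, only for reproducibility
K  <- 2000                           # simulated samples per sample size
ns <- c(1, 10, 1000)                 # sample sizes to compare

# For each n, draw K samples from the t-1 (Cauchy) and record the IQR of the K means
iqr_means <- sapply(ns, function(n) {
  samples <- matrix(rt(K * n, df = 1), nrow = K, ncol = n)
  IQR(rowMeans(samples))
})
names(iqr_means) <- paste0("n=", ns)

# The quartiles of the standard Cauchy are -1 and 1, so its IQR is 2.
# The IQR of the sample means should stay near 2 instead of shrinking like 1/sqrt(n).
iqr_means
```

The interquartile range is used instead of the variance because the sample variance of Cauchy data is dominated by a few extreme outliers and is not a stable summary.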