Central limit theorem: X̄ is the sample mean of a random sample of size n from some population. If n is large, then X̄ has an approximately normal distribution.
• If the population is normal, X̄ has a normal distribution for any sample size.
• If the population has mean µ and variance σ², then the mean of X̄ is µ and its variance is σ²/n.
• If s² is the sample variance, then E(s²) = σ².
• One consequence of the central limit theorem is that the interval $\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$ has probability approximately 1−α of containing µ.

Statistics has two kinds of populations: finite populations and infinite populations.

Sampling from an infinite population: Infinite populations are defined by probability density functions f(x). A random sample of size n from f(x) is a collection of n random variables X1, ..., Xn such that X1, ..., Xn are independent and each Xi has the probability distribution defined by f(x). In R, we can generate random samples from many frequently used f(x).

Our Monte Carlo simulation will proceed as follows:
• Pick an interesting f(x) (this is analogous to choosing a finite population U).
• Single iteration of the Monte Carlo simulation: using an R random number generator, generate a sample X1, ..., Xn from f(x) and calculate the sample mean X̄.
• Repeat this a large number K of times. This yields K values of X̄.
• The central limit theorem predicts that if n is large enough, X̄ will have close to a normal distribution. We can check whether the K values of X̄ appear to come from a normal distribution.
• We can also check whether the mean of the K values of X̄ is µ and whether their variance is σ²/n. Similarly, we can calculate the sample variance s² for each sample and check whether E(s²) = σ².
• For each sample we can compute the confidence interval $\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$ and see whether it contains µ.

We will use 3 populations:
• t distribution with 20 d.f. This population is very close to normal. We will use n=3.
• t distribution with 4 d.f. This population has long tails relative to the normal distribution. We will use n=3 and n=25.
• t distribution with 1 d.f. This population has humongous tails and produces extreme outliers. We will use n=1000.

These populations are all symmetric and bell shaped. This could be part of a study on the effect of tail behavior on the convergence rate of the central limit theorem. We will use K=5000 simulated samples per population. The central limit theorem is also affected by skewness. This will be the subject of a homework exercise.

[Figure: density curves of the normal, t-20 df, t-4 df, and t-1 df distributions; horizontal axis x from -3 to 3, vertical axis (ynorm) from 0.0 to 0.4.]

This is a plot of the densities of the normal, t-20, t-4, and t-1 distributions. We see that the t-20 is quite close to the normal, but as the number of degrees of freedom decreases, more probability is put into the tails (and less near the center). These 4 distributions are all symmetric, centered around µ=0, and bell shaped. For the normal, t-20, and t-4 distributions, µ=0 is the mean of the distribution. The t-1 distribution, also known as the Cauchy, has neither a mean nor a variance because the integrals

$$\mu = \int_{-\infty}^{\infty} x\, f(x)\, dx, \qquad \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx$$

fail to converge for the t-1 density function

$$f(x) = \frac{1}{\pi\,(1 + x^2)}.$$
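The Monte Carlo procedure described earlier can be sketched in a few lines of code. The notes use R; what follows is only an illustrative sketch in standard-library Python, and the names `draw_t`, `simulate`, and `T_CRIT_95` are hypothetical, not part of the original material. It assumes α = 0.05, hard-codes the critical values t_{0.025, n−1} from standard t tables for n = 3 and n = 25, and uses the fact that a t distribution with df > 2 has variance σ² = df/(df − 2). The t-1 case is left out because it has no µ or σ² to compare against and because 5000 samples of size 1000 are slow to generate in pure Python.

```python
import math
import random

# Two-sided 95% critical values t_{0.025, n-1}, taken from standard t tables
# (assumption: alpha = 0.05; only the sample sizes n = 3 and n = 25 are covered).
T_CRIT_95 = {3: 4.3027, 25: 2.0639}


def draw_t(df, rng):
    """Draw one value from a t distribution with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square(df) variable is Gamma(shape = df/2, scale = 2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)


def simulate(df, n, K=5000, seed=1):
    """Run K Monte Carlo iterations with samples of size n from t(df)."""
    rng = random.Random(seed)
    t_crit = T_CRIT_95[n]
    mu = 0.0  # true mean of the t distribution when df > 1
    means, variances, covered = [], [], 0
    for _ in range(K):
        sample = [draw_t(df, rng) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        half_width = t_crit * math.sqrt(s2 / n)
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
        means.append(xbar)
        variances.append(s2)
    mean_of_means = sum(means) / K
    var_of_means = sum((m - mean_of_means) ** 2 for m in means) / (K - 1)
    return {
        "mean_of_means": mean_of_means,         # compare with mu = 0
        "var_of_means": var_of_means,           # compare with sigma^2 / n
        "theory_var_of_means": df / (df - 2) / n,
        "mean_of_s2": sum(variances) / K,       # compare with sigma^2
        "theory_sigma2": df / (df - 2),
        "coverage": covered / K,                # compare with 1 - alpha = 0.95
    }


if __name__ == "__main__":
    for df, n in [(20, 3), (4, 3), (4, 25)]:
        print(df, n, simulate(df, n))
```

Each returned dictionary pairs a simulated quantity with the value the theory predicts, so the comparisons listed in the procedure (mean of X̄ against µ, variance of X̄ against σ²/n, average s² against σ², and interval coverage against 1−α) can be read off directly. Checking normality of the K means would still need a histogram or a normal quantile plot, which this sketch does not attempt.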