The Central Limit Theorem

Hyon-Jung Kim-Ollila
Department of Mathematical Sciences, University of Oulu
Department of Signal Processing and Acoustics, Aalto University

March 19, 2013

Agenda
1 Motivation (about Sampling)
2 Sampling Distributions
3 The Central Limit Theorem

Statistical Inference

The purpose of statistical inference is to make statements about a population based on information contained in a sample.

Why samples? Often it is not possible to examine all the elements of a population, due to limits on our time, resources, and effort.

Sampling variability: Each time we take a random sample from a population, we are likely to get a different set of individuals and to calculate a different summary.

Sampling distributions: If we take many random samples of the same size, the variation from sample to sample follows a (predictable) pattern that can be determined or approximated in many situations. This allows us to evaluate the reliability of our inference. The Central Limit Theorem describes the distribution of the sample mean when the sample size is large.

Definitions and Notation

Population: the set of all the elements of interest.
Random variable: a variable whose value is subject to variation due to chance, denoted by X, Y, or Z.
Sample: a subset of the population from which data are collected. Data are observations of a random variable, denoted x₁, . . . , xₙ.
Parameter: a numerical characteristic of a population, e.g.
the population mean (μ) or the population variance (σ²).
Statistic: a numerical summary of the sample, i.e. a function of the data, e.g. the sample mean (X̄), the sample variance (s²), or the sample proportion (p̂).
Population distribution: the probability distribution of a random variable.
Sampling distribution: the probability distribution of a sample statistic.

Def.  X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ,   s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1),   p̂ = X/n = (total number of successes) / (sample size).

Example I: We want to know the average weight (μ) of eggs of the brown variety produced by a company (satamuna). We buy a carton of 12 brown eggs and the box weighs 708 g, so the average egg weight from that sample is X̄ = 59 g. If we take another carton of 12 brown eggs, we might get X̄ = 62 g (sampling variability). If we were to sample "many times" with n = 12, the resulting distribution of the values of X̄ is called the sampling distribution of X̄.

Example II: We want to know what proportion (p) of Helsinki residents who will vote in an upcoming election favor candidate A. 64 out of 100 residents who answered a phone survey claim that they favor A: p̂ = 0.64. If we randomly selected another 100 residents (all answering), we might get p̂ = 0.55. If we were to sample "many times" with n = 100, the resulting distribution of the values of p̂ is called the sampling distribution of p̂.

Sampling Distribution of the Mean

A fair die is thrown (infinitely many times) and the number of spots is observed.
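To make sampling variability concrete, the following sketch simulates repeated cartons of n = 12 egg weights in the spirit of Example I. The population mean of 60 g and standard deviation of 3 g are hypothetical values chosen for illustration; they are not figures from the text.

```python
import random
import statistics

random.seed(0)

MU, SIGMA, N = 60.0, 3.0, 12  # hypothetical egg-weight population (grams)

# Draw many random samples of size n = 12 and record each sample mean.
sample_means = []
for _ in range(2000):
    carton = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(carton))

# The sample means vary from carton to carton (sampling variability),
# but they center on the population mean, and their spread is about sigma/sqrt(n).
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```

With these assumed values, `spread` comes out near 3/√12 ≈ 0.87 g, much smaller than the spread of individual egg weights.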
The random variable X = the number observed.
The probability distribution of X:

  x      1    2    3    4    5    6
  p(x)  1/6  1/6  1/6  1/6  1/6  1/6

Mean of X:  μ = Σ x p(x) = 1(1/6) + 2(1/6) + . . . + 6(1/6) = 3.5
Variance of X:  σ² = Σ (x − μ)² p(x) = (1 − 3.5)²(1/6) + . . . + (6 − 3.5)²(1/6) = 35/12
Standard deviation of X:  σ = √σ² ≈ 1.71

Two fair six-sided dice are rolled and the mean of the two numbers is observed. All samples of size n = 2 and their means X̄ = (X₁ + X₂)/2:

  sample  (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)    x̄  1.0 1.5 2.0 2.5 3.0 3.5
  sample  (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)    x̄  1.5 2.0 2.5 3.0 3.5 4.0
  sample  (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)    x̄  2.0 2.5 3.0 3.5 4.0 4.5
  sample  (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)    x̄  2.5 3.0 3.5 4.0 4.5 5.0
  sample  (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)    x̄  3.0 3.5 4.0 4.5 5.0 5.5
  sample  (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)    x̄  3.5 4.0 4.5 5.0 5.5 6.0

The sampling distribution of the mean X̄:

  x̄      1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
  p(x̄)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Mean of X̄:  μx̄ = Σ x̄ p(x̄) = 1.0(1/36) + 1.5(2/36) + . . . + 6.0(1/36) = 3.5
Variance of X̄:  σ²x̄ = Σ (x̄ − μx̄)² p(x̄) = (1.0 − 3.5)²(1/36) + . . . + (6.0 − 3.5)²(1/36) = 35/24 ≈ 1.46
Standard error of X̄:  σx̄ = √σ²x̄ ≈ 1.21

Compare the distribution of X and the sampling distribution of X̄.

[Figure: side-by-side histograms of x (uniform over 1, . . . , 6) and of x̄ (triangular, peaked at 3.5).]

Note that μx̄ = μ and σ²x̄ = σ²/2. Generalizing from the mean of two dice to the mean of n dice,

  μx̄ = μ,   σ²x̄ = σ²/n,   and   σx̄ = σ/√n.

(This generalization works as long as X̄ is taken over independent variables Xᵢ.)
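The two-dice calculation above can be verified by brute-force enumeration: list all 36 equally likely samples and compute the mean and variance of X̄ exactly with rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely samples of size n = 2 from a fair die.
means = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]

p = Fraction(1, 36)                                    # probability of each sample
mu_xbar = sum(m * p for m in means)                    # mean of the sampling distribution
var_xbar = sum((m - mu_xbar) ** 2 * p for m in means)  # its variance

print(mu_xbar, var_xbar)  # 7/2 35/24
```

The output confirms μx̄ = 7/2 = μ and σ²x̄ = 35/24 = (35/12)/2 = σ²/2, exactly as derived above.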
More Illustrations

X ~ Exponential:  μ = E[X] = 2,  σ² = Var[X] = 4,  σ = 2.
⇒ X̄:  μx̄ = 2,  σx̄ = 2/√n,  and X̄ ~ approx. Normal as n → ∞ by the CLT.

[Figure: histograms of the population distribution and of sample means for n = 4, 16, and 30; the distribution of X̄ narrows and becomes bell-shaped as n grows.]

Highly non-Normal Population

Consider X with p.d.f.  f_X(x) = (3/2) x²  for −1 < x < 1:

  μ = E[X] = 0,  σ² = Var[X] = 3/5,  ⇒ X̄:  μx̄ = 0,  σx̄ = √(3/(5n)),

and X̄ ~ approx. Normal as n → ∞.

[Figure: the U-shaped density for n = 1, and histograms of sample means for n = 4 and n = 16, already close to bell-shaped.]

The Central Limit Theorem  [the most important theorem in statistics]

The sampling distribution of the mean of a random variable drawn from any population is approximately normal for a sufficiently large sample size (under certain general conditions).

It describes the characteristics of the 'population of sample means (or sums)'. The sample means are produced from the means of many independent observations from a given 'parent population', which can be a distribution of any form. Hence the ubiquity of the famous bell-shaped 'Normal' ('Gaussian') distribution: we can use probabilities associated with the normal curve to answer questions about the means of sufficiently large samples.
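The σx̄ = σ/√n pattern in the exponential illustration can be checked by simulation. This sketch draws many sample means from the Exponential population with mean 2 for the same sample sizes as in the figure; the number of replications (20000) and the seed are arbitrary choices for the demonstration.

```python
import random
import statistics

random.seed(1)

SIGMA = 2.0  # population sd of the Exponential distribution with mean 2

ses = {}
for n in (4, 16, 30):
    # 20000 sample means, each computed from a sample of size n.
    # expovariate(0.5) draws from an Exponential with mean 1/0.5 = 2.
    means = [statistics.mean(random.expovariate(0.5) for _ in range(n))
             for _ in range(20000)]
    ses[n] = statistics.stdev(means)  # observed standard error of the mean
    print(n, round(ses[n], 3), round(SIGMA / n ** 0.5, 3))
```

For each n, the observed standard error in the second column closely tracks the theoretical σ/√n in the third.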
When a random variable is itself Normal, its sample mean is exactly Normal by the properties of the Normal distribution. (The Central Limit Theorem is not needed in that case.)

Formally. . .

The Central Limit Theorem (CLT)
Let X₁, X₂, . . . , Xₙ be i.i.d. random variables from ANY distribution with E[Xᵢ] = μ and Var[Xᵢ] = σ² (both existing). Then, for large n,

  (X̄ − μ)/(σ/√n) = (Σᵢ₌₁ⁿ Xᵢ − nμ)/(σ√n) ~ approx. Normal(0, 1).

X̄ is approximately normally distributed with mean μ and variance σ²/n for large n (a linear transformation of a Normal r.v. is Normal).
Σᵢ₌₁ⁿ Xᵢ = X₁ + · · · + Xₙ is approximately normally distributed with mean nμ and variance nσ² for large n.

Def.  X₁, . . . , Xₙ is a random sample of X if it is the result of n mutually independent trials of the random process or experiment which generates X ⇔ X₁, . . . , Xₙ are i.i.d. (independent and identically distributed).

The Sampling Distribution of Sample Proportions

The sample proportion is p̂ = X/n, where X is the number of successes and n is the sample size. X is a Binomial random variable with mean E[X] = np and variance Var[X] = np(1 − p). Recall that a Binomial random variable can be represented as a sum of independent Bernoulli random variables: X = Σᵢ₌₁ⁿ Yᵢ, where Yᵢ ~ Bern(p). Thus the Central Limit Theorem applies to X for large n:

  X ~ approx. Normal(np, np(1 − p)),   or   p̂ ~ approx. Normal(p, p(1 − p)/n)

(the Normal Approximation to the Binomial).

The Speed of Convergence in the CLT

Q. How large does n have to be?
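The Normal approximation to the Binomial can be checked numerically with the standard library alone. This sketch compares the exact Binomial probability P(X ≤ 60) in the setting of Example II (n = 100, p = 0.64) against the Normal approximation; the choice of cutoff k = 60 and the continuity correction are illustrative additions, not part of the text.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 100, 0.64   # Example II: survey of 100 residents, true support 0.64
k = 60             # event of interest: at most 60 respondents favor A

# Exact Binomial probability P(X <= k).
exact = sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

# CLT approximation: X ~ approx. Normal(np, np(1-p)); +0.5 is the
# continuity correction for approximating a discrete count.
mu, sd = n * p, math.sqrt(n * p * (1 - p))
approx = normal_cdf((k + 0.5 - mu) / sd)

print(round(exact, 4), round(approx, 4))
```

Here np = 64 and n(1 − p) = 36 are both well above 10, so by the rule of thumb below the two probabilities agree closely.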
It is not only n but n combined with the skewness of the distribution of X that matters (plus the kurtosis, to a much lesser degree), by Edgeworth expansions.
  - skewness of X:  E[(X − μ)³]/σ³
  - kurtosis of X:  E[(X − μ)⁴]/σ⁴

For the Central Limit Theorem to be applicable:
  If the population distribution is reasonably symmetric, n ≥ 30 should suffice.
  If the population distribution is highly skewed or unusual, even larger sample sizes are required.
  For the sampling distribution of the sample proportion, n has to be big enough that both np and n(1 − p) are at least 10.

Applications of the CLT

A die is rolled 420 times. What is the probability that the sum of the rolls lies between 1400 and 1550?

The sum is a random variable  Y = Σᵢ₌₁⁴²⁰ Xᵢ = X₁ + · · · + X₄₂₀,  where each Xᵢ has the fair-die distribution above with μ = 7/2 and σ² = 35/12. Thus E[Y] = nμ = 420 · 7/2 = 1470 and Var[Y] = 420 · 35/12 = 1225, so σ√n = 35.

  P(1400 ≤ Y ≤ 1550) = P((1400 − 1470)/35 ≤ (Y − nμ)/(σ√n) ≤ (1550 − 1470)/35)
                     ≈ P(−2.0 ≤ Z ≤ 2.2857) = 0.9661,

where Z ~ Normal(0, 1). When the sample size is large, the effect of the continuity correction factor is negligible.

Proof of the CLT

Let Zᵢ = (Xᵢ − μ)/σ. Then E[Zᵢ] = 0 and Var[Zᵢ] = 1, since E[Xᵢ] = μ and Var[Xᵢ] = σ².

Let  Uₙ = (X̄ − μ)/(σ/√n) = (Σᵢ₌₁ⁿ Xᵢ − nμ)/(σ√n) = (1/√n) Σᵢ₌₁ⁿ Zᵢ.

Then the moment generating function of Uₙ is

  M_Uₙ(t) = E[e^(tUₙ)] = E[exp((t/√n) Σᵢ₌₁ⁿ Zᵢ)]
          = Πᵢ₌₁ⁿ E[exp((t/√n) Zᵢ)]     (since the Zᵢ are independent)
          = [M_Z(t/√n)]ⁿ                (since the Zᵢ are identically distributed).

Note: the Zᵢ are i.i.d. because the Xᵢ are i.i.d.
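The die-sum calculation above can be reproduced directly; the only ingredient beyond arithmetic is the standard normal CDF, written here in terms of math.erf.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 420
mu, var = 7 / 2, 35 / 12               # mean and variance of one fair-die roll
ey = n * mu                            # E[Y] = 1470
sdy = math.sqrt(n * var)               # sd of Y = sqrt(1225) = 35

# P(1400 <= Y <= 1550) via the CLT, standardizing both endpoints.
prob = normal_cdf((1550 - ey) / sdy) - normal_cdf((1400 - ey) / sdy)
print(round(prob, 4))  # 0.9661
```

The printed value matches P(−2.0 ≤ Z ≤ 2.2857) = 0.9661 from the text (no continuity correction, consistent with the remark that it is negligible here).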
By Taylor's theorem, for t near 0,

  M_Z(t) = M_Z(0) + t M_Z′(0) + (t²/2) M_Z″(0) + o(t²).

Note that M_Z(0) = 1, M_Z′(0) = E[Zᵢ] = 0, and M_Z″(0) = E[Zᵢ²] = 1. Then

  M_Uₙ(t) = [M_Z(t/√n)]ⁿ = [1 + t²/(2n) + o(t²/n)]ⁿ → e^(t²/2)  as n → ∞, for each fixed t.

Thus Uₙ = (X̄ − μ)/(σ/√n) converges to a standard normal r.v. as n → ∞.

Moment generating function continuity theorem: If the moment generating functions M_Xₙ(t) are defined for all t and n, and limₙ→∞ M_Xₙ(t) = M_X(t) for all t, then Xₙ converges in distribution to X (limₙ→∞ F_Xₙ(x) = F_X(x) at all x ∈ ℝ at which F_X is continuous, where X ~ F_X and Xₙ ~ F_Xₙ).

More General Versions of the CLT

The Central Limit Theorem is actually fairly robust. Variants of the theorem still apply if the Xᵢ are not identically distributed, or not completely independent. Roughly speaking, if you have a lot of little random terms that are "mostly independent" (and no single term contributes more than a "small fraction" of the total sum), then the total sum should be "approximately" normal.

A general form of the CLT: Let X₁, X₂, . . . , Xₙ be independent random variables from ANY distributions with E[Xᵢ] = μᵢ and Var[Xᵢ] = σᵢ² (both existing). Then, for large n,

  Σᵢ₌₁ⁿ (Xᵢ − μᵢ) / √(Σᵢ₌₁ⁿ σᵢ²) ~ approx. Normal(0, 1).
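The general (non-identically-distributed) form can be illustrated by simulation. Here each Xᵢ is uniform on a different interval [0, wᵢ], so each has its own μᵢ = wᵢ/2 and σᵢ² = wᵢ²/12, and the standardized sum still behaves like a standard normal; the widths, replication count, and seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(2)

widths = [0.5 + 0.1 * i for i in range(40)]  # X_i ~ Uniform(0, w_i), all different
mus = [w / 2 for w in widths]                # mu_i = w_i / 2
variances = [w * w / 12 for w in widths]     # sigma_i^2 = w_i^2 / 12
scale = math.sqrt(sum(variances))            # sqrt of the summed variances

# Draw the standardized sum many times.
draws = []
for _ in range(20000):
    s = sum(random.uniform(0, w) for w in widths)
    draws.append((s - sum(mus)) / scale)

# Close to Normal(0, 1): mean near 0, sd near 1, about 68% of draws within one sd.
m, sd = statistics.mean(draws), statistics.stdev(draws)
frac = sum(abs(d) < 1 for d in draws) / len(draws)
```

No single term dominates the sum here, matching the informal condition stated above, so the normal limit appears even though the Xᵢ have different distributions.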