Sampling Distributions, the CLT, and Estimation
Carolyn J. Anderson
EdPsych 580, Fall 2005

Sampling and Estimation
• Sampling distributions
• Normal distribution & Central Limit Theorem
• Estimators and estimates
• Statistical inference (interval estimation)

Recall the Big Picture
[Diagram: a population of size N with parameters µ and σ; select a subset, a sample of size n; use the sample to make inferences back about the population.]

Population
The population, or "sample space", consists of elementary events.
• All potential units that could be observed.
• If finite, the number of units is countable. If infinite, the number of potential observations is infinite. If "virtually infinite", it is a very, very, very large number.
• Real or hypothetical. Examples:
  • All college students in the U.S.
  • All possible mean SAT scores from samples drawn from all college students in the U.S.

Random Variables
• A Random Variable is a number assigned to each member of the population. This set of numbers has a distribution.
• The Population Distribution is the (frequency) distribution of these random variables. It has some form, with mean µ and variance σ².
• Population distributions are almost always treated as (theoretical) probability distributions.
• Under random sampling with replacement, the long-run relative frequency of a value is the same as the probability of that value.

Parameters
Parameters of populations ("true values") are values that summarize (define) the distribution.
• Mean
• Variance
• Others

Sample
• A Sample is a subset of n units from the population.
• Quantities or values computed using a sample of observations of random variables are Statistics.
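The claim that, under random sampling with replacement, the long-run relative frequency of a value equals its probability can be checked with a short simulation. This is a sketch in Python; the die-style population and the number of draws are illustrative choices, not from the notes.

```python
import random

random.seed(1)

# Hypothetical finite population: the face values of a fair six-sided die.
population = [1, 2, 3, 4, 5, 6]

# Sample with replacement many times and track the relative frequency of a 6.
n_draws = 100_000
hits = sum(1 for _ in range(n_draws) if random.choice(population) == 6)
rel_freq = hits / n_draws

# The long-run relative frequency should be close to the probability 1/6.
print(rel_freq)
```

With enough draws the printed frequency settles near 1/6 ≈ 0.1667, which is the sense in which the population distribution is treated as a probability distribution.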
• Examples:
  • Mean: X̄ = (1/n) Σⁿᵢ₌₁ Xᵢ
  • Variance: s²ₙ = (1/n) Σⁿᵢ₌₁ (Xᵢ − X̄)²
  • 2nd observation on X: X₂
  • Range: Xmax − Xmin

Sampling Distributions (Key Concept)
A "conceptual experiment":
• Imagine randomly sampling n individuals from a population and computing some statistic based on the sample.
• Repeat this (independently) many times.
• Result: many values of the sample statistic −→ the sampling distribution of the sample statistic.

Sampling Distributions (continued)
From Hayes: a Sampling Distribution is a theoretical probability distribution that shows the functional relation between possible values of a given statistic based on a sample of n cases and the probability (density) associated with each value, for all possible samples of size n drawn from a particular population.

Sampling Distributions (continued)
• In general, the sampling distribution will not be the same as the population distribution.
• We describe sampling distributions the same way that we describe population (or sample) distributions: mean, variance, standard deviation, shape, etc.

Characteristics of Sampling Distributions
If the population distribution has mean µ and variance σ², then the sampling distribution of the mean (for samples of size n) has:
• Mean equal to the population mean, µ.
• Variance equal to the population variance divided by the sample size, σ²/n.
• Standard deviation, called the "standard error of the mean", equal to σ/√n.
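The "conceptual experiment" above can be carried out literally in code. This is a sketch: the normal population (µ = 50, σ = 10), the sample size n = 25, and the number of replications are all illustrative choices, not from the notes.

```python
import random
import statistics

random.seed(2)

# Conceptual experiment: repeatedly draw samples of size n from a population
# and record the sample mean each time.
population_mu, population_sigma = 50.0, 10.0
n, n_replications = 25, 20_000

sample_means = []
for _ in range(n_replications):
    sample = [random.gauss(population_mu, population_sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The collection of sample means approximates the sampling distribution of the mean.
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(mean_of_means, sd_of_means)  # close to mu = 50 and sigma/sqrt(n) = 2
```

The empirical mean and standard deviation of the 20,000 sample means match the slide's characteristics: µ and σ/√n.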
The statements on the previous slide about the mean, variance, and standard deviation of the sampling distribution of the mean are true regardless of the shape of the parent (population) distribution.

Example: Sampling Distribution of the Mean
• Population: Y is a random variable with mean µ and variance σ².
• Sample: a random (independent) sample from the population: Y₁, Y₂, . . . , Yₙ.
• The sample mean is Ȳ = (1/n) Σⁿᵢ₌₁ Yᵢ.

Expected Value of Ȳ
The mean of the sampling distribution of Ȳ:

  E[Ȳ] = E[(1/n)(Y₁ + Y₂ + . . . + Yₙ)]
       = (1/n) E[Y₁ + Y₂ + . . . + Yₙ]
       = (1/n) (E[Y₁] + E[Y₂] + . . . + E[Yₙ])
       = (1/n) (µ + µ + . . . + µ)
       = (1/n) (nµ)
       = µ

Variance of Ȳ
• Recall that σ² = E[(Y − µ)²] = E[Y²] − µ².
• Likewise, var(Ȳ) = E[(Ȳ − µ)²] = E[Ȳ²] − µ².
• Square the sample mean:
  Ȳ² = (Y₁ + Y₂ + . . . + Yₙ)²/n²
     = (Y₁² + . . . + Yₙ² + 2Y₁Y₂ + 2Y₁Y₃ + . . . + 2Y₍ₙ₋₁₎Yₙ)/n²
• If two random variables, e.g. Y₁ and Y₂, are independent, then E(Y₁Y₂) = E(Y₁)E(Y₂) = µ · µ = µ².

Variance of Ȳ (continued)

  E[Ȳ²] = (1/n²) E[Y₁² + . . . + Yₙ² + 2Y₁Y₂ + 2Y₁Y₃ + . . . + 2Y₍ₙ₋₁₎Yₙ]
        = (1/n²) (Σⁿᵢ₌₁ E[Yᵢ²] + 2 Σ_{i>j} E[YᵢYⱼ])
        = (1/n²) (Σⁿᵢ₌₁ (σ² + µ²) + 2 Σ_{i>j} µ²)
        = (1/n²) (n(σ² + µ²) + 2 · ((n−1)n/2) µ²)
        = (σ² + nµ²)/n
        = σ²/n + µ²

Variance of Ȳ (continued)

  var(Ȳ) = E[(Ȳ − µ)²] = E[Ȳ²] − µ²
         = (σ²/n + µ²) − µ²
         = σ²/n

We made no assumptions about the nature of the population distribution, except that the mean equals µ and the variance equals σ²!
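The derivation above can be checked numerically. The sketch below deliberately uses a skewed, non-normal parent (an exponential distribution; the parent, n, and replication count are illustrative choices) to underline that E(Ȳ) = µ and var(Ȳ) = σ²/n require no normality assumption.

```python
import random
import statistics

random.seed(3)

# Non-normal parent: exponential with rate 1, so mu = 1 and sigma^2 = 1.
mu = sigma2 = 1.0
n, reps = 10, 50_000

means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    means.append(sum(sample) / n)

# Empirical mean and variance of the sampling distribution of Ybar.
emp_mean = statistics.mean(means)
emp_var = statistics.variance(means)
print(emp_mean, emp_var)  # near mu = 1 and sigma^2/n = 0.1
```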
Variance of Ȳ (continued)
var(Ȳ) = σȲ² = σ²/n. As n increases, var(Ȳ) decreases (i.e., the precision of the estimate increases).

Normal Distribution and the C.L.T.
• The normal distribution is a particular probability distribution for continuous variables: the "bell curve".
• Why is it so important?
  • It is a good approximation of the (population) distribution of many measured variables.
  • Many statistical procedures are based on the assumption of a normal distribution (e.g., sampling distributions of statistics).
  • It has lots of nice mathematical properties.

The Normal Distribution
Formal definition: the family of normal distributions is a set of symmetric, bell-shaped curves, each characterized by its µ and σ². The formula for the normal p.d.f. is

  f(x) = (1/√(2πσ²)) exp(−½ ((x − µ)/σ)²)

where
• e = 2.71828 . . . (base of the natural log).
• π = 3.14159 . . . (circumference/diameter).

[Figures: normal density with µ = 0 and σ² = 4; normal densities with σ² = 1 and µ = 0, 5, 10; normal densities with µ = 0 and σ² = 1, 4, 16; a collection of assorted normal curves; the standard normal distribution.]

The Standard Normal Distribution
• You can transform any normally distributed variable into a standard normal one:
  z-score = (Y − µ)/σ
• A z-score equals the number of standard deviations a value of Y is from its mean: zσ = Y − µ.
• Use z-scores to find probabilities of continuous variables from tabled values or computer programs for the standard normal distribution.
• z ∼ N(0, 1), a special case of x ∼ N(µ, σ²).
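The slides point to tables and SAS's probnorm for standard normal areas; as an illustration, the z-score transformation and the standard normal CDF can also be computed from the error function in plain Python (the values of µ, σ, and y below are made up for the example).

```python
import math

def phi(z):
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Transform a value of Y ~ N(mu, sigma^2) to a z-score, then look up its
# cumulative probability.
mu, sigma = 100.0, 15.0
y = 130.0
z = (y - mu) / sigma          # 2.0: two standard deviations above the mean
p_below = phi(z)
print(z, round(p_below, 4))   # P(Z <= 2) is about 0.9772
```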
The Standard Normal Distribution (continued)
Finding areas/probabilities for the standard normal distribution:
• Course web-site: downloadable program, pvalue.exe
• UCLA web-site
• SAS function "probnorm" (the default is N(0, 1), but you can ask for others)

The Central Limit Theorem
Version 1 (sums): consider a random sample from a population distribution having mean µ and variance σ². If n is sufficiently large, then the sampling distribution of Σⁿᵢ₌₁ Yᵢ is approximately normal with mean nµ and variance nσ².

Version 2 (means): consider a random sample from a population distribution having mean µ and variance σ². If n is sufficiently large, then the sampling distribution of Ȳ is approximately normal with mean µ and variance σȲ² = σ²/n.

Example: Normal (0, 1) "Parent"
Parent N(0, 1) =⇒ the sampling distribution of Ȳ is N(0, 1/n).

Uniform Parent (µ = .5)
[Figure: pink is the kernel density estimate, red is the normal. Need more than n = 10 for this one. . .]

Skewed Parent (µ = 1)
[Figures: pink is the kernel density estimate, red is the normal. Need more than n = 10 for this one. . .]

Dice Rolling ("Multinomial")
[Figures: sampling distributions of means of dice rolls. Look pretty normal?]
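One way to see the C.L.T. at work with a skewed parent, in the spirit of the figures above: the skewness of the sampling distribution of Ȳ shrinks toward 0 (the normal's value) as n grows. A sketch with an exponential parent, whose skewness is 2; the parent and simulation sizes are illustrative choices.

```python
import random

random.seed(4)

def skewness(xs):
    # Sample skewness: average cubed deviation divided by sd cubed.
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / n / var ** 1.5

def mean_of_sample(n):
    # One draw from the sampling distribution of Ybar for samples of size n.
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# Means of larger samples from a skewed parent look more normal:
# their skewness moves from about 2 (n = 1) toward 0 (n = 30).
reps = 30_000
skew_n1 = skewness([mean_of_sample(1) for _ in range(reps)])
skew_n30 = skewness([mean_of_sample(30) for _ in range(reps)])
print(skew_n1, skew_n30)
```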
Example: Dice Rolling (continued)
Population: µ = 3.5, σ² = 2.92, σ = 1.71

The MEANS Procedure
  Variable     N      Mean    Std Dev    Std Dev should be
  spot1        1      3.5     1.71       1.71/√1  = 1.71
  mean2        2      3.5     1.21       1.71/√2  = 1.21
  mean5        5      3.5     0.76       1.71/√5  = 0.76
  mean20       20     3.5     0.38       1.71/√20 = 0.38
  mean50       50     3.5     0.24       1.71/√50 = 0.24

Another Discrete Distribution (Bernoulli)
P(Y = 0) = P(Y = 1) = .5, so µ = .5 and σ² = .25.
[Figures: sampling distributions of the mean of Bernoulli(.5) observations for increasing n.]

  Variable     n       Mean    Std Dev    Should be
  x1           1       .50     .50        √(.25/1)    = .50
  mean2        2       .50     .35        √(.25/2)    = .35
  mean5        5       .50     .22        √(.25/5)    = .22
  mean50       50      .50     .07        √(.25/50)   = .07
  mean100      100     .50     .05        √(.25/100)  = .05
  mean500      500     .50     .02        √(.25/500)  = .02
  mean5000     5,000   .50     .01        √(.25/5000) = .01

Another Discrete Distribution (Bernoulli)
P(Y = 0) = .99, P(Y = 1) = .01, so µ = .01 and σ² = .0099.
[Figures: sampling distributions of the mean for small n, and for n = 500 (left) and n = 5,000 (right).]

  Variable     n       Mean    Std Dev
  x1           1       .01     .1028864
  mean2        2       .01     .0713202
  mean5        5       .01     .0444879
  mean50       50      .01     .0140665
  mean100      100     .01     .0099457
  mean500      500     .01     .0044401
  mean5000     5000    .01     .0014053

The simulated Std Dev is not exactly equal to √(σ²/n) = √(.0099/n) because the simulation is based on a finite number of sample means; more than 100,000 would be needed for an exact match.
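The "Should be" column of the dice table can be reproduced directly from the population standard deviation of a fair die, σ = √(35/12) ≈ 1.71. A quick Python check (not part of the original SAS run):

```python
import math

# Population of fair-die outcomes: mu = 3.5, sigma^2 = 35/12 (about 2.92).
faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / 6
sigma2 = sum((f - mu) ** 2 for f in faces) / 6
sigma = math.sqrt(sigma2)

# Reproduce the "Should be" column: the standard error sigma/sqrt(n).
for n in (1, 2, 5, 20, 50):
    print(n, round(sigma / math.sqrt(n), 2))
```

The printed values match the table: 1.71, 1.21, 0.76, 0.38, 0.24.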
Implication of the C.L.T. or NOT?
• As n increases, σȲ² = σ²/n decreases; the sampling error in Ȳ as an estimate of µ decreases as the sample size increases. NOT (this follows from the variance result, not from the C.L.T.)
• Sampling distributions of (most) statistics are approximately normal regardless of the shape of the parent (population) distribution. YES
• Sampling distributions of statistics take on more normal shapes as n increases. Usually with n as small as 25 to 30, the sampling distribution is well approximated by the normal. YES
If the population distribution is "well behaved", then the normal distribution is a good approximation for almost all sample sizes.

C.L.T.: Summary of Implications
• Since the sampling distribution of Ȳ is approximately N(µ, σ²/n), we can use the tabled probabilities of the standard normal distribution to compute interval estimates of µ and do statistical tests (i.e., make statistical inferences about the degree of uncertainty). . . . more later.
• n = 25 or 30 does not imply that we have sufficient precision; we may require much larger n's to detect small effects. n = 30 means only that the sampling distribution of Ȳ is often approximately normal.

C.L.T.: Summary of Implications (continued)
• The sampling distribution of Ȳ always has mean µ and variance σ²/n. The shape of the sampling distribution of Ȳ is normal for small n only if the population distribution of Y is normal.
• n = 30 does not ensure that the sampling distribution of a statistic will be even approximately normal. There are cases where much larger samples are required, usually ones where the statistic is a sum of discrete values (e.g., Y = 0, 1) and the probability of (say) Y = 1 is very, very small.
• The C.L.T. can be proven mathematically.

Estimators and Estimates
• An estimator is a formula for computing an estimate.
• An estimator is a random variable whose value depends on your sample.
• An estimate is a particular value of an estimator.

Estimators and Estimates (continued)
• Examples of estimators: the sample mean and variance,
  Ȳ = (1/n) Σⁿᵢ₌₁ Yᵢ        s²ₙ = (1/n) Σⁿᵢ₌₁ (Yᵢ − Ȳ)²
• Given data from a sample, the estimates are particular numbers; e.g., for HSB reading scores: Ȳ = 55.89 and s²ₙ = 80.00.
• The above estimates are point estimates.

Properties of Estimators
• Bias
• Consistency
• Relative efficiency
• Sufficiency
• Maximum likelihood

Properties of Estimators: Bias
• An estimator is unbiased if its expected value equals the population value; it is biased if its expected value does not.
• The sample mean Ȳ = (1/n) Σⁿᵢ₌₁ Yᵢ is an unbiased estimator of µ: E(Ȳ) = µ.
• If the parent population is normal, then the median and mode are also unbiased estimators of µ: E(median) = E(mode) = µ.

Properties of Estimators: Bias (continued)
• The sample variance s²ₙ = (1/n) Σⁿᵢ₌₁ (Yᵢ − Ȳ)² is a biased estimator of σ²:
  E(s²ₙ) = σ² − (1/n)σ²
  It is a little too small.
• The unbiased estimator of σ² is s² = (1/(n − 1)) Σⁿᵢ₌₁ (Yᵢ − Ȳ)²: E(s²) = σ².

Consistency & Efficiency
• Consistency: as the sample size n increases, the sample statistic "converges in probability" to the population value.
  • The sample mean Ȳ is a consistent estimator of µ.
  • The 2nd observation in a sample is not a consistent estimator of µ.
• Relative efficiency: an estimator is more efficient than another if the variance of its sampling distribution is smaller. E.g., for normal Y, Ȳ is more efficient than the median.
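The bias result E(s²ₙ) = σ² − (1/n)σ² can be illustrated by simulation. In this sketch (a normal population with σ² = 4 and n = 5; all choices illustrative), the n-divisor estimator should average about ((n − 1)/n)σ² = 3.2, while the (n − 1)-divisor estimator should average about σ² = 4.

```python
import random

random.seed(5)

# Compare E(s^2_n) (divide by n) with E(s^2) (divide by n-1) by averaging
# each estimator over many samples from N(0, sigma^2 = 4).
sigma2, n, reps = 4.0, 5, 40_000

biased, unbiased = [], []
for _ in range(reps):
    y = [random.gauss(0.0, 2.0) for _ in range(n)]
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)
    biased.append(ss / n)          # s^2_n: a little too small on average
    unbiased.append(ss / (n - 1))  # s^2: unbiased

avg_biased = sum(biased) / reps
avg_unbiased = sum(unbiased) / reps
print(avg_biased, avg_unbiased)  # near 3.2 and 4.0, respectively
```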
Sufficiency
• A statistic is sufficient if it contains all the information in the data about the population parameter. E.g., Ȳ is sufficient for µ, and Σⁿᵢ₌₁ Yᵢ is sufficient for µ.
• Sufficient statistics don't always exist.
• In some population distributions, you need more than 1 parameter to completely specify the distribution. E.g., a Bernoulli needs only the mean (or probability), while a normal distribution needs Ȳ and s².

Maximum Likelihood
An estimator that maximizes the likelihood (probability) of obtaining the sample you got.
• Ȳ is the M.L.E. of µ (it is also consistent, efficient, and unbiased).
• s²ₙ is the M.L.E. of σ², but is biased.
• s² is not the M.L.E., but is unbiased.

Interval Estimates & Statistical Inference
• So far, we have just considered "point estimates" (a "best guess").
• We might want a range of possible values: a range that has a high probability of containing the true population value.
• Confidence interval estimate.

Confidence Interval for µ
• We know E(Ȳ) = µ and σȲ² = σ²/n.
• We assume that the sampling distribution of Ȳ is normal (i.e., that n is "large enough"); that is, Ȳ ≈ N(µ, σ²/n).

[Figure: the sampling distribution of Ȳ, with E(Ȳ) = µ and σȲ² = σ²/n.]

Confidence Interval for µ (continued)
• Our best point estimate is Ȳ, so an interval estimate should be centered around Ȳ.
• We add and subtract an amount c such that
  Prob[(Ȳ − c) ≤ µ ≤ (Ȳ + c)] = 1 − α
• To find the value of c, transform Ȳ to z-scores; that is,
  z = (Ȳ − µ)/σȲ,  where σȲ = √(σ²/n).

[Figure: transforming Ȳ to z-scores, z = (Ȳ − µ)/σȲ.]
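The critical values zα/2 used in confidence intervals come from tables (or SAS) in this course; for illustration, they can also be recovered numerically by inverting the standard normal CDF, here with simple bisection. This is a sketch, not the course's approach.

```python
import math

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_crit(alpha):
    # Find z_{alpha/2} such that Phi(z) = 1 - alpha/2, by bisection on [0, 10].
    lo, hi = 0.0, 10.0
    target = 1.0 - alpha / 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if phi(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(z_crit(alpha), 3))  # about 1.645, 1.96, 2.576
```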
Confidence Interval for µ (continued)
• Before we look at data,
  Prob(−zα/2 ≤ (Ȳ − µ)/σȲ ≤ zα/2) = 1 − α
  Prob(Ȳ − zα/2 σȲ ≤ µ ≤ Ȳ + zα/2 σȲ) = 1 − α
• Once you get data, an interval estimate of µ is Ȳ ± zα/2 σȲ.
• The probability that µ is in this particular interval is NOT 1 − α.

Correct Interpretation of a CI
• Consider repeating the process of:
  1. Drawing/taking a sample of size n.
  2. Computing the (1 − α) confidence interval.
• (1 − α) × 100 percent of the time, the interval would contain µ.
• Note: later we will consider the more realistic situation where we estimate σ.

HSB Reading Scores for Academic Schools
• Sample statistics for students attending an academic/prep school, on the variable "RDG" (reading achievement in T-scores):
  n = 308, Ȳ = 55.89, s² = 87.15, s = 9.34
• Standard error of the mean = 9.34/√308 = .53.
• The sampling distribution of Ȳ should be very well approximated by the normal distribution, because n is large and the distribution of RDG scores is "nice".

HSB Reading Scores for Academic Schools (continued)
  68% CI: 55.89 ± 1.00(.53)  −→ (55.36, 56.42)
  90% CI: 55.89 ± 1.645(.53) −→ (55.02, 56.77)
  95% CI: 55.89 ± 1.96(.53)  −→ (54.85, 56.93)
  99% CI: 55.89 ± 2.58(.53)  −→ (54.52, 57.26)
Higher confidence levels (smaller α) −→ wider intervals.
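Two quick checks of the above as a Python sketch: first the 95% interval for the HSB reading scores, then a small simulation of the "correct interpretation", in which roughly (1 − α) of intervals from repeated samples cover µ. The population values in the simulation are illustrative, and σ is treated as known, as in the slides.

```python
import math
import random

random.seed(6)

# The HSB 95% interval from the slides: 55.89 +/- 1.96 * (9.34 / sqrt(308)).
ybar, s, n = 55.89, 9.34, 308
se = s / math.sqrt(n)
lo, hi = ybar - 1.96 * se, ybar + 1.96 * se
print(round(lo, 2), round(hi, 2))  # (54.85, 56.93), matching the slides

# Coverage interpretation: across repeated samples, about 95% of the
# intervals contain mu (illustrative population, sigma known).
mu, sigma, m = 50.0, 10.0, 20
reps = 10_000
covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(m)]
    xbar = sum(sample) / m
    half = 1.96 * sigma / math.sqrt(m)
    if xbar - half <= mu <= xbar + half:
        covered += 1
print(covered / reps)  # near 0.95
```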