Lecture 3: The Normal Distribution and Statistical Inference
A Review and Some Connections
Sandy Eckel ([email protected])
24 April 2008

Outline
* The Normal Distribution
* The Central Limit Theorem
* Estimates of means and proportions: uses and properties
* Confidence intervals and hypothesis tests

The Normal Distribution
* A probability distribution for continuous data
* Characterized by a symmetric bell-shaped curve (Gaussian curve)
* Takes on values between −∞ and +∞
* Mean = Median = Mode
* Area under the curve equals 1
* Symmetric about its mean µ
* Notation for a Normal random variable: X ∼ N(µ, σ²)
* Parameters: µ = mean, σ = standard deviation
* Under certain conditions (np > 5 and n(1 − p) > 5), can be used to approximate the Binomial(n, p) distribution
[Figure: normal density f(x), symmetric about µ, over −∞ < x < +∞]

Formula: Normal Probability Density Function (pdf)
The normal probability density function for X ∼ N(µ, σ²) is

    f(x) = (1 / (σ√(2π))) · e^(−(x − µ)² / (2σ²)),  −∞ < x < +∞

Standard Normal
* Definition: a Normal distribution N(µ, σ²) with parameters µ = 0 and σ = 1
* Its density function is written as

    f(z) = (1 / √(2π)) · e^(−z² / 2),  −∞ < z < +∞

* We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1))

Important!
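The pdf formula above is easy to sanity-check numerically. A minimal Python sketch (the lecture itself uses R; this stdlib-only version, with the name `normal_pdf` chosen here rather than taken from the slides, is just an illustration):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The peak of the standard normal sits at x = mu, with height 1/sqrt(2*pi):
print(round(normal_pdf(0.0), 4))  # 0.3989

# The density integrates to 1 (crude Riemann sum over +/- 8 standard deviations):
dx = 0.001
total = sum(normal_pdf(-8.0 + i * dx) * dx for i in range(16000))
print(round(total, 4))  # 1.0
```

The Riemann sum is deliberately simple; tables and software (as the slides note later) do this integration for us in practice.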
We use the standard normal all the time, because if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1). This process is called "standardizing" a normal random variable.
Note: π ≈ 3.14 and e ≈ 2.72 are mathematical constants.

The 68-95-99.7 Rule
* 68% of the density is within one standard deviation of the mean: P(µ − 1σ < X < µ + 1σ) = 0.68, with 0.16 in each tail
* 95% of the density is within two standard deviations of the mean: P(µ − 2σ < X < µ + 2σ) = 0.95, with 0.025 in each tail
* 99.7% of the density is within three standard deviations of the mean: P(µ − 3σ < X < µ + 3σ) = 0.997, with 0.0015 in each tail

Different Means
[Figure: three normal distributions with different means, µ1 < µ2 < µ3]

Different Standard Deviations
[Figure: three normal distributions with different standard deviations, σ1 < σ2 < σ3]

Standard Normal
[Figure: the N(0, 1) density with µ = 0 and σ = 1, plotted from −4 to 4]

Example: Birthweights (in grams) of infants in a population
* Continuous data
* Mean = Median = Mode = 3000 = µ
* Standard deviation = 1000 = σ
* The area under the curve represents the probability (proportion) of infants with birthweights between certain values

Normal Probabilities
We are often interested in the probability that Z takes on values between z0 and z1:

    P(z0 ≤ Z ≤ z1) = ∫ from z0 to z1 of (1 / √(2π)) · e^(−z² / 2) dz

Z Tables
* How do we calculate this probability? It is equivalent to finding the area under the curve
* The distribution is continuous, so we cannot use sums to find probabilities
* Performing the integration is not necessary, since tables and computers are available

But... we'll use R
For standard normal random variables Z ∼ N(0, 1), we'll use
1. pnorm(?) to find P(Z ≤ ?)
2. pnorm(?, lower.tail=F) to find P(Z ≥ ?)
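R's pnorm can be mimicked in Python with the standard library's error function, which makes the 68-95-99.7 rule easy to verify. A sketch (not from the slides; the name `pnorm` is borrowed from R for familiarity):

```python
import math

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    # Normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    p = 0.5 * (1.0 + math.erf((q - mean) / (sd * math.sqrt(2.0))))
    return p if lower_tail else 1.0 - p

# Mass within k standard deviations of the mean, for k = 1, 2, 3:
within = [pnorm(k) - pnorm(-k) for k in (1, 2, 3)]
print([round(p, 4) for p in within])  # [0.6827, 0.9545, 0.9973]
```

Note the exact values are 0.6827, 0.9545, and 0.9973; the slides' 68-95-99.7 figures are the usual rounding.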
For any normal random variable X ∼ N(µ, σ²) (taking X ∼ N(2, 3²) as an example), we'll use
1. pnorm(?, mean=2, sd=3) to find P(X ≤ ?)
2. pnorm(?, mean=2, sd=3, lower.tail=F) to find P(X ≥ ?)

Question I
Example: Birthweights (in grams), with X = birthweight, µ = 3000, σ = 1000, and Z = (X − µ)/σ.
What is the probability of an infant weighing more than 5000 g?

    P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000) = P(Z > 2) = 0.0228

Get this using pnorm(2, lower.tail=F) (since we standardized).

Question II
What is the probability of an infant weighing between 2500 and 4000 g?

    P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
                       = P(−0.5 < Z < 1)
                       = 1 − P(Z > 1) − P(Z < −0.5)
                       = 1 − 0.1587 − 0.3085
                       = 0.5328

Question III
What is the probability of an infant weighing less than 3500 g?

    P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000) = P(Z < 0.5) = 0.6915

Statistical Inference
Statistical inference is "the attempt to reach a conclusion concerning all members of a class from observations of only some of them." (Runes 1959)
* Populations and samples
* Sampling distributions

Definitions
* A population is a collection of observations
* A parameter is a numerical descriptor of a population
* A sample is a part or subset of a population
* A statistic is a numerical descriptor of the sample
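The three birthweight answers above can be reproduced without standardizing by hand. A self-contained Python sketch (the lecture uses R's pnorm; this stdlib analogue is just for checking the arithmetic):

```python
import math

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    # Normal CDF via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    p = 0.5 * (1.0 + math.erf((q - mean) / (sd * math.sqrt(2.0))))
    return p if lower_tail else 1.0 - p

mu, sigma = 3000.0, 1000.0  # birthweight mean and sd from the example

q1 = pnorm(5000, mean=mu, sd=sigma, lower_tail=False)          # P(X > 5000)
q2 = pnorm(4000, mean=mu, sd=sigma) - pnorm(2500, mean=mu, sd=sigma)  # P(2500 < X < 4000)
q3 = pnorm(3500, mean=mu, sd=sigma)                            # P(X < 3500)

print(round(q1, 4))  # 0.0228
print(round(q2, 4))  # 0.5328
print(round(q3, 4))  # 0.6915
```

Passing mean= and sd= directly matches the pnorm(?, mean=2, sd=3) pattern on the slide: the standardization happens inside the function.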
Population vs. Sample
* Population (parameters): population size = N; µ = mean, a measure of center; σ² = variance, a measure of dispersion; σ = standard deviation
* Sample (statistics): a sample from the population is used to calculate sample estimates (statistics) that approximate the population parameters; sample size = n; X̄ = sample mean; s² = sample variance; s = sample standard deviation

Estimating the population mean, µ
* Usually µ is unknown and we would like to estimate it
* We use X̄ to estimate µ
* We know the sampling distribution of X̄

Definition: Sampling distribution
The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.

Sampling Distribution of X̄
* When sampling from a normally distributed population, X̄ will be normally distributed
* The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn
* The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size
* We can write: X̄ ∼ N(µ, σ²/n)
[Figure: population distribution X ∼ N(µ, σ²) beside sampling distributions X̄ ∼ N(µ, σ²/n) for n = 10, 30, 100; the sampling distribution narrows as n grows]

The Central Limit Theorem (CLT)
* When sampling from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem
* Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large
* In general, this applies when n ≥ 25; the approximation to normality becomes better as n increases

What if a random variable has a Binomial distribution?
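Before turning to the Binomial case, the CLT claim can be checked by simulation. A Python sketch (not from the lecture) that draws sample means from a decidedly non-normal Exponential(1) population, which has µ = 1 and σ² = 1:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Population: Exponential(rate = 1), so mu = 1 and sigma^2 = 1, and the shape
# is heavily right-skewed -- nothing like a normal curve.
n, reps = 30, 20000

# Each entry is one sample mean X-bar from a sample of size n
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# CLT prediction: mean of the X-bar distribution is mu, variance is sigma^2 / n
print(round(statistics.fmean(means), 2))       # close to mu = 1
print(round(statistics.pvariance(means) * n, 2))  # n * (sigma^2 / n), close to 1
```

With n = 30 (above the slide's n ≥ 25 rule of thumb), a histogram of `means` would already look convincingly bell-shaped despite the skewed population.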
Binomial CLT
* First, recall that a Binomial variable is just the sum of n Bernoulli variables: Sn = X1 + ... + Xn
* Notation: Sn ∼ Binomial(n, p); Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, ..., n
* For a Bernoulli variable: µ = mean = p and σ² = variance = p(1 − p)
* In this case, we want to estimate p by p̂, where p̂ = Sn/n = (X1 + ... + Xn)/n = X̄
* p̂ is just a sample mean! So we can use the Central Limit Theorem when n is large
* X̄ ≈ N(µ, σ²/n) as before; equivalently, p̂ ≈ N(p, p(1 − p)/n)

Distribution of Differences
* Often we are interested in detecting a difference between two populations, for example:
  * Differences in average income by neighborhood
  * Differences in disease cure rates by age

Distribution of Differences: Notation
* Population 1: size = N1, mean = µ1, standard deviation = σ1. Samples of size n1 from Population 1 have mean = µ_X̄1 = µ1 and standard deviation = σ1/√n1 = σ_X̄1
* Population 2: size = N2, mean = µ2, standard deviation = σ2. Samples of size n2 from Population 2 have mean = µ_X̄2 = µ2 and standard deviation = σ2/√n2 = σ_X̄2

Distribution of Differences: CLT result
* The CLT result handles the case where the underlying variable is continuous. What if the underlying variable is Binomial?
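The p̂ ≈ N(p, p(1 − p)/n) claim from the Binomial CLT slide can also be checked by simulation (a Python sketch, not from the lecture):

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible

p, n, reps = 0.3, 100, 20000

# Each p-hat is a Binomial(n, p) count divided by n, i.e. a mean of n
# Bernoulli(p) draws -- exactly the "p-hat is just a sample mean" point.
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

# CLT prediction: mean p = 0.3 and variance p*(1-p)/n = 0.21/100 = 0.0021
print(round(statistics.fmean(phats), 2))      # close to 0.3
print(round(statistics.pvariance(phats), 4))  # close to 0.0021
```

Note n = 100 here comfortably satisfies the earlier np > 5 and n(1 − p) > 5 conditions for the normal approximation.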
Now by the CLT, for large n:

    X̄1 ∼ N(µ1, σ1²/n1)  and  X̄2 ∼ N(µ2, σ2²/n2),

so

    X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

Difference in proportions
For proportions, X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2) is replaced by:

    p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)

Summary of Sampling Distributions (writing q = 1 − p)

    Statistic    Mean       Variance
    X̄            µ          σ²/n
    X̄1 − X̄2      µ1 − µ2    σ1²/n1 + σ2²/n2
    p̂            p          pq/n
    np̂           np         npq
    p̂1 − p̂2      p1 − p2    p1q1/n1 + p2q2/n2

Statistical inference
* Two methods: estimation (confidence intervals) and hypothesis testing
* Both make use of sampling distributions
* Remember to use the CLT

Lecture 3 Summary
* The Normal Distribution
* The Central Limit Theorem
* Sampling distributions

We didn't get a chance to cover the rest of the material, so it has been moved to lecture 4. Next time, we'll discuss:
* Confidence intervals for population parameters
* The t-distribution
* Hypothesis testing (p-values)
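As a closing worked example, the difference-in-proportions result can be put to work with made-up numbers. The cure rates and sample sizes below are hypothetical, chosen here purely for illustration:

```python
import math

# Hypothetical example (not from the lecture): cure proportions
# p1 = 0.6 in a group of n1 = 200 and p2 = 0.5 in a group of n2 = 150.
p1, n1 = 0.6, 200
p2, n2 = 0.5, 150

# CLT approximation: p1_hat - p2_hat ~ N(p1 - p2, p1(1-p1)/n1 + p2(1-p2)/n2)
mean_diff = p1 - p2
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
se_diff = math.sqrt(var_diff)

print(round(mean_diff, 2))  # 0.1
print(round(se_diff, 4))    # 0.0535

# Under this approximation, the chance the observed difference comes out
# negative (i.e. the samples point the "wrong way"):
z = (0.0 - mean_diff) / se_diff
prob_negative = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(round(prob_negative, 3))
```

This is exactly the kind of calculation the confidence intervals and hypothesis tests in lecture 4 will formalize.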