Lecture 3: The Normal Distribution and Statistical Inference
Sandy Eckel
[email protected]
24 April 2008

A Review and Some Connections
- The Normal Distribution
- The Central Limit Theorem
- Estimates of means and proportions: uses and properties
- Confidence intervals and hypothesis tests

The Normal Distribution
- A probability distribution for continuous data
- Characterized by a symmetric bell-shaped (Gaussian) curve
- Symmetric about its mean µ
- Under certain conditions, can be used to approximate the Binomial(n, p) distribution: np > 5 and n(1 − p) > 5

The Normal Distribution: Properties
- Takes on values between −∞ and +∞
- Mean = Median = Mode
- Area under the curve equals 1
- Notation for a Normal random variable: X ∼ N(µ, σ²)
- Parameters: µ = mean, σ = standard deviation

Formula: Normal Probability Density Function (pdf)
The normal probability density function for X ∼ N(µ, σ²) is:

    f(x) = (1 / √(2πσ²)) · e^(−(x−µ)² / (2σ²)),   −∞ < x < +∞

Note: π ≈ 3.14 and e ≈ 2.72 are mathematical constants.

Standard Normal
- Definition: a Normal distribution N(µ, σ²) with parameters µ = 0 and σ = 1
- Its density function is:

    f(x) = (1 / √(2π)) · e^(−x²/2),   −∞ < x < +∞

- We typically use the letter Z to denote a standard normal random variable: Z ∼ N(0, 1)
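The pdf formula can be checked numerically. The lecture's software is R, but as a minimal sketch, Python's standard library `statistics.NormalDist` provides the same density (the helper `normal_pdf` is ours, a direct transcription of the formula):

```python
import math
from statistics import NormalDist  # Python 3.8+ standard library

def normal_pdf(x, mu, sigma):
    """Direct translation of f(x) = (1/sqrt(2*pi*sigma^2)) * e^(-(x-mu)^2 / (2*sigma^2))."""
    return (1 / math.sqrt(2 * math.pi * sigma ** 2)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# The hand-written formula matches the library implementation...
assert math.isclose(normal_pdf(1.3, 0, 1), NormalDist(0, 1).pdf(1.3))

# ...and the density is symmetric about its mean: f(mu - d) = f(mu + d)
assert math.isclose(normal_pdf(-1.3, 0, 1), normal_pdf(1.3, 0, 1))

print(round(normal_pdf(0, 0, 1), 4))  # 0.3989, the standard normal peak 1/sqrt(2*pi)
```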
Important: we use the standard normal all the time because if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1). This process is called "standardizing" a normal random variable.

68-95-99.7 Rule
- 68% of the density is within one standard deviation of the mean (area 0.16 in each tail beyond µ ± 1σ)
- 95% of the density is within two standard deviations of the mean (area 0.025 in each tail beyond µ ± 2σ)
- 99.7% of the density is within three standard deviations of the mean (area 0.0015 in each tail beyond µ ± 3σ)

[Figures: three normal densities with different means, µ1 < µ2 < µ3; three with different standard deviations, σ1 < σ2 < σ3; and the standard normal N(0, 1) with µ = 0, σ = 1]

Example: Birthweights (in grams) of infants in a population
- Continuous data
- Mean = Median = Mode = 3000 = µ
- Standard deviation = 1000 = σ
- The area under the curve represents the probability (proportion) of infants with birthweights between given values

Normal Probabilities
We are often interested in the probability that Z takes on values between z0 and z1:

    P(z0 ≤ Z ≤ z1) = ∫ from z0 to z1 of (1 / √(2π)) · e^(−z²/2) dz

How do we calculate this probability?
- It is equivalent to finding the area under the curve
- The distribution is continuous, so we cannot use sums to find probabilities
- Performing the integration is not necessary, since tables and computers are available

Z Tables
But... we'll use R. For a standard normal random variable Z ∼ N(0, 1) we'll use:
1. pnorm(?) to find P(Z ≤ ?)
2. pnorm(?, lower.tail=F) to find P(Z ≥ ?)

For any normal random variable X ∼ N(µ, σ²) (taking X ∼ N(2, 3²) as an example) we'll use:
1. pnorm(?, mean=2, sd=3) to find P(X ≤ ?)
2. pnorm(?, mean=2, sd=3, lower.tail=F) to find P(X ≥ ?)
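For readers without R at hand, Python's standard library offers the same calculations; this is a stand-in for the pnorm calls above, where `cdf` plays the role of pnorm and `1 - cdf` the role of lower.tail=F:

```python
from statistics import NormalDist

Z = NormalDist(0, 1)           # standard normal, Z ~ N(0, 1)
X = NormalDist(mu=2, sigma=3)  # the X ~ N(2, 3^2) example

print(round(Z.cdf(1.96), 4))      # P(Z <= 1.96), like pnorm(1.96): 0.975
print(round(1 - Z.cdf(1.96), 4))  # P(Z >= 1.96), like pnorm(1.96, lower.tail=F): 0.025
print(round(X.cdf(5), 4))         # P(X <= 5), like pnorm(5, mean=2, sd=3): 0.8413
```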
Example: Birthweights (in grams)
µ = 3000, σ = 1000, X = birthweight, Z = (X − µ)/σ

Question I: What is the probability of an infant weighing more than 5000 g?

    P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000)
                = P(Z > 2)
                = 0.0228

Get this using pnorm(2, lower.tail=F) (since we standardized).

Question II: What is the probability of an infant weighing less than 3500 g?

    P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000)
                = P(Z < 0.5)
                = 0.6915

Question III: What is the probability of an infant weighing between 2500 and 4000 g?

    P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
                       = P(−0.5 < Z < 1)
                       = 1 − P(Z > 1) − P(Z < −0.5)
                       = 1 − 0.1587 − 0.3085
                       = 0.5328

Statistical Inference
- Populations and samples
- Sampling distributions

Definitions
- Statistical inference is "the attempt to reach a conclusion concerning all members of a class from observations of only some of them." (Runes 1959)
- A population is a collection of observations
- A parameter is a numerical descriptor of a population
- A sample is a part or subset of a population
- A statistic is a numerical descriptor of the sample
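The three birthweight answers worked above are easy to double-check; this sketch uses Python's standard library (`NormalDist.cdf` standing in for the lecture's pnorm):

```python
from statistics import NormalDist

bw = NormalDist(mu=3000, sigma=1000)  # birthweight model X ~ N(3000, 1000^2)

p1 = 1 - bw.cdf(5000)             # Question I:   P(X > 5000)
p2 = bw.cdf(3500)                 # Question II:  P(X < 3500)
p3 = bw.cdf(4000) - bw.cdf(2500)  # Question III: P(2500 < X < 4000)

print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.0228 0.6915 0.5328
```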
Population vs. Sample

Population:
- population size = N
- µ = mean, a measure of center
- σ² = variance, a measure of dispersion
- σ = standard deviation

A sample from the population is used to calculate sample estimates (statistics) that approximate the population parameters:
- sample size = n
- X̄ = sample mean
- s² = sample variance
- s = sample standard deviation

In short: populations have parameters; samples have statistics.

Estimating the population mean, µ
- Usually µ is unknown and we would like to estimate it
- We use X̄ to estimate µ
- We know the sampling distribution of X̄

Definition: Sampling distribution
The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.

Sampling Distribution of X̄
[Figure: the population distribution of X ∼ N(µ, σ²) alongside the distribution of the sample mean X̄ ∼ N(µ, σ²/n) for n = 10, 30, 100]
- When sampling from a normally distributed population, X̄ will be normally distributed
- The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn
- The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size
- We can write: X̄ ∼ N(µ, σ²/n)
- When sampling from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem

The Central Limit Theorem (CLT)
Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large.
- In general, this applies when n ≥ 25
- The approximation to normality becomes better as n increases

What if a random variable has a Binomial distribution?
First, recall that a Binomial variable is just the sum of n Bernoulli variables: Sn = X1 + X2 + · · · + Xn
Notation:
- Sn ∼ Binomial(n, p)
- Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, . . . , n
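The sampling-distribution claims above can be illustrated by simulation. A minimal sketch (the values p = 0.3, n = 50 are illustrative, not from the lecture): draw many samples of n Bernoulli(p) variables, and check that the sample proportion averages to p with variance close to p(1 − p)/n:

```python
import random
from statistics import mean, variance

random.seed(1)
p, n = 0.3, 50   # hypothetical Bernoulli probability and sample size
reps = 20000     # number of samples of size n

# Each replicate: draw n Bernoulli(p) variables; their mean is the sample proportion
p_hats = [mean(1 if random.random() < p else 0 for _ in range(n))
          for _ in range(reps)]

print(round(mean(p_hats), 3))      # close to p = 0.3
print(round(variance(p_hats), 5))  # close to p(1 - p)/n = 0.0042
```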
In this case, we want to estimate p by p̂, where

    p̂ = Sn/n = (X1 + · · · + Xn)/n = X̄

p̂ is just a sample mean! So we can use the Central Limit Theorem when n is large.

Binomial CLT
For a Bernoulli variable:
- µ = mean = p
- σ² = variance = p(1 − p)
- X̄ ≈ N(µ, σ²/n), as before
- Equivalently, p̂ ≈ N(p, p(1 − p)/n)

Distribution of Differences
Often we are interested in detecting a difference between two populations:
- differences in average income by neighborhood
- differences in disease cure rates by age

Distribution of Differences: Notation
- Population 1: size = N1, mean = µ1, standard deviation = σ1
- Samples of size n1 from Population 1: mean = µX̄1 = µ1, standard deviation = σX̄1 = σ1/√n1
- Population 2: size = N2, mean = µ2, standard deviation = σ2
- Samples of size n2 from Population 2: mean = µX̄2 = µ2, standard deviation = σX̄2 = σ2/√n2

Distribution of Differences: CLT result
Now by the CLT, for large n:

    X̄1 ∼ N(µ1, σ1²/n1),  X̄2 ∼ N(µ2, σ2²/n2),  and
    X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

Difference in proportions?
We're done if the underlying variable is continuous. What if the underlying variable is Binomial? Then

    X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

is replaced by:

    p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)

Summary of Sampling Distributions (writing q = 1 − p)

    Statistic    Mean       Variance
    X̄           µ          σ²/n
    X̄1 − X̄2    µ1 − µ2    σ1²/n1 + σ2²/n2
    p̂           p          pq/n
    np̂          np         npq
    p̂1 − p̂2    p1 − p2    p1q1/n1 + p2q2/n2

Statistical inference: two methods
- Estimation (confidence intervals)
- Hypothesis testing
Both make use of sampling distributions; remember to use the CLT.

Rest of material moved to lecture 4
We didn't get a chance to cover the rest of the material, so it has been moved to lecture 4.

Lecture 3 Summary
- The Normal Distribution
- The Central Limit Theorem
- Sampling distributions

Next time, we'll discuss:
- Confidence intervals for population parameters
- The t-distribution
- Hypothesis testing (p-values)
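As a closing numeric illustration of the difference-in-proportions result covered above: the cure rates p1 = 0.70, p2 = 0.60 and the sample sizes below are hypothetical, chosen only to show the variance formula from the summary table in action.

```python
import math
from statistics import NormalDist

# Hypothetical cure rates and sample sizes for two populations
p1, n1 = 0.70, 200
p2, n2 = 0.60, 150

# Variance from the summary table: p1*q1/n1 + p2*q2/n2, with q = 1 - p
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
se_diff = math.sqrt(var_diff)

# Normal approximation for the difference in sample proportions
diff = NormalDist(mu=p1 - p2, sigma=se_diff)

print(round(se_diff, 4))          # standard error of p1_hat - p2_hat: 0.0515
print(round(1 - diff.cdf(0), 4))  # approximate P(p1_hat - p2_hat > 0)
```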