The Normal Distribution

Mary Lindstrom
(Adapted from notes provided by Professor Bret Larget)
February 10, 2004
Statistics 371
Last modified: February 11, 2004

The Normal Distribution

• The Normal Distribution (AKA Gaussian Distribution) is our first distribution for continuous variables and is the most commonly used distribution.
• The normal density curve is the famous symmetric, bell-shaped curve.

[Figure: normal density curve; vertical axis "Probability Density", from 0.0 to 0.4]

Central Limit Theorem

Why is it such an important distribution? Two related reasons:

1. The central limit theorem states that many statistics we calculate from large random samples will have approximately normal distributions (or distributions derived from normal distributions), even if the underlying variables are not normally distributed.
2. Many measured variables have approximately normal distributions. This is true because most things we measure are the sum of many smaller units, so the central limit theorem applies.

These facts are the basis for most of the methods of statistical inference we will study in the last half of the course.

• Chapter 4 introduces the normal distribution as a probability distribution.
• Chapter 5 culminates in the central limit theorem, the primary theoretical justification for most of the methods of statistical inference in the remainder of the textbook.

The Normal Density

Normal curves have the following bell-shaped, symmetric density.

$$ f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}} $$

Parameters: The parameters of a normal curve are the mean µ and the standard deviation σ.
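As a quick numerical check of the density formula above, the following sketch writes the formula out by hand and compares it with R's built-in dnorm. The function name normal_density and the test values (mu = 100, sigma = 3) are assumptions chosen for illustration; they are not part of the course's prob.R file.

```r
# Normal density written out directly from the formula.
# (normal_density is a hypothetical helper name, not a course function.)
normal_density <- function(y, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((y - mu) / sigma)^2)
}

# Compare the hand-written formula with R's built-in dnorm on a grid.
y <- seq(-4, 4, by = 0.5)
max_diff <- max(abs(normal_density(y, 0, 1) - dnorm(y, mean = 0, sd = 1)))
print(max_diff)  # should be essentially zero

# The area under the curve should be very close to 1.
# Integrate over mu +/- 8 sigma (illustrative values mu = 100, sigma = 3).
total_area <- integrate(normal_density, lower = 76, upper = 124,
                        mu = 100, sigma = 3)$value
print(total_area)
```

Integrating over a finite range around the mean, rather than from -Inf to Inf, is a deliberate choice here: numerical integration over an infinite range can miss a narrow peak that sits far from zero.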
If Y is a normally distributed random variable with mean µ and standard deviation σ, then we write:

Y ∼ N(µ, σ²)

Standard Shape

All normal curves have the same shape, so every normal curve can be drawn in exactly the same manner, just by changing the labels on the axis (this is not true for the Binomial or Poisson distributions).

[Figure: four normal density curves with identical shape: mu = 100, sigma = 3; mu = 2, sigma = 10; mu = −20, sigma = 5; mu = 0, sigma = 1]

The 68–95–99.7 Rule

For every normal curve,

• 68% of the area is within one SD of the mean,
• 95% of the area is within two SDs of the mean,
• 99.7% of the area is within three SDs of the mean.

[Figure: three standard normal curves (mu = 0, sigma = 1) with shaded areas:
P(−1 < X < 1) = 0.6827, P(X < −1) = 0.1587, P(X > 1) = 0.1587;
P(−2 < X < 2) = 0.9545, P(X < −2) = 0.0228, P(X > 2) = 0.0228;
P(−3 < X < 3) = 0.9973, P(X < −3) = 0.0013, P(X > 3) = 0.0013]

Standardization

If Y is a normally distributed random variable

Y ∼ N(µ, σ²)

and we define

Z = (Y − µ)/σ

then Z is a standard normal random variable

Z ∼ N(0, 1)

Every problem that asks for an area under a normal curve is solved by first finding an equivalent problem for the standard normal curve.
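The standardization step and the 68–95–99.7 rule can be checked together with pnorm. This is a minimal sketch; the values mu = 100 and sigma = 3 are illustrative assumptions, and any mean and positive standard deviation should give the same areas.

```r
# Illustrative parameter values (assumed for this sketch).
mu <- 100
sigma <- 3
k <- 1:3  # number of SDs from the mean

# Areas within 1, 2 and 3 SDs of the mean, on the original scale.
area_original <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)

# The same areas after standardizing: Z = (Y - mu) / sigma ~ N(0, 1).
z_upper <- ((mu + k * sigma) - mu) / sigma  # equals k
z_lower <- ((mu - k * sigma) - mu) / sigma  # equals -k
area_standard <- pnorm(z_upper) - pnorm(z_lower)

print(round(area_original, 4))  # 0.6827 0.9545 0.9973
print(round(area_standard, 4))  # the same three values
```

Both vectors agree because standardizing only relabels the horizontal axis; the area under the curve between corresponding points does not change.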
[Figure: four normal curves showing the same areas after standardization:
mu = 0, sigma = 1: P(−1 < X < 1) = 0.6827, P(X < −1) = 0.1587, P(X > 1) = 0.1587;
mu = −1, sigma = 2: P(−3 < X < 1) = 0.6827, P(X < −3) = 0.1587, P(X > 1) = 0.1587;
mu = 10, sigma = 1: P(9 < X < 11) = 0.6827, P(X < 9) = 0.1587, P(X > 11) = 0.1587;
mu = 5, sigma = 4: P(1 < X < 9) = 0.6827, P(X < 1) = 0.1587, P(X > 9) = 0.1587]

Example Calculation

Suppose that egg shell thickness is normally distributed with a mean of 0.381 mm and a standard deviation of 0.031 mm. Find the proportion of eggs with shell thickness less than 0.34 mm.

1. We define our random variable Y = egg shell thickness.
2. We know that Y ∼ N(0.381, 0.031²).
3. We wish to know Pr{Y < 0.34}.

Here is a way to do it in R.

    > source("prob.R")
    > gnorm(0.381, 0.031, b = 0.34)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.34) = 0.093, P(X > 0.34) = 0.907]

If you don't have R handy, the way to do this is to transform it into a question about a standard normal distribution.

Pr{Y < 0.34} = Pr{Y − 0.381 < 0.34 − 0.381}
             = Pr{[Y − 0.381]/0.031 < [0.34 − 0.381]/0.031}
             = Pr{Z < −1.322}

where Z is a standard normal random variable: Z ∼ N(0, 1). This is exactly the type of probability tabulated in a standard normal table.

Standard Normal table

• The standard normal table lists the area to the left of z under the standard normal curve for each value from −3.49 to 3.49 by 0.01 increments.
• The normal table is on the inside front cover of your textbook.
• Numbers in the margins represent z.
• Numbers in the middle of the table are areas to the left of z.

Here is a portion of the table on the inside cover of your book:

   z      .00     .01     .02     .03
 -1.4   0.0808  0.0793  0.0778  0.0764
 -1.3   0.0968  0.0951  0.0934  0.0918
 -1.2   0.1151  0.1131  0.1112  0.1093
 -1.1   0.1357  0.1335  0.1314  0.1292
 -1.0   0.1587  0.1562  0.1539  0.1515

We want the area to the left of −1.322:

• we round to −1.32,
• look up −1.3 in the row labels on the left,
• look up .02 in the column labels,
• to find P(Z < −1.32) = 0.0934.

• R can do this for general values of z
• and R can do the standardization for you.
• Recall that our original question was: given Y ∼ N(0.381, 0.031²), what is Pr{Y < 0.34}?

[Figure: two normal density curves showing the same area:
mu = 0, sigma = 1, z = −1.322: P(X < −1.322) = 0.0931, P(X > −1.322) = 0.9069;
mu = 0.381, sigma = 0.031, x = 0.34: P(X < 0.34) = 0.093, P(X > 0.34) = 0.907]

Example Area Calculations: Area to the left

Find the proportion of eggs with shell thickness less than 0.34 mm.

    > pnorm(q = 0.34, mean = 0.381, sd = 0.031)
    [1] 0.09298744
    > gnorm(0.381, 0.031, b = 0.34, sigma.axis = FALSE)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.34) = 0.093, P(X > 0.34) = 0.907]

Using the table (z rounded to −1.32, which is why the result differs slightly from R):

Pr{Y < 0.34} = Pr{[Y − 0.381]/0.031 < [0.34 − 0.381]/0.031}
             = Pr{Z < −1.322}
             = 0.0934

Example Area Calculations: Area to the right

Find the proportion of eggs with shell thickness more than 0.4 mm.
    > 1 - pnorm(q = 0.4, mean = 0.381, sd = 0.031)
    [1] 0.2699702
    > gnorm(0.381, 0.031, a = 0.4, sigma.axis = FALSE)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.4) = 0.73, P(X > 0.4) = 0.27]

Using the table (z rounded to 0.61, which is why the result differs slightly from R):

Pr{Y > 0.4} = Pr{[Y − 0.381]/0.031 > [0.4 − 0.381]/0.031}
            = Pr{Z > 0.613}
            = 1 − Pr{Z < 0.613}
            = 1 − 0.7291
            = 0.2709

Example Area Calculations: Area between two values

Find the proportion of eggs with shell thickness between 0.34 and 0.4 mm.

    > gnorm(0.381, 0.031, a = 0.34, b = 0.4)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(0.34 < X < 0.4) = 0.637, P(X < 0.34) = 0.093, P(X > 0.4) = 0.27]

Example Area Calculations: Area outside two values

Find the proportion of eggs with shell thickness smaller than 0.32 mm or greater than 0.40 mm.

    > gnorm(0.381, 0.031, a = 0.32, b = 0.4)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(0.32 < X < 0.4) = 0.7055, P(X < 0.32) = 0.0245, P(X > 0.4) = 0.27]

The area outside the two values is the sum of the two tails: 0.0245 + 0.27 = 0.2945.

Example Area Calculations: Central area

Find the proportion of eggs with shell thickness within 0.05 mm of the mean.

    > gnorm(0.381, 0.031, a = 0.381 - 0.05, b = 0.381 + 0.05)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(0.331 < X < 0.431) = 0.8932, P(X < 0.331) = 0.0534, P(X > 0.431) = 0.0534]

Example Area Calculations: Two-tail area

Find the proportion of eggs with shell thickness more than 0.07 mm from the mean.
    > gnorm(0.381, 0.031, a = 0.381 - 0.07, b = 0.381 + 0.07)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(0.311 < X < 0.451) = 0.9761, P(X < 0.311) = 0.012, P(X > 0.451) = 0.012]

The two-tail area is the sum of the two tails, about 0.024.

Using R

We have seen how to use R and the new function gnorm to graph normal distributions where the graphs include some probability calculations. You can also make the same calculations without graphs using the function pnorm (the "p" stands for probability). The following lists the pnorm commands for each type of area calculation.

Area to the left. Find the proportion of eggs with shell thickness less than 0.34 mm.

    > pnorm(0.34, 0.381, 0.031)
    [1] 0.09298744

Area to the right. Find the proportion of eggs with shell thickness more than 0.36 mm.

    > 1 - pnorm(0.36, 0.381, 0.031)
    [1] 0.75093

Area between two values. Find the proportion of eggs with shell thickness between 0.34 and 0.36 mm.

    > pnorm(0.36, 0.381, 0.031) - pnorm(0.34, 0.381, 0.031)
    [1] 0.1560825

Area outside two values. Find the proportion of eggs with shell thickness smaller than 0.32 mm or greater than 0.40 mm.

    > pnorm(0.32, 0.381, 0.031) + 1 - pnorm(0.4, 0.381, 0.031)
    [1] 0.2945190

Central area. Find the proportion of eggs with shell thickness within 0.05 mm of the mean.

    > 1 - 2 * pnorm(0.381 - 0.05, 0.381, 0.031)
    [1] 0.8932345

Two-tail area. Find the proportion of eggs with shell thickness more than 0.07 mm from the mean.

    > 2 * pnorm(0.381 - 0.07, 0.381, 0.031)
    [1] 0.02394164

Quantiles

Quantile calculations ask you to use the normal table backwards. You know the area but need to find the point or points on the horizontal axis.

Example Quantile Calculations

Percentile. What is the 90th percentile of the egg shell thickness distribution?
    > source("prob.R")
    > gnorm(0.381, 0.031, quantile = 0.9)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.4207) = 0.9, z = 1.28]

Upper cut-off point. What value cuts off the top 15% of egg shell thicknesses?

    > gnorm(0.381, 0.031, quantile = 0.85)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.4131) = 0.85, z = 1.04]

Central cut-off points. The middle 75% of egg shells have thicknesses between which two values?

    > gnorm(0.381, 0.031, quantile = 0.875)

[Figure: normal density, mu = 0.381, sigma = 0.031; P(X < 0.4167) = 0.875, z = 1.15]

Using R

You can also make the same calculations without graphs using the function qnorm (the "q" stands for quantile). The following lists the commands for all of the previous computations.

Percentile. What is the 90th percentile of the egg shell thickness distribution?

    > qnorm(0.9, 0.381, 0.031)
    [1] 0.4207281

Upper cut-off point. What value cuts off the top 15% of egg shell thicknesses?

    > qnorm(0.85, 0.381, 0.031)
    [1] 0.4131294

Central cut-off points. The middle 75% of egg shells have thicknesses between which two values?

    > qnorm(c(0.25/2, 1 - 0.25/2), 0.381, 0.031)
    [1] 0.3453392 0.4166608

Approximation of Discrete distributions using the Normal distribution

• The normal distribution can also be used to compute approximate probabilities of discrete distributions.
• Approximations to the binomial distribution use a normal curve with µ = np and σ = √(np(1 − p)).
• Approximations to the Poisson distribution use a normal curve with the same mean µ as the Poisson distribution and σ = √µ.
• When approximating a binomial probability, the approximation is usually pretty good when np > 24 and n(1 − p) > 24.
• We will not be using the continuity correction described in your book.

Binomial Distributions: increasing n

[Figure: six binomial probability plots with p = 0.2 and n = 2, 5, 10, 15, 30, 50; horizontal axis "Possible Values", vertical axis "Probability"]

Poisson Distributions: increasing mu

[Figure: six Poisson probability plots with mu = 1, 2, 5, 10, 15, 25; horizontal axis "Count", vertical axis "Probability"]

Example: binomial

Assume Y has a binomial distribution with n = 150 and p = 0.4, and we want to compute Pr{Y ≤ 55}.

Exact:

    > pbinom(55, 150, 0.4)
    [1] 0.2274186

Normal approximation:

    > mu = 150 * 0.4
    > mu
    [1] 60
    > 150 * (1 - 0.4)
    [1] 90
    > sigma = sqrt(150 * 0.4 * 0.6)
    > sigma
    [1] 6
    > pnorm(55, mu, sigma)
    [1] 0.2023284

Example: Poisson

If Y has a Poisson distribution with µ = 25, what is the probability that Y is greater than or equal to 16?

Exact:

    > 1 - ppois(15, 25)
    [1] 0.977707

Normal approximation:

    > mu = 25
    > sigma = sqrt(25)
    > sigma
    [1] 5
    > 1 - pnorm(15, mu, sigma)
    [1] 0.9772499

With R, do the exact calculation.
With a calculator or normal table, use the normal approximation.
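
To compare the exact and approximate answers for other values, the two examples above can be wrapped in small helper functions. This is only a sketch: the function names compare_binomial and compare_poisson_upper are hypothetical and not part of prob.R, and, following the slides, no continuity correction is applied.

```r
# Hypothetical helper: exact binomial Pr{Y <= y} next to its normal
# approximation with mu = n * p and sigma = sqrt(n * p * (1 - p)).
compare_binomial <- function(y, n, p) {
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  c(exact = pbinom(y, n, p), approx = pnorm(y, mu, sigma))
}

# Hypothetical helper: exact Poisson Pr{Y >= y} next to its normal
# approximation. As in the example above, the normal curve is
# evaluated at y - 1 (15 when y = 16), with sigma = sqrt(mu).
compare_poisson_upper <- function(y, mu) {
  c(exact = 1 - ppois(y - 1, mu),
    approx = 1 - pnorm(y - 1, mu, sqrt(mu)))
}

binom_result <- compare_binomial(55, 150, 0.4)
pois_result <- compare_poisson_upper(16, 25)
print(binom_result)
print(pois_result)
```

With the inputs shown, the helpers should reproduce the exact and approximate values from the two examples, which makes it easy to try other n, p, or mu and see how the gap between the two columns changes.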