Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 6 ~ Normal Probability Distributions 常態分佈 P ( a x b) a b x Normal Probability Distributions 常態分佈(normal distribution)又稱 高斯分佈(Gaussian distribution)。 德國的10馬克紙幣, 以高斯(Gauss, 1777-1855)為人像, 人像左側有一 常態分佈之P.D.F.及其圖形。 Chapter Goals • Learn about the normal, bell-shaped, or Gaussian distribution • How probabilities are found • How probabilities are represented • How normal distributions are used in the real world 6.1 ~ Normal Probability Distributions The normal probability distribution is the most important distribution in all of statistics • Many continuous random variables have normal or approximately normal distributions • Need to learn how to describe a normal probability distribution 6.1 ~ Normal Probability Distributions Exampel: 如果調查台灣1000位成年男性的身高, 將會發現身高特別高和特別低的人佔極少數,絕 大多數的人身高都在中間(如170公分)附近。 如果這1000人的平均身高是170公分,那麼身高 離170公分越遠的人,所佔的比例就越少。簡單 的說,大略呈現以170公分為中心,並往兩旁遞 減的分佈圖。(如下頁) 如果調查的是體重、收入、智力等變項,也很可 能出現類似的分佈。 6.1 ~ Normal Probability Distributions 90 80 70 人數 60 50 40 30 20 10 0 150 155 160 165 170 身高 175 180 185 190 6.1 ~ Normal Probability Distributions 上圖的分佈是間斷的,可是理論上身高是連續 的。(因為任何兩個人之間,存在另一人,其身 高介在他們之間) 如果調查更多的人(如10萬人),那麼上圖的長 條圖中間斷現象逐漸會消除。一旦調查人數非 常之大,那麼上圖的長條圖會變成平滑的曲線 圖,如下圖中的平滑曲線所示。 6.1 ~ Normal Probability Distributions 90 80 70 人數 60 50 40 30 20 10 0 150 155 160 165 170 175 180 185 190 身高 Demo: http://163.20.15.7/math/LSC/Normal/Normal.htm Empirical Rule: If a variable is normally distributed, then: 1. Approximately 68% of the observations lie within 1 standard deviation of the mean 2. Approximately 95% of the observations lie within 2 standard deviations of the mean 3. Approximately 99.7% of the observations lie within 3 standard deviations of the mean 3 2 2 3 6.1 Normal Probability Distribution 1. A continuous random variable 2. Normal probability distribution function: 1 ( x) e 2 2 1 2p This is the function for the normal (bell-shaped) curve f ( x) = π = 3.1416,e :自然對數之底 2.7183,x介在 正負無限大,μ:平均數,σ:標準差。一旦確 定μ和σ,帶入公式算得 f(x) 3. The probability that x lies in some interval is the area under the curve Probabilities for a Normal Distribution • Illustration b P(a x b) = f ( x )dx a • The definite integral is a calculus topic a b x 問題:要決定常態分佈的形狀,就必須知道平均數μ 和變異數σ2(或者標準差σ)。常態分佈取決於兩個 參數(parameter)μ和σ2。只要μ或σ2不同,曲線就 不同。 0.1 0.09 0.08 0.07 A: = 170, = 5 f (X ) 0.06 B: = 175, = 5 0.05 0.04 0.03 C: = 170, = 10 0.02 0.01 0 140 145 150 155 160 165 170 175 180 185 190 195 200 Notes We will learn how to compute probabilities for one special normal distribution: the standard normal distribution (標準常 態分佈) (μ= 0和σ2 = 1) Transform all other normal probability questions to this special distribution Recall the empirical rule: the percentages that lie within certain intervals about the mean come from the normal probability distribution We need to refine the empirical rule to be able to find the percentage that lies between any two numbers Percentage, Proportion & Probability Basically the same concepts • Percentage (30%) is usually used when talking about a proportion (3/10) of a population • Probability is usually used when talking about the chance that the next individual item will possess a certain property 6.2 ~ The Standard Normal Distribution There are infinitely many normal probability distributions • They are all related to the standard normal distribution • The standard normal distribution is the normal distribution of the standard variable z (the z-score) Standard Normal Distribution f (Z ) Properties: The total area under the normal curve is equal to 1 The distribution is mounded and symmetric; it extends indefinitely in both directions, approaching but never touching the horizontal axis The distribution has a mean of 0 and a standard deviation of 1 The mean divides the area in half, 0.50 on each side 0.5 Nearly all the area is 0.4 between z = -3.00 and 0.3 z = 3.00 0.2 0.1 0 -3 -2 -1 0 Z 1 2 3 Table 3, Appendix B lists the probabilities associated with the intervals from the mean (0) to a specific value of z Probabilities of other intervals are found using the table entries, addition, subtraction, and the properties above Table 3, Appendix B Entries 0 z The table contains the area under the standard normal curve between 0 and a specific value of z Example Find the area under the standard normal curve between z = 0 and z = 1.45 0 145 . • A portion of Table 3: z 0.00 0.01 0.02 0.03 0.04 0.05 .. . 1.4 .. . P(0 z 145 . ) = 0.4265 0.4265 0.06 z Example Find the area under the normal curve to the right of z = 1.45; P(z > 1.45) Area asked for 0.4265 0 P( z 145 . ) = 0.5000 0.4265 = 0.0735 145 . z Example Find the area to the left of z = 1.45; P(z < 1.45) 0.5000 0.4265 0 P( z 145 . ) = 0.5000 0.4265 = 0.9265 145 . z Notes • The addition and subtraction used in the previous examples are correct because the “areas” represent mutually exclusive events • The symmetry of the normal distribution is a key factor in determining probabilities associated with values below (to the left of) the mean. For example: the area between the mean and z = -1.37 is exactly the same as the area between the mean and z = +1.37. • When finding normal distribution probabilities, a sketch(草圖) is always helpful Example Area from table 0.3962 Area asked for Find the area between the mean (z = 0) and z = -1.26 126 . P( 126 . z 0) = 0.3962 0 126 . z Example Find the area to the left of -0.98; P(z < -0.98) Area asked for 0.98 0 P ( z 0.98) = 0.5000 0.3365 = 0.1635 Area from table 0.3365 0.98 Example Find the area between z = -2.30 and z = 1.80 0.4893 2.30 0.4641 0 P ( 2.30 z 1.80) = P ( 2.30 z 0) P ( 0 z 1.80) = 0.4893 0.4641 = 0.9534 1.80 Example Find the area between Area asked z = -1.40 and z = -0.50 for -1.40 - 0.50 0 0.50 1.40 P( 1.40 z 0.50) = P (0 z 1.40) P (0 z 0.50) = 0.4192 0.1915 = 0.2277 Normal Distribution Note The normal distribution table may also be used to determine a z-score if we are given the area (working backwards) Example: What is the z-score associated with the 85th percentile? 0.3500 15% implies P 0 z Solution In Table 3 Appendix B, find the “area” entry that is closest to 0.3500: z 0.00 0.01 0.02 0.03 0.04 0.05 .. . 1.0 0.3485 0.3500 0.3508 .. . • The area entry closest to 0.3500 is 0.3508 • The z-score that corresponds to this area is 1.04 • The 85th percentile in a standard normal distribution is 1.04 Example What z-scores bound the middle 90% of a standard normal distribution? 0.4500 90% implies 0 z 0 z Solution The 90% is split into two equal parts by the mean. Find the area in Table 3 closest to 0.4500: z 0.00 0.01 0.02 0.03 0.04 0.05 .. . 1.6 0.4495 0.4500 0.4505 .. . • 0.4500 is exactly half way between 0.4495 and 0.4505 • Therefore, z = 1.645 • z = -1.645 and z = 1.645 bound the middle 90% of a normal distribution 6.3 ~ Applications of Normal Distributions Apply the techniques learned for the z distribution to all normal distributions • Start with a probability question in terms of x-values • Convert, or transform, the question into an equivalent probability statement involving z-values Standardization Suppose x is a normal random variable with mean m and standard deviation s x • The random variable z = has a standard normal distribution 0 c c x z Example A bottling(裝瓶) machine is adjusted to fill bottles with a mean of 32.0 oz of soda and standard deviation of 0.02. Assume the amount of fill is normally distributed and a bottle is selected at random: 1) Find the probability the bottle contains between 32.00 oz and 32.025 oz 2) Find the probability the bottle contains more than 31.97 oz Solutions: 32.00 32.00 32.0 = = 0.00 1) When x = 32.00 ; z = 0.02 32.025 32.025 32.0 = = = = 1.25 When x 32.025; z 0.02 Solution Continued Area asked for 32.0 0 32.025 125 . x z 32.0 32.0 x 32.0 32.025 32.0 ÷ P ( 32.0 x 32.025) = P 0.02 0.02 0.02 = P ( 0 z 1.25) = 0. 3944 2) Example, Part 2 3197 . 1.50 32.0 0 x z x 32.0 31.97 32.0 ÷ = P( z 1.50) P( x 31.97) = P 0.02 0.02 = 0.5000 0.4332 = 0.9332 Notes The normal table may be used to answer many kinds of questions involving a normal distribution • Often we need to find a cutoff point: a value of x such that there is a certain probability in a specified interval defined by x. Example: The waiting time x at a certain bank is approximately normally distributed with a mean of 3.7 minutes and a standard deviation of 1.4 minutes. The bank would like to claim that 95% of all customers are waited on by a teller within c minutes. Find the value of c that makes this statement true. Solution 0.0500 0.5000 0.4500 3.7 0 P ( x c) = 0.95 x 3.7 c 3.7 = ÷ 0.95 P 1.4 1.4 c 3.7 = ÷ 0.95 P z 1.4 c 1645 . x z c 3.7 = 1645 . 14 . c = (1645 . )(14 . ) 3.7 = 6.003 c 6 minutes Review: P.30 Example A radar unit is used to measure the speed of automobiles on an expressway(高速公路) during rush-hour traffic. The speeds of individual automobiles are normally distributed with a mean of 62 mph. Find the standard deviation of all speeds if 3% of the automobiles travel faster than 72 mph. 0.0300 0.4700 62 72 x 0 188 . z Solution P( x 72) = 0.03 x ; z= P( z 1.88) = 0.03 72 62 1.88 = 1.88 = 10 = 10 / 1.88 = 5.32 Notation If x is a normal random variable with mean m and standard deviation s, this is often denoted: x ~ N(m, s) Example: Suppose x is a normal random variable with = 35 and = 6. A convenient notation to identify this random variable is: x ~ N(35, 6). 6.4 ~ Notation z-score used throughout statistics in a variety of ways • Need convenient notation to indicate the area under the standard normal distribution • z(a) is the algebraic name, for the z-score (point on the z axis) such that there is a of the area (probability) to the right of z(a) Illustrations z(0.10) represents the value of z such that the area to the right under the standard normal curve is 0.10 010 . 0 z(0.10) z z(0.80) represents the value of z such that the area to the right under the standard normal curve is 0.80 0.80 z(0.80) 0 z Example • Find the numerical value of z(0.10): Table shows this area (0.4000) 0.10 (area information from notation) 0 z(0.10) z • Use Table 3: look for an area as close as possible to 0.4000 • z(0.10) = 1.28 Example • Find the numerical value of z(0.80): Look for 0.3000; remember that z must be negative z(0.80) 0 z • Use Table 3: look for an area as close as possible to 0.3000 • z(0.80) = -0.84 Example • Find the numerical value of z(0.99): 0.01 z z(0.99) 0 • Because of the symmetrical nature of the normal distribution, z(0.99) = -z(0.01) • Using Table 3: z(0.99) = -2.33 Example • Find the z-scores that bound the middle 0.99 of the normal distribution: 0.005 0.005 0.495 z(0.995) or -z(0.005) 0.495 0 • Use Table 3: z(0.005) = 2.575 and z(0.995) = -z(0.005) = -2.575 z(0.005) 6.5 ~ Normal Approximation of the Binomial Recall: the binomial distribution is a probability distribution of the discrete random variable x, the number of successes observed in n repeated independent trials • Binomial probabilities can be reasonably estimated by using the normal probability distribution Background & Histogram Background: Consider the distribution of the binomial variable x when n = 20 and p = 0.5 • Histogram: P( x) 0.18 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 x The histogram may be approximated by a normal curve Notes The normal curve has mean and standard deviation from the binomial distribution: = np = (20)(0.5) = 10 = npq = (20)(0.5)(0.5) = 5 2.236 Can approximate the area of the rectangles with the area under the normal curve The approximation becomes more accurate as n becomes larger Two Problems 1. As p moves away from 0.5, the binomial distribution is less symmetric, less normal-looking Solution: The normal distribution provides a reasonable approximation to a binomial probability distribution whenever the values of np and n(1 - p) both equal or exceed 5 2. The binomial distribution is discrete, and the normal distribution is continuous Solution: Use the continuity correction factor. Add or subtract 0.5 to account for the width of each rectangle. Example • Research indicates 40% of all students entering a certain university withdraw(撤回) from a course during their first year. What is the probability that fewer than 650 of this year’s entering class of 1800 will withdraw from a class? ANS: Let x be the number of students that withdraw from a course during their first year x has a binomial distribution: n = 1800, p = 0.4 The probability function is given by: 1800 x 1800 x P( x ) = ( 0 . 4 ) ( 0 . 6 ) for x = 0, 1, 2, ... ,1800 x Solution Use the normal approximation method: = np = (1800)(0.4) = 720 = npq = (1800)(0.4)(0.6) = 432 20.78 P( x is fewer than 650) = P( x 650) = P( x 649.5) (for discrete variable x ) (for a continuous variable x ) x 720 649.5 720 = P 20.78 20.78 = P( z 3.39) = 0.5000 0.4997 = 0.0003