TOPIC 5
Normal Distributions

Start Thinking
• As a web designer, you face a task that involves a continuous measurement: download time, which can take any value and not just a whole number. How can you answer questions such as the following?
  What proportion of the homepage downloads take more than 10 seconds?
  How many seconds elapse before 10% of the downloads are complete?

Continuous Probability Distributions
Uniform, Exponential, Normal, Gamma, Weibull, Beta

Normal Distribution
1. Also called the Gaussian distribution
2. ‘Bell-shaped’ and symmetrical
3. Mean, median and mode are equal
4. Random variable has infinite range
5. Area under the curve is 1 (the total probability equals 1)
6. Can be used to approximate discrete probability distributions, for example the Binomial and Poisson
7. Basis for classical statistical inference

Probability Density Function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
• $\sigma$ = standard deviation
• $\pi$ = 3.14159; $e$ = 2.71828
• $x$ = value of the random variable ($-\infty < x < \infty$)
• $\mu$ = mean
The mean and the variance are $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^{2}$.
The notation that denotes that the random variable X has a normal distribution is $X \sim N(\mu, \sigma^{2})$.

Effect of Varying Parameters (μ & σ)
• Normal distributions differ by mean & standard deviation
• Each distribution would require its own table.
[Figure: three normal density curves f(x), labelled A, B and C, for X ~ N(5, 0.5), X ~ N(5, 2) and X ~ N(10, 2), centred at x = 5 and x = 10]
That’s an infinite number of tables! So we need a “standardized” normal distribution.

The Standard Normal Distribution
One table!
[Figure: a normal distribution $X \sim N(\mu, \sigma^{2})$ mapped to the standard normal distribution $Z \sim N(0, 1)$ with μ = 0 and σ = 1; Z is negative to the left of 0 and positive to the right]

Normal Distribution Probability
• If X ~ N(μ, σ²), then the transformed random variable is $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
• The random variable Z is known as the “standardized” version of the random variable X.
• The probability values of a general normal distribution can be related to the cumulative distribution function of the standard normal distribution, Φ(z):
$P(X \le a) = \Phi(z) = P(Z \le z)$ where $z = \frac{a - \mu}{\sigma}$

Example
Normal distribution: $X \sim N(3, 0.8^{2})$. Standard normal distribution: $Z \sim N(0, 1)$.
$P(X \le 4.2) = ?$
$z = \frac{a - \mu}{\sigma} = \frac{4.2 - 3}{0.8} = 1.5$
$\Phi(z) = P(Z \le z)$, so $\Phi(1.5) = P(Z \le 1.5)$
Therefore $P(X \le 4.2) = \Phi(1.5) = P(Z \le 1.5) = ?$
[Figure: normal curve with μ = 3, σ = 0.8 and the area left of x = 4.2 shaded, next to the standard normal curve with μ = 0, σ = 1 and the area left of z = 1.5 shaded]

Standard Normal Tables
• The values of the cumulative distribution function of the standard normal distribution, Φ(z), that is, the probability P(Z ≤ z), are already tabulated.
[Figure: standard normal curve with μ = 0, σ = 1 and the area left of z shaded]

Normal Distribution Probability
• We only have the table of the cumulative distribution function of the standard normal distribution, Φ(z) or P(Z ≤ z), to find P(X ≤ a). By using the same table, we can find the other probabilities:
$P(a \le X \le b) = P(X \le b) - P(X \le a)$
$P(X \ge a) = 1 - P(X \le a)$

Normal Distribution μ = 5, σ = 10 : P(5 < X < 24.6) = ?
$z_{lower} = \frac{a - \mu}{\sigma} = \frac{5 - 5}{10} = 0$
$z_{upper} = \frac{b - \mu}{\sigma} = \frac{24.6 - 5}{10} = 1.96$
$P(5 < X < 24.6) = P(0 < Z < 1.96) = P(Z \le 1.96) - P(Z \le 0)$
Look up the table!

Standard Normal Probability Table
Z      0.04     0.05     0.06
1.8    0.9671   0.9678   0.9686
1.9    0.9738   0.9744   0.9750
2.0    0.9793   0.9798   0.9803
2.1    0.9838   0.9842   0.9846

$P(Z \le 1.96) - P(Z \le 0) = 0.9750 - 0.5000 = 0.4750$
[Figure: normal curve with μ = 5, σ = 10 and the area between x = 5 and x = 24.6 shaded, next to the standard normal curve with the area 0.4750 between z = 0 and z = 1.96 shaded]

Normal Distribution μ = 5, σ = 10 : P(X ≥ 8) = ?
$z = \frac{a - \mu}{\sigma} = \frac{8 - 5}{10} = 0.3$
$P(X \ge 8) = P(Z \ge 0.3) = 1 - P(Z \le 0.30) = 1 - 0.6179 = 0.3821$
Look up the table!
[Figure: normal curve with μ = 5, σ = 10 and the area right of x = 8 shaded, next to the standard normal curve with the area right of z = 0.3 shaded]

Normal Distribution Example
You work in Quality Control for GE. Light bulb life has a normal distribution with μ = 2000 hours and σ = 200 hours. What’s the probability that a bulb will last
a) between 1800 and 2200 hours?
b) less than 1470 hours?
c) more than 2500 hours?
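The table lookups in the two μ = 5, σ = 10 examples above can be cross-checked in software. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper names `prob_between` and `prob_at_least` are illustrative and do not come from the slides.

```python
from statistics import NormalDist


def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed as P(X <= b) - P(X <= a)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


def prob_at_least(a, mu, sigma):
    """P(X >= a) for X ~ N(mu, sigma^2), computed as 1 - P(X <= a)."""
    return 1.0 - NormalDist(mu, sigma).cdf(a)


# The two worked examples with mu = 5 and sigma = 10.
p_between = prob_between(5, 24.6, mu=5, sigma=10)
p_at_least = prob_at_least(8, mu=5, sigma=10)

print(f"P(5 < X < 24.6) = {p_between:.4f}")  # table value: 0.4750
print(f"P(X >= 8)       = {p_at_least:.4f}")  # table value: 0.3821
```

The same two helpers could be applied to the light-bulb question with μ = 2000 and σ = 200. Small differences from table-based answers are expected, because the table rounds both z and Φ(z).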
Example Solution a)
$z_{low} = \frac{a - \mu}{\sigma} = \frac{1800 - 2000}{200} = -1.0$
$z_{up} = \frac{b - \mu}{\sigma} = \frac{2200 - 2000}{200} = 1.0$
$P(1800 \le X \le 2200) = P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1) = 0.8413 - 0.1587 = 0.6826$
[Figure: normal curve with μ = 2000, σ = 200 and the area between x = 1800 and x = 2200 shaded, next to the standardized normal curve with the area between z = -1.0 and z = 1.0 shaded]

Example Solution b)
$z = \frac{a - \mu}{\sigma} = \frac{1470 - 2000}{200} = -2.65$
$P(X \le 1470) = P(Z \le -2.65) = 0.0040$
[Figure: normal curve with μ = 2000, σ = 200 and the area left of x = 1470 shaded, next to the standardized normal curve with the area 0.0040 left of z = -2.65 shaded]

Example Solution c)
$z = \frac{a - \mu}{\sigma} = \frac{2500 - 2000}{200} = 2.50$
$P(X \ge 2500) = P(Z \ge 2.5) = 1 - P(Z \le 2.5) = 1 - 0.9938 = 0.0062$
[Figure: normal curve with μ = 2000, σ = 200 and the area right of x = 2500 shaded, next to the standardized normal curve with the area right of z = 2.5 shaded]

The Empirical Rule
[Figure: the empirical rule]

Finding Random Variable X for Known Probabilities
Given that P(X ≤ a) = 0.6216, what is a?
Firstly, find the value of z!

Standard Normal Probability Table
Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

The closest tabulated value is .6217, so z = 0.31.

Finding Random Variable X for Known Probabilities
Secondly, find the value of X = a! Here μ = 5 and σ = 10:
$X = \mu + Z\sigma = 5 + (0.31)(10) = 8.1$
[Figure: standard normal curve with the area .6217 left of z = 0.31 shaded, next to the normal curve with μ = 5, σ = 10 and the same area left of x = 8.1 shaded]

Exercise
1. The thicknesses of metal plates made by a particular machine are normally distributed with a mean of 4.3 mm and a standard deviation of 0.12 mm.
a) What is the proportion of the metal plates that have a thickness outside the range of 4.1 to 4.5 mm?
b) What are the upper and lower quartiles of the metal plate thickness?
c) What is the value of c for which there is an 80% probability that a metal plate has a thickness within the interval [4.3 − c, 4.3 + c]?

Answer to the Exercise
μ = 4.3 mm and σ = 0.12 mm
a) $P(X < 4.1 \text{ or } X > 4.5) = 1 - P(4.1 \le X \le 4.5) = 1 - [P(X \le 4.5) - P(X \le 4.1)]$
$z_{low} = \frac{4.1 - 4.3}{0.12} = -1.67$ and $z_{up} = \frac{4.5 - 4.3}{0.12} = 1.67$
$P(X < 4.1 \text{ or } X > 4.5) = 1 - [P(Z \le 1.67) - P(Z \le -1.67)] = 1 - [0.9525 - 0.0475] = 0.095 = 9.5\%$
b) Lower quartile: P(X ≤ a) = 0.25; upper quartile: P(X ≤ a) = 0.75
$P(X \le a) = 0.25 \Rightarrow z = -0.67 \Rightarrow X = \mu + Z\sigma = 4.3 + (-0.67)(0.12) = 4.2196$
$P(X \le a) = 0.75 \Rightarrow z = 0.67 \Rightarrow X = \mu + Z\sigma = 4.3 + (0.67)(0.12) = 4.3804$

Answer to the Exercise
c) It means P(a ≤ X ≤ b) = 80% with a = 4.3 − c and b = 4.3 + c, so by symmetry P(X ≤ a) = 10% = 0.1 and P(X ≤ b) = 90% = 0.9.
Pick either one. Using the lower limit a = 4.3 − c:
$P(X \le a) = 0.10 \Rightarrow z = -1.28$
$X = \mu + Z\sigma \Rightarrow 4.3 - c = 4.3 + (-1.28)(0.12) \Rightarrow c = 1.28 \times 0.12 = 0.1536$

Linear Combination of Normal Random Variables
• Linear Functions of a Normal Random Variable
If X ~ N(μ, σ²) and a and b are constants, then
$Y = aX + b \sim N(a\mu + b, \; a^{2}\sigma^{2})$
• The Sum of Two Independent Normal Random Variables
If X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) are independent random variables, then
$Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^{2}\sigma_1^{2} + a_2^{2}\sigma_2^{2})$
• Averaging Independent Normal Random Variables
If Xi ~ N(μ, σ²), 1 ≤ i ≤ n, are independent random variables, then their average $\bar{X}$ is distributed
$\bar{X} \sim N\left(\mu, \; \frac{\sigma^{2}}{n}\right)$

Example
The annual return of the stock of company A, XA say (in percent), is distributed
$X_A \sim N(8.0, 1.5^{2}) = N(8.0, 2.25)$
In addition, suppose that the annual return from the stock of company B, XB say, is distributed
$X_B \sim N(9.5, 4.0)$
independent of the stock of company A.
a) What is the probability that company B’s stock performs better than company A’s stock?
b) What is the probability that company B’s stock performs at least 2 percentage points better than company A’s stock?

Example Solution a)
Let Y = XB − XA. Applying $Y = a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2, \; a_1^{2}\sigma_1^{2} + a_2^{2}\sigma_2^{2})$ with a1 = 1 and a2 = −1:
$Y \sim N(9.5 - 8.0, \; (1)^{2}(4) + (-1)^{2}(2.25))$
$Y \sim N(1.5, 6.25)$, that is, μ = 1.5 and σ² = 6.25
Performs better means Y ≥ 0.
$z = \frac{0 - 1.5}{\sqrt{6.25}} = -0.60$
$P(Y \ge 0) = 1 - P(Y \le 0) = 1 - P(Z \le -0.60) = 1 - 0.2743 = 0.7257$

Example Solution b)
It means Y ≥ 2.0.
$z = \frac{2.0 - 1.5}{\sqrt{6.25}} = 0.20$
$P(Y \ge 2.0) = 1 - P(Y \le 2.0) = 1 - P(Z \le 0.20) = 1 - 0.5793 = 0.4207$

Normal Approximations to the Binomial Distribution
1. Not all binomial tables exist
2. Requires large sample size
3. Gives approximate probability only
4. Need correction for continuity
5. The distribution B(n, p) can be approximated by a normal distribution with mean $\mu = np$ and variance $\sigma^{2} = np(1 - p)$, that is, $N(np, \; np(1 - p))$
[Figure: bar chart of the binomial distribution f(x) for n = 10, p = 0.50, with x from 0 to 10]

Normal Approximations to the Binomial Distribution
[Figure: binomial bar chart f(x) with a normal curve overlaid, marking P(X ≤ a) and the probability added by the normal curve]
Binomial Distribution: the area of all the ‘orange’ bars.
Normal Approximation: the area starting from the ‘blue’ vertical line to the left.
So the normal approximation needs a correction of a ‘half’ in order to cover the same area as the Binomial.

Correction for Continuity
1. A 1/2 unit adjustment to the discrete variable
2. Improves accuracy
3. Correction for each of four cases:
For P(X ≥ a), use the area above â = (a − 0.5).
For P(X > a), use the area above â = (a + 0.5).
For P(X ≤ a), use the area below â = (a + 0.5).
For P(X < a), use the area below â = (a − 0.5).
[Figure: a bar centred at a, spanning a − 0.5 to a + 0.5]

Normal Approximation Procedure
• Normal approximations to the binomial distribution work well as long as $np \ge 5$ and $n(1 - p) \ge 5$
• For each of the four cases above, use
$z = \frac{\hat{a} - \mu}{\sigma}$ where $\hat{a} = a \pm 0.5$, $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$

Example
Suppose that a fair coin is tossed n times. The distribution of the number of heads obtained, X, is B(n, 0.5). If n = 100, what is the probability of obtaining between 45 and 55 heads?
$np \ge 5$ and $n(1 - p) \ge 5$ are satisfied since $np = n(1 - p) = 100 \times 0.5 = 50$
$P(45 \le X \le 55) \approx P(\hat{a} \le X \le \hat{b}) = P(X \le \hat{b}) - P(X \le \hat{a})$
$\hat{a} = 45 - 0.5 = 44.5$ and $\hat{b} = 55 + 0.5 = 55.5$
$z_{low} = \frac{\hat{a} - np}{\sqrt{np(1 - p)}} = \frac{44.5 - 50}{5} = -1.10$
$z_{up} = \frac{\hat{b} - np}{\sqrt{np(1 - p)}} = \frac{55.5 - 50}{5} = 1.10$

Example Solution
$P(45 \le X \le 55) \approx P(Z \le 1.1) - P(Z \le -1.1) = 0.8643 - 0.1357 = 0.7286$
Using statistical software or Excel, the exact binomial probability is 0.7287. The difference is only about 0.0001.

Central Limit Theorem
• If X1, …, Xn is a sequence of independent, identically distributed random variables with mean μ and variance σ² (not necessarily normally distributed), then the distribution of their average $\bar{X}$ can be approximated by a
$N\left(\mu, \; \frac{\sigma^{2}}{n}\right)$
distribution. Similarly, the distribution of the sum X1 + … + Xn can be approximated by a
$N(n\mu, \; n\sigma^{2})$
distribution. The general rule is that the approximation is adequate as long as n ≥ 30.
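A B(n, p) count is the sum of n independent Bernoulli variables, so the normal approximation in the coin-toss example above is one instance of the central limit theorem. The following is a minimal sketch of that comparison, assuming Python 3.8 or later (`math.comb` and `statistics.NormalDist`); the function names are illustrative and do not come from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_prob_between(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ B(n, p), summing the binomial probabilities."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


def normal_approx_between(n, p, lo, hi):
    """Normal approximation to P(lo <= X <= hi) with the continuity correction."""
    mu = n * p                      # mean np
    sigma = sqrt(n * p * (1 - p))   # standard deviation sqrt(np(1 - p))
    approx = NormalDist(mu, sigma)
    # Widen the interval by half a unit on each side: lo - 0.5 and hi + 0.5.
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)


n, p = 100, 0.5
exact = binomial_prob_between(n, p, 45, 55)
approx = normal_approx_between(n, p, 45, 55)

print(f"exact binomial       = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The software approximation can differ in the fourth decimal place from the table-based 0.7286, because the table rounds Φ(1.10) to four digits before subtracting.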