Engineering Statistics
Chapter 2 Special Variables
2C Normal Distribution

Continuous distributions
• Recall that for a continuous random variable X with probability density function f(x), the probability of an event is defined by an integral:
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
• In this case P(X = k) = 0 for all k. It follows that P(X > a) = P(X ≥ a) and P(X < a) = P(X ≤ a). Also, P(X > a) = 1 – P(X ≤ a).

Normality
• Among the continuous distributions, one is particularly useful. Its pdf is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
This is the pdf of a normally distributed variable X. The mean of X is μ and its variance is σ².

The normal distribution
• Normal distributions are used to model variables obtained from measurements of naturally occurring things, e.g. the height of trees, the weight of people, the length of daylight hours at a certain place.
• For man-made things produced in large quantities, and for other human activities, we also expect the measurements to follow the normal distribution: mass of bricks, length of time to complete a fixed job, etc.

Properties of normal distributions
• If X is normally distributed with mean μ and variance σ², we write X ~ N(μ, σ²).
• The pdf f(x) of a normal distribution is symmetric about the mean. The variance determines its shape: a small variance means the values are concentrated close to the mean, and a large variance gives a flat, spread-out curve.

Graphs of Normal Distributions
• Small variance: mean = 2, variance = 1
• Large variance: mean = 2, variance = 16

Probabilities of normal distributions
• As X is a continuous variable, we obtain probabilities by integration. For the event a < X < b, the probability is
$$P(a < X < b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
This integrand has no antiderivative in terms of elementary functions, so the integral cannot be evaluated directly.

Comparing normal distributions
• However, by transforming the variable, we can convert the integral of the pdf into a simpler form.
• Most importantly, the probability of an event in a normal distribution X can be transformed into the probability of an event in another normal distribution Y.
• This means that, if we can find the probabilities of events in one fixed normal distribution, we can use it as a reference for events in all other normal distributions.

Comparing events
• If X ~ N(μ, σ²), then P(X > x) depends only on how far x is from μ, measured in units of σ. So we can compare the probabilities of events in different normal distributions by comparing the quantity (x – μ)/σ.
• If X1 ~ N(5, 4), X2 ~ N(10, 4), X3 ~ N(5, 9) and X4 ~ N(10, 100), then P(X1 > 7) = P(X2 > 12) = P(X3 > 8) = P(X4 > 20), because in each case the event corresponds to one standard deviation above the mean.
• This suggests that we should choose one basic distribution as the reference.

Standard Normal Distribution
• The normal distribution with mean 0 and variance 1 is called the Standard Normal Distribution (SND). We denote it by Z, and its pdf is usually written φ(z):
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$

Properties of Z
• The standard normal distribution has mean 0 and standard deviation 1. Its distribution is therefore symmetric about the mid-point 0.
• Since the total probability is 1, we deduce that P(Z > 0) = P(Z < 0) = 0.5.

Table for SND
• The table for SND is constructed for z from 0 to about 4. P(Z > 4) is only about 0.00003 and is seldom used except in critical work where extreme accuracy is required.
• The UTM table shows values of P(0 < Z ≤ z). Thus when you read the table for 0.4 and see 0.1554, what is meant is P(0 < Z ≤ 0.4) = 0.1554. Similarly, we see from the table that P(0 < Z ≤ 1.33) = 0.4082 and P(0 < Z ≤ 2.17) = 0.4850.

Transforming variables
• If X ~ N(μ, σ²) and we wish to transform a value X = x into z, the formula is z = (x – μ)/σ.
Ex: X ~ N(50, 15²).
x = 65 → z = (65 – 50)/15 = 1
x = 80 → z = (80 – 50)/15 = 2
x = 45 → z = (45 – 50)/15 = –0.33
x = 30 → z = (30 – 50)/15 = –1.33

Finding probabilities
• To calculate probabilities for events in a normal distribution X with mean μ and variance σ², we go through the following steps:
I. Convert x into z (SND).
II. Look up z in the table.
III. Interpret the result.

First example
• The mean of a normal distribution X is 10 and its variance is 16. Find the values of the following probabilities: (i) P(10 < X ≤ 14); (ii) P(X > 12); (iii) P(12 < X < 16).

Solution to First Example
Solution: X ~ N(10, 4²), mean = 10, SD = 4.
• x = 10 → z = 0; x = 14 → z = 1, so P(10 < X ≤ 14) = P(0 < z ≤ 1) = 0.3413.
• x = 12 → z = 0.5, so P(X > 12) = P(z > 0.5) = 0.5 – P(0 < z < 0.5) = 0.5 – 0.1915 = 0.3085.
• x = 16 → z = 1.5, so P(12 < X < 16) = P(0.5 < z < 1.5) = P(0 < z < 1.5) – P(0 < z < 0.5) = 0.4332 – 0.1915 = 0.2417.

Negative z
• As the pdf of Z is symmetric about 0, we have P(–k < z < 0) = P(0 < z < k).
• Similarly, for 0 < k < h, we have
P(z < –k) = P(z > k)
P(–h < z < –k) = P(k < z < h)
Thus we can interpret probabilities of events with negative z values by relating them to the corresponding positive z values.

Graph showing symmetry

Example 2
• The height of men in a recruitment exercise has a mean of 177.2 cm and standard deviation 9.4 cm. Assume the height follows the normal distribution. What is the probability that a recruit is
(i) Between 177.2 and 185.8 cm?
(ii) Taller than 190 cm?
(iii) Taller than 175 cm?
Let H represent their height. Then H ~ N(177.2, 9.4²).
(i) x = 177.2 → z = 0; x = 185.8 → z = (185.8 – 177.2)/9.4 = 0.91. From the table, P(0 ≤ z ≤ 0.91) = 0.3186. So P(177.2 ≤ H ≤ 185.8) = P(0 ≤ z ≤ 0.91) = 0.3186.

Example 2 solution 2
Solution (contd)
(ii) x = 190 → z = (190 – 177.2)/9.4 = 1.36. From the table, P(0 < z < 1.36) = 0.4131. So P(H > 190) = P(z > 1.36) = 0.5 – P(0 ≤ z ≤ 1.36) = 0.0869.
(iii) x = 175 → z = (175 – 177.2)/9.4 = –0.23, so P(H > 175) = P(z > –0.23). By symmetry, P(–0.23 ≤ z ≤ 0) = P(0 ≤ z ≤ 0.23) = 0.0910. Hence P(H > 175) = P(z > –0.23) = P(–0.23 ≤ z ≤ 0) + 0.5 = 0.5910.
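The three steps (convert to z, look up, interpret) can also be checked numerically. The sketch below is only an illustration, not part of the original slides: it assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` in place of the printed table. The helper names `table_value`, `prob_between` and `prob_above` are made up for this example. Because it does not round z to two decimals, its results can differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

Z = NormalDist(0.0, 1.0)  # standard normal distribution (SND)

def table_value(z):
    """P(0 < Z <= z) for z >= 0, the quantity the slides read from the table."""
    return Z.cdf(z) - 0.5

def prob_between(mu, sigma, a, b):
    """P(a < X < b) for X ~ N(mu, sigma^2), computed through z-scores."""
    za = (a - mu) / sigma  # step I: convert x to z
    zb = (b - mu) / sigma
    return Z.cdf(zb) - Z.cdf(za)  # steps II and III: look up and combine

def prob_above(mu, sigma, a):
    """P(X > a) for X ~ N(mu, sigma^2)."""
    return 1.0 - Z.cdf((a - mu) / sigma)

# First example: X ~ N(10, 4^2)
p1 = prob_between(10, 4, 10, 14)  # slide value: 0.3413
p2 = prob_above(10, 4, 12)        # slide value: 0.3085
p3 = prob_between(10, 4, 12, 16)  # slide value: 0.2417

# Example 2 (iii): H ~ N(177.2, 9.4^2), a negative z value
q_tall = prob_above(177.2, 9.4, 175)  # slide value: 0.5910 (with z rounded to -0.23)

for label, p in [("P(10<X<=14)", p1), ("P(X>12)", p2),
                 ("P(12<X<16)", p3), ("P(H>175)", q_tall)]:
    print(f"{label} = {p:.4f}")
```

Working with the cumulative probability directly removes the need for the symmetry argument on negative z, which is why `prob_above` handles Example 2 (iii) without a special case.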
Example 3
• The mean and standard deviation of the weight of expectant mothers at a clinic are 65.5 kg and 6.9 kg respectively. What percentage of them weigh less than 70 kg?
Solution: X ~ N(65.5, 6.9²). Hence
P(X < 70) = P(z < 0.65)
= 0.5 + P(0 < z < 0.65)
= 0.5 + 0.2422 = 0.7422.
So about 74% of them are less than 70 kg in weight.

Example 4
• The daily traffic volume on PLUS is expected to follow the normal distribution but is different for weekdays (Mon-Fri) and weekends. The mean for a weekday is 365000, with SD 78000; for a weekend day the mean is 680000, with SD 102000.
(i) What is the probability that the volume on a weekday is between 300000 and 400000?
(ii) On a day when the traffic exceeds 850000, one lane on the opposite side is used as a contra lane. How frequently does this happen on a weekend?

Example 4 - Solution
Solution:
(i) Let X be the volume on a weekday. Then X ~ N(365000, 78000²).
P(300000 < X < 400000) = P(–0.83 < z < 0.45) = 0.2967 + 0.1736 = 0.4703
• Graph showing (i).

Example 4 – Solution (contd)
(ii) Let Y represent the volume on a weekend day. Y ~ N(680000, 102000²). So P(Y > 850000) = P(z > 1.67) = 0.5 – 0.4525 = 0.0475. So the contra lane is opened on about 4.75% of weekend days.

Example 5 – Negative mean
• The temperature at a town has a mean of –11.2 °C and SD of 5.5 °C. What is the probability that
(i) The temperature drops below –15 °C?
(ii) The temperature ranges between –5 °C and 5 °C?
Solution: T ~ N(–11.2, 5.5²)

Solution to Ex 5
(i) P(T < –15) = P(z < [–15 – (–11.2)]/5.5) = P(z < –0.69) = 0.5 – 0.2549 = 0.2451.
(ii) P(–5 < T < 5) = P([–5 – (–11.2)]/5.5 < z < [5 – (–11.2)]/5.5) = P(1.13 < z < 2.95) = 0.4984 – 0.3708 = 0.1276.

Finding x
• For a variable following the normal distribution, it is also possible to estimate percentiles from the mean and SD.
• The procedure for finding a percentile is the same as for calculating a probability (or percentage). The only difference is that the transform formula is used in reverse: x = μ + zσ.
• Unlike the table for probabilities (Table 5), the percentage points table of UTM (Table 6) shows the value of z for a given upper-tail probability P(Z > z).
• Let's look at an example.

Percentage points on normal graph

Example 6
• The mean monthly salary of a company's workers is RM 1500 and its SD is RM 260. Assume the salaries are normally distributed.
(i) Ahmad's salary is higher than that of 85% of his coworkers. What is his salary?
(ii) Mubin's pay is at the 10th percentile. How much does he make?
• X ~ N(1500, 260²).

Percentage point graph
(i) P(X > a) = 0.15, so P(Z > [a – 1500]/260) = 0.15. From Table 6, z = 1.0364. So [a – 1500]/260 = 1.0364, and a = 1500 + 1.0364 × 260 = 1769. Ahmad's pay is about RM 1770.

Ex 6 (ii)
• Graph of normal distribution, P(Z < –z)
(ii) P(X < m) = 0.10, so P(Z < [m – 1500]/260) = 0.10. From the table, the z value for an upper tail of 0.10 is 1.2816; since m lies below the mean, [m – 1500]/260 = –1.2816, and m = 1500 – 1.2816 × 260 = 1167. So Mubin gets about RM 1170.

Example 7
• A supermarket sells chicken in 4 grades. 20% are grade A chicken; they are the heaviest. Grade D, 12%, are the lightest. If the mean weight of the chicken is 2.1 kg and the SD is 0.22 kg, find the range of weight for grade A and grade D chicken. (Assume the weights are normally distributed.)
W = weight of chicken. W ~ N(2.1, 0.22²)
Grade A chicken are the heaviest. Let a be the minimum weight of a grade A chicken; then P(W > a) = 0.2, so P(z > [a – 2.1]/0.22) = 0.2.
From the table, z0.2 = 0.8416. So [a – 2.1]/0.22 = 0.8416, and a = 2.29.
So grade A chicken are 2.29 kg or more in weight.

Negative z
Grade D chicken should be lighter than d kg, where P(W < d) = 0.12. This means P(z < [d – 2.1]/0.22) = 0.12. From Table 6, we have z0.12 = 1.175. Since d is on the left of the mean, we put [d – 2.1]/0.22 = –1.175, so d = 1.84. So grade D chicken are less than 1.84 kg.

Example 8
• A class teacher decides to send the top 3 of his 40 students to a competition. The selection is based on the average scores of some tests. The mean score of the tests is 65.4 and the SD is 16.5. What is the minimum score of the students selected?
(Assume the scores follow the normal distribution.)

Example 8: Solution
S ~ N(65.4, 16.5²). Since only 3 out of 40 are selected, P(S > m) = 3/40 = 0.075. This is equivalent to P(z > [m – 65.4]/16.5) = 0.075. With z = 1.4395 for an upper tail of 0.075, m = 65.4 + 16.5 × 1.4395 = 89.2.
So a student should score 89.2 or more to be selected.

Example 9
• A forestry officer records the monthly production of logs in his area. He finds that the mean monthly yield is 2200 tonnes, with SD 420 tonnes.
(i) What is the probability that the production in a certain month is between 2100 and 2400 tonnes?
(ii) During a wet month, because of transport problems, the production is among the lowest 15%. How much would the yield be?

Example 9: Solution
Let L represent the monthly yield of logs. Then L ~ N(2200, 420²).
(i) P(2100 < L < 2400) = P([2100 – 2200]/420 < z < [2400 – 2200]/420) = P(–0.24 < z < 0.48) = 0.0948 + 0.1844 = 0.2792
(ii) Let k represent the maximum production for that month. Then P(L < k) = 0.15, so P(z < [k – 2200]/420) = 0.15. From Table 6, z0.15 = 1.0364. So [k – 2200]/420 = –1.0364, and k = 1765. This means the production for that month is 1765 tonnes or less.

Sum of, and Difference between, two variables
• If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then
X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²) and
X₁ – X₂ ~ N(μ₁ – μ₂, σ₁² + σ₂²).
Note that the variances add in both cases. Two variables are combined in this way only when they use the same units, such as weight and weight, height and height, and so on.

Example 10
• The weight of a cabbage from each of two farms is expected to follow the normal distribution. At Farm A, the mean is 625 g with SD 86 g; at Farm B the mean is 708 g with SD 92 g. A customer selects a cabbage from each of the farms. What is the probability that
(i) The total weighs less than 1.4 kg?
(ii) The cabbage from A is at least 100 g lighter than that from B?

Example 10 - solution
(i) A ~ N(625, 86²), B ~ N(708, 92²). So A + B ~ N(625 + 708, 86² + 92²).
P(A + B < 1400) = P(z < [1400 – 1333]/√(86² + 92²)) = P(z < 0.53) = 0.5 + 0.2019 = 0.7019.
(ii) A – B ~ N(625 – 708, 86² + 92²).
P(A – B < –100) = P(z < [–100 – (–83)]/√(86² + 92²)) = P(z < –0.13) = 0.5 – 0.0517 = 0.4483

Example 11
• A company maintains two sets of records for different types of client. For type X, the mean file size of a record is 5.8 MB, with SD 0.82 MB. For type Y, the mean file size is 6.2 MB, with SD 0.76 MB. A customer from X and another from Y get married, and their files are to be combined. What is the probability that the new file will be between 10 and 13 MB?

Ex 11 Solution
X ~ N(5.8, 0.82²), Y ~ N(6.2, 0.76²)
X + Y ~ N(5.8 + 6.2, 0.82² + 0.76²)
P(10 < X + Y < 13) = P([10 – 12]/√(0.82² + 0.76²) < z < [13 – 12]/√(0.82² + 0.76²)) = P(–1.79 < z < 0.89) = 0.4633 + 0.3133 = 0.7766.

Multiples of a variable
• If X ~ N(μ, σ²) and we select 2 independent elements of X, then, representing the sum of the two items as Y, we have Y ~ N(μ + μ, σ² + σ²), that is, N(2μ, 2σ²). Generalising, if we sum n independent items from X and call the sum S, we have S ~ N(nμ, nσ²).

Ex 12
On average, a student in UAB spends RM 12.50 per day on food, with SD RM 2.20. What is the probability that
(i) A group of 5 UAB students spends a total of RM 70.00 or more today?
(ii) A group of 15 UAB students spends a total of less than RM 200.00 today?
Let S be one student's daily spending, S ~ N(12.50, 2.20²). S5 = total food expenditure for 5 students; S15 = total food expenditure for 15 students.
(i) S5 ~ N(12.50 × 5, 2.20² × 5)
P(S5 ≥ 70) = P(z ≥ [70 – 62.5]/√(2.2² × 5)) = P(z ≥ 1.52) = 0.5 – 0.4357 = 0.0643.
(ii) S15 ~ N(12.50 × 15, 2.20² × 15)
P(S15 < 200) = P(z < [200 – 187.5]/√(2.2² × 15)) = P(z < 1.47) = 0.5 + 0.4292 = 0.9292.
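
The percentage-point lookups and the sum, difference and multiple rules used in the later examples can be cross-checked numerically. The sketch below is an illustration only, not part of the original slides: it assumes Python 3.8+ and the standard library's `statistics.NormalDist`, whose `+` and `-` operators combine two independent normal variables by adding their variances. The helper `sum_of_n` and all variable names are invented for this example, and the variables being combined are assumed independent. Since z is not rounded here, results may differ from the table-based answers in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def sum_of_n(mu, sigma, n):
    """Sum of n independent N(mu, sigma^2) items: N(n*mu, n*sigma^2)."""
    return NormalDist(n * mu, sigma * sqrt(n))

# Example 6: percentage points, X ~ N(1500, 260^2)
salary = NormalDist(1500, 260)
ahmad = salary.inv_cdf(0.85)  # higher than 85% of coworkers
mubin = salary.inv_cdf(0.10)  # 10th percentile

# Example 10: sum and difference of two independent normals
farm_a = NormalDist(625, 86)
farm_b = NormalDist(708, 92)
total = farm_a + farm_b       # N(1333, 86^2 + 92^2)
diff = farm_a - farm_b        # N(-83, 86^2 + 92^2): variances still add
p_total = total.cdf(1400)     # P(A + B < 1400)
p_diff = diff.cdf(-100)       # P(A - B < -100)

# Ex 12: totals for groups of 5 and 15 students
s5 = sum_of_n(12.50, 2.20, 5)
s15 = sum_of_n(12.50, 2.20, 15)
p_s5 = 1.0 - s5.cdf(70)       # P(S5 >= 70)
p_s15 = s15.cdf(200)          # P(S15 < 200)

print(f"Ahmad about RM {ahmad:.0f}, Mubin about RM {mubin:.0f}")
print(f"P(A+B<1400) = {p_total:.4f}, P(A-B<-100) = {p_diff:.4f}")
print(f"P(S5>=70) = {p_s5:.4f}, P(S15<200) = {p_s15:.4f}")
```

Note that `sum_of_n` scales the standard deviation by √n, not by n: the sum of n independent items has variance nσ², which is different from multiplying a single item by n.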