2.4 Applications of Standard Deviation CH6 p83

Z-Score: Relative Standing
- the mean can be considered the middle of a data set that is symmetrically distributed
- since there are many values in a data set, it becomes important to determine how unusual a data value may be
- a data value is considered unusual if it is a great distance away from the mean

A symmetric data distribution is commonly expected when considering task completion times. It is observed that clerks complete a particular task with a mean of 10 minutes and a standard deviation of 2 minutes.

[Figure: Distribution of Task Completion Times; Normal, Mean = 10, StDev = 2; x-axis: time to complete a task (minutes), with 10, 12 and 15 marked; y-axis: probability]

Some clerks will take more than 10 minutes while others will require less time, but commonly the task will require about 10 minutes. For example, Clerk A takes 12 minutes while Clerk B takes 15 minutes to complete this task. The 15-minute time is more unusual since it is further away from the mean: 12 minutes is 1 standard deviation to the right of the mean, while 15 minutes is 2.5 standard deviations to the right of the mean. Converting the 15 minutes into a z-score provides a measure of the degree of extremeness.

$z = \frac{y - \mu}{\sigma} = \frac{15 - 10}{2} = 2.5$

So the z-score is simply the count of standard deviations away from the mean. Notice that 2.5 is very close to the right tail of the normal curve. Observing a clerk who completes the task 3 or 4 standard deviations away from the mean is therefore very extreme or unlikely (but not impossible). In the future we will calculate the probability of clerks requiring 17 or 20 or more minutes to complete the task.

24Normal 35 W07 1 2/19/2007

- given a symmetric data distribution where typically a clerk completes a task with a mean of 10 minutes and a standard deviation of 2 minutes.

Eg) A student scores 75 in economics and 80 in statistics. In which subject did the student do better than the rest of the class?
              μ       σ
Economics     68.1    5.2
Statistics    75      8.3

[Figure: Student Standing in Economics and Statistics Classes; two normal curves (Mean 68.1, StDev 5.2 and Mean 75, StDev 8.3); x-axis: test scores; y-axis: density]

$z = \frac{y - \mu}{\sigma}$
$z_e = \frac{75 - 68.1}{5.2} = 1.3$   (1.3 std. dev. above the mean)
$z_s = \frac{80 - 75}{8.3} = 0.6$   (0.6 std. dev. above the mean)

- the student had a higher standing in economics
- a greater proportion of students scored below this student in economics than in statistics

Eg) Person 1 can complete job A in 10.5 minutes while person 2 can complete job B in 22.3 minutes. Which person is quicker doing his job when compared to the average?

         Job A    Job B
μ        12.6     27.3
σ        5.2      3.1

[Figure: Distribution Plot; two normal curves (Mean 12.6, StDev 5.2 and Mean 27.3, StDev 3.1); x-axis: task completion times (minutes); y-axis: density]

$z = \frac{x - \bar{x}}{s}$
$z_1 = \frac{10.5 - 12.6}{5.2} = -0.4$
$z_2 = \frac{22.3 - 27.3}{3.1} = -1.6$

- the faster person has the more negative z value
Person 2, completing job B in 22.3 minutes, is faster within his group than person 1, completing job A in 10.5 minutes.

Eg) The average income in community A is $76,000 (std. dev. $3,100) and the average income in community B is $69,000 (std. dev. $4,500). Jones, living in A, has an income of $74,500 and Grant, in B, has an income of $68,000. Which person is better off relative to his community?

               μ         σ
Community A    76,000    3,100
Community B    69,000    4,500

[Figure: Distribution Plot; two normal curves for the community incomes; x-axis: X (income); y-axis: density]

$z = \frac{x - \bar{x}}{s}$
$z_J = \frac{74500 - 76000}{3100} = -0.484$
$z_G = \frac{68000 - 69000}{4500} = -0.222$

Grant has a higher relative standing for income in his community.

Homework: read pg 83 – 97; pg 99: 5, 7, 15

The 68-95-99.7 Rule (Empirical Rule)
- for an approximately bell shaped data distribution:
  68% of the data will lie within ±1 std. dev. of the mean
  95% of the data will lie within ±2 std. dev. of the mean
  99.7% of the data will lie within ±3 std. dev. of the mean

Eg) An employee reports that he required 26 minutes to complete a task. Historically the average time to complete this task is 19.5 minutes (1.6 min. std. dev.). Is the reported time excessive?

[Figure: normal curve with the 68%, 95% and 99.7% intervals marked]
  19.5 − 3 × 1.6 = 14.7
  19.5 − 2 × 1.6 = 16.3
  19.5 − 1.6     = 17.9
  19.5             (mean)
  19.5 + 1.6     = 21.1
  19.5 + 2 × 1.6 = 22.7
  19.5 + 3 × 1.6 = 24.3

- expected variability is normally defined by an interval of 3 std. dev. on either side of the mean
- if there are 262 employees, then you should expect that about 0.997 × 262 ≈ 261 will complete the task in between 14.7 and 24.3 minutes
- thus a 26 minute time period appears to be excessive

Eg) The distribution of heights of adult men is approximately normal with mean 69 inches and standard deviation 2.5 inches. Draw a normal curve on which this mean and standard deviation are correctly located. (Hint: Draw the curve first, locate the points where the curvature changes, then mark the horizontal axis.)

[Figure: normal curve with mean = 69 in, std. dev. = 2.5 in]
  69 − 3 × 2.5 = 61.5
  69 − 2 × 2.5 = 64.0
  69 − 2.5     = 66.5
  69             (mean)
  69 + 2.5     = 71.5
  69 + 2 × 2.5 = 74.0
  69 + 3 × 2.5 = 76.5

Eg) Scores on the Wechsler Adult Intelligence Scale (a standard "IQ test") for the 20 to 34 age group are approximately normally distributed with μ = 110 and σ = 25.
A) About what percent of people in this age group have scores above 110?
the percent greater than 110 is 50% since 110 is the mean and median (middle).
B) About what percent have scores above 160?
the percent greater than 160 is 2.5% since 160 is 2 standard deviations to the right of the mean and 5% / 2 = 2.5%
C) In what range do the middle 95% of all the scores lie?
the middle 95% would lie within mean ± 2 × std. dev.
110 ± 2 × 25, that is, 60 to 160

Homework pg 100: 11, 15

Normal Probability Distribution
A data distribution which is unimodal and approximately symmetric is known as Normal
- note that the shape is smooth and flowing; this is due to the continuous nature of the data

GENERAL OBSERVATIONS
1. the bell shaped form is the result of the symmetrical distribution of continuous data about the mean
2. the x axis represents a measured variable
3. the y axis represents proportion or probability
4. the tails of the curve approach the x axis (asymptotes)

Note that not all bell shaped curves are normal distributions; one test of normality is to see whether the data meets the Empirical Rule.
- mathematics can provide a precise definition of a Normal Probability Distribution (NPD) model
- an equation can be used to plot the normal curve

$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$

- complex in appearance, but most of the symbols represent constants (e = 2.71828, π = 3.14159)
- it is μ and σ that control the model
- you can construct the NPD model by knowing only μ and σ

[Figure: Distribution Plot; two normal curves (Mean 10, StDev 1 and Mean 20, StDev 3); x-axis: X; y-axis: density]
μ2 > μ1: the mean controls the position of the curve along the x axis
σ2 > σ1: the standard deviation controls the shape

- the NPD curve is a density curve
- since all possible observations can be identified under the curve, let the area under the curve equal one
- the proportion of observations can be determined by calculating the area under the NPD curve

Eg) The management of an online company wishes to determine the proportion of customers who spend more than 15 minutes at their WEB site. Historically it has been shown that the NPD serves as a good model for the situation with μ = 13.1 minutes and σ = 2.2 minutes. What can management conclude?
- find the proportion of the shaded area to the whole area

[Figure: normal curve with std. dev. = 2.2, mean 13.1, and the area above 15 minutes shaded]

$P(x) = \frac{\text{shaded area}}{\text{whole area}} = \frac{32}{152} = 0.211 = 21.1\%$

- thus 21.1% of the total area is found to be greater than 15 minutes, and this is also the proportion of customers expected to spend more than 15 minutes at the WEB page
- the method is not practical
- we can use a table to solve the problem, but we cannot use the problem data directly; otherwise, we would need a table for each new problem
- we must transform the x values into z values or z scores

$z = \frac{x - \mu}{\sigma}$
z score of 15 mins.: $z = \frac{15 - 13.1}{2.2} = 0.864$

[Figure: normal curve with std. dev. = 2.2; x axis: 13.1 and 15; z axis: 0 and +0.86]
- the z score of the mean is 0
- z scores to the right of the mean are positive and to the left are negative
- go to table A56 in your text book
- the standard normal probabilities table relates a z value to the area under the curve

A(0.86) = 0.8051
thus the proportion of customers spending more than 15 minutes is 1 − 0.8051 = 0.1949 or 19.5%
- compare to the initial calculated value

Finding the Area Under the Normal Curve
Eg) Find the area under the NPD curve which lies
a) Area { z < +0.69 }
   A(0.69) = 0.7549
b) Area { z > −1.31 }
   A(−1.31) = 0.0951
   A = 1.0 − 0.0951 = 0.9049
c) Area { z > 2.0 }
   A(2.0) = 0.9772
   A = 1.0 − 0.9772 = 0.0228
d) Area { −1.0 < z < +1.0 }
   A(−1.0) = 0.1587, A(+1.0) = 0.8413
   A = 0.8413 − 0.1587 = 0.6826
e) Area { 1.11 < z < 1.71 }
   A(1.71) = 0.9564, A(1.11) = 0.8665
   A = 0.9564 − 0.8665 = 0.0899
f) Area { z < −1.4 OR z > 2.1 }
   A(−1.4) = 0.0808, A(2.1) = 0.9821
   A = 0.0808 + (1 − 0.9821) = 0.0987

Eg) Find the z value that
a) leaves 0.1251 in the lower tail
   A(z) = 0.1251, z = −1.15
b) leaves 45.62% in the upper tail
   A(z) = 1 − 0.4562 = 0.5438, z = +0.11
c) has an area of 0.0044 to its right
   A(z) = 1.0 − 0.0044 = 0.9956, z = 2.62
d) has 60% of the observations falling above it
   A(z) = 1 − 0.600 = 0.4000
   z = −0.25: 0.4013 − 0.4000 = 0.0013
   z = −0.26: 0.4000 − 0.3974 = 0.0026
   the closer table entry gives z = −0.25
e) symmetrically distributes 10% in the two tails
   A(z) = 0.05; z = −1.65
   Due to symmetry the upper tail z = +1.65

Homework pg 101: 19, 21

Applied Normal Distribution Problems
Eg) An IQ test is given to 750 students and the results show μ = 100 and σ = 15.
a) How many students scored between 90 and 105?
   $z = \frac{x - \mu}{\sigma}$
   $z = \frac{90 - 100}{15} = -0.67$, A(−0.67) = 0.2514
   $z = \frac{105 - 100}{15} = +0.33$, A(+0.33) = 0.6293
   A = 0.6293 − 0.2514 = 0.3779
   number of students = 0.3779 × 750 = 283
b) Find the number of students who scored above genius (IQ > 140).
   $z = \frac{140 - 100}{15} = 2.67$, A(2.67) = 0.9962
   P = 1.0 − 0.9962 = 0.0038
   number of students = 0.0038 × 750 = 3
c) Find the first quartile score.
   A(z) = 0.25, z = −0.67
   $-0.67 = \frac{x - 100}{15}$
   −10.05 = x − 100
   x = 89.95

Eg) The length of trout caught in Muskoka Lake is normally distributed with μ = 21 cm and σ = 6 cm. By law trout less than 17 cm must be returned.
a) What proportion of your catch must be returned?
   $z = \frac{17 - 21}{6} = -0.667$, A(−0.67) = 0.2514
   P(< 17 cm) = 0.2514
b) What is the chance of catching a trout between 25 and 35 cm?
   $z = \frac{25 - 21}{6} = +0.667$, A(0.67) = 0.7486
   $z = \frac{35 - 21}{6} = 2.333$, A(2.33) = 0.9901
   P = 0.9901 − 0.7486 = 0.2415

Eg) The design tolerances of a sheet of glass are 36.50 mm and 36.60 mm. If the mean is 36.56 mm and the std. dev. is 0.02 mm, then how many sheets from a shipment of 1000 will be rejected?
$z = \frac{x - \mu}{\sigma}$
$z = \frac{36.50 - 36.56}{0.02} = -3.00$, A(−3.00) = 0.0013
$z = \frac{36.60 - 36.56}{0.02} = 2.00$, A(2.00) = 0.9772
A = 1 − (0.9772 − 0.0013) = 0.0241
number of sheets = 1000 × 0.0241 = 24

Eg) A large tire dealer promised to refund the total purchase price of snow tires to November customers if the total snowfall for the winter was less than 40 cm. Climatological records for the city reveal that the total snowfall averages 120 cm with a standard deviation of 35 cm. Calculate the probability that the dealer will have to repay his customers.
$z = \frac{40 - 120}{35} = -2.29$
A(−2.29) = 0.011
P = 0.011, so there is about a 1% chance

Eg) The average weight of soap in 1 kg boxes is 1.02 kg (σ = 0.018 kg).
a) What percent of the boxes weigh less than 1.00 kg?
   $z = \frac{1.00 - 1.02}{0.018} = -1.11$
   A(−1.11) = 0.1335
   P = 0.1335 = 13.4%
b) Regulation requires that no more than 5% of the boxes are to weigh less than 1 kg. What should be the new mean setting?
   A(z) = 0.05, z = −1.65
   $-1.65 = \frac{1.0 - \mu}{0.018}$
   −0.0297 = 1.0 − μ
   μ = 1.03 kg

Eg) The manager of a pizza business observes that the average delivery time is 24.4 min (σ = 3.7 min). What is the shortest time period in which 99% of the deliveries can be made?
A(z) = 0.99, z = 2.33
$2.33 = \frac{x - 24.4}{3.7}$
8.62 = x − 24.4
x = 33 min
What effect does the variability in delivery times have on the guaranteed delivery time?

Eg) Thirty percent of the time, a task will require at most 9.6 minutes to complete. Find the average time to complete the task, if σ = 2.8 minutes.
A(z) = 0.3, z = −0.52
$-0.52 = \frac{9.6 - \mu}{2.8}$
−1.456 = 9.6 − μ
μ = 11.1 min

Homework pg 102: 25, 27, 29, 31

Bonus Question
It has been observed that people spend on average 15.3 minutes (σ = 2.6 minutes) visiting a bank WEB page.
A) What proportion of the visitors will spend at most 12 minutes?
   $z = \frac{x - \mu}{\sigma} = \frac{12 - 15.3}{2.6} = -1.27$
   A(−1.27) = 0.1020
B) What is the shortest time period in which 30% (third decile) of the visits are completed?
   A(z) = 0.300, z = −0.52
   $-0.52 = \frac{x - 15.3}{2.6}$
   −1.352 = x − 15.3
   x = 13.9 minutes

Air Canada claims that using the Check-In Kiosk takes less than 30 seconds.

Extra Problems
Eg) An important measure of the performance of a locomotive is its "adhesion", which is the locomotive's pulling force as a multiple of its weight. The adhesion of one 4400 horsepower diesel locomotive model varies in actual use according to a normal distribution with μ = 0.37 and standard deviation σ = 0.04.
A) What proportion of adhesions measured in use are higher than 0.40?
   $z = \frac{x - \mu}{\sigma} = \frac{0.4 - 0.37}{0.04} = 0.75$
   A(0.75) = 0.7734
   A = 1 − 0.7734 = 0.2266
   22.7% of adhesions measured in use are greater than 0.40.
B) What proportion of adhesions are between 0.40 and 0.50?
   $z_1 = \frac{0.4 - 0.37}{0.04} = 0.75$, A(0.75) = 0.7734
   $z_2 = \frac{0.5 - 0.37}{0.04} = 3.25$, A(3.25) = 0.9994
   A = 0.9994 − 0.7734 = 0.226
   22.6% of the adhesions fall between 0.4 and 0.5.
C) Improvements in the locomotive's computer controls change the distribution of adhesion to a normal distribution with μ = 0.41 and σ = 0.02. Find the proportions in (A) and (B) after this improvement.
   $z = \frac{x - \mu}{\sigma} = \frac{0.4 - 0.41}{0.02} = -0.5$
   A(−0.5) = 0.3085
   A = 1 − 0.3085 = 0.6915
   69.2% of adhesions measured in use are greater than 0.40.
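The upper-tail proportions for the locomotive before and after the improvement can be cross-checked without a printed table. The following is only a sketch: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for the table, takes the improved mean as 0.41 (the value used in the calculation), and the helper name `upper_tail` is made up for illustration.

```python
from statistics import NormalDist


def upper_tail(mean, std_dev, threshold):
    """Proportion of a Normal(mean, std_dev) population above `threshold`."""
    # cdf(threshold) plays the role of the table value A(z)
    return 1.0 - NormalDist(mean, std_dev).cdf(threshold)


# Before the improvement: mean 0.37, std. dev. 0.04, so z = 0.75
before = upper_tail(0.37, 0.04, 0.40)

# After the improvement: mean 0.41 (assumed), std. dev. 0.02, so z = -0.5
after = upper_tail(0.41, 0.02, 0.40)

print(f"before: {before:.4f}  after: {after:.4f}")
```

The two results should agree with the table answers 0.2266 and 0.6915 to about three decimals, since both z values here are exact to two decimal places.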
[Figure: normal curve; x axis: 0.4 and 0.41; z axis: −0.5 and 0]

Between 0.40 and 0.50 after the improvement:
   $z = \frac{x - \mu}{\sigma}$
   $z_1 = \frac{0.4 - 0.41}{0.02} = -0.5$, A(−0.5) = 0.3085
   $z_2 = \frac{0.5 - 0.41}{0.02} = 4.5$
- note that z = 4.5 is beyond the table, which stops at z = 3.09
- it is possible to have z values beyond the table; since the accumulated area at the end of the table is very close to 1, the accumulated area beyond the table can be taken as 1, or at least 0.9998

[Figure: normal curve; x axis: 0.4, 0.41 and 0.5; z axis: −0.5, 0 and 4.5]
   A(4.5) = 1
   A = 1 − 0.3085 = 0.6915
   69.2% of adhesions measured in use are between 0.4 and 0.5.

Eg) The annual rate of return on stock indexes (which combine many individual stocks) is approximately normal. Since 1945, the Standard & Poor's 500 index has had a mean yearly return of 12%, with a standard deviation of 16.5%. Take this normal distribution to be the distribution of yearly returns over a long period.
A) In what range do the middle 95% of all yearly returns lie?
   [Figure: normal curve with the middle 95% marked between mean − 2 std. dev. and mean + 2 std. dev., mean = 12]
   the 68-95-99.7 Rule specifies that the interval which will capture the middle 95% is found at the mean ± 2(std. dev.) = 12% ± 2(16.5%), that is, −21% to 45%
B) The market is down for the year if the return on the index is less than zero. In what proportion of years is the market down?
   $z = \frac{0 - 12}{16.5} = -0.727$
   A(−0.73) = 0.2327
   Since 1945 the market has been down for 23.3% of the years.
C) In what proportion of years does the index gain 25% or more?
   $z = \frac{25 - 12}{16.5} = 0.788$
   A(0.79) = 0.7852
   A = 1 − 0.7852 = 0.2148
   Since 1945 the index has gained more than 25% for 21.5% of the years.

Eg) The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days.
A) What percent of pregnancies last less than 240 days (that's about 8 months)?
   $z = \frac{x - \mu}{\sigma} = \frac{240 - 266}{16} = -1.625$
   A(−1.63) = 0.0516
   5.2% of the pregnancies last less than 240 days.
B) What percent of the pregnancies last between 240 and 270 days (roughly between 8 and 9 months)?
   $z_1 = \frac{240 - 266}{16} = -1.63$, A(−1.63) = 0.0516
   $z_2 = \frac{270 - 266}{16} = 0.25$, A(0.25) = 0.5987
   A = 0.5987 − 0.0516 = 0.5471
   54.7% of the pregnancies last between 240 and 270 days.
C) How long do the longest 20% of the pregnancies last?
   A(z) = 1 − 0.2 = 0.8, z = 0.84
   $0.84 = \frac{x - 266}{16}$
   13.44 = x − 266
   x = 279.44
   The longest 20% of the pregnancies would require more than 279 days.
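
The three kinds of table lookup used in the pregnancy example (area below a value, area between two values, and the value that cuts off a given area) can be reproduced in code. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the textbook table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Model taken from the example: mean 266 days, std. dev. 16 days
pregnancy = NormalDist(mu=266, sigma=16)

# A) area below 240 days, i.e. A(z) for z = (240 - 266) / 16
p_under_240 = pregnancy.cdf(240)

# B) area between 240 and 270 days: difference of two cumulative areas
p_240_to_270 = pregnancy.cdf(270) - pregnancy.cdf(240)

# C) the length with 80% of pregnancies below it (longest 20% above it)
cutoff_longest_20 = pregnancy.inv_cdf(0.80)

print(f"under 240 days:     {p_under_240:.3f}")
print(f"240 to 270 days:    {p_240_to_270:.3f}")
print(f"longest 20% exceed: {cutoff_longest_20:.1f} days")
```

Small differences from the hand-worked answers are expected, because the table method rounds z to two decimals (for example −1.625 becomes −1.63) while the code uses the unrounded value.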