Normal Distributions

What makes a normal distribution normal? This is just another way of saying that we should see something that looks like a bell-shaped continuous curve. Each of the curves in the accompanying figure has that characteristic. If we are talking about heights of men in a certain group, say the Bambutu pygmy tribe in Africa, we are not surprised that the average height of an adult male is about 51 inches. Most adults have a height near that value, while a few are somewhat taller or shorter. By contrast, the Tutsi tribe, also called the Watusi, is famous for its tall men; an example later in this lesson supposes an average height of 70 inches. Neither of these populations is "normal" in height to our way of thinking, but each tribe's distribution of heights is normal.

• Informally, a curve is normal when it is symmetric about its center, with scores more concentrated in the middle than in the tails.
• Normal curves are defined by two parameters: the mean (µ) and the standard deviation (σ).
• The more normal a curve is, the closer the median[1] is to the mean. Both also lie very close to the point where the curve reaches its absolute maximum.
• Many kinds of behavioral data are approximated well by a normal distribution. Many statistical tests assume a normal distribution. Most of those tests work well even if the distribution is only approximately normal, as long as it does not deviate greatly from normality.

We should be able to find a mean (µ) and a standard deviation (σ) so that a normally distributed population can be modeled by, or fitted to,

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

That function has all the characteristics mentioned before and has become the standard equation for a normal curve. After that, everything we have ever said about a continuous probability distribution holds true.

• The area under a normal curve is one. Hence,
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1.$$
• No probability of an event can exceed one, and no probability can be negative.
  Hence,
$$0 \le P(a \le x \le b) = \int_a^b f(x)\,dx = \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \le 1,$$
  where $a < b$ are both reasonable outcomes in the experiment.

Since $f(x)$ is continuous, the probability of any specific result, say a height of exactly 6 feet, doesn't really come from the function directly. The value $x = 6$ describes one of infinitely many points. So $P(x = 6) = 0$ technically, and the chance is vanishingly small realistically. We should more correctly ask for the probability of a height between, say, 5.999 feet and 6.001 feet. Then we can use integration to find that probability.

There's the problem. Notice that the function above is a shifted and rescaled version of $y = e^{-x^2}$. You should recall from your previous calculus experience that $y = e^{-x^2}$ has no neat antiderivative. So the integration must be done "numerically" or with some kind of table of values. For the latter, numerous tables have been created. However, most of them assume that the data has been "standardized" in some way.

[1] The median is the value that splits the data into two halves, with 50% above and 50% below.

Copyright 2010 - ASU School of Mathematical and Statistical Sciences (Terry Turner)

Standardizing Data (z-scores)

Standardization takes raw data with any mean (µ) and standard deviation (σ) and fits it to a curve with a mean of 0 and a standard deviation of 1. Then those neat tables work well. This is really a simple process. We just calculate the z-score,

$$z = \frac{x - \mu}{\sigma}.$$

Suppose the Bambutu tribe does have an average adult male height of 51 inches and the standard deviation for the male population is 4 inches.

1. Someone with a height of 51 inches (an average joe) would have a z-score of zero.
2. Someone with a height of 55 inches is one standard deviation above average ($z = 1$).
3. Someone with a height of 47 inches is one standard deviation below average ($z = -1$).
4. Note that $z = \frac{55 - 51}{4} = 1$ and $z = \frac{47 - 51}{4} = -1$.

Then if we wanted to know what portion of the population is within one standard deviation of the mean, we should do the chain of calculations below. Using the given mean and standard deviation,

$$P(|x - \mu| \le \sigma) = P\left(\frac{|x - \mu|}{\sigma} \le 1\right) = P\left(\left|\frac{x - \mu}{\sigma}\right| \le 1\right) = P(|z| \le 1).\ [2]$$

Notice that it doesn't matter what the mean and standard deviation are. All we need to know is the z-score for the range of values of interest to us. Then we can apply the table below[3] and what we know about the symmetry of the normal curve to extract the probability we need.

Normal Distribution z-Table (excerpt)

   z     0.00     0.01
  0.0   0.5000   0.5040
  0.1   0.5398   0.5438
  0.2   0.5793   0.5832
  0.3   0.6179   0.6217
  0.4   0.6554   0.6591
  0.5   0.6915   0.6950
  0.6   0.7257   0.7291
  0.7   0.7580   0.7611
  0.8   0.7881   0.7910
  0.9   0.8159   0.8186
  1.0   0.8413   0.8438
  1.1   0.8643   0.8665
  1.2   0.8849   0.8869
  1.3   0.9032   0.9049
  1.4   0.9192   0.9207
  1.5   0.9332   0.9345
  1.6   0.9452   0.9463
  1.7   0.9554   0.9564
  1.8   0.9641   0.9649
  1.9   0.9713   0.9719
  2.0   0.9772   0.9778

[2] Since σ > 0, we can do the division and slip it into the absolute value without a problem.
[3] A more complete table is provided at the end of this lesson. You can find tables tailored to many needs just by googling for normal distribution tables.

Each row gives z to one decimal place and the column heading supplies the hundredths digit, so the entry for $z = 0.00$ is 0.5000 and the entry for $z = 1.00$ is 0.8413. Each entry is the area under the curve to the left of that z. Since $z = 0$ is the line of symmetry for our bell-shaped curve, half of the area under the curve is already accounted for. To get the area for $0 \le z \le 1$ we subtract those two entries: $0.8413 - 0.5000 = 0.3413$. Since we need the range $-1 \le z \le 1$, symmetry allows us to double the result to get $0.6826$, or about 68%.[4]

So what is $P(|z| \le 2)$?

1. This is the probability that a data value is within 2 standard deviations of the mean.
2. To get this we need the table entry for $z = 2.00$, which is 0.9772.
3. Then we calculate the area under the curve for $0 \le z \le 2$ as $0.9772 - 0.5000 = 0.4772$.
4. Once again we double the result to get another rule of thumb: $P(|z| \le 2) \approx 95.44\%$.

You should use the complete table to estimate the area within three standard deviations of the mean yourself.

Now what if we wanted $P(-0.5 \le z \le 1.5)$?

1. The table handles it with the entries for $z = 0.5$ (0.6915) and $z = 1.5$ (0.9332).
2. We need the area reading for $z = 0.5$ and for $z = 1.5$.
3. Hence,
   $P(-0.5 \le z \le 0) = 0.6915 - 0.5000 = 0.1915$ (by symmetry),
   $P(0 \le z \le 1.5) = 0.9332 - 0.5000 = 0.4332$,
   $P(-0.5 \le z \le 1.5) = 0.1915 + 0.4332 = 0.6247$.

[4] This is a rule of thumb: for a normal curve, about 68% of all data points fall into the band $-1 \le z \le 1$.

Interpretations

Now let's see what all that means. The problem began with a statement that "the Bambutu tribe does have an average adult male height of 51 inches and the standard deviation for the male population is 4 inches." So what is the height range for a male who falls into the $-0.5 \le z \le 1.5$ band? It is just a matter of unwrapping the z-score:

$$-0.5 \le z \le 1.5$$
$$-0.5 \le \frac{x - \mu}{\sigma} \le 1.5$$
$$-0.5 \le \frac{x - 51}{4} \le 1.5$$
$$-0.5(4) \le x - 51 \le 1.5(4)$$
$$-0.5(4) + 51 \le x \le 1.5(4) + 51$$
$$49 \le x \le 57$$

Anyone falling into that range of z-scores is in the height range from 49 inches to 57 inches. There is about a 62.5% chance that a male from this normal population falls into that height range.

Let's get back to the Tutsi tribe. Let's suppose the average height is 70 inches and the standard deviation is 3 inches. What is the z-score for someone with a height of 57 inches?

$$z = \frac{57 - 70}{3} \approx -4.3\,!$$

He is more than 4 standard deviations below the average height.
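The z-score and table-lookup steps above can also be sketched in a few lines of Python. This is only an illustration, assuming Python 3.8 or later, where the standard library's `statistics.NormalDist` supplies the cumulative area that the printed table approximates; the helper name `z_score` and the variable names are made up for this sketch and are not part of the lesson.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize a raw value: z = (x - mu) / sigma."""
    return (x - mu) / sigma

std = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Bambutu example from the text: mu = 51 inches, sigma = 4 inches.
z_low = z_score(49, 51, 4)    # -0.5
z_high = z_score(57, 51, 4)   # 1.5

# std.cdf(z) plays the role of a table entry: the area to the left of z.
within_one = std.cdf(1) - std.cdf(-1)      # P(|z| <= 1)
within_two = std.cdf(2) - std.cdf(-2)      # P(|z| <= 2)
band = std.cdf(z_high) - std.cdf(z_low)    # P(-0.5 <= z <= 1.5)

# Tutsi example from the text: mu = 70 inches, sigma = 3 inches.
z_tutsi = z_score(57, 70, 3)

print(round(within_one, 4), round(within_two, 4), round(band, 4), round(z_tutsi, 2))
# → 0.6827 0.9545 0.6247 -4.33
```

The first two results differ from the table-based 0.6826 and 0.9544 only in the last digit, because the table entries are themselves rounded to four places before they are subtracted and doubled.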
A man 57 inches tall would be unusually short in his Tutsi hometown yet well above average among the Bambutu! What is the probability that someone from the Tutsi would be 57 inches or shorter? In my table, the value at $z = -4.3$ is not given. We need a better table or a calculator or computer. Your TI fits the bill nicely. Look for the "VARS" key and then look at the calculator body just above it. You should see "DISTR." Press 2ND, then VARS, to get to it. In there you'll find a number of distributions. Find option 2, "normalcdf."[5]

• The format for the option is normalcdf(lower, upper, μ, σ).
• It is the calculation
$$P(\text{lower} \le x \le \text{upper}) = \int_{\text{lower}}^{\text{upper}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx.$$

Select, then complete, the command to see normalcdf(-E99,57,70,3).[6] Try this command: normalcdf(0,57,70,3). Notice that since no one in this population can have a negative height, we really didn't need to go out along the infinitely long tail to the left.

Go back now and calculate the probability that a Bambutu tribesman is in the height range 49 to 57 inches with a mean of 51 and a standard deviation of 4 inches. The command normalcdf(49,57,51,4) produces a probability of 0.6246552391... While you are at it, try this command: normalcdf(-.5,1.5). You should get 0.6246552391... again. Without the mean and standard deviation, normalcdf assumes that they are 0 and 1, respectively.

If you need a number of values, try entering the command normalcdf(0,x,51,4) as a function in the "Y=" area. Then you can use TRACE or TABLE to get your values. Notice that the curve is asymptotic to y = 1. Don't forget that the normal distribution is continuous with a domain of all real numbers, even though its application may have a truncated domain.

A Few More Examples

Example: Find the probability that a person is within 5 standard deviations of the average height for the entire world. Note that neither μ nor σ is given. However, we don't need them because we have been given a z-score problem. We use normalcdf(-5,5) to get 0.9999994258...

[5] Note: The only good use for normalpdf is to draw the normal curve.
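For readers without a TI at hand, the same calculation can be sketched in Python. This is a hedged stand-in, not the calculator's actual algorithm: the function `normalcdf` below only mimics the argument order of the TI command by differencing `statistics.NormalDist.cdf`, and `midpoint_area` is a hypothetical helper added to show that summing thin rectangles under the curve (the "numerical" integration mentioned earlier) lands on the same value.

```python
import math
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """P(lower <= x <= upper) for a normal curve, in the TI argument order."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def midpoint_area(lower, upper, mu, sigma, steps=10_000):
    """Approximate the same area by summing thin rectangles under the curve."""
    dist = NormalDist(mu, sigma)
    width = (upper - lower) / steps
    return sum(dist.pdf(lower + (i + 0.5) * width) for i in range(steps)) * width

p_bambutu = normalcdf(49, 57, 51, 4)        # heights from 49 to 57 inches
p_standard = normalcdf(-0.5, 1.5)           # same band, already in z-scores
p_numeric = midpoint_area(49, 57, 51, 4)    # brute-force integration
p_tutsi = normalcdf(-math.inf, 57, 70, 3)   # -math.inf replaces the TI's -E99
p_within_5 = normalcdf(-5, 5)

print(f"{p_bambutu:.7f} {p_standard:.7f} {p_numeric:.7f}")
print(f"{p_tutsi:.3e} {p_within_5:.10f}")
```

The first three printed values should agree with one another and with the 0.6246552... reported in the text; the last digits of the extreme-tail results may differ slightly from the calculator's display.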
Example: Find the probability that someone is more than 5 standard deviations above the average height for the entire world. Note that neither μ nor σ is given. We still don't need them because it is a z-score problem. We use normalcdf(5,E99) to get $2.871049995 \times 10^{-7}$. (Higher-precision software gives about $2.8665 \times 10^{-7}$, so treat the calculator's trailing digits in these extreme-tail results as approximate.)

[6] Here's the rub about the TI: it doesn't have a symbolic infinity. However, we can fool it by using -E99 or E99 (entered with the EE key), since those values are so far below (or above) zero that they do the job.

Example: Find the probability that someone is at most 5 standard deviations above the average height for the entire world. We use normalcdf(-E99,5) or 1 - normalcdf(5,E99) to get 0.9999997129...

Example: One hundred students took a standardized test. The average was 77.5 and the standard deviation was 6.5. How many students were within one standard deviation of the mean? Try all of these:

    normalcdf(77.5-6.5,77.5+6.5,77.5,6.5)
    normalcdf(-1,1)
    normalcdf(0,1)*2

Each returns about 0.6827. Then recall that this is a standardized test, intended to create a normal distribution, and that we did this calculation at the beginning of the lesson. Hence, we expect to see at least 68 students and might see a 69th. We don't expect to see 68.3 students, since that constitutes a gory mess in the testing center!

Normal versus Binomial

Suppose you wanted to find the probability that between 0 and 10 flights might arrive late out of a flight schedule of 5,000 flights. That takes us back to the binomial probability model. We need a probability for any particular flight to be late. Let's assume p = 0.01 (1%) for a flight to be late under usual conditions. Using my TI, I calculated binomcdf(5000,0.01,0,10) = $5.48478 \times 10^{-12}$. That's pretty small. Back in the old days calculating these probabilities was extremely tedious.
So for large numbers of trials we relied on the visible fact that the binomial probability model begins to look like a stair-stepped version of the normal model. Look at the graph (obtained through Excel) of this situation. If that doesn't look "normal," it is difficult to imagine what would be. The red curve is a normal distribution with the appropriate mean and standard deviation.

[Figure: Excel chart of the binomial probabilities for the number of late flights (horizontal axis 1 to 79, vertical axis 0 to 0.06), with the matching normal curve drawn in red.]

Approximate binomcdf(5000,0.01,0,10) using the facts that for the binomial probability model $\mu = np$ and $\sigma = \sqrt{npq} = \sqrt{np(1-p)}$, together with the normalcdf(lower, upper, μ, σ) feature of your TI.[7] For practice, convert the values to z-scores for your range (0 to 10):

$$z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{0 - 50}{\sqrt{49.5}} \approx -7.10669 \quad\text{and}\quad z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{10 - 50}{\sqrt{49.5}} \approx -5.68535.$$

Then the correct command is normalcdf(-7.10669,-5.68535). Using it, I got a result of $6.54518 \times 10^{-9}$. While both results are incredibly small, the discrepancy is incredibly large (in relative terms one is about 1,000 times greater than the other). So the question is: when do these seemingly similar (in shape) distributions begin to approximate each other?

Our rules of thumb for knowing when the normal approximation to the binomial is valid are as follows:

1. $n$ must be at least 30, AND
2. $np$ must be at least 10, AND
3. $nq = n(1-p)$ must be at least 10.

We met these criteria (here $n = 5000$, $np = 50$ and $nq = 4950$), but we did choose a bad place for approximating. The results would be more valid for a range further away from the tails of the graph, because the values we chose to include in the binomial distribution have a minuscule impact on the total. Let's try a more middle-of-the-road calculation.

Example: Suppose you wanted to approximate the probability that between 50 and 150 flights might arrive late out of a flight schedule of 5,000 flights.
Let's assume p = 0.01 (1%) for a flight to be late under usual conditions. Try these commands:

    normalcdf(50,150,50,√(49.5)) = 0.49999999995
    binomcdf(5000,0.01,0,150) - binomcdf(5000,0.01,0,49) = 0.519092 [8]

The two results are not far apart. So the normal curve does approximate the binomial curve.

There is one other feature we can use. We can apply a "continuity correction," where we adjust the range interval to account for how the binomial histogram works: each whole-number outcome is a bar that extends one-half unit on either side of its value. Generally, we would adjust one-half unit at each end of the range. There is a little "Kentucky windage" here. Try this calculation, which is stretched out at both ends: normalcdf(49.5,150.5,50,√(49.5)) = 0.528328. Looking at the graph, notice that the histogram is inside the normal curve to the right. So we try this as a "better" approximation: normalcdf(49.5,149.5,50,√(49.5)), which gives 0.528328 as well, because the upper limit lies so far out in the tail that moving it changes nothing visible. Either corrected value is a little closer to the binomial result of 0.519092 than the uncorrected 0.5. Since we were approximating in any case, we should be happy enough with either of them.

[7] These facts were developed in your textbook.
[8] Notice that since this is discrete, we discarded the {0,1,2,...,49} set of probabilities.

These days we aren't likely to do the approximation anyway. With calculators and computers we can usually get the correct result directly.

In this course you have studied only two regularly used distributions, the normal and the binomial. Almost every other one was some arbitrarily created example just to get you to work with the probability modeling process. However, the process of working with any distribution is the same. Being solidly founded in these two will take you a long way in a statistics course. If you have access to a spreadsheet program, you should find that every useful command the TI offers is available there, and you can also get a neat printout to work with.
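The flight-delay comparisons above can also be reproduced directly on a computer. The sketch below is one possible way to do it in Python, not the method used in the lesson: `binom_pmf` and `binom_range` are hypothetical helpers that sum exact binomial probabilities in log space (an implementation choice made here to avoid overflow with n = 5000), and `statistics.NormalDist` supplies the normal approximation with μ = np and σ = √(np(1−p)).

```python
import math
from statistics import NormalDist

def binom_pmf(n, k, p):
    """Exact P(X = k) for X ~ Binomial(n, p), computed via logarithms."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_range(n, p, lo, hi):
    """Exact P(lo <= X <= hi), summing the individual probabilities."""
    return sum(binom_pmf(n, k, p) for k in range(lo, hi + 1))

n, p = 5000, 0.01
mu = n * p                           # 50 late flights expected
sigma = math.sqrt(n * p * (1 - p))   # square root of 49.5
normal = NormalDist(mu, sigma)

# Far out in the left tail (0 to 10 late flights): a poor place to approximate.
exact_tail = binom_range(n, p, 0, 10)
normal_tail = normal.cdf(10) - normal.cdf(0)

# Middle of the road (50 to 150 late flights), without and with the
# half-unit continuity correction at each end.
exact_mid = binom_range(n, p, 50, 150)
normal_mid = normal.cdf(150) - normal.cdf(50)
corrected_mid = normal.cdf(150.5) - normal.cdf(49.5)

print(f"tail:   exact {exact_tail:.3e}   normal {normal_tail:.3e}")
print(f"middle: exact {exact_mid:.6f}   normal {normal_mid:.6f}   corrected {corrected_mid:.6f}")
```

Running it should show the same pattern as the calculator work: the two tail values differ by roughly three orders of magnitude, while the middle-of-the-road values are within a few hundredths of each other.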
You should learn to use such a program if you plan to make use of statistical or probabilistic processes in your business life.

Normal Distribution Table ($0 \le z \le 3.09$)

Each entry is the area under the standard normal curve to the left of z. The row gives z to one decimal place and the column gives the hundredths digit.

   z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
  0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
  0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
  0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
  0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
  0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
  0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
  0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
  0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
  0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
  0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
  1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
  1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
  1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
  1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
  1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
  1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
  1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
  1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
  1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
  1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
  2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
  2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
  2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
  2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
  2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
  2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
  2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
  2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
  2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
  2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
  3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
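A table like the one above does not have to be looked up; it can be regenerated. The following is a small sketch, assuming Python 3.8 or later with `statistics.NormalDist`; the function name `z_table` and its `max_row` parameter are invented for this illustration. It builds one row per tenth of z and one column per hundredth, exactly the layout of the printed table.

```python
from statistics import NormalDist

def z_table(max_row=3.0):
    """Return (row label, ten cumulative areas) pairs for z = 0.00 up to max_row + 0.09."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    rows = []
    for tenths in range(int(round(max_row * 10)) + 1):
        # Column j holds the area to the left of z = tenths/10 + j/100.
        areas = [std.cdf(tenths / 10 + hundredths / 100) for hundredths in range(10)]
        rows.append((tenths / 10, areas))
    return rows

table = z_table()
print("  z   " + "  ".join(f"{h / 100:.2f}  " for h in range(10)))
for z, areas in table:
    print(f"{z:4.1f}  " + "  ".join(f"{a:.4f}" for a in areas))
```

Calling `z_table(3.9)` instead would extend the rows beyond those printed in this lesson.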