Download Math 104 Handout: January 28, 2010 - Alex Kasman

Math 104 Handout: January 28, 2010
Alex Kasman
College of Charleston

Key Ideas: Normal Distributions (Continuing from Where We Left Off)

• Last time we learned how to use our calculator and the Empirical Rule to compute probabilities associated to a normal distribution. Here is an example:

Question 1: The length (in inches) of adult western rattlesnakes is a normally distributed variable with mean 42 and standard deviation 2. What is the probability that a randomly selected adult western rattlesnake is between 38 and 46 inches in length? If an adult western rattlesnake is selected at random, what is the probability that it is between 40 and 44 inches in length? What is the probability that it is exactly 42 inches in length?

• Solution: Since 38 is 2 standard deviations below the mean, its z-score is −2. Similarly, 46 is the same as z = 2. The Empirical Rule then tells us that the probability that a rattlesnake is between these lengths is .95. (Alternatively, we could compute this with our calculator by typing normalcdf(38,46,42,2) and we get 0.954499876. This is actually more accurate than what the Empirical Rule gave, but once you’ve memorized it the Empirical Rule is easier!)

Now, since 40 and 44 are only one standard deviation below and above the mean, the probability of finding a snake between these lengths is only .68. (The calculator agrees: normalcdf(40,44,42,2)=0.6826894809.)

The probability of finding a snake whose length is exactly 42 inches is ZERO because there is no area between x = 42 and x = 42. This last answer is one of the quirks of continuous distributions. Only intervals have positive probability. This makes sense when you realize that a supposedly 42 inch snake is probably a tiny bit longer or shorter than exactly 42 inches. If you plan to round off the length to the nearest quarter inch anyway, then it would be more reasonable to look at the probability that 41.75 ≤ x ≤ 42.25!
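For anyone who wants to check the rattlesnake answers on a computer instead of the calculator, here is a minimal sketch in Python. It assumes the standard library class statistics.NormalDist is an acceptable stand-in for the calculator; the helper name normalcdf below is my own, chosen only to mirror the calculator command, and is not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu, sigma):
    """Hypothetical helper mimicking the calculator command:
    area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Question 1: rattlesnake lengths, mean 42 inches, standard deviation 2 inches
p_two_sd = normalcdf(38, 46, 42, 2)          # within 2 standard deviations
p_one_sd = normalcdf(40, 44, 42, 2)          # within 1 standard deviation
p_exact = normalcdf(42, 42, 42, 2)           # a single point has no area
p_quarter = normalcdf(41.75, 42.25, 42, 2)   # "42 inches" after rounding

print(round(p_two_sd, 4), round(p_one_sd, 4), p_exact, round(p_quarter, 4))
```

The first two values should agree with the Empirical Rule to two decimal places, and the third shows why a single exact length has probability zero.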
Note also that in the case of the snakes, the probability can also be interpreted as a percentage of the population! (That is more statistics than probability, but it is true.) That is, 68% of the snakes are between 40 and 44 inches, and that is why the probability of selecting such a snake is .68.

• The Area to the Left of a z-Score and Percentile Rank: If you know the area under the standard normal curve to the left of z = a, then you know the probability of winding up with a z-score less than a. If we think in terms of a random selection from a population, this tells us the percentile ranking of someone with a z-score of a. For instance, recall that the length of adult western rattlesnakes is a normally distributed variable with µ = 42 inches and σ = 2 inches. Then a rattler who is 40 inches long has a z-score of z = −1. The area to the left of z = −1 is 0.1587. This is the probability of picking a rattler of length less than or equal to 40 inches, and it is also the statement that a 40 inch rattler is bigger than 15.87% of the rest of its species.

• Finding Areas of the Standard Normal Distribution with Tables: You can use Table A at the end of your book to determine the area on the graph of the standard normal distribution to the left of z = a if a is a number between −3.49 and 3.49 having only two places after the decimal point. Suppose a = c.de where d and e are digits and c is a whole number between −3 and 3. You look in the row corresponding to c.d and the column corresponding to .0e to get the area. (For example, for a = −1.28 you look in the row for −1.2 and the column for .08.)

To get the area between z = a and z = b you look up the values for both a and b and subtract the smaller area from the larger area.

To get the area to the right of z = a you can either look up the area to the left of z = −a or find the area to the left of a and subtract it from one.

If a is directly between two values on the table, then you’ll want to use the average of the corresponding areas as the value.
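The three table recipes above (area to the left, area between, area to the right) can be sketched in a few lines of Python. This is only an illustration: it assumes Python's statistics.NormalDist plays the role of Table A, and the function names area_left, area_between and area_right are hypothetical names I made up for this sketch.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(a):
    """Area to the left of z = a (what the table lists)."""
    return Z.cdf(a)

def area_between(a, b):
    """Look up both values and subtract the smaller area from the larger."""
    return abs(area_left(b) - area_left(a))

def area_right(a):
    """Area to the left of a, subtracted from one."""
    return 1 - area_left(a)

# A 40 inch rattler has z = -1, so this is its percentile rank as a fraction.
rank_40_inch = area_left(-1)

# The two recipes for a right-hand area should agree (by symmetry).
right_direct = area_right(0.5)
right_by_symmetry = area_left(-0.5)

print(round(rank_40_inch, 4), round(right_direct, 4), round(right_by_symmetry, 4))
```

Unlike the printed table, this version does not round z to two decimal places, so small differences in the fourth decimal place are to be expected.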
What are the advantages/disadvantages of using the table? If you want the area to the left of some z-score, then this is exactly what it gives. Moreover, in the “old days”, it was really the only way. Perhaps this is “quaint”, but it is certainly less convenient than using a calculator. (Unless you just happen to have a table and no calculator.) Personally, I’d be happy to teach this class without the table at all. However, I know that other departments at the college (many in social sciences) expect students to be able to use the table, and so I will be teaching and testing its use as well.

Question 2: A bottling plant has a machine that fills 2 liter soda bottles. But, it is not possible for it to give exactly two liters each time. Instead, the amount that the bottles are actually filled to is a normally distributed random variable. The machine is set so that the mean is µ = 2 liters. However, there is a standard deviation of .03 liters. A filled bottle is rejected if it has less than 1.9 liters. What percentage of bottles filled by this machine are rejected?

• Solution: We need to work out the z-score for x = 1.9 and then find the area of the region under the bell curve to the left of it. (You should draw a picture to illustrate this.) The z-score is (1.9 − 2)/.03 = −3.3333. Now, I will either use my calculator or the table. This is easy to look up on the table: “The table says that the area to the left of z = −3.33 is .0004.” (Alternatively, we could get this same answer as normalcdf(1.7,1.9,2,.03) = 4.2911 × 10^−4. Note that here I used 1.7 = 2 − 10 × .03 as the left endpoint since it is far enough to the left that it can be treated like −∞.) So, about .04% of the bottles filled by this machine are rejected.

Since the table only gives the area to the left of a given z-score, you have to do some thinking and perhaps some arithmetic in order to find the probability that a random variable takes a value between two endpoints or to the right of an endpoint using the table.
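The arithmetic in Question 2 can be replayed as a short Python sketch. It assumes statistics.NormalDist from the standard library in place of the calculator, and it checks the claim that a left endpoint ten standard deviations below the mean behaves like −∞. The variable names are mine, chosen for illustration only.

```python
from statistics import NormalDist

mu, sigma = 2.0, 0.03   # fill amounts in liters
cutoff = 1.9            # bottles below this are rejected

# z-score of the cutoff: (x - mu) / sigma
z = (cutoff - mu) / sigma

fill = NormalDist(mu, sigma)

# Exact lower-tail area: everything to the left of the cutoff
p_reject = fill.cdf(cutoff)

# Calculator-style version with a finite left endpoint mu - 10*sigma
p_finite = fill.cdf(cutoff) - fill.cdf(mu - 10 * sigma)

print(round(z, 4), p_reject, p_finite, round(100 * p_reject, 4), "percent")
```

The two probabilities should be indistinguishable at any precision the calculator shows, which is why the finite left endpoint is a safe shortcut.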
Question 3: The length of human pregnancies from conception to birth varies according to a distribution that is approximately normal with mean 266 days and standard deviation 16 days. Sonya is an important corporate executive who knows exactly when she got pregnant (I’ll leave the explanation for that up to your imagination) but wants to plan her business travel for the next year. Her expected due date (266 days from conception) is August 14, 2006. If Sonya schedules a two-week vacation from August 6 (day 258) to August 22 (day 274), what is the probability that the birth will actually take place during that time?

• Solution: To do this with the table, we first need to find the appropriate z-scores. They are z = (258 − 266)/16 = −.5 and z = (274 − 266)/16 = .5. We need to find the area of the region between these values. (You should draw a simple picture to illustrate this.) According to the table, the area to the left of z = −.5 is .3085 and the area to the left of z = .5 is .6915, so the area between them is .6915 − .3085 = .3830. So, the probability is about .38.

With the calculator, of course, this is much easier. We simply say normalcdf(258,274,266,16) and find .3829249356...easier and more accurate. (But, you need to learn to use the table also, if only because your other courses may require it.)

Question 4: Continuing with Sonya’s pregnancy: What is the probability that the baby will be born after August 28 (day 280)?

• Solution: With a calculator, this is easy again. I just need a number big enough that it can be used as if it were +∞ for the right endpoint. I will use µ + 10σ = 266 + 160 = 426 and type normalcdf(280,426,266,16) which gives me .1907869099.

With a table, this one is really tricky. The z-score is (280 − 266)/16 = .875. This time, however, we want the area to the right of that. (Draw a figure to illustrate.) If we want to use the table, there are two alternatives. We can look up the area to the left of z = −.875 (which is the same).
Uh oh, we also have to worry about the fact that −.875 itself is not on the table...it is between two values that are! The area to the left of z = −.87 is .1922 and the area to the left of z = −.88 is .1894. The average of these is (.1894 + .1922)/2 = .1908, which is what we would get by rounding off the answer obtained from the calculator.

We could also do this by looking up z = .87 and .88 and averaging the two areas. But, since this would give us the area to the left of z = .875 and we want the area to the right, we would have to subtract the result from one.

• Reversing the Procedure: Sometimes, rather than needing to know the probability associated to a certain value of the random variable, you need to know the value associated to a certain probability. Perhaps this makes more sense when you think of it in terms of percentages of a population, which is equivalent. Then we refer to “percentile rankings” such as you may know from taking standardized tests. For example:

Question 5: The distribution of heights of American women aged 18 to 24 is approximately normal with mean 65.5 inches and standard deviation 2.5 inches. If a woman is in the 95th percentile, that means she is taller than 95% of the population. How tall would such a woman be?

• Answer from the Table: Note that this is the same as asking what x would satisfy P(X < x) = .95. We can look up the corresponding z-score on the table. Note that the probability for z = 1.64 is .9495 and the probability for z = 1.65 is .9505. So, the z-score corresponding to a probability of .9500 would be somewhere between them, or about z = (1.64 + 1.65)/2 = 1.645. We can then turn this into a height using the formula x = µ + zσ = 65.5 + 1.645 × 2.5 = 69.6125.

• Answer from the Calculator: As you might guess, there is an easier way to do this with the calculator.
We use the function invNorm(p,µ,σ) (selected also from the DISTR menu) to find the value of x which satisfies P(X < x) = p for a random variable X that is normally distributed with mean µ and standard deviation σ. (There is no need to deal with z-scores here...the calculator does that work for you.) Here we find invNorm(.95,65.5,2.5) = 69.61213406. Notice that this is not exactly the same as what we found above when rounded to four places...but in fact this answer is more accurate!

You have to do some work using either method if you want to find the range of x values corresponding to a certain probability that is centered at the mean!

Question 6: We know from the Empirical Rule that 68% of American women have heights between 65.5 − 2.5 = 63 inches and 65.5 + 2.5 = 68 inches. Between what two heights (equally spaced above and below the average height of 65.5 inches) does 80% of the population lie?

• Answer: Basically, what we want to do here is start with the normal distribution for heights of American women and chop off the tails in a symmetric way so that the area of what remains is exactly .8. Note that we have to cut off an area of exactly .2 to do this. Since each tail would have the same size, each of the two tails would have half of this area, or an area of .1. So, what I want to do is determine what x value has an area of .1 to its left and what value of x has an area of .1 to its right. The answer will be the interval in between these two values. We can do that either with the calculator or the table.

Using the calculator, we just compute invNorm(.1,65.5,2.5) and invNorm(.9,65.5,2.5) (because if .1 is to the right then .9 is to the left and invNorm always thinks about the area to the left). These are 62.29612108 and 68.7038782. So, 80% of American women have heights between 62.3 and 68.7 inches.

With the table, we look up the area .1000 (which would be in the body of the table, not the margin).
I see that z = −1.28 has area .1003 to the left...that’s pretty close so I think I will just use that z-score. Thus, the left endpoint will be x = µ + zσ = 65.5 − 1.28 × 2.5 = 62.3 (same answer!). I do not need to look anything up to get the right endpoint since (by symmetry) it will simply be the same number of standard deviations to the right rather than the left of the mean. In other words, it will be z = +1.28 and the right endpoint is x = µ + zσ = 65.5 + 1.28 × 2.5 = 68.7.

Homework

Read: Read Section 6.2 in the book.

Do: Answer the questions in the online homework assignment “Section 6.2 (Table)”. It is due on Tuesday at 5PM. So, try to do it right away and ask me about it in class next time if there are any problems.

Reminder: Our first test is coming up on February 4th (a week from today). The test will cover Chapter 5 and Sections 6.1-6.2.
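Supplement to Questions 5 and 6: if you want to double-check the “reversed” calculations on a computer, here is a minimal sketch in Python. It assumes the inv_cdf method of the standard library class statistics.NormalDist behaves like the calculator’s invNorm (area to the left in, x value out). The variable names are my own.

```python
from statistics import NormalDist

# Heights of American women aged 18 to 24 (inches)
heights = NormalDist(mu=65.5, sigma=2.5)

# Question 5: the height with 95% of the population below it
height_95 = heights.inv_cdf(0.95)

# Question 6: the middle 80% leaves an area of .1 in each tail,
# so the endpoints have areas .1 and .9 to their left.
middle = 0.80
tail = (1 - middle) / 2
low_80 = heights.inv_cdf(tail)
high_80 = heights.inv_cdf(1 - tail)

print(round(height_95, 4), round(low_80, 1), round(high_80, 1))
```

Changing the value of middle gives the symmetric interval for any other percentage, which is handy for checking table work.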
