Math 104 Handout: January 28, 2010
Alex Kasman
College of Charleston
Key Ideas: Normal Distributions (Continuing from Where We Left Off)
• Last time we learned how to use our calculator and the Empirical Rule to compute
probabilities associated to a normal distribution. Here is an example:
Question 1: The length (in inches) of adult western rattlesnakes is a normally distributed variable
with mean 42 and standard deviation 2. What is the probability that a randomly selected
adult western rattlesnake is between 38 and 46 inches in length? If an adult western
rattlesnake is selected at random, what is the probability that it is between 40 and 44 inches
in length? What is the probability that it is exactly 42 inches in length?
• Solution: Since 38 is 2 standard deviations below the mean, its z-score is −2. Similarly, 46
is the same as z = 2. The Empirical Rule then tells us that the probability that a rattlesnake
is between these lengths is .95. (Alternatively, we could compute this with our calculator
by typing normalcdf(38,46,42,2) and we get 0.954499876. This is actually more
accurate than what the Empirical Rule gave, but once you’ve memorized it the Empirical Rule
is easier!)
Now, since 40 and 44 are only one standard deviation below and above the mean,
the probability of finding a snake between these lengths is only .68. (The calculator agrees:
normalcdf(40,44,42,2)=0.6826894809.)
The probability of finding a snake whose length is exactly 42 inches is ZERO because there
is no area between x = 42 and x = 42.
This last answer is one of the quirks of continuous distributions. Only intervals have positive
probability. This makes sense when you realize that a supposedly 42 inch snake is probably
a tiny bit longer or shorter than exactly 42 inches. If you plan to round off the length to the
nearest quarter inch anyway, then it would be more reasonable to look at the probability that
41.75 ≤ x ≤ 42.25!
Note also that in the case of the snakes, the probability can also be interpreted as a
percentage of the population! (That is more statistics than probability, but it is true.) That is,
68% of the snakes are between 40 and 44 inches, and that is why the probability of selecting
such a snake is .68.
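If you do not have the calculator handy, the three probabilities from Question 1 can be checked with a short Python sketch. This is only an illustration, assuming Python 3.8 or later and its standard library class statistics.NormalDist; the helper name normalcdf is my own stand-in for the calculator function, not something the handout or the calculator defines in Python.

```python
from statistics import NormalDist

# Rattlesnake lengths from Question 1: mean 42 inches, standard deviation 2 inches.
snake = NormalDist(mu=42, sigma=2)

def normalcdf(lower, upper, dist):
    """Hypothetical stand-in for the calculator's normalcdf:
    area under the normal curve between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)

p_38_46 = normalcdf(38, 46, snake)          # within two standard deviations
p_40_44 = normalcdf(40, 44, snake)          # within one standard deviation
p_exact = normalcdf(42, 42, snake)          # a single point has no area
p_rounded = normalcdf(41.75, 42.25, snake)  # "42 inches" to the nearest quarter inch

print(round(p_38_46, 4), round(p_40_44, 4), p_exact, round(p_rounded, 4))
```

The first two values should agree with the Empirical Rule to two decimal places, and the third is exactly zero because the two endpoints coincide.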
• The Area to the Left of a z-Score and Percentile Rank: If you know the area under the standard
normal curve to the left of z = a, then you know the probability of winding up with a z-score
less than a. If we think in terms of a random selection from a population, this tells us the
percentile ranking of someone with a z-score of a. For instance, recall that the length of
adult western rattlesnakes is a normally distributed variable with µ = 42 inches and σ = 2
inches. Then a rattler who is 40 inches long has a z-score of z = −1. The area to the left
of z = −1 is 0.1587. This is the probability of picking a rattler of length less than or equal
to 40 inches, and it is also the statement that a 40 inch rattler is longer than 15.87% of
its species.
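The link between a z-score and a percentile rank can be sketched in Python as follows. This assumes the standard library class statistics.NormalDist (Python 3.8+); the variable names are mine and purely illustrative.

```python
from statistics import NormalDist

# Rattlesnake lengths: mean 42 inches, standard deviation 2 inches.
snake = NormalDist(mu=42, sigma=2)

# z-score of a 40 inch rattler: how many standard deviations from the mean.
z = (40 - snake.mean) / snake.stdev

# Area to the left of z under the standard normal curve (mean 0, sd 1).
area_left = NormalDist().cdf(z)

# The same area computed directly from the length distribution.
same_area = snake.cdf(40)

print(z, round(area_left, 4), round(100 * area_left, 2))
```

Both routes give the same area, which is the point: converting to a z-score and using the standard normal curve loses nothing.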
• Finding Areas of the Standard Normal Distribution with Tables: You can use
Table A at the end of your book to determine the area on the graph of the standard normal
distribution to the left of z = a if a is a number between −3.49 and 3.49 having only two
places after the decimal point. Suppose a = c.de where d and e are digits and c is a
whole number between −3 and 3. You look in the row corresponding to c.d and the column
corresponding to .0e to get the area. To get the area between z = a and z = b you look
up the values for both a and b and subtract the smaller area from the larger area. To get the
area to the right of z = a you can either look up the area to the left of z = −a or find the
area to the left of a and subtract it from one. If a is directly between two values on the table,
then you’ll want to use the average of the corresponding areas as the value.
What are the advantages/disadvantages of using the table? If you want the area to the left of
some z-score, then this is exactly what it gives. Moreover, in the “old days”, it was really the
only way. Perhaps this is “quaint”, but it is certainly less convenient than using a calculator.
(Unless you just happen to have a table and no calculator.) Personally, I’d be happy to teach
this class without the table at all. However, I know that other departments at the college
(many in social sciences) expect students to be able to use the table, and so I will be teaching
and testing its use as well.
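To see where the table entries come from, here is a rough Python imitation of the table rules described above. It assumes the standard library class statistics.NormalDist (Python 3.8+), and the function names table_area, area_between and area_right are hypothetical helpers of my own. A printed table may occasionally differ from this sketch in the last digit because of how it was rounded.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Area to the left of z, with z rounded to two decimal places and the
    area rounded to four places, imitating one entry of a printed table."""
    return round(STANDARD.cdf(round(z, 2)), 4)

def area_between(a, b):
    """Look up both values and subtract the smaller area from the larger."""
    low, high = sorted((a, b))
    return round(table_area(high) - table_area(low), 4)

def area_right(a):
    """Area to the right of z = a: one minus the area to the left."""
    return round(1 - table_area(a), 4)

print(table_area(-1.0), area_between(-0.5, 0.5), area_right(-1.0))
```

By symmetry, area_right(a) should match table_area(-a), which is the other way of getting a right-hand area mentioned above.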
Question 2: A bottling plant has a machine that fills 2 liter soda bottles. But, it is not possible for
it to give exactly two liters each time. Instead, the amount that the bottles are actually filled
to is a normally distributed random variable. The machine is set so that the mean is µ = 2
liters. However, there is a standard deviation of .03 liters. A filled bottle is rejected if it has
less than 1.9 liters. What percentage of bottles filled by this machine are rejected?
• Solution: We need to work out the z-score for x = 1.9 and then find the area of the region
under the bell curve to the left of it. (You should draw a picture to illustrate this.) The z-score
is (1.9−2)/.03 = −3.3333. Now, I will either use my calculator or the table. This is easy
to look up on the table: “The table says that the area to the left of z = −3.33 is .0004.”
(Alternatively, we could get essentially the same answer as normalcdf(1.7,1.9,2,.03)=
4.2911 × 10^(−4). Note that here I used 1.7 = 2 − 10 × .03 as the left endpoint
since it is far enough to the left that it can be treated like −∞.) So about .04% of the
bottles are rejected.
Since the table only gives the area to the left of a given z-score, you have to do some thinking
and perhaps some arithmetic in order to find the probability that a random variable takes a
value between two endpoints or to the right of an endpoint using the table.
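The bottling calculation from Question 2 can be sketched in Python like this, assuming the standard library class statistics.NormalDist (Python 3.8+). The variable names are illustrative only. The sketch also checks that using 1.7 liters as a stand-in for −∞ makes no practical difference.

```python
from statistics import NormalDist

# Fill amounts: mean 2 liters, standard deviation 0.03 liters.
fill = NormalDist(mu=2.0, sigma=0.03)

# z-score of the rejection cutoff, 1.9 liters.
z = (1.9 - fill.mean) / fill.stdev

# Probability that a bottle holds less than 1.9 liters (true left tail).
p_reject = fill.cdf(1.9)

# Calculator-style version: use 1.7 = 2 - 10 * 0.03 in place of minus infinity.
p_calc_style = fill.cdf(1.9) - fill.cdf(1.7)

print(round(z, 4), p_reject, f"{100 * p_reject:.2f}% of bottles rejected")
```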
Question 3: The length of human pregnancies from conception to birth varies according to a
distribution that is approximately normal with mean 266 days and standard deviation 16 days.
Sonya is an important corporate executive who knows exactly when she got pregnant (I’ll
leave the explanation for that up to your imagination) but wants to plan her business travel
for the next year. Her expected due date (266 days from conception) is August 14, 2006. If
Sonya schedules a two-week vacation from August 6 (day 258) to August 22 (day 274), what
is the probability that the birth will actually take place during that time?
• Solution: To do this with the table, we first need to find the appropriate z-scores. They are
z = (258 − 266)/16 = −.5 and z = (274 − 266)/16 = .5. We need to find
the area of the region between these values. (You should draw a simple picture to illustrate
this.) According to the table, the area to the left of z = −.5 is .3085 and the area to the
left of z = .5 is .6915, so the area between them is .6915 − .3085 = .3830. So, the
probability is about .38. With the calculator, of course, this is much easier. We simply type
normalcdf(258,274,266,16) and find .3829249356...easier and more accurate.
(But, you need to learn to use the table also, if only because your other courses may require
it.)
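Here is a small Python sketch comparing the table route and the calculator route for Question 3. It assumes the standard library class statistics.NormalDist (Python 3.8+); rounding the areas to four places is my way of imitating the table, and the variable names are illustrative.

```python
from statistics import NormalDist

# Pregnancy lengths: mean 266 days, standard deviation 16 days.
pregnancy = NormalDist(mu=266, sigma=16)
standard = NormalDist()  # mean 0, standard deviation 1

# z-scores for the two ends of the vacation.
z_low = (258 - pregnancy.mean) / pregnancy.stdev
z_high = (274 - pregnancy.mean) / pregnancy.stdev

# Table-style answer: round each left-hand area to four places, then subtract.
table_answer = round(standard.cdf(z_high), 4) - round(standard.cdf(z_low), 4)

# Calculator-style answer: subtract the unrounded areas directly.
calculator_answer = pregnancy.cdf(274) - pregnancy.cdf(258)

print(z_low, z_high, round(table_answer, 4), calculator_answer)
```

The two answers differ only in the fourth decimal place, which is the rounding built into the table.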
Question 4: Continuing with Sonya’s pregnancy: What is the probability that the baby will be
born after August 28 (day 280)?
• Solution: With a calculator, this is easy again. I just need a number big enough that it can be
used as if it were +∞ for the right endpoint. I will use µ + 10σ = 266 + 160 = 426 and
type normalcdf(280,426,266,16) which gives me .1907869099. With a table,
this one is really tricky. The z-score is (280 − 266)/16 = .875. This time, however, we
want the area to the right of that. (Draw a figure to illustrate.) If we want to use the table,
there are two alternatives. We can look up the area to the left of z = −.875 (which is the
same). Uh oh, we also have to worry about the fact that −.875 itself is not on the table...it is
between two values that are! The area to the left of z = −.87 is .1922 and the area to the
left of z = −.88 is .1894. The average of these is (.1894 + .1922)/2 = .1908, which
is what we would get by rounding off the answer obtained from the calculator. We could also
do this by looking up z = .87 and .88 and averaging the two areas. But, since this would
give us the area to the left of z = .875 and we want the area to the right, we would have to
subtract the result from one.
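Both table strategies for Question 4, along with the direct computation, can be sketched in Python as follows. This assumes the standard library class statistics.NormalDist (Python 3.8+). Rounding to four places imitates the table entries, and the averaging step is the same rough interpolation used above, not a general-purpose method.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)
standard = NormalDist()  # mean 0, standard deviation 1

z = (280 - pregnancy.mean) / pregnancy.stdev  # 0.875, halfway between .87 and .88

# Direct answer: area to the right of day 280.
p_right = 1 - pregnancy.cdf(280)

# Alternative 1: average the table areas to the left of z = -.87 and z = -.88.
left_87 = round(standard.cdf(-0.87), 4)
left_88 = round(standard.cdf(-0.88), 4)
table_estimate = (left_87 + left_88) / 2

# Alternative 2: average the areas to the left of z = .87 and z = .88,
# then subtract from one to get the area to the right.
avg_left = (round(standard.cdf(0.87), 4) + round(standard.cdf(0.88), 4)) / 2
complement_estimate = 1 - avg_left

print(z, round(p_right, 4), round(table_estimate, 4), round(complement_estimate, 4))
```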
• Reversing the Procedure: Sometimes, rather than needing to know the probability
associated to a certain value of the random variable, you need to know the value associated
to a certain probability. Perhaps this makes more sense when you think of it in terms of
percentages of a population, which is equivalent. Then we refer to “percentile rankings” such
as you may know from taking standardized tests. For example:
Question 5: The distribution of heights of American women aged 18 to 24 is approximately
normally distributed with mean 65.5 inches and standard deviation 2.5 inches. If a woman is
in the 95th percentile, that means she is taller than 95% of the population. How tall would
such a woman be?
• Answer from the Table: Note that this is the same as asking what x would satisfy
P(X < x) = .95. We can look up the corresponding z-score on the table. Note that
the probability for z = 1.64 is .9495 and the probability for z = 1.65 is .9505. So,
the z-score corresponding to a probability of .9500 would be somewhere between them or
about z = (1.64 + 1.65)/2 = 1.645. We can then turn this into a height using the
formula x = µ + zσ = 65.5 + 1.645 × 2.5 = 69.6125.
• Answer from the Calculator: As you might guess, there is an easier way to do this with
the calculator. We use the function invNorm(p,µ,σ) (selected also from the DISTR
menu) to find the value of x which satisfies P(X < x) = p for a random variable
X that is normally distributed with mean µ and standard deviation σ. (There is no
need to deal with z-scores here...the calculator does that work for you.) Here we find
invNorm(.95,65.5,2.5)= 69.61213406. Notice that this is not exactly the
same as what we found above when rounded to four places...but in fact this answer is more
accurate!
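Going from a probability back to a value can be sketched in Python with the inv_cdf method of the standard library class statistics.NormalDist (Python 3.8+), which plays the role of invNorm here. That correspondence is my own comparison, and the variable names are illustrative.

```python
from statistics import NormalDist

# Heights of American women aged 18 to 24: mean 65.5 inches, sd 2.5 inches.
heights = NormalDist(mu=65.5, sigma=2.5)

# Table route: take z halfway between 1.64 and 1.65, then convert to inches.
z_table = (1.64 + 1.65) / 2
x_table = heights.mean + z_table * heights.stdev

# Calculator route: invert the area to the left directly.
x_calc = heights.inv_cdf(0.95)

# Sanity check: the area to the left of x_calc should be .95 again.
check = heights.cdf(x_calc)

print(round(x_table, 4), round(x_calc, 4), round(check, 4))
```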
• You have to do some work using either method if you want to find the range of x values
corresponding to a certain probability that is centered at the mean!
Question 6: We know from the Empirical Rule that 68% of American women have heights
between 65.5 − 2.5 = 63 inches and 65.5 + 2.5 = 68 inches. Between what two
heights (equally spaced above and below the average height of 65.5 inches) does 80% of
the population lie?
• Answer: Basically, what we want to do here is start with the normal distribution for heights of
American women and chop off the tails in a symmetric way so that the area of what remains
is exactly .8. Note that we have to cut off an area of exactly .2 to do this. Since each tail
would have the same size, each of the two tails would have half of this area, or an area of
.1. So, what I want to do is determine what x value has an area of .1 to its left and what
value of x has an area of .1 to its right. The answer will be the interval in between these two
values. We can do that either with the calculator or the table.
Using the calculator, we just compute invNorm(.1,65.5,2.5)
and invNorm(.9,65.5,2.5) (because if .1 is to the right then .9 is to the left
and invNorm always thinks about the area to the left). These are 62.29612108 and
68.7038782. So, 80% of American women have heights between 62.3 and 68.7 inches.
With the table, we look up the area .1000 (which would be in the body of the table, not the
margin). I see that z = −1.28 has area .1003 to the left...that’s pretty close so I think I will
just use that z-score. Thus, the left endpoint will be x = µ + zσ = 65.5 − 1.28 × 2.5 =
62.3 (same answer!). I do not need to look anything up to get the right endpoint since
(by symmetry) it will simply be the same number of standard deviations to the right rather
than the left of the mean. In other words, it will be z = +1.28 and the right endpoint is
x = µ + zσ = 65.5 + 1.28 × 2.5 = 68.7.
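The tail-chopping argument from Question 6 can be wrapped up in a short Python sketch. It assumes the standard library class statistics.NormalDist (Python 3.8+), and central_interval is a hypothetical helper name of my own, not a calculator function.

```python
from statistics import NormalDist

heights = NormalDist(mu=65.5, sigma=2.5)

def central_interval(dist, prob):
    """Endpoints of the interval centered at the mean that contains
    the given probability: chop an equal area off each tail."""
    tail = (1 - prob) / 2            # area removed from each side
    low = dist.inv_cdf(tail)         # area `tail` lies to the left of low
    high = dist.inv_cdf(1 - tail)    # area `tail` lies to the right of high
    return low, high

low, high = central_interval(heights, 0.80)

# Check: the area between the endpoints should come back as .80.
area = heights.cdf(high) - heights.cdf(low)

print(round(low, 1), round(high, 1), round(area, 4))
```

The same helper works for any other centered probability. For example, central_interval(heights, 0.68) should land close to the 63 and 68 inch endpoints given by the Empirical Rule.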
Homework
Read: Read Section 6.2 in the book.
Do: Answer the questions in the online homework assignment “Section 6.2 (Table)”. It is
due on Tuesday at 5PM. So, try to do it right away and ask me about it in class next time if
there are any problems.
Reminder: Our first test is coming up on February 4th (a week from today). The test will cover
Chapter 5 and Sections 6.1-6.2.