STAT-UB.0103
NOTES for 2012.FEB.29
Let’s note some interesting things you can do in Minitab with regard to continuous
distributions.
The command Graph ⇒ Probability Distribution Plot will allow you to see various
probability densities. Here, for example, is the density of the standard normal:
[Figure: Distribution Plot, Normal, Mean=0, StDev=1. Vertical axis: Density (0.0 to 0.4); horizontal axis: X (-3 to 3).]
For the sake of comparison, here’s the normal with μ = 50 and σ = 10:
[Figure: Distribution Plot, Normal, Mean=50, StDev=10. Vertical axis: Density (0.00 to 0.04); horizontal axis: X (20 to 80).]
This should clue us in to the fact that the general normal is just a rescaled version of the
standard normal.
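The rescaling can be checked numerically. The following is a minimal sketch, assuming Python's standard-library NormalDist class in place of Minitab; the helper name rescaled_density is made up for this illustration. It computes the Normal(50, 10) density directly and again by shifting and scaling the standard normal density.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)
general = NormalDist(mu=50.0, sigma=10.0)

def rescaled_density(x, mu, sigma):
    """Density of Normal(mu, sigma) obtained from the standard normal density."""
    z = (x - mu) / sigma            # shift and scale the horizontal axis
    return standard.pdf(z) / sigma  # shrink the vertical axis by the same factor

for x in (20, 35, 50, 65, 80):
    direct = general.pdf(x)
    via_standard = rescaled_density(x, 50.0, 10.0)
    print(f"x = {x:2d}  direct = {direct:.5f}  via standard = {via_standard:.5f}")
```

The two columns agree, and the peak height drops from about 0.4 to about 0.04, which is the change of vertical scale visible in the two plots.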
The Graph ⇒ Probability Distribution Plot feature will also make probability
histograms for discrete distributions. You can even mix discrete and continuous. For
instance, it’s interesting to see together binomial (n = 100, p = 0.50) and normal
(μ = 50, σ = 5).
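For readers without Minitab at hand, here is a rough numerical version of that comparison. It is only a sketch: it uses Python's standard library, and the helper name binomial_pmf is an assumption of this example. The normal parameters come from μ = np and σ = sqrt(np(1 − p)), which give 50 and 5 here.

```python
from math import comb
from statistics import NormalDist

n, p = 100, 0.50
mu = n * p                         # 50
sigma = (n * p * (1 - p)) ** 0.5   # 5
approx = NormalDist(mu, sigma)

def binomial_pmf(k):
    """P[ exactly k successes ] for the binomial with n trials and success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

for k in (40, 45, 50, 55, 60):
    print(f"k = {k}  binomial = {binomial_pmf(k):.5f}  normal density = {approx.pdf(k):.5f}")
```

The bar heights of the probability histogram and the heights of the normal curve are close at every k listed, which is why the two pictures nearly coincide when overlaid.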
You can also put approximating normal curves (densities) on data histograms. Here are
plots related to MONET2010.MTW. The first shows a plot of the actual prices, and on
this the approximating normal is terrible:
[Figure: Histogram of Price (US$) with fitted Normal curve. Mean 3089996, StDev 4311260, N 430. Vertical axis: Frequency; horizontal axis: Price (US$).]
This one is of the base-e logarithms of the price:
[Figure: Histogram of ln (US$) with fitted Normal curve. Mean 14.15, StDev 1.350, N 430. Vertical axis: Frequency; horizontal axis: ln (US$).]
The approximation is still bad, but it’s much better than the previous one.
The approximating distribution is normal by default, but you can get other choices by
invoking Graph ⇒ Histogram ⇒ With Fit ⇒ Data View. For this example, “largest
extreme value” seems to fit well.
We will have other methods for assessing whether data might be considered
approximately normal.
See the pamphlet on Normal Distribution for two sections that deal with use of the
normal distribution in finding probabilities. The captions are
Applications of the Normal Distribution (1)
EXAMPLES ON NORMAL DISTRIBUTION (2)
There is a critical relationship between the general normal random variable and the
standard normal random variable. If X follows a normal distribution with mean μ and
with standard deviation σ, then $\frac{X - \mu}{\sigma}$ follows a standard normal distribution. In
symbols, we’ll express this as
$$Z = \frac{X - \mu}{\sigma}.$$
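A short check, using only the usual rules for the mean and standard deviation of a shifted and rescaled random variable, shows why the standardized variable has mean 0 and standard deviation 1. (That Z is again normal is a separate property of the normal family, taken as given here.)

```latex
\begin{align*}
E[Z] &= \frac{E[X] - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = 0, \\
\mathrm{SD}(Z) &= \frac{\mathrm{SD}(X)}{\sigma} = \frac{\sigma}{\sigma} = 1, \\
P[X \le x] &= P\left[ Z \le \frac{x - \mu}{\sigma} \right].
\end{align*}
```

The last line is the form used in the worked problems: any probability statement about X becomes a statement about Z.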
Here’s a simple version of this. Suppose that the fill amounts for a coffee vending
machine have a mean of 11.2 oz and a standard deviation of 0.45 oz. What is the
probability that a single 12 oz coffee cup will overflow?
To solve this, let X be the random amount that goes into a cup, and assume that X has, at
least approximately, a normal distribution. The question asks P[ X > 12 ]. Here’s how
the work proceeds:
$$P[X > 12.0] = P\left[\frac{X - 11.2}{0.45} > \frac{12.0 - 11.2}{0.45}\right] \approx P[Z > 1.78]$$
The ≈ here represents the rounding of the fraction $\frac{12.0 - 11.2}{0.45}$ to 1.78.
Given the structure of the printed normal table, it’s reasonable to round to
two figures after the decimal point. The ≈ might also be appropriate if
you are saying that the normal distribution is an approximation.
At this point, it’s a table look-up problem.
P[ Z > 1.78 ] = 0.50 – P[ 0 ≤ Z ≤ 1.78 ] = 0.50 – 0.4625 = 0.0375
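To see how little the rounding to 1.78 matters, here is a small sketch that repeats the calculation with and without the rounding. It assumes Python's standard-library NormalDist as a stand-in for the printed table; the variable names are chosen for this example only.

```python
from statistics import NormalDist

mu, sigma, cup = 11.2, 0.45, 12.0   # fill mean, fill SD, cup size (all in oz)

z_unrounded = (cup - mu) / sigma     # 1.777...
z_table = round(z_unrounded, 2)      # 1.78, as used with the printed table

standard = NormalDist()              # mean 0, standard deviation 1

p_table = 1.0 - standard.cdf(z_table)          # P[ Z > 1.78 ]
p_unrounded = 1.0 - standard.cdf(z_unrounded)  # no rounding of z

print(f"z rounded to two places: {z_table:.2f}, P = {p_table:.4f}")
print(f"z not rounded          : {z_unrounded:.4f}, P = {p_unrounded:.4f}")
```

Both answers are a bit under 4%, so the table look-up loses nothing of practical importance.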
Here’s another problem. This was not covered in class.
Suppose that the weights of pumpkins at a certain farm are normally distributed, at least
approximately, with mean weight 18.2 lbs and standard deviation 4.6 lbs. About what
proportion of the pumpkins weigh more than 25 lbs.?
NOTE: An equivalent version of this question goes as follows. Suppose
that a pumpkin is selected at random. What is the probability that
its weight will exceed 25 lbs.?
Let X be the weight of a randomly selected pumpkin. (We’re not quite sure what it
means to randomly select a pumpkin, but we’ll put that aside for now.) Then
$$\begin{aligned}
P[X > 25] &= P\left[\frac{X - 18.2}{4.6} > \frac{25 - 18.2}{4.6}\right] \approx P[Z > 1.48] \\
&= 0.50 - P[0 \le Z \le 1.48] = 0.50 - 0.4306 = 0.0694 \approx 7\%
\end{aligned}$$
About 7% of the pumpkins will weigh more than 25 lbs.
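The "proportion" and "probability" versions of the question can be compared by simulation. This is a sketch only: the seed, the crop size of 200,000, and the use of Python's random.Random.gauss to generate weights are all assumptions of this example, not part of the problem.

```python
import random
from statistics import NormalDist

mu, sigma, cutoff = 18.2, 4.6, 25.0   # pumpkin weights in lbs

# Probability that one randomly selected pumpkin exceeds the cutoff.
p_one = 1.0 - NormalDist(mu, sigma).cdf(cutoff)

# Proportion exceeding the cutoff in a large simulated crop.
rng = random.Random(2012)             # arbitrary seed, fixed for repeatability
crop_size = 200_000                   # arbitrary size chosen for this sketch
heavy = sum(1 for _ in range(crop_size) if rng.gauss(mu, sigma) > cutoff)
proportion = heavy / crop_size

print(f"probability for one pumpkin : {p_one:.4f}")
print(f"proportion in simulated crop: {proportion:.4f}")
```

The two numbers land close together, near 7%, which is the sense in which the two wordings are equivalent.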
Let’s examine the last two examples from the handout on normal distribution (2). We’ll start
by doing an example logically equivalent to EXAMPLE 5.
Note: This is an unusual example (and not very useful), and it was not done in class.
EXAMPLE: You have been told that the mean score on a reading test for fourth-grade
children in a certain district is 122.4. However, you also observe that 20% of the children
fall below the mandated threshold of 110. Assuming approximate normal distributions,
what is the standard deviation of the scores?
SOLUTION: Let X be the score of a random child. We know that the mean is 122.4,
but the standard deviation must be the unknown symbol σ. We also know that
P[ X < 110] = 0.20. The only thing we know how to do is standardize.
$$P[X < 110] = P\left[\frac{X - 122.4}{\sigma} < \frac{110 - 122.4}{\sigma}\right] = P\left[Z < \frac{-12.4}{\sigma}\right] \overset{\text{want}}{=} 0.20$$
The normal table gives us P[ Z < -0.84 ] = 0.20.
Actually, the fact we get is P[0 ≤ Z ≤ 0.84] = 0.30, and we infer the above.
Thus, we solve $-0.84 = \frac{-12.4}{\sigma}$ to get $\sigma = \frac{12.4}{0.84} \approx 14.8$.
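The same solution can be sketched in a few lines, assuming Python's standard-library NormalDist.inv_cdf in place of the backwards table look-up. The variable names are illustrative only.

```python
from statistics import NormalDist

mean_score = 122.4      # known mean
threshold = 110.0       # mandated threshold
fraction_below = 0.20   # observed fraction of children below the threshold

# Standard normal cutoff with 20% of the area to its left (about -0.84).
z = NormalDist().inv_cdf(fraction_below)

# Solve z = (threshold - mean_score) / sigma for sigma.
sigma = (threshold - mean_score) / z

# Check: with this sigma, the fraction below the threshold should be 0.20.
check = NormalDist(mean_score, sigma).cdf(threshold)

print(f"z = {z:.4f}, sigma = {sigma:.2f}, P[ X < 110 ] = {check:.4f}")
```

Because the cutoff is not rounded to -0.84 here, the computed σ differs slightly from 12.4/0.84, but it still rounds to about 14.8.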
This next item is on “fill” amounts.
EXAMPLE: Suppose that the “fill” amount for cans of peaches is normally distributed
with mean 16.3 ounces and standard deviation 0.14 ounce. What is the probability that a
single can will have an amount below 16.0 ounces?
SOLUTION: Use X for the (random) amount in the can, μ = 16.3, σ = 0.14. Use Z for
the standardized version. Then
$$P[X < 16.0] = P\left[\frac{X - 16.3}{0.14} < \frac{16.0 - 16.3}{0.14}\right] \approx P[Z < -2.14] = 0.0162$$
We’d get the same answer if the question said “What proportion of the cans will
have.....”.
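The printed table gives areas of the form P[ 0 ≤ Z ≤ z ], so the left-tail value 0.0162 comes from symmetry: P[ Z < -2.14 ] = 0.50 - P[ 0 ≤ Z ≤ 2.14 ]. Here is a minimal sketch of that step, assuming Python's standard-library NormalDist; the helper name table_area is made up for this example.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z, the quantity listed in the printed normal table."""
    return standard.cdf(z) - 0.5

mu, sigma, limit = 16.3, 0.14, 16.0            # peach fill amounts in ounces

z_table = round((limit - mu) / sigma, 2)       # -2.14
p_table = 0.50 - table_area(abs(z_table))      # symmetry: left tail = right tail
p_unrounded = NormalDist(mu, sigma).cdf(limit) # same probability, z not rounded

print(f"table entry for 2.14: {table_area(2.14):.4f}")
print(f"P[ Z < {z_table:.2f} ] = {p_table:.4f}")
print(f"without rounding z  : {p_unrounded:.4f}")
```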
We can also have examples of the reverse character. Suppose, for example, that a
machine that loads bags of potato chips dispenses a random amount X. Let’s suppose
that this random X is approximately normally distributed with a mean of μ = 1.87 oz and
with a standard deviation of σ = 0.08 oz. What label should be placed on the bag so that
only 10% are underweight?
Suppose that w is the weight to go on the bag. This is our decision variable; it’s what we
have to decide. If X is the random amount going into the bag, the required condition is
$$P[X < w] \overset{\text{want}}{\le} 0.10$$
It appears that this w will have to be below the mean 1.87 oz.
Let’s set up this problem at the margin. That is, let’s solve
$$P[X < w] \overset{\text{want}}{=} 0.10$$
This solution should give us just what we want.
The only thing we know how to do is standardize. So we proceed . . .
$$P[X < w] = P\left[\frac{X - 1.87}{0.08} < \frac{w - 1.87}{0.08}\right] = P\left[Z < \frac{w - 1.87}{0.08}\right] = 0.10$$
Suppose that we could find (value) so that P[ Z < (value) ] = 0.10. That is, we search for
the cutoff in this picture:
[Figure: Distribution Plot, Normal, Mean=0, StDev=1, with a shaded area of 0.1. Vertical axis: Density; horizontal axis: X.]
Let’s embellish this picture a bit:
[Figure: Distribution Plot, Normal, Mean=0, StDev=1, with a shaded area of 0.1 and with regions marked A and B on each side of 0. Vertical axis: Density; horizontal axis: X.]
The two regions marked A have the same probability content, here 0.10.
The two regions marked B have the same probability content.
It happens that A + B = 0.50. Thus each B is 0.40. The normal table corresponds to
the B on the right.
Our problem now becomes finding the cutoff noted here:
[Figure: Distribution Plot, Normal, Mean=0, StDev=1, with areas of 0.4 and 0.1 marked and with regions marked A and B on each side of 0. Vertical axis: Density; horizontal axis: X.]
We can do this by searching in the body of the normal table for the value 0.4000. The
closest is at 1.28. Thus the arrow points to 1.28 and the relevant left-side cutoff must
be -1.28.
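The search through the body of the table can be imitated in code. This sketch assumes Python's standard-library NormalDist, rebuilds the table entries to four decimal places for z = 0.00, 0.01, ..., 3.99, and picks the entry closest to 0.4000. The helper name table_area and the grid limits are choices made for this illustration.

```python
from statistics import NormalDist

standard = NormalDist()   # mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and z (region B on the right), rounded like a printed table."""
    return round(standard.cdf(z) - 0.5, 4)

target = 0.4000
grid = [i / 100 for i in range(400)]                  # z = 0.00, 0.01, ..., 3.99
closest = min(grid, key=lambda z: abs(table_area(z) - target))

left_cutoff = -closest                                # by symmetry
direct = standard.inv_cdf(0.10)                       # cutoff without the table

print(f"table entries: 1.28 -> {table_area(1.28):.4f}, 1.29 -> {table_area(1.29):.4f}")
print(f"closest z = {closest:.2f}, left-side cutoff = {left_cutoff:.2f}")
print(f"direct inverse calculation = {direct:.4f}")
```

The direct inverse calculation agrees with the table-based cutoff to two decimal places.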
These pictures were made in Minitab, using Graph ⇒ Probability
Distribution Plot ⇒ View Probability ⇒ OK ⇒ Shaded Area. The
drawing tools were used to insert the lines and the text.
Finally we match P[ Z < -1.28 ] = 0.10 to $P\left[Z < \frac{w - 1.87}{0.08}\right] = 0.10$. The solution
occurs for
$$\frac{w - 1.87}{0.08} = -1.28$$
which is w = 1.7676. We’ll probably end up labeling the bags with 1.77 oz.
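As a cross-check on the arithmetic, here is a short sketch that computes the label weight from the table cutoff and again from an inverse cumulative calculation, assuming Python's standard-library NormalDist. It also evaluates the underweight fraction if the label is rounded to 1.77 oz. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 1.87, 0.08   # mean and standard deviation of the fill, in oz
target = 0.10            # allowed fraction of underweight bags

fill = NormalDist(mu, sigma)

w_table = mu + (-1.28) * sigma        # unstandardize the table cutoff -1.28
w_direct = fill.inv_cdf(target)       # solve P[ X < w ] = 0.10 without rounding
underweight_at_177 = fill.cdf(1.77)   # P[ X < 1.77 ] for the rounded label

print(f"w from table cutoff : {w_table:.4f}")
print(f"w from inverse cdf  : {w_direct:.5f}")
print(f"P[ X < 1.77 ]       : {underweight_at_177:.4f}")
```

Note that rounding the label up to 1.77 oz puts the underweight fraction slightly above 10%, so whether that rounding is acceptable depends on how strictly the 10% requirement is meant.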
By the way, Minitab would have solved this problem rather easily. Do
Calc ⇒ Probability Distributions ⇒ Normal
and then set up the information panel like this:
Be very careful. Observe that Inverse cumulative probability has been used.
The result is this:
Inverse Cumulative Distribution Function
Normal with mean = 1.87 and standard deviation = 0.08
P( X <= x )        x
        0.1  1.76748