STAT-UB.0103 NOTES for 2012.FEB.29

Let’s note some interesting things you can do in Minitab with regard to continuous distributions. The command Graph ⇒ Probability Distribution Plot will allow you to see various probability densities. Here, for example, is the density of the standard normal:

[Figure: Distribution Plot, Normal, Mean=0, StDev=1. Density (0.0 to 0.4) against X (-3 to 3).]

For the sake of comparison, here’s the normal with μ = 50 and σ = 10:

[Figure: Distribution Plot, Normal, Mean=50, StDev=10. Density (0.00 to 0.04) against X (20 to 80).]

This should clue us in to the fact that the general normal is just a shifted and rescaled version of the standard normal.

The Graph ⇒ Probability Distribution Plot feature will also make probability histograms for discrete distributions. You can even mix discrete and continuous. For instance, it’s interesting to see together binomial (n = 100, p = 0.50) and normal (μ = 50, σ = 5).

You can also put approximating normal curves (densities) on data histograms. Here are plots related to MONET2010.MTW. The first shows a plot of the actual prices, and on this the approximating normal is terrible:

[Figure: Histogram of Price (US$) with fitted normal curve. Mean 3089996, StDev 4311260, N 430. Frequency against Price (US$).]

This one is of the base-e logarithms of the price:

[Figure: Histogram of ln (US$) with fitted normal curve. Mean 14.15, StDev 1.350, N 430. Frequency against ln (US$).]

The approximation is still bad, but it’s much better than the previous one.

The approximating distribution is normal by default, but you can get other choices by invoking Graph ⇒ Histogram ⇒ With Fit ⇒ Data View. For this example, “largest extreme value” seems to fit well. We will have other methods for assessing whether data might be considered approximately normal.

See the pamphlet on Normal Distribution for two sections that deal with use of the normal distribution in finding probabilities.
The captions are

Applications of the Normal Distribution (1)
EXAMPLES ON NORMAL DISTRIBUTION (2)

There is a critical relationship between the general normal random variable and the standard normal random variable. If X follows a normal distribution with mean μ and with standard deviation σ, then (X − μ)/σ follows a standard normal distribution. In symbols, we’ll express this as

$$Z = \frac{X - \mu}{\sigma}$$

Here’s a simple version of this. Suppose that the fill amounts for a coffee vending machine have a mean of 11.2 oz and a standard deviation of 0.45 oz. What is the probability that a single 12 oz coffee cup will overflow?

To solve this, let X be the random amount that goes into a cup, and assume that X has, at least approximately, a normal distribution. The question asks P[ X > 12 ]. Here’s how the work proceeds:

$$P[\, X > 12.0 \,] = P\left[ \frac{X - 11.2}{0.45} > \frac{12.0 - 11.2}{0.45} \right] \approx P[\, Z > 1.78 \,]$$

The ≈ here represents the rounding of the fraction (12.0 − 11.2)/0.45 to 1.78. Given the structure of the printed normal table, it’s reasonable to round to two figures after the decimal point. The ≈ might also be appropriate if you are saying that the normal distribution is an approximation.

At this point, it’s a table look-up problem.

P[ Z > 1.78 ] = 0.50 − P[ 0 ≤ Z ≤ 1.78 ] = 0.50 − 0.4625 = 0.0375

Here’s another problem. This was not covered in class.

Suppose that the weights of pumpkins at a certain farm are normally distributed, at least approximately, with mean weight 18.2 lbs and standard deviation 4.6 lbs. About what proportion of the pumpkins weigh more than 25 lbs.?

NOTE: An equivalent version of this question goes as follows. Suppose that a pumpkin is selected at random. What is the probability that its weight will exceed 25 lbs.?

Let X be the weight of a randomly selected pumpkin. (We’re not quite sure what it means to randomly select a pumpkin, but we’ll put that aside for now.)
Then

$$P[\, X > 25 \,] = P\left[ \frac{X - 18.2}{4.6} > \frac{25 - 18.2}{4.6} \right] \approx P[\, Z > 1.48 \,]$$

= 0.50 − P[ 0 ≤ Z ≤ 1.48 ] = 0.50 − 0.4306 = 0.0694 ≈ 7%

About 7% of the pumpkins will weigh more than 25 lbs.

Let’s examine the last two examples from the handout on the normal distribution (2). We’ll start by doing an example logically equivalent to EXAMPLE 5. Note: This is an unusual example (and not very useful), and it was not done in class.

EXAMPLE: You have been told that the mean score on a reading test for fourth-grade children in a certain district is 122.4. However, you also observe that 20% of the children fall below the mandated threshold of 110. Assuming an approximately normal distribution, what is the standard deviation of the scores?

SOLUTION: Let X be the score of a random child. We know that the mean is 122.4, but the standard deviation must be the unknown symbol σ. We also know that P[ X < 110 ] = 0.20. The only thing we know how to do is standardize.

$$P[\, X < 110 \,] = P\left[ \frac{X - 122.4}{\sigma} < \frac{110 - 122.4}{\sigma} \right] = P\left[ Z < \frac{-12.4}{\sigma} \right]$$

and we want this to equal 0.20.

The normal table gives us P[ Z < -0.84 ] = 0.20. Actually, the fact we get is P[ 0 ≤ Z ≤ 0.84 ] = 0.30, and we infer the above. Thus, we solve

$$-0.84 = \frac{-12.4}{\sigma}$$

to get

$$\sigma = \frac{12.4}{0.84} \approx 14.8$$

This next item is on “fill” amounts.

EXAMPLE: Suppose that the “fill” amount for cans of peaches is normally distributed with mean 16.3 ounces and standard deviation 0.14 ounce. What is the probability that a single can will have an amount below 16.0 ounces?

SOLUTION: Use X for the (random) amount in the can, μ = 16.3, σ = 0.14. Use Z for the standardized version. Then

$$P[\, X < 16.0 \,] = P\left[ \frac{X - 16.3}{0.14} < \frac{16.0 - 16.3}{0.14} \right] \approx P[\, Z < -2.14 \,] = 0.0162$$

We’d get the same answer if the question said “What proportion of the cans will have.....”.

We can also have examples of the reverse character. Suppose, for example, that a machine that loads bags of potato chips dispenses a random amount X.
Let’s suppose that this random X is approximately normally distributed with a mean of μ = 1.87 oz and with a standard deviation of σ = 0.08 oz. What label should be placed on the bag so that only 10% are underweight?

Suppose that w is the weight to go on the bag. This is our decision variable; it’s what we have to decide. If X is the random amount going into the bag, the required condition is that we want

P[ X < w ] ≤ 0.10

It appears that this w will have to be below the mean 1.87 oz. Let’s set up this problem at the margin. That is, let’s solve for the w at which

P[ X < w ] = 0.10

This solution should give us just what we want. The only thing we know how to do is standardize. So we proceed . . .

$$P[\, X < w \,] = P\left[ \frac{X - 1.87}{0.08} < \frac{w - 1.87}{0.08} \right] = P\left[ Z < \frac{w - 1.87}{0.08} \right] = 0.10$$

Suppose that we could find (value) so that P[ Z < (value) ] = 0.10. That is, we search for the cutoff in this picture:

[Figure: Distribution Plot, Normal, Mean=0, StDev=1. Density against X, with a shaded tail of area 0.1.]

Let’s embellish this picture a bit:

[Figure: the same Distribution Plot, with the two tails marked A and the two regions between each tail and 0 marked B.]

The two regions marked A have the same probability content, here 0.10. The two regions marked B have the same probability content. It happens that A + B = 0.50. Thus each B is 0.40. The normal table corresponds to the B on the right.

Our problem now becomes finding the cutoff noted here:

[Figure: the same Distribution Plot with regions A and B, the right-hand B labeled 0.4, and an arrow at its right-hand cutoff.]

We can do this by searching in the body of the normal table for the value 0.4000. The closest is at 1.28. Thus the arrow points to 1.28 and the relevant left-side cutoff must be -1.28.

These pictures were made in Minitab, using Graph ⇒ Probability Distribution Plot ⇒ View Probability ⇒ OK ⇒ Shaded Area. The drawing tools were used to insert the lines and the text.

Finally we match P[ Z < -1.28 ] = 0.10 to

$$P\left[ Z < \frac{w - 1.87}{0.08} \right] = 0.10$$

The solution occurs for

$$\frac{w - 1.87}{0.08} = -1.28$$

which is w = 1.7676.
We’ll probably end up labeling the bags with 1.77 oz.

By the way, Minitab would have solved this problem rather easily. Do Calc ⇒ Probability Distributions ⇒ Normal and then set up the information panel like this:

[Screenshot of the Normal Distribution dialog not reproduced.]

Be very careful. Observe that Inverse cumulative probability has been used. The result is this:

Inverse Cumulative Distribution Function

Normal with mean = 1.87 and standard deviation = 0.08

    P( X <= x )        x
            0.1  1.76748
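For readers without Minitab, the inverse cumulative probability step can be sketched in Python. This is only an illustration and not part of the course materials. It assumes Python 3.8 or later, where the standard library provides statistics.NormalDist. The variable names (fill, w, z, w_by_hand, coffee, p_overflow) are made up for this sketch.

```python
from statistics import NormalDist

# Fill amount for the potato chip bags: mean 1.87 oz, SD 0.08 oz (from the notes).
fill = NormalDist(mu=1.87, sigma=0.08)

# Inverse cumulative probability: the w for which P[X < w] = 0.10.
w = fill.inv_cdf(0.10)

# The same cutoff the long way: find the standard normal cutoff z
# with P[Z < z] = 0.10, then undo the standardization, w = mu + z * sigma.
z = NormalDist().inv_cdf(0.10)   # close to the table value -1.28
w_by_hand = 1.87 + z * 0.08

# The forward direction (cumulative probability), as in the coffee example:
# P[X > 12.0] for mean 11.2 oz and SD 0.45 oz.
coffee = NormalDist(mu=11.2, sigma=0.45)
p_overflow = 1 - coffee.cdf(12.0)

print(f"w = {w:.5f}")
print(f"z = {z:.4f}, w by hand = {w_by_hand:.5f}")
print(f"P[X > 12.0] = {p_overflow:.4f}")
```

The value of w agrees with the Minitab result 1.76748. It differs slightly from the table-based 1.7676 because the printed table forces the cutoff to be rounded to -1.28. In the same way, the overflow probability differs a little from 0.0375 because no rounding to 1.78 takes place.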