Chapter 6: Normal Distributions
Y. Butterworth
Ch. 6 Notes – Foothill M10

Section   Title
1         Graphs of Normal Probability Distributions
2         Standard Units & Areas Under Normal Dist.
3         Areas Under Any Normal Curve
4         Normal Approximation to the Binomial Dist.

§6.1 Graphs of Normal Probability Distributions

The defining formula, called a density function, for all normally distributed random variables is as follows:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

where μ = mean and σ = std. dev.

The graph of this distribution has the following characteristics:
1) It is bell-shaped
2) It is symmetric about the vertical line x = μ
3) The x-axis forms a horizontal asymptote (the curve approaches, but never touches or crosses, the x-axis)
4) The inflection points (the points where the concavity changes sign) represent σ: they lie at x = μ ± σ
5) Area under the curve is equal to 1
6) There is no area under the curve for a single value, so P(X = x) = 0 on a continuous curve. We can only determine the area under the curve for an interval: P(X < x) or P(X > x) or P(x_S < X < x_B). This idea can get messy as we are really talking about a Calculus concept here.

Here is a picture of a normal curve. We'll use it to point out some of the key concepts we are discussing here.

[Figure: normal density curve]

The normal density curve changes shape in two ways:
1) Based on its location along the x-axis – change based upon μ
2) Based on its width and height – change based upon σ

Let's look at some examples using technology:

Example: Using your TI-83 or 84 we will plot 3 normal density curves with different means.
Curve 1: μ = 0 & σ = 1
Curve 2: μ = 1 & σ = 1
Curve 3: μ = 2 & σ = 1
a) Find μ – 3.5σ for curve 1
b) Find μ + 3.5σ for curve 3
c) In the Y= menu input the following for each curve, so that you have Y1, Y2 & Y3 with equations for curve 1, 2 & 3 respectively: DISTR normalpdf(x, μ, σ)
Note: The x is the X,T,θ,n key.
Make sure the equal signs of Y1, Y2 & Y3 are highlighted (if not, move the cursor over each one and press ENTER).
d) In the WINDOW menu change the x-min to a)'s calculation & the x-max to b)'s calculation
e) In the ZOOM menu select ZOOMFIT
f) What do we notice?

Example: Using the TI-83/84 we will plot 3 normal density curves with different standard deviations.
Curve 1: μ = 0 & σ = 0.5
Curve 2: μ = 0 & σ = 1.5
Curve 3: μ = 0 & σ = 2.5
a) Find μ – 3.5σ for curve 3
b) Find μ + 3.5σ for curve 3
c) In the Y= menu input the following for each curve, so that you have Y1, Y2 & Y3 with equations for curve 1, 2 & 3 respectively: DISTR normalpdf(x, μ, σ)
Note: The x is the X,T,θ,n key. Make sure the equal signs of Y1, Y2 & Y3 are highlighted (if not, move the cursor over each one and press ENTER).
d) In the WINDOW menu change the x-min to a)'s calculation & the x-max to b)'s calculation
e) In the ZOOM menu select ZOOMFIT
f) What do we notice this time?

If you need to know which graph is which, use TRACE and the up arrow key to scroll through them. The cursor will blink on the graph and the function will appear at the top of the screen.

Note: This is not as easy with EXCEL, as we must create a column with the values from –3.5σ to 3.5σ in increments of 0.2σ. In a second column, we need to create the y-values, which come from pasting the Statistical function NORMDIST(x, μ, σ, false), where x is the value from the x-column. You will have to replicate this for the 3 curves. Once the data is there you can create a chart using the Chart Wizard. You will use a SCATTERPLOT with smooth, connected points. To input the values you will want to choose SERIES at the first dialogue box, and I advise the use of the column/row selection for inputting the data. After entering the x & y values click ADD and do the next until you have all 3. I may show my finished example in class. It depends on whether my Mac version and the PC will play "nice".

Now, let's recall the Empirical Rule that we studied earlier.
Recall that your book had only loosely referred to this rule and we had developed the entire thing. We will now do a couple of examples that are like those that we did in §3.2 (p. 11 of my notes). This time we will be finding probability based upon the Empirical Rule, because the percentages in the Empirical Rule represent the area under the curve of a normally distributed random variable, and we know the area under the curve represents probability.

Empirical Rule
68% of the data will fall within 1 standard deviation of the mean
95% of the data will fall within 2 standard deviations of the mean
99.7% of the data will fall within 3 standard deviations of the mean

[Figure: normal curve with the x-axis marked at μ – 2σ, μ – σ, μ, μ + σ and μ + 2σ]

Example: The heights of men in the US are normally distributed with a mean of 69 inches and a standard deviation of 2.8 inches. What percentage of men are:
a) Between 63.4 inches and 74.6 inches?
b) Taller than 69 inches?
c) Shorter than 60.6 inches?

Note: To answer these questions we are using the Empirical Rule and finding the probability based upon the number of standard deviations from the mean, in other words, based on the z-score that we discussed in §3.2.

Example: The number of days of gestation for a human is normally distributed with a mean of 268 days and a standard deviation of 15 days. What is the probability of a gestation lasting:
a) Between 253 and 283 days?
b) Longer than 268 days?
c) Less than 223 days?

Also in this section is a discussion of control charts. A control chart is a time series chart that plots the observed values against the theoretical value of the mean (μ), 2 standard deviations from the mean (μ ± 2σ) and 3 standard deviations from the mean (μ ± 3σ). The values used for μ & σ are the "accepted values" (meaning that they may not be known population values, but after many statistical experiments these have come to be the accepted or average mean and the accepted standard deviation).
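The Empirical Rule percentages and the reference lines of a control chart can both be checked numerically. What follows is a minimal Python sketch, not part of the original notes: the names `within_k_sigma` and `reference_lines` are made up for illustration, and the first function relies on the standard identity P(|Z| < k) = erf(k/√2) for a standard normal Z. The values μ = 268 and σ = 15 are borrowed from the gestation example purely to show the arithmetic.

```python
import math

def within_k_sigma(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def reference_lines(mu, sigma):
    """Hypothetical helper: the mean and the 2- and 3-std-dev lines of a control chart."""
    return {
        "mean": mu,
        "2 std below": mu - 2 * sigma,
        "2 std above": mu + 2 * sigma,
        "3 std below": mu - 3 * sigma,
        "3 std above": mu + 3 * sigma,
    }

# Empirical Rule check for 1, 2 and 3 standard deviations
for k in (1, 2, 3):
    print(f"within {k} std. dev.: {within_k_sigma(k):.4f}")

# Reference lines for mu = 268, sigma = 15 (illustrative values only)
print(reference_lines(268, 15))
```

The three printed areas round to the 68%, 95% and 99.7% of the Empirical Rule.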
Control charts help to establish when data is no longer conforming to an accepted distribution. Why would looking at a control chart help?
1) Anything outside 3 standard deviations is considered a rare event; only 0.3% of all data points would lie outside those "bounds"
2) 95% of all data points will lie within 2 standard deviations

If we have a violation of 1) then we have to seriously consider whether the distribution is conforming to the accepted distribution or not. The probability that it is a "false alarm" when a data point lies outside 3σ is 0.003, which means it is highly unlikely to be a "false alarm".

If we have too many points on one side of the mean or the other, that is also an indication that our mean isn't where it is expected to be, and is also a cause for alarm. Using probability we can determine that if 9 or more consecutive points are either above or below the mean, but still within 2σ, then our probability of a "false alarm" is 0.004, which still means it is highly unlikely to be a false alarm.

Finally, if 2 of 3 consecutive points lie between 2σ and 3σ above or below the mean, this is also cause for alarm. Again, the small percentage of data that we expect to see in this region brings serious doubt that we would see 2 or more consecutive data points there, thus it is cause for alarm. Probability again tells us that there is a 0.004 chance that it is a "false alarm".

There is a nice graphic on p. 242 of your book that shows the Out-of-Control Signals for a control chart.

Example: Let's look at #14 on p. 247. The manager of Motel 11 has 316 rooms in Palo Alto, CA. From observation over a long period of time, she knows that on an average night, 268 rooms will be rented. The long-term standard deviation is 12 rooms. This distribution is approximately mound-shaped and symmetrical.
a) For 10 consecutive nights, the following numbers of rooms were rented each night:

Night   1    2    3    4    5    6    7    8    9    10
Rooms   234  258  265  271  283  267  290  286  263  240

Step 1: Calculate μ ± 2σ and μ ± 3σ
Step 2: Draw the time series plot with nights on the x-axis and the values of μ, μ ± 2σ and μ ± 3σ on the y-axis

[Chart: blank control chart with nights 0 to 11 on the x-axis and rooms 200 to 340 on the y-axis; legend: rooms, mean, 2 std above, 2 std below, 3 std above, 3 std below]

Step 3: Are any of the Out-of-Control Signals present?

Your Turn: You do the same with the part b) data. You can use the chart above to draw the Control Chart.

Night   1    2    3    4    5    6    7    8    9    10
Rooms   238  245  261  269  273  250  241  230  215  217

§6.2 Standard Units & Areas Under Normal Distribution

In this section, we finally get to the z-score and discuss the difference between a raw score and a z-score. We've already discussed the z-score multiple times, but I will reiterate. A raw score is the value of a random variable from a sample. This value must be standardized to find the probability under the normal distribution when using a table. Standardizing just means finding the z-score, that is, finding the value of the raw score in terms of how many standard deviations it is from the mean. As we mentioned before, this helps in comparing 2 raw scores to one another and also in comparing two raw scores from different distributions!!

Z-Scores

$$z = \frac{x - \mu}{\sigma}$$

where x = raw score, μ = mean and σ = std. dev.
Note: This is valid for a sample too.

Interpreting
Positive z: above the mean
Negative z: below the mean
| z | is the distance from the mean
The larger | z | is the more unusual when comparing two scores

To find normal probabilities with tables we have to standardize and then use the Standard Normal Table to look up the probability that a Standard Normal random variable is below a certain value (this is for a left-tail table, which most are). However, we will be using our calculators to find the probability under the curve.
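Standardizing and the left-tail look-up can be sketched in a few lines of Python. This is only an illustration, not the method used in class: `z_score` and `left_tail` are hypothetical names, and `left_tail` stands in for a left-tail Standard Normal Table by using the identity Φ(z) = ½[1 + erf(z/√2)]. The values μ = 69 and σ = 2.8 are the men's heights from §6.1.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations the raw score x lies from the mean."""
    return (x - mu) / sigma

def left_tail(z):
    """P(Z < z) for a standard normal Z, i.e. what a left-tail table reports."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Raw score 74.6 inches with mu = 69 and sigma = 2.8 (heights example)
z = z_score(74.6, 69, 2.8)
print(f"z = {z:.2f}")
print(f"P(Z < z) = {left_tail(z):.4f}")
```

Because the raw score 74.6 is two standard deviations above the mean, the left-tail area comes out at roughly 0.977, in line with the Empirical Rule.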
We will practice all the skills just as if we were going to use a table, however. We will learn the following:
1) How to write the desired probability using probability notation and the random variable, X.
2) How to write the desired probability using probability notation and the standardized random variable, Z.
3) How to find the probability of being less than a raw score as if we were going to use a left-tail table.
4) How to find the probability of being greater than a raw score as if we were going to use a left-tail table.
5) How to find the probability of being between two raw scores as if we were going to use a left-tail table.

Example: Fawns in Mesa Verde Nat'l Park have a body weight that is approximately normally distributed with a mean of 27.2 kg and a standard deviation of 4.3 kg (based upon info from The Mule Deer of Mesa Verde Nat'l Park, by GW Mierau and JL Schmidt, Mesa Verde Museum Ass.). If X is the weight of a fawn in kilograms, answer all the following questions based on the given information.
a) Write the correct probability notation for this scenario & draw a picture for the area under the curve: What is the probability that a randomly chosen fawn weighs at most 30 kg?
b) Write the correct probability notation for this scenario & draw a picture for the area under the curve: What is the probability that a fawn weighs more than 19 kg?
c) Write the correct probability notation for this scenario & draw a picture for the area under the curve: What is the probability that a fawn weighs between 32 and 35 kg?
d) Using the probability notation in a), rewrite your probability in terms of the standardized random variable, Z. Does the picture change?
e) Using the probability in c), rewrite your probability in terms of the standardized random variable, Z.
f) If you were given the information that the standardized random variable was at least 1.28 for a randomly chosen fawn, write this using probability notation and then convert it back to a probability for the random variable, X.

When working with a left-tail table we can only find the area under the curve to the left of a value of the random variable, X. Because of this limitation we must learn to rewrite probabilities in terms of being less than a value of X. Here are the short-cuts:
P(X < x): Straight table look-up.
P(X > x) = 1 – P(X < x) [Technically it needs to be P(X ≤ x), but there is no area at a single value x, so for any continuous distribution it doesn't matter.]
P(x_S < X < x_B) = P(X < x_B) – P(X < x_S)

Example: Using the last example, rewrite parts b & c so that you could look up the probabilities in a left-tail table.
a)
b)

Now we will find probabilities. Since we have a TI-83/84 we can use the DISTR menu and normalcdf(left limit, right limit, mean, std dev). If you leave off the mean and the std dev the calculator will assume a standard normal, with mean 0 and std dev 1. For the left limit use a very small number (e.g. –1×10^10) when looking for a left tail, and for the right limit use a very large number (e.g. 1×10^10) when looking for a right tail.

Example: Now, using the fawn example, find the probabilities in a, b & c.
a)
b)
c)
d) Use the standardized score you found in part d) and make sure it agrees with the probability that you found in a). Make sure that you take into account that the mean is now 0 and the std. dev. is now 1.

§6.3 Areas Under Normal Curves

This section discusses how to use the Standard Normal Table to find the probability of any normally distributed random variable regardless of its mean and standard deviation. We will be using the TI-83/84 to find normal probabilities.
We will discuss how to find probabilities for any normal distribution using a standard normal, just in case you ever find yourself without a calculator that will calculate these probabilities.
1) Translate the random variable to a z-score
2) Look up in a left-tail table, translating if necessary (a picture will help)
   a) P(X < x) = P(Z < z)
   b) P(X > x) = 1 – P(X < x) = 1 – P(Z < z)
   c) P(x_S < X < x_B) = P(X < x_B) – P(X < x_S) = P(Z < z_B) – P(Z < z_S)

Recall: For finding probabilities with the TI-83/84, use the DISTR menu and normalcdf(lower bound, upper bound, mean, std. dev.). Remember, if you don't put in the mean and std. dev. the calculator will assume a std. normal.

Example: Let's do an example where we draw the pictures and prepare for the look-up using a z-score, then we'll use our calculator to find the probabilities with mean 0 and std. dev. 1 and compare them with the answers from the original mean and std. dev.

Porphyrin is a pigment in blood protoplasm and other body fluids that is significant in body energy and storage. Let x be a random variable that represents the number of milligrams of porphyrin per deciliter of blood. In healthy adults, x is approximately normally distributed with a mean of 38 and a std. dev. of 12 (Diagnostic Tests with Nursing Implications, edited by S. Loeb, Springhouse Press). This is problem #26 p. 269 of your text.
a) Write the probability that a randomly chosen individual will have a porphyrin level lower than 60 using probability notation. Next, convert it to a probability using Z. Sketch the picture of the probability that you are trying to find. Use your calculator to find the probability, using the z-score. Compare it to the answer for the probability of x under the original mean and std. dev.
b) Answer all the same questions as a) for the probability that a randomly chosen individual will have a porphyrin level above 16.
c) Answer all the same questions as a) for the probability that a randomly chosen individual will have a porphyrin level between 16 and 60.
d) Answer all the same questions as a) for the probability that a randomly chosen individual will have a porphyrin level greater than 60.

Our next topic is how to find the value of the random variable given the probability of that random variable's occurrence. As before, I'm going to teach you based upon doing it with a table, so that if you ever find yourself without the TI-83/84 you could still do the problem.
1) Find the z based upon the reverse table look-up (look in the body of the table)
2) Use x = zσ + μ to find x, since $z = \frac{x - \mu}{\sigma}$

Since you will have your calculator, we'll practice all but the table look-up. You will use the DISTR menu again and this time invNorm(probability, μ, σ). Just as before, if you leave off the mean and std. dev., the calculator will assume a std. normal.

Just as before we will draw pictures to help us with the concepts.
1) P(X < x) = known probability
   [Picture: normal curve with the area to the left of x shaded as the known probability]
2) P(X > x) = known probability
   [Picture: normal curve with the area to the right of x shaded as the known probability, converted to the area to the left of x, which is 1 – known]
   So, P(X > x) = 1 – P(X < x) = known ∴ P(X < x) = 1 – known
3) P(–x < X < x) = known
   [Picture: normal curve with the area between –x and x shaded as the known probability]
   If –x & x are symmetric about μ, the area in the tails is 1 – known, so the area below –x is half that probability, (1 – known)/2.
   So, P(–x < X < x) = 1 – 2P(X < –x) = known ∴ P(X < –x) = (1 – known)/2

Example: Human gestation periods are approximately normally distributed with a mean of 268 days and std. dev. of 15 days. Premature babies are those that are in the lowest 4%. Find the length of gestation, in days, that separates the premature babies from the non-premature babies.

Example: The number of years before replacement is necessary for a specific brand of CD players is approximately normally distributed with a mean of 9.4 years and std. dev. of 4.2 years.
Find the lengths of time, in years, that represent the cut-off points for the amount of time that these CD players "consistently" last. Recall that your book terms "consistently" within _____ std. dev. See p. 11 of Ch. 5 notes!

Example: Most exhibition shows open in the morning and close in the late evening. A study of Saturday arrival times showed that the average arrival time was 3 hours and 48 minutes after the doors opened, and the std. dev. was estimated at about 52 minutes. Assume the arrival times have a normal distribution. Remember units must be consistent! This is problem #37 from p. 270 of our text.
*a) At what time, after the doors open, will all but 10% of the people who are coming to the Sat. show have arrived?
b) At what time, after the doors open, will only 15% of the people who are coming to the Sat. show have arrived?
*c) What are the arrival times that represent the cut-off points for which the "usual" amount of people will have arrived? See p. 11 of Ch. 5 notes!
*Note: I have changed the problem or added the problem to the original.

One of the questions of concern when analyzing data is whether we have data that is approximately normally distributed. When we can answer this question in the affirmative we have an easier time analyzing data, especially small data sets that don't conform to the requirements of the Central Limit Theorem for their normality. We are now going to see some ways of checking for normality.

Approximately Normal?
1) Histogram: ≈ bell-shaped (symmetric)
2) Outliers: no more than 1 outlier below Q1 – 1.5·IQR or beyond Q3 + 1.5·IQR
3) Skewness: Pearson's Index $= \frac{3(\bar{x} - \tilde{x})}{s}$ is between –1 & 1
4) Normal Probability (Quantile) Plot: z-scores (x's) plotted against values of the R.V., x (y's), form ≈ a straight line

Technology can help us with:
1) Histograms, as we have already seen. BTW here are the instructions for doing a histogram with EXCEL's data analysis package. I had forgotten that a histogram was possible with EXCEL.
   Histogram
   Input the upper class limits into another column
   Tools → Data Analysis → Histogram
   Highlight the data in the 1st column of your original workbook
   Click on Bin Range and highlight the column containing the upper class limits
   Uncheck the Labels box below Bin Range
   Check New Workbook
   Check Chart Output
   Click on OK
2) Pearson's Skewness: EXCEL & the TI are used to get the mean, median & std. dev., but the calculation must be done by hand
3) Normal Probability Plot: With both EXCEL & the TI we can force the issue
   Enter the raw scores & find the z-score for each
   In EXCEL: =STANDARDIZE(x, μ, σ). Now, use a scatterplot with just dots and plot the standardized values against the x values (make sure the std. vals are in the 1st column & the x vals are in the 2nd column)
   In the TI's DATA EDITOR, with the cursor on L2, type in (L1-mean)/std dev, where L1 contains the data and L2 will contain the z-scores. Now, use the STAT PLOT menu and turn on the first plot type, the one that looks like dots, by putting your cursor on it and pressing ENTER. In the X-List enter L2, and in the Y-List enter L1. Mark with the square. Press ZOOM and choose ZOOMSTAT.

Example: Now let's test the normality of the following data, which represents the average number of murders per capita for a large city.
12.6, 9.5, 14.9, 11.9, 11.0, 10.7, 9.7, 10.5, 8.2, 7.8, 9.5, 13.1, 8.2, 7.1, 9.2, 8.3, 8.7, 9.6, 10.9, 9.4, 12.1, 15.6, 12.0
a) Create a histogram of the data
b) Calculate Pearson's Skewness
c) Are there any outliers?
d) Make a Normal Probability Plot

§6.4 Normal Approximation to the Binomial

This approximation is used appropriately when the following conditions are met:
np ≥ 5 and nq ≥ 5 or n > 30

If X ~ Binomial(n, p, q) then X is approximately N(μ, σ), where μ = np and $\sigma = \sqrt{npq}$.

The only trouble with the normal approximation to the binomial is the discrepancy between the types of distributions. The binomial is discrete. Remember the "chunky" probability histograms that we used to show the shape of the distribution? Well, this causes some problems because there is area under the curve for a given value in the binomial, whereas in a continuous distribution such as the normal, there is no area under the curve for any particular value. As such, we have to "fake" the area. This is done with a continuity correction. This takes into account that each bar on the binomial is one unit wide, so for each value we subtract 0.5 and add 0.5 to the value to approximate that interval. Below is what it will look like on the normal curve.

X Binomially Dist.      X Normal with μ = np & σ = √(npq)
P(X = x)                P(x – 0.5 < X < x + 0.5)
P(X > x)                P(X > x + 0.5)
P(X ≥ x)                P(X > x – 0.5)
P(X < x)                P(X < x – 0.5)
P(X ≤ x)                P(X < x + 0.5)

Now, all we need to do is try some examples!!

Example: *Use the normal approximation to the binomial to find the probability that at least 70 of 100 mosquitoes will be killed with an insect spray when the probability of killing them with the spray is 0.75. Use your calculator to calculate the actual probability too and compare.
Step 1: What are n, p & x?
Step 2: Calculate μ & σ
Step 3: Define the probability in terms of the binomial & do the continuity correction.
Step 4: Find the probability using the normal distribution

Your Turn: *If 23% of all patients with high blood pressure have bad side effects from a certain medicine, use the normal approximation to find the probability that among 120 patients more than 32 will have bad side effects.

*Note: Both examples are from Wadpole p. 230 Ex. 25 & p. 238 Ex. 23.
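For checking work away from the calculator, the steps of the mosquito example can be sketched in Python. This is a rough illustration under stated assumptions rather than the class method: the function names are hypothetical, the normal CDF is computed from `math.erf` instead of normalcdf, and the exact binomial sum uses `math.comb` (Python 3.8 or later).

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for a normal X with mean mu and std. dev. sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def binomial_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_approx_at_least(k, n, p):
    """P(X >= k) by the normal approximation with the continuity correction."""
    mu = n * p                           # Step 2: mu = np
    sigma = math.sqrt(n * p * (1 - p))   # Step 2: sigma = sqrt(npq)
    # Step 3: P(X >= k) for the binomial becomes P(X > k - 0.5) for the normal
    return 1 - normal_cdf(k - 0.5, mu, sigma)  # Step 4

# Step 1: n = 100 mosquitoes, p = 0.75, "at least 70" killed
n, p, k = 100, 0.75, 70
assert n * p >= 5 and n * (1 - p) >= 5  # conditions for using the approximation

print(f"normal approximation: {normal_approx_at_least(k, n, p):.4f}")
print(f"exact binomial:       {binomial_at_least(k, n, p):.4f}")
```

Changing `k - 0.5` to match another row of the continuity correction list adapts the sketch to the other inequality types, such as the "more than 32" wording in the Your Turn problem.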