MM 207 Unit #5
The Normal Distribution
Copyright © 2009 Pearson Education, Inc.

Section 5.1
WHAT IS NORMAL?

Suppose a friend is pregnant and due to give birth on June 30. Would you advise her to schedule an important business meeting for June 16, two weeks before the due date?

Figure 5.1 is a histogram for a distribution of 300 natural births. The left vertical axis shows the number of births for each 4-day bin. The right vertical axis shows relative frequencies.

We can find the proportion of births that occurred more than 14 days before the due date by adding the relative frequencies for the bins to the left of -14. These bins have a total relative frequency of about 0.21, which says that about 21% of the births in this data set occurred more than 14 days before the due date.

The Normal Shape

The distribution of the birth data has a fairly distinctive shape, which is easier to see if we overlay the histogram with a smooth curve (Figure 5.2).

For our present purposes, the shape of this smooth distribution has three very important characteristics:
• The distribution is single-peaked. Its mode, or most common birth date, is the due date.
• The distribution is symmetric around its single peak; therefore, its median and mean are the same as its mode. The median is the due date because equal numbers of births occur before and after this date. The mean is also the due date because, for every birth before the due date, there is a birth the same number of days after the due date.
• The distribution is spread out in a way that makes it resemble the shape of a bell, so we call it a “bell-shaped” distribution.
Figure 5.3 Both distributions are normal and have the same mean of 75, but the distribution on the left has a larger standard deviation.

Definition
The normal distribution is a symmetric, bell-shaped distribution with a single peak. Its peak corresponds to the mean, median, and mode of the distribution. Its variation can be characterized by the standard deviation of the distribution.

The Normal Distribution and Relative Frequencies

Relative Frequencies and the Normal Distribution
• The area that lies under the normal distribution curve corresponding to a range of values on the horizontal axis is the relative frequency of those values.
• Because the total relative frequency must be 1, the total area under the normal distribution curve must equal 1, or 100%.

Figure 5.5 The percentage of the total area in any region under the normal curve tells us the relative frequency of data values in that region.

EXAMPLE 2 Estimating Areas
Look again at the normal distribution in Figure 5.5.

a. Estimate the percentage of births occurring between 0 and 60 days after the due date.

Solution: a. About half of the total area under the curve lies in the region between 0 days and 60 days. This means that about 50% of the births in the sample occur between 0 and 60 days after the due date.

b. Estimate the percentage of births occurring between 14 days before and 14 days after the due date.

Solution: b. Figure 5.5 shows that about 18% of the births occur more than 14 days before the due date. Because the distribution is symmetric, about 18% must also occur more than 14 days after the due date.
Therefore, a total of about 18% + 18% = 36% of births occur either more than 14 days before or more than 14 days after the due date. The question asked about the remaining region, between 14 days before and 14 days after the due date, so this region must represent 100% - 36% = 64% of the births.

When Can We Expect a Normal Distribution?

Conditions for a Normal Distribution
A data set that satisfies the following four criteria is likely to have a nearly normal distribution:
1. Most data values are clustered near the mean, giving the distribution a well-defined single peak.
2. Data values are spread evenly around the mean, making the distribution symmetric.
3. Larger deviations from the mean become increasingly rare, producing the tapering tails of the distribution.
4. Individual data values result from a combination of many different factors, such as genetic and environmental factors.

EXAMPLE 3 Is It a Normal Distribution?
Which of the following variables would you expect to have a normal or nearly normal distribution?

a. Scores on a very easy test

Solution: a. Tests have a maximum possible score (100%) that limits the size of data values. If the test is easy, the mean will be high and many scores will be close to the maximum possible. The few lower scores may be spread out well below the mean. We therefore expect the distribution of scores to be left-skewed and nonnormal.

b. Heights of a random sample of adult women

Solution: b. Height is determined by a combination of many factors (the genetic makeup of both parents and possibly environmental or nutritional factors). We expect the mean height for the sample to be close to the mode (most common height).
We also expect there to be roughly equal numbers of women above and below the mean, and extremely large and small heights should be rare. That is why height is nearly normally distributed.

Section 5.2
PROPERTIES OF THE NORMAL DISTRIBUTION

Consider a Consumer Reports survey in which participants were asked how long they owned their last TV set before they replaced it. The variable of interest in this survey is replacement time for television sets. Based on the survey, the distribution of replacement times has a mean of about 8.2 years, which we denote as μ (the Greek letter mu). The standard deviation of the distribution is about 1.1 years, which we denote as σ (the Greek letter sigma).

Making the reasonable assumption that the distribution of TV replacement times is approximately normal, we can picture it as shown in Figure 5.16.

Figure 5.16 Normal distribution for replacement times for TV sets with a mean of μ = 8.2 years and a standard deviation of σ = 1.1 years.

A simple rule, called the 68-95-99.7 rule, gives precise guidelines for the percentage of data values that lie within 1, 2, and 3 standard deviations of the mean for any normal distribution.

Figure 5.17 Normal distribution illustrating the 68-95-99.7 rule.

The 68-95-99.7 Rule for a Normal Distribution
• About 68% (more precisely, 68.3%), or just over two-thirds, of the data points fall within 1 standard deviation of the mean.
• About 95% (more precisely, 95.4%) of the data points fall within 2 standard deviations of the mean.
• About 99.7% of the data points fall within 3 standard deviations of the mean.
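The percentages in the 68-95-99.7 rule can be checked numerically. The following is a minimal sketch, not part of the original slides, that uses the `NormalDist` class from Python's standard `statistics` module. The helper name `fraction_within` is hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    # The rule holds for any normal distribution, so the standard normal
    # (mean 0, standard deviation 1) is enough for the check.
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area under the curve between -k and +k.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The three printed fractions round to 68.3%, 95.4%, and 99.7%, in agreement with the rule as stated above.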
EXAMPLE 1 SAT Scores
The tests that make up the verbal (critical reading) and mathematics SAT (and the GRE, LSAT, and GMAT) are designed so that their scores are normally distributed with a mean of μ = 500 and a standard deviation of σ = 100. Interpret this statement.

Solution: From the 68-95-99.7 rule, about 68% of students have scores within 1 standard deviation (100 points) of the mean of 500 points; that is, about 68% of students score between 400 and 600. About 95% of students score within 2 standard deviations (200 points) of the mean, or between 300 and 700. And about 99.7% of students score within 3 standard deviations (300 points) of the mean, or between 200 and 800.

Figure 5.18 shows this interpretation graphically; note that the horizontal axis shows both actual scores and distance from the mean in standard deviations.

Figure 5.18 Normal distribution for SAT scores, showing the percentages associated with 1, 2, and 3 standard deviations.

EXAMPLE 2 Detecting Counterfeits
Vending machines can be adjusted to reject coins above and below certain weights. The weights of legal U.S. quarters have a normal distribution with a mean of 5.67 grams and a standard deviation of 0.07 gram. If a vending machine is adjusted to reject quarters that weigh more than 5.81 grams or less than 5.53 grams, what percentage of legal quarters will be rejected by the machine?

Solution: A weight of 5.81 grams is 0.14 gram, or 2 standard deviations, above the mean. A weight of 5.53 grams is 0.14 gram, or 2 standard deviations, below the mean. Therefore, by accepting only quarters within the weight range 5.53 to 5.81 grams, the machine accepts quarters that are within 2 standard deviations of the mean and rejects those that are more than 2 standard deviations from the mean.
By the 68-95-99.7 rule, about 95% of legal quarters will be accepted and about 5% of legal quarters will be rejected.

Applying the 68-95-99.7 Rule

We can apply the 68-95-99.7 rule to determine when data values lie 1, 2, or 3 standard deviations from the mean. For example, suppose that 1,000 students take an exam and the scores are normally distributed with a mean of μ = 75 and a standard deviation of σ = 7.

Figure 5.19 A normal distribution of test scores with a mean of 75 and a standard deviation of 7. (a) 68% of the scores lie within 1 standard deviation of the mean. (b) 95% of the scores lie within 2 standard deviations of the mean.

Identifying Unusual Results

In statistics, we often need to distinguish values that are typical, or “usual,” from values that are “unusual.” By applying the 68-95-99.7 rule, we find that about 95% of all values from a normal distribution lie within 2 standard deviations of the mean. This implies that, among all values, about 5% lie more than 2 standard deviations away from the mean. We can use this property to identify values that are relatively “unusual”: Unusual values are values that are more than 2 standard deviations away from the mean.

EXAMPLE 4 Normal Heart Rate
You measure your resting heart rate at noon every day for a year and record the data. You discover that the data have a normal distribution with a mean of 66 and a standard deviation of 4. On how many days was your heart rate below 58 beats per minute?

Solution: A heart rate of 58 is 8 (or 2 standard deviations) below the mean. According to the 68-95-99.7 rule, about 95% of the data points are within 2 standard deviations of the mean.
Therefore, about 2.5% of the data points are more than 2 standard deviations below the mean, and about 2.5% of the data points are more than 2 standard deviations above the mean. On 2.5% of 365 days, or about 9 days, your measured heart rate was below 58 beats per minute.

EXAMPLE 5 Finding a Percentile
On a visit to the doctor’s office, your fourth-grade daughter is told that her height is 1 standard deviation above the mean for her age and sex. What is her percentile for height? Assume that heights of fourth-grade girls are normally distributed.

Solution: Recall that a data value lies in the nth percentile of a distribution if n% of the data values are less than or equal to it (see Section 4.3). According to the 68-95-99.7 rule, 68% of the heights are within 1 standard deviation of the mean. Therefore, 34% of the heights (half of 68%) are between 0 and 1 standard deviation above the mean. We also know that, because the distribution is symmetric, 50% of all heights are below the mean. Therefore, 50% + 34% = 84% of all heights are less than 1 standard deviation above the mean (Figure 5.21). Your daughter is in the 84th percentile for heights among fourth-grade girls.

Figure 5.21 Normal distribution curve showing 84% of scores less than 1 standard deviation above the mean.

Standard Scores

Computing Standard Scores
The number of standard deviations a data value lies above or below the mean is called its standard score (or z-score), defined by

$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}}$$

The standard score is positive for data values above the mean and negative for data values below the mean.

EXAMPLE 6 Finding Standard Scores
The Stanford-Binet IQ test is scaled so that scores have a mean of 100 and a standard deviation of 16.
Find the standard scores for IQs of 85, 100, and 125.

Solution: We calculate the standard scores for these IQs by using the standard score formula with a mean of 100 and standard deviation of 16.

standard score for 85:
$$z = \frac{85 - 100}{16} \approx -0.94$$

standard score for 100:
$$z = \frac{100 - 100}{16} = 0.00$$

standard score for 125:
$$z = \frac{125 - 100}{16} \approx 1.56$$

We can interpret these standard scores as follows: 85 is 0.94 standard deviation below the mean, 100 is equal to the mean, and 125 is 1.56 standard deviations above the mean.

Figure 5.22 shows the values on the distribution of IQ scores from Example 6.

Figure 5.22 Standard scores for IQ scores of 85, 100, and 125.

Standard Scores and Percentiles

Once we know the standard score of a data value, the properties of the normal distribution allow us to find its percentile in the distribution. This is usually done with a standard score table, such as Table 5.1. (Appendix A has a more detailed standard score table.)

EXAMPLE 7 Cholesterol Levels
Cholesterol levels in men 18 to 24 years of age are normally distributed with a mean of 178 and a standard deviation of 41.

a. What is the percentile for a 20-year-old man with a cholesterol level of 190?

Solution: a. The standard score for a cholesterol level of 190 is

$$z = \text{standard score} = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{190 - 178}{41} \approx 0.29$$

Table 5.1 shows that a standard score of 0.29 corresponds to about the 61st percentile.

b.
What cholesterol level corresponds to the 90th percentile, the level at which treatment may be necessary?

Solution: b. Table 5.1 shows that 90.32% of all data values have a standard score less than 1.3. Thus, the 90th percentile is about 1.3 standard deviations above the mean. Given the mean cholesterol level of 178 and the standard deviation of 41, a cholesterol level 1.3 standard deviations above the mean is

$$178 + (1.3 \times 41) = 231.3$$

A cholesterol level of about 231 corresponds to the 90th percentile.

Toward Probability

Suppose you pick a baby at random and ask whether the baby was born more than 15 days prior to his or her due date. Because births are normally distributed around the due date with a standard deviation of 15 days, we know that 16% of all births occur more than 15 days prior to the due date (see Example 3). For an individual baby chosen at random, we can therefore say that there’s a 0.16 chance (about 1 in 6) that the baby was born more than 15 days early. In other words, the properties of the normal distribution allow us to make a probability statement about an individual. In this case, our statement is that the probability of a birth occurring more than 15 days early is 0.16. This example shows that the properties of the normal distribution can be restated in terms of ideas of probability.

Section 5.3
THE CENTRAL LIMIT THEOREM

Suppose we roll one die 1,000 times and record the outcome of each roll, which can be the number 1, 2, 3, 4, 5, or 6. Figure 5.23 shows a histogram of outcomes. All six outcomes have roughly the same relative frequency, because the die is equally likely to land in each of the six possible ways. That is, the histogram shows a (nearly) uniform distribution (see Section 4.2). It turns out that the distribution in Figure 5.23 has a mean of 3.41 and a standard deviation of 1.73.
Figure 5.23 Frequency and relative frequency distribution of outcomes from rolling one die 1,000 times.

Now suppose we roll two dice 1,000 times and record the mean of the two numbers that appear on each roll. To find the mean for a single roll, we add the two numbers and divide by 2. Figure 5.25a shows a typical result. The most common values in this distribution are the central values 3.0, 3.5, and 4.0. These values are common because they can occur in several ways. The mean and standard deviation for this distribution are 3.43 and 1.21, respectively.

Figure 5.25a Frequency and relative frequency distribution of sample means from rolling two dice 1,000 times.

Suppose we roll five dice 1,000 times and record the mean of the five numbers on each roll. A histogram for this experiment is shown in Figure 5.25b. Once again we see that the central values around 3.5 occur most frequently, but the spread of the distribution is narrower than in the two previous cases. The mean and standard deviation are 3.46 and 0.74, respectively.

Figure 5.25b Frequency and relative frequency distribution of sample means from rolling five dice 1,000 times.

If we further increase the number of dice to ten on each of 1,000 rolls, we find the histogram in Figure 5.25c, which is even narrower. In this case, the mean is 3.49 and the standard deviation is 0.56.

Figure 5.25c Frequency and relative frequency distribution of sample means from rolling ten dice 1,000 times.

Table 5.2 shows that as the sample size increases, the mean of the distribution of means approaches the value 3.5 and the standard deviation becomes smaller (making the distribution narrower). More important, the distribution looks more and more like a normal distribution as the sample size increases.
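The dice experiments described above are easy to repeat on a computer. The sketch below is an added illustration rather than the procedure used to produce the figures. It simulates fair dice with Python's standard `random` module; the function name `simulate_sample_means` and the fixed seed are assumptions made for this example. Because the rolls are random, the printed numbers will be close to, but not identical to, the values quoted in the slides.

```python
import random
from statistics import mean, pstdev


def simulate_sample_means(num_dice: int, num_rolls: int = 1000, seed: int = 0) -> list:
    """Roll `num_dice` fair dice `num_rolls` times; return the mean of each roll."""
    rng = random.Random(seed)  # arbitrary fixed seed so the run is repeatable
    return [
        mean(rng.randint(1, 6) for _ in range(num_dice))  # mean of one roll of the dice
        for _ in range(num_rolls)
    ]


# Sample sizes used in the slides: one, two, five, and ten dice.
for n in (1, 2, 5, 10):
    sample_means = simulate_sample_means(n)
    print(
        f"n = {n:2d}: mean of means = {mean(sample_means):.2f}, "
        f"standard deviation = {pstdev(sample_means):.2f}"
    )
```

As the number of dice grows, the mean of the simulated sample means should stay near 3.5 while their standard deviation shrinks, which is the pattern summarized for Table 5.2.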
The Central Limit Theorem

Suppose we take many random samples of size n for a variable with any distribution (not necessarily a normal distribution) and record the distribution of the means of each sample. Then,
1. The distribution of means will be approximately a normal distribution for large sample sizes.
2. The mean of the distribution of means approaches the population mean, μ, for large sample sizes.
3. The standard deviation of the distribution of means approaches σ/√n for large sample sizes, where σ is the standard deviation of the population.

Be sure to note the very important adjustment, described by item 3 above, that must be made when working with samples or groups instead of individuals: The standard deviation of the distribution of sample means is not the standard deviation of the population, σ, but rather σ/√n, where n is the size of the samples.

TECHNICAL NOTE
(1) For practical purposes, the distribution of means will be nearly normal if the sample size is larger than 30.
(2) If the original population is normally distributed, then the sample means will be normally distributed for any sample size n.
(3) In the ideal case, where the distribution of means is formed from all possible samples, the mean of the distribution of means equals μ and the standard deviation of the distribution of means equals σ/√n.

Figure 5.26 As the sample size increases (n = 5, 10, 30), the distribution of sample means approaches a normal distribution, regardless of the shape of the original distribution. The larger the sample size, the smaller is the standard deviation of the distribution of sample means.
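The σ/√n adjustment can be checked against the dice experiments earlier in this section. The worked calculation below is an added illustration, assuming a fair six-sided die so that each outcome has probability 1/6. The symbols μ and σ denote the population mean and standard deviation of a single roll.

```latex
\begin{align*}
\mu &= \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5 \\
\sigma &= \sqrt{\frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \cdots + (6 - 3.5)^2}{6}}
        = \sqrt{\frac{17.5}{6}} \approx 1.71 \\
\frac{\sigma}{\sqrt{2}} &\approx 1.21, \qquad
\frac{\sigma}{\sqrt{5}} \approx 0.76, \qquad
\frac{\sigma}{\sqrt{10}} \approx 0.54
\end{align*}
```

These predicted values are close to the standard deviations of 1.21, 0.74, and 0.56 reported for the experiments with two, five, and ten dice. The small differences are what we would expect from 1,000 random rolls.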
EXAMPLE 1 Predicting Test Scores
You are a middle school principal and your 100 eighth-graders are about to take a national standardized test. The test is designed so that the mean score is μ = 400 with a standard deviation of σ = 70. Assume the scores are normally distributed.

a. What is the likelihood that one of your eighth-graders, selected at random, will score below 375 on the exam?

Solution: a. In dealing with an individual score, we use the method of standard scores discussed in Section 5.2. Given the mean of 400 and standard deviation of 70, a score of 375 has a standard score of

$$z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{70} \approx -0.36$$

According to Table 5.1, a standard score of -0.36 corresponds to about the 36th percentile; that is, 36% of all students can be expected to score below 375. Thus, there is about a 0.36 chance that a randomly selected student will score below 375. Notice that we need to know that the scores have a normal distribution in order to make this calculation, because the table of standard scores applies only to normal distributions.

b. Your performance as a principal depends on how well your entire group of eighth-graders scores on the exam. What is the likelihood that your group of 100 eighth-graders will have a mean score below 375?

Solution: b. The question about the mean of a group of students must be handled with the Central Limit Theorem.
According to this theorem, if we take random samples of size n = 100 students and compute the mean test score of each group, the distribution of means is approximately normal. Moreover, the mean of this distribution is μ = 400 and its standard deviation is σ/√n = 70/√100 = 7. With these values for the mean and standard deviation, the standard score for a mean test score of 375 is

$$z = \frac{\text{data value} - \text{mean}}{\text{standard deviation}} = \frac{375 - 400}{7} \approx -3.57$$

Table 5.1 shows that a standard score of -3.5 corresponds to the 0.02 percentile, and the standard score in this case is even lower. In other words, fewer than 0.02% of all random samples of 100 students will have a mean score of less than 375.

Therefore, the chance that a randomly selected group of 100 students will have a mean score below 375 is less than 0.0002, or about 1 in 5,000. Notice that this calculation regarding the group mean did not depend on the individual scores’ having a normal distribution.

This example has an important lesson. The likelihood of an individual scoring below 375 is more than 1 in 3 (36%), but the likelihood of a group of 100 students having a mean score below 375 is less than 1 in 5,000 (0.02%). In other words, there is much more variation in the scores of individuals than in the means of groups of individuals.

The Value of the Central Limit Theorem

The Central Limit Theorem allows us to say something about the mean of a group if we know the mean, μ, and the standard deviation, σ, of the entire population. This can be useful, but it turns out that the opposite application is far more important. Two major activities of statistics are making estimates of population means and testing claims about population means.
Is it possible to make a good estimate of the population mean knowing only the mean of a much smaller sample? As you can probably guess, being able to answer this type of question lies at the heart of statistical sampling, especially in polls and surveys. The Central Limit Theorem provides the key to answering such questions.
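Returning to Example 1 of this section, the contrast between an individual score and a group mean can be reproduced without a printed table. The sketch below is an added illustration that assumes Python's standard `statistics.NormalDist` class in place of Table 5.1. The variable names are hypothetical, and the numbers (mean 400, standard deviation 70, group size 100, cutoff 375) are taken from the example.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 400.0  # mean score, from Example 1
population_sd = 70.0     # standard deviation of individual scores, from Example 1
group_size = 100         # number of eighth-graders in the group
cutoff = 375.0           # score of interest

# Individual scores: normal with the population standard deviation.
individual_scores = NormalDist(mu=population_mean, sigma=population_sd)
p_individual = individual_scores.cdf(cutoff)

# Group means: same mean, but the standard deviation shrinks to sigma / sqrt(n).
group_means = NormalDist(mu=population_mean, sigma=population_sd / sqrt(group_size))
p_group = group_means.cdf(cutoff)

print(f"chance an individual scores below {cutoff:.0f}: {p_individual:.2f}")
print(f"chance the group mean is below {cutoff:.0f}: {p_group:.5f}")
```

The first probability comes out at about 0.36 and the second falls below 0.0002, matching the conclusions of the example.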
