Distribution of Data and the Empirical Rule

Copyright © Houghton Mifflin Company. All rights reserved.

■ Stem-and-Leaf Diagrams
■ Frequency Distributions and Histograms
■ Normal Distributions and the Empirical Rule
■ z-Scores

■ Stem-and-Leaf Diagrams

Although the mean, the median, the mode, and the standard deviation provide some information about a set of data and the distribution of the data, it is often helpful to use graphical procedures that visually illustrate precisely how the values in a set of data are distributed. Many small sets of data can be graphically displayed by using a stem-and-leaf diagram. For instance, consider the following history test scores:

65, 72, 96, 86, 43, 61, 75, 86, 49, 68, 98, 74, 84, 78, 85, 75, 86, 73

In this form the data are called raw data because the data have not been organized. With raw data it is generally difficult to observe how the data are distributed. In the stem-and-leaf diagram below, we have organized the test scores by placing all the scores that are in the 40s in the top row, the scores that are in the 50s in the second row, the scores that are in the 60s in the third row, and so on. The tens digits of the scores have been placed to the left of the vertical line. In this diagram they are referred to as stems. The ones digits of the test scores have been placed in the proper row to the right of the vertical line. In this diagram they are the leaves.

A Stem-and-Leaf Diagram of a Set of History Test Scores
Stems | Leaves
4 | 3 9
5 |
6 | 1 5 8
7 | 2 3 4 5 5 8
8 | 4 5 6 6 6
9 | 6 8
Legend: 8 | 6 represents 86

It is now easy to make observations about the distribution of the scores. Only two of the scores are in the 90s, six of the scores are in the 70s, and none of the scores are in the 50s. The lowest score is 43 and the highest is 98.

Steps in the Construction of a Stem-and-Leaf Diagram
1. Determine the stems and list the stems in a column from smallest to largest.
2. List the remaining digit of each data value as a leaf to the right of its stem.
3. Include a legend that explains the meaning of the stem and the leaves. Include a title for the diagram.

The choice of how many leading digits to use as the stem will depend on the particular application and is best explained with an example.

EXAMPLE 1 Construct a Stem-and-Leaf Diagram
A travel agent has recorded the amount spent by customers for a cruise. Construct a stem-and-leaf diagram for the data.

Amount Spent for a Cruise, Summer of 2003
$3600 $4700 $7200 $2100 $5700 $4400 $9400
$6200 $5900 $2100 $4100 $5200 $7300 $6200
$3800 $4900 $5400 $5400 $3100 $3100 $4500
$4500 $2900 $3700 $3700 $4800 $4800 $2400

Solution
One method of choosing the stems is to let each thousands digit be a stem and each hundreds digit be a leaf. If the stems and leaves are assigned in this manner, then the notation 2 | 1, which has a stem of 2 and a leaf of 1, represents a cost of $2100, and the notation 5 | 4 represents a cost of $5400. The diagram can now be constructed by writing all of the stems, from smallest to largest, in a column to the left of a vertical line and writing the corresponding leaves to the right of the vertical line.

Amount Spent for a Cruise
Stems | Leaves
2 | 1 1 4 9
3 | 1 1 6 7 7 8
4 | 1 4 5 5 7 8 8 9
5 | 2 4 4 7 9
6 | 2 2
7 | 2 3
8 |
9 | 4
Legend: 7 | 3 represents $7300

CHECK YOUR PROGRESS 1 The following table lists the ages of the customers who purchased a cruise. Construct a stem-and-leaf diagram for the data.

Ages of Customers Who Purchased a Cruise
32 45 66 21 62 68 61
55 23 38 44 77 46 50
33 35 42 45 51 28 40
41 52 52 72 64 51 33

Solution See page S1.

Sometimes two sets of data can be compared by using a back-to-back stem-and-leaf diagram, which has common stems, with leaves from one data set displayed to the right of the stems and leaves from the other data set displayed to the left of the stems.
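The three construction steps, and the back-to-back layout just described, can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original lesson: the function names `stem_and_leaf`, `render`, and `back_to_back` are hypothetical, the data are assumed to be non-negative integers, and the assumed parameter `leaf_unit` gives the place value of the leaf digit, so `leaf_unit=100` reproduces the thousands-digit stems and hundreds-digit leaves chosen in Example 1.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Map each stem, smallest to largest, to its sorted list of leaves.

    Assumes non-negative integers. leaf_unit is the place value of the leaf
    digit (1 for ones, 100 for hundreds); digits to its right are truncated.
    Stems with no leaves between the smallest and largest stem are kept.
    """
    groups = defaultdict(list)
    for v in values:
        scaled = v // leaf_unit              # drop digits below the leaf digit
        groups[scaled // 10].append(scaled % 10)
    lo, hi = min(groups), max(groups)
    return {stem: sorted(groups.get(stem, [])) for stem in range(lo, hi + 1)}


def render(diagram):
    """Format a diagram as 'stem | leaves' rows, one stem per line."""
    width = len(str(max(diagram)))
    rows = []
    for stem, leaves in diagram.items():
        rows.append(f"{stem:>{width}} | {' '.join(map(str, leaves))}".rstrip())
    return "\n".join(rows)


def back_to_back(left_values, right_values, leaf_unit=1):
    """Rows of a back-to-back diagram: left leaves | stem | right leaves.

    Leaves are written in increasing order on both sides, as in the
    biology diagram that follows.
    """
    left = stem_and_leaf(left_values, leaf_unit)
    right = stem_and_leaf(right_values, leaf_unit)
    stems = set(left) | set(right)
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        l = " ".join(map(str, left.get(stem, [])))
        r = " ".join(map(str, right.get(stem, [])))
        rows.append(f"{l} | {stem} | {r}".strip())
    return rows
```

For the history test scores, `print(render(stem_and_leaf(scores)))` prints the rows `4 | 3 9`, `5 |`, `6 | 1 5 8`, `7 | 2 3 4 5 5 8`, `8 | 4 5 6 6 6`, and `9 | 6 8`. Keeping stems that have no leaves (5 here, 8 in Example 1) mirrors the diagrams above.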
For instance, the following back-to-back stem-and-leaf diagram shows the test scores for two biology classes that took the same test.

Biology Test Scores
8 A.M. class | Stem | 10 A.M. class
2 | 4 | 5 8
7 | 5 | 6 7 9 9
5 8 | 6 | 2 3 4 8
1 2 3 3 3 7 8 | 7 | 1 3 3 5 5 6 8
4 4 5 5 6 8 8 9 | 8 | 2 3 6 6 6
2 4 5 5 8 | 9 | 4 5
Legend (8 A.M. class): 3 | 7 represents 73
Legend (10 A.M. class): 8 | 2 represents 82

QUESTION Which biology class did better on the test?

ANSWER The 8 A.M. class did better on the test because it had more scores in the 80s and 90s and fewer scores in the 40s, 50s, and 60s. The scores in the 70s were similar for both classes.

■ Frequency Distributions and Histograms

Large sets of data are often displayed using a frequency distribution or a histogram. For example, consider the following situation. An Internet service provider (ISP) has installed new computers. To estimate the new download times its subscribers will experience, the ISP surveyed 1000 subscribers to determine the time each subscriber required to download a particular file from the Internet site music.net. The results of that survey are summarized in the following table.

A grouped frequency distribution
Download time (in seconds) | Number of subscribers
0–10 | 28
10–20 | 129
20–30 | 355
30–40 | 345
40–50 | 121
50–60 | 22

[Figure: A histogram of the frequency distribution. Horizontal axis: download time, in seconds (0 to 60); vertical axis: number of subscribers (0 to 400).]

The table above is called a grouped frequency distribution. It shows how frequently certain events occurred. Each interval 0–10, 10–20, ... is called a class. This distribution has six classes. For the 10–20 class, 10 is the lower class boundary and 20 is the upper class boundary. Any data value that lies on a common boundary is assigned to the higher class.
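One way to build a grouped frequency distribution in code is to place each value into a class with `bisect.bisect_right`, which puts a value equal to a boundary into the class on its right; that matches the rule that a value on a common boundary is assigned to the higher class. This is an illustrative sketch: `grouped_frequencies` is a hypothetical helper, and treating the last upper boundary as inclusive is an assumption, since the text does not say how a value at the very top of the range is handled.

```python
from bisect import bisect_right


def grouped_frequencies(data, boundaries):
    """Count values in the classes defined by consecutive boundaries.

    boundaries must be increasing, e.g. [0, 10, 20, 30, 40, 50, 60].
    A value equal to an interior boundary goes to the higher class.
    Assumption: the last upper boundary is inclusive, so a value equal to
    it is counted in the last class instead of being dropped.
    """
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        if x < boundaries[0] or x > boundaries[-1]:
            raise ValueError(f"{x} lies outside the classes")
        # bisect_right returns the index just past any boundary equal to x
        i = bisect_right(boundaries, x) - 1
        counts[min(i, len(counts) - 1)] += 1
    return counts
```

For example, `grouped_frequencies([0, 5, 10, 19.9, 20, 30], [0, 10, 20, 30])` returns `[2, 2, 2]`: the values 10 and 20 lie on common boundaries and move up to the 10–20 and 20–30 classes.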
The graph of a frequency distribution is called a histogram. A histogram provides a pictorial view of how the data are distributed. In the histogram above, the height of each bar indicates the number of subscribers whose download times fall in the class shown beneath that bar on the horizontal axis.

The center point of a class is called a class mark. In the histogram above, the class marks 5, 15, 25, 35, 45, and 55 are shown by the red tick marks on the horizontal axis.

Instead of using classes with a width of 10 seconds, the ISP could have chosen a smaller class width. A smaller class width produces more classes. For instance, if each class width were 5 seconds, the frequency distribution and histogram for the music.net example would have the form shown below.

A frequency distribution with 12 classes
Download time (in seconds) | Number of subscribers
0–5 | 8
5–10 | 20
10–15 | 40
15–20 | 89
20–25 | 155
25–30 | 200
30–35 | 196
35–40 | 149
40–45 | 76
45–50 | 45
50–55 | 14
55–60 | 8

[Figure: A histogram of the 12-class frequency distribution. Horizontal axis: download time, in seconds (0 to 60); vertical axis: number of subscribers (0 to 200).]

Examine the following distribution. It shows the percent of subscribers who are in each class, as opposed to the frequency distribution above, which shows the number of subscribers in each class. The type of frequency distribution that lists the percent of data in each class is called a relative frequency distribution. The relative frequency histogram below was drawn by using the data in the relative frequency distribution. It shows the percent of subscribers along its vertical axis.
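The percents in the relative frequency distribution that follows are the 12-class counts above divided by the total of 1000 subscribers and multiplied by 100. A minimal Python sketch of that conversion is shown here; it also computes class marks and sums the percents of a run of classes, which is the calculation Example 2 relies on. The helper names are hypothetical, and each class is described by its lower boundary plus a common width of 5 seconds.

```python
# Frequencies for the 5-second classes 0-5, 5-10, ..., 55-60 (music.net survey)
COUNTS = [8, 20, 40, 89, 155, 200, 196, 149, 76, 45, 14, 8]
WIDTH = 5
LOWER = [WIDTH * i for i in range(len(COUNTS))]   # lower boundaries 0, 5, ..., 55


def relative_percents(counts):
    """Percent of the data in each class."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def class_marks(lower, width):
    """Center point (class mark) of each class."""
    return [lo + width / 2 for lo in lower]


def percent_in_range(lower, width, percents, start, stop):
    """Sum the percents of the classes that lie entirely within [start, stop)."""
    return sum(p for lo, p in zip(lower, percents)
               if lo >= start and lo + width <= stop)


percents = relative_percents(COUNTS)
# Round to one decimal place to hide floating-point noise in the sums
at_least_25 = round(percent_in_range(LOWER, WIDTH, percents, 25, 60), 1)
five_to_20 = round(percent_in_range(LOWER, WIDTH, percents, 5, 20), 1)
```

Here `at_least_25` is 68.8 and `five_to_20` is 14.9, the two percents found in Example 2; dividing by 100 turns the second one into the probability 0.149.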
A relative frequency distribution
Download time (in seconds) | Percent of subscribers
0–5 | 0.8
5–10 | 2.0
10–15 | 4.0
15–20 | 8.9
20–25 | 15.5
25–30 | 20.0
30–35 | 19.6
35–40 | 14.9
40–45 | 7.6
45–50 | 4.5
50–55 | 1.4
55–60 | 0.8

[Figure: A relative frequency histogram. Horizontal axis: download time, in seconds (0 to 60); vertical axis: percent of subscribers (0 to 20).]

One advantage of using a relative frequency distribution instead of a frequency distribution is that there is a direct correspondence between the percent of the data that lie in a particular portion of the relative frequency distribution and probability. For instance, in the relative frequency distribution above, the percent of the data that lie between 35 and 40 seconds is 14.9%. Thus, if a subscriber is chosen at random, the probability that the subscriber will require between 35 and 40 seconds to download the music file is 0.149.

EXAMPLE 2 Use a Relative Frequency Distribution
Use the music.net relative frequency distribution above to determine
a. the percent of subscribers who required at least 25 seconds to download the file.
b. the probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file.

Solution
a. The percent of data in all classes with a lower bound of 25 seconds or more is the sum of the percents for the classes 25–30 through 55–60:
20.0% + 19.6% + 14.9% + 7.6% + 4.5% + 1.4% + 0.8% = 68.8%
The percent of subscribers who required at least 25 seconds to download the file is 68.8%.
b. The percent of data in all classes with a lower bound of at least 5 seconds and an upper bound of 20 seconds or less is the sum of the percents for the classes 5–10, 10–15, and 15–20:
2.0% + 4.0% + 8.9% = 14.9%
Thus the percent of subscribers who required from 5 to 20 seconds to download the file is 14.9%. The probability that a subscriber chosen at random will require from 5 to 20 seconds to download the file is 0.149.

CHECK YOUR PROGRESS 2 Use the relative frequency distribution below to determine
a. the percent of the states that pay an average teacher salary of at least $45,000.
b. the probability that a state selected at random pays an average teacher salary of at least $30,000 but less than $39,000.

Average Salaries of Public School Teachers, 1998–1999
Average salary, s | Number of states | Relative frequency
$27,000 ≤ s < $30,000 | 3 | 6%
$30,000 ≤ s < $33,000 | 7 | 14%
$33,000 ≤ s < $36,000 | 12 | 24%
$36,000 ≤ s < $39,000 | 9 | 18%
$39,000 ≤ s < $42,000 | 6 | 12%
$42,000 ≤ s < $45,000 | 3 | 6%
$45,000 ≤ s < $48,000 | 5 | 10%
$48,000 ≤ s < $51,000 | 3 | 6%
$51,000 ≤ s < $54,000 | 2 | 4%
Source: www.nea.org.

Solution See page S1.

There is a geometric analogy between the percents of data and probabilities we calculated in Example 2 and the relative frequency histogram for the data. For instance, the percent of data described in part a. of Example 2 corresponds to the total area of the red bars in the histogram on the left below. The percent of data described in part b. corresponds to the total area of the blue bars in the histogram on the right below.

[Figure: Two relative frequency histograms of download time, in seconds (0 to 60), versus percent of subscribers (0 to 20). Left: "25 seconds or more," with those bars in red. Right: "At least 5 but less than 20 seconds," with those bars in blue.]

■ Normal Distributions and the Empirical Rule

A histogram for a set of data provides us with a tool that can indicate patterns or trends in the distribution of data.
The terms uniform, skewed, symmetrical, and normal are used to describe the distributions of some sets of data.

A uniform distribution, shown in the figure below, is generated when all of the observed events occur with the same frequency. The graph of a uniform distribution remains at the same height over the range of the data. Some random processes produce distributions that are uniform or nearly uniform. For example, if the spinner below is used to generate numbers, then in the long run each of the numbers 1, 2, 3, ..., 8 will be generated with approximately the same frequency.

[Figure: A spinner numbered 1 to 8 used as a random number generator, and the uniform distribution of x it produces (frequency of x for x = 1 to 8).]

A symmetrical distribution, shown below, is symmetrical about a vertical center line. If you fold a symmetrical distribution along the center line, the right side of the distribution will match the left side. The following data sets are examples of distributions that are nearly symmetrical: the weights of all male students, the heights of all teenage females, the prices of a gallon of regular gasoline in a large city, the mileages for a particular type of automobile tire, and the amounts of soda dispensed by a vending machine. In a symmetrical distribution with a single peak, the mean, the median, and the mode are all equal, and they are located at the center of the distribution.

Skewed distributions, shown in the figures below, have a longer tail on one side of the distribution and a shorter tail on the other side. A distribution is skewed to the left if it has a longer tail on the left and is skewed to the right if it has a longer tail on the right. In a distribution that is skewed to the left, the mean is less than the median, which is less than the mode.
In a distribution that is skewed to the right, the mode is less than the median, which is less than the mean.

[Figure: A symmetrical distribution with its center line (mean = median = mode); a distribution skewed left, with the mean, median, and mode in that order from left to right; a distribution skewed right, with the mode, median, and mean in that order from left to right.]

Many examinations yield test scores that have skewed distributions. For instance, if a test designed for students in the sixth grade is given to students in a ninth grade class, most of the scores will be high, and the distribution of the test scores will be skewed to the left.

Discrete values are separated from each other by an increment, or "space." For example, only whole numbers are used to record the number of points a basketball player scores in a game. The possible numbers of points s that the player can score are restricted to the discrete values 0, 1, 2, 3, 4, .... The variable s is a discrete variable. Different scores are separated from each other by at least 1 point. Any variable that is based on counting procedures is a discrete variable. Histograms are generally used to show the distribution of discrete variables.

Continuous values are values that can take on all real numbers in some interval. For example, the possible times that it takes to drive to the grocery store are continuous values. The time is not restricted to natural numbers such as 4 minutes or 5 minutes. In fact, the time may include any fraction of a minute, or even of a second if we care to measure that precisely. A variable such as time that is based on measuring with smaller and smaller units is a continuous variable. Continuous curves, rather than histograms, are used to show the distributions of continuous variables.

[Figure: Distributions of continuous variables: a. Bimodal; b. Skewed right; c. Symmetrical.]

In some cases a continuous curve is used to display the distribution of a set of discrete data.
For instance, when we have a large set of data and the class intervals are very small, the shape of the top of the histogram approaches a smooth curve. (See the two figures below.) Thus, when graphing the distribution of very large sets of data with very small class intervals, it is common practice to replace the histogram with a smooth continuous curve.

[Figure: A histogram for discrete data, and a continuous distribution curve, each plotted as f(x) against x.]

One of the most important statistical distributions is known as a normal distribution. The precise mathematical definition of a normal distribution is given by the equation in the Take Note below; however, for many problems it is sufficient to know that all normal distributions have the following properties.

TAKE NOTE If x is a continuous variable with mean µ (the Greek letter mu) and standard deviation σ (the Greek letter sigma), then its normal distribution is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Properties of a Normal Distribution
A normal distribution has a bell shape that is symmetric about a vertical line through its center. The mean, the median, and the mode of a normal distribution are all equal, and they are located at the center of the distribution.

[Figure: A normal distribution. About 34.1% of the data lies between µ and µ + σ (and 34.1% between µ − σ and µ), 13.6% between µ + σ and µ + 2σ (and between µ − 2σ and µ − σ), and 2.15% between µ + 2σ and µ + 3σ (and between µ − 3σ and µ − 2σ). In all, 68.2% of the data lies within µ ± σ, 95.4% within µ ± 2σ, and 99.7% within µ ± 3σ.]

The Empirical Rule
In a normal distribution, about
68.2% of the data lies within 1 standard deviation of the mean,
95.4% of the data lies within 2 standard deviations of the mean, and
99.7% of the data lies within 3 standard deviations of the mean.

The Empirical Rule can be used to solve many problems that involve a normal distribution.

EXAMPLE 3 Use the Empirical Rule
A survey of 1000 U.S. gas stations found that the price charged for a gallon of regular gas can be closely approximated by a normal distribution with a mean of $1.90 and a standard deviation of $0.20. How many of the stations charge
a. between $1.50 and $2.30 for a gallon of regular gas?
b. less than $2.10 for a gallon of regular gas?
c. more than $2.30 for a gallon of regular gas?

Solution
a. The $1.50 per gallon price is 2 standard deviations below the mean. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data lies within 2 standard deviations of the mean.
[Figure: Data within 2σ of µ. 95.4% of the data lies between µ − 2σ and µ + 2σ.]
Therefore, approximately
95.4%(1000) = 0.954(1000) = 954
of the stations charge between $1.50 and $2.30 for a gallon of regular gas.
b. The $2.10 price is 1 standard deviation above the mean. In a normal distribution, 34.1% of all data lies between the mean and 1 standard deviation above the mean.
[Figure: Data less than 1σ above µ. 50% of the data lies below µ and 34.1% lies between µ and µ + σ, for a total of 84.1% of the data.]
Thus, approximately
34.1%(1000) = 0.341(1000) = 341
of the stations charge between $1.90 and $2.10 for a gallon of regular gasoline. Half of the stations charge less than the mean. Therefore, about
341 + 500 = 841
of the stations charge less than $2.10 for a gallon of regular gas. This problem can also be solved by computing 34.1% + 50% = 84.1% of 1000.
c. The $2.30 price is 2 standard deviations above the mean. In a normal distribution, 95.4% of all data is within 2 standard deviations of the mean. This means that the other 4.6% of the data will lie either more than 2 standard deviations above the mean or more than 2 standard deviations below the mean.
[Figure: Data more than 2σ above µ. 2.3% of the data lies above µ + 2σ, 2.3% lies below µ − 2σ, and 95.4% lies between.]
We are only interested in the data that lie more than 2 standard deviations above the mean, which is 1/2 of 4.6%, or 2.3%, of the data. Thus about
2.3%(1000) = 0.023(1000) = 23
of the stations charge more than $2.30 for a gallon of regular gas.

CHECK YOUR PROGRESS 3 A vegetable distributor knows that during the month of August, the weights of its tomatoes were normally distributed with a mean of 0.61 pound and a standard deviation of 0.15 pound.
a. What percent of the tomatoes weighed less than 0.76 pound?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 pound?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh between 0.31 and 0.91 pound?

Solution See page S1.

■ z-Scores

When you take a test, it is natural to wonder how you will do compared to the other students in the class. Will you finish in the top 10%, or will you be closer to the middle? One statistic that is used to measure the position of a data value with respect to other data values is known as the z-score.

z-Score
The z-score for a given data value x is the number of standard deviations that x lies above or below the mean of the data. A positive z-score means that x is above the mean; a negative z-score means that x is below the mean. The following formulas are used to calculate the z-score for a data value x.
Population: $z_x = \dfrac{x - \mu}{\sigma}$
Sample: $z_x = \dfrac{x - \bar{x}}{s}$

In the next example, we use a student's z-scores for two tests to determine how well the student did on each test in comparison to the other students.

EXAMPLE 4 Use z-Scores
a. Ruben has taken two tests in his math class. He scored 72 on the first test, for which the mean was 65 and the standard deviation was 8. He received a 60 on the second test, for which the mean was 45 and the standard deviation was 12. In comparison to the other students, did Ruben do better on the first or the second test?
b. Stacy is in the same math class as Ruben. Stacy's z-score for the first test was −0.75. What was Stacy's score on the first test?

TAKE NOTE In any application, the quantity x − µ and the standard deviation σ are both measured in the same units. Thus a z-score, which is the quotient of x − µ and σ, is a dimensionless measure.

Solution
a. The z-score formula yields
$z_{72} = \dfrac{72 - 65}{8} = 0.875$ and $z_{60} = \dfrac{60 - 45}{12} = 1.25$
Thus Ruben scored 0.875 standard deviations above the mean on his first test and 1.25 standard deviations above the mean on the second test. In comparison to his classmates, Ruben scored better on the second test than on the first test.
b. Substitute −0.75 for the z-score, 65 for the mean, and 8 for the standard deviation in the z-score formula, and solve for x.
$-0.75 = \dfrac{x - 65}{8}$
$-6 = x - 65$
$59 = x$
Stacy's score on the first test was 59.

CHECK YOUR PROGRESS 4
a. Cheryl took two quizzes in her history class. She scored 15 on the first quiz, for which the mean was 12 and the standard deviation was 2.4. Her score on the second quiz, for which the mean was 11.5 and the standard deviation was 2.2, was 14. In comparison to her classmates, did Cheryl do better on the first or the second quiz?
b. Greg is in the same history class as Cheryl. Greg's z-score for the first quiz was 2.5. What was Greg's score on the first quiz?

Solution See page S1.

Topics for Discussion
1. Is it possible, in a normal distribution of data, for the mean to be much larger than the median? Explain.
2. Must all large data sets have a normal distribution? Explain.
3. A professor gave a final examination to 110 students. Eighteen students had examination scores that were more than one standard deviation above the mean. Does this indicate that 18 of the students had examination scores that were more than one standard deviation below the mean? Explain.
4. A set of data consists of the 525 monthly salaries, listed in dollars, of the employees of a large company. What units should be used for the z-scores associated with the salaries? Explain.
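The Empirical Rule percentages, the station counts in Example 3, and the z-scores in Example 4 can be checked with Python's standard `math` module. This is a sketch under stated assumptions, not the text's method: it evaluates the normal curve from the Take Note directly and uses `math.erf` for exact normal proportions, which are about 68.27%, 95.45%, and 99.73%, so the rule's 68.2%, 95.4%, and 99.7% are slightly rounded-down values. All function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height f(x) of the normal curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sigma):
    """Proportion of a normal distribution that lies below x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def fraction_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def value_from_z(z, mean, sd):
    """Solve the z-score formula for x: x = mean + z * sd."""
    return mean + z * sd


# Example 3 with the exact normal curve (mean $1.90, sd $0.20, 1000 stations)
below_2_10 = round(1000 * normal_cdf(2.10, 1.90, 0.20))
above_2_30 = round(1000 * (1 - normal_cdf(2.30, 1.90, 0.20)))

# Example 4
ruben = (z_score(72, 65, 8), z_score(60, 45, 12))   # (0.875, 1.25)
stacy = value_from_z(-0.75, 65, 8)                   # 59.0
```

Using the exact curve gives 841 stations below $2.10 and 23 stations above $2.30, the same counts the Empirical Rule gave in Example 3, and it reproduces Ruben's z-scores and Stacy's score of 59.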
302360_File_B.qxd 7/7/03 7:18 AM Page 12 12 EXERCISES In Exercises 1 to 8, determine whether the given statement is true or false. 1. If a distribution is symmetric about a vertical line, then it is a normal distribution. 2. Every normal distribution has a bell-shaped graph. 3. In a normal distribution, the mean, the median, and the mode of the distribution all are located at the center of the distribution. 4. In a distribution that is skewed to the left, the median of the data is greater than the mean. 5. If a z-score for a data value x is negative, then x must also be negative. 6. In every data set, 68.2% of the data lies within 1 standard deviation of the mean. 7. Let x be the number of people who attend a baseball game. The variable x is a discrete variable. 8. The time of day d in the lobby of a bank is measured with a digital clock. The variable d is a continuous variable. In Exercises 9 and 10, use the Empirical Rule to answer each question. 9. In a normal distribution, what percent of the data lies a. within 2 standard deviations of the mean? b. more than 1 standard deviation above the mean? c. between 1 standard deviation below the mean and 2 standard deviations above the mean? 10. In a normal distribution, what percent of the data lies a. within 3 standard deviations of the mean? Business and Economics 11. State Sales Tax Rates Use the following frequency distribution to determine a. the percent of states in the U.S. that had a 2001 sales tax of at least 5%. b. the probability that a state selected at random had a 2001 sales tax rate of at least 3% but less than 5%. 2001 State Sales Tax Rate Tax rate, r Number of states Relative frequency 0% r 1% 5 10% 1% r 2% 0 0% 2% r 3% 1 2% 3% r 4% 0 0% 4% r 5% 13 26% 5% r 6% 15 30% 6% r 7% 13 26% 7% r 8% 3 6% Source: Time Almanac 2002 12. Waiting Time The amount of time customers spend waiting in line at a bank is normally distributed, with a mean of 3.5 minutes and a standard deviation of 0.75 minute. 
Find the probability that the time a customer will spend waiting is a. at most 2.75 minutes. b. less than 2 minutes. 13. Weights of Parcels During a particular week, an overnight delivery company found that the weights of its parcels were normally distributed, with a mean of 24 ounces and a standard deviation of 6 ounces. b. less than 2 standard deviations below the mean? a. What percent of the parcels weighed between 12 ounces and 30 ounces? c. between 2 standard deviations below the mean and 3 standard deviations above the mean? b. What percent of the parcels weighed more than 42 ounces? Copyright © Houghton Mifflin Company. All rights reserved. 302360_File_B.qxd 7/7/03 7:18 AM Page 13 Distribution of Data and the Empirical Rule 14. Weights of Boxes of Corn Flakes The weights of the boxes of corn flakes filled by a machine are normally distributed, with an average weight of 14.5 ounces and a standard deviation of 0.5 ounce. What percent of the boxes Social Sciences 19. Presidential Inauguration Ages and Ages at Death The table in Exercise 26 of Section 8.4 lists the U.S. presidents and their ages at inauguration. The table in Exercise 27 of Section 8.4 lists the Marshall/Liaison/Getty Images deceased U.S. presidents as of December 2002, and their ages at death. a. weigh less than 14.0 ounces? b. weigh between 13.5 and 15.0 ounces? 15. Duration of Long Distance Telephone Calls A telephone company has found that the lengths of its long distance telephone calls are normally distributed, with a mean of 225 seconds and a standard deviation of 55 seconds. What percent of its long distance calls last a. Construct a back-to-back stem-and-leaf diagram for the data in the tables. a. more than 335 seconds? b. b. between 170 and 390 seconds? 13 20. What patterns, if any, are evident from the diagram? Average Salaries of Teachers Use the following frequency distribution to determine a. the percent of states in the U.S. 
that paid a 1998 – 1999 average teacher salary of at least $39,000. Life and Health Sciences 16. Median Income for Physicians The 1995 median income for physicians was $160,000. (Source: AMA Center for Health Policy Research) The distribution of these incomes is skewed to the right. Is the mean of these incomes greater than or less than $160,000? b. the probability that a state selected at random paid a 1998 –1999 average teacher salary of at least $36,000 but less than $45,000. Average Salaries of Public School Teachers, 1998–1999 17. Heights of Women A survey of 1000 women aged 20 to 30 found that their heights are normally distributed, with a mean of 65 inches and a standard deviation of 2.5 inches. a. How many of the women have a height that is within 1 standard deviation of the mean? b. How many of the women have a height that is between 60 inches and 70 inches? 18. Distribution of Data Consider the set of the heights of all babies born in the United States during a particular year. Do you think this data set can be closely approximated by a normal distribution? Explain. Average salary, s Number of states Relative frequency $27,000 s $30,000 3 6% $30,000 s $33,000 7 14% $33,000 s $36,000 12 24% $36,000 s $39,000 9 18% $39,000 s $42,000 6 12% $42,000 s $45,000 3 6% $45,000 s $48,000 5 10% $48,000 s $51,000 3 6% $51,000 s $54,000 2 4% Source: www.nea.org. Copyright © Houghton Mifflin Company. All rights reserved. 302360_File_B.qxd 7/7/03 7:18 AM Page 14 14 21. Test Scores The following relative frequency histogram shows the distribution of test scores for 50 students who took a history test. Relative frequency 25% 20% 25. Comparison of Quiz Scores Ryan took two quizzes in his art class. He scored 45 on the first quiz, for which the mean was 51.4 and the standard deviation was 9.5. His score on the second quiz, for which the mean was 53.6 and the standard deviation was 7.2, was 49. 
In comparison to his classmates, did Ryan do better on the first or the second quiz?

[Figure: relative frequency histogram of test scores. Horizontal axis: Test scores, 28 to 100. Vertical axis: 0% to 15%.]

a. What percent of the students scored at least 76 on the test?
b. How many of the students received a score of at least 60 but less than 84?

22. Examination Duration Times. At a university, 500 law students took an examination. One student completed the exam in 24 minutes. The mode for the completion time is 50 minutes. The distribution of the times the students took to complete the exam is skewed to the left. Is the mean of these times greater than or less than 50 minutes?

23. Intelligence Quotients. A psychologist finds that the intelligence quotients of a group of patients are normally distributed, with a mean of 104 and a standard deviation of 26. Find the percent of the patients with IQs
a. above 130.
b. between 130 and 182.

24. Distribution of Data. The population of a resort city consists mostly of wealthy families and families with low incomes. Do you think the set of family incomes for this city can be closely approximated by a normal distribution? Explain.

26. Comparison of Test Scores. Tanya took two tests in her chemistry class. She scored 85 on the first test, for which the mean was 79.4 and the standard deviation was 6.4. Her score on the second test, for which the mean was 70.5 and the standard deviation was 5.3, was 78. In comparison to her classmates, did Tanya do better on the first or the second test?

Sports and Recreation

27. Super Bowl Scores. The following table lists the winning and losing scores for all of the Super Bowl games up to the year 2001.

Super Bowl Results, 1967–2001 (winning score–losing score)
35–10, 24–7, 27–10, 42–10, 49–26, 33–14, 16–6, 26–21, 20–16, 27–17, 16–7, 21–17, 27–17, 55–10, 35–21, 23–7, 32–14, 38–9, 20–19, 31–24, 16–13, 27–10, 38–16, 37–24, 34–19, 24–3, 35–31, 46–10, 52–17, 23–16, 14–7, 31–19, 39–20, 30–13, 34–7

a. Construct a back-to-back stem-and-leaf diagram for the winning scores and the losing scores.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

28. Ironman Triathlon. The following table lists the winning times for the men’s and women’s Ironman Triathlon World Championships, held in Kailua-Kona, Hawaii. (Source: http://www.3athlon.org/races/ironman/hawaii2001/statistik/index.php)

Ironman Triathlon World Championships (winning times rounded to the nearest minute), Men, 1978–2000, and Women, 1979–2000
11:47, 8:29, 8:20, 12:55, 9:35, 9:17, 11:16, 8:34, 8:21, 11:21, 9:01, 9:07, 9:25, 8:31, 8:04, 12:01, 9:01, 9:32, 9:38, 8:09, 8:33, 10:54, 9:14, 9:24, 9:08, 8:28, 8:24, 10:44, 9:08, 9:13, 9:06, 8:19, 8:17, 10:25, 8:55, 9:26, 8:54, 8:09, 8:21, 10:25, 8:58, 8:51, 8:08, 9:49, 9:20

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables. Hint: Use the two-digit “minutes” as your leaves, and insert a comma between the leaves in each row so that they can be easily distinguished from each other.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

29. Home Run Leaders. The following tables list the numbers of home runs hit by the home run leaders in the National and the American League from 1971 to 2001.

Home Run Leaders, 1971–2001
National League: 48, 40, 44, 36, 38, 38, 52, 40, 48, 48, 31, 37, 40, 36, 37, 37, 49, 39, 47, 40, 38, 35, 46, 43, 40, 47, 49, 70, 65, 50, 73
American League: 33, 37, 32, 32, 36, 32, 39, 46, 45, 41, 22, 39, 39, 43, 40, 40, 49, 42, 36, 51, 44, 43, 46, 40, 50, 52, 56, 56, 48, 47, 52

a. Construct a back-to-back stem-and-leaf diagram for the data in the tables.
b. What patterns, if any, are evident from the back-to-back stem-and-leaf diagram?

30. Race Times. The following relative frequency histogram shows the distribution of times for the 1200 contestants who finished a race.

[Figure: relative frequency histogram of race times. Horizontal axis: Time, in seconds, 50 to 120. Vertical axis: Relative frequency, 0% to 24%.]

a. What percent of the contestants finished the race in less than 80 seconds?
b. How many contestants had a time of at least 60 seconds but less than 80 seconds?

31. Baseball Attendance. A baseball franchise finds that the attendance at its home games is normally distributed, with a mean of 16,000 and a standard deviation of 4000.
a. What percent of the home games have an attendance between 8000 and 16,000?
b. What percent of the home games have an attendance of less than 12,000?

Physical Sciences and Engineering

32. Breaking Points of Ropes. The breaking points of a particular type of rope are normally distributed, with a mean of 350 pounds and a standard deviation of 24 pounds. What is the probability that a piece of this rope chosen at random will have a breaking point of
a. less than 326 pounds?
b. between 302 and 398 pounds?

33. Tire Mileage. The mileages of WearEver tires are normally distributed, with a mean of 48,000 miles and a standard deviation of 6000 miles. What is the probability that the WearEver tire you purchase will provide a mileage of
a. more than 60,000 miles?
b. between 42,000 and 54,000 miles?

34. Highway Speed of Vehicles. A study of 8000 vehicles that passed by a highway checkpoint found that their speeds were normally distributed, with a mean of 61 miles per hour and a standard deviation of 7 miles per hour.
a. How many of the vehicles had a speed of more than 68 miles per hour?
b. How many of the vehicles had a speed of less than 40 miles per hour?
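Exercises 23 and 31 through 34 place every bound a whole number of standard deviations from the mean, so each percentage can be read from the Empirical Rule. The Python sketch below is a minimal illustration, not the text's method: the helper names `z_score` and `empirical_rule_percent` are hypothetical. It converts each bound to a z-score, z = (x − mean)/σ, and adds up the Empirical Rule bands lying between the two z-scores. It assumes the band percentages obtained by halving the differences of 68%, 95%, and 99.7% (34%, 13.5%, 2.35%, and 0.15%), and it rejects bounds that fall between whole-number z-scores, because the Empirical Rule alone does not cover them.

```python
import math

# Band edges, measured in standard deviations from the mean.
EDGES = [-math.inf, -3, -2, -1, 0, 1, 2, 3, math.inf]
# Percent of a normal distribution in each band between consecutive edges,
# derived (as an assumption) from the Empirical Rule's 68%, 95%, 99.7%:
#   34 = 68/2, 13.5 = (95 - 68)/2, 2.35 = (99.7 - 95)/2, 0.15 = (100 - 99.7)/2
BANDS = [0.15, 2.35, 13.5, 34.0, 34.0, 13.5, 2.35, 0.15]


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def empirical_rule_percent(mean, sd, lower=None, upper=None):
    """Approximate percent of a normal distribution between lower and upper.

    Pass None for an open end ("less than" or "more than"). Each finite bound
    must lie a whole number of standard deviations (-3 to 3) from the mean.
    """
    lo = -math.inf if lower is None else z_score(lower, mean, sd)
    hi = math.inf if upper is None else z_score(upper, mean, sd)
    if lo not in EDGES or hi not in EDGES or lo >= hi:
        raise ValueError("bounds must lie at whole-number z-scores from -3 to 3")
    i, j = EDGES.index(lo), EDGES.index(hi)
    # Add the bands between the two edges; round away float noise.
    return round(sum(BANDS[i:j]), 2)


# Hypothetical example: a quantity with mean 50 and standard deviation 10.
print(empirical_rule_percent(50, 10, 40, 60))    # 68.0  (within 1 sd)
print(empirical_rule_percent(50, 10, upper=30))  # 2.5   (more than 2 sd below)
print(empirical_rule_percent(50, 10, 60, 80))    # 15.85 (between 1 and 3 sd above)
```

For a probability, divide the percent by 100; for a "how many" question, multiply that proportion by the number of observations. The same `z_score` value is also a natural way to judge the test-score comparisons in Exercises 25 and 26.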
Explorations

Chebyshev’s Theorem. The following well-known theorem is called Chebyshev’s theorem. It is named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821–1894). Chebyshev’s theorem states that a mathematical relationship exists between the spread of data and the standard deviation of the data. A remarkable property of Chebyshev’s theorem is that it is valid for any set of data. This is unlike the Empirical Rule, which applies only to sets of data that have normal distributions.

Chebyshev’s Theorem
The proportion or percentage of any data set that lies within z standard deviations of the mean, where z is any positive number greater than 1, is at least

$$1 - \frac{1}{z^2}$$

Applying Chebyshev’s theorem with z = 2 yields

$$1 - \frac{1}{z^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} = 75\%$$

This result means that at least 75% of the data in any data set must lie within 2 standard deviations of the mean of the data set.

1. Use Chebyshev’s theorem to determine the minimum percentage of data (to the nearest percent) in any data set that must lie within
a. 1.2 standard deviations of the mean.
b. 2.5 standard deviations of the mean.
c. 3.1 standard deviations of the mean.

2. A new automobile dealership found that during the month of March, the mean selling price of its cars was $29,200, with a standard deviation of $5100. Use Chebyshev’s theorem to determine the minimum percentage (to the nearest percent) of the dealership’s cars that have a selling price within
a. 1.5 standard deviations of the mean, that is, between $21,550 and $36,850.
b. 2.8 standard deviations of the mean, that is, between $14,920 and $43,480.

Copyright © Houghton Mifflin Company. All rights reserved.
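Chebyshev’s bound is easy to evaluate numerically. The minimal Python sketch below is illustrative only; the function names `chebyshev_min_fraction` and `interval` are hypothetical. It computes 1 − 1/z² and the interval from mean − z·s to mean + z·s, and its usage lines reproduce the z = 2 result and the $21,550 to $36,850 interval quoted in Exploration 2a.

```python
def chebyshev_min_fraction(z: float) -> float:
    """Chebyshev lower bound on the fraction of ANY data set lying within
    z standard deviations of the mean; the theorem as stated needs z > 1."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem as stated here requires z > 1")
    return 1 - 1 / z**2


def interval(mean: float, sd: float, z: float) -> tuple[float, float]:
    """Endpoints of the interval mean - z*sd to mean + z*sd."""
    return mean - z * sd, mean + z * sd


# z = 2 reproduces the worked result: at least 3/4 of the data.
print(f"{chebyshev_min_fraction(2):.0%}")  # 75%
# Exploration 2a's interval: mean $29,200, standard deviation $5100, z = 1.5.
print(interval(29200, 5100, 1.5))  # (21550.0, 36850.0)
```

For comparison, the Empirical Rule places about 95% of normally distributed data within 2 standard deviations of the mean. Chebyshev’s guarantee of 75% is smaller because it must hold for every data set, not just normal ones.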