4.3 Measures of Variation

LEARNING GOAL
Understand and interpret these common measures of variation: range, the five-number summary, and standard deviation.

Copyright © 2009 Pearson Education, Inc.

Why Variation Matters
Customers at Big Bank can enter any one of three different lines leading to three different tellers. Best Bank also has three tellers, but all customers wait in a single line and are called to the next available teller. Here is a sample of waiting times (in minutes), arranged in ascending order.

Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8

You'll probably find more unhappy customers at Big Bank than at Best Bank, but this is not because the average wait is any longer. In fact, the mean and median waiting times are 7.2 minutes at both banks. The difference in customer satisfaction comes from the variation at the two banks.

Figure 4.13 Histograms for the waiting times at Big Bank and Best Bank, shown with data binned to the nearest minute.

TIME OUT TO THINK
Explain why Big Bank, with three separate lines, should have a greater variation in waiting times than Best Bank. Then consider several places where you commonly wait in lines, such as a grocery store, a bank, a concert ticket outlet, or a fast food restaurant. Do these places use a single customer line that feeds multiple clerks, or multiple lines? If a place uses multiple lines, do you think a single line would be better? Explain.
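As a quick check of the bank comparison above, the following Python sketch recomputes the mean and median of both samples with the standard-library statistics module. The variable names big_bank and best_bank are illustrative labels chosen here, not names from the slides.

```python
from statistics import mean, median

# Waiting times in minutes, copied from the two samples above.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

# Both banks should report the same center (7.2 minutes),
# even though the spread of the values is clearly different.
for name, times in [("Big Bank", big_bank), ("Best Bank", best_bank)]:
    print(f"{name}: mean = {mean(times):.1f}, median = {median(times):.1f}")
```

Because the centers agree, any difference between the banks has to be described by a measure of spread rather than by the mean or median.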
Range

Definition
The range of a set of data values is the difference between its highest and lowest data values:

range = highest value (max) - lowest value (min)

EXAMPLE 1 Misleading Range
Consider the following two sets of quiz scores for nine students. Which set has the greater range? Would you also say that this set has the greater variation?

Quiz 1: 1 10 10 10 10 10 10 10 10
Quiz 2: 2 3 4 5 6 7 8 9 10

Solution: The range for Quiz 1 is 10 - 1 = 9 points and the range for Quiz 2 is 10 - 2 = 8 points. Thus, the range is greater for Quiz 1. However, aside from a single low score (an outlier), Quiz 1 has no variation at all because every other student got a 10. In contrast, no two students got the same score on Quiz 2, and the scores are spread throughout the list of possible scores. Quiz 2 therefore has greater variation even though Quiz 1 has greater range.

Quartiles and the Five-Number Summary
Quartiles are values that divide the data distribution into quarters. Each data set has a lower quartile (Q1), a median (Q2), and an upper quartile (Q3):

Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0 (Q1 = 5.6, Q2 = 7.2, Q3 = 8.5)
Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8 (Q1 = 6.7, Q2 = 7.2, Q3 = 7.7)

Definitions
The lower quartile (or first quartile or Q1) divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)
The middle quartile (or second quartile or Q2) is the overall median.
The upper quartile (or third quartile or Q3) divides the lowest three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)
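The range and quartile definitions above can be sketched in a few lines of Python. This is a minimal illustration of the median-of-halves rule stated in the definitions, not the only accepted quartile procedure. The function names data_range and quartiles are hypothetical helpers introduced here.

```python
from statistics import median

def data_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule:
    the middle value is excluded from both halves when the count is odd."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

quiz1 = [1, 10, 10, 10, 10, 10, 10, 10, 10]
quiz2 = [2, 3, 4, 5, 6, 7, 8, 9, 10]
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

print(data_range(quiz1), data_range(quiz2))  # 9 8
print(quartiles(big_bank))                   # (5.6, 7.2, 8.5)
print(quartiles(best_bank))                  # (6.7, 7.2, 7.7)
```

Library routines such as statistics.quantiles use interpolation rules that can give slightly different quartiles for the same data, so the explicit halves are used here to match the definition in the text.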
TECHNICAL NOTE
Statisticians do not universally agree on the procedure for calculating quartiles, and different procedures can result in slightly different values.

The Five-Number Summary
The five-number summary for a data distribution consists of the following five numbers:

low value
lower quartile
median
upper quartile
high value

Drawing a Boxplot
Step 1. Draw a number line that spans all the values in the data set.
Step 2. Enclose the values from the lower to the upper quartile in a box. (The thickness of the box has no meaning.)
Step 3. Draw a line through the box at the median.
Step 4. Add “whiskers” extending to the low and high values.

Figure 4.14 Boxplots show that the variation of the waiting times is greater at Big Bank than at Best Bank.

TECHNICAL NOTE
The boxplots shown in this book are called skeletal boxplots. Some boxplots are drawn with outliers marked by an asterisk (*) and the whiskers extending only to the smallest and largest nonoutliers; these types of boxplots are called modified boxplots.

Percentiles

Definition
The nth percentile of a data set divides the bottom n% of data values from the top (100 - n)%. A data value that lies between two percentiles is often said to lie in the lower percentile. You can approximate the percentile of any data value with the following formula:

$$\text{percentile of data value} = \frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100$$

There are different procedures for finding a data value corresponding to a given percentile, but one approximate approach is to find the Lth value, where L is the product of the percentile (in decimal form) and the sample size.
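The two approximate percentile procedures just described can be sketched as follows. The data set here is a hypothetical list of 50 values (the integers 1 through 50), used only so that positions are easy to read off. The function names are illustrative, and rounding a fractional L up to the next whole position is an assumption made for this sketch, since the text does not say how to handle that case.

```python
import math

def percentile_of_value(data, value):
    """Approximate percentile: share of values strictly below `value`, times 100."""
    below = sum(1 for x in data if x < value)
    return below * 100 / len(data)

def value_at_percentile(data, pct):
    """Approximate data value at the pct-th percentile: the L-th ordered value,
    where L = (pct / 100) * sample size. Fractional L is rounded up (assumption)."""
    ordered = sorted(data)
    position = pct * len(ordered) / 100
    index = max(1, math.ceil(position))
    return ordered[index - 1]

# Hypothetical sample of 50 values; NOT the data from any table in the slides.
sample = list(range(1, 51))

print(percentile_of_value(sample, 35))  # 34 values lie below the 35th value -> 68.0
print(value_at_percentile(sample, 12))  # 0.12 x 50 = 6th value -> 6
```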
For example, with 50 sample values, the 12th percentile is around the 0.12 × 50 = 6th value.

EXAMPLE 3 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4. The results are approximate.

a. What is the percentile for the data value of 104.54 ng/ml for smokers?

Solution: The data value of 104.54 ng/ml for smokers is the 35th data value in the set, which means that 34 data values lie below it. Thus, its percentile is

$$\frac{\text{number of values less than 104.54 ng/ml}}{\text{total number of values in data set}} \times 100 = \frac{34}{50} \times 100 = 68$$

In other words, the 35th data value marks the 68th percentile.

b. What is the percentile for the data value of 61.33 ng/ml for nonsmokers?

Solution: The data value of 61.33 ng/ml for nonsmokers is the 50th and highest data value in the set, which means that 49 data values lie below it. Thus, its percentile is

$$\frac{\text{number of values less than 61.33 ng/ml}}{\text{total number of values in data set}} \times 100 = \frac{49}{50} \times 100 = 98$$

In other words, the highest data value marks the 98th percentile.

c. What data value marks the 36th percentile for the smokers? For the nonsmokers?

Solution: Because there are 50 data values in the set, the 36th percentile is around the 0.36 × 50 = 18th value. For smokers this value is 20.16 ng/ml, and for nonsmokers it is 0.33 ng/ml.

Standard Deviation
Statisticians often prefer to describe variation with a single number.
The single number most commonly used to describe variation is called the standard deviation.

Calculating the Standard Deviation
To calculate the standard deviation for any data set:
Step 1. Compute the mean of the data set. Then find the deviation from the mean for every data value by subtracting the mean from the data value. That is, for every data value,
deviation from mean = data value - mean
Step 2. Find the squares (second power) of all the deviations from the mean.
Step 3. Add all the squares of the deviations from the mean.
Step 4. Divide this sum by the total number of data values minus 1.
Step 5. The standard deviation is the square root of this quotient.

Overall, these steps produce the standard deviation formula:

$$\text{standard deviation} = \sqrt{\frac{\text{sum of (deviations from the mean)}^2}{\text{total number of data values} - 1}}$$

(This formula is shown in summation notation in the optional section at the end.)

TECHNICAL NOTE
In finding the standard deviation when dealing with data from a sample, one part of the calculation involves dividing the sum of the squared deviations by the total number of data values minus 1. When dealing with an entire population, we do not subtract the 1. In this book, we will use only the formula for a sample.

TECHNICAL NOTE (2)
The result of Step 4 is called the variance of the distribution. In other words, the standard deviation is the square root of the variance. Although the variance is used in many advanced statistical computations, we will not use it in this book.

EXAMPLE 4 Calculating Standard Deviation
Calculate the standard deviation for the waiting times at Big Bank.

Solution: We follow the five steps to calculate the standard deviation.
Table 4.5 shows how to organize the work in the first three steps. The first column for each bank lists the waiting times (in minutes). The second column lists the deviations from the mean (Step 1). The third column lists the squares of the deviations (Step 2). We add all the squared deviations to find the sum at the bottom of the third column (Step 3).

For Step 4, we divide the sum from Step 3 by the total number of data values minus 1. Because there are 11 data values, we divide by 10:

$$\frac{38.46}{10} = 3.846$$

Finally, Step 5 tells us that the standard deviation is the square root of the number from Step 4:

$$\text{standard deviation} = \sqrt{3.846} \approx 1.96 \text{ minutes}$$
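The five steps of Example 4 can be traced in code. The sketch below follows the steps literally for the Big Bank waiting times; the function name sample_std_dev is a hypothetical helper, and the intermediate lists stand in for the columns of Table 4.5.

```python
import math

# Big Bank waiting times in minutes, from the sample given earlier.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

def sample_std_dev(values):
    """Sample standard deviation, computed with the five steps from the text."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # Step 1: deviation from the mean
    squares = [d ** 2 for d in deviations]    # Step 2: square each deviation
    total = sum(squares)                      # Step 3: add the squares (about 38.46)
    variance = total / (n - 1)                # Step 4: divide by n - 1 (about 3.846)
    return math.sqrt(variance)                # Step 5: take the square root

print(round(sample_std_dev(big_bank), 2))  # 1.96
```

In practice the standard-library function statistics.stdev performs the same sample calculation; the explicit version is shown only to mirror the steps.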
Interpreting the Standard Deviation
A good way to develop a deeper understanding of the standard deviation is to consider an approximation called the range rule of thumb.

The Range Rule of Thumb
The standard deviation is approximately related to the range of a distribution by the range rule of thumb:

$$\text{standard deviation} \approx \frac{\text{range}}{4}$$

If we know the range of a distribution (range = high - low), we can use this rule to estimate the standard deviation. Alternatively, if we know the standard deviation, we can use this rule to estimate the low and high values as follows:

low value ≈ mean - (2 × standard deviation)
high value ≈ mean + (2 × standard deviation)

The range rule of thumb does not work well when the high or low values are outliers.

TECHNICAL NOTE
Another way of interpreting the standard deviation uses a mathematical rule called Chebyshev’s Theorem. It states that, for any data distribution, at least 75% of all data values lie within two standard deviations of the mean, and at least 89% of all data values lie within three standard deviations of the mean. Although we will not use this theorem in this book, you may encounter it if you take another statistics course.

EXAMPLE 5 Using the Range Rule of Thumb
Use the range rule of thumb to estimate the standard deviation for the waiting times at Big Bank. Compare the estimate to the actual value found in Example 4.

Solution: The waiting times for Big Bank vary from 4.1 to 11.0 minutes, which means a range of 11.0 - 4.1 = 6.9 minutes.

$$\text{standard deviation} \approx \frac{6.9}{4} \approx 1.7$$

The actual standard deviation calculated in Example 4 is 1.96 minutes. For this case the estimate from the range rule of thumb slightly underestimates the actual standard deviation.
Nevertheless, the estimate puts us in the right ballpark, showing that the rule is useful.

EXAMPLE 6 Estimating a Range
Studies of the gas mileage of a BMW under varying driving conditions show that it gets a mean of 22 miles per gallon with a standard deviation of 3 miles per gallon. Estimate the minimum and maximum typical gas mileage amounts that you can expect under ordinary driving conditions.

Solution: From the range rule of thumb, the low and high values for gas mileage are approximately

low value ≈ mean - (2 × standard deviation) = 22 - (2 × 3) = 16
high value ≈ mean + (2 × standard deviation) = 22 + (2 × 3) = 28

The range of gas mileage for the car is roughly from a minimum of 16 miles per gallon to a maximum of 28 miles per gallon.

Standard Deviation with Summation Notation (Optional Section)
The summation notation introduced earlier makes it easy to write the standard deviation formula in a compact form:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Here x stands for each data value, x̄ for the mean, and n for the total number of data values. The symbol s is the conventional symbol for the standard deviation of a sample. For the standard deviation of a population, statisticians use the Greek letter σ (sigma), and the term n - 1 in the formula is replaced by n. Consequently, you will get slightly different results for the standard deviation depending on whether you assume the data represent a sample or a population.

TECHNICAL NOTE
The formula for the variance is

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The standard symbol for the variance, s², reflects the fact that it is the square of the standard deviation.
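As a closing sketch, the sample and population formulas and the range rule of thumb can be compared numerically. The code below uses stdev, pstdev, and variance from Python's standard-library statistics module; the helper names range_rule_estimate and typical_bounds are hypothetical and introduced only for this illustration.

```python
from statistics import stdev, pstdev, variance

# Big Bank waiting times in minutes, from the sample given earlier.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

s = stdev(big_bank)             # sample standard deviation: divides by n - 1
sigma = pstdev(big_bank)        # population standard deviation: divides by n
s_squared = variance(big_bank)  # sample variance: the square of s

def range_rule_estimate(values):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (max(values) - min(values)) / 4

def typical_bounds(mean, std_dev):
    """Range rule of thumb in reverse: mean minus/plus two standard deviations."""
    return mean - 2 * std_dev, mean + 2 * std_dev

print(f"s = {s:.2f}, sigma = {sigma:.2f}, s^2 = {s_squared:.3f}")
print(f"range rule estimate = {range_rule_estimate(big_bank):.1f}")
print(typical_bounds(22, 3))  # gas mileage example: (16, 28)
```

The population value comes out slightly smaller than the sample value because it divides by n rather than n - 1, which is the difference the optional section describes.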
