4.3 Measures of Variation

LEARNING GOAL
Understand and interpret these common measures of variation: range, the five-number summary, and standard deviation.

Copyright © 2014 Pearson Education. All rights reserved.

Why Variation Matters

Customers at Big Bank can enter any one of three different lines leading to three different tellers. Best Bank also has three tellers, but all customers wait in a single line and are called to the next available teller. Here is a sample of wait times (in minutes), arranged in ascending order.

Big Bank (three lines): 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank (one line): 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8

You’ll probably find more unhappy customers at Big Bank than at Best Bank, but this is not because the average wait is any longer. In fact, the mean and median waiting times are 7.2 minutes at both banks. The difference in customer satisfaction comes from the variation at the two banks.

Figure 4.13 Histograms for the waiting times at Big Bank and Best Bank, shown with data binned to the nearest minute.

Range

Definition
The range of a set of data values is the difference between its highest and lowest data values:

range = highest value (max) − lowest value (min)

EXAMPLE 1 Misleading Range
Consider the following two sets of quiz scores for nine students. Which set has the greater range? Would you also say that this set has the greater variation?

Quiz 1: 1 10 10 10 10 10 10 10 10
Quiz 2: 2 3 4 5 6 7 8 9 10
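As a rough illustration of the range definition, the following Python sketch computes the range of the bank wait times and the two quiz data sets given in this section. The helper name `data_range` and the variable names are assumptions made for this example; they do not come from the slides.

```python
from collections import Counter


def data_range(values):
    """Range = highest value (max) - lowest value (min)."""
    return max(values) - min(values)


# Data copied from the slides.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]
quiz_1 = [1, 10, 10, 10, 10, 10, 10, 10, 10]
quiz_2 = [2, 3, 4, 5, 6, 7, 8, 9, 10]

# Round the bank results to one decimal place to hide floating-point noise.
print("Big Bank range:", round(data_range(big_bank), 1))
print("Best Bank range:", round(data_range(best_bank), 1))
print("Quiz 1 range:", data_range(quiz_1))  # 10 - 1 = 9
print("Quiz 2 range:", data_range(quiz_2))  # 10 - 2 = 8

# The range looks only at the two extremes. Counting the most common
# score shows how many Quiz 1 values are identical, which the range
# alone cannot reveal.
most_common_score, count = Counter(quiz_1).most_common(1)[0]
print(f"Quiz 1: {count} of {len(quiz_1)} scores equal {most_common_score}")
```

Quiz 1 has the larger range (9 versus 8) even though eight of its nine scores are identical, which is why the range by itself can be misleading.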
Quartiles and the Five-Number Summary

Quartiles are values that divide the data distribution into quarters: the lower quartile (Q1), the median (Q2), and the upper quartile (Q3).

Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0
Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8

Definitions
The lower quartile (or first quartile or Q1) divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)

The middle quartile (or second quartile or Q2) is the overall median.

The upper quartile (or third quartile or Q3) divides the lowest three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)

The Five-Number Summary
The five-number summary for a data distribution consists of the following five numbers:

low value, lower quartile, median, upper quartile, high value

Drawing a Boxplot (Box and Whisker Plot)
Step 1. Draw a number line that spans all the values in the data set.
Step 2. Enclose the values from the lower to the upper quartile in a box. (The thickness of the box has no meaning.)
Step 3. Draw a line through the box at the median.
Step 4. Add “whiskers” extending to the low and high values.

Percentiles

Definition
The nth percentile of a data set divides the bottom n% of data values from the top (100 − n)%. A data value that lies between two percentiles is often said to lie in the lower percentile.
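The quartile definitions given earlier in this section (the median of each half, excluding the middle value when the number of data points is odd) can be sketched in Python as follows. The function name `five_number_summary` is an assumption for this example, and the sketch assumes at least two data values. Other software, including Python's own `statistics.quantiles`, may use different quartile conventions and return slightly different values.

```python
from statistics import median


def five_number_summary(values):
    """Return (low, Q1, median, Q3, high) using the half-median rule.

    Assumes the data set has at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # When n is odd, skip the middle value so it belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


# Wait times from the slides.
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

big_summary = five_number_summary(big_bank)
best_summary = five_number_summary(best_bank)
print("Big Bank :", big_summary)
print("Best Bank:", best_summary)
```

The five values returned for each bank are the ones a boxplot would use: the box runs from Q1 to Q3, the line inside it sits at the median, and the whiskers extend to the low and high values.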
You can approximate the percentile of any data value with the following formula:

$$\text{percentile of data value} = \frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100$$

EXAMPLE 2 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4 (previous slide).
a. What is the percentile for the data value of 104.54 ng/ml for smokers?
b. What is the percentile for the data value of 61.33 ng/ml for nonsmokers?
c. What data value marks the 36th percentile for the smokers? For the nonsmokers?

Standard Deviation

Statisticians often prefer to describe variation with a single number. The single number most commonly used to describe variation is called the standard deviation.

$$\text{standard deviation} = \sqrt{\frac{\text{sum of (deviations from the mean)}^2}{\text{total number of data values} - 1}}$$

EXAMPLE 4 Calculating Standard Deviation
Calculate the standard deviation of 9, 2, 5, 4, 12, 7, 8, 11.

Interpreting the Standard Deviation

A good way to develop a deeper understanding of the standard deviation is to consider an approximation called the range rule of thumb.

The Range Rule of Thumb
The standard deviation is approximately related to the range of a distribution by the range rule of thumb:

$$\text{standard deviation} \approx \frac{\text{range}}{4}$$

If we know the range of a distribution (range = high − low), we can use this rule to estimate the standard deviation.
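To make the standard deviation formula and the range rule of thumb concrete, here is a minimal Python sketch applied to the Example 4 data. The function names `sample_standard_deviation` and `range_rule_estimate` are assumptions introduced for this illustration. The sketch divides by the number of data values minus 1, matching the formula in this section.

```python
import math
import statistics


def sample_standard_deviation(values):
    """sqrt( sum of squared deviations from the mean / (n - 1) )."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


def range_rule_estimate(values):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (max(values) - min(values)) / 4


example_4 = [9, 2, 5, 4, 12, 7, 8, 11]

sd = sample_standard_deviation(example_4)
estimate = range_rule_estimate(example_4)

print(f"standard deviation:           {sd:.2f}")
print(f"range rule of thumb estimate: {estimate:.2f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(example_4))
```

For these eight values the computed standard deviation is about 3.45, while the range rule of thumb gives 2.5, a reminder that the rule is only a rough approximation.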
Alternatively, if we know the standard deviation, we can use this rule to estimate the low and high values as follows:

low value ≈ mean − (2 × standard deviation)
high value ≈ mean + (2 × standard deviation)

The range rule of thumb does not work well when the high or low values are outliers.

EXAMPLE 5 Estimating a Range
Studies of the gas mileage of a BMW under varying driving conditions show that it gets a mean of 22 miles per gallon with a standard deviation of 3 miles per gallon. Estimate the minimum and maximum typical gas mileage amounts that you can expect under ordinary driving conditions.

EXAMPLE 6 Comparing Variations
The following data sets show the ages of the first seven U.S. presidents (Washington through Jackson) and seven recent U.S. presidents (Ford through Obama) at the time of inauguration.

First 7: 57, 61, 57, 57, 58, 57, 61
Last 7: 61, 52, 69, 64, 46, 54, 47

(a) Find the mean, median, and range for each of the two data sets.
(b) Give the five-number summary and draw a boxplot for each of the two data sets.
(c) Find the standard deviation for each of the two data sets.
(d) Apply the range rule of thumb to estimate the standard deviation of each of the two data sets. How well does the rule work in each case? Briefly discuss why it does or does not work well.
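One way to check work on Examples 5 and 6 is the short Python sketch below. It is an illustration under stated assumptions rather than the textbook's worked solution: the helper names `estimate_low_high` and `describe` are invented here, and the standard deviation comes from the standard library's `statistics.stdev`, which divides by the number of values minus 1.

```python
import statistics


def estimate_low_high(mean, sd):
    """Range rule of thumb: low and high lie about 2 sd from the mean."""
    return mean - 2 * sd, mean + 2 * sd


# Example 5: mean of 22 mpg with a standard deviation of 3 mpg.
low, high = estimate_low_high(22, 3)
print(f"Example 5: about {low} to {high} miles per gallon")

# Example 6: ages at inauguration, copied from the slide.
first_7 = [57, 61, 57, 57, 58, 57, 61]
last_7 = [61, 52, 69, 64, 46, 54, 47]


def describe(values):
    """Mean, median, range, standard deviation, and the range / 4 estimate."""
    value_range = max(values) - min(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": value_range,
        "sd": statistics.stdev(values),
        "range_rule_sd": value_range / 4,
    }


summaries = {"First 7": describe(first_7), "Last 7": describe(last_7)}
for label, summary in summaries.items():
    print(label, {key: round(value, 2) for key, value in summary.items()})
```

In this sketch the range rule estimate falls below the computed standard deviation for both sets (1.0 versus about 1.9 for the first seven, and 5.75 versus about 8.7 for the last seven), which is consistent with the rule being only a rough approximation for such small data sets.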