Chapter 3: Numerical Summaries of Center and Variation
Copyright © 2017, 2014 Pearson Education, Inc.

Chapter 3 Topics
• Summaries for center and spread in:
  – Symmetric distributions: mean and standard deviation
  – Skewed distributions: median and IQR
• Other summaries for variation (variance, range)
• The Empirical Rule and z-scores
• Boxplots, five-number summary, and outliers
• Comparing distributions

Section 3.1
SUMMARIES FOR SYMMETRIC DISTRIBUTIONS
• Measure of center (balance point): mean
• Measure of horizontal spread (variability): standard deviation

Appropriate Measures
Recall: When dealing with numerical data, you need to describe the distribution using these 3 characteristics:
Characteristic   Which Is
Shape            Symmetric, skewed, etc.
Center           Typical value
Spread           Horizontal variability
In Chapter 3 we learn specific ways to describe the center and spread of a distribution.

The Mean
• Can be thought of as the “balancing point” of the distribution.

The Mean: Symmetric Distributions
This dotplot shows the distribution of ACT scores for a sample of statistics students. For symmetric distributions, the mean is a good representation of a “typical value” of the data set.

The Mean: Skewed Distributions
This graph shows the distribution of salaries of professional baseball players in 2010. Do you think the mean is a good representation of the “typical” baseball salary for that year?
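The “balancing point” idea can be checked numerically. The Python sketch below is an illustration, not part of the original slides: it borrows the Austin gas-price sample used in the examples that follow, computes the mean exactly with `fractions.Fraction`, and shows that the total deviation above the mean exactly cancels the total deviation below it.

```python
from fractions import Fraction

# Gas prices (dollars per gallon) from the Austin, Texas, example in Section 3.1.
# Fraction("3.19") stores each price exactly, so no floating-point rounding creeps in.
prices = [Fraction(p) for p in
          ["3.19", "3.09", "3.09", "2.93", "2.95",
           "3.09", "2.99", "2.99", "2.95", "2.97"]]

# Mean: add the data values, then divide by the number of values.
mean = sum(prices) / len(prices)

# "Balancing point": deviations above the mean exactly cancel those below it.
deviations = [x - mean for x in prices]
above = sum(d for d in deviations if d > 0)
below = sum(d for d in deviations if d < 0)

print(float(mean))                 # 3.024
print(float(above), float(below))  # 0.364 -0.364
print(sum(deviations))             # 0
```

The unrounded mean is $3.024, which the slides round to $3.02.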
Using the Mean to Describe “Typical” Values
• The mean represents a typical value in a set of data when the data is roughly symmetric.
• For skewed distributions, the mean is NOT a good estimate of a typical value.

Computing the Mean
For small data sets:
• Add the data values.
• Divide by the number of data values.
For larger data sets:
• Use appropriate technology.

The Mean: Example
Suppose a sample of prices for 1 gallon of regular gas at 10 different gas stations in a neighborhood in Austin, Texas, is taken on one fall day in 2013. Find and interpret the mean.
$3.19, $3.09, $3.09, $2.93, $2.95, $3.09, $2.99, $2.99, $2.95, $2.97
(A dotplot will show that this distribution is roughly symmetric.)

Dotplot of Gas Prices
What do you think a good estimate would be for a “typical” gas price in that neighborhood?

Calculating the Mean
$$\bar{x} = \frac{3.19 + 3.09 + 3.09 + 2.93 + 2.95 + 3.09 + 2.99 + 2.99 + 2.95 + 2.97}{10} = \frac{30.24}{10} \approx 3.02$$
INTERPRETATION of the mean: The typical price of 1 gallon of gas at these gas stations in Austin, Texas, was $3.02 on this particular day in 2013.

Dotplot of Gas Prices
[Figure: dotplot of the gas prices with the mean, $3.02, marked]

Using the TI-84 Calculator
NOTE: Your calculator can find the mean, but you must be able to interpret the result.
To find the mean on the TI-84 calculator:
1. Push STAT, then select option 1: Edit.
2. Enter the data set in L1.
3. Push STAT and arrow over to CALC.
4. Choose option 1: 1-Var Stats (press ENTER).
5. The mean is given at the top (x̄).
Note that the calculator gives you a lot more information. We will cover this shortly.

Measuring the Spread
Recall: The variability in a distribution can be measured by its horizontal spread.
Why care? We need to know whether most of the data is near the center or far from it.
However: Assigning a number to the horizontal spread is not straightforward.

Measuring the Spread: Example
The following histograms record the daily high temperatures (in degrees Fahrenheit) over one recent year at two locations:
• Provo, Utah (elevation 4500 feet)
• San Francisco, CA (at sea level)

Measuring the Spread: Example
Note:
• Both distributions have roughly the same:
  – Shape: symmetric
  – Center: 67°F in Provo vs. 65°F in SF
• The spread of the two distributions, however, is very different:
  – Provo: more spread out (data values farther from the center)
  – SF: less spread out (data values closer to the center)

Measuring Spread: Standard Deviation
Standard deviation
• A number that measures how far the typical observation is from the mean (center)
• For most distributions, a majority of the data is within one standard deviation of the mean.
Note: Think of the standard deviation as the typical distance of the observations from their mean.

Standard Deviation: Example
The graph below shows the distribution of the amount of particulate matter, or smog, in the air in 333 cities in the United States in 2008, as reported by the Environmental Protection Agency (EPA).
The mean particulate matter is 10.7 micrograms per cubic meter, and the standard deviation is 2.6 micrograms per cubic meter.

Standard Deviation: Example
1. Find the level of particulate matter one standard deviation above the mean and one standard deviation below the mean.
2. Keeping in mind that the EPA says that levels over 15 micrograms per cubic meter are unsafe, what can we conclude about the air quality of most of the cities in this sample?

Standard Deviation: Example (Solution)
1. The level of particulate matter one standard deviation above/below the mean is:
   10.7 + 2.6 = 13.3 micrograms per cubic meter
   10.7 − 2.6 = 8.1 micrograms per cubic meter
2. Most cities in this sample have a particulate level between 8.1 and 13.3 micrograms per cubic meter, which is below the EPA's unsafe level of 15 micrograms per cubic meter, so the air quality in most of the cities in this sample is safe.

Standard Deviation: Formula
The formula for the standard deviation is:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
1. x − x̄ is the deviation (or distance) of observation x from the mean.
2. Square each deviation to make it positive.
3. Add all the squared deviations.
4. Divide by 1 less than the sample size, n − 1 (see text). Think of this as averaging the squared deviations.
5. Take the square root to restore the original units.

Standard Deviation: Example
Suppose a sample of prices for 1 gallon of regular gas at 10 different gas stations in a neighborhood in Austin, Texas, is taken on one fall day in 2013. Find and interpret the standard deviation.
$3.19, $3.09, $3.09, $2.93, $2.95, $3.09, $2.99, $2.99, $2.95, $2.97
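The five steps of the standard deviation formula translate directly into code. Below is a minimal Python sketch (an illustration, not from the slides; the function name `sample_sd` is our own) that applies the steps to the gas-price sample using the unrounded mean, then cross-checks the result against the standard library's `statistics.stdev`, which also divides by n − 1. Squaring the result gives the sample variance.

```python
import math
import statistics

# Austin, Texas, gas prices (dollars per gallon).
prices = [3.19, 3.09, 3.09, 2.93, 2.95, 3.09, 2.99, 2.99, 2.95, 2.97]

def sample_sd(data):
    """Sample standard deviation, following the five formula steps."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]  # step 1: x - x̄
    squared = [d ** 2 for d in deviations] # step 2: square each deviation
    total = sum(squared)                   # step 3: add them up
    variance = total / (n - 1)             # step 4: divide by n - 1
    return math.sqrt(variance)             # step 5: square root restores units

s = sample_sd(prices)
print(round(s, 4))       # 0.0854
print(round(s ** 2, 6))  # 0.007293 (the sample variance, s squared)
print(math.isclose(s, statistics.stdev(prices)))  # True
```

Because the sketch keeps the mean unrounded ($3.024), it gives about 0.0854 rather than the 0.0855 obtained by hand with the mean rounded to $3.02; either way the standard deviation is about $0.09.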
Standard Deviation: Example (Solution)
From before, the mean is $3.02 (rounded). Using the standard deviation formula, we have:
x      x − x̄                   (x − x̄)²
3.19   3.19 − 3.02 = 0.17      (0.17)² = 0.0289
3.09   3.09 − 3.02 = 0.07      (0.07)² = 0.0049
3.09   3.09 − 3.02 = 0.07      (0.07)² = 0.0049
2.93   2.93 − 3.02 = −0.09     (−0.09)² = 0.0081
2.95   2.95 − 3.02 = −0.07     (−0.07)² = 0.0049
3.09   3.09 − 3.02 = 0.07      (0.07)² = 0.0049
2.99   2.99 − 3.02 = −0.03     (−0.03)² = 0.0009
2.99   2.99 − 3.02 = −0.03     (−0.03)² = 0.0009
2.95   2.95 − 3.02 = −0.07     (−0.07)² = 0.0049
2.97   2.97 − 3.02 = −0.05     (−0.05)² = 0.0025
$$s = \sqrt{\frac{0.0289 + 0.0049 + \cdots + 0.0025}{9}} = \sqrt{\frac{0.0658}{9}} \approx 0.0855$$
Note: This is slightly off because the mean was rounded to $3.02.

Standard Deviation: Example (Conclusion)
Therefore, we know:
• The mean is $3.02.
• The standard deviation is about $0.09.
INTERPRETATION of the standard deviation: At most of these gas stations, the price of a gallon of gas is within 9 cents of $3.02.

Using the TI-84 Calculator
NOTE: Usually technology is used to calculate the standard deviation.
To find the standard deviation on the TI-84 calculator:
1. Push STAT, then select option 1: Edit.
2. Enter the data set in L1.
3. Push STAT and arrow over to CALC.
4. Choose option 1: 1-Var Stats (press ENTER).
5. The standard deviation for a sample is given by Sx.

Variance: Formula
The variance is the standard deviation squared:
$$s^2 = \left(\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}\right)^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard deviation is preferred over the variance since it has the same units as the original data set.

Section 3.2
WHAT'S UNUSUAL? THE EMPIRICAL RULE AND Z-SCORES
• The Empirical Rule (for symmetric distributions) and z-scores
• Determining whether a data value is unusual

The Empirical Rule
The Empirical Rule is a rough guideline for the approximate percentage of data within 1 to 3 standard deviations of the mean in unimodal, symmetric distributions.
Guidelines (percentage of data within a given number of standard deviations of the mean):
– 68% of the data is within 1 standard deviation.
– 95% of the data is within 2 standard deviations.
– Almost all of the data is within 3 standard deviations.

Empirical Rule: Example
Data on smog levels in a sample of cities was collected, and the distribution was found to be roughly symmetric and unimodal. The mean particulate level for the sample was 10.7 micrograms per cubic meter, with a standard deviation of 2.6 micrograms per cubic meter.

[Figure: Empirical Rule: Smog Levels]

Empirical Rule: Smog Levels
• About 68% of the cities will have smog levels between 8.1 and 13.3 (10.7 ± 1 SD).
• About 95% of the cities will have smog levels between 5.5 and 15.9 (10.7 ± 2 SD).
• Almost all of the cities will have smog levels between 2.9 and 18.5 (10.7 ± 3 SD).
Note: SD = 2.6

The Empirical Rule: Temperatures
The mean daily high temperature in San Francisco is 65°F with a standard deviation of 8°F.
1. Find the temperature ranges for 68%, 95%, and 99.7% of the data.
2. Using the Empirical Rule, decide whether it is unusual to have a day when the maximum temperature is colder than 49°F in San Francisco.
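The interval arithmetic in the smog example (mean ± 1, 2, or 3 standard deviations) is easy to automate. The Python function below is a hypothetical helper written for illustration; it assumes the distribution is unimodal and roughly symmetric, because the Empirical Rule's percentages do not apply otherwise.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (share, low, high)} for the intervals mean ± k*sd, k = 1, 2, 3.

    Only meaningful for roughly unimodal, symmetric distributions.
    """
    coverage = {1: "about 68%", 2: "about 95%", 3: "almost all (about 99.7%)"}
    return {k: (coverage[k], mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Smog example: mean 10.7, SD 2.6 micrograms per cubic meter.
for k, (share, low, high) in empirical_rule_intervals(10.7, 2.6).items():
    # round() hides floating-point noise (e.g., a value a hair below 8.1)
    print(f"{share} of cities: {round(low, 1)} to {round(high, 1)}")
```

Calling `empirical_rule_intervals(65, 8)` gives the San Francisco temperature ranges asked for on the slide above.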
The Empirical Rule: Example (Solution)
1. The temperature ranges are:
• 68% of data: 65°F ± 8°F → 57°F to 73°F
• 95% of data: 65°F ± 16°F → 49°F to 81°F
• 99.7% of data: 65°F ± 24°F → 41°F to 89°F
2. Since 95% of the daily high temperatures are between 49°F and 81°F according to the Empirical Rule, only 5% of the temperatures are outside this range. Due to the symmetry of the distribution, about 2.5% of the days are colder than 49°F (and 2.5% are warmer than 81°F), so having a temperature colder than 49°F in San Francisco is fairly unusual.

Z-Scores
z-score: Measures how many standard deviations an observed data value is from the mean.
Example: A z-score of 1.5 means the observed data value is 1.5 standard deviations above the mean. A z-score of −1.5 means the observed data value is 1.5 standard deviations below the mean.

Z-Scores
This dotplot shows heights for a sample of men. How many men would have z-scores:
• Greater than 2?
• Less than −2?

Z-Scores (Answer)
• Two men have z-scores greater than 2.
• Two men have z-scores less than −2.

Z-Scores: Usefulness Example
Note: Z-scores allow us to compare observations in different distributions.
Example: Suppose Road A has a mean speed of 60 mph with a standard deviation of 5 mph, and Road B has a mean speed of 60 mph with a standard deviation of 10 mph. Is a driver going 70 mph on Road A traveling relatively faster or slower than a driver going 70 mph on Road B?
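The z-score definition above amounts to one subtraction and one division. A minimal Python sketch follows (the function name `z_score` is our own, not from the text); it standardizes each driver's speed against that road's mean and standard deviation so the two speeds can be compared on a common scale.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Road example: both drivers go 70 mph; both roads average 60 mph.
z_road_a = z_score(70, mean=60, sd=5)   # 2.0
z_road_b = z_score(70, mean=60, sd=10)  # 1.0

# The larger z-score marks the driver who is faster relative to that road.
if z_road_a > z_road_b:
    print("Road A driver is relatively faster")
else:
    print("Road B driver is relatively faster (or they are tied)")
```

The same function handles any pair of distributions; only each observation's own mean and standard deviation enter the calculation.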
Z-Scores: Usefulness Example (Solution)
Known facts:
Road A: x̄ = 60 mph, s = 5 mph
Road B: x̄ = 60 mph, s = 10 mph
Solution: Although both drivers are traveling at 70 mph, the driver on Road A is traveling relatively faster, since 70 mph is 2 standard deviations above the mean on Road A and only 1 standard deviation above the mean on Road B.

Z-Scores: Formula
$$z = \frac{x - \bar{x}}{s}$$
• The numerator, x − x̄, is the distance from the mean.
• Dividing by the standard deviation, s, determines how many standard deviations x is from the mean.

Z-Scores: Example
Maria scored 80 out of 100 on her first stats exam in a course and 85 out of 100 on her second stats exam. On the first exam, the mean was 70 and the standard deviation was 10. On the second exam, the mean was 80 and the standard deviation was 5. On which exam did Maria perform better when compared to the whole class?

Z-Scores: Example (Solution)
First exam: $z = \frac{80 - 70}{10} = 1$
Second exam: $z = \frac{85 - 80}{5} = 1$
Conclusion: The second exam was a little easier; on average, students scored higher, and there was less variability in the scores. But Maria scored one standard deviation above average on both exams, so she did equally well on both when compared to the whole class.

Section 3.3
SUMMARIES FOR SKEWED DISTRIBUTIONS
• Measure of center (middle point): median
• Measure of horizontal spread (variability): IQR

The Median
The median is the middle number when the data has been sorted from smallest to largest.
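The sort-then-find-the-middle procedure can be written in a few lines. This Python sketch is illustrative (the standard library's `statistics.median` does the same job); it applies the rule to the Austin gas-price sample from Section 3.1, whose median is worked out by hand in the example that follows.

```python
def median(data):
    """Middle value of the sorted data; average the two middle values if n is even."""
    xs = sorted(data)  # arrange all numbers in order
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:     # odd n: exactly one middle number
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2  # even n: average the two middle numbers

prices = [3.19, 3.09, 3.09, 2.93, 2.95, 3.09, 2.99, 2.99, 2.95, 2.97]
print(median(prices))  # 2.99
```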
The Median
This graph shows the distribution of incomes for a sample of New York City residents. About half the residents have incomes above $25,200, and about half have incomes below $25,200.

Typical Value: The Mean vs. the Median
Which is a better measure of the “typical” income of a New York City resident: the mean or the median?

The Median
Median: the middle number
• Arrange all numbers in order.
• Find the middle number. (If there are two middle numbers, average them.)
Note:
• Think of the median as the middle point.
• The median is a good measure of a typical value for skewed distributions.

The Median: Example
Suppose a sample of prices for 1 gallon of regular gas at 10 different gas stations in a neighborhood in Austin, Texas, is taken on one fall day in 2013. Find and interpret the median.
$3.19, $3.09, $3.09, $2.93, $2.95, $3.09, $2.99, $2.99, $2.95, $2.97

The Median: Example (Solution)
First arrange the numbers in order:
2.93, 2.95, 2.95, 2.97, 2.99, 2.99, 3.09, 3.09, 3.09, 3.19
Since there are two numbers in the middle (the 5th and 6th values, both 2.99), average them:
$$M = \frac{2.99 + 2.99}{2} = 2.99$$
INTERPRETATION of the median: The median price of 1 gallon of gas at these gas stations in Austin, Texas, was $2.99 on this particular day in 2013.

Using the TI-84 Calculator
NOTE: Your calculator can find the median, but you must be able to interpret the result.
To find the median on the TI-84 calculator:
1. Push STAT, then select option 1: Edit.
2. Enter the data set in L1.
3. Push STAT and arrow over to CALC.
4. Choose option 1: 1-Var Stats (press ENTER).
5. Arrow down to see the median (Med).

Measuring the Spread
Recall: The standard deviation measured spread using the distance from the mean.
Now: Since we don't use the mean for skewed distributions, we need a measure of spread related to the median.
So: We use the interquartile range (IQR).

Range
Before we get to the IQR, we need to talk about the range and quartiles.
Range: the difference between the largest and smallest values
Example: A group of eight children have the following heights (in inches): 48.0, 48.0, 53.0, 53.5, 54.0, 60.0, 62.0, and 71.0
The range in the children's heights is 71.0 − 48.0 = 23.0 inches.

Quartiles
Quartiles: values that divide the distribution into fourths, so that each of the four parts contains about 25% of the data.
Example: The dotplot shows the distribution of weights for a class of introductory statistics students. The vertical lines slice the distribution into four parts, so each part has about 25% of the observations: 25% of the weights are between 101 and 121 pounds, and so on.

Interquartile Range: IQR
IQR: the range of the middle 50% of the data
Example: In the same dotplot of weights for a class of introductory statistics students, the vertical lines slice the distribution into four parts, each with about 25% of the observations.
IQR = 160 − 121 = 39 pounds (the distance between the first and third vertical lines, that is, Q3 − Q1)
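The range and quartile definitions can be sketched in Python. Textbooks and software do not all compute quartiles the same way; the sketch below assumes the median-of-halves convention, which reproduces the Q1 = 50.5 and Q3 = 61 reported for the children's heights in the worked example that follows. The standard library's `statistics.quantiles` uses a different default method and can give different quartiles for small samples. The helper names are our own.

```python
def median(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def quartiles(data):
    """(Q1, median, Q3) using the median-of-halves convention.

    Q1 is the median of the lower half and Q3 the median of the upper half;
    when n is odd, the overall median is left out of both halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower, upper = xs[:half], xs[half + n % 2:]
    return median(lower), median(xs), median(upper)

heights = [48.0, 48.0, 53.0, 53.5, 54.0, 60.0, 62.0, 71.0]  # inches

data_range = max(heights) - min(heights)  # largest minus smallest
q1, med, q3 = quartiles(heights)
iqr = q3 - q1                             # range of the middle 50%

print(data_range)   # 23.0
print(q1, q3, iqr)  # 50.5 61.0 10.5
```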
Using the TI-84 Calculator
NOTE: Your calculator can find the quartiles, and you must use them to find the IQR.
To find the IQR on the TI-84 calculator:
1. Push STAT, then select option 1: Edit.
2. Enter the data set in L1.
3. Push STAT and arrow over to CALC.
4. Choose option 1: 1-Var Stats (press ENTER).
5. Arrow down to see the first and third quartiles (Q1 and Q3).
6. Calculate the IQR: IQR = Q3 − Q1.

Interquartile Range: Example
A group of eight children have the following heights (in inches): 48.0, 48.0, 53.0, 53.5, 54.0, 60.0, 62.0, and 71.0
Find the interquartile range for the distribution of the children's heights using your calculator.

Interquartile Range: Example (Solution)
[Figure: TI-84 1-Var Stats display for the heights]
Solution: IQR = Q3 − Q1 = 61.00 − 50.50 = 10.50
The interquartile range of the heights of the eight children is 10.5 inches.

Section 3.4
COMPARING MEASURES OF CENTER
• Symmetric distributions: mean and standard deviation
• Skewed distributions: median and IQR
• How to compare measures

Appropriate Measures
Recall: When dealing with numerical data, you need to describe the distribution using these 3 characteristics:
Characteristic   Which Is
Shape            Symmetric or skewed
Center           Mean or median
Spread           Standard deviation or IQR

Choosing a Measure
Primary goal: Choose a pair of measures that is best suited for the data, which depends on the shape!
Always: Begin with a picture!
Shape       Measure for Center   Measure for Spread
Symmetric   Mean                 Standard deviation
Skewed      Median               IQR

Choosing a Measure: Example
One of the authors created a data set of the songs on his mp3 player. He wants to describe the distribution of song lengths.
1. What shape do you expect the distribution to have?
2. What measures should you use for this shape?

Choosing a Measure: Example (Solution)
1. Shape:
• No song can be shorter than 0 seconds.
• Most songs on the radio are around 4 minutes long.
• A few songs (e.g., classical tracks) are extremely long.
• The distribution is probably right-skewed.
2. Measures: median and IQR

Choosing a Measure: Example (Results)
1. Shape: right-skewed
2. Measures: median and IQR
It turns out the median length is 226 seconds (about 3 minutes and 46 seconds) and the interquartile range is 117 seconds (close to 2 minutes). In other words, the typical track on the author's mp3 player is about 4 minutes long, and the middle 50% of track lengths span a range of about 2 minutes.

Mean vs. Median
Things to keep in mind:
• Skewed data and outliers affect the mean (and standard deviation).
• The median and IQR are resistant to (not greatly affected by) skewed data and outliers.
Roughly speaking:
Shape          Mean vs. Median
Skewed left    Mean < Median
Symmetric      Mean ≈ Median
Skewed right   Mean > Median

Mean vs. Median: Example
A (very small) fast-food restaurant has five employees, all of whom work full-time for $7 per hour. Each employee's annual income is about $16,000 per year.
The owner, on the other hand, makes $100,000 per year. Find both the mean and the median. Which would you use to represent the typical income at this business: the mean or the median? Which value is smaller?

Mean vs. Median: Example (Solution)
• The mean income is $30,000.
• The median income is $16,000.
• Use the median income (since the data is skewed); it better represents the typical income.
• Mean > Median (since the data is skewed right).

Comparing Different Distributions
When comparing two distributions:
• Always use the same measures of center and spread for both distributions. Otherwise, the comparison is not valid.
• If either distribution is skewed, use the median and IQR to compare both distributions!

Comparing Different Distributions: Example
The graph on this slide compares the distributions of marathon running times for amateur and Olympic runners.

Comparing Different Distributions: Example (Solution)
Note the shapes:
– Olympic runners: right-skewed (use the median)
– Amateur runners: fairly symmetric (but use the median anyway, since the Olympic distribution is skewed)
Answer: The typical female Olympic runner finishes the marathon considerably faster: a median time of 154.8 minutes (about 2.6 hours), compared to 240.0 minutes (about 4 hours) for the typical amateur athlete.

Section 3.5
USING BOXPLOTS TO DISPLAY SUMMARIES
• Boxplots
• Five-number summary
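The fast-food example shows how a single extreme value pulls the mean but not the median. The Python sketch below (illustrative, using the standard library's `statistics` module) reproduces the two numbers from the solution slide.

```python
import statistics

# Five employees earn about $16,000 a year; the owner earns $100,000.
incomes = [16_000] * 5 + [100_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean = ${mean_income:,.0f}")      # mean = $30,000
print(f"median = ${median_income:,.0f}")  # median = $16,000

# One large value pulls the mean up but leaves the median alone,
# which is what happens in a right-skewed distribution (mean > median).
print("mean > median" if mean_income > median_income else "mean <= median")
```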
Finding Outliers
Outliers: extreme data values
General rule for finding outliers:
– Find the fences (“cutoffs”) for usual data values:
  Lower fence = Q1 − 1.5(IQR)
  Upper fence = Q3 + 1.5(IQR)
– Values more extreme than the fences are outliers (values less than the lower fence or greater than the upper fence).

Finding Outliers: Example
The first and third quartiles in the distribution of daily high temperatures in San Francisco are 59°F and 70°F, respectively. Using these values, what temperatures would be considered outliers in San Francisco?

Finding Outliers: Example (Solution)
• Fences:
  Lower fence = 59 − 1.5(70 − 59) = 42.5°F
  Upper fence = 70 + 1.5(70 − 59) = 86.5°F
• Outliers: Any temperature below 42.5°F or above 86.5°F would be considered an outlier.

Boxplots
– Help us visualize certain summary statistics.
– Show where the bulk of the data lie.
– The box is drawn from Q1 to Q3, with a line for the median inside the box.
– Whiskers are drawn to the most extreme values within the fences (extreme values that are not outliers).
– Potential outliers are marked with an asterisk.
CAUTION! Boxplots work best for unimodal distributions. (They hide multimodal information!)

Boxplots: Example
The boxplot for the daily high temperatures in San Francisco is given on this slide.
• What is the minimum data value?
• What is the median data value?
• How many potential outliers are there?
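The fence rule is mechanical enough to sketch in code. In the Python below, the function names are our own, and the temperatures passed to `find_outliers` are hypothetical values chosen only to show the rule at work, not data from the slides. The fences themselves match the San Francisco solution.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR (k = 1.5 in the usual rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# San Francisco daily highs: Q1 = 59°F, Q3 = 70°F.
lower, upper = outlier_fences(59, 70)
print(lower, upper)  # 42.5 86.5

# Hypothetical temperatures, made up only to exercise the rule.
print(find_outliers([41, 50, 64, 86, 90], q1=59, q3=70))  # [41, 90]
```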
Boxplots: Example (Answers)
• Minimum data value: 49°F
• Median data value: 64°F
• Number of potential outliers: 5

Using the TI-84 Calculator
To create a boxplot on the TI-84 calculator:
1. Push STAT, then select option 1: Edit.
2. Enter the data set in L1.
3. Push 2nd Y= (for STAT PLOT).
4. Turn on Plot1 (press ENTER twice).
5. Use the down arrow, followed by the right arrow, to select the Type that looks like a boxplot with outliers (the first boxplot option), and push ENTER.
6. Make sure Xlist is set to L1.
7. Push ZOOM followed by the number 9 (for option 9: ZoomStat) to see the boxplot.
8. Use TRACE and the arrow keys to move around the graph (you will see relevant information on the screen).

Boxplots: Comparing Two Distributions
We examined the temperatures in Provo and SF earlier.
Note:
• Both distributions have roughly the same:
  – Shape: symmetric
  – Center: 67°F in Provo vs. 65°F in SF
• The spread of the two distributions, however, is very different:
  – Provo: more spread out (data values farther from the center)
  – SF: less spread out (data values closer to the center)

Boxplots: Comparing Two Distributions
Now, looking at the boxplots, we can see a few more facts:
• Describing the distributions:
  – Shape: symmetric (median in the center of the box)
  – Center: 67°F in Provo vs. 64°F in SF (close to the same)
  – Spread: Provo is more spread out (wider box); SF is less spread out (narrower box) and has outliers!
• Both cities have days with highs near 100°F, but such days are unusual (outliers) in San Francisco, while in Provo they merely fall in the upper 25% of the data.

Five-Number Summary
The key summary statistics that boxplots reveal are known as the five-number summary:
Minimum, Q1, Median, Q3, Maximum
Example: Consider the boxplot of daily high temperatures in SF. The five-number summary is: 49, 59, 64, 70, 97

Case Study
• Question: How do people perceive risk?
  – Measure: 500 subjects rated activities on the risks involved (0 = no risk, 100 = greatest possible risk).
  – Focus: risk of appliances and risk of x-rays
  – Look for: differences between men's and women's perceptions of risk
• Analysis of data:
  – The distributions for men and women were similar. Both were right-skewed.
  – Summary stats:
  – Why are the median and IQR being reported?
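A five-number summary is just the minimum, the quartiles, and the maximum. The San Francisco temperature data itself is not listed in the slides, so the Python sketch below applies the idea to the children's heights from Section 3.3 instead. It again assumes the median-of-halves convention for quartiles, which matches the Q1 and Q3 used in that example but may differ from other software; the helper names are our own.

```python
def median(xs):
    """Median of an already-sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(data):
    """(Minimum, Q1, Median, Q3, Maximum), with quartiles as medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower, upper = xs[:half], xs[half + n % 2:]  # odd n: leave the median out
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Children's heights (inches) from the IQR example in Section 3.3.
heights = [48.0, 48.0, 53.0, 53.5, 54.0, 60.0, 62.0, 71.0]
print(five_number_summary(heights))  # (48.0, 50.5, 53.75, 61.0, 71.0)
```

From these five numbers, a boxplot's box runs from 50.5 to 61.0 inches with a line at 53.75; the 1.5 × IQR fences (34.75 and 76.75 inches) flag no outliers in this data.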