Chapter 3 Numerical Summaries of Center and Variation
Copyright © 2014 Pearson Education, Inc. All rights reserved

Learning Objectives
Understand how measures of center and spread are used to describe characteristics of real-life samples of data.
Understand when it is appropriate to use the mean and standard deviation and when it is better to use the median and interquartile range.
Understand the mean as the balancing point of the distribution of a sample of data and the median as the point that has roughly 50% of the distribution below it.
Be able to write comparisons between samples of data in context.

3.1 Summaries for Symmetric Distributions

Summaries for Symmetric Distributions
The mean describes the center.
It is a numerical summary.
It is the balancing point for the distribution.
It can be used as a typical value for symmetric, mound-shaped distributions.
The standard deviation describes the spread.
It is a numerical summary.
It measures a typical distance of the observations from the mean.
It measures the variability when the distribution is symmetric.

The Mean as a Balancing Point
If we place a finger at the mean, the histogram will balance perfectly.
[Histogram: Major League Baseball 2010]

Skewness and the Mean
For a skewed right histogram, the mean is to the right of the typical value.
[Histogram: Major League Baseball 2010]

Symmetric Distributions and the Mean
For a symmetric distribution, the mean is at the center.

The Formula for the Mean
To calculate the mean, use the formula: \( \bar{x} = \frac{\sum x}{n} \)
Σ, read “sigma”, means “add”.
x represents all of the data values.
n represents the sample size.
\( \bar{x} \) represents the sample mean.
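As a rough illustration of the formula for the mean, the following Python sketch adds the data values and divides by the sample size. The function name sample_mean and the four-value sample are hypothetical and chosen only for this example. The last line checks the balancing-point idea: the deviations from the mean add up to zero.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    return sum(values) / n


# Hypothetical sample, used only for illustration.
sample = [4, 8, 6, 2]
x_bar = sample_mean(sample)
print(x_bar)  # 5.0

# Balancing point: deviations below and above the mean cancel out.
print(sum(x - x_bar for x in sample))  # 0.0
```

The same function can be applied to any list of numbers; it is only a sketch of the arithmetic, not a replacement for statistical software such as StatCrunch.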

Calculating the Sample Mean
Find the mean of the number of siblings for the 8 students questioned: 3, 2, 2, 1, 2, 3, 5, 2
The sample size: n = 8.
\( \bar{x} = \frac{\sum x}{n} = \frac{3 + 2 + 2 + 1 + 2 + 3 + 5 + 2}{8} = \frac{20}{8} = 2.5 \)

Standard Deviation
The standard deviation, s, is a measure of the spread.
It represents a typical distance from the mean of the observations.
For mound-shaped distributions, the majority of the observations are less than one standard deviation from the mean.
The square of the standard deviation is called the variance.

Put the Following in Order From Smallest Standard Deviation to Largest
Solution: (c), (b), (a)

The Standard Deviation and the Mean
In San Francisco, the mean high temperature is 65 degrees and the standard deviation is 8 degrees. In Provo the mean is 67 and the standard deviation is 21. Is a high temperature of 52 rarer in San Francisco or in Provo?
SF: 65 – 8 = 57, Provo: 67 – 21 = 46
Since 52 degrees is within one standard deviation of Provo’s mean and not of San Francisco’s mean, a temperature of 52 is rarer in San Francisco.

Using StatCrunch to Find the Mean and Standard Deviation
Enter the data, then go to Stat → Summary Stats → Columns.
Click on the variable name and hit Calculate.
This calculates the mean, standard deviation and other statistics that will be used later.

3.2 What’s Unusual? The Empirical Rule and z-Scores

[Figure: The Empirical Rule Graphically]

Empirical Rule
The Empirical Rule: If a distribution is unimodal and symmetric, then
Approximately 68% of the observations (roughly two-thirds) will be within one standard deviation of the mean.
Approximately 95% of the observations will be within two standard deviations of the mean.
Nearly all the observations will be within three standard deviations of the mean.

Empirical Rule Example
The mean body weight for women between 18 and 25 years old is 134 lbs and the standard deviation is 26 lbs. Assume a mound-shaped distribution.
134 – 26 = 108, 134 + 26 = 160
About 68% of women in this age group weigh between 108 and 160 lbs.
134 – 2(26) = 82, 134 + 2(26) = 186
About 95% weigh between 82 and 186 lbs.
134 – 3(26) = 56, 134 + 3(26) = 212
Almost all weigh between 56 and 212 lbs.

Using the Empirical Rule
High temperatures in San Francisco follow a unimodal and symmetric distribution with mean 65 degrees and standard deviation 8 degrees. Give a range of temperatures that includes the middle 95% of high temperature days in San Francisco.
65 – 2(8) = 49, 65 + 2(8) = 81
About 95% of all days in San Francisco have high temperatures between 49 and 81 degrees.

Empirical Rule Example
Daily cash register receipts at a local store follow a mound-shaped distribution with mean $9,200 and standard deviation $150. The day a new employee was hired the store took in $4,500. Should the manager be concerned?
9200 – 3(150) = 8750
Yes, the manager should be concerned. $4,500 is far more than three standard deviations below the mean, so it is highly unlikely that such a low receipt total for the day would happen by random chance alone.

The Trouble With Evaluating if a Data Value is Unusual
Is 2 less than the mean male height short?
2 feet shorter is much shorter.
2 millimeters shorter is not much shorter.
Instead, statisticians normalize the values by citing the z-score.

The z-Score
The z-score measures the number of standard deviations the value is from the mean.
The resulting units are called standard units.
The z-score is used to compare values measured in different units such as feet and millimeters.

The z-Score Formula
\( z = \frac{x - \bar{x}}{s} \)
The mean price for a loaf of bread is $3.12 and the standard deviation is $0.89. Find the z-score for a loaf of bread that costs $2.00.
\( z = \frac{2.00 - 3.12}{0.89} \approx -1.26 \)
The z-score is about -1.26.

Comparing Values
What is more unusual: a value of 0.26 from a distribution with mean 0.37 and standard deviation 0.03 or a value of 45 from a distribution with mean 38 and standard deviation 4?
\( z_{0.26} = \frac{0.26 - 0.37}{0.03} \approx -3.67 \)
\( z_{45} = \frac{45 - 38}{4} = 1.75 \)
The value of 0.26 is more unusual since it has a z-score that is farther from 0.

3.3 Summaries for Skewed Distributions

Skewness and the Trouble with the Mean
For a skewed distribution, the mean gets “pulled” towards the tail.
The mean is also “pulled” towards outliers.
For a skewed distribution or a distribution with only upper or only lower outliers, the mean does not represent a typical value.

The Median to Represent the Center
The middle value, called the median, is often a better representation of the center.
The median is defined as the middle number, or the average of the two middle numbers if the sample size is even.
The median cuts the data in half. Typically half the values are below the median and half are above.
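The definition of the median can be sketched in Python as follows. This is a minimal illustration, assuming the data fit in a list; the function name median_of and the sample values are hypothetical.

```python
def median_of(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)  # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: a single middle value exists.
        return ordered[mid]
    # Even sample size: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical samples, used only for illustration.
print(median_of([7, 1, 5]))     # 5
print(median_of([7, 1, 5, 3]))  # 4.0
```

Sorting first is what makes the "middle" position meaningful; the order in which the values were collected does not matter.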

Median vs. Mean
The median income of $18,000 better represents the typical income than the much higher mean income.
The right tail greatly increases the mean but only slightly increases the median.

Calculating the Median
Sort the data from smallest to largest.
If the set contains an odd number of observed values, the median is the middle observed value.
If the set contains an even number of observed values, the median is the average of the two middle observed values. This places the median precisely halfway between the two middle values.

Example
The following data represent eight home prices in thousands of dollars. Find the median:
123, 457, 278, 184, 216, 336, 192, 184
First sort from smallest to largest:
123, 184, 184, 192, 216, 278, 336, 457
Since there is an even number of values, take the average of the middle two:
\( \frac{192 + 216}{2} = 204 \)

Quartiles
The First Quartile (Q1) is the value such that 25% of the data lie at or below this value. Q1 is roughly the median of the lower half of the data.
The Third Quartile (Q3) is the value such that 75% of the data lie at or below this value. Q3 is roughly the median of the upper half of the data.

The Interquartile Range (IQR)
The Interquartile Range (IQR) represents the range of the middle 50% of the data.
Cut the ordered data into four equal parts. The distance taken up by the middle two parts is the interquartile range.
IQR = Q3 – Q1

Interpreting Q1, Q3, and IQR
The first quartile for birth weights is 3.1 kg and the third quartile is 3.7 kg. Interpret Q1, Q3, and the IQR.
Q1 = 3.1. This means that 25% of all babies are born weighing at or below 3.1 kg.
Q3 = 3.7.
This means that 75% of all babies are born weighing at or below 3.7 kg.
Q3 – Q1 = 0.6. The middle half of all birth weights has a range of 0.6 kg.

How the Quartiles and IQR are Used
Quartiles and the IQR are primarily used when there are large data sets, for example:
National exam scores
Physical measurements: weight, height, cholesterol levels, BMI, etc.
Income of state residents
Time to run one mile

The Range
The range is the distance spanned by the entire data set.
Range = Maximum – Minimum
The range is easy to calculate, but is subject to peculiarities of the data set and is very sensitive to outliers.
A smaller sample size is likely to produce a smaller range.
The range of a sample is a poor predictor of the range for the population.

3.4 Comparing Measures of Center

Mean and Standard Deviation or Median and IQR?
Use the mean and standard deviation when the distribution is mound shaped.
Use the median and IQR when the distribution is skewed left or skewed right.
If the distribution is not unimodal, it may be better to split the data.

Song Lengths
Song lengths are skewed right because there are many short songs, no negative length songs, but a few long songs.
The mean is influenced greatly by the right tail. The median isn’t.
The median of 226 seconds better represents the typical song.
The IQR of 117 seconds covers the high bars of the histogram.

San Francisco Temperatures
The distribution is approximately mound shaped.
With mound-shaped distributions the mean and median are nearly the same number.
The mean is preferred over the median if they are close together.
One standard deviation from the mean gives a lower bound of 57 and an upper bound of 73. This covers the high bars of the histogram.

The Effect of Outliers
Number of employees at several businesses on Main Street:
6, 7, 14, 18, 23, 25, 26
Mean = 17, Median = 18
If the 26-employee business is turned into a Wal-Mart:
6, 7, 14, 18, 23, 25, 334
Mean = 61, Median = 18
Conclusion: The mean is strongly affected by outliers, while the median is not affected by outliers.

Affected by Outliers?
Affected by outliers: mean, standard deviation, range
Not affected by outliers: median, interquartile range (IQR)

Bimodal Distributions
For most bimodal distributions, neither the mean nor the median represents a typical value.
Investigate further to see if there are two separate sub-populations.
Consider separating the two populations and presenting their graphs and statistics individually.

Trouble with Bimodal Distributions
There are two typical values.
Neither the mean nor the median describes the typical values.
The data should be separated out by lunch customers and dinner customers.

Separating Lunch and Dinner
Displaying the data with two histograms allows a comparison between lunch and dinner.
The Lunch distribution is mound shaped and the Dinner distribution is skewed right.
Do not compare the mean of one data set with the median of another. Use the medians for comparisons.
Lunch median is $8 and Dinner median is $22.
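Returning to the Main Street employee counts from the outlier discussion in this section, the following Python sketch recomputes the summaries before and after the largest business becomes an outlier. It uses the standard library statistics module; the helper name summarize is hypothetical. Quartile conventions differ between tools, so the IQR printed here (from the default method of statistics.quantiles) may not match other software exactly.

```python
import statistics


def summarize(data):
    """Return the center and spread summaries discussed in this chapter."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


before = [6, 7, 14, 18, 23, 25, 26]
after = [6, 7, 14, 18, 23, 25, 334]  # the 26-employee business is now an outlier

for label, data in (("before", before), ("after", after)):
    print(label, summarize(data))
```

In this sketch the mean, standard deviation, and range change sharply once 334 replaces 26, while the median and IQR stay the same, which agrees with the list of summaries that are and are not affected by outliers.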

3.5 Using Boxplots for Displaying Summaries

The Five Point Summary
When the data are partitioned into four equal segments, five important numbers arise. They are called the Five Point Summary:
Minimum – Smallest Value
First Quartile (Q1) – The Median of the Lower Half
Median – The Middle Number or Center
Third Quartile (Q3) – The Median of the Upper Half
Maximum – Largest Value

How many Boyfriends/Girlfriends?
The results of a survey asking how many boyfriends/girlfriends people have had are shown below:
0, 1, 1, 2, 3, 4, 4, 5, 6, 8, 10
The five point summary is:
Minimum = 0, Q1 = 1, Median = 4, Q3 = 6, Maximum = 10

Potential Outliers
A Potential Outlier is a data value that is a distance of more than 1.5 interquartile ranges below the first quartile or above the third quartile.
1. Calculate IQR = Q3 – Q1
2. Find m = Q1 – (1.5)(IQR)
3. Find M = Q3 + (1.5)(IQR)
4. Any values less than m or more than M are potential outliers.

Finding Possible Outliers
The first quartile, Q1, for triglycerides is 109 mg/dL. The third quartile, Q3, is 150 mg/dL. Determine which, if any, of the following triglyceride readings are potential outliers: 38, 200, 225
IQR = 150 – 109 = 41
Q1 – (1.5)(IQR) = 109 – (1.5)(41) = 47.5
Q3 + (1.5)(IQR) = 150 + (1.5)(41) = 211.5
38 and 225 are potential outliers since 38 < 47.5 and 225 > 211.5.

Boxplots
A Boxplot is a chart that visually displays Q1, the median, Q3, and the potential outliers.
To create a boxplot:
1. Plot the potential outliers.
2. Draw small vertical line segments at Q1, Q3, and the median.
3. Draw a box with base from Q1 to Q3.
4. Sketch horizontal line segments from the ends of the box to the smallest and largest values that are not potential outliers.

[Figure: Box Plot]

Interpreting a Boxplot
What percent of students scored below 83%? Answer: 25%
What percent of students scored between 83% and 92%? Answer: 50%

Comparing Distributions with Boxplots
Both cities have similar typical temperatures.
Both cities have fairly symmetric distributions.
Provo has a much greater variation in temperatures than San Francisco.

What Boxplots Show and Don’t Show
Boxplots show: typical range of values, possible outliers, variation
Boxplots don’t show: modality, mean, anything for small data sets, especially those with fewer than 5 values

Chapter 3 Case Study

[Figure: Perceived Risk]

Perceived Risk of Appliances
Skewed right for both men and women.
Unimodal for both men and women.
Women’s typical value slightly higher than men’s.
Five Point Summary appropriate for both.

Risk of Appliance: Statistics
Men’s median is 10; women’s median is higher at 15.
The middle 50% of men varied by 20, while the variation was higher, 25, for women.

Perceived Risk X-rays
Relatively symmetric for both men and women.
Unimodal for both men and women.
Women’s typical value close to men’s.
Mean and standard deviation appropriate for both.

Risk of X-rays: Statistics
         Mean    Standard Deviation
Men      46.8    20
Women    47.8    20.8
Men and women have similar mean and standard deviation risk perception for X-rays.
About 68% of men perceive a risk between 26.8 and 66.8.
About 68% of women perceive a risk between 27 and 68.6.

Chapter 3 Guided Exercise 1

The mean rate of violent crime in the west was 406 per 100,000 people, and the standard deviation was 177. Assume the distribution is approximately unimodal and symmetric.
Between which two values would you expect to find about 95% of the violent crime rates?
Between which two values would you expect to find about 68% of the violent crime rates?
If a western state had a violent crime rate of 584 crimes per 100,000 people, would you consider this unusual?
Would 30 crimes per 100,000 people be unusual?

By the Empirical Rule, about 95% of the data is within two standard deviations of the mean. This represents the green and blue areas together.
The number 583 represents one standard deviation more than the mean: 406 + 177 = 583.
406 – 177 = 229
406 – 2(177) = 52
406 + 2(177) = 760

Between which two values would you expect to find about 95% of the violent crime rates?
About 95% of the violent crime rates are between 52 and 760 crimes per 100,000 people.
Between which two values would you expect to find about 68% of the violent crime rates?
About 68% of the violent crime rates are between 229 and 583 crimes per 100,000 people.

If a western state had a violent crime rate of 584 crimes per 100,000 people, would you consider this unusual?
No, since 584 is within 2 standard deviations of the mean.
Would 30 crimes per 100,000 people be unusual?
Yes. 30 is below 52, so it is more than two standard deviations below the mean, and fewer than 5% of values fall that far from the mean.

Chapter 3 Guided Exercise 2

The head circumferences in centimeters for some men and women in a statistics class are given.
Men: 58, 60, 62.5, 63, 59.5, 59, 60, 57, 55
Women: 63, 55, 54.5, 53.5, 53, 58.5, 56, 54.5, 55, 56, 56, 54, 56, 53, 51
Compare the circumferences of the men’s and women’s heads.

[Figure: Histograms of the two sets of data]

Shapes
The distribution for men is unimodal and not too far from symmetric.
The distribution for women is unimodal and nearly symmetric except for one possible outlier.

Mean and Standard Deviation or Quartiles and IQR?
Since the women’s distribution has a possible outlier, the quartiles and IQR should be used for comparisons.

Compare Centers
The median head circumference for the men was 59.5 cm, and the median head circumference for the women was 55 cm. This shows that the men tended to have larger heads.

Compare Variation
The interquartile range for the head circumferences for the men was 2 cm, and the interquartile range for the women was 2.5 cm. This shows that the women tended to have more variation, as measured by the interquartile range.

Outliers
Men: 58, 60, 62.5, 63, 59.5, 59, 60, 57, 55
Q1 – (1.5)(IQR) = 55, Q3 + (1.5)(IQR) = 63
No possible outliers for the men.
Women: 63, 55, 54.5, 53.5, 53, 58.5, 56, 54.5, 55, 56, 56, 54, 56, 53, 51
Q1 – (1.5)(IQR) = 49.75, Q3 + (1.5)(IQR) = 59.75
63 is a possible outlier for the women.

Final Comparison
The typical head circumference for men is about 4.5 cm larger than the typical head circumference for women.
The women’s head circumferences had slightly more variation than the men’s.
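The potential-outlier check used in this exercise can be sketched in Python as follows. The function names are hypothetical. The slides do not state the women’s quartiles directly; Q1 = 53.5 and Q3 = 56 are assumed here because they are the values implied by the stated IQR of 2.5 cm and the stated fences of 49.75 and 59.75.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


women = [63, 55, 54.5, 53.5, 53, 58.5, 56, 54.5, 55, 56, 56, 54, 56, 53, 51]

# Assumed quartiles (implied by the stated IQR and fences, not given directly).
q1_women, q3_women = 53.5, 56

print(outlier_fences(q1_women, q3_women))             # (49.75, 59.75)
print(potential_outliers(women, q1_women, q3_women))  # [63]
```

Taking the quartiles as inputs, rather than computing them inside the function, sidesteps the fact that different tools use slightly different quartile conventions.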