14 Descriptive Statistics
14.1 Graphical Descriptions of Data
14.2 Variables
14.3 Numerical Summaries
14.4 Measures of Spread

Copyright © 2010 Pearson Education, Inc. Excursions in Modern Mathematics, 7e

The Range

An obvious approach to describing the spread of a data set is to take the difference between the highest and lowest values of the data. This difference is called the range of the data set and is usually denoted by R. Thus, R = Max – Min. The range of a data set is a useful piece of information when there are no outliers in the data. In the presence of outliers, the range tells a distorted story.

For example, the range of the test scores in the Stat 101 exam is 24 – 1 = 23 points, an indication of a big spread within the scores (i.e., a very heterogeneous group of students). True enough, but if we discount the two outliers, the remaining 73 test scores have a much smaller range of 16 – 6 = 10 points.

The Interquartile Range

To eliminate the possible distortion caused by outliers, a common practice when measuring the spread of a data set is to use the interquartile range, denoted by the acronym IQR. The interquartile range is the difference between the third quartile and the first quartile (IQR = Q3 – Q1), and it tells us how spread out the middle 50% of the data values are. For many types of real-world data, the interquartile range is a useful measure of spread.

Example 14.18 2007 SAT Math Scores: Part 3

The five-number summary for the 2007 SAT math scores was Min = 200 (yes, there were a few jokers who missed every question!), Q1 = 430, M = 510, Q3 = 590, Max = 800 (there are still a few geniuses around!).
It follows that the 2007 SAT math scores had a range of 600 points (800 – 200 = 600) and an interquartile range of 160 points (IQR = 590 – 430 = 160).

Standard Deviation

The most important and most commonly used measure of spread for a data set is the standard deviation. The key concept for understanding the standard deviation is the concept of deviation from the mean. If A is the average of the data set and x is an arbitrary data value, the difference x – A is x's deviation from the mean. The deviations from the mean tell us how "far" the data values are from the average value of the data. The idea is to use this information to figure out how spread out the data is.

The deviations from the mean are themselves a data set, which we would like to summarize. One way would be to average them, but if we do that, the negative deviations and the positive deviations will always cancel each other out so that we end up with an average of 0. This, of course, makes the average useless in this case. The cancellation of positive and negative deviations can be avoided by squaring each of the deviations.

The squared deviations are never negative, and if we average them out, we get an important measure of spread called the variance, denoted by V. Finally, we take the square root of the variance and get the standard deviation, denoted by the Greek letter σ (and sometimes by the acronym SD). The following is an outline of the definition of the standard deviation of a data set.

THE STANDARD DEVIATION OF A DATA SET
■ Let A denote the mean of the data set. For each number x in the data set, compute its deviation from the mean (x – A) and square each of these numbers. These numbers are called the squared deviations.
■ Find the average of the squared deviations. This number is called the variance V.
■ The standard deviation is the square root of the variance: σ = √V.

Example 14.19 Calculation of a SD

Over the course of the semester, Angela turned in all of her homework assignments. Her grades in the 10 assignments (sorted from lowest to highest) were 85, 86, 87, 88, 89, 91, 92, 93, 94, and 95. Our goal in this example is to calculate the standard deviation of this data set the old-fashioned way (i.e., doing our own grunt work). The first step is to find the mean A of the data set. It's not hard to see that A = 90.

The second step is to calculate the deviations from the mean and then the squared deviations. The deviations are –5, –4, –3, –2, –1, 1, 2, 3, 4, and 5, so the squared deviations are 25, 16, 9, 4, 1, 1, 4, 9, 16, and 25, which add up to 110. When we average the squared deviations, we get 110/10 = 11. This means that the variance is V = 11 and thus the standard deviation (rounded to one decimal place) is σ = √11 ≈ 3.3 points.

Interpreting the Standard Deviation

It is clear from just a casual look at Angela's homework scores that she was pretty consistent in her homework, never straying too much above or below her average score of 90 points. The standard deviation is, in effect, a way to measure this degree of consistency (or lack thereof). A small standard deviation tells us that the data are consistent and the spread of the data is small, as is the case with Angela's homework scores.
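The three-step outline above can be sketched in a few lines of Python. This is only an illustration: the function name population_sd and its return layout are assumptions made for the example, not something from the text. It follows the text's definition, which averages the squared deviations over all N values (rather than dividing by N – 1, as the "sample" standard deviation does).

```python
import math

def population_sd(values):
    """Return (mean, variance, sd) using the divide-by-N definition from the text.

    The function name and return layout are illustrative assumptions.
    """
    n = len(values)
    mean = sum(values) / n                                   # step 1: the mean A
    squared_deviations = [(x - mean) ** 2 for x in values]   # step 1: (x - A)^2
    variance = sum(squared_deviations) / n                   # step 2: the variance V
    return mean, variance, math.sqrt(variance)               # step 3: sigma = sqrt(V)

# Angela's ten homework grades from Example 14.19.
angela = [85, 86, 87, 88, 89, 91, 92, 93, 94, 95]
mean, variance, sd = population_sd(angela)
print(mean, variance, round(sd, 1))  # 90.0 11.0 3.3
```

The printed values match the hand calculation: A = 90, V = 11, and a standard deviation of about 3.3 points.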
The ultimate in consistency within a data set is when all the data values are the same (like Angela's friend Chloe, who got a 20 in every homework assignment). When this happens the standard deviation is 0.

On the other hand, when there is a lot of inconsistency within the data set, we are going to get a large standard deviation. This is illustrated by Angela's other friend, Tiki, whose homework scores were 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95. We would expect the standard deviation of this data set to be quite large; in fact, it is almost 29 points.

Summary of the Standard Deviation

The standard deviation is arguably the most important and frequently used measure of data spread. Yet it is not a particularly intuitive concept. Here are a few basic guidelines that recap our preceding discussion:

■ The standard deviation of a data set is measured in the same units as the original data. For example, if the data are points on a test, then the standard deviation is also given in points. Conversely, if the standard deviation is given in dollars, then we can conclude that the original data must have been money (some prices, salaries, or something like that). For sure, the data couldn't have been test scores on an exam.
■ It is pointless to compare standard deviations of data sets that are given in different units. Even for data sets that are given in the same units (say, for example, test scores), the underlying scale should be the same.
We should not try to compare standard deviations for SAT scores measured on a scale of 200–800 points with standard deviations of a set of homework assignments measured on a scale of 0–100 points.
■ For data sets that are based on the same underlying scale, a comparison of standard deviations can tell us something about the spread of the data. If the standard deviation is small, we can conclude that the data points are all bunched together, with very little spread. As the standard deviation increases, we can conclude that the data points are beginning to spread out. The more spread out they are, the larger the standard deviation becomes. A standard deviation of 0 means that all data values are the same.

As a measure of spread, the standard deviation is particularly useful for analyzing real-life data.
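To see the last guideline in action, the three homework data sets from the discussion (all on the same 0–100 scale) can be compared side by side. The sketch below uses pstdev from Python's standard statistics module, which divides by N as the text's definition does. Two things are assumptions made for the example: the helper name data_range, and the choice to give Chloe ten assignments (the text only says she got a 20 on every one).

```python
from statistics import pstdev

# Homework scores from the text. Ten assignments for Chloe is an assumption;
# the text only says she scored 20 on every assignment.
homework = {
    "Chloe": [20] * 10,
    "Angela": [85, 86, 87, 88, 89, 91, 92, 93, 94, 95],
    "Tiki": [5, 15, 25, 35, 45, 55, 65, 75, 85, 95],
}

def data_range(values):
    """Range R = Max - Min (helper name is illustrative)."""
    return max(values) - min(values)

# Map each student to (range, standard deviation rounded to one decimal place).
summary = {
    name: (data_range(scores), round(pstdev(scores), 1))
    for name, scores in homework.items()
}

for name, (r, sd) in summary.items():
    print(f"{name}: range = {r}, SD = {sd}")
```

Running this gives a standard deviation of 0.0 for Chloe, 3.3 for Angela, and 28.7 for Tiki, in line with the "almost 29 points" quoted above: the more spread out the scores, the larger the standard deviation.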