MATH 112
Section 7.2: Measuring Distribution, Center, and Spread

Prof. Jonathan Duncan
Walla Walla College
Fall Quarter, 2006

Outline

1 Measures of Center
    The Arithmetic Mean
    The Geometric Mean
    The Median
    The Mode
2 Measures of Spread
3 Distributions
4 Conclusion

Analyzing Data

In the last section we focused on presenting an overall picture of data using tables or graphs. Now we will examine ways to analyze certain characteristics of data, such as the data set’s center, spread, and distribution.

Example
In your first year as a teacher, you and another teacher both give the same test to your classes of 25 children. The two classes have the following scores:

Your Class
    93 85 77 71 68
    92 92 90 84 80
    79 76 75 74 71
    71 71 66 62 59
    87 78 71 70 53

Other Class
    98 93 82 70 61
    98 97 95 89 88
    87 77 76 72 65
    64 63 60 58 58
    94 84 71 61 47

Defining the Arithmetic Mean

If you wanted one number to capture your class’s performance, what would it be?

Arithmetic Mean
The arithmetic mean is found by adding all numbers in the data set and dividing by the number of values in the data set.

Example
Your Class: $\frac{53 + 59 + \cdots + 92 + 93}{25} \approx 75.8$
Other Class: $\frac{47 + 58 + \cdots + 98 + 98}{25} \approx 76.4$

Based on these numbers, which class did better? Is the arithmetic mean an accurate summary of the exam scores?

The Arithmetic Mean and Outliers

There are some issues to keep in mind when using the arithmetic mean. One of these issues is the effect of outliers, which are data points much smaller or larger than the typical value.

Example
A new student transfers into your class and scores 100% on the test. How does this affect the mean score?
The arithmetic mean changes from 75.8 to 76.7.

Example
One of the students in your class who scored a 71 on the exam is found to have cheated, and their score is changed to a zero. How does this affect your mean score?
The arithmetic mean changes from 75.8 to 73.0.

Cautions about the Arithmetic Mean

Using the arithmetic mean to measure the center of a data set has both advantages and disadvantages.

Advantages of the Arithmetic Mean
The following are advantages of the arithmetic mean.
* The arithmetic mean is the standard measure of center. It is usually called the “average”.
* The arithmetic mean detects and is affected by outliers.

Disadvantages of the Arithmetic Mean
The following are disadvantages of using the arithmetic mean.
* The arithmetic mean is strongly affected by outliers and can be skewed because of outliers.
* The arithmetic mean may not accurately represent the “typical” value of a data point.

Defining the Geometric Mean

The geometric mean is found using a process similar to the arithmetic mean. The difference is that instead of using addition and division, we use multiplication and roots.

The Geometric Mean
To find the geometric mean, multiply all numbers in the data set together and then take the nth root of this product, where n is the number of values in the data set.

Example
Your Class: $\sqrt[25]{53 \times 59 \times \cdots \times 92 \times 93} \approx 75.1$
Other Class: $\sqrt[25]{47 \times 58 \times \cdots \times 98 \times 98} \approx 74.8$

Notice that the geometric mean and arithmetic mean are usually quite close. For your class, the arithmetic mean was 75.8 compared to a geometric mean of 75.1.

Cautions about the Geometric Mean

The geometric mean also has advantages and disadvantages.

Advantages of the Geometric Mean
The following are advantages of the geometric mean.
* Like the arithmetic mean, the geometric mean will detect outliers.
* Each value in the data set contributes to the geometric mean.

Disadvantages of the Geometric Mean
The following are disadvantages of the geometric mean.
* In a data set containing a zero, the geometric mean will not work: the product is zero, so the result is zero no matter what the other values are.
* The geometric mean is not a well-known measure of center.

Defining the Median

While the arithmetic and geometric means are often used to measure the center of a set of data, they are both sensitive to outliers and therefore not appropriate for data sets in which there are outlying values.

The Median
To find the median of a set of data, arrange the values in order from least to greatest. The middle number (or the arithmetic mean of the two middle numbers if the number of values is even) is the median.

Example
Your Class: 75 is the middle number
Other Class: 76 is the middle number

In our example, the mean and median are very close. The mean for your class was 75.8 and for the other class it was 76.4.

The Median and Outliers

The median does not make use of every number in the data set and, because of this, it is not as sensitive to outliers. Consider the following smaller set of data.

Example
Find the mean and median of 1, 70, 71, 72.
The mean is $\frac{1 + 70 + 71 + 72}{4} = 53.5$.
The median is the mean of 70 and 71, so 70.5.

Cautions about the Median

The median is a useful tool for measuring centers in data sets in which there are many or extreme outliers. However, there are still some cautions to keep in mind when using the median.

Advantages of the Median
The following are some advantages of using the median to measure center.
* The median is not sensitive to outliers.
* One half of the data is below the median and one half above.
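
The three measures of center discussed so far can be checked with a short Python sketch. Python is not part of the original slides; the variable names and the helper `geometric_mean` are illustrative assumptions, and the scores are the "Your Class" values from the opening example.

```python
import math
import statistics

# Scores for "your class", copied from the opening example.
your_class = [93, 85, 77, 71, 68,
              92, 92, 90, 84, 80,
              79, 76, 75, 74, 71,
              71, 71, 66, 62, 59,
              87, 78, 71, 70, 53]


def geometric_mean(values):
    """nth root of the product of n positive values (hypothetical helper)."""
    return math.prod(values) ** (1 / len(values))


mean_score = statistics.mean(your_class)        # sum divided by count
geo_mean_score = geometric_mean(your_class)     # 25th root of the product
median_score = statistics.median(your_class)    # middle value after sorting

# The small data set from the outlier example: the value 1 drags the
# mean down, while the median stays near the bulk of the data.
small = [1, 70, 71, 72]
small_mean = statistics.mean(small)
small_median = statistics.median(small)

print(f"mean={mean_score:.1f} geometric={geo_mean_score:.1f} median={median_score}")
print(f"small set: mean={small_mean} median={small_median}")
```

The rounded results should agree with the values quoted on the slides for your class. The gap between 53.5 and 70.5 in the small set shows why the median is preferred when outliers are present.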

Disadvantages of the Median
The following are some disadvantages of using the median.
* Not every number is included in the computation of the median.
* The median is not as well known as the mean.

Defining the Mode

The last measure of center which we will examine is the mode. The mode is the only way to measure the center of a set of non-numeric data.

The Mode
The mode of a set of data is the value which appears most often. If no number appears more than once, there is no mode. If several numbers appear the same maximal number of times, the data is multi-modal.

Example
Your Class: The mode is 71
Other Class: The data has three modes: 58, 61, and 98

More on the Mode

The mode is a useful tool when measuring the center of a set of data which is not numeric.

Example
Find the mode of the class’s favorite pizza topping.

Advantages and Disadvantages of the Mode
Advantages: works well with non-numeric data, is usually easy to find, is the most “typical” data value.
Disadvantages: may not exist, may not be related to the “real” center of the data.

Visualizing the Center

Graphs can be used to visualize the center of a set of data.

Frequency Distribution
A frequency distribution is a table in which data values are arranged into bins. It can also be a bar graph of the frequency distribution table.

Example
Create a frequency distribution table and graph for your test scores.

Stem-and-Leaf Plots
A stem-and-leaf plot also gives us a picture of the data, but it preserves the individual values.

Example
Construct a stem-and-leaf plot for your test scores.

Measuring Spread

One of the things you may have noticed about the graphs we just created is that they give a good picture of how spread out the data is.
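
For reference, the two graphs for your class can be sketched in text form with a few lines of Python. This is only an illustration, not the construction used in class: the bin width of 10 and the names `bins` and `stems` are assumptions made for this sketch.

```python
from collections import Counter, defaultdict

# Scores for "your class", copied from the opening example.
your_class = [93, 85, 77, 71, 68,
              92, 92, 90, 84, 80,
              79, 76, 75, 74, 71,
              71, 71, 66, 62, 59,
              87, 78, 71, 70, 53]

# Frequency distribution: count how many scores fall in each bin.
# Bins of width 10 (50-59, 60-69, ...) are an assumed choice.
bins = Counter((score // 10) * 10 for score in your_class)
print("Frequency distribution")
for low in sorted(bins):
    print(f"{low}-{low + 9}: {'#' * bins[low]} ({bins[low]})")

# Stem-and-leaf plot: the tens digit is the stem, the ones digit the leaf,
# so every individual score can still be read off the plot.
stems = defaultdict(list)
for score in sorted(your_class):
    stems[score // 10].append(score % 10)
print("Stem-and-leaf plot")
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Both displays show the scores piling up in the 70s, which is where the mean, median, and mode for your class all fall.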
There are a couple of ways that we can measure spread.

Measuring the Spread of Data
There are two ways in which we will look at the spread of data, one associated with the median and one with the mean.
* When using the median for the center, we often use a 5-number summary to measure the spread of the data.
* When using the mean for the center, we often use the standard deviation to measure the spread of the data.

The 5-Number Summary

The 5-number summary consists of five numbers which give us a picture of how spread out a given data set is.

5-Number Summary
The 5-number summary for a set of data consists of:
* The minimum
* The 1st quartile (median of the bottom half)
* The median
* The 3rd quartile (median of the top half)
* The maximum

Example
Find the 5-number summary for your class and the other class and display them graphically as a “box plot”.

The Standard Deviation

The standard deviation of a set of data is a measure of the average distance of each data point from the mean of the data set.

Standard Deviation
The standard deviation of a set of data $\{x_1, x_2, \ldots, x_n\}$ is
$\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}}$
where $\bar{x}$ is the arithmetic mean of the data.

Example
Find the standard deviation of 64, 75, 75, 82, 90, 90, and 95.

Relating Center and Spread to a Picture

The measures of center and spread which we have studied thus far relate to the larger picture of the data called the distribution.

Types of Distributions
There are several different types of distributions. Consider each of the following.
* Uniform Distribution
* Normal Distribution
* Skewed Distribution

Example
Which distribution does the distribution of your exam scores most closely match?
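
The two measures of spread defined in this section, the 5-number summary and the standard deviation, can be sketched in Python as follows. The function names are hypothetical, and one detail is an assumption the slides do not settle: when the number of values is odd, the median itself is left out of both halves before the quartiles are found.

```python
import math
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # bottom half (median excluded if n is odd)
    upper = data[half + n % 2:]    # top half (median excluded if n is odd)
    return (data[0],
            statistics.median(lower),
            statistics.median(data),
            statistics.median(upper),
            data[-1])


def standard_deviation(values):
    """Square root of the mean squared distance from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))


# Scores for "your class", copied from the opening example.
your_class = [93, 85, 77, 71, 68,
              92, 92, 90, 84, 80,
              79, 76, 75, 74, 71,
              71, 71, 66, 62, 59,
              87, 78, 71, 70, 53]

summary = five_number_summary(your_class)
sd = standard_deviation([64, 75, 75, 82, 90, 90, 95])

print("5-number summary:", summary)
print(f"standard deviation: {sd:.2f}")
```

Under the stated quartile convention, the summary for your class is 53, 70.5, 75, 84.5, 93, and the standard deviation of the seven example values is about 10.13. Other quartile conventions exist and can give slightly different values for the 1st and 3rd quartiles.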

The Normal Distribution

The normal distribution is of particular importance because it is a very well understood distribution.

Properties of the Normal Distribution
The normal distribution has the following properties.
* The center of the distribution is the mean of the data set.
* The distribution is symmetric about the mean.
* About 68% of the data is within one standard deviation of the mean.
* About 95% of the data is within two standard deviations of the mean.
* About 99.7% of the data is within three standard deviations of the mean.

Example
A standardized test has a mean score of 72 and a standard deviation of 9. Suppose that 5000 students took the test and their scores were normally distributed. How many would we expect to score
a) above 72
b) above 81
c) below 36?

Important Concepts

Things to Remember from Section 7.2

1 Finding measures of center
    1 Arithmetic mean
    2 Median
    3 Mode
2 Creating frequency distributions and stem-and-leaf plots
3 Finding measures of spread
    1 5-number summary
    2 Standard deviation
4 Identifying distributions
5 Working with the normal distribution
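
As a rough check on the standardized test example above, the expected counts can be computed with the `NormalDist` class from Python's standard `statistics` module (Python 3.8 or later). This is a sketch rather than the method intended in class, where the 68% and 95% rules are presumably applied by hand; the variable names are illustrative.

```python
from statistics import NormalDist

students = 5000
scores = NormalDist(mu=72, sigma=9)   # mean 72, standard deviation 9

# a) Above the mean: by symmetry, half of the students.
above_72 = students * (1 - scores.cdf(72))

# b) Above 81, one standard deviation above the mean.
#    The 68% rule gives (100% - 68%) / 2 = 16%, or about 800 students.
above_81 = students * (1 - scores.cdf(81))

# c) Below 36, four standard deviations below the mean.
below_36 = students * scores.cdf(36)

print(f"a) above 72: {above_72:.0f}")
print(f"b) above 81: {above_81:.0f}")
print(f"c) below 36: {below_36:.2f}")
```

The exact count for part b) comes out a little under the 800 given by the 68% rule, because 68% is itself a rounded figure. For part c) the expected count is well below one student, so in practice we would expect no one to score under 36.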