Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
1.2 Describing Distributions with Numbers Example. Table 1.3 (see TM-13) gives the ages of the presidents at inauguration and Figure 1.10 (see TM-14) gives a histogram of the data. Measuring Center: The Mean Definition. If n observations are denoted by x1 , x2, . . . , xn, their mean is x1 + x2 + · · · + xn n or in more compact notation x= n 1 x= xi . n i=1 Example 1.6. The mean of the data in Table 1.3 (see TM-13) is 54.8 years. Note. To compute the mean of a data set using the Sharp EL-546G, do the following: • Put the calculator in “statistics mode” by pressing MODE and 3. • Press 0 to put the calculator in single-variable statistics mode (ST0 appears in the display). 1 • Press 2ndF and CA to clear the statistics memory. • Enter the data and press DATA (on the M+ key) after each entry. • To get the mean, press RCL and x (the 4 key). See pages 36-40 of the calculator owner’s manual for more details. Measuring Center: The Median Definition. The median of a data set is the “middle value.” To find the median: 1. Arrange all observations in order of size, from smallest to largest. 2. If the number of observations n is odd, the median M is the center observation in the ordered list. Find the location of the median by counting (n + 1)/2 observations up from the bottom of the list. 3. If the number of observations n is even, the median M is the mean of the two center observations in the ordered list. The location of the median is again (n + 1)/2 from the bottom of the list. Example 1.8. The median of this data set is 34: 20 25 25 27 28 31 33 34 36 37 44 50 59 85 86 The median of this data set is 18.5: 5 7 10 14 18 19 25 29 31 33 2 Comparing the Mean and the Medain Note. The mean and median of a symmetric distribution are close together. In a skewed distribution, the mean is farther out in the long tail than is the median. This is because a few outliers can changed the mean, but may have no effect on the median. Measuring Spread: The Quartiles Definition. The range of a data set is the difference between the largest and smallest observations. The first quartile Q1 lies one-quarter of the way up the list, the third quartile Q3 lies threequarters of the way up the list. Note. To calculate the quartiles: 1. Arrange the observations in increasing order and locate the median M in the ordered list of observations. 2. The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median. 3. The third quartile Q2 is the median of the observations whose position in the ordered list is to the right of the location of the overall median. Example 1.10. We saw above that the median of the data set: 20 25 25 27 28 31 33 34 36 37 44 50 59 85 86 3 is 34. The first quartile is the median of the 7 observations to the left of the median, and so Q1 = 27. Similarly, Q3 = 50. For the data: 5 7 10 14 18 19 25 29 31 33 we have Q1 = 10 and Q3 = 29. The Five-Number Summary and Boxplots Definition. The five-number summary of a data set consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols the five-number summary is: Minimum Q1 M Q3 Maximum A boxplot is a way to graphically represent the five-number summary. Example. The data above is represented in a boxplot in Figure 1.11 (see TM-15). Measuring Spread: The Standard Deviation Definition. The variance s2 of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations x1 , x2, . . . , xn is s2 = (x1 − x)2 + (x2 − x)2 + · · · + (xn − x)2 n−1 4 or more compactly, s2 = n 1 (xi − x)2 . n − 1 i=1 The standard deviation s is the square root of the variance s2 : s= n 1 (xi − x)2 . n − 1 i=1 (Some texts call these the “sample variance” and “sample standard deviation” - versus the population variance and standard deviation.) Example 1.11. Consider the data set: 1792 1666 1362 1614 1460 1867 1439 The mean is x = 1600. We can calculate the variance as: Observations Deviations Squared Deviations xi xi − x (xi − x)2 1792 1792 − 1600 = 192 1922 = 36, 864 1666 1666 − 1600 = 66 662 = 4, 356 1362 1362 − 1600 = −238 (−238)2 = 56, 644 1614 1614 − 1600 = 14 142 = 196 1460 1460 − 1600 = −140 (−140)2 = 19, 600 1867 1867 − 1600 = 267 2672 = 71, 289 1439 1439 − 1600 = −161 (−161)2 = 25, 921 sum = 0 sum = 214,870 So the variance is s2 = n 1 1 (xi − x)2 = (214, 870) = 35, 811.67. n − 1 i=1 6 5 The standard deviation is s = 35, 811.67 = 189.24. Note. Your Sharp EL-546G calculator can much more easily calculate variance and standard deviation. Do the following: • Put the calculator in “statistics mode” by pressing MODE and 3 . • Press 0 to put the calculator in single-variable statistics mode (ST0 appears in the display). • Press 2ndF and CA to clear the statistics memory. • Enter the data and press DATA (on the M+ key) after each entry. • To get the (sample) standard deviation, press RCL and sx (the 5 key). See pages 36-40 of the calculator owner’s manual for more details. Note. Some properties of the standard deviation are: • s measures spread about the mean and should be used only when the mean is chosen as the measure of center. • s = 0 only when there is no spread. This happens only when all observations have the same value. Otherwise s > 0. As the observations become more spread out about their mean, s gets larger. • s, like the mean x, is strongly influenced by extreme observa6 tions. A few outliers can make s very large. Note. The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution. Use x and s only for reasonably symmetric distributions. 7