3.4 Measures of Relative Location:

The z-score is a quantity that measures the relative location of an observation within a data set. The z-score, also called the standardized value for observation $i$, is defined as

$$z_i = \frac{x_i - \bar{x}}{s}.$$

Note: $z_i$ is the number of standard deviations that $x_i$ lies from the mean $\bar{x}$.

Example 2 (continued):

Factory 1:
$x_i$:  10.1    9.9     10.1    9.9     9.9     10.1    9.9     10.1    9.9     10.1
$z_i$:  0.948   -0.948  0.948   -0.948  -0.948  0.948   -0.948  0.948   -0.948  0.948

Factory 2:
$x_i$:  16      5       7       14      6       15      3       13      9       12
$z_i$:  1.305   -1.088  -0.652  0.870   -0.870  1.088   -1.523  0.652   -0.217  0.435

There are two results related to the location of the data. The first result is Chebyshev's theorem.

Chebyshev's Theorem: For any population, at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data lie within $k$ standard deviations of the mean, where $k$ is any value greater than 1.

Based on Chebyshev's theorem, for any data set it can be roughly estimated that at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data lie within $k$ sample standard deviations of the mean.

Example 2 (continued): For $k = 2$, Chebyshev's theorem estimates that at least $\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$ of the data lie within 2 standard deviations of the mean. For the data from factory 1 and factory 2, all the data are within 2 sample standard deviations of the mean, i.e., all the data have z-scores with absolute value smaller than 2.

The second result is the empirical rule. The rule is especially applicable when the data have a bell-shaped distribution. The empirical rule states:

Approximately 68% of the data will be within one standard deviation of the mean ($-1 \le z_i \le 1$).
Approximately 95% of the data will be within two standard deviations of the mean ($-2 \le z_i \le 2$).
Almost all of the data will be within three standard deviations of the mean ($-3 \le z_i \le 3$).

Example 2 (continued): For the data from factory 1, all the data are within one standard deviation of the mean, while for the data from factory 2, 60% of the data are within one standard deviation of the mean.
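The z-scores and proportions quoted in Example 2 can be checked with a short Python sketch. This is an illustration only, not part of the original notes: the function names `z_scores`, `share_within`, and `chebyshev_bound` are assumptions made here, and the sample standard deviation (divisor $n - 1$) is assumed, which is consistent with the tabulated z-scores.

```python
from statistics import mean, stdev


def z_scores(data):
    """Return z_i = (x_i - mean) / s for each observation.

    Assumes s is the sample standard deviation (divisor n - 1).
    """
    x_bar = mean(data)
    s = stdev(data)
    return [(x - x_bar) / s for x in data]


def share_within(data, k):
    """Fraction of observations whose z-score satisfies |z_i| <= k."""
    zs = z_scores(data)
    return sum(abs(z) <= k for z in zs) / len(zs)


def chebyshev_bound(k):
    """Chebyshev lower bound 1 - 1/k^2 on the fraction within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


# Observations as read from Example 2.
factory1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]

for name, data in [("Factory 1", factory1), ("Factory 2", factory2)]:
    print(name, [round(z, 3) for z in z_scores(data)])
    print("  share within 1 s:", share_within(data, 1))
    print("  share within 2 s:", share_within(data, 2),
          "(Chebyshev guarantees at least", chebyshev_bound(2), ")")
```

Run as written, the sketch should report that 100% of the factory 1 data and 60% of the factory 2 data fall within one standard deviation, and that all observations in both sets fall within two, comfortably above the 75% Chebyshev bound. The printed z-scores agree with the tabulated values up to rounding in the last digit.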
The empirical rule is not applicable to these two data sets, since neither is bell-shaped. However, consider the following data:

2.11   -0.83  -1.43  1.35   -0.42  -0.69  -0.65  -0.29  -0.54  1.92
0.53   -0.27  1.7    0.88   1.25   0.32   -2.18  0.68   0.85   0.34

The histogram of the above data, given below, indicates that the data are roughly bell-shaped.

[Figure: histogram of the data; horizontal axis rn1 with ticks from -2 to 2, vertical axis with ticks from 0 to 4]

Approximately 65% of the data are within one standard deviation of the mean, which is similar to the result based on the empirical rule (68%).

Detecting Outliers:

To identify outliers, we can use either the box-plot or the z-score. The outliers identified by the box-plot are those data outside the upper limit or the lower limit, while the outliers identified by the z-score are those with a z-score smaller than -3 or greater than 3.

Note: the outliers identified by the box-plot might be different from those identified by the z-score.

Example 4: The flashlight batteries produced by one of the manufacturers are known to have an average life of 60 hours with a standard deviation of 4 hours.
(a) At least what percentage of the batteries will have a life of 54 to 66 hours?
(b) At least what percentage of the batteries will have a life of 52 to 68 hours?
(c) Determine an interval for the batteries' lives that will be true for at least 80% of the batteries.

[solution:]

Denote $\bar{x} = 60$, $s = 4$.

(a) $[54, 66] = 60 \pm 6 = \bar{x} \pm 1.5s$. Thus, by Chebyshev's theorem, at least $\left(1 - \frac{1}{1.5^2}\right) \times 100\% \approx 55.56\%$ of the batteries are within 1.5 standard deviations of the mean.

(b) $[52, 68] = 60 \pm 8 = \bar{x} \pm 2s$. Thus, by Chebyshev's theorem, at least $\left(1 - \frac{1}{2^2}\right) \times 100\% = 75\%$ of the batteries are within 2 standard deviations of the mean.

(c) $\left(1 - \frac{1}{k^2}\right) \times 100\% = 80\% \Rightarrow 1 - \frac{1}{k^2} = 0.8 \Rightarrow k = \sqrt{5}$. Thus, at least 80% of the batteries are within $\sqrt{5}$ standard deviations of the mean. Therefore, the interval is

$$\bar{x} \pm \sqrt{5}\,s = 60 \pm \sqrt{5} \cdot 4 = 60 \pm 8.94 = [51.06, 68.94].$$

Online Exercise: Exercise 3.4.1
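The arithmetic in Example 4 and the z-score outlier rule can be sketched in Python as follows. This is a hedged illustration rather than part of the original notes: the names `chebyshev_bound`, `k_for_coverage`, `coverage_interval`, and `z_outliers` are assumptions made here, and the data passed to `z_outliers` are hypothetical values chosen only to show the rule in action.

```python
import math
from statistics import mean, stdev


def chebyshev_bound(k):
    """Chebyshev lower bound 1 - 1/k^2 on the fraction within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2


def k_for_coverage(p):
    """Solve 1 - 1/k^2 = p for k, where 0 < p < 1."""
    return 1 / math.sqrt(1 - p)


def coverage_interval(x_bar, s, p):
    """Interval x_bar +/- k*s that Chebyshev guarantees holds at least a fraction p."""
    k = k_for_coverage(p)
    return x_bar - k * s, x_bar + k * s


def z_outliers(data, threshold=3.0):
    """Observations whose z-score is below -threshold or above +threshold.

    Assumes the sample standard deviation (divisor n - 1).
    """
    x_bar, s = mean(data), stdev(data)
    return [x for x in data if abs((x - x_bar) / s) > threshold]


x_bar, s = 60, 4  # battery life: mean and standard deviation from Example 4

# (a) 54 to 66 hours is x_bar +/- 1.5 s
print("(a)", chebyshev_bound((66 - x_bar) / s))
# (b) 52 to 68 hours is x_bar +/- 2 s
print("(b)", chebyshev_bound((68 - x_bar) / s))
# (c) interval guaranteed to hold at least 80% of the batteries
print("(c)", coverage_interval(x_bar, s, 0.80))

# Hypothetical data, not from the notes: twenty values of 10 and one value of 100.
print("outliers:", z_outliers([10] * 20 + [100]))
```

Parts (a) to (c) should reproduce the bounds and the interval derived in the solution. The box-plot rule is not sketched here because its upper and lower limits are not defined in this section.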