3.4 Measures of Relative Standing

This section introduces measures of relative standing, which are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different data sets, or to compare values within the same data set. The most important concept is the z score. We will also discuss percentiles and quartiles, as well as a new statistical graph called the boxplot.

Z Scores: A z score is found by converting a value to a standardized scale. It represents the number of standard deviations that a data value is from the mean. It is calculated as follows:

\[ z = \frac{x - \bar{x}}{s} \]

This is the z score for a sample. It uses the sample mean and the sample standard deviation.

\[ z = \frac{x - \mu}{\sigma} \]

This is the z score for a population. It uses the population mean and the population standard deviation.

Z scores should always be rounded to two decimal places.

Example: Scores on a test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z score.

Solution: Because the sample and population formulas have the same form, it makes no difference which one we use. Assuming we have all the scores, we use the population formula:

\[ z = \frac{x - \mu}{\sigma} = \frac{48 - 70}{11} = -2.00 \]

Michelle has a z score of −2.00. This means that Michelle's score of 48 was 2 standard deviations below the mean.

Z Scores, Unusual Values, and Outliers: A value is considered unusual if it is more than 2 standard deviations from the mean. Whenever a value is less than the mean, its corresponding z score is negative. Therefore unusual values have z scores less than −2 or greater than +2.

Ordinary values: −2 ≤ z score ≤ 2
Unusual values: z score < −2 or z score > 2

An outlier is a data value that is far away from the other data values. In general we will define an outlier as an unusual value; that is, a value more than 2 standard deviations from the mean.

Example: A department store has average daily sales of $29,876.76. The standard deviation of sales is $1,000.
On Tuesday, the store sold $34,893.71 worth of goods. Find Tuesday's z score. Was Tuesday an unusually good day?

Solution: Find the z score.

\[ z = \frac{x - \bar{x}}{s} = \frac{34{,}893.71 - 29{,}876.76}{1000} = 5.02 \]

A z score of 5.02 means that Tuesday's sales were slightly more than 5 standard deviations above the mean. Because this is far beyond the cutoff of 2 for unusual values, Tuesday was an unusually good day.

Example: Which is better: a score of 82 on a test with a mean of 70 and a standard deviation of 8, or a score of 82 on a test with a mean of 75 and a standard deviation of 4?

Solution: Find the z score for each value.

\[ z = \frac{x - \bar{x}}{s} = \frac{82 - 70}{8} = 1.50 \]

\[ z = \frac{x - \bar{x}}{s} = \frac{82 - 75}{4} = 1.75 \]

The second score is the better score because it is 1.75 standard deviations above its mean, while the first score is only 1.50 standard deviations above its mean.

Percentiles: Percentiles are one type of quantile; quantiles partition data into groups. Percentiles are measures of location which divide a set of data into 100 groups with about 1% of the values in each group. To find the percentile of a data value, use the formula:

\[ \text{Percentile of a value } x = \frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100\% \]

Example: Find the percentile for the data value 14, given the following data set.

Data set: 4 6 14 10 4 10 18 18 22 6 6 18 12 2 18

Solution: There are 9 data values less than 14 and a total of 15 data values. Therefore we have:

\[ \text{Percentile of } 14 = \frac{9}{15} \cdot 100\% = 60\% \]

Converting a Percentile to a Data Value: To find the data value that corresponds to a given percentile, use the following formula on a sorted data set. L represents the location of the data value in the sorted data set.

\[ L = \frac{k}{100} \cdot n \]

n: total number of values in the data set
k: percentile being used
L: locator that gives the position of a value
Pk: kth percentile

Example: The test scores of 32 students are listed below. Find P46.
89 56 70 57 37 80 48 82
41 59 74 83 99 63 74 86
46 77 75 32 71 66 65 92
53 68 78 95 55 69 79 44

Solution: Use the formula to find the location of the data value that is at the 46th percentile.

\[ L = \frac{46}{100} \cdot 32 = 14.72 \]

Because 14.72 is not a whole number, round up to the nearest whole number, which is 15. When the data set is sorted from lowest to highest, the 15th value is 68, so P46 = 68.

Sorted data (lowest to highest, reading across each row):
32 37 41 44 46 48 53 55
56 57 59 63 65 66 68 69
70 71 74 74 75 77 78 79
80 82 83 86 89 92 95 99

Quartiles: Quartiles are measures of location which divide a set of data into four groups with about 25% of the values in each group.

1st Quartile (Q1): 25% of the sorted values are less than or equal to this value.
2nd Quartile (Q2): 50% of the sorted values are less than or equal to this value.
3rd Quartile (Q3): 75% of the sorted values are less than or equal to this value.

Example: The test scores of 32 students are listed below. Find Q3.

89 56 70 57 37 80 48 82
41 59 74 83 99 63 74 86
46 77 75 32 71 66 65 92
53 68 78 95 55 69 79 44

Solution: Because Q3 = P75, use the formula to find the location of the data value that is at the 3rd quartile.

\[ L = \frac{75}{100} \cdot 32 = 24 \]

Because 24 is a whole number, the value of the 3rd quartile (75th percentile) is halfway between the value in position L and the value in position L + 1 of the sorted data. The 24th value is 79 and the 25th value is 80. Therefore, the 3rd quartile is 79.5.

Sorted data (lowest to highest, reading across each row):
32 37 41 44 46 48 53 55
56 57 59 63 65 66 68 69
70 71 74 74 75 77 78 79
80 82 83 86 89 92 95 99

For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value. A 5-number summary is used to create a boxplot.

Boxplots: A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.
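The percentile and quartile procedures used in this section can be sketched in Python. This is a minimal illustration of the locator rule as stated here (round L up when it is not a whole number; average positions L and L + 1 when it is), not a general-purpose implementation. The function names are hypothetical, and k is assumed to be a whole number from 1 to 99. Statistical libraries often use different interpolation conventions, so their results may differ slightly from these.

```python
def percentile_of_value(data, x):
    """Percentile of x: (number of values less than x) / (total number) * 100."""
    below = sum(1 for v in data if v < x)
    return 100 * below / len(data)


def percentile_value(data, k):
    """Return Pk using the locator L = (k / 100) * n on the sorted data.

    Assumes k is a whole number with 1 <= k <= 99.
    """
    if not 0 < k < 100:
        raise ValueError("k must be between 1 and 99")
    values = sorted(data)
    # whole is the integer part of L; remainder is nonzero when L is not whole.
    whole, remainder = divmod(k * len(values), 100)
    if remainder == 0:
        # L is a whole number: average the values in positions L and L + 1.
        return (values[whole - 1] + values[whole]) / 2
    # L is not whole: round up to position whole + 1 (list index whole).
    return values[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the locator rule."""
    return (
        min(data),
        percentile_value(data, 25),
        percentile_value(data, 50),
        percentile_value(data, 75),
        max(data),
    )


scores = [89, 56, 70, 57, 37, 80, 48, 82, 41, 59, 74, 83, 99, 63, 74, 86,
          46, 77, 75, 32, 71, 66, 65, 92, 53, 68, 78, 95, 55, 69, 79, 44]

print(percentile_value(scores, 46))  # 68, matching the P46 example
print(percentile_value(scores, 75))  # 79.5, matching the Q3 example
print(percentile_of_value(
    [4, 6, 14, 10, 4, 10, 18, 18, 22, 6, 6, 18, 12, 2, 18], 14))  # 60.0
```

Integer arithmetic (`divmod`) is used to decide whether L is a whole number, which avoids floating-point rounding surprises in that check.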
[Figure: Boxplot from a normal distribution]

[Figure: Boxplot from a skewed distribution]

Outliers and Modified Boxplots: An outlier is a value that lies very far away from the vast majority of the other values in a data set.

An outlier can have a dramatic effect on the mean.
An outlier can have a dramatic effect on the standard deviation.
An outlier can have a dramatic effect on the scale of the histogram so that the true nature of the distribution is totally obscured.

Modified Boxplots: For purposes of constructing modified boxplots, we can consider outliers to be data values meeting specific criteria. In modified boxplots, a data value is an outlier if it is above Q3 by an amount greater than 1.5 × IQR or below Q1 by an amount greater than 1.5 × IQR, where IQR is the interquartile range, Q3 − Q1.

A modified boxplot is constructed with these specifications:

A special symbol (such as an asterisk) is used to identify outliers.
The solid horizontal line extends only as far as the minimum data value that is not an outlier and the maximum data value that is not an outlier.

Example: The weights (in pounds) of 30 newborn babies are listed below. Construct a boxplot for the data set.

5.5 5.7 5.8 5.9 6.1 6.1 6.3 6.4 6.5 6.6
6.7 6.7 6.7 6.9 7.0 7.0 7.0 7.1 7.2 7.2
7.4 7.5 7.7 7.7 7.8 8.0 8.1 8.1 8.3 8.7

Solution: Determine the 5-number summary: the minimum is 5.5, Q1 is 6.4, the median is 7.0, Q3 is 7.7, and the maximum is 8.7. Then construct the boxplot.
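The two rules this section gives for flagging extreme values (a z score beyond ±2, and the 1.5 × IQR criterion for modified boxplots) can be sketched in Python as follows. The function names are hypothetical and chosen for illustration. The quartiles are passed in as arguments, taken here from the solution above, rather than computed, so the sketch does not depend on any particular quartile convention.

```python
def z_score(x, mean, sd):
    """z = (x - mean) / sd, rounded to two decimal places."""
    return round((x - mean) / sd, 2)


def is_unusual(z):
    """Unusual values have z < -2 or z > 2; -2 <= z <= 2 is ordinary."""
    return z < -2 or z > 2


def iqr_fences(q1, q3):
    """Return the cutoffs Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def modified_boxplot_parts(data, q1, q3):
    """Return (outliers, whisker_ends) for a modified boxplot.

    whisker_ends holds the smallest and largest values that are not outliers.
    """
    low, high = iqr_fences(q1, q3)
    outliers = [x for x in data if x < low or x > high]
    inside = [x for x in data if low <= x <= high]
    return outliers, (min(inside), max(inside))


# z score examples from earlier in the section.
print(z_score(48, 70, 11))                 # -2.0 (Michelle's score)
print(z_score(34893.71, 29876.76, 1000))   # 5.02 (Tuesday's sales)
print(is_unusual(5.02))                    # True

# Newborn weights, with Q1 = 6.4 and Q3 = 7.7 from the solution above.
weights = [5.5, 5.7, 5.8, 5.9, 6.1, 6.1, 6.3, 6.4, 6.5, 6.6,
           6.7, 6.7, 6.7, 6.9, 7.0, 7.0, 7.0, 7.1, 7.2, 7.2,
           7.4, 7.5, 7.7, 7.7, 7.8, 8.0, 8.1, 8.1, 8.3, 8.7]
outliers, whiskers = modified_boxplot_parts(weights, 6.4, 7.7)
print(outliers)   # [] : no weight lies beyond the 1.5 * IQR cutoffs
print(whiskers)   # (5.5, 8.7)
```

For the newborn weights the cutoffs are about 4.45 and 9.65, so no value is flagged and the modified boxplot has the same whisker ends as the ordinary one. Note also that a z score of exactly −2.00, such as Michelle's, counts as ordinary under the rule as stated.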