Section 4.4: Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and z-scores

• To interpret the mean and standard deviation together, it is useful to describe how far a particular observation is from the mean in terms of standard deviations.
• Suppose we have a data set of scores on a standardized test with a mean of 100 and a standard deviation of 15. We can make the following statements:
– Because 100 − 15 = 85, we say that a score of 85 is “1 standard deviation below the mean”; similarly, 100 + 15 = 115 is “1 standard deviation above the mean”.
– Because 2 standard deviations is 2(15) = 30, and 100 + 30 = 130 and 100 − 30 = 70, scores between 70 and 130 are within 2 standard deviations of the mean.
– Because 100 + (3)(15) = 145, scores above 145 exceed the mean by more than 3 standard deviations.

Chebyshev’s Rule

Consider any number k, where k ≥ 1. Then the percentage of observations that are within k standard deviations of the mean is at least

$100\left(1 - \frac{1}{k^2}\right)\%$

For specific values of k, Chebyshev’s Rule reads:
• At least 75% of the observations are within 2 standard deviations of the mean.
• At least 89% of the observations are within 3 standard deviations of the mean.
• At least 90% of the observations are within 3.16 standard deviations of the mean.
• At least 94% of the observations are within 4 standard deviations of the mean.
• At least 96% of the observations are within 5 standard deviations of the mean.
• At least 99% of the observations are within 10 standard deviations of the mean.
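As a quick check on the percentages listed above, the bound can be evaluated directly. This is a minimal Python sketch; the function name `chebyshev_bound` is illustrative and does not come from the slides.

```python
def chebyshev_bound(k: float) -> float:
    """Lower bound (in percent) on the share of observations within
    k standard deviations of the mean, per Chebyshev's Rule."""
    if k < 1:
        raise ValueError("Chebyshev's Rule is stated for k >= 1")
    return 100.0 * (1.0 - 1.0 / k**2)


# Evaluate the bound at the values of k listed above.
for k in (2, 3, 3.16, 4, 5, 10):
    print(f"k = {k}: at least {chebyshev_bound(k):.1f}%")
```

The listed 89% and 94% are the rounded values of 88.9% and 93.75%. For k = 1 the bound is 0%, so the rule gives no useful information about the interval within 1 standard deviation of the mean.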
Example – Chebyshev’s Rule

• Consider the student age data (79 observations, listed in increasing order):

17 18 18 18 18 18 19 19 19 19
19 19 19 19 19 19 19 19 19 19
19 19 19 19 19 19 20 20 20 20
20 20 20 20 20 20 21 21 21 21
21 21 21 21 21 21 21 21 21 21
22 22 22 22 22 22 22 22 22 22
22 23 23 23 23 23 23 24 24 24
25 26 28 28 30 37 38 44 47

Color code (each observation is marked by the narrowest interval that contains it):
– within 1 standard deviation of the mean: 17 through 26
– within 2 standard deviations of the mean: 28, 28, 30
– within 3 standard deviations of the mean: 37
– within 4 standard deviations of the mean: 38
– within 5 standard deviations of the mean: 44, 47

Example continued

Interval                                     Chebyshev’s Rule   Actual
within 1 standard deviation of the mean      0%                 72/79 = 91.1%
within 2 standard deviations of the mean     75%                75/79 = 94.9%
within 3 standard deviations of the mean     88.9%              76/79 = 96.2%
within 4 standard deviations of the mean     93.8%              77/79 = 97.5%
within 5 standard deviations of the mean     96.0%              79/79 = 100%

• Notice that Chebyshev’s Rule gives very conservative lower bounds, and the values aren’t very close to the actual percentages.

Empirical Rule

• If the histogram of values in a data set is reasonably symmetric and unimodal (specifically, is reasonably approximated by a normal curve), then
1. Approximately 68% of the observations are within 1 standard deviation of the mean.
2. Approximately 95% of the observations are within 2 standard deviations of the mean.
3. Approximately 99.7% of the observations are within 3 standard deviations of the mean.

Z-Score

The z-score corresponding to a particular observation in a data set is

$z\text{-score} = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$

• The z-score is how many standard deviations the observation is from the mean.
• A positive z-score indicates the observation is above the mean, and a negative z-score indicates the observation is below the mean.
• Computing the z-score is often referred to as standardization, and the z-score is called a standardized score.
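The z-score also gives a direct way to check “within k standard deviations” statements, because an observation is within k standard deviations of the mean exactly when its z-score lies between −k and k. The sketch below applies this to the student age data from the Chebyshev example. The ages are entered as a frequency tally of the listed values; the helper names `z_scores` and `count_within` are illustrative, and the use of the sample standard deviation (dividing by n − 1) is an assumption, since the slides do not say which form was used for the ages.

```python
from collections import Counter
from statistics import mean, stdev

# Student ages from the Chebyshev example, entered as value: frequency.
age_counts = {17: 1, 18: 5, 19: 20, 20: 10, 21: 14, 22: 11, 23: 6,
              24: 3, 25: 1, 26: 1, 28: 2, 30: 1, 37: 1, 38: 1, 44: 1, 47: 1}
ages = sorted(Counter(age_counts).elements())


def z_scores(data):
    """Standardize each observation: (observation - mean) / standard deviation."""
    m = mean(data)
    s = stdev(data)  # assumption: sample standard deviation (divides by n - 1)
    return [(x - m) / s for x in data]


def count_within(zs, k):
    """Number of observations whose z-score is between -k and k."""
    return sum(1 for z in zs if abs(z) <= k)


zs = z_scores(ages)
for k in range(1, 6):
    inside = count_within(zs, k)
    print(f"within {k} SD: {inside}/{len(ages)} = {100 * inside / len(ages):.1f}%")
```

Under these assumptions the counts come out to 72, 75, 76, 77, and 79 of 79, which agrees with the “Actual” column in the age example.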
The formula used with sample data is

$z\text{-score} = \frac{x - \bar{x}}{s}$

Example

A sample of GPAs of 38 statistics students appears below (sorted in increasing order):

2.00 2.25 2.36 2.37 2.50 2.50 2.60 2.67 2.70 2.70
2.75 2.78 2.80 2.80 2.82 2.90 2.90 3.00 3.02 3.07
3.15 3.20 3.20 3.20 3.23 3.29 3.30 3.30 3.42 3.46
3.48 3.50 3.50 3.58 3.75 3.80 3.83 3.97

Mean = 3.0434 and s = 0.4720

• The following stem-and-leaf display indicates that the GPA data is reasonably symmetric and unimodal.

2 | 0
2 | 233
2 | 55
2 | 667777
2 | 88899
3 | 0001
3 | 2222233
3 | 444555
3 | 7
3 | 889

Stem: Units digit
Leaf: Tenths digit

Using the formula $z\text{-score} = \frac{x - \bar{x}}{s}$, we compute the z-scores and color code the values as we did in an earlier example. The z-scores appear in the same order as the GPAs above:

-2.21 -1.68 -1.45 -1.43 -1.15 -1.15 -0.94 -0.79 -0.73 -0.73
-0.62 -0.56 -0.52 -0.52 -0.47 -0.30 -0.30 -0.09 -0.05  0.06
 0.23  0.33  0.33  0.33  0.40  0.52  0.54  0.54  0.80  0.88
 0.93  0.97  0.97  1.14  1.50  1.60  1.67  1.96

Interval                                     Empirical Rule   Actual
within 1 standard deviation of the mean      68%              27/38 = 71%
within 2 standard deviations of the mean     95%              37/38 = 97%
within 3 standard deviations of the mean     99.7%            38/38 = 100%

• Notice that the Empirical Rule gives reasonably good estimates for this example.

Comparison of Chebyshev’s Rule and the Empirical Rule

• The following refers to the weights in the sample of 79 students. Notice that the stem-and-leaf display suggests the data distribution is unimodal but positively skewed because of the outliers on the high side. Nevertheless, the results for the Empirical Rule are good.
10 | 3
11 | 37
12 | 011444555
13 | 000000455589
14 | 000000000555
15 | 000000555567
16 | 000005558
17 | 0000005555
18 | 0358
19 | 5
20 | 00
21 | 0
22 | 55
23 | 79

Stem: Hundreds & tens digits
Leaf: Units digit

Interval                                     Chebyshev’s Rule   Empirical Rule   Actual
within 1 standard deviation of the mean      0%                 68%              56/79 = 70.9%
within 2 standard deviations of the mean     75%                95%              75/79 = 94.9%
within 3 standard deviations of the mean     88.9%              99.7%            79/79 = 100%

• Notice that even with moderate positive skewing of the data, the Empirical Rule gave a much more usable and meaningful result than Chebyshev’s Rule.
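The three-way comparison can be reproduced for both the GPA sample and the weight data. The following Python sketch rebuilds the weights from the stem-and-leaf display (stem = hundreds and tens digits, leaf = units digit) and reports the Chebyshev bound, the Empirical Rule percentage, and the actual count for k = 1, 2, 3. The names `compare_rules` and `EMPIRICAL` are illustrative, and the sample standard deviation is assumed for both data sets.

```python
from statistics import mean, stdev

# GPA sample from the z-score example (38 students).
gpas = [2.00, 2.25, 2.36, 2.37, 2.50, 2.50, 2.60, 2.67, 2.70, 2.70,
        2.75, 2.78, 2.80, 2.80, 2.82, 2.90, 2.90, 3.00, 3.02, 3.07,
        3.15, 3.20, 3.20, 3.20, 3.23, 3.29, 3.30, 3.30, 3.42, 3.46,
        3.48, 3.50, 3.50, 3.58, 3.75, 3.80, 3.83, 3.97]

# Weights rebuilt from the stem-and-leaf display:
# stem = hundreds and tens digits, each leaf character = one units digit.
weight_stems = {10: "3", 11: "37", 12: "011444555", 13: "000000455589",
                14: "000000000555", 15: "000000555567", 16: "000005558",
                17: "0000005555", 18: "0358", 19: "5", 20: "00", 21: "0",
                22: "55", 23: "79"}
weights = [10 * stem + int(leaf)
           for stem, leaves in weight_stems.items() for leaf in leaves]

EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}  # Empirical Rule percentages


def compare_rules(data):
    """Return rows (k, Chebyshev bound %, Empirical Rule %, count within k SD)."""
    m, s = mean(data), stdev(data)  # assumption: sample standard deviation
    rows = []
    for k in (1, 2, 3):
        chebyshev = 100.0 * (1.0 - 1.0 / k**2)
        inside = sum(1 for x in data if abs(x - m) <= k * s)
        rows.append((k, chebyshev, EMPIRICAL[k], inside))
    return rows


for name, data in (("GPA", gpas), ("Weight", weights)):
    print(name)
    for k, cheb, emp, inside in compare_rules(data):
        actual = 100.0 * inside / len(data)
        print(f"  within {k} SD: Chebyshev >= {cheb:.1f}%, "
              f"Empirical ~ {emp}%, actual {inside}/{len(data)} = {actual:.1f}%")
```

Under these assumptions the actual counts agree with the tables: 27, 37, and 38 of 38 for the GPAs, and 56, 75, and 79 of 79 for the weights.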