Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Section 2.4 Measures of Variation Statistics Mrs. Spitz Fall 2008 Objectives § How to find the range of a data set § How to find the variance and standard deviation of a population and of a sample § How to use the Empirical Rule and Chebychev’s Theorem to interpret standard deviation § How to approximate the sample standard deviation for grouped data Larson/Farber Ch 2 Assignment § Pp. 78-83 #1-8, 10-22, 24-32 Larson/Farber Ch 2 Range § The range of a data set is the difference between the maximum and minimum entries in the set. Range = (maximum entry) – (minimum entry) Larson/Farber Ch 2 Two Data Sets Closing prices for two stocks were recorded on ten successive Fridays. Calculate the mean, median and mode for each. Stock A Mean = 61.5 Median = 62 Mode = 67 Larson/Farber Ch 2 56 56 57 58 61 63 63 67 67 67 33 42 48 52 57 67 67 77 82 90 Stock B Mean = 61.5 Median = 62 Mode = 67 Measures of Variation Range = Maximum value – Minimum value Range for A = 67 – 56 = $11 Range for B = 90 – 33 = $57 The range is easy to compute but only uses two numbers from a data set. If only one number changes, the range can be vastly changed. If a $56 stock dropped to $12, the range would change. You need to different symbols for population parameters and sample statistics. Larson/Farber Ch 2 Measures of Variation To learn to calculate measures of variation that use each and every value in the data set, you first want to know about deviations. The deviation for each value x is the difference between the value of x and the mean of the data set. In a population, the deviation for each value x is: In a sample, the deviation for each value x is: Larson/Farber Ch 2 Deviations Stock A Deviation 56 – 5.5 56 – 61.5 56 – 5.5 56 – 61.5 57 – 4.5 57 – 61.5 58 – 3.5 58 – 61.5 61 – 0.5 63 1.5 63 1.5 67 5.5 67 5.5 67 5.5 Larson/Farber Ch 2 The sum of the deviations is always zero. Population Variance Population Variance: The sum of the squares of the deviations, divided by N. x 56 56 57 58 61 63 63 67 67 67 – 5.5 – 5.5 – 4.5 – 3.5 – 0.5 1.5 1.5 5.5 5.5 5.5 30.25 30.25 20.25 12.25 0.25 2.25 2.25 30.25 30.25 30.25 188.50 Larson/Farber Ch 2 Sum of squares Population Standard Deviation Population Standard Deviation: The square root of the population variance. The population standard deviation is $4.34. Larson/Farber Ch 2 Sample Variance and Standard Deviation To calculate a sample variance divide the sum of squares by n – 1. The sample standard deviation, s, is found by taking the square root of the sample variance. Larson/Farber Ch 2 Summary Range = Maximum value – Minimum value Population Variance Population Standard Deviation Sample Variance Sample Standard Deviation Larson/Farber Ch 2 Empirical Rule (68-95-99.7%) Data with symmetric bell-shaped distribution have the following characteristics. 13.5% 13.5% 2.35% –4 –3 2.35% –2 –1 0 1 2 3 4 About 68% of the data lies within 1 standard deviation of the mean About 95% of the data lies within 2 standard deviations of the mean About 99.7% of the data lies within 3 standard deviations of the mean Larson/Farber Ch 2 Using the Empirical Rule The mean value of homes on a street is $125 thousand with a standard deviation of $5 thousand. The data set has a bell shaped distribution. Estimate the percent of homes between $120 and $135 thousand. 105 110 115 120 125 130 $120 thousand is 1 standard deviation below the mean and $135 thousand is 2 standard deviations above the mean. 135 140 145 68% + 13.5% = 81.5% So, 81.5% have a value between $120 and $135 thousand. Larson/Farber Ch 2 Chebychev’s Theorem For any distribution regardless of shape the portion of data lying within k standard deviations (k > 1) of the mean is at least 1 – 1/k2. For k = 2, at least 1 – 1/4 = 3/4 or 75% of the data lie within 2 standard deviation of the mean. For k = 3, at least 1 – 1/9 = 8/9 = 88.9% of the data lie within 3 standard deviation of the mean. Larson/Farber Ch 2 Chebychev’s Theorem The mean time in a women’s 400-meter dash is 52.4 seconds with a standard deviation of 2.2 sec. Apply Chebychev’s theorem for k = 2. Mark a number line in standard deviation units. 2 standard deviations A 45.8 48 50.2 52.4 54.6 56.8 59 At least 75% of the women’s 400-meter dash times will fall between 48 and 56.8 seconds. Larson/Farber Ch 2