MEASURES OF SPREAD
Section 3.2

Copyright © 2016 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Objectives
1. Compute the range of a data set
2. Compute the variance of a population and a sample
3. Compute the standard deviation of a population and a sample
4. Approximate the standard deviation with grouped data
5. Use the Empirical Rule to summarize data that are unimodal and approximately symmetric
6. Use Chebyshev's Inequality to describe a data set
7. Compute the coefficient of variation

OBJECTIVE 1
Compute the range of a data set

The Range
The range of a data set is the difference between the largest value and the smallest value.

The average monthly temperatures, in degrees Fahrenheit, for San Francisco are

Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
 51   54   55   56   58   60   60   61   63   62   58   52

The range of temperatures is 63 − 51 = 12.

Although the range is easy to compute, it is not often used in practice. The reason is that the range involves only two values from the data set: the largest and the smallest.

OBJECTIVE 2
Compute the variance of a population and a sample

Variance
When a data set has a small amount of spread, like the San Francisco temperatures, most of the values will be close to the mean. When a data set has a larger amount of spread, more of the data values will be far from the mean.

The variance is a measure of how far the values in a data set are from the mean, on the average. The variance is computed slightly differently for populations and samples. The population variance is presented first.
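Before the formal definition, the two ideas introduced so far (the range, and how far each value lies from the mean) can be sketched in a few lines of Python. This is only an illustration; the variable names are assumptions and are not part of the original slides.

```python
# Average monthly temperatures (degrees Fahrenheit) for San Francisco, Jan..Dec.
temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]

# Range: largest value minus smallest value.
data_range = max(temps) - min(temps)

# Deviation of each value from the mean.
mean = sum(temps) / len(temps)
deviations = [x - mean for x in temps]

print(data_range)        # 12
print(mean)              # 57.5
print(deviations)
print(sum(deviations))   # 0.0
```

The raw deviations sum to zero because the negative and positive ones cancel, which is one way to see why the variance works with squared deviations instead.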
Definition: Population Variance
Let $x_1, x_2, x_3, \ldots, x_N$ denote the values in a population of size $N$. Let $\mu$ denote the population mean. The population variance, denoted by $\sigma^2$, is

Population Variance
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Example – Population Variance
Compute the population variance for the San Francisco temperatures.

Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
 51   54   55   56   58   60   60   61   63   62   58   52

Solution:
Step 1: Compute the population mean $\mu$.
$$\mu = \frac{\sum x_i}{N} = \frac{51+54+55+56+58+60+60+61+63+62+58+52}{12} = 57.5$$

Step 2: For each population value $x_i$, compute the deviation $x_i - \mu$. These values are shown in the second row below.

$x_i$            51     54     55     56     58     60     60     61     63     62     58     52
$x_i - \mu$    −6.5   −3.5   −2.5   −1.5    0.5    2.5    2.5    3.5    5.5    4.5    0.5   −5.5

Step 3: Square the deviations to obtain the quantities $(x_i - \mu)^2$. These values are shown in the third row.

$x_i$              51     54     55     56     58     60     60     61     63     62     58     52
$x_i - \mu$      −6.5   −3.5   −2.5   −1.5    0.5    2.5    2.5    3.5    5.5    4.5    0.5   −5.5
$(x_i - \mu)^2$ 42.25  12.25   6.25   2.25   0.25   6.25   6.25  12.25  30.25  20.25   0.25  30.25

Step 4: Sum the squared deviations to obtain the quantity $\sum (x_i - \mu)^2$.
$$\sum (x_i - \mu)^2 = 42.25 + 12.25 + 6.25 + 2.25 + 0.25 + 6.25 + 6.25 + 12.25 + 30.25 + 20.25 + 0.25 + 30.25 = 169$$

Step 5: Divide the sum obtained in Step 4 by the population size $N$ to obtain the population variance $\sigma^2$.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{169}{12} = 14.083$$

Sample Variance
When the data values come from a sample rather than a population, the variance is called the sample variance.
The procedure for computing the sample variance is a bit different from the one used to compute a population variance. In the formula, the mean $\mu$ is replaced by the sample mean $\bar{x}$ and the denominator is $n - 1$ instead of $N$. The sample variance is denoted by $s^2$.

Sample Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Why Divide by $n - 1$?
When computing the sample variance, we use the sample mean to compute the deviations. For the population variance we use the population mean for the deviations. It turns out that the deviations using the sample mean tend to be a bit smaller than the deviations using the population mean. If we were to divide by $n$ when computing a sample variance, the value would tend to be a bit smaller than the population variance.

It can be shown mathematically that the appropriate correction is to divide the sum of the squared deviations by $n - 1$ rather than $n$.

Example – Sample Variance
A company that manufactures batteries is testing a new type of battery designed for laptop computers. They measure the lifetimes, in hours, of six batteries, and the results are 3, 4, 6, 5, 4, 2. Find the sample variance of the lifetimes.

Solution:
The sample mean is
$$\bar{x} = \frac{3+4+6+5+4+2}{6} = 4$$

The sample variance is given by
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3-4)^2 + (4-4)^2 + (6-4)^2 + (5-4)^2 + (4-4)^2 + (2-4)^2}{6 - 1} = \frac{10}{5} = 2$$

OBJECTIVE 3
Compute the standard deviation of a population and a sample

Standard Deviation
Because the variance is computed using squared deviations, the units of the variance are the squared units of the data.
For example, in the Battery Lifetime example, the units of the data are hours, and the units of variance are squared hours. In most situations, it is better to use a measure of spread that has the same units as the data. We do this simply by taking the square root of the variance. This quantity is called the standard deviation.

The standard deviation of a sample is denoted by $s$, and the standard deviation of a population is denoted by $\sigma$.

Sample Standard Deviation
$$s = \sqrt{s^2}$$

Population Standard Deviation
$$\sigma = \sqrt{\sigma^2}$$

Example – Standard Deviation
Example: The population variance of temperatures in San Francisco is $\sigma^2 = 14.083$. Find the population standard deviation.
Solution: The population standard deviation is $\sigma = \sqrt{\sigma^2} = \sqrt{14.083} = 3.753$.

Example: The variance of the lifetimes for a sample of six batteries is $s^2 = 2$. Find the sample standard deviation.
Solution: The sample standard deviation is $s = \sqrt{s^2} = \sqrt{2} = 1.414$.

Standard Deviation on the TI-84 PLUS
The following steps will compute the standard deviation for both sample data and population data on the TI-84 PLUS Calculator:
Step 1: Enter the data into L1 in the data editor.
Step 2: Run the 1-Var Stats command (the same command used for means and medians), selecting L1 as the location of the data.

Standard Deviation and Resistance
Recall that a statistic is resistant if its value is not affected much by extreme values (large or small) in the data set. The standard deviation is not resistant. That is, the standard deviation is affected by extreme values.
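The variance and standard deviation examples above, and the lack of resistance, can be checked with Python's standard `statistics` module. This is a minimal sketch rather than part of the original slides; in particular, the extreme lifetime of 40 hours is a hypothetical value added only to illustrate non-resistance.

```python
import statistics

temps = [51, 54, 55, 56, 58, 60, 60, 61, 63, 62, 58, 52]  # treated as a population
batteries = [3, 4, 6, 5, 4, 2]                             # treated as a sample

# Population versions divide by N.
pop_var = statistics.pvariance(temps)
pop_sd = statistics.pstdev(temps)

# Sample versions divide by n - 1.
samp_var = statistics.variance(batteries)
samp_sd = statistics.stdev(batteries)

print(round(pop_var, 3), round(pop_sd, 3))   # 14.083 3.753
print(round(samp_sd, 3))                     # 1.414

# Hypothetical extreme value (not in the original data) to show
# that the standard deviation is not resistant.
with_extreme = batteries + [40]
sd_with_extreme = statistics.stdev(with_extreme)
print(samp_sd < sd_with_extreme)             # True
```

A single extreme value moves the mean and adds one very large squared deviation, so the standard deviation grows sharply even though the other six values are unchanged.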
OBJECTIVE 4
Approximate the standard deviation using grouped data

Approximating the Standard Deviation
Sometimes we don't have access to the raw data in a data set, but we are given a frequency distribution. In these cases we can approximate the standard deviation using the following steps.

Step 1: Compute the midpoint of each class and approximate the mean of the frequency distribution.
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint − Mean).
Step 3: For each class, square the difference obtained in Step 2 to obtain (Midpoint − Mean)², and multiply by the frequency to obtain (Midpoint − Mean)² × (Frequency).
Step 4: Add the products (Midpoint − Mean)² × (Frequency) over all classes.
Step 5: To compute the population variance, divide the sum obtained in Step 4 by $N$. To compute the sample variance, divide the sum obtained in Step 4 by $n - 1$.
Step 6: Take the square root of the variance obtained in Step 5. The result is the standard deviation.

Example
The following table presents the number of text messages sent via cell phone by a sample of 50 high school students. Approximate the sample standard deviation of the number of messages sent.

Number of Text Messages Sent    Frequency
0–49                            10
50–99                            5
100–149                         13
150–199                         11
200–249                          7
250–299                          4

Solution
Step 1: Compute the midpoint of each class. Recall from the last section that the sample mean was computed as 137.

Number of Text Messages Sent    Class Midpoint
0–49                             25
50–99                            75
100–149                         125
150–199                         175
200–249                         225
250–299                         275
Step 2: For each class, subtract the mean from the class midpoint to obtain (Midpoint − Mean).

Number of Text Messages Sent    Class Midpoint    (Midpoint − Mean)
0–49                             25               −112
50–99                            75                −62
100–149                         125                −12
150–199                         175                 38
200–249                         225                 88
250–299                         275                138

Step 3: For each class, square the differences obtained in Step 2 to obtain (Midpoint − Mean)², and multiply by the frequency to obtain (Midpoint − Mean)² × (Frequency).

Number of Text Messages Sent    Frequency    (Midpoint − Mean)    (Midpoint − Mean)² × (Frequency)
0–49                            10           −112                 125,440
50–99                            5            −62                  19,220
100–149                         13            −12                   1,872
150–199                         11             38                  15,884
200–249                          7             88                  54,208
250–299                          4            138                  76,176

Step 4: Add the products (Midpoint − Mean)² × (Frequency) over all classes.
$$\sum (\text{Midpoint} - \text{Mean})^2 \times \text{Frequency} = 125{,}440 + 19{,}220 + 1{,}872 + 15{,}884 + 54{,}208 + 76{,}176 = 292{,}800$$

Step 5: Since we are computing the sample variance, we divide the sum obtained in Step 4 by $n - 1$.
$$s^2 = \frac{\sum (\text{Midpoint} - \text{Mean})^2 \times \text{Frequency}}{n - 1} = \frac{292{,}800}{50 - 1} = 5975.51020$$

Step 6: Take the square root of the variance to obtain the standard deviation.
$$s = \sqrt{s^2} = \sqrt{5975.51020} = 77.30142$$

Grouped Data on the TI-84 PLUS
The same procedure used to compute the mean for grouped data in a frequency distribution may be used to compute the standard deviation. Enter the midpoint for each class into L1 and the corresponding frequencies in L2. Next, select the 1-Var Stats command and enter L1 in the List field and L2 in the FreqList field, if using Stats Wizards.
If you are not using Stats Wizards, you may run the 1-Var Stats command followed by L1, comma, L2.

Example

Class Midpoint    Frequency
 25               10
 75                5
125               13
175               11
225                7
275                4

The output for the last example on the TI-84 PLUS Calculator is presented below.

[Figure: TI-84 PLUS output screen]

The value of s represents the approximate sample standard deviation. In this example s = 77.30142. Therefore the approximate standard deviation is 77.30142.

OBJECTIVE 5
Use the Empirical Rule to summarize data that are unimodal and approximately symmetric

Bell-Shaped Histogram
Many histograms have a single mode near the center of the data, and are approximately symmetric. Such histograms are often referred to as bell-shaped.

The Empirical Rule
When a data set has a bell-shaped histogram, it is often possible to use the standard deviation to provide an approximate description of the data using a rule known as The Empirical Rule.

When a population has a histogram that is approximately bell-shaped, then:
• Approximately 68% of the data will be within one standard deviation of the mean.
• Approximately 95% of the data will be within two standard deviations of the mean.
• All, or almost all, of the data will be within three standard deviations of the mean.

Example – The Empirical Rule
Example: The following table presents the U.S. Census Bureau projection for the percentage of the population aged 65 and over for each state and the District of Columbia. Use the Empirical Rule to describe the data.
14.1  14.1  12.3  13.1  14.3  13.3  14.1  12.2  14.4  14.3
15.3  12.4  17.8  16.0  13.0  15.0  12.0   8.1  13.6  12.6
14.9  11.5  10.5  13.6  12.6  14.1  12.4  13.7  13.7  10.2
13.5  15.5  12.8  12.4  13.9  14.6  13.8  13.4  10.7   9.0
13.7  15.6  11.5  12.2  12.4  12.8  14.3  14.0  13.8  13.9
12.7

Solution: We first note that the histogram is approximately bell-shaped, and we may use the TI-84 PLUS calculator, or other technology, to compute the population mean and standard deviation.

Mean: $\mu = 13.249$
Standard Deviation: $\sigma = 1.6827$

Solution (continued): We compute the following:

$\mu - \sigma = 13.249 - 1.6827 = 11.57$
$\mu + \sigma = 13.249 + 1.6827 = 14.93$
Approximately 68% of the data values are between these.

$\mu - 2\sigma = 13.249 - 2(1.6827) = 9.88$
$\mu + 2\sigma = 13.249 + 2(1.6827) = 16.61$
Approximately 95% of the data values are between these.

$\mu - 3\sigma = 13.249 - 3(1.6827) = 8.20$
$\mu + 3\sigma = 13.249 + 3(1.6827) = 18.30$
Almost all of the data values are between these.

OBJECTIVE 6
Use Chebyshev's Inequality to describe a data set

Any Data Set
When a distribution is bell-shaped, we use The Empirical Rule to approximate the proportion of data within one or two standard deviations. Another rule called Chebyshev's Inequality holds for any data set.

Chebyshev's Inequality
In any data set, the proportion of the data that is within $K$ standard deviations of the mean is at least $1 - 1/K^2$. Specifically, by setting $K = 2$ or $K = 3$, we obtain the following results.
• At least 3/4, or 75%, of the data are within two standard deviations of the mean.
• At least 8/9, or 89%, of the data are within three standard deviations of the mean.
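The bound $1 - 1/K^2$ can be computed directly and compared with what is actually observed in a data set. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, the restriction to $K > 1$ is an assumption made because the bound is not informative otherwise, and the data are the 51 census percentages from the Empirical Rule example.

```python
import statistics

def chebyshev_lower_bound(k: float) -> float:
    """Guaranteed minimum proportion of data within k standard deviations."""
    if k <= 1:
        # Assumption: for k <= 1 the bound is zero or negative, so reject it.
        raise ValueError("the bound 1 - 1/k**2 is only informative for k > 1")
    return 1 - 1 / k**2

def proportion_within(data, k):
    """Observed proportion of values within k population SDs of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)

# Percentage of the population aged 65 and over (51 values from the example).
census = [
    14.1, 14.1, 12.3, 13.1, 14.3, 13.3, 14.1, 12.2, 14.4, 14.3,
    15.3, 12.4, 17.8, 16.0, 13.0, 15.0, 12.0, 8.1, 13.6, 12.6,
    14.9, 11.5, 10.5, 13.6, 12.6, 14.1, 12.4, 13.7, 13.7, 10.2,
    13.5, 15.5, 12.8, 12.4, 13.9, 14.6, 13.8, 13.4, 10.7, 9.0,
    13.7, 15.6, 11.5, 12.2, 12.4, 12.8, 14.3, 14.0, 13.8, 13.9,
    12.7,
]

for k in (2, 3):
    guaranteed = chebyshev_lower_bound(k)
    observed = proportion_within(census, k)
    print(k, round(guaranteed, 3), round(observed, 3))
```

For bell-shaped data such as these, the observed proportions typically sit well above the Chebyshev guarantee, which is why the Empirical Rule gives a sharper description when it applies.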
Example – Chebyshev's Inequality
Example: As part of a public health study, systolic blood pressure was measured for a large group of people. The mean was 120 and the standard deviation was 10. What information does Chebyshev's Inequality provide about these data?

Solution: We compute the following:

$\bar{x} - 2s = 120 - 2(10) = 100$
$\bar{x} + 2s = 120 + 2(10) = 140$
$\bar{x} - 3s = 120 - 3(10) = 90$
$\bar{x} + 3s = 120 + 3(10) = 150$

We conclude:
• At least 3/4 (75%) of the people had systolic blood pressures between 100 and 140.
• At least 8/9 (89%) of the people had systolic blood pressures between 90 and 150.

OBJECTIVE 7
Compute the coefficient of variation

Coefficient of Variation
The coefficient of variation (CV for short) tells how large the standard deviation is relative to the mean. It can be used to compare the spreads of data sets whose values have different units.

The coefficient of variation is found by dividing the standard deviation by the mean.
$$\text{CV} = \frac{\sigma}{\mu}$$

Example – Coefficient of Variation
Example: National Weather Service records show that over a thirty-year period, the annual precipitation in Atlanta, Georgia had a mean of 49.8 inches with a standard deviation of 7.6 inches, and the annual temperature had a mean of 62.2 degrees Fahrenheit with a standard deviation of 1.3 degrees. Compute the coefficient of variation for precipitation and for temperature. Which has greater spread relative to its mean?
Solution: We compute the following:

$$\text{CV for precipitation} = \frac{\text{standard deviation for precipitation}}{\text{mean precipitation}} = \frac{7.6}{49.8} = 0.15$$

$$\text{CV for temperature} = \frac{\text{standard deviation for temperature}}{\text{mean temperature}} = \frac{1.3}{62.2} = 0.02$$

The CV for precipitation is larger than the CV for temperature. Therefore, precipitation has a greater spread relative to its mean.

You Should Know…
• How to compute the range of a data set
• The notation for population variance, population standard deviation, sample variance, and sample standard deviation
• How to compute the variance and the standard deviation for populations and samples
• How to use the TI-84 PLUS calculator to compute the variance and standard deviation for populations and samples
• How to approximate the standard deviation for grouped data
• How to use The Empirical Rule to describe a bell-shaped data set
• How to use Chebyshev's Inequality to describe any data set
• How to compute and interpret the coefficient of variation
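As a recap of two computations from this section, the grouped-data approximation and the coefficient of variation can be reproduced in Python. This is a minimal sketch under stated assumptions, not part of the original slides: the variable and function names are hypothetical, and the inputs are the text-message table and the Atlanta weather figures from the examples.

```python
import math

# Class midpoints and frequencies from the text-message example.
midpoints = [25, 75, 125, 175, 225, 275]
freqs = [10, 5, 13, 11, 7, 4]

# Step 1: sample size and approximate mean from the midpoints.
n = sum(freqs)
grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n

# Steps 2-4: sum of (Midpoint - Mean)^2 x Frequency over all classes.
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, freqs))

# Steps 5-6: divide by n - 1 (sample variance), then take the square root.
sample_var = sum_sq / (n - 1)
sample_sd = math.sqrt(sample_var)

print(n, grouped_mean, sum_sq)   # 50 137.0 292800.0
print(round(sample_sd, 5))       # 77.30142

def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation expressed relative to the mean."""
    return sd / mean

cv_precipitation = coefficient_of_variation(7.6, 49.8)
cv_temperature = coefficient_of_variation(1.3, 62.2)
print(round(cv_precipitation, 2), round(cv_temperature, 2))   # 0.15 0.02
```

Because the CV has no units, the two results can be compared directly even though one is measured in inches and the other in degrees.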