Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Transcript

Answer Key Name: _____________________________________ Date: ____________________ VARIATION WITHIN A DATA SET COMMON CORE ALGEBRA I Measures of central tendency give us numbers that describe the typical data value in a given data set. But, they do not let us know how much variation there is in the data set. Two data sets can have the same mean but look radically different depending on how varied the numbers are in the set. Exercise #1: The two data sets below each have equal means but differ in the variation within the data set. Use your calculator to determine the Interquartile Range (IQR) of each data set. The IQR is defined as the difference between the third quartile value and the first quartile value. Data Set #1: 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11 Q1 4.5 and Q3 9.5 IQR 9.5 4.5 5 Data Set #2: 5, 5, 6, 6, 7, 7, 8, 8, 9, 9 Q1 6 and Q3 8 IQR 8 6 2 The interquartile range gives a good measure of how spread out the data set is. But, the best measure of variation within a data set is the standard deviation. The actual calculation of standard deviation is complex and we will not go into it here. We will rely on our calculators for its calculation. Exercise #2: Using the same data sets above, use your calculator to produce the standard deviation (shown as x on the calculator) of the two data sets. Round your answers to the nearest tenth. Data Set #1: Data Set #2: 2.7 1.4 Standard Deviation The standard deviation of a data set tells us, on average, how far a data point is away from the mean of the data set. The larger the standard deviation, the greater the variation within the data set. Exercise #3: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the weight, in ounces, of 20 chicks. The weights are shown below. Find the mean, the interquartile range and the standard deviation for this data set. Round any non-integer values to the nearest tenth. Include appropriate units in your answers. Give an interpretation of the standard deviation. 2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3 mean x 3.7 ounces interquartile range (IQR) Q1 2.5 and Q3 5 IQR 5 2.5 2.5 ounces standard deviation 1.7 ounces We can interpret this as saying that on average a typical chick’s weight lies 1.7 ounces away from the mean weight of 3.7 ounces. COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4 eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013 Exercise #4: A marketing company is trying to determine how much diversity there is in the age of people who drink different soft drinks. They take a sample of people and ask them which soda they prefer. For the two sodas, the age of those people who preferred them is given below. Soda A: 18, 16, 22, 16, 28, 18, 21, 38, 22, 29, 25, 44, 36, 27, 40 Soda B: 25, 22, 18, 30, 27, 19, 22, 28, 25, 19, 23, 29, 26, 18, 20 (a) Explain why standard deviation is a better measure of the diversity in age than the mean. The mean is only going to give us a sense for what the typical age of a particular soda drinker is. It won’t tell us how wide of a variety of ages we have. Standard deviation will tell us how far a typical data point is away from the mean, and thus a higher value means a greater diversity in age than a lower value. (b) Which soda appears to have a greater diversity in the age of people who prefer it? How did you decide on this? Soda A appears to have the greater diversity. The ages range from a low of 16 to a high of 40 and lots of ages spread between. On the other hand, Soda B has a low of 18 and a high of 30. (c) Use your calculator to determine the sample standard deviation, normally given as sx , for both data sets. Round your answers to the nearest tenth. Did this answer reinforce your pick from (b)? How? s A 9.1 years and sB 4.1 years This did reinforce the answer from part (b). The standard deviation is much higher for Soda A, with a typical age being 9.1 year away from the mean, while for Soda B the typical age is only 4.1 years away from the mean, showing much less diversity in age. Population Versus Sample Standard Deviation When we are working with every possible data point of interest, we call this a population and use the population standard deviation, . When we have only a sample of all possible values we use the sample standard deviation, s. The formulas for these two differ very slightly, so their values tend to be slightly different. Exercise #5: Which of the following data sets would have a standard deviation (population) closest to zero? Do this without your calculator. Explain how you arrived at your answer. (1) 5, 2, 1, 0, 1, 2, 5 (3) 11, 11, 12, 13, 13 (2) 5, 8, 10, 16, 20 (4) 3, 7, 11, 11, 11,18 Choice (3) has almost no variation within the data set at all. Choice (4) also has little deviation, but the inclusion of the 3 and 18 will make it have a larger standard deviation than Choice (3). COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4 eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013 (3) Answer Key Name: _____________________________________ Date: ____________________ VARIATION WITHIN A DATA SET COMMON CORE ALGEBRA I HOMEWORK 1. For each of the following data sets, use your calculator to help find the interquartile range and the population standard deviation. Show your calculation for the IQR. Round all non-integer values to the nearest tenth. (a) 4, 6, 8, 10, 15, 19, 22, 25 (b) 3, 3, 4, 5, 5, 6, 6, 7, 7, 8 Q1 7 and Q3 20.5 Q1 4 and Q3 7 IQR 20.5 7 13.5 IQR 7 4 3 7.3 1.6 2. For the data set shown in the dot plot below, which of the following is closest to its population standard deviation? (1) 2.7 (3) 3.3 (2) 4.2 (4) 5.8 3.333 (3) 3. What is the interquartile range of the data set represented in the box plot shown below? (1) 24 (3) 8 (2) 14 (4) 12 0 10 20 (4) IQR 18 6 12 4. Which of the following best measures the average distance that a data value lies away from the mean? (1) mean (3) median (2) standard deviation (4) range The mean and median measure the central tendency, not variation, of a data set. The range does measure variation, but only between the highest and lower values. (2) 5. Which of the following data sets would have the largest standard deviation? (1) 3, 3, 4, 5, 5 (3) 2, 8,18, 26, 35 (2) 72, 73, 74, 75, 76 (4) 8, 10,12, 14,16 The calculator could be used to find the standard deviation of each set, but it is clear that most have very little variation except (3). Don’t be fooled by the large data values in (2). They have very little variation. COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4 eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013 (3) 6. We are going to revisit our survey of households that have video enabled devices (televisions, smart phones, tablets, etcetera). Recall that two surveys were done, each with 30 participants. In the first case (Survey A), the survey was random, in the second case (Survey B), the survey only included families with at least one teenager. The dot plots of the results are shown below. Survey A Survey B (a) Enter the data into your calculator and use it to calculate the mean number of devices, the interquartile range, and the standard deviation of both data sets. Round all non-integers to the nearest tenth. Remember, you will have to enter a given data point more than once. For example, in Survey A, you will need to enter 2-0’s, 5-1’s, 3-2’s, etcetera. Use the sample standard deviation. Survey A Statistics: Survey B Statistics: mean 3.4 devices mean 6.4 devices Q1 2 and Q3 4 Q1 4 and Q3 8 IQR 4 2 2 devices IQR 8 4 4 devices s 2.5 devices s 3.3 devices (b) Which of these two survey data sets had the greatest variation in the data? Explain based on the statistics you found in part (a). Survey B clearly had more variation within the data set. The IQR is much higher meaning the middle 50% of the data is more spread out than in Survey A. Also, the standard deviation is higher in Survey B’s results, meaning that a typical data point lies further from the mean. (c) How many of the 30 values in Survey B fall within one standard deviation of the mean? To do this calculation, add the standard deviation and subtract the standard deviation from the mean and then count the number of values between the results of this addition and subtraction. Low Data Point 6.4 3.3 3.1 High Data Point 6.4 3.3 9.7 So, we should count all data points that are from 4 to 9 devices: 5+4+5+2+3=19 of the values COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4 eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013