Download CCAlgI.Unit #10.Lesson #4.Variation within a Data Set.Answer Key

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Business intelligence wikipedia, lookup

Data vault modeling wikipedia, lookup

Transcript
Answer Key
Name: _____________________________________
Date: ____________________
VARIATION WITHIN A DATA SET
COMMON CORE ALGEBRA I
Measures of central tendency give us numbers that describe the typical data value in a given data set. But,
they do not let us know how much variation there is in the data set. Two data sets can have the same mean but
look radically different depending on how varied the numbers are in the set.
Exercise #1: The two data sets below each have equal means but differ in the variation within the data set. Use
your calculator to determine the Interquartile Range (IQR) of each data set. The IQR is defined as the
difference between the third quartile value and the first quartile value.
Data Set #1: 3, 3, 4, 4, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10, 10, 11, 11
Q1  4.5 and Q3  9.5
IQR  9.5  4.5  5
Data Set #2: 5, 5, 6, 6, 7, 7, 8, 8, 9, 9
Q1  6 and Q3  8
IQR  8  6  2
The interquartile range gives a good measure of how spread out the data set is. But, the best measure of
variation within a data set is the standard deviation. The actual calculation of standard deviation is complex
and we will not go into it here. We will rely on our calculators for its calculation.
Exercise #2: Using the same data sets above, use your calculator to produce the standard deviation (shown as
 x on the calculator) of the two data sets. Round your answers to the nearest tenth.
Data Set #1:
Data Set #2:
  2.7
  1.4
Standard Deviation
The standard deviation of a data set tells us, on average, how far a data point is away from the mean of
the data set. The larger the standard deviation, the greater the variation within the data set.
Exercise #3: A farm is studying the weight of baby chickens (chicks) after 1 week of growth. They find the
weight, in ounces, of 20 chicks. The weights are shown below. Find the mean, the interquartile range and the
standard deviation for this data set. Round any non-integer values to the nearest tenth. Include appropriate
units in your answers. Give an interpretation of the standard deviation.
2, 1, 3, 4, 2, 2, 3, 1, 5, 3, 4, 4, 5, 6, 3, 8, 5, 4, 6, 3
mean
x  3.7 ounces
interquartile range (IQR)
Q1  2.5 and Q3  5
IQR  5  2.5  2.5 ounces
standard deviation
  1.7 ounces
We can interpret this as saying that on average a
typical chick’s weight lies 1.7 ounces away from the
mean weight of 3.7 ounces.
COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4
eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013
Exercise #4: A marketing company is trying to determine how much diversity there is in the age of people who
drink different soft drinks. They take a sample of people and ask them which soda they prefer. For the two
sodas, the age of those people who preferred them is given below.
Soda A: 18, 16, 22, 16, 28, 18, 21, 38, 22, 29, 25, 44, 36, 27, 40
Soda B: 25, 22, 18, 30, 27, 19, 22, 28, 25, 19, 23, 29, 26, 18, 20
(a) Explain why standard deviation is a better measure of the diversity in age than the mean.
The mean is only going to give us a sense for what the typical age of a particular soda drinker is. It won’t tell
us how wide of a variety of ages we have. Standard deviation will tell us how far a typical data point is away
from the mean, and thus a higher value means a greater diversity in age than a lower value.
(b) Which soda appears to have a greater diversity in the age of people who prefer it? How did you decide on
this?
Soda A appears to have the greater diversity. The ages range from a low of 16 to a high of 40 and lots of ages
spread between. On the other hand, Soda B has a low of 18 and a high of 30.
(c) Use your calculator to determine the sample standard deviation, normally given as sx , for both data sets.
Round your answers to the nearest tenth. Did this answer reinforce your pick from (b)? How?
s A  9.1 years and sB  4.1 years
This did reinforce the answer from part (b). The standard deviation is much higher for Soda A, with a typical
age being 9.1 year away from the mean, while for Soda B the typical age is only 4.1 years away from the
mean, showing much less diversity in age.
Population Versus Sample Standard Deviation
When we are working with every possible data point of interest, we call this a population and use the
population standard deviation,  . When we have only a sample of all possible values we use the sample
standard deviation, s. The formulas for these two differ very slightly, so their values tend to be slightly
different.
Exercise #5: Which of the following data sets would have a standard deviation (population) closest to zero?
Do this without your calculator. Explain how you arrived at your answer.
(1) 5,  2,  1, 0, 1, 2, 5
(3) 11, 11, 12, 13, 13
(2) 5, 8, 10, 16, 20
(4) 3, 7, 11, 11, 11,18
Choice (3) has almost no variation within the
data set at all. Choice (4) also has little
deviation, but the inclusion of the 3 and 18 will
make it have a larger standard deviation than
Choice (3).
COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4
eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013
(3)
Answer Key
Name: _____________________________________
Date: ____________________
VARIATION WITHIN A DATA SET
COMMON CORE ALGEBRA I HOMEWORK
1. For each of the following data sets, use your calculator to help find the interquartile range and the
population standard deviation. Show your calculation for the IQR. Round all non-integer values to the
nearest tenth.
(a) 4, 6, 8, 10, 15, 19, 22, 25
(b) 3, 3, 4, 5, 5, 6, 6, 7, 7, 8
Q1  7 and Q3  20.5
Q1  4 and Q3  7
IQR  20.5  7  13.5
IQR  7  4  3
  7.3
  1.6
2. For the data set shown in the dot plot below, which of the following is closest to its population standard
deviation?
(1) 2.7
(3) 3.3
(2) 4.2
(4) 5.8
  3.333
(3)
3. What is the interquartile range of the data set represented in the box plot shown below?
(1) 24
(3) 8
(2) 14
(4) 12
0
10
20
(4)
IQR  18  6  12
4. Which of the following best measures the average distance that a data value lies away from the mean?
(1) mean
(3) median
(2) standard deviation
(4) range
The mean and median measure the central
tendency, not variation, of a data set. The
range does measure variation, but only
between the highest and lower values.
(2)
5. Which of the following data sets would have the largest standard deviation?
(1) 3, 3, 4, 5, 5
(3) 2, 8,18, 26, 35
(2) 72, 73, 74, 75, 76
(4) 8, 10,12, 14,16
The calculator could be used to find the
standard deviation of each set, but it is clear
that most have very little variation except (3).
Don’t be fooled by the large data values in (2).
They have very little variation.
COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4
eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013
(3)
6. We are going to revisit our survey of households that have video enabled devices (televisions, smart phones,
tablets, etcetera). Recall that two surveys were done, each with 30 participants. In the first case (Survey A),
the survey was random, in the second case (Survey B), the survey only included families with at least one
teenager. The dot plots of the results are shown below.
Survey A
Survey B
(a) Enter the data into your calculator and use it to calculate the mean number of devices, the interquartile
range, and the standard deviation of both data sets. Round all non-integers to the nearest tenth.
Remember, you will have to enter a given data point more than once. For example, in Survey A, you
will need to enter 2-0’s, 5-1’s, 3-2’s, etcetera. Use the sample standard deviation.
Survey A Statistics:
Survey B Statistics:
mean  3.4 devices
mean  6.4 devices
Q1  2 and Q3  4
Q1  4 and Q3  8
IQR  4  2  2 devices
IQR  8  4  4 devices
s  2.5 devices
s  3.3 devices
(b) Which of these two survey data sets had the greatest variation in the data? Explain based on the
statistics you found in part (a).
Survey B clearly had more variation within the data set. The IQR is much higher meaning the middle 50% of
the data is more spread out than in Survey A. Also, the standard deviation is higher in Survey B’s results,
meaning that a typical data point lies further from the mean.
(c) How many of the 30 values in Survey B fall within one standard deviation of the mean? To do this
calculation, add the standard deviation and subtract the standard deviation from the mean and then count
the number of values between the results of this addition and subtraction.
Low Data Point  6.4  3.3  3.1
High Data Point  6.4  3.3  9.7
So, we should count all data points that are from 4 to 9 devices:
5+4+5+2+3=19 of the values
COMMON CORE ALGEBRA I, UNIT #10 – STATISTICS – LESSON #4
eMATHINSTRUCTION, RED HOOK, NY 12571, © 2013