Algebra 1 Summer Institute 2014
The Fair Allocation Paradigm
Summary
Participants will explore and interpret the sample mean and learn several ways to
describe the degree of variation in data, based on how much the data values vary from
the mean.
Goals
- Understand the sample mean as an indicator of fair allocation
- Explore deviations of data values from the sample mean
- Understand the mean as the "balancing point" of a data set
- Learn how to measure variation about the mean
Participant Handouts
1. The Noodle Conundrum
2. The Box Plot
3. Finding the Five Number
Materials
Paper
Colored Pencils
Sticky notes
Snap Cubes
Technology
LCD Projector
Facilitator Laptop
Source
Annenberg Learner Website
Estimated Time
90 minutes
Mathematics Standards
Common Core State Standards for Mathematics
MAFS.6.SP.1: Develop understanding of statistical variability
1.2: Understand that a set of data collected to answer a statistical question has a
distribution which can be described by its center, spread, and overall shape.
1.3: Recognize that a measure of center for a numerical set of data summarizes all of
its values with a single number, while a measure of variation describes how its
values vary with a single number.
MAFS.6.SP.2: Summarize and describe distributions
2.5: Summarize numerical data sets in relation to their context, such as by:
a. Reporting the number of observations
b. Describing the nature of the attribute under investigation, including how it
was measured and its units of measurement.
c. Giving quantitative measures of center (median and/or mean) and variability
(interquartile range and/or mean absolute deviation), as well as describing any
overall pattern and any striking deviations from the overall pattern with
reference to the context in which the data were gathered.
Standards for Mathematical Practice
1. Make sense of problems and persevere in solving them
2. Reason abstractly and quantitatively
3. Construct viable arguments and critique the reasoning of others
4. Model with mathematics
Instructional Plan
NOTE: In previous activities we have used the word “average” instead of mean. In this
activity the sample mean is formally introduced. The words mean and median refer to the
sample mean and sample median when estimating from a sample of data.
In the last activity, we explored the Five-Number Summary and its graphical
representation, the box plot. We also explored the median, a common numerical
summary for a data set.
In this session, we'll investigate another common numerical summary, the mean. We'll
also learn several ways to describe the degree of variation in data, based on how much
the data values vary from the mean.
1. Begin the session by asking participants what the mean is. Listen to a couple of
responses. In this session we are going to see how to interpret the mean. What
does it say about the data? Is it a center point? If it is a measure of center, in what
way is it a halfway point? (Slide 2)
The term average is a popular one; it is often used, and often used incorrectly.
Although there are different types of averages, the typical definition of the word
"average" when talking about a list of numbers is "what you get when you add all
the numbers and then divide by how many numbers you have." This statement
describes how you calculate the arithmetic mean, or average. But knowing how to
calculate a mean doesn't necessarily tell you what it represents.
2. Let's begin our exploration of the mean. The problem is: how large are families?
Ask participants to think back to when they were in the 4th grade. How many
people were in their family?
They can represent the number of family members in their family when they were
in the 4th grade with snap cubes. Ask the participants to place their snap cubes on a
table in the front of the room. Once they are all in the front of the room, ask them
what we could do with them to better view the distribution. Suggest that we might
want to place them in some order, from smallest to largest for example, and
do it yourself or ask somebody to do it so there is a more visual
representation of the distribution. We would like to figure out the mean of
the distribution without using the rule of adding the values up and dividing by how
many data points there are.
3. Before figuring out the mean, introduce them to another representation of data:
the dot plot. Imagine you did a similar survey of 9 people in the past and you have
a representation of that data. Show them a poster paper with a dot plot in which 9
sticky notes are lined up vertically in the column representing the number 5,
as pictured on Slide 3.
Ask them what they think the mean of this distribution is. Is there any doubt that
the median and the mode are also 5? This data set shows no variation.
4. Ask participants to draw a distribution of 9 family sizes with a mean of 5 that
shows some variation. Does the median have to be 5? Participants will be in
groups of 4 and all will share their distribution using a poster paper and sticky
notes. As a couple of groups present their results, ask them how they decided on
that distribution and what is the median of their distribution. Encourage
presentations that show symmetrical and non-symmetrical distributions of the
data. (Slide 4)
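For facilitators who want a quick check of a proposed distribution, here is a minimal
Python sketch. The nine family sizes are hypothetical, invented for illustration, and
are not class data. The sketch shows one non-symmetrical distribution whose mean is 5
while its median is not.

```python
import statistics

# Hypothetical family sizes, invented for illustration (not class data).
family_sizes = [1, 2, 3, 4, 4, 4, 8, 9, 10]

mean_size = statistics.mean(family_sizes)      # total of 45 shared among 9 families
median_size = statistics.median(family_sizes)  # middle (5th) value of the sorted list

print("mean:", mean_size)
print("median:", median_size)
```

Here the mean is 5 but the median is 4, so the median does not have to be 5.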
5. Ask participants to write the number of members in their families when they were
in the 4th grade on a sticky note. On a poster paper, participants will stick
their notes vertically in columns representing their number of family members.
Now there are two representations of their data: the snap cubes and the sticky
notes on the poster paper. Ask participants what the connection between the two
representations is and whether it is possible to go from one to the other. Which one
offers them a better visual representation of the data? Why? What could we do to the
snap cubes to make them resemble the representation on the poster
paper? Let participants figure it out; one way could be to lay down the columns of
snap cubes and stack together the ones representing the same numbers. Since the
snap cube representation and the dot plot look so different, the connection between
them might not be obvious to some people. (Slide 5)
6. Return the snap cubes to their original single column representation. Pose the
question, what can be done to the cubes to find the mean without having to add
and divide? (Slide 6)
Ask a volunteer to come up, explain what they would do, and do it. Ideally,
the volunteer will try to make all the stacks the same size. What would be the mean, or
an estimate of the mean, without doing any calculations? The purpose is to
demonstrate to the participants how to find the mean without doing any
calculations. One of the interpretations of the mean is the number that levels off
the data. It is considered the fair-share value. The mean is the notion of putting all
the data together and redistributing it evenly.
What is the actual mean for this data? If appropriate, leave the result as a mixed
number.
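The leveling-off idea can be sketched in Python. This is only an illustration under
assumptions: the function name `level_off` and the second data set are hypothetical,
and the first data set borrows the nine family sizes from the variance table later in
this session. The sketch moves one cube at a time from the tallest stack to the
shortest, without adding and dividing.

```python
def level_off(stacks):
    """Move one cube at a time from the tallest stack to the shortest
    until no stack is more than one cube taller than another."""
    stacks = list(stacks)
    while max(stacks) - min(stacks) > 1:
        tallest = stacks.index(max(stacks))
        shortest = stacks.index(min(stacks))
        stacks[tallest] -= 1
        stacks[shortest] += 1
    return sorted(stacks)

# Family sizes from the variance table later in this session.
print(level_off([2, 3, 3, 4, 5, 6, 6, 8, 8]))

# Hypothetical stacks whose cubes cannot be shared evenly.
print(level_off([3, 4, 6]))
```

The first call levels every stack to 5, the fair-share value. The second call ends
with stacks of 4, 4 and 5: one cube is left over, which is why the mean of that set
is the mixed number 4 1/3.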
7. What other information could we gather from the data to help us identify where
the data comes from? Direct the participants’ attention to the dot plots they
created with the 9 imaginary families and the mean of 5. Ask them to think about
how different the data values are and how much they vary from 5, the value that
represents the mean. The goal is to measure variation about the mean. The
distribution that has the least amount of variation is the one that has all the sticky
notes in the same column, 5. Which distribution has data points that vary the most
from 5?
We would like to quantify the variation from 5. Pick one distribution and write
down how many units each data point is away from 5. For example, if the data
point is 7, the difference from 5 is 2. Write down the difference of each data point
from 5 on the sticky note. If the data point is 3, the difference is still a positive
2. Once all 9 differences are written for one distribution, ask participants what
we could do to find the average of the differences.
8. Find the average of the differences on the board. This quantity is called the
MAD. The MAD (Mean Absolute Deviation) tells us how much, on
average, the values in a dot plot differ from the mean. (Slide 7)
Find the MAD of another distribution. What is the MAD telling us? If the
MAD is small, the values in the set are clustered closely around the
mean. If it is large, at least some values are quite far away from the
mean. A distribution with a smaller MAD shows less variation than a
distribution with a larger MAD.
How do we find the MAD? The deviation (D) is the distance between the data
point value and the mean (D = |value – mean|). The difference value – mean could
be positive or negative; since we are interested in the distance, we take the
absolute value, which is never negative. Finally, in order to find the MAD,
we add all the absolute deviations (D) and then divide by the number of
data points (n) in the distribution: MAD = Sum(D)/n.
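The MAD recipe above can be sketched in a few lines of Python. The function name
`mean_absolute_deviation` is a hypothetical helper, not a library call, and the data
are the nine family sizes from the variance table later in this session. Fractions
are used so that the result stays exact.

```python
from fractions import Fraction

def mean_absolute_deviation(values):
    """MAD = Sum(D) / n, where D = |value - mean| for each data point."""
    n = len(values)
    mean = Fraction(sum(values), n)
    deviations = [abs(x - mean) for x in values]  # distances, never negative
    return sum(deviations) / n

# Family sizes from the variance table later in this session.
family_sizes = [2, 3, 3, 4, 5, 6, 6, 8, 8]
mad = mean_absolute_deviation(family_sizes)
print("MAD =", mad, "which is about", round(float(mad), 2))
```

For this distribution the absolute deviations add up to 16, so the MAD is 16/9, or
about 1.78.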
9. Ask participants to find the MAD of the rest of the displayed distributions of the 9
imaginary families. MAD is not commonly used in statistics but it is a natural step
before developing the idea of standard deviation. MAD and standard deviation
could produce similar numbers, just like the mean and the median. Mean and
median inform us about center; MAD and standard deviation inform us how much
the data points deviate from the mean.
10. As an extension, ask participants whether it would be possible to come up with a
distribution of 9 families with a mean of 5 members in the family and a MAD of
1. It is not possible. In these problems the MAD is the sum of the absolute
deviations divided by 9, so for the MAD to equal 1, the sum of the deviations
would have to be exactly 9 (9 / 9 = 1). Because the mean is the balancing point,
the total deviation of the points below the mean must equal the total deviation of
the points above the mean, so each total would have to be 4.5. Since the mean (5)
is a whole number, this would require having families with a fractional number of
family members, which is not possible.
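The impossibility argument can also be checked by brute force. The sketch below is an
illustration under one stated assumption: every family has at least 1 member. The
helper `partitions` is hypothetical and simply lists every way to choose 9 whole-number
family sizes with a total of 45 (a mean of 5).

```python
def partitions(total, parts, smallest=1):
    """Yield non-decreasing lists of `parts` whole numbers (each >= smallest)
    that add up to `total`."""
    if parts == 1:
        if total >= smallest:
            yield [total]
        return
    # The first value can be at most total // parts, or the rest could not
    # all be at least as large as it.
    for first in range(smallest, total // parts + 1):
        for rest in partitions(total - first, parts - 1, first):
            yield [first] + rest

# Every distribution of 9 family sizes with mean 5 has a total of 45.
deviation_totals = {
    sum(abs(x - 5) for x in sizes) for sizes in partitions(45, 9)
}

print(sorted(deviation_totals)[:6])
print("Is a total of 9 possible?", 9 in deviation_totals)
```

Every total of absolute deviations that shows up is an even number, so a total of 9
(and therefore a MAD of exactly 1) never occurs.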
11. Let us return to the question of how the mean is a measure of center. Pick one of
the distributions used to calculate the MAD, and draw a vertical line through its
mean (5). One of the interpretations of the mean is that the mean is the fair share
value; another is that it is the center of the distances of the values below the mean
and above the mean. Looking at the distribution with the vertical line: How many
values are below the mean? How many are above the mean? Could the mean and
the median be the same in this distribution? If not, why not? Ask participants to
add the distances of each data point below the mean, and do the same for the data
points above the mean. What do they notice? The total distances are the same on
each side, and this is why the mean is a measure of center. (Slide 9)
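A short Python sketch of this balancing check, using the nine family sizes from the
variance table later in this session as an example distribution (any distribution the
groups drew would work the same way):

```python
# Family sizes from the variance table later in this session.
data = [2, 3, 3, 4, 5, 6, 6, 8, 8]
mean = sum(data) / len(data)

# Total distance to the mean on each side of the vertical line.
below = sum(mean - x for x in data if x < mean)
above = sum(x - mean for x in data if x > mean)

print("values below the mean:", sum(1 for x in data if x < mean))
print("values above the mean:", sum(1 for x in data if x > mean))
print("total distance below:", below)
print("total distance above:", above)
```

Four values lie on each side of the mean here, and the distances total 8 on both
sides. The counts on each side do not have to match in general, but the total
distances always do.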
12. Professional statisticians more commonly use two other measures of variation: the
variance and the standard deviation.
The method for calculating variance is very similar to the method used to
calculate the MAD. The first step in calculating the variance is the same used to
find the MAD: Find the deviation for each value in the set (i.e., how much each
value differs from the mean). The next step in calculating the variance is to square
each deviation. Note the difference between this and the MAD, which requires us
to find the absolute value of each deviation. The final step is to find the variance
by calculating the mean of the squares. As usual, find the mean of the squares by
adding all the values and then dividing by how many there are.
Find the variance of one of the distributions. Here is a table for this calculation for
a possible distribution: (Slide 10)
Number of Members    Deviation from      Squared Deviation
in Family (x)        the Mean (x - 5)    from the Mean (x - 5)^2
2                    -3                  9
3                    -2                  4
3                    -2                  4
4                    -1                  1
5                     0                  0
6                    +1                  1
6                    +1                  1
8                    +3                  9
8                    +3                  9
______               ______              ______
45                    0                  38
The mean of the squared deviations is 38 / 9 = 4 2/9, or approximately 4.22. This
value is the variance for this data set. As with the MAD, the variance is a measure
of variation about the mean. Data sets with more variation will have a higher
variance.
The variance is the mean of the squared deviations, so you could also say that it
represents the average of the squared deviations. The problem with using the
variance as a measure of variation is that it is in squared units. To gauge a typical
(or standard) deviation, we would need to calculate the square root of the
variance. This measure -- the square root of the variance -- is called the standard
deviation for a data set.
For the data set given above, the standard deviation is the square root of 4.22,
which is approximately 2.05. Note that this value is fairly close to the MAD for
this distribution, which is 16/9 or about 1.78.
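The three steps (deviations, squares, mean of the squares) and the final square root
can be sketched in Python for the distribution in the table. This is a minimal
illustration that divides by n, as the session does up to this point.

```python
import math
from fractions import Fraction

data = [2, 3, 3, 4, 5, 6, 6, 8, 8]
n = len(data)
mean = Fraction(sum(data), n)

deviations = [x - mean for x in data]     # step 1: deviation from the mean
squared = [d ** 2 for d in deviations]    # step 2: square each deviation
variance = sum(squared) / n               # step 3: mean of the squares
standard_deviation = math.sqrt(float(variance))

mad = sum(abs(d) for d in deviations) / n  # for comparison with the MAD

print("sum of deviations:", sum(deviations))
print("variance:", variance)
print("standard deviation:", round(standard_deviation, 2))
print("MAD:", mad)
```

The deviations add up to 0, the variance is 38/9, the standard deviation is about
2.05, and the MAD is 16/9, matching the values in the text.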
13. Ask each group of participants to find the variance and standard deviation for the
distribution of the family size of the whole class and compare the standard
deviation to the MAD of that distribution.
The standard deviation, first introduced in the late 19th century, has become the
most frequently used measure of variation in statistics today. For example, IQ
tests are created with an expected mean of 100 and a standard deviation of 15.
14. As an extension to this session, you could ask the following questions: (Slide 11)
a. What would happen to the mean of a data set if you added 3 to every number in
it?
b. What would happen to the MAD of a data set if you added 3 to every number in
it?
c. What would happen to the variance of a data set if you added 3 to every number
in it?
d. What would happen to the standard deviation of a data set if you added 3 to every
number in it?
e. What would happen to the mean of a data set if you doubled every number in it?
f. What would happen to the MAD of a data set if you doubled every number in it?
g. What would happen to the variance of a data set if you doubled every number in
it?
h. What would happen to the standard deviation of a data set if you doubled every
number in it?
Solutions:
a. The mean would increase by 3.
b. The MAD would not change. Since the values in the list are each 3 larger, and the
mean is also 3 larger, the deviations from the mean would remain the same.
c. The variance would not change, since it depends only on the deviation from the
mean, not the values themselves. Since the mean increases by 3 along with the
rest of the data set, none of the deviations will change.
d. Since the standard deviation is the square root of the (unchanged) variance, it will
not change.
e. The mean would be doubled.
f. The MAD would be doubled, since all the deviations are now doubled, and the
MAD is the average of these deviations.
g. The variance would be multiplied by 4. Since calculating the variance involves
squaring the deviations, the newly doubled deviations would all be squared,
resulting in values that are four times as large. For example, if a deviation was
(+3), it now becomes (+6). The value used in the variance calculation changes
from 3^2 = 9 to 6^2 = 36, which is four times as large.
h. The standard deviation would be doubled, since it is the square root of the
variance, and the square root of 4 is 2.
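These solutions can be checked numerically. The sketch below is an illustration only:
`summarize` is a hypothetical helper, the data are the nine family sizes from the
variance table, and all measures divide by n.

```python
import math
from fractions import Fraction

def summarize(values):
    """Return (mean, MAD, variance, standard deviation), dividing by n."""
    n = len(values)
    mean = Fraction(sum(values), n)
    mad = sum(abs(x - mean) for x in values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, mad, variance, math.sqrt(variance)

data = [2, 3, 3, 4, 5, 6, 6, 8, 8]
original = summarize(data)
shifted = summarize([x + 3 for x in data])   # questions a-d
doubled = summarize([2 * x for x in data])   # questions e-h

for label, result in [("original", original), ("add 3", shifted), ("doubled", doubled)]:
    mean, mad, variance, sd = result
    print(f"{label:9} mean={mean} MAD={mad} variance={variance} sd={sd:.2f}")
```

Adding 3 changes only the mean. Doubling doubles the mean, the MAD and the standard
deviation, and multiplies the variance by 4.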
Discussion on the Standard Deviation
The standard deviation is a measure of the spread of scores within a set of data. Usually,
we are interested in the standard deviation of a population. However, as we are often
presented with data from a sample only, we can estimate the population standard
deviation from a sample standard deviation. These two standard deviations - sample and
population standard deviations - are calculated differently.
The formula for the population standard deviation is: (Slide 12)
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$
where μ is the population mean.
The formula for the sample standard deviation is:
$$sd = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
where x̄ is the sample mean.
Notice the difference: the population formula divides by “n”, the sample by “n-1”.
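Python's standard `statistics` module provides both versions: `pstdev` divides by n
and `stdev` divides by n-1. The sketch below applies both to the nine family sizes
from the variance table, purely as an illustration of how the two results differ on
the same numbers.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 6, 8, 8]

population_sd = statistics.pstdev(data)  # divides the sum of squares by n = 9
sample_sd = statistics.stdev(data)       # divides the sum of squares by n - 1 = 8

print("population formula:", round(population_sd, 2))
print("sample formula:    ", round(sample_sd, 2))
```

The sum of squared deviations is 38 in both cases, so the population formula gives the
square root of 38/9 and the sample formula gives the slightly larger square root of
38/8.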
When to use the sample or the population standard deviation:
We are normally interested in knowing the population standard deviation because our
population contains all the values we are interested in. Therefore, you would normally
calculate the population standard deviation if: (1) you have the entire population or (2)
you have a sample of a larger population, but you are only interested in this sample and
do not wish to generalize your findings to the population. However, in statistics, we are
usually presented with a sample from which we wish to estimate (generalize to) a
population, and the standard deviation is no exception to this. Therefore, if all you have is
a sample, but you wish to make a statement about the population standard deviation from
which the sample is drawn, you need to use the sample standard deviation. Confusion can
often arise as to which standard deviation to use due to the name "sample" standard
deviation incorrectly being interpreted as meaning the standard deviation of the sample
itself and not the estimate of the population standard deviation based on the sample.
Quick explanation of the difference:
When the mean is calculated from the n data points, there are only n-1 degrees of
freedom left to calculate the spread of the data.
Essentially, the mean used is not the true population mean but an estimate of it based
on the sample data. This estimate is, of course, biased towards fitting the observed
data, because it was computed from them. Therefore, measuring how widely the data are
spread around the estimated mean, rather than around the true mean, gives a value that
is biased slightly too low. Using n-1 instead of n compensates.
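The bias can be seen exactly on a small example. The sketch below makes several
assumptions for illustration: the nine family sizes from the variance table are
treated as the whole population, samples of size 3 are drawn with replacement, and
every possible sample is listed instead of drawn at random. The comparison is made on
the variance (squared units), where the n-1 correction is exact.

```python
from fractions import Fraction
from itertools import product

# Assumption: treat these nine family sizes as the entire population.
population = [2, 3, 3, 4, 5, 6, 6, 8, 8]
N = len(population)
mu = Fraction(sum(population), N)
population_variance = sum((x - mu) ** 2 for x in population) / N

n = 3  # sample size (an arbitrary choice for this sketch)
divide_by_n = []
divide_by_n_minus_1 = []
for sample in product(population, repeat=n):  # every possible ordered sample
    x_bar = Fraction(sum(sample), n)          # estimated mean from the sample
    sum_of_squares = sum((x - x_bar) ** 2 for x in sample)
    divide_by_n.append(sum_of_squares / n)
    divide_by_n_minus_1.append(sum_of_squares / (n - 1))

average_n = sum(divide_by_n) / len(divide_by_n)
average_n_minus_1 = sum(divide_by_n_minus_1) / len(divide_by_n_minus_1)

print("population variance:        ", population_variance)
print("average when dividing by n: ", average_n)
print("average when dividing by n-1:", average_n_minus_1)
```

Averaged over all samples, dividing by n comes out too low (76/27 instead of 38/9),
while dividing by n-1 matches the population variance exactly.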
Source: https://statistics.laerd.com/statistical-guides/measures-of-spread-standarddeviation.php