Download Statistics and Probability

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Transcript
Math 6 Notes – Unit 09: Statistics and Probability
In this unit, students will learn to identify statistical and non statistical questions. They will
draw dot plots, histograms and other graphical representations to visually display data sets.
Then students will calculate measures of center and also analyze the spread of data sets by
drawing box plots, finding the interquartile range, and finding the mean absolute deviations.
NVACS 6.SP.A.1 - Recognize a statistical question as one that anticipates variability in the
data related to the question and accounts for it in the answers.
In order to achieve the goal in CCSS 6.SP.1, students must first understand what a “statistical
question” is and what it is not. Simply put, a statistical question is one that can be answered (in
a variety of ways) numerically. A statistical question anticipates an answer that varies from one
individual to another; in doing so, the responses to a statistical question result in a set of data.
Data are the numbers produced in response to a statistical question. If answers to a statistical
question do not predict variability, then the question is not statistical. For example, a student
asking themselves, “How old am I?” is not a statistical question; the answer is predictable.
Example 1: Students should begin by simply “recognizing” statistical questions. Teachers
should present a group of both statistical and non-statistical questions to students. Student
teams or cooperative groups are allowed time to decide whether each question is statistical or
non-statistical. Then team responses should be shared classroom wide.
Questions - Statistical or not?
a. How many pets do each of your teachers own?
b. How old is the oldest member of a household?
c. What is your classmate’s favorite flavor of ice cream?
d. How many times does a sixth grader eat (on average) each day?
e. What lunch item is served every day on “Pizza Day”?
Questions a, b, and d are statistical; they can be responded to numerically and the questions will
have varied responses. Questions c and e are non-statistical questions. Question c is not a
numerically statistical question because students cannot respond numerically to the question.
Question e is not statistical because the answer would be, predictably, pizza. Additionally, the
answer is not numerical.
Example 2: Teachers should provide students with slides of questions (either via a power point
or Smartboard) to enhance student ability to identify “statistical” questions. Students will
respond individually to whether the question is statistical or not with a thumbs up or thumbs
down. If the question is statistical, students will be instructed to put their thumb up; if not, then
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 1 of 64
they will put their thumb down. After students have the opportunity to respond, teachers should
expand on WHY the questions are either statistical or not.
Possible questions:
At what age do children learn to ride a 2-wheel bicycle?
What are the favorite colors of the students in the class?
Whom do you admire most?
What time do you wake up on school days?
What time does your school begin?
How many pairs of shoes do you currently own?
Which pair of shoes is your favorite?
Next, teachers should allow students (or student groups) the opportunity to formulate both
statistical and non-statistical questions. Teachers should instruct students (or student teams) to
divide their notebook paper into two columns (one labeled “statistical”, one labeled “nonstatistical”). Individuals (or teams) should be given time to brainstorm examples of statistical or
non-statistical questions. Student examples should be shared orally with the class with
explanations provided by the students.
STUDENT REFLECTION: What makes a question numerically “statistical”? Use the word
“variability” in your response. Provide an example (with possible responses) to support your
view.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 2 of 64
NVACS - 6.SP.A.2 - Understand that a set of data collected to answer a statistical question
has a distribution which can be described by its center, spread, and overall shape.
Dot plots (also known as line plots) can be used as a simple means for students to visualize data.
Example: The number of various kinds of snakes found in a zoo is shown in the dot plot
(below). What is the total number of snakes in the zoo?
•
•
•
•
Black
Cobra
Number of Snakes in a Zoo
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Python
Anaconda
Rattle
Snakes
Green
Cobra
Types of snakes
A.
B.
C.
D.
29
25
30
34
In addition to visualizing data, we use dot plots to view the distribution, center, spread, and
overall shape of the data. Dot Plot A (below) shows the writing rubric scores of students based
upon “organization”. Dot Plot B shows the writing rubric scores of the same group of 30
students based upon “ideas”.
Dot Plot A – Organization
X
X
X
X
X
X
X
1
Math 6, Unit 09
X
X
X
X
X
X
X
X
2
X
X
X
X
X
X
X
X
3
X
X
X
X X
X X
4 5
Statistics and Probability
2014-15-NVACS
Page 3 of 64
Dot Plot B – Ideas
X
X
X
X X X
X X X
X X X X
X X X X
X X X X X
X X X X X
X X X X X
1 2 3 4 5
Showing the two dot plots vertically allows students to make direct comparisons in regard to the
center of the graph, its spread, and its overall shape.
Even though students have not been formally introduced to the mathematical definitions of
mean, median, mode, and range, informal discussion about the center and spread of the data and
picture can be discussed.
Questions teachers might ask include:
Where is the overall center of the data in graph A versus graph B?
How is the shape of the graph different in graph B? What do you think causes this difference?
Where are the scores “clustered” in graph A? Graph B?
In which graph is data more spread out? Support your answer.
Where is the actual middle of the set of data for A and B?
Example 1: Students were asked their shoe size. The data was then collected and a dot plot
was produced (with guidance from the teacher). An example dot plot is seen below.
Shoe sizes for 6th Graders
X
X
X
X
X
X
X
X
4 5 6
Math 6, Unit 09
X
X
X
X
X
X
X
X
X
7
X
X
X
X
X
X
X
X X
X X
X X
8 9 10 11 12
Statistics and Probability
2014-15-NVACS
Page 4 of 64
Discussion questions:
How many students responded to the question about their shoe size?
How would you describe the “shape” of the graph? What does that shape indicate?
Locate and describe the “center” of the data. That center is like the _______________?
Most students have a shoe size around 6, 7, or 8; the data is “clustered” there. This means that
if I were to calculate the measures of central tendency………
The data is not very spread out. Why do you think that is so? What does this tell us about the
measures of variability?
STUDENT REFLECTION: What does the shape of the scores have to do with the sameness or
differences in their values? Explain. Does the range of the data affect its shape?
Additional Examples:
Teachers can also collect student data based upon other survey questions. Students can
participate in the collection, organization, and representation of the data via a dot plot (line
plot). Some possible example questions are:
How many pets do your classmates own?
At what age did your classmates begin riding a 2 wheel bicycle?
How many siblings do your peers have?
How much do the students in our math class weigh?
How tall are the students in your math class?
Each of these questions can be utilized by teachers and students to collect, organize, and display
data. At that point students should be challenged to use their graphic representation of the data
(line plot) to draw generalizations about the data set and its shape and spread and obvious
“clusters”.
Student Extension: For more advanced level students, teachers can show a set of data
graphically represented on a dot plot. Teachers along with their students can describe the shape
of the data and its overall spread. Teachers can then challenge student groups to hypothesize
what the data could possibly and logically represent. In other words, students will be asked to
identify the survey population (if any) and formulate a possible survey question that would
result in that set of data.
Now that students understand what statistical questions are they need to be given opportunities
to gather data and make graphical representation of their data. It is necessary to show students
how to create the various types of graphical representations as well as their benefits and
limitations.
To begin working with data, we must first understand the two types of data we will encounter in this
unit—categorical data and numerical data. Categorical data consists of names, labels or other non-
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 5 of 64
numerical values such as movie preferences, types of animals, types of advertising, colors of medals
given to winners of sporting events, etc. Categorical data is usually displayed in circle graphs and
bar graphs. Numerical data consists of numbers such as weights of animals, lengths of rivers,
heights of waterfalls, rainfall in a particular city, etc. Numerical data is displayed in histograms, line
graphs, scatter plots, stem-and-leaf plots, and box-and-whisker plots.
As students begin to work with data displays, it is important that they choose an appropriate
display for the data set. Below is a quick summary of each of the displays students will
encounter in this unit.
Visual
Display
Usage/strength
bar graph
to compare categorical data. This graph
type has no center and no spread.
box-andwhisker plot
to organize numerical data into four groups
of approximately equal size. Used for large
sets of data; appropriate for comparing two
or more sets of data in terms of spread and
skewness.
histogram
line graph
6
7
0359
48
Interval
Key
4 2 = 42
Tally
stem-and-leafplot
to organize numerical data based on their
digits
frequency table
to organize numerical data according to the
number of times the item occurs
Frequency
0-9
15
10 - 19
7
20 - 29
1
30 - 39
6
Math 6, Unit 09
to compare frequencies of numerical data
that fall in equal intervals. Appropriate for
displaying large sets of data or data sets with
a large range. Strength is that it allows data
to be manipulated by varying the intervals;
gives an overall picture of the data by
intervals without highlighting specific pieces
of data within the set
to display numerical data that change over
time
Statistics and Probability
2014-15-NVACS
Page 6 of 64
X
X
X
X
X
X
0
1
X
X
X
X
X
2
3
4
5
6
X
X
7
8
Venn diagram
To show how data are grouped or belong
together.
Dot plot (line
plot)
To display numerical data. Good to use for
small to moderate sets of data with small to
moderate range. Strength is that it is easy to
create, highlights the distribution including
clusters, gaps, and outliers
DOT PLOTS (LINE PLOTS)
Dot plots (line plots) is the most basic type of graph for representing data was featured in
previous examples. It gives students a good visual representation of the shape of the data,
where the center might be, and a visual of the data’s variability or spread. However, it is not
very useful when there are a lot of data pieces because it can be cumbersome to create.
FREQUENCY TABLE and HISTOGRAM
The frequency of a data value is the number of times it occurs in a data set. How can you tell
the frequency of a data value by looking at a dot plot?
It is sometimes more convenient to show data that has been divided into intervals than to
display individual data values. A Histogram is a type of bar graph whose bars represent the
frequencies of data within intervals.
Example: Make a histogram for the data given:
12, 3, 8, 1, 1, 6, 10, 14, 3, 6, 2, 1, 3, 2, 7
First, make a frequency table:
Interval
1-4
5-8
9-12
13-16
Tally
Frequency
How many data values are in this interval?
8
4
2
1
A histogram is made up of adjoining vertical rectangles or bars. In a histogram the horizontal
axis typically identifies the topic of the graph and the vertical axis describes the frequency of
those observations.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 7 of 64
Age of Children at the Park
9
8
# of children
7
6
5
4
3
2
1
0
1-4
5–8
9 - 12
Age Groups
13 - 16
STEM-AND-LEAF PLOT
The following test scores are used to construct a stem-and-leaf plot:
82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80 81, 81, 82, 82
First determine how the stems will be defined. In this case, the stem will represent the tens
column in the scores, the leaf will be represented by the ones column.
When the information is presented, it will be in two parts, the stem and leaf. For instance,
5 | 7 4 would be read as follows: The stem represents fifty, and the leaf has two scores, 7 and 4.
Reading that information then gives 57 and a 54.
Student’s Test Scores
Since the lowest score is in the 70’s and the highest is in the
90’s, the stem will consist of 7, 8, and 9. Usually, the
smaller stems are placed on top, but they can be arranged
from largest to smallest. Another decision to be made is
whether or not to put the scores in order in the leaf portion.
Stems
7
8
9
Leaves
0 2 5 6 6
2 3 4 8 0 1 1 2 2
7
Key: 8|1 = 81
If the stem-and-leaf plot were to be rotated 90 degrees (a quarter turn), the graph would
resemble a bar graph (which leads to the next type of graph to discuss).
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 8 of 64
BAR GRAPH
Using the same information above, let’s construct a bar graph to show how many A, B, C, D,
and F’s there are. A’s are defined as 90 and above, B’s from 80 to 89, C’s 70 to 79, etc.
Sometimes, placing bars side-by-side in pairs makes it easier to display the kinds of
comparisons you want to show. This is called a double-bar graph.
Let’s compare the grades earned in the class discussed so far to another classroom of students.
Both the old data and the new data is summed up in the table below:
First classroom
Second classroom
Number of A’s
1
3
Number of B’s
9
7
Number of C’s
5
5
The double bar graph
could look like:
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 9 of 64
POINTS ON THE COORDINATE PLANE
(which helps with creating grids for line graphs and scatter plots)
A coordinate plane is formed by the intersection of a horizontal number line (called the x-axis) and a
vertical number line (called the y-axis). It consists of infinitely many points called ordered pairs. The
intersection of the two axes is called the origin. The coordinate plane is divided into six parts: the xaxis, the y-axis, Quadrant I, Quadrant II, Quadrant III, and Quadrant IV.
Quadrant II
y-axis
y
Quadrant I
6
5
4
3
2
1
-6 -5 -4 -3 -2 -1
-1
-2
-3
-4
-5
-6
x-axis
1 2 3 4 5 6 x
Quadrant III
Quadrant IV
The axes divide the plane into four regions called quadrants which are numbered I, II, III, and
IV in the counterclockwise direction.
An ordered pair is a pair of numbers that can be used to locate a point on a coordinate plane.
The two numbers that form the ordered pair are called the coordinates.
The origin is the intersection of the x- and y-axes. It is the ordered pair (0, 0).
The ordered pairs are listed in alphabetical order; (x, y) – (input, output), (horizontal axis,
vertical axis), (abscissa, ordinate), (domain, range). The alphabetical order will help you
remember these terms.
y
To find the coordinates of point A in Quadrant I, start from
the origin and move 2 units to the right, and up 3 units.
Point A in Quadrant I has coordinates (2, 3).
To find the coordinates of point B in Quadrant II, start
from the origin and move 3 units to the left, and up 4 units.
Point B in Quadrant II has coordinates (−3, 4).
B
6
5
4
3
2
1
-6 -5 -4 -3 -2 -1
-1
-2
C
-3
-4
-5
-6
A
1 2 3 4 5 6x
D
To find the coordinates of the point C in Quadrant III, start
from the origin and move 4 units to the left, and down 2
units. Point C in Quadrant III has coordinates (−4, −2).
To find the coordinates of the point D in Quadrant IV, start from the origin and move 2 units to
the right, and down 5 units. Point D in Quadrant IV has coordinates (2, −5).
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 10 of 64
y
5
Notice that there are two other points on the above graph,
one point on the x-axis and the other on the y-axis. For the
point on the x-axis, you move 3 units to the right and do not
move up or down. This point has coordinates of (3, 0). For
the point on the y-axis, you do not move left or right, but
you do move up 2 units on the y-axis. This point has
coordinates of (0, 2). Points on the x-axis will have
coordinates of (x, 0) and points on the y-axis will have
coordinates of (0, y).
-5
5
x
-5
To plot points in the coordinate plane, you must follow the
direction given. The first number in an ordered pair tells you to move left or right along the xaxis. The second number in the ordered pair tells you to move up or down along the y-axis.
Example: To plot the point (4, 2), we start at the origin and move four units to the right and then
move two units up.
When you make a graph, the first thing you do is choose the axes. Next, you choose the scale—
the numbers running along a side of the graph. The difference between numbers from one grid
line to another is the interval. The interval will depend on the lowest and highest values in your
data. When you can, you should choose scales with the numbers starting at zero and increasing
by ones or other convenient equal intervals.
LINE GRAPHS
A line graph displays a set of data using line segments. This graph shows data that has changed
over time.
Example: The following data represents the monthly class average over a 12 month period.
82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81
The data can be arranged on a line graph:
Class Averages
100
95
90
85
80
75
70
65
60
55
J
F
M
A
M
J
J
A
S
O
N
D
The advantage of a line graph is that it shows changes in data over a period of time.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 11 of 64
Drawing Conclusions from Data
When looking at results, you must consider several questions like: Is the data display
misleading? What is the margin of error? Are the conclusions supported by the data?
Margin of error of a random sample defines an interval centered on the sample percent in
which the population percent is most likely to lie.
Example: A sample percent of 27% has a margin of error of ±4%. Find the interval in which
the population percent is most likely to lie.
27% − 4% =
23% and 27% + 4% =
31% . The interval is between 23% and 31%.
If p % of a sample gives a particular response and the sample is representative of the
population, then:
# of people in the population
p % ⋅ ( # of people in population ) =
giving the response
Example: A survey of 300 randomly selected cat owners finds that 120 cat owners prefer
Brand C cat food. How many owners in a town of 2000 cat owners prefer Brand C cat food?
The percent of cat owners in the sample that preferred Brand C is
120
= 40% ; 40% of 2000 = 800 cat owners
300
Example 1: Which of the following graphs most accurately depicts the hourly wages earned
with respect to time worked?
B
C
Hours Worked
Hours Worked
D
Earnings
Earnings
Earnings
Earnings
A
Hours Worked
Hours Worked
Discussion: Students should recognize that if one is paid by the hour, then only Graph C could
illustrate it. A person would not earn money at no (zero) hours as shown in graph A. Looking
at the other graphs, you could rule out B as a person would not earn the same amount for
different amounts of hours worked. Graph D shows a person earning less money as the person
works more hours.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 12 of 64
Example 2: The graphs below show the numbers of baskets made by Player A and Player B
during 5 basketball practices. Each player takes 100 practice shots during each practice.
According to the graphs, who was more successful at making baskets?
Player B
Player A
100
80
Number of Baskets
Number of Baskets
100
80
60
40
20
A. Player A did much
better.
B. Player B did much
better.
C. Their scores appear
to be about the
same.
D. More information is
needed.
80
78
60
76
40
74
20
72
0
0
0
1
2
3
4
0
5
1
Practice
2
3
4
5
Practice
Discussion: In this problem, there will be students who think Player B has scored the most
baskets because of the steepness of the linear segments. Some students will think Player A
scored the most baskets because his line segments looks consistently higher. They need to look
carefully at the vertical scaling and note that Player A started about 70-71 and ended at 80, the
same as Player B. (Answer is C.)
Example 3: Why is this graph misleading?
Season Attendance (in thousands)
# of spectators (thousands)
50
The heights of the balls are used to
represent the number of spectators.
However, the area of the balls
distorts the comparison.
40
30
The attendance for both sports
doubled in the 10 year period; the
size of the basketball makes it look
like that increase may have been a
lot more.
20
10
0
0
1990
2000
1990
2000
Year
Example 4: Why is this graph misleading?
The two bar graphs shown here picture the advertised gasoline mileage for four cars.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 13 of 64
The graph on the right distorts the differences in mileage because its left-hand scale does not
start with 0. In reading a graph, always check to see that numbered scales start with 0.
NVACS – 6.SP.B.4 - Display numerical data in plots on a number line, including dot plots,
histograms, and box plots.
Box-and-Whisker Plot
The purpose of a box-and-whisker plot is to organize numerical data into four groups of
approximately equal size. Let’s take a look at what might be a way of giving notes to your
students to help them learn the steps for creating a box-and-whisker plot. As you demonstrate
the example in the third column, students are working with you taking the notes. Column 1
allows for guided, group or independent practice once they have an idea how to make a boxand-whisker plot.
You try:
Making a Box & Whisker Plot
Ex:
Steps
5 10 7 9 8 6 11
10 12 8 14 16 16 11 13 11
15 8
1. Arrange data in increasing
order (minimum value &
maximum value are the
endpoints).
2. Find median of the entire list
(median value).
5 6 7 8 9 10 11
5 6 7 8 9 10 11
a. If there is a number in the
list that is the middle
term, circle it and draw a
line thru it. (median)
8 is the median
5 6 7 8 9 10 11
b. If there is not a number
that is in the middle, draw Does not apply to this problem
a line between the two
numbers. (Median is the
number halfway between
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 14 of 64
the two numbers.)
3. Look at the bottom half of the
numbers. Find the median of
the bottom half of numbers
(lower quartile, same as #2
above).
4. Look at the top half of the
numbers. Find the median of
the upper half of numbers
(upper quartile, same as #2
above).
5. Draw a number line that will
cover the range of data
(evenly spaced marks).
5 6 7
9 10 11
4
6
8
10
12
14
16
4
6
8
10
12
14
16
6. Slightly above the number
line place dots at the
following points: minimum,
lower quartile, median, upper
quartile, and maximum.
7. Draw a box with side borders
being the lower and upper
quartiles.
Draw two lines, one from
each side of the box,
connecting the minimum
point on one side and the
maximum point on the other
side.
Draw a vertical line at the
median point from the top to
the bottom of the box.
lower quartile
median
6
8
10
12
upper quartile
maximum
minimum
4
4
6
8
10
12
14
16
The data in the box represents the interquartile range – IQR, the average, the middle 50%.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 15 of 64
14
16
The whisker on the left represents the bottom quartile, the bottom 25%; the whisker on the right
represents the top 25%.
The difference between the upper and lower quartiles is called the “interquartile
range” (IQR).
A statistic useful for identifying extremely large or small values of data is called an
“outlier”. An outlier is commonly defined as any value of the data that lies more than
1.5 IQR units below the lower quartile or more than 1.5 IQR units above the upper
quartile.
As we get more in depth with these plots we will find that an extreme value is
commonly defined as any value of the data that lies more than 6 IQR units below the
lower quartile or more than 6 IQR units above the upper quartile.
In our example the lower quartile was at 6, the upper at 10.
Using that the IQR =
. Multiplying that by 1.5, we have
.
Therefore, any score below 6 − 6 = 0 is an outlier, as is any score above
There are no points below 0, so we are OK on the left. There are no points greater than
16, so we are OK on the right.
This would be an ideal place to use technology. Next are the instructions for
drawing a box-and-whisker plot on the TI-84. The examples will address
outliers.
Entering Data and Drawing a Box-and-Whisker plot on the TI-84
1. STAT
2. EDIT
3. Enter the numbers in List 1 (L1) or List 2 (L2)
4. Return to the home screen
5. STAT PLOT
modified
regular
6. Turn Stat Plot 1 on and select the type of boxplot (modified or
regular)
7. ZOOM
8. ZoomStat (9)
Modified
9. GRAPH
Regular
The TI-84 graphing calculator may indicate whether a box-and-whisker plot includes outliers.
One setting on the graphing calculator gives the regular box-and-whisker plot which uses all
numbers, so the furthest outliers are shown as being the endpoints of the whiskers.
Another calculator setting (modified) gives the box-and-whisker plot with the outliers specially
marked (in this case, with a simulation of an open dot), and the whiskers going only as far as the
highest and lowest values that aren't outliers.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 16 of 64
Find the outliers and extreme values, if any, for the following data set, and draw the boxand-whisker plot. Mark any outliers with an asterisk and any extreme values with an
open dot.
20, 21, 21, 23, 23, 24, 25, 25, 26, 27, 29, 33, 40
To find the outliers and extreme values, I first have to find the IQR. Since there
are thirteen values in the list, the median is the seventh value, so Q2 = 25. The
first half of the list is 20, 21, 21, 23, 23, 24, so Q1 = 22; the second half is 25, 26,
27, 29, 33, 40 so Q3 = 28. Then IQR = 28 – 22 = 6.
The outliers will be any values below 22 – 1.5×6 = 22 – 9 = 13 or above 28 + 1.5×6 = 28 + 9 =
37. The extreme values will be those below 22 – 3×6 = 22 – 18 = 4 or above 28 + 3×6 = 28 + 18
= 46
Another example: L2 =21, 23, 24, 25, 29, 33, 49
So I have an outlier at 49 but no extreme values, so I won't have a top whisker because Q3 is
also the highest non-outlier, and my plot looks like this:
Venn Diagrams
Venn diagrams are used to show relationships among sets of objects.
Let’s look at a Venn Diagram made up of three sets in which the regions are labeled. We’ll
describe each region.
U
1
3
2
B
A
5
4
6
7
C
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
8
Page 17 of 64
•
•
•
•
Region 5 is in all three circles. So any elements in region 5 would belong to all three
sets.
What about Region 2? Those are the elements in A and B, but not C.
How might you describe Region 6? Those are elements in B and C, but not in A.
Try Region 4. The elements in A and C, but not B.
Let’s look at some more regions. Region 1 describes the elements in A only.
What about region 3? Those elements are only in B. Region 7 then would be the elements in C
only. Region 8 would describe elements that are not members of any of the sets, but belong to
the universal set.
It’s important that you become familiar with how each of those regions might be described.
Being able to describe those regions would allow you to solve some problems.
Let’s try a problem.
Example: A survey was taken of 650 university students. It was reported that 240 were taking
math, 290 were taking biology, and 270 were enrolled in chemistry. Of those students, 80 were
taking biology and math, 70 were taking math and chemistry, 60 were taking biology and
chemistry, and 50 were taking all three classes. How many students took math only?
A Venn Diagram has been drawn to help show the problem.
140
30
200
B
M
50
20
10
190
C
How many students took only math?
Other questions you can answer:
How many students took math and biology, but not chemistry?
How many students took math and chemistry, but not biology?
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 18 of 64
How many students took biology and chemistry, but not math?
How many students took exactly two of the courses?
Now that students make a graphical representation of the data and discuss the general shape,
center and spread of the data, it is time to move them to be able to compute measure of center
and variability.
3 Measures of Central Tendency:
1. Mean
2. Median
3. Mode
MEAN
Students need to conceptually understand the concept of mean. Be sure to give them time to
experience that mean is an “equal distribution” or everyone getting a “fair share”.
1. One way to do this is to bring a large bag of treats (eg. M&M’s, Jolly Ranchers, etc.)
and distribute them to students at random in varying amounts. The students who get
nothing will usually begin to comment or complain that they didn’t get any, another will
say they didn’t get their share, others will say they got less than their neighbor . This
sets the stage for a great discussion on “equal distribution” or “fair share” where
everyone gets the same amount.
2. Another way to do this is give a problem and have students solve it using unifix cubes,
blocks, chips, etc.
Example: The table shows the number of students absent from 4th period last week.
What was the mean number of students absent per day?
Day
# of students
absent
Monday
2
Tuesday
5
Wednesday 2
Thursday
1
Friday
5
1. Have students model the data using their manipulatives.
2
Math 6, Unit 09
5
2
1
5
Statistics and Probability
2014-15-NVACS
Page 19 of 64
Have students “even out” or “equally distribute” the counters until each column has the same
number of counters. (remind students they have 5 columns, Monday – Friday, and must have 5
columns in the end.)
3
3
3
3
3
Example: With large groups or whole class, have each student create a stack (of unifix
cubes) that represents the number of letters in their last name. (eg. Long =
) Have
students display their stacks together. Have students compute the mean of
the letters in their classes last names using only the unifix cubes and
redistributing them or evening them out.
Of course you must be ready to discuss remainders and what they represent. Let’s say
the last names you were working with looked like this:
Once distributed it might look like this.
Since each of the columns
are 6 blocks high and a few
extra, the mean is 6 and
something.
What do we do with the 3
extra blocks? Since we
have 3 extra blocks and 5
columns our mean is 6 3/5.
Procedurally, the mean is the one that is probably most familiar; it’s the one often used in
school for grades. To find the mean, you simply add all the scores and divide by the number of
scores. For example, if a student scores 70, 80, and 90 on three tests, the mean is calculated as
follows: add the three scores, 70 + 80 + 90 = 240, then divide the sum by the number of data
pieces→ 240/3 = 80. The mean is 80. The average is 80.
Example: Find the mean of 72, 65, 93, 85, and 55.
First add those 5 scores together; 72 + 65 + 93 + 85 + 55 = 370
Second, divide that total by the number of scores, 370 ÷ 5 = 74. The mean is 74.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 20 of 64
Example: Five kids just finished bowling one game. The average score of the five kids is 82.
What is the total of all 5 scores?
Having a mean of 82 does not mean each kid scored an 82—it means if the
scores were distributed equally, they would each have 82. Their total score is
82 ⋅ 5 =
410.
When thinking of the mean, you need to think of the
TOTAL units being distributed EQUALLY.
Problem: If the mean is 6, find the missing value for the set of numbers 3, 4, 5,
, 9.
Method 1
Looking at the data given we know the mean is 6 and there are 5 data in the set.
So the total sum of the data must be 6 x 5 = 30.
Adding the data that we have 3 + 4 +5 + 9 = 21.
=9
The missing value must be 30 – 21 = 9.
Method 2
Knowing the mean is 6, examine each piece of data given with reference to the mean being 6.
3
4
5
9
-6
-6
-6
-6
-3
-2
-1
+3
Summing the differences you get -3. Since you are 3 short (negative)
add 3 to the mean, 6 + 3 = 9.
=9
Method 3
3 + 4 + 5 + x + 9 = 30
6
x + 21 = 30
x=9
Math 6, Unit 09
OR
3 + 4 + 5 + x +9 = 6 + 6 + 6 + 6 + 6
x+ 21 = 30
x=9
Statistics and Probability
2014-15-NVACS
Page 21 of 64
MEDIAN
The median, often used in finance, is the middle score when the data is listed in either ascending
or descending order. If there is no middle score, take the two middle scores, add them and
divide by 2. It’s also referred to as the average of the two middle scores.
Example:
Find the median of 72, 65, 93, 85, and 55.
Step 1:
Rewrite the data in ascending order: 55, 65, 72, 85 and 93.
Step 2:
The middle score is 72; therefore, the median is 72.
Notice in the first two examples the data is the same, but the mean and median are not the same.
Example:
Find the median of 72, 65, 93, 85, 74, and 55.
Step 1:
Step 2:
Step 3:
Rewrite the data in ascending order: 55, 65, 72, 74, 85, 93
Notice there is no middle score.
Add the two middle scores together and divide by 2.
72 + 74 = 146,  146 ÷ 2 = 73. The median is 73.
When thinking about the median, you need to think of the
MIDDLE score if they are listed in ORDER.
MODE
MODE
The mode is the value point that appears most frequently.
Example:
The following are test scores: 55, 64, 64, 76, 78, 81, 81, 81, and 92.
Find the mode.
The score that appears most often is 81.
When thinking about the mode, you need to think of the
score that APPEARS MOST FREQUENTLY.
Note: a distribution may have no mode, one mode or more than one mode.
Example 1: Find the mode of 55, 64, 64, 76, 78, 81, 81, 81, and 92.
What scores appears most often? The mode is 81.
Example 2: Find the mode of 8, 9, 11, 14, 15, and 17.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 22 of 64
What scores appears most often? None of these, so there is no mode.
Example 3: Find the mode of 17, 15, 15, 14, 14, 11.
What scores appears most often? Both 15 and 14 appear twice, so the
modes are 15 and 14.
An outlier – an extreme value that is much smaller than or much larger than the rest of the data
in a data set – can greatly affect the mean. It is important to note “which” measure of central
tendency is “most” useful to use.
Measure Most useful when
mean
the data are spread fairly symmetrically without outliers
median the data set has an outlier
the data involve a subject in which many data points of one value
mode
are important, such as election results
These three measures of central tendency, most often referred to as averages, describe a set of
data using a single number. By condensing the information like that, the whole picture may not
be seen.
The following example shows how using an average or mean to describe a performance may be
misleading. Let’s say Abe, Ben and Carl each bowl 3 games. Three games later, they all found
they had a mean of 80. Here are the scores for each person:
Abe’s scores: 80, 80, 80
Ben’s scores: 70, 80, 90
Carl’s scores: 65, 75, 100
In this case, the mean may not be a good indicator or each person’s performance. Looking at
Carl’s scores, it appears he’s a little erratic. It might be difficult to predict what he might score
on the next game using the average. Abe, on the other hand, looks pretty stable as he’ll
probably score an 80 on the next game.
Abe and Ben both have the same average—this example shows that one mean is a pretty good
descriptor, which would allow you to predict more comfortably what might happen next. In
other words, the mean is doing a pretty good job of describing what’s happening.
Carl’s mean is not as good of a descriptor as Abe’s. Although his average is 80 (like Abe’s),
the mean does not do a good job of describing what is happening. Abe’s mean better describes
what is occurring than Carl’s mean. But, if the scores were not shown, it would not be evident
how consistent Abe is and how erratic Carl is because their averages are both 80.
Sometimes, more information is needed. One way to do this is to look at all the scores and try
to determine consistency. In math, rather than looking at the whole set of data, only the high
and low scores might be examined. It might be someone just had one super high or low score
that really affected the mean. In other words, determine the spread of the scores. In statistics,
that’s referred to as variability. There are three ways to measure this spread or variability.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 23 of 64
3 Measures of Spread/Variability
1. Range
2. Interquartile Range (IQR)
3. Mean Absolute Deviation (MAD)
RANGE
The range is just the difference between the top score and the bottom score. The larger the
range, the less likely the mean can be depended upon as a good descriptor or predictor.
In the last example, the range of Abe’s scores was zero. The range of Carl’s scores was 35 and
Ben’s range was 20.
INTERQUARTILE RANGE (IQR)
Students can describe measures of variability in a data set by finding the interquartile range and
the mean absolute deviation (MAD).
The interquartile range is found by subtracting the lower quartile from the upper quartile.
Example:
Student’s Heights (in inches)
60
65
58
61
54
62
56
59
63
56
61
58
When the student data is ordered it appears like this:
54
56
56
58
58
59
60
61
61
62
63
65
The median of the data is 59.5 (59 + 60)/2. The significance of this number is that half the data
is less than 59.5 and half the data is greater than 59.5.
The Lower Quartile is the median of the lower half of the data. In this example, the lower half
of the data is 54, 56, 56, 58, 58, and 59. The median of these six data is 57 (56 + 58/2).
The Upper Quartile is the median of the upper half of the data. In this example, the upper half
of the data is 60, 61, 61, 62, 63, and 65. The median of these six data is 61.5 (61 + 62/2).
The Interquartile Range is the difference between the upper quartile and the lower quartile. 61.5
– 57 = 4.5.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 24 of 64
The smaller the interquartile range, the less variability there is in the data. The greater the
interquartile range, the greater the variability in the data.
An interesting reason to study box-and-whisker plots is when we begin to compare two or more
plots and examine the data. Following are several examples to highlight this concept.
Example: Given below are boxplots displaying the annual temperatures for two cities, Seattle
and Boston. What information and generalizations can you see in the plots?
Seattle
41
45
52
61
66
Boston
29
20
30
51.5
36.5
40
50
74
66.5
60
70
80
Some information you should note during your discussion with students, could include, but not
be limited to:
• The median temperature for Seattle and Boston is essentially the same.
• Boston’s data is more spread out.
• Boston’s range is greater so its temperature fluctuates more than Seattle’s.
• Boston’s temperature range is wider than Seattle’s.
• Boston has greater high temperatures and lower low temperatures.
• More than 25% of the days in Boston the low temperatures are below Seattle’s lowest
temperature.
• 1/4 of the days with high temperatures in Boston are higher than Seattle’s highest
temperature.
Example: Look at the three boxplots below. Even without a scale, what can you
say about the 3 temperature plots in general?
A.
B.
C.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 25 of 64
General discussions might include:
Plot A
•
•
•
•
This city has a lot of cold days (in comparison to the other two).
¼ of the days the temperatures are very close (in the lower quartile).
The number of days of lowest temperatures appear to have a great range.
The number of days of temperatures in the upper quartile have a great
range.
Plot B
•
•
•
This city has a lot of hot days (in comparison to the other two).
The variability of cold temperatures is great.
The variability of high temperatures for ½ the year is small.
Plot C
•
•
This city has real extremes - outliers.
Without the outliers, this city has the smallest range of temperatures.
Looking back at the boxplots, if you were told one of the graphs represents temperatures in Las
Vegas and another represents temperatures in Hawaii which graphs might represent them? Let
students explain why they would choose one plot over another.
In several Take It To the Mat articles, the issue of box-and-whiskers is addressed. Below is an
excerpt from one of the articles. For the complete articles go to www.rpdp.net > Math > Take It
To The Mat> HS Edition>Data Analysis and Probability> October 2003, November 2003 or
December 2003.
The impression of Las Vegas residents is that October 2003 was unusually warm. Half of the
days had high temperatures of 89º F or more and fully three-fourths of days were at or above
84º. The coolest high temperature was 63º, but after looking at the raw data that seems like an
anomaly when compared to the rest.
Was it really that warm in October 2003? How did it compare with the year before?
One of the powers of boxplots is to answer questions about comparing distributions.
In this case, we want to compare the highs in October 2003 with those from October
2002. The five-number summary for October 2002 is {59, 74, 80, 84, 92}.
Parallel boxplots for both
Octobers 2002 and 2003 are
shown at right.
2002
2003
50
x
70
60
80
90
100
110
High Temperature (deg F)
The boxplots clearly show that October 2003 was warmer, on the whole, than October 2002.
The median high temperature in October 2003 is 9 degrees warmer than in October 2002. The
coolest three-fourths of high temperatures in 2002 were below the first quartile in 2003. The
middle halves of each data set don’t even overlap!
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 26 of 64
We know that October 2003 was much warmer than October 2002, but how does it compare
with what is considered “normal?” (Normal is the average daily high temperature since 1937.)
Norm
Consider the parallel boxplots
at right and draw your own
conclusions.
2002
2003
x
50
70
60
80
90
100
110
High Temperature (deg F)
Useful suggestions for ways to connect these vocabulary words and concepts with your students
may include:
Box and Whiskers Song
By: Karl Spendlove
Sung to the tune of “Oh My Darling, Clementine”
Put in order
Find the median
Find the median of the top
Find the median of the bottom
Then you draw the whisker plot
Chorus:
Box and whiskers
Box and whiskers
Put the data to the test
Boxes show the middle 50
And the whiskers show the rest.
MEAN ABSOLUTE DEVIATION (MAD)
Students should additionally be introduced to mean absolute deviation (MAD) as a way to
gauge variability. MAD is the average of how far away each piece of data is from the mean.
Simply put, the greater the relative MAD, the more variability in the data set. The smaller the
MAD, the less the variability of the data set as a whole.
To compute mean absolute deviation:
Step 1. Find the mean of the data.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 27 of 64
Step 2. For each piece of data, find the distance that data is from the mean and take it’s absolute
value.
Step 3. Find the average of the distances from the mean.
Example:
45
45
45
65
70
55
70
Period 1 Test Scores
55
55
60
70
70
70
75
60
60
60
65
65
75
75
80
85
100
Step 1. Mean (1575/24 = 65.625 ≈ 66%)
Step 2. Find the distance each score is from the mean and take it’s absolute value.
45
45
45
55
55
55
60
60
60
60
65
65
absolute
21
distance from mean
21
21
11
11
11
6
6
6
6
1
1
65
70
70
70
70
70
75
75
75
80
85
100
4
4
4
4
4
9
9
9
14
19
34
score
score
absolute
distance from mean 1
Step 3. Find the average of the distances: 234 ÷ 24 =
9.875
The mean absolute deviation is 9.875
Example:
score 45
70
Period 2 Test Scores
70
75
75
80
80
80
80
85
85
85
absolute
distance from mean 20
85
score
15
90
15
90
10
90
10
95
5
95
5
95
5
95
5
100
0
100
0
100
0
100
absolute
distance from mean
5
5
5
10
10
10
10
15
15
15
15
0
Mean (2045/24 = 85.21= 85%)
225/24 = 9.375 Mean Absolute Deviation
Period 1 has a mean absolute deviation of 9.875; period 2 has a mean absolute deviation of
9.375 (similar, but slightly lower). What does this tell us about the data sets? First of all it tells
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 28 of 64
us that according to this measure of variance, these two data sets are relatively similar. Overall,
the data in each set is clustered reasonably close together. Outliers, students should understand,
can have a great effect on the mean absolute deviation.
NVACS - 6.SP.A.3 - Recognize that a measure of center for a numerical data set summarizes
all of its values with a single number, while a measure of variation describes how its values
vary with a single number.
In order to master standard 6.SP.A.3, students will need an understanding of the difference
between measures of center (mean, median, and mode) versus a measure of variability (range,
IQR, MAD). These measures are typically taught as a unit (often in one day) however, it would
be advantageous, in terms of student understanding, to separate them from the beginning of the
lesson.
Measures of center describe the data with a single numerical value; however, a measure of
variability, describes how the data differs (or varies) from other data values in the set.
Simple examples can help your students to see the difference between measures of center and
measures of variability.
Example 1: The following data set represents the ages of people who attended the 70th birthday
party of your grandmother at the senior center:
70, 75, 80, 72, 82, 81, 73, 78, 82
Measures of Center
Measure of Variability
Mean= 693/9 = 77
Range = 82 – 70 = 12
Median= 78
Mode= 82
The measures of center reasonably describe the data. If someone was to ask you, “How old
were the people who attended your grandmother’s birthday party?”, any one of the measures of
central tendency would be a “reasonable” representation of the data although in this case the
mean and median are better than the mode. However, the range does not reasonably describe
the data in one number; were the people who attended grandmother’s birthday party around 12
years old? No, all the range tells us here is that the people who attended are elderly.
Other data sets where the range is significantly different from the measures of center provide
students with an understanding that sets “measures of center” apart from “measures of
variability”.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 29 of 64
Example 2: Provide students with a dot plot of data.
Student Scores on Math
Constructed Response
X
X
X
X
X X
X X
X X
X X X
X X X
X X X X
0 1 2 3
Ask students to consider the data shown in the dot plot of constructed response scores among
students.
The following questions are appropriate discussion questions:
•
•
•
•
•
How many students are represented in the data set for students constructed response
scores? How do you know the sample size?
What are the measures of center (mean, median, and mode)? What do these values
mean? How do they compare with one another?
What is the range (measure of variability) of the data set? What does its value mean?
If you have to choose one number to DESCRIBE the data set, which measure would you
use? Defend your choice.
In this example, why do you think the measures of center and the measure of variability
are so similar? Explain your reasoning.
Student Reflection: Identify the measures of center. Identify a measure of variability. In what
ways are they alike? Different? Which measure of center best describes a set of data?
NVACS – 6.SP.B.4 - Display numerical data in plots on a number line, including dot plots,
histograms, and box plots.
In order for students to display numerical data they need to be equipped to make decisions and
perform mathematical calculations. Decisions need to be made in regard to which graphical
display is MOST appropriate for the data. Basic mathematical calculations often need to be
made to choose a graphical display and to accurately graph the given data. Students are
expected to choose the most appropriate display for a given set of data based upon their
understanding of the data and the various ways to display it. Additionally, students are
expected to read and interpret graphs from other sources.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 30 of 64
Type of Display
Dot Plot
Example A
Histogram
Example B
Box Plots
Example C
Strengths of the Display
Simple; easy to create; highlights the
distribution of the data including
clusters, gaps, and outliers
Allows data to be manipulated by
varying the intervals used in the
histogram; gives an overall picture of
the data by intervals without
highlighting specific pieces of data
within the set
Organizes the data into four
manageable quarters; allows students
to view the spread of the data and it
clusters in a moderate or larger sets of
data
Type of Data Set Best Suited for
the Display
Small to moderate sets of data; data
with a small to moderate range;
repeated numbers
Appropriate for displaying large
sets of data or data sets with a large
range
Moderate to large sets of data;
appropriate for comparing two or
more sets of data in terms of spread
and skewness
Example A: Eighteen students received a score on their practice constructed response for the
CRT. Students were scored on a rubric that ranged from 0 to 3 points. The following group of
data was the result: 0, 1, 2, 2, 2, 1, 3, 3, 0, 3, 2, 2, 1, 2, 3, 2, 0, 2. The data display follows.
Constructed ResponseScores
for CRT Practice
X
X
X
X
X X
X X X X
X X X X
X X X X
0 1 2 3
Once the data set has been displayed, ask students to make some observations from the data
(describe it) in terms of: shape, sample size, distribution, spread, clusters, etc. Model writing
some observations from the data display, then allow student groups to practice writing their own
sentences that express observations about the data.
Tip: Teachers may want to provide students with a list of both “key words” to help students
construct observation sentences. “Key words” might include: shape, distribution, variance,
spread, cluster, center, etc.
Example B: Sixth grade students collected data from 7th graders and 8th graders regarding the
number of video games owned by each group. A total of eighty students were surveyed; forty
7th graders and forty 8th graders. The data is below.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 31 of 64
Number of Video Games Owned by 7th Graders
7
6
8
12
9
4
10
1
12
31
16
0
2
32
11
5
0
2
3
8
0
7
2
2
40
15
0
38
3
2
33
20
9
36
31
3
6
6
18
9
1
14
2
8
48
23
43
46
Number of Video Games Owned by 8th Graders
4
46
29
0
45
32
50
11
30
18
54
10
32
11
21
27
16
17
12
34
19
31
52
28
12
6
7
44
5
16
0
19
Frequency Table for 7th Graders
Interval
0-9
10 - 19
20 - 29
30 - 39
40 - 49
50 - 59
Math 6, Unit 09
Tally
Frequency
25
7
1
6
1
0
Statistics and Probability
2014-15-NVACS
Page 32 of 64
Frequency Table for 8th Graders
Interval
0-9
10 - 19
20 - 29
30 - 39
40 - 49
50 - 59
Tally
Frequency
9
12
5
5
6
3
Video Games Owned by 8th Graders
30
# of students
25
20
15
10
5
0
0-9
1 0-1 9
20 -29
30 - 39
40 - 49
50 -59
# of video games
By displaying two sets of data in the same way using the same intervals, students have the
opportunity to compare and contrast the data sets using the histograms. Not only can students
make generalizations/observations about each set of data, discussion can be initiated regarding
the manipulation of the intervals in order to achieve particular conclusions. For example, would
increasing the range of the interval further support the conclusion that 8th graders own more
video games than 7th graders? Is there a way to display the data where the number of video
games owned by 8th and seventh graders could appear more equitable?
Tip: Teachers should not only questions students, but give students tools to question one
another regarding data and observations of data. By modeling questioning continually and
consistently, students will begin to feel more comfortable asking questions of each other.
Teachers must provide students with the collaborative opportunity to do so. An easy way to
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 33 of 64
begin this is with a “script” and an available bank of “key words”. Gradually, students won’t
need this to question each other meaningfully and appropriately.
Example C: A physical representation of each piece of data in the set is helpful when students
are beginning to create box plots. One simple way to achieve this is by using sticky notes to
represent each individual piece of data in the set.
The data set below was achieved by the teacher asking each of 32 students for their age in
months. Each student wrote their age (in months) on a sticky note provided by the teacher. The
sticky notes were then placed in order by the students in the front of the classroom.
130
133
139
142
130
134
139
143
131
136
140
144
131
136
140
145
131
137
141
147
132
137
141
148
132
138
142
149
132
139
142
150
Five Number Summary
Minimum: 130
Quartile 1 (Q1): (132 + 133)/2 = 132.5
Median: 139
Quartile 3 (Q3): 142
Maximum: 150
Ages in Months of a Class of 32 Sixth Grade Students
130
135
140
# of months
145
150
Have students complete the following to demonstrate their understanding of the data:
1.
2.
3.
4.
5.
Each 1/4th of the data is equal to ______ students. This is because _____ divided by 4
is ______.
The youngest 1/4th of students is between _____ and _____ months old.
The second youngest one quarter of the students are between ____ and ____ months old.
The oldest one quarter of students is between _____ and ____ months old.
The _____ quarter has students that are between 139 and 142 months old.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 34 of 64
6. The most “spread out” 1/4th of the data is the ____ quartile. I know this because
__________________________________________________________________.
7. The data is most “concentrated” in the _____quartile. I know this because
___________________________________________________________________.
8. The median class age in months is _____ months.
9. One half of the class is from ______ to ______ months old (remember this can be
answered with three different, yet correct, answers).
Tip: Beginning with “closed” responses allows teachers to model for students how to
correctly draw observations and conclusions from the data as well as modeling the correct
interpretation of the display.
Student Reflection: Identify one way to display data. Describe the strengths of your data
display and what kind of data it is best suited for. Describe the limitations of your data display.
Share your thoughts with a teammate.
NVACS – 6.SP.B.5a - Summarize numerical data sets in relation to their context by reporting
the number of observations.
NVACS- 6.SP.B.5b – Summarize numerical data sets in relation to their context by describing
the nature of the attribute under investigation, including how it was measured and its units of
measurement.
Students should be able to ascertain the number of pieces of data in a set by viewing a raw
representation of the data, by viewing the data organized into a table, or by viewing the data in
certain displays. Students should be able to express what the numerical data represents and
what unit of measure the numerical data expresses.
For example, when presented with the following data, students should be able to recognize that
there are 20 chickens in the coup.
Weights of Prize Winning Chickens at the County Fair (in pounds)
6.5
5.9
7.8
7.5
8.1
7.3
6.9
8.1
6.3
8.2
7.1
7.2
7.2
8.4
8.1
7.8
6.4
6.9
6.6
7.7
Each piece of data recognizes the existence of one chicken whose weight was measured.
Students should also be able to see and understand that the chicken’s weight was measured in
pounds.
Example 1: History Test Scores
Math 6, Unit 09
73, 45, 68, 90, 99, 67, 89, 61, 77, 59, 97, 83, 96
Statistics and Probability
2014-15-NVACS
Page 35 of 64
1. How many students took the history test? Tell how you know.
2. In what unit of measure is their score given? Explain your reasoning.
Example 2:
Students’ Heights (inches)
60
56
58
60
58
59
54
60
61
57
54
62
51
64
55
49
60
55
65
63
61
58
54
60
52
1. How many students had their height observed (reported)?
2. What unit of measure was used to record student heights?
3. What additional information could be added to the title to make this data more
meaningful?
4. Would another unit of measure have been appropriate? If so, which unit? Explain.
Example 3:
Number of Pets Owned by Mrs.
Wheeler’s 6th Grade Class
X
X
X
X
0
X
X
X
X
X
X
X
X
X
X
1
X
X
X
X
X X
X X X
X X X
X X
2 3 4 5 6 7 8
1. How many students were surveyed? How many students are in Ms. Wheeler’s class?
Were all of the students surveyed? How do you know?
2. What unit of measure is represented by each “x”?
3. Would you add anything to the display to make it more understandable and meaningful?
If so, what would you add?
Student Reflection: What are some ways to find the number of observations (pieces of data) in
a data set? How do you find the unit of measure represented in the data set? Explain.
NVACS - 6.SP.5c –Summarize numerical data sets in relation to their context, giving
quantitative measures of center (median and/or mean) and variability (interquartile range
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 36 of 64
and/or mean absolute deviation) as well as describing any overall pattern and any striking
deviations from the overall pattern with reference to the context in which the data were
gathered.
The measure of center that a student uses to describe or summarize a data set will depend both
on the shape of the data set and the context of the data collection. Although the mode is the
value that appears more frequently than the other values in the data set (and often the value
easiest for students to discern), it is the least frequently used as a measure of center. This is
because data sets may not have a mode, may have more than one mode, or the mode may not be
descriptive of the data set as a whole.
Example 1: Number of Times Grade Students Attended the Movies in the Last 30 Days
(20 students sampled)
3
2
2
1
1
8
3
7
0
5
4
6
12
4
12
12
8
6
5
7
In the above data set, the mode is 12; it appears 3 times and none of the other pieces of data
appear that often. This data set is an example of why the mode is often not the best measure of
center to describe the data set. The three students who answered “12” to the question all have
relatives who work in the local movie theater, therefore gaining them entry to the movies each
and every weekend. The mode, in this example, is not representative of the data set as a whole.
It is clearly substantially higher than most of the other responses.
The mean is a very common measure of center computed by adding all of the values in a data
set and dividing by the number of values in the data set. However, the mean can sometimes not
be the best measure of center used to describe or summarize a data set because a few data points
that are very high or low can greatly affect the mean.
Ages of People Who Attended Ailee’s 12th Birthday Party
Example 2:
10
9
18
12
12
16
13
13
10
40
14
9
11
11
12
12
11
13
13
80
13
8
84
12
In this data set, the mean is 19 (sum of 456/24 attendees). This mean is raised considerably by
the attendance of Ailee’s grandparents (80 and 84) and also the attendance of Ailee’s mother
(40). This data set is an example of a data set where the mean is strongly affected by one (or
more) pieces of data that are much higher (or lower) than the rest of the data in the set. Clearly,
the mean of 19 does not summarize or accurately describe the age of the attendees at Ailee’s
birthday party.
If the data were ordered to find the median:
8, 9, 9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 16, 18, 40, 80, 84
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 37 of 64
The median in this data set is 12. The median would be the best and most appropriate measure
of center to describe the data set. As mentioned above, the mean is artificially inflated and there
are two modes in this set of data: 12 and 13.
Students will also need to be directed to look at the distribution of the data set. In data sets that
are symmetrically distributed, the mean and median will be very close to the same value. In
data sets that are clearly skewed, the mean and median will be different with the median
generally providing a better overall description of the data set.
Example:
2 student groups (6th graders surveyed at different schools)
HOW MANY PEOPLE IN THEIR IMMEDIATE HOUSEHOLD?
Group 1
Group 2
(18 students/observations including themselves)
(18 students/observations including themselves)
X
X
X X X
X X X X
X X X X X X X X X
2 3 4 5 6 7 8 9 10
even distribution
X
X
X
X X X X X X
2 3 4 5 6 7
X
X
X
8
X
X
X X
X X
9 10
uneven distribution
Questions:
1. Which student group had the most even distribution? Explain
2. Which student group had the most uneven distribution? Tell how you know.
3. In data groups with an even distribution, either the median or the mean can be used to
best describe the data. Which measure of center would you use to best describe the data
set in group 1? Explain your reasoning.
4. In data sets with skewed distribution, the median is often the best way to describe the
center of the data. What is the median in the group 2 data set? Do you think it describes
the center of the data better than the mean? Explain why.
5. (Extension) If you could add eight more pieces of data to group 2 to “even out the
distribution”, where would you add them? Explain your reasoning.
Student Reflection: The mean, median, and mode are all measures of center. Describe how
you know which measure of center best describes a particular set of data? What do you
look for in the data set to tell you which measure of center is most appropriate?
NVACS - 6.SP.B.5c – Summarize numerical data sets in relation to their context, giving
quantitative measures of measures of center (median and/or mean) variability (interquartile
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 38 of 64
range and/or mean absolute deviation) as well as describing any overall pattern and any
striking deviations from the overall pattern with reference to the context in which the data were
gathered.
Students can describe measures of variability in a data set by finding the interquartile range and
the mean absolute deviation.
The interquartile range is found by subtracting the lower quartile from the upper quartile.
Example 1:
Student’s Heights (in inches)
60
65
58
61
54
62
56
59
63
56
61
58
When the student data is ordered it appears like this:
54
56
56
58
58
59
60
61
61
62
63
65
The median of the data is 59.5 (59 + 60)/2.
The Lower Quartile is 57 (56 + 58)/2. The Upper Quartile is 61.5 (61 + 62)/2.
The Interquartile Range is 4.5 (61.5 − 57).
The lower the interquartile range, the less variability there is in the data. The greater the
interquartile range, the greater the variability in the data.
Another measure of variability is the mean absolute deviation. The mean absolute deviation is
used in the sixth grade to develop a deeper understanding of variability. Students can be
brought to understand numerically the distance each piece of data is from the mean of the data
and from there find the mean of those deviations, or distances. This standard also gives teachers
the opportunity to introduce or reinforce the concept of absolute value as distance (neither a
negative or a positive directionally). Students can see that the larger the mean absolute
deviation is, the greater the variability in the sets of data.
Consider the two data sets below:
Group A: Heights of Sixth Graders (as seen above):
54
5
56
3
56
3
58
1
58
1
59
0
60
1
61
2
61
2
62
3
63
4
65
6
62
0
62
0
64
2
66
4
68
6
68
6
70
8
Group B: Heights of Eighth Graders:
54
8
55
7
Math 6, Unit 09
57
5
59
3
59
3
Statistics and Probability
2014-15-NVACS
Page 39 of 64
The mean for Group A is 59.4 (which would be rounded to 59).
The mean for Group B is 62.
To figure the absolute mean deviation, you must assess each data value’s distance from zero,
take it’s absolute value and then average them. Distances from the mean are displayed in red in
each table above.
Group A: 5 + 3 + 3 + 1 + 1+ 0 + 1 + 2 + 2+ 3 + 4 + 6 = 31/12 = 2.58
Group B: 8 + 7 + 5 + 3 + 3 + 0 + 0 + 2 + 4 + 6 + 6 + 8 = 52/12 = 4.33
Group B has a greater mean absolute value than Group A; therefore there is greater variability
in the data in group B.
If the interquartile range is used as a measure of variability in the example above, Group B
again shows greater variability. As expressed above, the interquartile range for Group A data is
4.5. The interquartile range for Group B data is 9 (67-58=9). By using examples in which both
measures of variability consistently demonstrate a greater or less level of variability, students
will begin to see the relationship and connection with measures of variability.
Student Reflection: How would you describe “variability” within a data set? What are some
ways to determine the variability of a data set? Explain.
NVACS - 6.SP.5c – Summarize numerical data sets in relation to their context by giving
quantitative measures of measures of center (median and/or mean) variability (interquartile
range and/or mean absolute deviation) as well as describing any overall pattern and any
striking deviations from the overall pattern with reference to the context in which the data were
gathered.
The overall pattern of data is related to the shape and spread (variance of the data).
Example 1:
Inning
# of
Runs
Inning
# of
Runs
Math 6, Unit 09
1
3
1
3
2
4
2
1
3
2
3
2
Baseball Team A
Game 1
4
5
1
0
Baseball Team A
Game 2
4
5
4
0
Statistics and Probability
2014-15-NVACS
6
0
7
0
8
1
9
0
6
0
7
1
8
0
9
0
Page 40 of 64
Inning
# of
Runs
1
0
Inning
# of
Runs
1
1
2
1
2
0
3
0
Baseball Team B
Game 1
4
5
0
3
6
2
7
1
8
7
9
4
3
0
Baseball Team B
Game 2
4
5
0
0
6
4
7
6
8
2
9
1
Students can view the scoring pattern of each baseball team and make generalizations about
their scoring based upon the data. Clearly, each team has a distinct pattern to their scoring.
Students can be asked to “describe” the pattern and make generalizations regarding the pattern.
These generalizations can also aid students in making predictions about what might happen in
future games for each team.
Group discussion questions:
1. Overall, how do the scoring patterns for Team A and Team B differ? Are the scoring
patterns similar in any way?
2. Which team would you predict to be most likely to score in the second inning? Explain
your reasoning.
3. Which team has the greater variance among the innings? Which team is slightly more
even in their scoring?
4. If you were to go and see one of the teams play, which team would be more exciting to
watch? Why?
5. What might account for the difference in scoring patterns for the teams? Explain your
reasoning.
Example 2:
Weights of 8th Grade Girls
(Rounded to the nearest 10 pounds)
(20 observations)
X
X
X
80
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
90 100 110 120 130 140 150 160 170 180
pounds
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 41 of 64
Weights of 8th Grade Boys
(Rounded to the nearest 10 pounds)
(20 observations)
X
80
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
X
90 100 110 120 130 140 150 160 170 180
pounds
When comparing the weights of 8th grade boys and 8th grade girls, it is important for students to
make the observation that although the simple range of the data is the same for boys and girls
(180 − 80= 100) the data demonstrates different patterns and appears very different visually.
The weights of the 8th grade girls observed, clusters between 80 and 120 pounds. The weights
of the 8th grade boys observed, clusters from 120 to 180 lbs. These clusters greatly affect the
shape of the data and its measures of center.
In the set of data representing the weights of 8th grade girls, the 3 girls that weighed 140 lbs. or
more stand out; in the data representing the weights of 8th grade boys, the lower weights of 80
and 100 lbs. stand out from the rest of the data. Because of the overall patterns and clustering in
the graphic representation of each set of data, students can make the correct statement that based
upon this data, 8th grade boys weigh more than 8th grade girls.
Student Reflection: Is it possible for data sets with the same range to have different shapes?
Provide an example of two data sets that have the same range yet appear different in shape.
NVACS - 6.SP.5d –Summarize numerical data sets in relation to their context, by relating the
choice of measures of center and variability to the shape of the data distribution and the context
in which the data were gathered.
As demonstrated in Example 2 above, simply finding the range (high − low) of a set of data
does not provide very detailed information about the variability (spread) of the data set. Data
sets with identical ranges can be very different and therefore cause us to draw opposite
conclusions regarding the data set. Consider the example below.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 42 of 64
45
65
45
70
70
90
55
70
70
90
60
75
Period 2 Math Test Scores
24 Observations (%)
75 80
80
80
80
95 95
95
95
100
75
90
60
80
85
100
65
85
85
100
65
100
85
100
# of students
45
85
45
70
Period 1 Math Test Scores
24 Observations (%)
55
55
60
60
70
70
75
75
Letter Grade
Period 2
12
# of students
10
8
6
4
2
0
A
B
C
D
F
Letter Grade
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 43 of 64
When the data is viewed in a histogram, it is quite clear that the test scores for period 1 were
much lower overall than the test scores for period 2 despite the fact that both classes had a range
of 55 (100- 45= 55). Examples like the one above support the necessity for additional ways to
determine the variability of a data set.
As noted previously, a box plot provides significantly more information about the overall spread
of a set of data (variability). By providing a physical representation of each quarter of the data,
students can get an idea of where the data is clustered and where there are large differences in
the spread of the data.
Median: 65
Period 1
LQ:
57.5
Math Test
UQ:
72.5
Outlier: 100
High: 85
Low:
45
45
50
55
60
65
70
75
Period 2
Math Test
80
85
90
95
Median: 85
LQ:
80
UQ:
95
High: 100
Low:
70
100
Outlier: 45
45
50
55
60
65
70
75
80
85
90
95
100
It is important for students to understand that even though the physical size of each quarter of
the data differs in a box plot, it still represents only 25% of the total data set.
Students should be questioned first about the basics of a box plot. Below are some sample
questions that would help students feel comfortable with the basic components of a box plot.
1. What is the median test score for period 1?
2. For period 2, the Lower Quartile is 80%, what does that mean? How do we determine a
Lower Quartile?
3. In period 2, why is there an unconnected dot/point on the left side of the graph? What
causes this?
4. Why is the “box” in a different position along the number line in period 2 when
compared to its position in period 1?
5. Can you tell from ONLY looking at the box plots how many people were observed?
Explain.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 44 of 64
6. Can you tell ONLY from the histogram how many people were observed? Explain.
Students should additionally be introduced to mean absolute deviation as a way to gauge
variability. Simply put, the greater the relative mean absolute deviation, the more variability
in the data set. The smaller the mean absolute deviation, the less the variability of the data
set as a whole.
Period 1
Mean (1575/24 = 65.625= 66%)
45
45
45
55
55
55
60
60
60
60
65
65
21
65
21
70
21
70
11
70
11
70
11
70
6
75
6
75
6
75
6
80
1
85
1
100
1
4
4
4
4
4
9
9
9
14
19
34
85
85
85
237/24 = 9.875 Mean Absolute Deviation
45
70
70
75
Period 2
Mean (2045/24 = 85.21= 85%)
75
80
80
80
80
40
85
15
90
15
90
10
90
10
95
5
95
5
95
5
95
5
100
0
100
0
100
0
100
0
5
5
5
10
10
10
10
15
15
15
15
225/24 = 9.375 Mean Absolute Deviation
Period 1 has an absolute mean deviation of 9.875; period 2 has an absolute mean deviation of
9.375 (similar, but slightly lower). What does this tell us about the data sets? First of all it tells
us that according to this measure of variance, these two data sets are relatively similar. Overall,
the data in each set is clustered reasonably close together. Outliers, students should understand,
can have a great effect on the mean absolute deviation. Consider the following set of data:
Number of Pairs of Shoes Owned by Eight 6th Grade Students
6
10
8
5
4
5
4
32
With the outlier of 32, the mean of the data is 9.25 (74/8).
The mean absolute deviation is 3 + 1 + 5 + 5 + 1 + 4 + 4 + 23 = 46/8 = 5.75
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 45 of 64
Without the outlier of 32, the mean of the data set is 6 (42/7).
The mean absolute deviation is 0 + 2 + 2 + 2 + 4 + 1 + 1 = 12/7 = 1.7
The outlier not only had a great affect on the mean of the data, but also greatly affected the
mean absolute deviation. Without the outlier of 32, the data is clearly more clustered around the
mean and the shape of the data as seen in a dot plot is greatly altered.
Student Reflection: Why is finding the range of a data set not enough information to truly
describe the variability of the data set? What are the limitations of only finding the range?
What can a box plot and the mean absolute deviation of a data set tell you that the range cannot?
The “shape” of a data set can be greatly affected by an outlier within the data set. Of course, the
mean can also be greatly affected by an outlier in the data set. Normally, however, the median
and the mode are not affected by the presence of an outlier. Consider the following data set:
Prices for a new bicycle
80
75
90
110
90
120
115
490
120
75
In this data set, the outlier of $490 affects the variability of the data set as well as the mean
(center) of the data set.
Students should be asked to calculate the mean WITH the $490 bicycle (1365/10 = 136.5) and
WITHOUT the $490 bicycle (875/9 = 97.2).
Additionally, the outlier of $490 affects the MAD as well. However, the inter quartile range is
40 (120 – 80) even without such a great outlier. Had 490 been replaced with a price of 120, the
inter quartile range would remain 40.
ItStudent
is important
for students
to issee
thatlikely
outliers
some by
measures
of center
and some
Reflection:
Which
more
to affect
be affected
an outlier,
the MAD
or the Inter
measures
of variability
more
others.
quartile range?
Explain
yourthan
reasoning.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 46 of 64
SBAC Example:
Standard 6.SP.4, 6.SP.5
DOK: 2
Difficulty: M
Item Type: TE
This will be a Technology Enhanced Question.
On Core Sample Questions for Standard 6.SP.1
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 47 of 64
On Core Sample Questions for Standard 6.SP.2
On Core Sample Questions for Standard 6.SP.B.4
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 48 of 64
On Core Sample Questions for Standard 6.SP.5a
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 49 of 64
On Core Sample Questions for Standard 6.SP.B.5b
On Core Sample Questions for Standard 6.SP.B.5c
On Core Sample Questions for Standard 6.SP.B.5d
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 50 of 64
SBAC Example: Standard 6.SP.B.5
Math 6, Unit 09
DOK: 3
Difficulty: M
Statistics and Probability
2014-15-NVACS
Item Type: ER
Page 51 of 64
More SBAC Examples
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 52 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 53 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 54 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 55 of 64
***Please note the answer shown
for this example is incorrect.
There should be no bar in the 0-3
interval and the bar for the interval
12-15 should be 2 units tall.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 56 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 57 of 64
***BEWARE***
Please note both answers B and C
are correct since the graphs are
identical.
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 58 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 59 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 60 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 61 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 62 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 63 of 64
Math 6, Unit 09
Statistics and Probability
2014-15-NVACS
Page 64 of 64