LESSON 9.2
Data Distributions and Outliers
Name    Class    Date
Essential Question: What statistics are most affected by outliers, and what shapes can data
distributions have?

Common Core Math Standards
The student is expected to:
S-ID.1 Represent data with plots on the real number line (dot plots, histograms,
and box plots). Also S-ID.2, S-ID.3, N-Q.1
Mathematical Practices
MP.2 Reasoning
Language Objective
Explain to a partner what an outlier is.

ENGAGE
Essential Question: What statistics are most affected by outliers, and what
shapes can data distributions have?
Outliers affect the mean more than the median, and they affect the standard
deviation more than the IQR. Data distributions can be described generally
as symmetric, skewed to the left, or skewed to the right.
PREVIEW: LESSON PERFORMANCE TASK
View the Engage section online. Discuss why, if you owned a business, you might
compare a competitor’s sales to your company’s sales, and how your findings
might lead you to change the way you run your business. Then preview the
Lesson Performance Task.
© Houghton Mifflin Harcourt Publishing Company. Image credit: ©Blend Images/Alamy

Explore
Using Dot Plots to Display Data
A dot plot is a data representation that uses a number line and Xs,
dots, or other symbols to show frequency. Dot plots are sometimes
called line plots.
Finance Twelve employees at a small company make the
following annual salaries (in thousands of dollars): 25, 30, 35, 35,
35, 40, 40, 40, 45, 45, 50, and 60.

Choose the number line with the most appropriate scale
for this problem. Explain your reasoning.
[Figure: three number lines. First: 0 to 100, labeled 0, 50, 100. Second: 20 to 70, labeled by 10s. Third: 20 to 95, labeled by 15s.]
The second number line has the most appropriate scale. The scale of the first number line
includes a larger range of numbers than necessary, so dots will be clustered in the middle.
The scale of the third number line does not have convenient tick marks for determining
where values between the labels belong.

Create and label a dot plot of the data. Put an X above the number line for each time that
value appears in the data set.
[Dot plot: Salary (thousands of dollars), number line from 20 to 70]
Reflect
1.
Discussion Recall that quantitative data can be expressed as a numerical measurement. Categorical,
qualitative data is expressed in categories, such as attributes or preferences. Is it appropriate to use a dot
plot for displaying quantitative data, qualitative data, or both? Explain.
A dot plot uses a number line, so it is only appropriate for displaying quantitative data.
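The dot plot for the twelve salaries can also be sketched in plain text. The following is a minimal illustration, not part of the lesson: the function name `dot_plot` and its parameters are hypothetical, and it assumes every data value falls exactly on a tick of the number line.

```python
from collections import Counter

def dot_plot(values, start, stop, step):
    """Build a text dot plot: one 'x' per occurrence above each tick.

    Assumption for this sketch: every value lies on a tick
    (start, start + step, ..., stop); other values are ignored.
    """
    counts = Counter(values)
    ticks = list(range(start, stop + 1, step))
    width = max(len(str(t)) for t in ticks) + 1
    rows = []
    # Draw from the tallest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(("x" if counts[t] >= level else " ").rjust(width) for t in ticks)
        rows.append(row.rstrip())
    rows.append("".join(str(t).rjust(width) for t in ticks))
    return "\n".join(rows)

# Salaries from the Explore, in thousands of dollars.
salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60]
print(dot_plot(salaries, 20, 70, 5))
```

A tick spacing of 5 is used here so that each salary sits on its own tick; labeling only every second tick, as the second number line does, is a presentation choice.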
Explain 1
The Effects of an Outlier in a Data Set
An outlier is a value in a data set that is much greater or much less than most of the other values in the data set.
Outliers are determined by using the first or third quartiles and the IQR.
How to Identify an Outlier
A data value x is an outlier if x < Q1 - 1.5(IQR) or if x > Q3 + 1.5(IQR).
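The rule above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a prescribed method: the function names `quartiles` and `outliers` are hypothetical, and the quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the middle value when the count is odd), which is the method the worked examples in this lesson follow.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # values before the median position
    upper = data[len(data) - half:]   # values after the median position
    return median(lower), median(upper)

def outliers(values):
    """Return the values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Salary list used in Example 1, in thousands of dollars.
salaries = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, 150]
print(quartiles(salaries))
print(outliers(salaries))
```

Other quartile conventions exist (spreadsheets and statistics libraries often interpolate), so results for small data sets may differ slightly from those of this sketch.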
Example 1

Create a dot plot for the data set using an appropriate scale for the
number line. Determine whether the extreme value is an outlier.
INTEGRATE TECHNOLOGY
To make it easier to create a dot plot for a large data
set, students can enter the data values into one column
of a spreadsheet, then use the spreadsheet’s data-sorting function to arrange them in increasing order.
Suppose that the list of salaries from the Explore is expanded to include the owner’s salary of
$150,000. Now the list of salaries is 25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, and 150.
To choose an appropriate scale, consider the minimum and maximum values, 25 and 150.
A number line from 20 to 160 will contain all the values. A scale of 5 will be convenient for
the data. Label tick marks by 20s.
Plot each data value to see the distribution.
QUESTIONING STRATEGIES
How can you use a dot plot to find the
interquartile range of a data set? First, find
the median by counting the same number of marks
from each end of the dot plot until the middle value
is reached. If there are an even number of marks,
find the mean of the two middle values. Then use
the same process to find the first quartile
(Q1, the middle value of the lower half) and the
third quartile (Q3, the middle value of the upper
half). Finally, subtract Q1 from Q3 to find the
interquartile range.
[Dot plot: Salary (thousands of dollars), number line from 20 to 160]
Find the quartiles and the IQR to determine whether 150 is an outlier.
150 > Q3 + 1.5(IQR)?
150 > 47.5 + 1.5(47.5 - 35)?
150 > 66.25 True
150 is an outlier.

Suppose that the salaries from Part A were adjusted so that the owner’s salary is $65,000.
Now the list of salaries is 25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60, and 65.
To choose an appropriate scale, consider the minimum and maximum data values, 25 and 65.
A number line from 20 to 70 will contain all the data values.
A scale of 5 will be convenient for the data. Label tick marks by 10s.
Plot each data value to see the distribution.
[Dot plot: Salary (thousands of dollars), number line from 20 to 70]

AVOID COMMON ERRORS
Students sometimes forget to take the square root of
the mean of the squared deviations when calculating
standard deviation. Review the steps for calculating
the standard deviation.
PROFESSIONAL DEVELOPMENT
Integrate Mathematical Practices
This lesson provides an opportunity to address Mathematical Practice MP.2,
which calls for students to “reason abstractly and quantitatively.” Students solve
real-world problems by creating dot plots for data sets. They analyze and describe
the shapes of the data distributions, recognizing how the shapes affect the
measures of center and spread, and they use both dot plots and statistical
measures to compare data sets. Thus, they first take a situation from its real-world
context to represent it symbolically, then they interpret the results in the
real-world context.
QUESTIONING STRATEGIES
How does an outlier affect the mean and
median of a data set? If a data set includes an
outlier, the mean can be increased or decreased
significantly. This can make the mean misleading as
a measure of center. When there are no outliers,
most data values cluster closer to the mean. The
median is much less affected by an outlier, because
a single outlier shifts the middle of the data set by
only a small amount, if at all.
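A quick check with the salary data from the Explore and Example 1 illustrates this answer. The sketch below uses Python's standard `statistics` module; the variable names `staff` and `with_owner` are hypothetical labels for the two salary lists.

```python
from statistics import mean, median

# Salaries in thousands of dollars: the twelve employees from the Explore,
# then the same list with the owner's salary of 150 added (Example 1).
staff = [25, 30, 35, 35, 35, 40, 40, 40, 45, 45, 50, 60]
with_owner = staff + [150]

print("without owner:", mean(staff), median(staff))
print("with owner:   ", round(mean(with_owner), 2), median(with_owner))
```

Adding the single value 150 moves the mean from 40 to about 48.46, while the median stays at 40.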
Find the quartiles and the IQR to determine whether 65 is an outlier.
65 > Q3 + 1.5(IQR)?
65 > 47.5 + 1.5(47.5 - 35)?
65 > 66.25   True / False
Therefore, 65 is / is not an outlier.
INTEGRATE MATHEMATICAL PRACTICES
Focus on Technology
MP.5 Review the steps for generating statistics using a graphing calculator. Students can
create a list by pressing STAT, then selecting 1:Edit.
A previously entered list can be cleared by
highlighting the name of the list, pressing CLEAR,
then pressing the down arrow.
After entering data in a list, students can find the
one-variable statistics by pressing STAT, selecting
CALC, and then selecting 1:1-Var Stats. For data in
lists other than L1, they must enter the list number
before pressing ENTER to generate the statistics.

Reflect
2.
Explain why the median was NOT affected by changing the max data value from 150 to 65.
The maximum value in the data set changed, but its ordered position did not, so the
middle value in the ordered list was not moved or changed.
Your Turn
3.
Sports Baseball pitchers on a major league team throw at the following speeds (in miles per hour):
72, 84, 89, 81, 93, 100, 90, 88, 80, 84, and 87.
Create a dot plot using an appropriate scale for the number line. Determine whether
the extreme value is an outlier.
[Dot plot: Pitching Speeds (mph), number line from 70 to 100]
72 < Q1 - 1.5(IQR)?
72 < 81 - 1.5(9)?
72 < 67.5 False
Therefore, 72 is not an outlier.

Explain 2
Comparing Data Sets
Numbers that characterize a data set, such as measures of center and spread, are called statistics. They are useful
when comparing large sets of data.
AVOID COMMON ERRORS
Students may expect their graphing calculators to
provide the value of the IQR. Remind them that they
must calculate the IQR by finding the difference
between the first and third quartiles.
Example 2

Calculate the mean, median, interquartile range (IQR), and standard
deviation for each data set, and then compare the data.
Sports The tables list the average ages of players on 15 teams randomly selected from
the 2010 teams in the National Football League (NFL) and Major League Baseball (MLB).
Describe how the average ages of NFL players compare to those of MLB players.
NFL Players’ Average Ages, by Team
25.8, 26.0, 26.3, 25.7, 25.1, 25.2, 26.1, 26.4, 25.9, 26.6, 26.3, 26.2, 26.8, 25.6, 25.7
MLB Players’ Average Ages, by Team
28.5, 29.0, 28.0, 27.8, 29.5, 29.1, 26.9, 28.9, 28.6, 28.7, 26.9, 30.5, 28.7, 28.9, 29.3
COLLABORATIVE LEARNING
Peer-to-Peer Activity
Have students work in pairs. Have each student create a data set with 10 values,
using the definition of outlier to verify that none of the values are outliers.
Students then find the mean, median, range, and IQR for their data sets. Have
students trade data sets with their partners. Ask each student to add an outlier to
the partner’s data set, and then calculate the new mean, median, range, and IQR
for the set. Students should compare their results and discuss how the outliers
affected the statistics.
QUESTIONING STRATEGIES
What can you conclude about two data sets by
comparing each of the following statistics:
mean, median, IQR, and standard deviation?
By comparing the mean and median values, you can
conclude whether the typical value for one data set
is higher or lower than the typical value for the
other set. By comparing the IQR and standard
deviation values, you can determine whether the
data values in one set are more or less spread out
than the values in the other set.

On a graphing calculator, enter the two sets of data into L1 and L2.
Use the “1-Var Stats” feature to find statistics for the data in lists L1
and L2. Your calculator may use the following notations: mean x̄,
standard deviation σx.
Scroll down to see the median (Med), Q1, and Q3. Complete the table.

        Mean     Median   IQR (Q3 - Q1)   Standard deviation
NFL     25.98    26.00    0.60            0.46
MLB     28.62    28.70    1.10            0.91

Compare the corresponding statistics.
The mean age and median age are lower for the NFL than for the MLB, which means that NFL players
tend to be younger than MLB players. In addition, the IQR and standard deviation are smaller for the
NFL than for the MLB, which means that the ages of NFL players are closer together than those of
MLB players.
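For readers working without a graphing calculator, the same four statistics can be computed in code. This is a minimal sketch under two assumptions that are not stated in the lesson: the function name `summarize` is hypothetical, and the standard deviation is the population form (dividing by the number of values), which is the form that reproduces the σx values in the table. Quartiles are taken as medians of the lower and upper halves of the ordered data.

```python
from statistics import mean, median, pstdev

def summarize(values):
    """Return mean, median, IQR, and population standard deviation."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])               # median of the lower half
    q3 = median(data[len(data) - half:])   # median of the upper half
    return {
        "mean": round(mean(data), 2),
        "median": round(median(data), 2),
        "iqr": round(q3 - q1, 2),
        "sd": round(pstdev(data), 2),
    }

nfl = [25.8, 26.0, 26.3, 25.7, 25.1, 25.2, 26.1, 26.4,
       25.9, 26.6, 26.3, 26.2, 26.8, 25.6, 25.7]
mlb = [28.5, 29.0, 28.0, 27.8, 29.5, 29.1, 26.9, 28.9,
       28.6, 28.7, 26.9, 30.5, 28.7, 28.9, 29.3]

print("NFL", summarize(nfl))
print("MLB", summarize(mlb))
```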

The tables list the ages of 10 contestants on 2 game shows.
Game Show 1
18, 20, 25, 48, 35, 39, 46, 41, 30, 27
Game Show 2
24, 29, 36, 32, 34, 41, 21, 38, 39, 26
On a graphing calculator, enter the two sets of data into L 1 and L 2.

          Mean    Median   IQR (Q3 - Q1)   Standard deviation
Show 1    32.9    32.5     16              10.00
Show 2    32      33       12              6.45

Complete the table. Then circle the correct items to compare the statistics.
The mean is lower for the 1st / 2nd game show, which means that contestants in the 1st / 2nd game show are
on average younger than contestants in the 1st / 2nd game show. However, the median is lower for the 1st / 2nd
game show, which means that although contestants are on average younger on the 1st / 2nd game show, there
are more young contestants on the 1st / 2nd game show. Finally, the IQR and standard deviation are higher for
the 1st / 2nd game show, which means that the ages of contestants on the 1st / 2nd game show are further apart
than the ages of contestants on the 1st / 2nd game show.
DIFFERENTIATE INSTRUCTION
Multiple Representations
Students may benefit from acting out a real-world example of how adding an
outlier to a data set affects measures of center and spread. For example, have five
students each begin with 1 to 5 slips of paper (or pennies or markers); each slip
represents a dollar. Have the students calculate the mean by equally distributing
all the slips of paper among the five students. Then have a sixth student with $25
(25 slips of paper) join the group. Again use the slips of paper to find the mean by
distributing them among the six students. Ask whether the new mean is a
reasonable measure of center.
Your Turn
4.
The tables list the age of each member of Congress in two randomly selected states. Complete the table and
compare the data.
Illinois
26, 24, 28, 46, 39, 59, 31, 26, 64, 40, 69, 62, 31, 28, 26, 76, 57, 71, 58, 35, 32, 49, 51, 22, 33, 56
Arizona
42, 37, 58, 32, 46, 42, 26, 56, 27

           Mean     Median   IQR (Q3 - Q1)   Standard deviation
Illinois   43.81    39.5     30              16.42
Arizona    40.67    42       21.5            10.84

The mean is lower for Arizona, which means that, on average, members of Congress tend
to be younger in Arizona than in Illinois. However, the median is lower in Illinois, which
means that there are more young members of Congress in Illinois despite the differences
in average age. Finally, the IQR and standard deviation are lower for Arizona, which
means that the ages of members of Congress are closer together than they are in Illinois.

AVOID COMMON ERRORS
Students often confuse the terms skewed to the left
and skewed to the right. Encourage students to come
up with a mnemonic to help them remember how the
direction of a skew should be described. For example,
students may easily remember how the “tail” of a data
distribution looks on a dot plot. Point out that both
tail and skew have four letters, and that a data
distribution is skewed in the direction of its tail.

QUESTIONING STRATEGIES
Some data distributions are described as
uniform. What do you think the general shape
of a uniform distribution would be? The general
shape of a uniform distribution is fairly even across
the plot.
What would be true about the mean and
median of a data set with a uniform
distribution? The mean and median would be
approximately equal.

Explain 3
Comparing Data Distributions
A data distribution can be described as symmetric, skewed to the left, or skewed to the right, depending on the
general shape of the distribution in a dot plot or other data display.
[Figure: three example dot plots, labeled Skewed to the Left, Symmetric, and Skewed to the Right]
Example 3

For each data set, make a dot plot and determine the type of distribution.
Then explain what the distribution means for each data set.
Sports The data table shows the number of miles run by members of two track teams
during one day.

Miles                3    3.5  4    4.5  5    5.5  6
Members of Team A    2    3    4    4    3    2    0
Members of Team B    1    2    2    3    3    4    3

LANGUAGE SUPPORT
Connect Vocabulary
English learners who are working on acquiring academic English in algebra may
find that some terminology is difficult to pronounce or to differentiate when
listening. Words such as effect and affect may be difficult to distinguish, and words
such as skew or interquartile may be difficult to pronounce. Be sure to enunciate
clearly so that students can understand and learn to pronounce the key words
correctly.
[Dot plots: Team A and Team B, each on a number line of Miles from 3 to 6]
The data for team A show a symmetric distribution.
This means that the distances run are evenly
distributed about the mean.
The data for team B show a distribution skewed to
the left. This means that more than half the team
members ran a distance greater than the mean.
B
The table shows the number of days, over the course of a month, that specific numbers of
apples were sold by competing grocers.

Number of Apples Sold    0    50   100  150  200  250  300
Grocery Store A          1    4    8    8    4    1    0
Grocery Store B          3    6    8    8    2    2    1

[Dot plots: Grocery Store A and Grocery Store B, each on a number line of Number of Apples Sold from 0 to 400]
The distribution for grocery store A is:
left-skewed / right-skewed / symmetric.
This means that the number of apples sold each day
is evenly / unevenly distributed about the mean.
The distribution for grocery store B is:
left-skewed / right-skewed / symmetric.
This means that the number of apples sold each day
is evenly / unevenly distributed about the mean.
Reflect
7.
Will the mean and median in a symmetric distribution always be approximately equal? Explain.
The mean and median in a symmetric distribution will always be approximately equal
because the values are equally distributed on either side of the center.
8.
Will the mean and median in a skewed distribution always be approximately equal? Explain.
The mean and median in a skewed distribution will not always be approximately equal
because the median will sometimes be closer to where the values cluster than the
mean will be.
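The relationship between the mean and the median can be checked on the apple-sales data from Example 3. The sketch below is only a rough heuristic, not the lesson's method: it expands each frequency table into a list of values and compares the mean with the median. The function names `expand` and `shape_hint` are hypothetical, and a mean equal to the median suggests symmetry without guaranteeing it, so a dot plot should still be inspected.

```python
from statistics import mean, median

def expand(freq):
    """Turn a {value: number of days} table into a flat list of values."""
    return [value for value, count in freq.items() for _ in range(count)]

def shape_hint(freq, tolerance=1e-9):
    """Suggest a shape by comparing the mean with the median (heuristic)."""
    data = expand(freq)
    m, med = mean(data), median(data)
    if abs(m - med) <= tolerance:
        return "symmetric"
    # A long right tail pulls the mean above the median, and vice versa.
    return "right-skewed" if m > med else "left-skewed"

# Number of apples sold -> number of days, from the Example 3 table.
store_a = {0: 1, 50: 4, 100: 8, 150: 8, 200: 4, 250: 1, 300: 0}
store_b = {0: 3, 50: 6, 100: 8, 150: 8, 200: 2, 250: 2, 300: 1}

print("Store A:", shape_hint(store_a))
print("Store B:", shape_hint(store_b))
```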
Your Turn
9.
Sports The table shows the number of free throws attempted during a basketball game. Make a dot plot
and determine the type of distribution. Then explain what the distribution means for the data set.

Free Throws Shot     0    2    4    6    8
Members of Team A    2    2    4    2    2
Members of Team B    3    4    2    2    1

[Dot plots: Team A and Team B, each on a number line of Number of Free Throws from 0 to 8]
The data for team A show a symmetric
distribution. This means that the number of
free throws shot is evenly distributed about
the mean.
The data for team B show a distribution
skewed to the right. This means that fewer
than half of the team members shot a
number of free throws that were greater
than the mean.

QUESTIONING STRATEGIES
Can a data set have more than one outlier?
Explain. Yes; more than one value may be
less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR).

INTEGRATE MATHEMATICAL PRACTICES
Focus on Critical Thinking
MP.3 Discuss with students whether all the values
in a data set could be outliers. Review the definition
of outlier. Students should understand that because
an outlier must be less than Q1 or greater than Q3,
values between Q1 and Q3 will never be outliers for
a data set.

Elaborate
SUMMARIZE THE LESSON
How can you determine whether a value in a
data set is an outlier? How does the inclusion
of an outlier affect the mean, median, range, and
IQR? An outlier is a value that is less than
Q 1 - 1.5(IQR) or greater than Q 3 + 1.5(IQR). Outliers
significantly affect the mean and range, but affect
the median and IQR very little or not at all.
10. If the mean increases after a single data point is added to a set of data, what can you tell about this data
point?
If the mean increases after a single data point is added to a set of data, you can tell that the
data point added was larger than the mean of the set.
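The reasoning behind this answer can be written out as a short derivation. The symbols are introduced here for illustration and are not part of the lesson: n is the number of values in the original set, x̄ is its mean, and x is the added data point.

```latex
\bar{x}_{\text{new}} = \frac{n\bar{x} + x}{n + 1}
\quad\Longrightarrow\quad
\bar{x}_{\text{new}} - \bar{x} = \frac{x - \bar{x}}{n + 1}
```

Because n + 1 is positive, the new mean is greater than the old mean exactly when x is greater than x̄.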
11. How can you use a calculation to decide whether a data point is an outlier in a data set?
You can decide whether a data point is an outlier in a data set by finding the 1st and 3rd
quartiles and subtracting the 1st from the 3rd to get the interquartile range. If the data
point is larger than the 3rd quartile plus 1.5 times the interquartile range, or smaller than
the 1st quartile minus 1.5 times the interquartile range, then the data point is an outlier.
12. Essential Question Check-In What three shapes can data distributions have?
Data distributions can be skewed to the left, skewed to the right, and symmetric.

Exercise   Depth of Knowledge (D.O.K.)   Mathematical Practices
1–8        1 Recall                      MP.4 Modeling
9          1 Recall                      MP.5 Using Tools
10         2 Skills/Concepts             MP.7 Using Structure
11         1 Recall                      MP.5 Using Tools
12         2 Skills/Concepts             MP.7 Using Structure

Evaluate: Homework and Practice
• Online Homework
• Hints and Help
• Extra Practice
Fitness The numbers of members in 8 workout clubs are 100, 95, 90, 85, 85, 95, 100,
and 90. Use this information for Exercises 1–2.
1.
Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Number of Members, number line from 60 to 110]
2.
Suppose that a new workout club opens and immediately has 150 members. Is the
number of members at this new club an outlier?
150 > 100 + 1.5(100 - 87.5) = 118.75 True
150 members is an outlier.
Sports The number of feet to the left outfield wall for 10 randomly chosen
baseball stadiums is 315, 325, 335, 330, 330, 330, 320, 310, 325, and 335. Use this
information for Exercises 3–4.
3.
Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Number of Feet, number line from 300 to 350]
4.
The longest distance to the left outfield wall in a baseball stadium is 355 feet. Is this
stadium an outlier if it is added to the data set?
355 > 335 + 1.5(335 - 320) = 357.5 False
355 feet is not an outlier.
Education The numbers of students in 10 randomly chosen
classes in a high school are 18, 22, 26, 31, 25, 20, 23, 26, 29,
and 30. Use this information for Exercises 5–6.
5.
Create a dot plot for the data set using an appropriate scale for
the number line. Possible plot shown.
[Dot plot: Number of Students, number line from 16 to 36]
6.
Suppose that a new class is opened for enrollment and currently
has 7 students. Is this class an outlier if it is added to the data set?
7 < 20 - 1.5(29 - 20) = 6.5 False
7 is not an outlier.

ASSIGNMENT GUIDE
Concepts and Skills                                  Practice
Explore: Using Dot Plots to Display Data             Exercises 1, 3, 5, 7
Example 1: The Effects of an Outlier in a Data Set   Exercises 2, 4, 6, 8, 17–18
Example 2: Comparing Data Sets                       Exercises 9–12
Example 3: Comparing Data Distributions              Exercises 13–16, 19

INTEGRATE MATHEMATICAL PRACTICES
Focus on Critical Thinking
MP.3 Understanding how outliers can affect the
mean and the median of a data set is an important
skill, especially for interpreting data. Discuss how
statistics can be misleading when outliers that affect
the mean value for a data set are included.

Exercise   Depth of Knowledge (D.O.K.)   Mathematical Practices
13–15      2 Skills/Concepts             MP.4 Modeling
16         1 Recall of Information       MP.5 Using Tools
17–18      3 Strategic Thinking          MP.3 Logic
19         2 Skills/Concepts             MP.4 Modeling

Sports The average bowling scores for a group of bowlers are 200, 210, 230, 220,
230, 225, and 240. Use this information for Exercises 7–8.
7.
Create a dot plot for the data set using an appropriate scale for the number line.
Possible plot shown.
[Dot plot: Bowling Scores, number line from 200 to 250]
8.
Suppose that a new bowler joins this group and has an average score of 275. Is this
bowler an outlier in the data set?
275 > 235 + 1.5(235 - 215) = 265 True
275 is an outlier.

MODELING
To help students think about possible causes for
outliers in a data set, ask them to consider the
distribution of heights of all the people in a
kindergarten classroom, in a high school classroom,
and on a basketball court. Discuss how many outliers
might be expected in each case, and what factors
might affect the number of outliers in each situation.
Students should recognize that there is often a reason
why one value is very different from the others in a
data set, such as the fact that a kindergarten teacher
may be the only adult in the classroom.

The tables describe the average ages of employees from two randomly chosen
companies. Use this information for Exercises 9–10.
Company A
23, 29, 35, 46, 51, 50, 42, 37, 30
Company B
24, 23, 45, 45, 42, 52, 55, 47, 55
9.
Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set.

             Mean    Median   IQR (Q3 - Q1)   Standard deviation
Company A    38.1    37       18.5            9.27
Company B    43.1    45       20.5            11.33

10. Compare the data sets.
Employees at company A tend to be younger than employees at company B.
The ages of employees at company A are closer together than the ages of
employees at company B.

AVOID COMMON ERRORS
Make sure students understand the process for
determining the standard deviation for a data set.
Encourage them to first create a table to record the
deviation and squared deviation for each data value,
then add the squared deviations, divide the sum by
the number of values, and finally find the square root.
Suggest that when they do not record their work,
students can easily overlook a step in the process.
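The steps for the standard deviation can be sketched as follows. This is a minimal illustration, with the hypothetical function name `population_sd`; it divides by the number of values (the population form), as the steps describe, and uses the Company A ages as sample input.

```python
from math import sqrt

def population_sd(values):
    """Standard deviation, following the steps in order."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: deviation of each value from the mean, then squared.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 2: add the squared deviations and divide by the number of values.
    variance = sum(squared_deviations) / n
    # Step 3: take the square root (the step students most often forget).
    return sqrt(variance)

company_a = [23, 29, 35, 46, 51, 50, 42, 37, 30]
print(round(population_sd(company_a), 2))
```

Dividing by one less than the number of values (the sample form) would give a slightly larger result, so it matters which form a calculator or library reports.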
The tables describe the size of microwaves, in cubic feet, chosen randomly from two competing
companies. Use this information for Exercises 11–12.
Company A
1.8, 2.1, 3.1, 2.0, 3.3, 2.9, 3.3, 2.1, 3.2
Company B
1.9, 2.6, 1.8, 3.0, 2.5, 2.8, 2.0, 3.6, 3.1
11. Calculate the mean, median, interquartile range (IQR), and standard deviation for each data set.

             Mean    Median   IQR (Q3 - Q1)   Standard deviation
Company A    2.6     2.9      1.2             0.59
Company B    2.6     2.6      1.1             0.57

12. Compare the data sets.
Microwaves from company B tend to be smaller than microwaves from
company A. The sizes of microwaves tend to be closer together at
company B than at company A.
For each data set, make a dot plot and determine the type of distribution. Then
explain what the distribution means for each data set. Possible plot shown.
13. Sports The data table shows the number of miles run by members of two teams
running a marathon.

Miles                5    10   15   20   25
Members of Team A    3    5    10   5    3
Members of Team B    6    10   4    1    5

[Dot plots: Team A and Team B, each on a number line of Miles from 5 to 30]
The data for team A show a symmetric
distribution. The distances run are evenly
distributed about the mean.
The data for team B show a right-skewed
distribution. This means that fewer than half
of the team members ran a distance greater
than the mean.

CRITICAL THINKING
Have students analyze and describe the shape of the
distribution of a dot plot they created. Ask students
how the shape relates to the statistics they would use
to characterize the data.

14. Sales The data table shows the number of days that specific numbers of turkeys were
sold. These days were in the two weeks before Thanksgiving.

Number of Turkeys    10   20   30   40
Grocery Store A      2    5    5    2
Grocery Store B      5    5    1    3

[Dot plots: Grocery Store A and Grocery Store B, each on a number line of Number of Turkeys from 0 to 50]
The data for grocery store A show a
symmetric distribution. This means that
the numbers of turkeys sold per day are
evenly distributed about the mean.
The data for grocery store B show a right-skewed distribution. This means that the
store sold fewer than the average number of
turkeys for more than half of the days.
15. State whether each set of data is left-skewed, right-skewed, or symmetrically
distributed.
A. 3, 5, 5, 3          symmetric
B. 1, 1, 3, 1          right-skewed
C. 7, 9, 9, 11         symmetric
D. 5, 5, 3, 3          symmetric
E. 19, 21, 21, 19      symmetric

JOURNAL
Have students create their own graphic organizers to
share with classmates, outlining the steps for finding
mean, median, Q1, Q3, IQR, and standard deviation
from a dot plot.

H.O.T. Focus on Higher Order Thinking
16. What If? Given the data set 8, 15, 12, 10, and 5, what happens to the mean if you
add a data value of 40? Is 40 an outlier of the new data set?
The mean increases from 10 to 15. 40 is an outlier of the new data set because
40 > 25.5.
17. Critical Thinking Can an outlier be a data value between Q 1 and Q 3? Justify your
answer.
An extreme value such as the max or min value can be an outlier, but by
definition, no value between Q 1 and Q 3 can be an outlier.
18. Justify Reasoning If the distribution has outliers, why will they always have an
effect on the range?
When present, outliers will always have an effect on the range since one
of the outliers will either be the highest or lowest number in a given data
set and the range is found by finding the difference between the highest
and lowest numbers.
19. Education The data table describes the average testing scores in 20 randomly
selected classes in two randomly selected high schools, rounded to the nearest ten.
For each data set, make a dot plot, determine the type of distribution, and explain
what the distribution means in context.
Average Scores    0    10    20    30    40    50    60    70    80    90    100
School A          0     1     2     2     3     4     3     2     2     1      0
School B          0     1     1     1     2     4     5     4     2     0      0
[Dot plot for School A; axis: Test Scores, 0 to 100]
The data for school A show a symmetric
distribution. This means that the test
scores were evenly distributed about the
mean test score.
[Dot plot for School B; axis: Test Scores, 0 to 100]
The data for school B show a left-skewed
distribution. This means that more than
half of the classes received a test score
that was above the mean.
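The claims about the two schools can be checked directly from the frequency table in Exercise 19. The Python sketch below is illustrative only. It computes each school's mean score and counts how many of the 20 classes fall above and below that mean. The function name `summarize_table` is an assumption introduced for this example.

```python
scores = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
school_a = [0, 1, 2, 2, 3, 4, 3, 2, 2, 1, 0]   # classes at each score
school_b = [0, 1, 1, 1, 2, 4, 5, 4, 2, 0, 0]

def summarize_table(scores, counts):
    """Return (mean, classes above the mean, classes below the mean)."""
    total = sum(counts)
    mean = sum(s * c for s, c in zip(scores, counts)) / total
    above = sum(c for s, c in zip(scores, counts) if s > mean)
    below = sum(c for s, c in zip(scores, counts) if s < mean)
    return mean, above, below

print("School A:", summarize_table(scores, school_a))
print("School B:", summarize_table(scores, school_b))
```

For School A the counts above and below the mean match, which fits a symmetric distribution. For School B more than half of the 20 classes lie above the mean, which fits a left-skewed distribution.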
Lesson Performance Task
The tables list the daily car sales of two competing dealerships.

Dealer A
14    13    15    12
15    16    15    17
17    12    16    14
15    16    14    16
13    14    18    15

Dealer B
16    17    15    20
18    19    18    17
19    10    19    18
15    17    20    19
18    18    16    17

INTEGRATE MATHEMATICAL PRACTICES
Focus on Reasoning
MP.2 Ask students whether the dealer who tended
to sell more cars than a competitor would necessarily
make the greater profit. Students should recognize
that a greater number of car sales leads to a greater
profit only when the profit per car is about the same
in both cases. If one dealer sold more cars by setting
the prices so low that there was a very small profit
margin, that dealer could end up with lower profits
despite having more sales.
QUESTIONING STRATEGIES
What might be some reasons for an outlier to
occur in a set of daily car sale values? Possible
answers: There might have been a day with very bad
weather, so no one went car shopping, or a day
when the best salespeople were out sick, so the
dealership didn’t sell any cars.

A. Calculate the mean, median, interquartile range (IQR), and standard deviation for
each data set. Compare the measures of center for the two dealers.

            Mean     Median    IQR (Q3 - Q1)    Standard deviation
Dealer A    14.85    15        2                1.6
Dealer B    17.3     18        2.5              2.2

Both the mean and the median are lower for Dealer A, so the number of cars sold by Dealer A tends to be lower than the number of cars sold by Dealer B.
Dealer A also has the smaller IQR and standard deviation, so the number of cars sold by Dealer A is more consistent than the number of cars sold by Dealer B.
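The Part A statistics can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not the lesson's own method. It assumes the quartiles are the medians of the lower and upper halves of the sorted data, and it uses the population standard deviation (`pstdev`), which matches the rounded values 1.6 and 2.2 given in the answer. The function name `summarize` is illustrative.

```python
from statistics import mean, median, pstdev

dealer_a = [14, 13, 15, 12, 15, 16, 15, 17, 17, 12,
            16, 14, 15, 16, 14, 16, 13, 14, 18, 15]
dealer_b = [16, 17, 15, 20, 18, 19, 18, 17, 19, 10,
            19, 18, 15, 17, 20, 19, 18, 18, 16, 17]

def summarize(values):
    """Return the mean, median, IQR, and population standard deviation."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                 # median of the lower half
    q3 = median(data[len(data) - half:])     # median of the upper half
    return {
        "mean": mean(data),
        "median": median(data),
        "iqr": q3 - q1,
        "sd": pstdev(data),
    }

for name, values in (("Dealer A", dealer_a), ("Dealer B", dealer_b)):
    s = summarize(values)
    print(name, round(s["mean"], 2), s["median"], s["iqr"], round(s["sd"], 1))
```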
B. Create a dot plot for each data set. Compare the distributions of the data sets.
[Dot plots for Dealer A and Dealer B on a number line from 10 to 20]
The data for Dealer A show a symmetric distribution, so the number of cars sold
daily by Dealer A is evenly distributed about the mean.
The data for Dealer B show a distribution skewed to the left, so during more than half
of the days, car sales were greater than the mean.
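A dot plot like the ones in Part B can be drawn as plain text. The Python sketch below is one possible approach, given only as an illustration. It stacks one "x" per data value above a number line from 10 to 20. The function name `dot_plot` and the column spacing are assumptions made for this example.

```python
from collections import Counter

dealer_a = [14, 13, 15, 12, 15, 16, 15, 17, 17, 12,
            16, 14, 15, 16, 14, 16, 13, 14, 18, 15]
dealer_b = [16, 17, 15, 20, 18, 19, 18, 17, 19, 10,
            19, 18, 15, 17, 20, 19, 18, 18, 16, 17]

def dot_plot(values, low, high):
    """Return a text dot plot with one 'x' per data value above each number."""
    counts = Counter(values)             # missing values count as 0
    tallest = max(counts.values())
    width = len(str(high)) + 1           # column width for each number
    rows = []
    for level in range(tallest, 0, -1):  # build rows from the top down
        row = "".join(
            ("x" if counts[v] >= level else " ").rjust(width)
            for v in range(low, high + 1)
        )
        rows.append(row.rstrip())
    axis = "".join(str(v).rjust(width) for v in range(low, high + 1))
    rows.append(axis)
    return "\n".join(rows)

print("Dealer A")
print(dot_plot(dealer_a, 10, 20))
print("Dealer B")
print(dot_plot(dealer_b, 10, 20))
```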
C. Determine if there are any outliers in the data sets. If there are, remove the outlier and find the statistics for
that data set. What was affected by the outlier?
A data value x is an outlier if x < Q1 - 1.5(IQR) or x > Q3 + 1.5(IQR).

Dealer A:
x < 14 - 1.5(2)    or    x > 16 + 1.5(2)
x < 11             or    x > 19
There are no values in the data set that
satisfy these inequalities for x. So, there
are no outliers.

Dealer B:
x < 16.5 - 1.5(2.5)    or    x > 19 + 1.5(2.5)
x < 12.75              or    x > 22.75
10 is an outlier in the data set for Dealer B.
Removing the outlier increases the mean and decreases
the standard deviation. The median is unaffected.
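The outlier check in Part C can be sketched in Python as follows. This example is an illustration under stated assumptions: quartiles are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd), and the standard deviation is the population form. The function names `quartiles` and `find_outliers` are hypothetical.

```python
from statistics import mean, median, pstdev

dealer_a = [14, 13, 15, 12, 15, 16, 15, 17, 17, 12,
            16, 14, 15, 16, 14, 16, 13, 14, 18, 15]
dealer_b = [16, 17, 15, 20, 18, 19, 18, 17, 19, 10,
            19, 18, 15, 17, 20, 19, 18, 18, 16, 17]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])

def find_outliers(values):
    """Return values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

print("Dealer A outliers:", find_outliers(dealer_a))
outliers_b = find_outliers(dealer_b)
print("Dealer B outliers:", outliers_b)

# Recompute Dealer B's statistics with the outlier removed.
trimmed_b = [v for v in dealer_b if v not in outliers_b]
print("mean:", round(mean(dealer_b), 2), "->", round(mean(trimmed_b), 2))
print("median:", median(dealer_b), "->", median(trimmed_b))
print("sd:", round(pstdev(dealer_b), 2), "->", round(pstdev(trimmed_b), 2))
```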
EXTENSION ACTIVITY
Explain to students that a bimodal data distribution has two peaks. Have students
create a set of 20 daily car-sale values with a bimodal distribution, then create a
dot plot and calculate statistics for the data. Ask what situations might produce
this distribution. Students may speculate that a sudden change in sales tactics or
prices could lead to several days with much higher or lower sales values than
preceding days. Point out that neither the mean nor the median accurately
represents a bimodal distribution. Explain that in some cases, such as when the
data originate from two different sets of conditions, it is appropriate to split the data into
two data sets and evaluate them separately.
Scoring Rubric
2 points: Student correctly solves the problem and explains his/her reasoning.
1 point: Student shows good understanding of the problem but does not fully
solve or explain his/her reasoning.
0 points: Student does not demonstrate understanding of the problem.