Descriptive Statistics
(Level IV Graduate Math)
Draft
(NSSAL)
C. David Pilmer
©2011
(Last Updated: Dec 2011)
This resource is the intellectual property of the Adult Education Division of the Nova Scotia
Department of Labour and Advanced Education.
The following are permitted to use and reproduce this resource for classroom purposes.
• Nova Scotia instructors delivering the Nova Scotia Adult Learning Program
• Canadian public school teachers delivering public school curriculum
• Canadian nonprofit tuition-free adult basic education programs
The following are not permitted to use or reproduce this resource without the written
authorization of the Adult Education Division of the Nova Scotia Department of Labour and
Advanced Education.
• Upgrading programs at post-secondary institutions
• Core programs at post-secondary institutions
• Public or private schools outside of Canada
• Basic adult education programs outside of Canada
Individuals, not including teachers or instructors, are permitted to use this resource for their own
learning. They are not permitted to make multiple copies of the resource for distribution. Nor
are they permitted to use this resource under the direction of a teacher or instructor at a learning
institution.
Acknowledgments
The Adult Education Division would also like to thank the following NSCC instructors for
reviewing this resource and offering suggestions during its development.
Eileen Burchill (IT Campus)
Nancy Harvey (Akerley Campus)
Eric Tetford (Burridge Campus)
Tanya Tuttle-Comeau (Cumberland Campus)
Alice Veenema (Kingstec Campus)
Table of Contents
Introduction
Negotiated Completion Date
The Big Picture
Course Timelines
Populations and Samples
Tables
Types of Data
Bar Graphs and Histograms
Circle Graphs and Line Graphs
First Impressions
Second Impressions
What Type of Graph Should be Used
Mean, Median, Mode, and Trimmed Mean
Box and Whisker Plots
Using Technology to Make Box and Whisker Plots
Standard Deviation
Using Technology to Calculate Population Standard Deviation
Distributions
Normal Distributions and the 68-95-99.7 Rule
Z-Scores
Growth Charts
Putting It Together
Appendix
Area Under the Normal Curve (z-Table)
Weight-for-Age Percentiles: Boys
Length-for-Age Percentiles: Boys
Head Circumference-for-Age: Boys
Post-Unit Reflections
Answers
NSSAL
©2011
i
Draft
C. D. Pilmer
Introduction
Statistics is the discipline concerned with the collection, organization, and analysis of data to
draw conclusions or make predictions. Statistics is widely employed in government, business,
and the natural and social sciences. In this unit we will focus on descriptive statistics, the
branch of statistics that deals with the description of data. In the first part of the unit, we will
look at the different ways data can be presented using graphs (e.g. bar graphs, histograms, circle
graphs, line graphs,…) and how these graphs can be interpreted. In the next part of the unit we
will learn how to determine and interpret measures of central tendency and standard deviation.
In descriptive statistics, we must differentiate between two important terms: population and
sample. A population is the set representing all measurements of interest to an investigator. A
sample is a subset of measurements selected randomly from the population of interest. It is
probably easier to look at these terms in the following way. Suppose you wanted to know the
average income of working adults in your community. If you asked every working adult in the
community, then you are dealing with the population. If, however, you randomly selected and
interviewed only a portion of the working adults in your community, then you are dealing with a
sample. For the sake of simplicity, this unit will only focus on populations. For example, if one
of the questions supplies student scores on a test, you will assume that these scores represent all
the student scores, not a randomly selected portion of the scores.
The other branch of statistics that we have not discussed is inferential statistics. In the case of
inferential statistics one makes inferences about population characteristics based on evidence
drawn from samples. In other words, you take a random sample from a population and use the
information collected from that small sample to make a prediction about the much larger
population. For example, if you wanted to know how much time Nova Scotian adults between
the ages of 20 and 40 spent watching television on weekdays, it would be
impractical to collect data from every NS adult in that age group. It would be very challenging,
time-consuming, and expensive. It would make more sense to randomly select 300 adults from
that age group, collect the data, analyze the data, and use that data to predict the average number
of hours all NS adults in that age group view television on weekdays. Although inferential
statistics is an extremely important branch of statistics, it goes beyond what is needed for a
graduate level math course. Inferential statistics is, however, examined in the Academic Level
IV Math course.
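The difference between describing a population and inferring from a sample can be sketched in a few lines of Python. This is only an illustration: the population of viewing times is made up (randomly generated), and the names `population` and `sample` and the seed value are assumptions chosen for this sketch, not part of the course material.

```python
import random
from statistics import mean

random.seed(2011)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: weekday TV hours for 50 000 adults (made-up values).
population = [round(random.uniform(0, 6), 1) for _ in range(50_000)]

# Descriptive statistics: summarize every value in the population.
population_mean = mean(population)

# Inferential statistics: estimate the same quantity from a random sample of 300.
sample = random.sample(population, 300)
sample_mean = mean(sample)

print(f"Population mean: {population_mean:.2f} hours")
print(f"Sample mean (n = 300): {sample_mean:.2f} hours")
```

The two means will usually be close but not identical, which is why a sample can only give a prediction about the population rather than its exact value.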
Negotiated Completion Date
After working for a few days on this unit, sit down with your instructor and negotiate a
completion date for this unit.
Start Date: _________________
Completion Date: _________________
Instructor Signature: __________________________
Student Signature: __________________________
The Big Picture
The following flow chart shows the five required units and the four optional units (choose two of
the four) in Level IV Graduate Math. These have been presented in a suggested order.
Math in the Real World Unit (Required)
• Fractions, decimals, percents, ratios, proportions, and
signed numbers in real world applications
• Career Exploration and Math
Solving Equations Unit (Required)
• Solve and check equations of the form Ax + B = Cx + D,
A = Bx² + C, and A = Bx³ + C.
Consumer Finance Unit (Required)
• Simple Interest and Compound Interest
• TVM Solver (Loans and Investments)
• Credit and Credit Scores
Graphs and Functions Unit (Required)
• Understanding Graphs
• Linear Functions and Line of Best Fit
Measurement Unit (Required)
• Imperial and Metric Measures
• Precision and Accuracy
• Perimeter, Area and Volume
Choose two of the four.
• Linear Functions and Linear Systems Unit
• Trigonometry Unit
• Statistics Unit
• ALP Approved Projects (Complete 2 of the 5 projects.)
Note:
You are not permitted to complete four ALP Approved Projects and thus avoid selecting from
the Linear Functions and Linear Systems Unit, Trigonometry Unit, or Statistics Unit.
Course Timelines
Graduate Level IV Math is a two credit course within the Adult Learning Program. As a two
credit course, learners are expected to complete 200 hours of course material. Since most ALP
math classes meet for 6 hours each week, the course should be completed within 35 weeks. The
curriculum developers have worked diligently to ensure that the course can be completed within
this time span. Below you will find a chart containing the unit names and suggested completion
times. The hours listed are classroom hours.
Unit Name                      Minimum Completion    Maximum Completion
                               Time in Hours         Time in Hours
Math in the Real World Unit    24                    36
Solving Equations Unit         20                    28
Consumer Finance Unit          18                    24
Graphs and Functions Unit      28                    34
Measurement Unit               24                    30
Selected Unit #1               20                    24
Selected Unit #2               20                    24
Total                          154 hours             200 hours
As one can see, this course covers numerous topics and for this reason may seem daunting. You
can complete this course in a timely manner if you manage your time wisely, remain focused,
and seek assistance from your instructor when needed.
Populations and Samples
As we learned in the introduction, descriptive statistics is concerned with the description of
data. This means that we look at methods that organize data and summarize data in an effective
presentation that ultimately increases our understanding of the data.
In the same introduction, we learned about populations and samples. A population is the set
representing all measurements of interest to an investigator. A sample is a subset of
measurements selected randomly from the population of interest. The relationship between a
sample and population can be represented by the following diagram, where the sample is a small
portion of the population. With the exception of this small section of the unit, we are only
going to focus on populations.
[Diagram: a region labelled Sample shown as a small portion of a larger region labelled Population]
Example 1
The Testing and Evaluation Division of the Department of Education reported that the average
mark on the grade 12 provincial math exam was 68%. This average was obtained by randomly
selecting 500 exams from throughout the province. Are we dealing with a sample or a
population? Explain.
Answer:
The Testing and Evaluation Division randomly selected 500 exams, rather than every exam.
For this reason they were dealing with a sample (i.e. a subset of the population).
Example 2
Statistics Canada had all households complete the long-form census. They reported that the
average salary, after tax, of unattached individuals in 2009 was $31 500. Are we dealing with a
sample or a population? Explain.
Answer:
Since every household, which would include every unattached individual, reported,
we are dealing with a population (i.e. all measurements of interest).
Questions:
1. The town’s mayor is interested in knowing what portion of her 4127 taxpayers support the
development of a new recreational center in the community. Because it is too costly to
contact all the taxpayers, a survey of 300 randomly selected taxpayers is conducted.
Describe the population and sample for this problem.
2. A building contractor just purchased 6000 used bricks. He knows that a small portion of
these bricks are cracked and therefore unusable. He randomly selected 200 bricks and
discovered that 14 of them were unusable. Describe the population and sample for this
problem.
3. A company conducted a phone survey that involved 1200 randomly selected employed
workers from Nova Scotia. Each participant had to report their annual gross income. At the
time (2009) it was known that there were 453 000 employed workers in Nova Scotia. After
conducting the survey and analyzing the data, the company reported an average annual
income of $29 900 for the 1200 participants. Describe the population and sample for this
problem.
4. Between 2001 and 2009, 3730 adults obtained high school diplomas through the Nova Scotia
School for Adult Learning (NSSAL). The Nova Scotia government wanted to know how
many of these adults pursued further education after obtaining their diploma. After
interviewing 240 randomly selected graduates, it was discovered that 65% had pursued
post-secondary education, primarily at the Nova Scotia Community College. Describe the
population and sample for this problem.
Tables
Investigation: The Fringe Movie Festival
A small privately owned multiplex movie theatre has decided to host a fringe movie festival.
Over the weekend, they are showing "cheesy" prequel movies that are obvious parodies of the
original blockbusters. The following table shows the number of tickets sold for each movie over
the weekend. They have broken the tickets into three categories: senior, adult, and child tickets.
Movie                                        Senior Tickets   Adult Tickets   Child Tickets
Jaws: The Teething Years                     158              349             54
Terminator: Rise of the Toasters             33               412             47
Star Wars: Episode 0                         133              341             146
Avatar: Evolving from the Blue Man Group     51               409             136
Transformers: The Horse and Buggy Years      62               350             122
Use the table to answer the following questions.
1. Which movie had the greatest number of child viewers?
2. Which movie had the greatest number of viewers during the festival? How did you arrive at
this answer?
3. Which movie had the fewest viewers during the festival?
4. Based solely on ticket sales, what movie appeared to be most popular with both seniors and
adults? How did you arrive at this answer?
5. Based solely on ticket sales, what movie appeared to be least popular with both seniors and
adults?
6. Could you quickly answer the questions above? Besides a table, what other way could the
data be displayed so that you can more efficiently address the questions?
7. Here is the stacked bar graph corresponding to the fringe movie festival ticket sales data.
[Stacked bar graph: Number of Tickets Sold (vertical axis, 0 to 700) for each of the five movies, with each bar divided into Senior Tickets, Adult Tickets, and Child Tickets]
What are your thoughts regarding presenting the data in this graphical form?
8. Was the fringe movie festival data collected on the previous page derived from a sample or a
population? Justify your answer.
Types of Data
In the last section we learned that data is often easier to understand if it is expressed as a graph
instead of a table. Before we can look at all the different ways data can be displayed in graphical
form (e.g. line graphs, circle graphs, histograms, …), we need to take a few minutes and learn
about the different types of data. These different types influence the type of graph that can be
used.
When data is collected, the responses can be classified as a categorical data set or a numerical
data set. These two terms are most easily explained using an example. Suppose we have an
adult education class comprised of 10 learners who all have cell phones. The instructor asks two
questions and obtains the following responses.
Question 1: What cell phone provider do you use?
Responses to Question 1:
{Telus, Bell Aliant, Telus, Bell Aliant, Rogers, Rogers, Koodo, Rogers, Telus, Rogers}
Question 2: What was your cell phone bill for the previous month?
Responses to Question 2:
{$27.80, $33.50, $45.70, $32.00, $54.90, $29.00, $43.65, $67.40, $35.89, $39.67}
The collection of responses to the first question is called a categorical data set. Categorical data
is data that can be assigned to distinct non-overlapping categories. The responses to question 1
fit into four categories: Bell Aliant, Koodo, Rogers and Telus. The collection of responses to the
second question is called a numerical data set. This is the case because the data is comprised of
numbers, specifically different amounts of money.
There are two types of numerical data: discrete and continuous. Numerical data is discrete if the
possible values are isolated points on a number line. For example, if survey participants were
asked how many phone calls they made today, their responses would be whole numbers like 0, 4
or 12. They would not respond with something like 7.8 phone calls. Since they can only report
isolated points, then we end up with discrete numerical data. Numerical data is continuous if the
set of possible values forms an entire interval on the number line. For example, if soil samples
were tested for acidity, the pH could be reported with numbers like 4, 4.17, 4.173, or any other
number in the interval. Generally continuous data arises when observations involve making
measurements (e.g. weighing objects, recording temperatures, recording time to complete
tasks,…) while discrete data arises when observations involve counting.
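The distinction between categorical and numerical data can also be seen in how each type is summarized. The following is a minimal Python sketch using the cell phone responses from the two questions above; the variable names `providers` and `bills` are assumptions introduced for this illustration.

```python
from collections import Counter

# Responses to Question 1 (categorical data).
providers = ["Telus", "Bell Aliant", "Telus", "Bell Aliant", "Rogers",
             "Rogers", "Koodo", "Rogers", "Telus", "Rogers"]

# Responses to Question 2 (numerical data, continuous because it is a measured amount).
bills = [27.80, 33.50, 45.70, 32.00, 54.90, 29.00, 43.65, 67.40, 35.89, 39.67]

# Categorical data: the natural summary is a count for each category.
provider_counts = Counter(providers)
for provider, count in sorted(provider_counts.items()):
    print(f"{provider}: {count}")

# Numerical data: comparisons such as smallest and largest value make sense.
lowest_bill = min(bills)
highest_bill = max(bills)
print(f"Lowest bill: ${lowest_bill:.2f}")
print(f"Highest bill: ${highest_bill:.2f}")
```

Counting works for the provider names, but asking for the "largest" provider name would be meaningless, which is one practical way to tell the two types of data apart.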
Question:
1. For each of the following, state whether the data collection would result in a categorical data
set or numerical data set. If the data is numerical, indicate whether we are dealing with
discrete or continuous data.
(a) Concentration in parts per million (ppm) of a particular contaminant in water supplies
(b) Brand of personal computer purchased by customers
(c) The sex of children born at the IWK Hospital in December
(d) The height of male adult education learners at a specific campus
(e) The number of children in each household
(f) The gross income of adult workers between the ages of 25 and 35 in Nova Scotia
(g) The races of people immigrating to Canada
(h) The time it takes for females between the ages of 20 and 30 to complete the 100 m dash
(i) The sum of the numbers rolled on two dice
(j) The amount of gas purchased by individual UltraCan customers on a specific day
(k) The size of shoe purchased by teenage males
(l) The destination city or town for summer vacations
(m) The head circumference of a newborn child
(n) The country of manufacture for vehicles in the staff parking lot at the NSCC Waterfront Campus
Bar Graphs and Histograms
Bar graphs and histograms look very similar so learners often get them confused. Bar graphs
are used to display categorical data or discrete numerical data. The bars in bar graphs are
separated from one another. Examples of bar graphs are shown below.
Bar Graph #1
In this survey, 60 randomly selected Australian
students were asked to report in which month
they were born.
Bar Graph #2
In this survey, 200 randomly selected
international students were asked which hand
they write with.
Histograms are used to display continuous numerical data where the data is organized into
classes. The bars on a histogram are not separated from one another.
Histogram #1
In this survey, 100 randomly selected students
from all over the world were asked to report
how long it took to travel from home to school.
In this case the class width is 5. The first class
goes from 0 to 5, not including five. The
second class goes from 5 to 10, not including
10.
Histogram #2
Forty randomly selected secondary students
from Canada were asked to report their heights
in centimeters. As with Histogram #1, the
class width in this case is 5; however, the
intervals do not start and end on multiples of 5.
For example the first class showing a value is
centered at 120. That means that this class
goes from 117.5 to 122.5, not including 122.5.
Bar graphs also come in different forms; two of the most common are stacked bar graphs and
double bar graphs. We have already been exposed to stacked bar graphs when we completed the
questions regarding the fringe movie festival in the section titled "Tables." On a stacked bar
graph the bars are divided into categories so that we can compare the parts to the whole. In the
case of the fringe movie festival graph, the bars were divided into three categories: senior
tickets, adult tickets, and child tickets. By doing this we can quickly see how those three types
of ticket sales contributed to the overall sales for each movie.
[Stacked bar graph: Number of Tickets Sold (vertical axis, 0 to 700) for each of the five fringe festival movies, with each bar divided into Senior Tickets, Adult Tickets, and Child Tickets]
Double bar graphs allow one to present more than one kind of information, situation, or event in
one graph, instead of drawing two separate bar graphs. One of the most common uses is to
simultaneously display data for both males and females. The example below shows how the
coffee purchasing decisions for males and females differ at a particular coffee shop on a
particular morning.
[Double bar graph: quantity sold (vertical axis, 0 to 45) of small coffee, medium coffee, and large coffee, with separate bars for male and female]
It should be mentioned that in all the bar graph examples we have provided to this point, the
bars have been oriented vertically. Bar graphs can also be drawn such that the bars are in a
horizontal orientation. That is what we have done with the stacked bar graph below, which was
obtained using the data from the fringe movie festival.
[Horizontal stacked bar graph: tickets sold (horizontal axis, 0 to 700) for each of the five fringe festival movies, with each bar divided into Senior Tickets, Adult Tickets, and Child Tickets]
Example 1
Anne tracked the additional time, in minutes, she spent outside of regular class time to work on
her five courses, over two days (Wednesday and Thursday). That information is displayed in the
graph below.
[Double bar graph: Minutes of Additional Work (vertical axis, 0 to 40) for Biology, Communications, Math, History, and Sociology, with separate bars for Wednesday and Thursday]
(a) How much time did she spend on Thursday doing additional work in History?
(b) In what subject and on what day did she spend 25 minutes doing additional work?
(c) In what subject did she spend the same amount of time on Wednesday and Thursday doing
additional work?
(d) How much more time did she spend on Wednesday doing additional work in Math compared
to Thursday?
(e) How much more time did she spend on Thursday doing additional work in Biology compared
to History?
(f) How much time over the two days did she spend doing additional work in Biology and
Communications?
Answers:
(a) 10 minutes
(b) Math on Thursday
(c) Sociology (She spent 15 minutes each day)
(d) Math Wednesday: 30 minutes
Math Thursday: 25 minutes
30 - 25 = 5 minutes
(e) Biology Thursday: 20 minutes
History Thursday: 10 minutes
20 - 10 = 10 minutes
(f) 15 + 20 + 20 + 35 = 90 minutes or 1.5 hours
Example 2
Thirty-six randomly selected males between the ages of 20 and 29 were weighed.
The weights in pounds are shown below.
210  143  194  174  203  181  224  171  178
186  182  186  188  215  192  182  194  174
166  177  192  188  191  167  207  189  155
178  162  202  160  193  181  188  181  196
(a) Construct a histogram with class widths of 10 starting at 140.
(b) What percentage of the randomly selected males weighed less than 180 pounds?
Answers:
(a) Construct a table to organize the data in terms of the classes. The first class, from 140
to 150, includes 140 but does not include 150.
Class         Tally          Frequency
140 to 150    |              1
150 to 160    |              1
160 to 170    ||||           4
170 to 180    |||| |         6
180 to 190    |||| |||| |    11
190 to 200    |||| ||        7
200 to 210    |||            3
210 to 220    ||             2
220 to 230    |              1
Now construct the histogram.
(b) Out of the 36 participants, 12 weighed less than 180 pounds.
12/36 × 100 = 33 1/3 %
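The tallying in Example 2 can be checked with a short Python sketch. It assumes the same convention used above (each class includes its lower boundary but not its upper boundary); the helper name `class_start` and the constants are assumptions made for this illustration.

```python
from collections import Counter

weights = [210, 143, 194, 174, 203, 181, 224, 171, 178,
           186, 182, 186, 188, 215, 192, 182, 194, 174,
           166, 177, 192, 188, 191, 167, 207, 189, 155,
           178, 162, 202, 160, 193, 181, 188, 181, 196]

CLASS_WIDTH = 10   # width of each class
START = 140        # lower boundary of the first class


def class_start(value):
    """Return the lower boundary of the class that contains value."""
    return START + ((value - START) // CLASS_WIDTH) * CLASS_WIDTH


# Count how many weights fall in each class (lower boundary included, upper excluded).
frequencies = Counter(class_start(w) for w in weights)
for lower in sorted(frequencies):
    print(f"{lower} to {lower + CLASS_WIDTH}: {frequencies[lower]}")

# Part (b): percentage of participants who weighed less than 180 pounds.
under_180 = sum(1 for w in weights if w < 180)
percentage = under_180 / len(weights) * 100
print(f"{under_180} of {len(weights)} weighed less than 180 pounds ({percentage:.1f}%)")
```

The printed frequencies are the bar heights of the histogram, so the same table could be handed to a graphing tool to draw it.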
Questions
1. A study was conducted to see which major league sport is most popular. In the study, they
looked at how many fans (in millions) each sport has. The information is displayed using a
bar graph.
Acronyms:
NFL: National Football League
NBA: National Basketball Association
MLB: Major League Baseball
NHL: National Hockey League
NASCAR: National Association for Stock Car Auto Racing
[Bar graph: Number of Fans (in millions), vertical axis from 0 to 200, for NFL, NBA, MLB, NHL, and NASCAR]
(a) Which sport is most popular amongst the fans?
(b) Approximate the number of fans the National Hockey League has.
(c) Which major league sport has 120 million fans?
(d) Approximately how many more fans does the NFL have compared to the NBA?
(e) Is this a bar graph or histogram?
2. The medal counts for the 2006 and 2010 winter Olympics for four countries have been
provided in the following graph.
[Graph: number of medals (horizontal axis, 0 to 40) for Norway, Germany, United States, and Canada, with separate bars for 2010 and 2006]
(a) What type of graph are we dealing with?
(b) Of the four countries, which had highest medal count in 2006?
(c) What was the medal count for the United States in 2010?
(d) Which country had a medal count of 19 in 2006?
(e) How many more medals did Canada obtain in 2010 compared to 2006?
(f) In 2010, how many more medals did the United States get compared to Germany?
(g) What was the total medal count for all four countries in 2010?
(h) What was the total medal count for both Germany and the United States over the 2006
and 2010 winter Olympics?
3. The Canadian Nurses Association reported the age distribution of all registered nurses (RNs)
in Canada for the year 2009. This data was used to construct the following graph.
[Graph: Number of RNs (vertical axis, 0 to 40 000) by Age, with classes <24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, and 65+]
Source: Canadian Institute for Health Information
(a) What type of graph are we dealing with?
(b) What type of data was used to construct this graph?
(c) Approximately how many registered nurses in 2009 were between the ages 30 and 39?
(d) In 2009, approximately how many more 55 to 59 year old RNs were there compared to 60
to 64 year old RNs?
(e) What three classes of ages had the greatest number of RNs in 2009?
(f) Considering that Canada has an aging population, what potential problem is likely to
occur in the near future based on the information supplied in this graph?
4. The Nephrology and Hypertension Department of the Children's Hospital in London, Ontario
reported the number of cases they addressed over the different fiscal years (i.e. from April 1
of one year to March 31 of the next year). They broke the cases into three categories: new
consults, consult visits, and inpatient days. New consults refer to cases that have been
referred by an outside source (typically a family doctor) to the department. With each case,
the information in the patient's medical file is reviewed to see if the patient's needs can be
served by the department. Consult visits refer to day clinic visits by patients. Inpatient days
refer to hospital stays by patients whose immediate needs cannot be met by day clinic visits.
[Graph: Number of Cases (vertical axis, 0 to 1400) by Fiscal Year (2004/2005 to 2009/2010) for New Consults, Consult Visits, and Inpatient Days]
Source: University of Western Ontario, Department of Paediatrics
(a) What type of graph are we dealing with?
(b) Were there significant changes in the number of new consults to the Nephrology and
Hypertension Department over the six fiscal years?
(c) Approximately how many cases were dealt with in the 2008/2009 fiscal year?
(d) Approximately how many consult visits were dealt with in 2004/2005?
(e) Approximately how many cases involving inpatient visits were addressed in 2005/2006?
(f) Approximately how many more cases involving consult visits occurred in 2006/2007
compared to 2005/2006?
(g) What was the big shift from 2008/2009 to 2009/2010?
5. Thirty randomly selected families of four were asked how much they spent on their last
family meal at a restaurant. The following data was obtained.
70   68   62   86   78   67   94   82   75   74
66   103  65   97   64   68   80   83   67   71
77   72   69   64   90   72   78   66   64   86
(a) Construct a histogram with class widths of 5 starting at 60. Remember that the class 60 to
65 does not include the number 65. The 65 is in the next class.
Class         Tally          Frequency
60 to 65
65 to 70
70 to 75
75 to 80
80 to 85
85 to 90
90 to 95
95 to 100
100 to 105
(b) What percentage of the families spent $90 or more on their meal?
(c) What type of data are we dealing with?
(d) Are we dealing with a sample or population?
Circle Graphs and Line Graphs
Circle graphs, also called pie charts, are divided into sectors where each sector represents part
of a whole. Each sector is proportional in size to the amount each sector represents. For
example, if 70 out of 140 people responded that their favorite ice cream was chocolate, then the
"chocolate" sector of the circle graph would be 50% or half of the circle graph.
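The size of a sector can be worked out directly from the counts. The short Python sketch below applies this to the chocolate ice cream example; the function name `sector_share` is an assumption made for this illustration, and the conversion to degrees simply uses the fact that a full circle is 360°.

```python
def sector_share(count, total):
    """Return the sector size as (percentage of the circle, angle in degrees)."""
    fraction = count / total
    return fraction * 100, fraction * 360


percent, degrees = sector_share(70, 140)
print(f"Chocolate sector: {percent:.0f}% of the circle, or {degrees:.0f} degrees")
```

The same function can be reused for every category in a survey; the percentages for all sectors should add to 100.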
Example 1
In 1999, registered nurses were asked to report where they were employed. The results are
presented in the following circle graph. At the time there were 229 000 registered nurses in
Canada.
[Circle graph: Hospital 59%, Other 16%, Nursing Home 12%, Community Health Agency 8%, Home Care 4%, Not Stated 1%]
Source: Registered Nurses Database
(a) What percentage of registered nurses worked in nursing homes in 1999?
(b) Approximately how many registered nurses worked in hospitals in 1999?
(c) Approximately 9160 RNs were employed in what sector?
(d) Approximately how many RNs were employed in either home care or nursing homes?
(e) Approximately how many more RNs were employed in hospitals than in community health
agencies?
(f) What is the ratio of RNs employed in community health agencies to RNs employed in nursing
homes?
Answers:
(a) 12%
(b) 59% of 229 000
0.59 × 229 000 = 135 110 RNs
(c) 9160 / 229 000 × 100 = 4%
These RNs are working in home care.
(d) 4% + 12% = 16%
16% of 229 000
0.16 × 229 000 = 36 640 RNs
(e) 59% - 8% = 51%
51% of 229 000
0.51 × 229 000 = 116 790 RNs
(f) community health agency / nursing home = 8/12 = (8 ÷ 4)/(12 ÷ 4) = 2/3 ← desired ratio
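The calculations in Example 1 can be verified with a short Python sketch. It uses the sector percentages and the total of 229 000 registered nurses given in the example; the names `percent_by_sector` and `rn_count` are assumptions introduced for this illustration.

```python
from fractions import Fraction

TOTAL_RNS = 229_000

# Sector percentages read from the circle graph in Example 1.
percent_by_sector = {
    "Hospital": 59,
    "Other": 16,
    "Nursing Home": 12,
    "Community Health Agency": 8,
    "Home Care": 4,
    "Not Stated": 1,
}


def rn_count(percent):
    """Convert a percentage of all RNs into a number of RNs."""
    return percent * TOTAL_RNS // 100


# (b) RNs working in hospitals.
hospital = rn_count(percent_by_sector["Hospital"])

# (c) Which sector employed approximately 9160 RNs?
sector_9160 = [name for name, p in percent_by_sector.items() if rn_count(p) == 9160]

# (d) RNs in either home care or nursing homes.
home_or_nursing = rn_count(percent_by_sector["Home Care"] + percent_by_sector["Nursing Home"])

# (e) How many more RNs in hospitals than in community health agencies.
difference = rn_count(percent_by_sector["Hospital"] - percent_by_sector["Community Health Agency"])

# (f) Ratio of community health agency RNs to nursing home RNs, in lowest terms.
ratio = Fraction(percent_by_sector["Community Health Agency"], percent_by_sector["Nursing Home"])

print(hospital, sector_9160, home_or_nursing, difference, ratio)
```

`Fraction` reduces 8/12 to lowest terms automatically, which mirrors dividing the numerator and denominator by 4 in answer (f).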
Line graphs are created by plotting data points and connecting them with lines. These lines are
useful for showing trends; that is, how something changes in value as something else happens.
Example 2
This line graph shows how the fertility rate in Canada has changed since 1950. The fertility
rate is the average number of children born of women between the ages of 15 and 49.
[Line graph: Fertility Rate (vertical axis, 0 to 4.5) by Year (horizontal axis, 1950 to 2000 in steps of 5 years)]
Source: Statistics Canada
(a) What was the approximate fertility rate in 1970?
(b) In what year was the fertility rate approximately 3.2?
(c) How much did the fertility rate drop by between 1960 and 1970?
(d) After 1960, when did the fertility rate increase?
Answers:
(a) 2.3
(b) 1965
(c) 3.9 - 2.3 = 1.6
The fertility rate dropped by approximately 1.6
(d) It only increased slightly between 1985 and 1990.
Questions
1. The following circle graph was
constructed using data collected
from all patients over a one month
period at a specific emergency
room. That month 1200 patients
visited the site.
[Circle graph: auto accidents 27%, work injuries 24%, home injuries 19%, heart attacks 14%, respiratory problems 9%, miscellaneous 7%]
(a) What is the leading cause of emergency room visits to this location during this month?
(b) How many times more likely was the staff at this emergency room going to see a patient
injured in an auto accident compared to a patient having respiratory problems?
(c) How many patients suffering from work related injuries sought treatment at the
emergency room?
(d) How many more patients sought treatment for heart attacks compared to patients
suffering from respiratory problems?
(e) Which one of the following represents the ratio of patients with work-related injuries to
patients suffering from heart attacks? (Multiple Choice)
(i) 7/12    (ii) 12/7    (iii) 14/27    (iv) 27/14
(f) What was the cause of emergency room visits for 228 patients?
2. The following graph shows the value of Canada's exports from January 2008 until November
2010. The values are expressed in millions of Canadian dollars; for example, the number
20,000 on the vertical scale represents $20,000 million or $20 billion.
[Graph: Exports in Millions of Dollars (vertical axis, 0.00 to 50,000.00) by month, from Jan-08 to November 2010]
Source: Statistics Canada
(a) Name at least three periods when Canada's exports largely remained unchanged.
(b) During what month and year did Canada's exports almost reach $45 billion?
(c) When were Canada's exports lowest between Jan-08 and Nov-10?
(d) Approximately how much did exports drop by between October 2008 and January 2009?
Based on your knowledge of world events, why do you think this occurred?
3. There were 725 housing starts in the first quarter of 2011 in Nova Scotia. These starts were
broken into four categories: single detached (i.e. single dwelling homes), semi-detached (i.e.
single-family home that is joined on one side to another home), row housing (i.e.
townhouse), and apartments.
[Circle graph: Apartments, 337; Single Detached, 293; Semi-detached, 60; Row Housing, 35]
Source: Canada Mortgage and Housing Corporation
(a) What percentage of the housing starts was for single detached homes?
(b) What is the ratio of row housing starts to semi-detached starts?
(c) How many more apartment starts were there compared to the combined row housing and
semi-detached starts?
(d) The Canada Mortgage and Housing Corporation predicts that the second quarter housing
starts in Nova Scotia will increase from 725 to 850. If they assume that the proportion of
single detached starts remains the same from the first quarter to the second, how many
single detached starts do they anticipate in this second quarter?
4. The value of stock changes over time. The following line graph shows how the Research in
Motion (RIM) stock changed over the month of June in 2011. Notice that the graph covers
22 days, rather than 30. There were only 22 trading days in June 2011; stocks
are not traded on weekends.
[Line graph. Vertical axis: Value of RIM Stock ($), 20 to 42. Horizontal axis: Trading Day,
0 to 22. Source: Nasdaq.com]
(a) On what trading day was the greatest single day loss in the value of RIM shares during
the month of June? Approximate the amount that was lost per share on that day.
(b) Approximately how much did the stock drop from the beginning of the month until
the end of the month?
(c) On what trading day was the greatest single day gain in the value of RIM shares during
the month of June? Approximate the amount that each share increased by on that day.
First Impressions
Part 1
Grocery store customers were asked to identify their favorite brand of ice cream. Once the
data was collected, a circle graph was constructed. It is shown below.
[Circle graph. Sectors: Jen and Berry Ice Cream, Charmer Dairies Ice Cream, Faxter Ice Cream]
What is your first impression regarding customers' preferences for particular brands of ice
cream?
Part 2
The 2001 population counts for five urban centres in Canada were used to construct this graph.
[Bar graph. Vertical axis: Population, 50,000 to 140,000. Horizontal axis: Lethbridge, AB;
Moncton, NB; Nanaimo, BC; Sarnia, ON; Trois-Rivières, QC. Source: Statistics Canada]
What is your first impression regarding the population counts for these centres?
Part 3
The owners of an amusement park kept track of the number of male and female patrons that
used four particular rides in the park on a weekday morning. They used the data to construct
the following graph.
[Stacked bar graph. Vertical axis: Percentage, 0% to 100%. Horizontal axis: Hurl-a-Twirl,
Bumper Boats, Zip Line, Death Drop. Legend: Females, Males]
What is your first impression regarding the patron usage of these rides?
Part 4
The following line graph shows how the average price of a domestic flight from Halifax
changed from the first quarter of 2007 to the third quarter of 2010.
[Line graph. Vertical axis: Average Domestic Fare, 160 to 210. Horizontal axis: Quarters,
I-2007 to III-2010. Source: Statistics Canada]
What is your first impression regarding the change in the price of a domestic flight?
Second Impressions
We are going to re-examine some of the real world applications that we were exposed to in the
section titled "First Impressions."
In part 1 of First Impressions, we looked at a circle graph regarding customers' preferences
for particular brands of ice cream. We have redrawn the circle graph using the same data.
Based on this new perspective of the circle graph, have your first impressions changed?
Why or why not?
[Circle graph: Charmer Dairies Ice Cream 36%, Jen and Berry Ice Cream 36%, Faxter Ice
Cream 28%]
In part 2 of First Impressions, we looked at a bar graph regarding 2001 population counts for
five Canadian urban centres. We have redrawn the graph using the same data. Based on this
new graph, has your first impression changed? Why or why not?
[Bar graph. Vertical axis: Population, 0 to 140,000. Horizontal axis: Lethbridge, AB;
Moncton, NB; Nanaimo, BC; Sarnia, ON; Trois-Rivières, QC]
In part 3 of First Impressions, we looked at a stacked bar graph regarding the patron usage of
four specific rides in an amusement park. We have redrawn the graph using the same data.
Based on this new graph, has your first impression changed? Why or why not?
[Bar graph. Vertical axis: Number of People, 0 to 250. Horizontal axis: Hurl-a-Twirl,
Bumper Boats, Zip Line, Death Drop. Legend: Females, Males]
In part 4 of First Impressions, we looked at a line graph regarding the average price of
domestic flights from Halifax. We have redrawn the graph using the same data. Based on this
new graph, has your first impression changed? Why or why not?
[Line graph. Vertical axis: Average Domestic Fare, 0 to 250. Horizontal axis: Quarters,
I-2007 to III-2010]
Why did we bother exposing you to the two versions of each of these graphs?
What Type of Graph Should Be Used?
Below you have been provided with data tables. Indicate what type of graph (histogram, line,
circle, bar, double bar, or stacked bar graph) you would use for this data. In a few cases, there
can be more than one acceptable answer.
1. Graph Type: _______________________
Favorite Music Genre    Male    Female
Pop                     90      150
Rock                    120     70
Hip Hop                 70      60
Country                 100     120
Blues                   50      40
Other                   70      60
2. Graph Type: _______________________
Favorite Movie Genre    Percentage
Action                  32
Comedy                  18
Drama                   15
Horror                  8
Science Fiction         21
Other                   6
3. Graph Type: _______________________
time (s)    distance (m)
0           0
2           1.6
4           3.2
6           4.8
8           6.4
4. Graph Type: _______________________
Time Commuting to Work (min)    Frequency
0 - 10                          27
10 - 20                         39
20 - 30                         58
30 - 40                         43
> 40                            12
5. Graph Type: _______________________
Triathlon Athlete    Swim Time (min)    Bike Time (min)    Run Time (min)
Anne                 10                 55                 35
Jane                 12                 54                 37
Denise               13                 58                 40
Meera                11                 53                 39
Yoshi                10                 53                 41
6. Graph Type: _______________________
Blood Type    Percentage
A+            35.7
A-            6.3
O+            37.4
O-            6.6
B+            8.5
B-            1.5
AB+           3.4
AB-           0.6
7. Graph Type: _______________________
Town               Population in 2006
Amherst            9505
Digby              2092
Kentville          5812
Pictou             3813
Port Hawkesbury    3517
8. Graph Type: _______________________
Television Program Type    Audience Share (%) 1996 - 1997    Audience Share (%) 2001 - 2002
Comedy                     12                                8
Drama                      13                                9
Reality                    10                                8
9. Graph Type: _______________________
Salaries in Thousands of Dollars    Number of Employees
15 - 25                             16
25 - 35                             43
35 - 45                             57
45 - 55                             48
55 - 65                             23
65 - 75                             11
more than 75                        6
10. Graph Type: _______________________
Year    Cell Phone Revenues (Billions of Canadian Dollars)
1997    3.3
1998    4.4
1999    4.6
2000    5.4
2001    6.0
2002    7.2
2003    8.1
Mean, Median, Mode, and Trimmed Mean
Charlie looks at the marks his Level IV Graduate Math learners earned in a particular unit over
the last year.
{81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71}
He wants to report how well his students performed on this particular unit without having to
supply all seventeen pieces of data. He could use a histogram to display the results but he
decides instead to calculate two measures of central tendency: the mean (arithmetic average) and
median (middle).
Mean
The most common measure of central tendency is the arithmetic average, or mean. When
calculating a mean, statisticians differentiate between population means and sample means by
using different symbols. The procedure for calculating either of these means is identical. The
population mean and sample mean are calculated by adding all the data points and then
dividing by the number of data points.
$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$, where $\mu$ (mu) is the population mean
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$, where $\bar{x}$ (x bar) is the sample mean
Although in later sections of this unit, we are only going to concentrate on populations, in this
section we will ask you to know both formulas, specifically the two symbols ($\mu$ and $\bar{x}$) used to
represent the different means.
Let's return to Charlie’s math marks. Since he is looking at the marks of all of the learners who
completed the unit, he is dealing with a population. The population mean, µ , is calculated
below.
$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
$\mu = \frac{81 + 74 + 91 + 82 + 79 + 95 + 78 + 92 + 86 + 74 + 78 + 69 + 84 + 77 + 88 + 78 + 71}{17}$
$\mu = \frac{1377}{17}$
$\mu = 81$
The mean mark for Charlie’s learners on this unit is 81%.
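The arithmetic above can be checked with a few lines of Python. This is only a sketch; the variable names (`charlie_marks`, `total`, `n`, `mu`) are chosen here for illustration and are not part of the unit.

```python
# Charlie's seventeen marks, copied from the data set above.
charlie_marks = [81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71]

total = sum(charlie_marks)   # numerator: x1 + x2 + ... + xn
n = len(charlie_marks)       # denominator: the number of data points
mu = total / n               # population mean

print(total, n, mu)          # 1377 17 81.0
```

The same three steps (add, count, divide) give a sample mean when the data is a sample; only the symbol changes.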
Median
The mean is not the only way to describe the center. Another method is to use the “middle
value” of the data which is called the median. The median separates the higher half of the data
from the lower half.
The median can be calculated in the following manner.
1. Arrange the data points in order of size, from smallest to largest.
2. If the number of data points is odd, then the median is the data point in the middle of the
ordered list.
3. If the number of data points is even, then the median is the mean of the two data points
that share the middle of the ordered list.
Return to Charlie’s math marks. The median is calculated by following the procedure provided
below.
Order the data points from smallest to largest
69, 71, 74, 74, 77, 78, 78, 78, 79, 81, 82, 84, 86, 88, 91, 92, 95
Since we have an odd number of data points (n = 17), the median will be the middle data
point (the ninth) of the ordered list.
The median is 79.
Suppose we had another instructor, Angela, who had sixteen learners who completed the same
unit. She has recorded the marks that they made and worked out the mean and median.
{99, 94, 80, 63, 77, 99, 68, 62, 95, 78, 66, 93, 65, 64, 98, 95}
Mean:
$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
$\mu = \frac{99 + 94 + 80 + 63 + 77 + 99 + 68 + 62 + 95 + 78 + 66 + 93 + 65 + 64 + 98 + 95}{16}$
$\mu = \frac{1296}{16}$
$\mu = 81$
The mean mark for these learners on this unit is 81%.
Median:
Order the data points from smallest to largest
62, 63, 64, 65, 66, 68, 77, 78, 80, 93, 94, 95, 95, 98, 99, 99
Since the number of data points is even (n = 16), then the median is the mean of the two data
points that share the middle of the ordered list.
The two middle data points (the eighth and ninth in the ordered list) are 78 and 80.
$\text{Median} = \frac{78 + 80}{2} = 79$
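The three-step median procedure can be sketched in Python as follows. The function name `median_by_steps` is made up for this illustration; Python's own `statistics.median` would give the same results.

```python
def median_by_steps(data):
    """Median using the three steps described in this section."""
    ordered = sorted(data)              # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # step 2: odd count, take the middle point
        return ordered[mid]
    # step 3: even count, take the mean of the two middle points
    return (ordered[mid - 1] + ordered[mid]) / 2

charlie = [81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71]
angela = [99, 94, 80, 63, 77, 99, 68, 62, 95, 78, 66, 93, 65, 64, 98, 95]

print(median_by_steps(charlie))  # 79
print(median_by_steps(angela))   # 79.0
```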
Are the Mean and Median Enough?
These measures of central tendency often do not give us a complete understanding of the data set
because they do not give any indication how the data is spread out. This is especially evident
when we look at the means and medians for the two groups of math students previously
discussed. Although the means and medians are identical for Charlie's and Angela's learners, the
marks earned by the two groups are vastly different.
• In Charlie’s group, the majority of students earned marks between 71 and 88. There was
only one mark in the sixties and only three marks in the nineties. The marks are clustered
together.
• In Angela's group, learners could largely be divided into two groups: learners who did
very well (i.e. obtained marks in the 90's) and learners who found the material
challenging (i.e. obtained marks in the 60's). The marks are not clustered together as they
were with Charlie's learners.
Range of Marks    Number of Charlie's Learners    Number of Angela's Learners
60 to 65          0                               3
65 to 70          1                               3
70 to 75          3                               0
75 to 80          5                               2
80 to 85          3                               1
85 to 90          2                               0
90 to 95          2                               2
95 to 100         1                               5
It is important to note that our two measures of central tendency, mean and median, did not
reveal this important difference between the two data sets. We will address this issue in a later
section of this unit.
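The counts in the table above can be reproduced with a short Python sketch. It assumes each class includes its lower boundary but not its upper one (so a mark of 95 is counted in "95 to 100"), which matches the table; the function name `class_counts` is hypothetical.

```python
def class_counts(marks, start=60, stop=100, width=5):
    """Count how many marks fall in each class [lower, lower + width)."""
    counts = []
    for lower in range(start, stop, width):
        upper = lower + width
        # Lower boundary included, upper boundary excluded (an assumption).
        counts.append(sum(1 for m in marks if lower <= m < upper))
    return counts

charlie = [81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71]
angela = [99, 94, 80, 63, 77, 99, 68, 62, 95, 78, 66, 93, 65, 64, 98, 95]

print(class_counts(charlie))  # [0, 1, 3, 5, 3, 2, 2, 1]
print(class_counts(angela))   # [3, 3, 0, 2, 1, 0, 2, 5]
```

Under this convention a mark of exactly 100 would fall outside the last class, so the sketch would need adjusting for data that includes it.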
When are the Mean and Median Not Close to Each Other?
There are times when the mean and median may not be close to each other. One case is if an
outlier exists within the data set. An outlier is a data point that falls outside the overall pattern
of the data set. Consider the following data set where the data points have already been arranged
in ascending order.
{2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.9, 4.0, 4.2, 16.7}
Notice that all but one data point is between 2.8 and 4.2. The mean for this data set is 4.3 and the
median is 3.5. It is obvious that in this case the median is a far better measure of central
tendency than the mean. The outlier, 16.7, greatly influenced the mean to a point where it no
longer accurately represented the center of the data set.
The extreme sensitivity of the mean to even a single outlier and the insensitivity of the median to
outliers led to the development of trimmed means. Trimmed means are calculated by ordering
the data points from smallest to largest, deleting a selected number of points from both ends of
the ordered list, and finally averaging the remaining numbers. For example, to calculate the 5%
trimmed mean, the bottom 5% of the data points and the top 5% of the data points are deleted.
Consider the data set introduced above. We will calculate the 5% trimmed mean for this data
set. Since 5% of the number of data points (i.e. 5% of 15) is 0.75, we round to the nearest
whole number, which is 1. We therefore drop one data point from the bottom
and one data point from the top of the data set.
2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.9, 4.0, 4.2, 16.7
Finally we work out the mean of the remaining thirteen data points.
$\text{5\% trimmed mean} = \frac{3.0 + 3.0 + 3.1 + 3.2 + 3.4 + 3.4 + 3.5 + 3.5 + 3.6 + 3.7 + 3.9 + 4.0 + 4.2}{13} = 3.5$
Notice that this trimmed mean is equal to the median that we previously calculated. By
eliminating the effects of outliers, the median and resulting mean should be in close proximity.
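A trimmed mean can be sketched in Python as shown below. The function name `trimmed_mean` is made up for this illustration, and it assumes the rounding rule used in this section: the number of points dropped from each end is the given percentage of n, rounded to the nearest whole number.

```python
def trimmed_mean(data, percent):
    """Trimmed mean; percent is a whole number such as 5 or 10."""
    ordered = sorted(data)
    n = len(ordered)
    # Points to drop from each end: percent% of n, rounded to the nearest
    # whole number (integer arithmetic avoids floating-point surprises).
    k = (percent * n + 50) // 100
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

values = [2.8, 3.0, 3.0, 3.1, 3.2, 3.4, 3.4, 3.5, 3.5,
          3.6, 3.7, 3.9, 4.0, 4.2, 16.7]

plain_mean = sum(values) / len(values)
print(round(plain_mean, 1))                # 4.3
print(round(trimmed_mean(values, 5), 1))   # 3.5
```

Dropping only the smallest and largest points is enough here to bring the result back in line with the median of 3.5.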
The symbol $\bar{x}(T)$ is used to represent a trimmed mean. The only problem with this symbol is
that it does not indicate whether we are dealing with a 5%, 10%, 15% or 20% trimmed mean.
Example 1
Twenty-two runners of the 100 m dash were randomly selected from colleges and universities in
Canada. The time of each runner in the last competition was recorded. Of these runners, one
person had pulled a hamstring and another had tripped during their last competition. The times
in seconds are recorded below. Determine the mean, median, and 10% trimmed mean.
10.23 10.89 11.76 9.87 11.33 10.75 9.96 11.54 10.52 18.57 9.72
12.05 11.56 10.15 19.42 11.68 12.09 11.49 11.67 10.19 10.52 9.99
Answer:
$\text{Mean} = \frac{10.23 + 10.89 + 11.76 + \dots + 10.19 + 10.52 + 9.99}{22} = 11.63$
Median: Rearrange the data points from smallest to largest. Since we are dealing with an
even number of data points (22), then the median is the mean of the two data points
that share the middle of the ordered list.
9.72, 9.87, 9.96, 9.99,…, 10.75, 10.89, 11.33, 11.49,…, 12.05, 12.09, 18.57, 19.42
$\text{Median} = \frac{10.89 + 11.33}{2} = 11.11$
10% Trimmed Mean
Since 10% of the number of data points (i.e. 10% of 22) is 2.2, we round to the
nearest whole number, which is 2. We will now drop two data points from the
bottom and two data points from the top of the data set, and then work out the
mean of the remaining eighteen data points.
9.72, 9.87, 9.96, 9.99, 10.15,…, 11.76, 12.05, 12.09, 18.57, 19.42
$\text{10\% trimmed mean} = \frac{9.96 + 9.99 + 10.15 + \dots + 11.76 + 12.05 + 12.09}{18} = 11.02$
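The three answers in Example 1 can be checked with the Python sketch below. It uses `mean` and `median` from Python's standard `statistics` module; the helper `trimmed_mean` is a hypothetical function that follows the rounding rule described in this example.

```python
from statistics import mean, median

def trimmed_mean(data, percent):
    """Trimmed mean; percent is a whole number such as 5 or 10."""
    ordered = sorted(data)
    n = len(ordered)
    k = (percent * n + 50) // 100      # points dropped from each end
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

times = [10.23, 10.89, 11.76, 9.87, 11.33, 10.75, 9.96, 11.54, 10.52, 18.57, 9.72,
         12.05, 11.56, 10.15, 19.42, 11.68, 12.09, 11.49, 11.67, 10.19, 10.52, 9.99]

print(round(mean(times), 2))               # 11.63
print(round(median(times), 2))             # 11.11
print(round(trimmed_mean(times, 10), 2))   # 11.02
```

The two slow times (18.57 and 19.42) pull the mean above both the median and the trimmed mean.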
Mode
The mode of a set of data is the value in the set that occurs most frequently. For the following
data, the mode is 6 because it occurs more times than any other value.
{2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 9, 10}
Mode = 6
Many textbooks and websites refer to the mode as a measure of central tendency; this is
incorrect. Although the mode is often around the center of the data set when the points are
arranged from smallest to largest, this is not always the case. Consider the data we previously
examined concerning Charlie's and Angela's Graduate Math learners.
Data for Charlie's Learners
Order the data points from smallest to largest, and identify the data point that occurs most
frequently.
69, 71, 74, 74, 77, 78, 78, 78, 79, 81, 82, 84, 86, 88, 91, 92, 95
Mode = 78
Data for Angela's Learners
Order the data points from smallest to largest, and identify the data point(s) that occurs most
frequently.
62, 63, 64, 65, 66, 68, 77, 78, 80, 93, 94, 95, 95, 98, 99, 99
The data points 95 and 99 occur the most frequently; therefore we state that this data set is
bimodal.
Mode = 95 and 99
The mode for Charlie's data is close to the center of the data set; however, the modes for
Angela's data are not near the center.
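As a quick check, the modes above can be found with `multimode` from Python's standard `statistics` module (available in Python 3.8 and later). It returns every value tied for the highest frequency, so it handles bimodal data. The variable names are chosen here for illustration.

```python
from statistics import multimode

example = [2, 3, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 9, 10]
charlie = [81, 74, 91, 82, 79, 95, 78, 92, 86, 74, 78, 69, 84, 77, 88, 78, 71]
angela = [99, 94, 80, 63, 77, 99, 68, 62, 95, 78, 66, 93, 65, 64, 98, 95]

# sorted() is applied only so that tied modes print in ascending order.
print(sorted(multimode(example)))  # [6]
print(sorted(multimode(charlie)))  # [78]
print(sorted(multimode(angela)))   # [95, 99]
```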
Questions
Please use the appropriate symbols ($\bar{x}$, $\mu$, and $\bar{x}(T)$) when answering these questions.
1. A study regarding the size of winter wolf packs in regions of the United States, Canada, and
Finland was conducted. The following data from 18 randomly selected packs was obtained.
2  3  15  8  7  8  2  4  13  7  3  7  10  7  5  4  2  4
(a) Are we dealing with a sample or a population?
_____________________
(b) Determine the mean, median, and mode.
(c) Why would the researchers not likely use a trimmed mean with this data set?
2. A local cab company has a fleet of nine cars. The company kept the records for the amount
of money each vehicle required for a one week period. The data is shown below.
$125  $157  $210  $139  $182  $167  $143  $150  $162
(a) Are we dealing with a sample or a population?
_____________________
(b) Are we dealing with a numerical or categorical data set?
_____________________
(c) Determine the mean, median, and mode.
3. A magazine conducted a survey where they wished to understand the average class size of
first year courses at a local community college. They randomly selected 17 first year classes
and obtained the following numbers.
23  37  36  40  39  115  28  25  23  32  27  16  15  31  27  34  41
(a) Are we dealing with a sample or a population?
____________________
(b) Determine the mean, median, mode, 5% trimmed mean, and 10% trimmed mean.
(c) Why is it appropriate to use trimmed means in this situation?
(d) If this data set was comprised of 78 data points and we wanted to calculate a 5% trimmed
mean, how many data points would be dropped from the bottom and top of the data set?
4. A new subdivision outside of Halifax was constructed over the last few years. Barb wanted
to know what the average value of the new homes was. She was not prepared to look at the
assessed values of all 218 new homes. Instead she randomly selected 24 homes and recorded
their assessed values. These values in thousands of dollars are shown below.
267  265  226  254  231  221  246  252  253  241  261  589
243  269  267  253  287  320  221  264  257  249  226  267
(a) Calculate the mean, median, mode, and 5% trimmed mean.
(b) Which of these measures is not influenced or less influenced by extremely high or low
data points?
(c) Would a histogram or a bar graph be used with this data set?
5. In gymnastics and diving, several judges score each athlete. The final score for the athlete is
calculated by removing the high and low scores and averaging the remainder. Why do you
think they use this trimmed mean scoring method in gymnastics and diving?
Box and Whisker Plots
Box and whisker plots, also called box plots, are a quick graphic approach for examining one or
more sets of data. The plot gets its name because the middle portion is a rectangular box,
which typically has a line (whisker) extending from each end of the box.
[Box and whisker plot along a number line from 15 to 40, with the box and the two whiskers
labeled]
The box and whisker plot provides us with five critical pieces of information regarding the data
that was used to construct it. (Refer to the diagram below.)
• We are supplied with the minimum value in our data set. In this case, that value is 17.
• We are supplied with the maximum value in our data set. In this case, that value is 36.
• We are supplied with the median (or middle) of the data set. In this case, the median is
26.
• We are supplied with the lower quartile (also called first quartile or Q1). This value is
found by working out the median of the numbers below the median of the entire set of
data. The lower quartile is the number that 25% of the data is below. In this case, the
lower quartile is 21.
• We are supplied with the upper quartile (also called third quartile or Q3). This value is
found by working out the median of the numbers above the median of the entire set of
data. The upper quartile is the number that 25% of the data is above. In this case, the
upper quartile is 30.
[Box and whisker plot along a number line from 15 to 40, with labels for the minimum value
(17), lower quartile (21), median (26), upper quartile (30), and maximum value (36)]
Before we learn how to construct a box and whisker plot, we are going to look at a sample
question involving a real world context where we have to compare two plots.
Example 1
Two blood testing departments at different Nova Scotia hospitals recorded their patient wait
times in minutes. This data was used to construct the two box and whisker plots.
[Two box and whisker plots, one for Department A and one for Department B, along a number
line from 0 to 30 minutes]
How do the wait times compare at these two blood testing departments?
Answer:
Although the minimum value for Department B is 2 minutes less than the minimum value for
Department A, and the lower quartile for Department B is 1 minute less than the lower
quartile for Department A, the overall results for Department A are better. The median for
Department A is slightly better, and the upper quartile and maximum value for Department A
are much better than those for Department B. Department A appears to deliver a more
consistent level of service in terms of wait times; that is why the box and whiskers are shorter
for Department A's plot. We can say that the wait times are clustered closer together for
Department A than for Department B. To explain this further, just look at the boxes for the
two plots. Based on the first box, we can see that the middle 50% of Department A's patients are
served between 10 minutes and 16 minutes. Based on the second box, however, the middle 50% of
Department B's patients are served between 9 minutes and 21 minutes, a much
longer time span. We can also conclude that patients generally had shorter wait times at
Department A.
Making a Box and Whisker Plot
It is a six step process to construct a box and whisker plot.
(i) Arrange the data points in order of size, from smallest to largest.
(ii) Identify the minimum value and maximum value.
(iii) Determine the median.
(iv) Find the lower quartile by finding the median of the numbers below, but not including, the
median of the entire set of numbers.
(v) Find the upper quartile by finding the median of the numbers above, but not including, the
median of the entire set of numbers.
(vi) Draw your box and whisker plot along a number line using the values you found in steps (ii)
through (v).
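Steps (i) through (v) can be sketched in Python as shown below. The function names are made up for this illustration, and the quartiles follow the rule given in this section (the median of the values below or above the overall median, not including it). Other software, including some calculator and spreadsheet functions, may use slightly different quartile rules.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(data)                      # step (i)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                      # values below the median
    # For an odd count, skip the middle point itself.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0],                         # step (ii): minimum
            median_of_sorted(lower),            # step (iv): lower quartile
            median_of_sorted(ordered),          # step (iii): median
            median_of_sorted(upper),            # step (v): upper quartile
            ordered[-1])                        # step (ii): maximum

print(five_number_summary([22, 4, 11, 24, 18, 9, 19, 21, 13]))
# (4, 10.0, 18, 21.5, 24)
print(five_number_summary([10, 14, 21, 26, 16, 12, 14, 9, 17, 26]))
# (9, 12, 15.0, 21, 26)
```

The two data sets used here are the ones worked by hand in Example 2 and Example 3, so the output can be compared with those answers. Step (vi), drawing the plot, is left to pencil and paper or a graphing tool.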
Example 2
Construct a box and whisker plot for the following data.
22, 4, 11, 24, 18, 9, 19, 21, 13
Answer:
(i) Arrange from smallest to largest
4, 9, 11, 13, 18, 19, 21, 22, 24
(ii) Minimum Value = 4, Maximum Value = 24
(iii) Find the median (i.e. middle value).
4, 9, 11, 13, 18, 19, 21, 22, 24
Median = 18
(iv) Find the lower quartile. This is done by taking the lower 50% of the data, not including
the median from step (iii), and finding the median of these data points.
4, 9, 11, 13
$\text{Lower Quartile} = \frac{9 + 11}{2} = 10$
(v) Find the upper quartile. This is done by taking the upper 50% of the data, not including
the median from step (iii), and finding the median of these data points.
19, 21, 22, 24
$\text{Upper Quartile} = \frac{21 + 22}{2} = 21.5$
(vi) Draw the plot along a number line.
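[Box and whisker plot along a number line labeled from 5 to 25: minimum 4, lower quartile 10,
median 18, upper quartile 21.5, maximum 24]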
Example 3
Display the following as a box and whisker plot.
10, 14, 21, 26, 16, 12, 14, 9, 17, 26
Answers:
(i) Arrange from smallest to largest.
9, 10, 12, 14, 14, 16, 17, 21, 26, 26
(ii) Minimum Value = 9, Maximum Value = 26
(iii) Find the median.
9, 10, 12, 14, 14, 16, 17, 21, 26, 26
$\text{Median} = \frac{14 + 16}{2} = 15$
(iv) Find the lower quartile using the lower 50% of the data, not including the median.
9, 10, 12, 14, 14
Lower Quartile = 12
(v) Find the upper quartile using the upper 50% of the data, not including the median.
16, 17, 21, 26, 26
Upper Quartile = 21
(vi) Draw the plot along a number line.
[Box and whisker plot along a number line labeled from 5 to 25: minimum 9, lower quartile 12,
median 15, upper quartile 21, maximum 26]
Questions
1. Construct a box and whisker plot for each of the following sets of data.
(a) 30, 15, 6, 24, 19, 15, 17, 21, 20, 11, 9
Remember to start by reorganizing the data.
[Blank number line from 5 to 30]
(b) 45, 46, 37, 52, 33, 34, 43, 43, 48, 50, 49, 43, 46, 40
Remember to start by reorganizing the data.
[Blank number line from 25 to 55]
(c) 31, 26, 38, 25, 24, 29, 31, 37, 38, 30, 40, 27, 24, 24, 31, 26, 33
[Blank number line from 20 to 45]
(d) 38, 37, 40, 28, 34, 36, 35, 41, 38, 35
[Blank number line from 20 to 45]
2. A reaction time experiment is conducted in several adult education classrooms. In the
experiment one student releases a ruler and a second student tries to grasp it as quickly as
possible. The distance that the ruler drops is one way to measure the second student's
reaction time. For example, if Student A's ruler only drops 7 cm compared to Student B's
ruler that drops 12 cm, then we could say that Student A has a better reaction time.
(a) Each member of Mrs. Leck's math class participated in the experiment. The following
data was collected. Construct a box-and-whisker plot.
18  22  10  19  12  21  7  16  22  20  9  20  11
[Blank number line from 5 to 30]
(b) Mr. Porter's class and Mr. Churchill's class participated in the same experiment. A
box-and-whisker plot was constructed for both classes.
[Two box-and-whisker plots, one for Mr. Porter's Class and one for Mr. Churchill's Class,
along a number line from 5 to 30]
How do the two classes compare in terms of reaction times?
(c) Mrs. Lowe's class and Mr. Vroom's class participated in the same experiment. The
following data was collected.
Mrs. Lowe's Class
9  17  6  12  15  20  10  17  13  19  20  10
Mr. Vroom's Class
16  20  23  10  23  18  6  21  17  23  15
Construct two box-and-whisker plots.
[Blank number line from 5 to 30]
How do the two classes compare in terms of reaction times?
(d) Mrs. Burchill's class and Mr. Rhodenizer's class participated in the same experiment.
The following data was collected.
Mrs. Burchill's Class
16  7  12  5  21  13  16  10  18  11  8  19  14  11
Mr. Rhodenizer's Class
9  14  13  19  8  16  11  22  14  6  11
Construct two box-and-whisker plots.
[Blank number line from 5 to 30]
How do the two classes compare in terms of reaction times?
Using Technology to Make Box and Whisker Plots
The TI-83 and TI-84 graphing calculators can draw box and whisker plots. This is particularly
useful when we have lots of data. In this example we are going to use two sets of data to create
two box and whisker plots at the same time.
First Data Set
5.8   3.9   11.0  9.3  5.3  4.5   14.5  6.1  16.1  7.1   12.7  6.9   4.7
4.5   7.2   6.0   6.2  4.7  10.2  3.2   8.0  5.2   15.9  7.8   13.2  6.7
Second Data Set
7.3   10.2  8.3   9.9  5.0  9.4  9.7  7.5  8.3  8.1  7.9   8.6  4.8  3.1
13.2  7.2   12.6  7.7  9.0  6.9  8.7  4.9  8.2  8.5  10.0  7.7  7.2  4.9
Procedure:
1. Enter the First Data Set in List 1 and the Second Data Set in List 2
STAT > EDIT > Edit > Enter first data set in L1 > Enter second data set in L2
2. Turn on the Plots
STAT PLOT > Select Plot 1 > Select On, Box and Whisker, and L1 > STAT PLOT
> Select Plot 2 > Select On, Box and Whisker, and L2
3. Draw the Box and Whisker Plots
ZOOM > ZoomStat > TRACE > Use the right, left, up and down arrow buttons to see the
different values on the box and whisker plots.
Questions
In the following questions you will be asked to draw histograms as well as box-and-whisker
plots. You are required to draw the histograms by hand and the box and whisker plots using
technology.
1. Mrs. Ross is coaching her daughter's junior high basketball team. She has three players to
choose from on the bench. The statistics for each of the players are shown below. You are going
to use your knowledge of statistics to help Mrs. Ross in making a selection.
Tanya:    8  4  20  22  25  14  23  24  2  10  23  25  16  2
Barb:     22  6  12  18  18  12  25  14  13  20  8  20  18  16
Suzette:  30  29  11  16  4  5  20  6  8  22  9  6  28  11  25  9  9
(a) Using technology, construct three box-and-whisker plots.
[Blank number line from 0 to 30]
(b) Determine the mean score for each player.
(c) Draw three histograms for the three sets of data. Note that the classes will include the
first number but not the second. For example the class 0 to 5 includes 0, but not 5.
Class       Tanya Frequency    Barb Frequency    Suzette Frequency
0 to 5
5 to 10
10 to 15
15 to 20
20 to 25
25 to 30
30 to 35
(d) Which player has two distinct clusters within their data?
__________________
(e) Who is the best player?
__________________
(f) Who is the most consistent player?
__________________
(g) What range of scores would be considered Tanya's top 25%?
__________________
(h) What range of scores would be considered Barb's bottom 25%?
__________________
(i) What range of scores would be considered Suzette's top 50%?
__________________
2. Mrs. Tuttle-Comeau is an assistant coach for her son's high school track and field team. At
the last track meet (Track Meet A) she gathered the following data regarding 30 sprinters in
the 100 m race. Each of these pieces of data represents the best time each of the high school
sprinters obtained during this meet.
11.0  12.2  11.5  12.5  10.6  12.2  12.1  12.8  11.0  11.2
13.0  11.6  12.2  12.2  10.9  12.7  11.2  12.0  11.4  13.2
10.7  13.7  12.2  11.5  10.9  16.2  11.1  12.9  11.9  12.2
(a) Determine the mean time.
(b) Construct a box and whisker plot for this data.
[Blank number line from 10 to 16]
(c) Construct a histogram. Note that the classes will include the first number but not the
second. For example the class 10 to 11 includes 10, but not 11.
Class       Frequency
10 to 11
11 to 12
12 to 13
13 to 14
14 to 15
15 to 16
16 to 17
(d) Are there two distinct clusters within this data?
__________________
(e) What range of times would place an individual in the top 50% of the competitors?
(f) What range of times would place an individual in the bottom 25% of the competitors?
(g) What range of times would place an individual in the top 25% of the competitors?
(h) Here's a box-and-whisker plot for another track meet (Track Meet B). Which track meet,
A or B, resulted in a greater percentage of strong performances? How did you arrive at
this answer?
[Box-and-whisker plot for Track Meet B along a number line from 10 to 16]
3. Body mass index (BMI) is a calculation that uses an individual's height and weight to
estimate how much body fat they have. In Canada, BMI is recorded in kg/m², and the
result is then matched with one of four categories designated by Health Canada. These
categories are:
• underweight (BMIs less than 18.5);
• normal weight (BMIs 18.5 to 24.9);
• overweight (BMIs 25 to 29.9), and
• obese (BMIs 30 and over).
The BMIs for adult learners from two different college classes were calculated and recorded.
Class A
29.3   27.3   24.3   23.5   27.2   28.6   20.2   24.6   27.3   29.4   21.8
25.2   27.9   28.5   26.8   23.1   28.4   26.9   22.9   28.1   26.7   22.5
Class B
30.2   23.6   21.4   18.8   17.2   24.2   28.6   19.6   20.9   32.7
26.8   23.8   20.7   18.5   30.8   31.4   21.8   22.5   17.8   18.3
Using technology, construct two box and whisker plots and record the results below.
[Number line for the two box and whisker plots, marked from 15 to 30]
How do the BMIs for the two classes compare?
Standard Deviation
Measures of central tendency (mean, median, and mode) do not give us any indication of how the data is
spread out. Consider the following two sets of data.
First Data Set: 13, 14, 15, 15, 15, 16, 17
Second Data Set: 10, 12, 13, 15, 17, 18, 20
The mean for both of these data sets is 15; however, the individual pieces of data in these sets are
considerably different. In the first set, the numbers range from 13 to 17, and clearly cluster
around the number 15. In the second set the numbers range from 10 to 20 and tend to be more
spread out around the mean. The dispersion is far greater in the second set, than in the first.
Standard deviation is one way of measuring dispersion. If the standard deviation is low, then
the data clusters around the mean. If the standard deviation is high, then the data is spread out
around the mean. Without getting into the actual calculations, the standard deviation for the first
data set is 1.20 and the standard deviation for the second data set is 3.30. The larger number
indicates greater dispersion.
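As a quick check of the two values quoted above, here is a minimal sketch that assumes Python and its standard `statistics` module are available (this resource itself works with paper and pencil and a TI calculator, so the tool choice is only an illustration). The function `pstdev` computes the population standard deviation.

```python
from statistics import mean, pstdev

first_set = [13, 14, 15, 15, 15, 16, 17]
second_set = [10, 12, 13, 15, 17, 18, 20]

for name, data in (("First", first_set), ("Second", second_set)):
    # pstdev divides by the number of data values n (population),
    # not by n - 1 (sample).
    print(f"{name} data set: mean = {mean(data)}, "
          f"standard deviation = {pstdev(data):.2f}")
```

Both data sets report the same mean, while the second reports the larger standard deviation, which matches the description of its greater dispersion.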
Calculating Standard Deviation
Before we get to the calculations, we have to remind you of an important point and introduce two
formulas. In the unit introduction we stated that this unit would focus on populations, rather than
samples. A population is the set representing all measurements of interest to an investigator
while a sample is simply a subset of the measurements from the population chosen at random.
We learned that the mean is calculated by adding all the data values and then dividing by the
number of data values. This can be expressed using the following formula.
$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
where µ (mu) is the population mean
The formula for population standard deviation, σ (sigma), is shown below. You are not
expected to memorize this formula.
$$\sigma = \sqrt{\frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_n - \mu)^2}{n}}$$
This formula requires that you complete six steps.
Step 1: Find the mean; $\mu$.
Step 2: Calculate the difference between each data value and the mean; $x_i - \mu$.
Step 3: Square those differences found in Step 2; $(x_i - \mu)^2$.
Step 4: Add the squared differences; $(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_n - \mu)^2$.
Step 5: Divide the sum from Step 4 by the number of data values.
Step 6: Square root the value from Step 5.
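The six steps can also be written out as a short program. The sketch below is only an illustration in Python; the function name `population_std_dev` and the variable names are assumptions made for this example, not part of the resource. It is applied to the first data set from the start of this section.

```python
import math

def population_std_dev(values):
    """Population standard deviation, following the six steps above."""
    n = len(values)
    mu = sum(values) / n                      # Step 1: find the mean
    differences = [x - mu for x in values]    # Step 2: subtract the mean
    squared = [d ** 2 for d in differences]   # Step 3: square each difference
    total = sum(squared)                      # Step 4: add the squares
    quotient = total / n                      # Step 5: divide by n
    return math.sqrt(quotient)                # Step 6: take the square root

sigma = population_std_dev([13, 14, 15, 15, 15, 16, 17])
print(round(sigma, 2))
```

Keeping each step on its own line mirrors the table method used in the examples that follow, where only a small part of the calculation is done at a time.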
The easiest way to learn how to use this formula (i.e. complete the six steps) is to construct a
table where only small portions of the calculation are completed at any one time.
Example 1
Determine the standard deviation for the following set of data.
10, 12, 13, 15, 17, 18, 20
Answer:
Find the mean. (Step 1)
$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
$$\mu = \frac{10 + 12 + 13 + 15 + 17 + 18 + 20}{7}$$
$$\mu = 15$$
Construct the table.
x_i     x_i − µ (Step 2)     (x_i − µ)² (Step 3)
10      -5                   25
12      -3                   9
13      -2                   4
15      0                    0
17      2                    4
18      3                    9
20      5                    25
                             Sum = 76 (Step 4)
$$\sigma = \sqrt{\frac{76}{7}}$$
$$\sigma = 3.3$$ (Steps 5 and 6)
The population standard deviation is 3.3.
Example 2
Mrs. Gillis teaches math to adults. At the end of the year she examines the final marks for all of
her students who have completed the course. She wants to work out the standard deviation of
those marks.
87   72   91   82   74   93   75   83   78   75
Answer:
Find the mean.
$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
$$\mu = \frac{87 + 72 + 91 + 82 + 74 + 93 + 75 + 83 + 78 + 75}{10}$$
$$\mu = 81$$
Construct the table.
x_i     x_i − µ     (x_i − µ)²
87      6           36
72      -9          81
91      10          100
82      1           1
74      -7          49
93      12          144
75      -6          36
83      2           4
78      -3          9
75      -6          36
                    Sum = 496
$$\sigma = \sqrt{\frac{496}{10}}$$
$$\sigma = 7.04$$
The population standard deviation is 7.04.
Questions
1. Determine the standard deviation for the following data.
25   32   24   28   31   28
µ =
x_i     x_i − µ     (x_i − µ)²
2. Determine the standard deviation for the following data.
3.7   4.3   5.0   4.6   4.0   4.7   3.9   4.2
µ =
x_i     x_i − µ     (x_i − µ)²
3. Two data sets have been provided.
15   14   13   18   16   13   16   15   15
17   15   16   14   11   19   16   11   16
(a) Calculate the standard deviation for each data set.
µ =                                  µ =
x_i     x_i − µ     (x_i − µ)²       x_i     x_i − µ     (x_i − µ)²
(b) The standard deviations are different for the two data sets. What is this telling you?
4. Barb, a math instructor, recorded the height in centimetres of all of the male students in her
Level IV math courses. She obtained the following measurements.
181   173   184   183   190   180   186   176   185
(a) What is the median for this data?
(b) What is the mean for this data?
(c) Is Barb dealing with a categorical or numerical data set?
(d) Determine the standard deviation.
xi
(e) Another instructor at a different campus also has 9 male learners in his Level IV Math
courses. He measured their heights. He found the mean to be 182 cm with a standard
deviation of 6.4 cm. Based on these results, what can you say about the heights of this
instructor’s male learners compared to Barb’s male learners?
(f) A third instructor at another campus also has 9 male learners in her Level IV Math
courses. She measured their heights. She found the mean to be 179 cm with a standard
deviation of 4.8 cm. Based on these results, what can you say about the heights of this
instructor’s male learners compared to Barb’s male learners?
5. Without attempting any calculations, match each standard deviation with the appropriate
histogram. Please note that all of the histograms are drawn at the same scale.
Standard Deviations:
(a) 0.69    Matches with _____
(b) 1.40    Matches with _____
(c) 3.34    Matches with _____
(d) 3.62    Matches with _____
Histograms:
[Figure: histograms, beginning with (i), all drawn at the same scale]
6. Create two data sets that meet all of the following conditions.
• They have at least six pieces of data.
• They must have a mean of 10.
• They have standard deviations that are quite different.
Using Technology to Calculate Population Standard Deviation
In the last section we learned how to work out the population standard deviation ( σ ) using paper
and pencil. The TI graphing calculators can calculate this along with several other measures we
have been exposed to in this unit. Using such technology is particularly useful when we are
dealing with a large number of data points.
Example
Tylena was teaching an evening class comprised of 30 adult learners. She asked them all to
complete a series of thirty basic math problems. She recorded how long it took for each learner
to complete the task in minutes. The data is shown below.
40   60   46   56   68   44   51   53   42   58
55   60   48   45   52   52   38   55   49   46
56   51   50   40   35   50   54   64   50   45
(a) Draw a histogram using technology. Use class widths of 5 starting at 35.
(b) Determine the mean time.
(c) Determine the standard deviation.
(d) Determine the median.
Answers:
Step 1: Enter the Data in the Calculator
STAT > Edit > Enter the data in L1
(If data already exists in L1, then move the cursor up so L1 is highlighted, press
CLEAR, and move the cursor back down.)
Step 2: Draw the Histogram
STATPLOT > Select Plot 1 > Turn on the plot, select histogram, Xlist should be L1 and
Freq should be 1 > WINDOW > Set Xmin at 35, Xmax at 70, Xscl at 5, Ymin at 0, Ymax at 10,
Yscl at 1 > GRAPH > TRACE > Use the right and left arrows
Note: The Xmin on the Window setting is the starting point for the first class and the Xscl
sets the class width. In this case the first class is 35 - 40.
STAT > CALC > 1-Var Stats > Enter the List (typically L1) > ENTER
The calculator does not report the population mean ( µ ); however, as we previously learned,
the formulas for the sample mean and the population mean are the same. The calculator reports the
sample mean x̄, but we know that we are actually dealing with a population mean of 50.4
minutes. We are also asked to determine the standard deviation, which is actually the
population standard deviation ( σ ). This calculator uses the symbol σx, rather than σ, to
represent the population standard deviation. Therefore our population standard deviation is 7.5
minutes. To find the median, scroll down using the down arrow while still on the 1-Var Stats
results until you find Med. The median in this case is 50.5.
()
(b) population mean ( µ ) = 50.4 minutes
(c) population standard deviation ( σ ) = 7.5 minutes
(d) median = 50.5 minutes
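The calculator results can be cross-checked with other technology. The sketch below assumes Python and its standard `statistics` module; the helper `class_frequencies` is a hypothetical name written for this illustration. It counts each class so that it includes its first number but not its second, the same convention used for histograms earlier in this unit.

```python
from statistics import mean, median, pstdev

times = [40, 60, 46, 56, 68, 44, 51, 53, 42, 58,
         55, 60, 48, 45, 52, 52, 38, 55, 49, 46,
         56, 51, 50, 40, 35, 50, 54, 64, 50, 45]

def class_frequencies(data, start, width, n_classes):
    """Count how many values fall in each class [low, low + width)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

# (a) histogram classes of width 5 starting at 35 (35-40 up to 65-70)
freqs = class_frequencies(times, start=35, width=5, n_classes=7)
for i, f in enumerate(freqs):
    low = 35 + 5 * i
    print(f"{low} to {low + 5}: {f}")

print("mean:", round(mean(times), 1))                   # (b)
print("standard deviation:", round(pstdev(times), 1))   # (c) population
print("median:", median(times))                         # (d)
```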
Questions
1. Provincial governments keep records of the number of young offenders who are incarcerated
each year. The incarceration rates vary greatly from province to province. In 2006 Nova
Scotia reported an incarceration rate of 9.91. That means that 9.91 young persons out of
10 000 young persons were incarcerated. Below you will find the incarceration rates for the
provinces and territories for 2006. (Source: Statistics Canada)
Province   Rate
YT         8.57
NT         46.12
NU         20.49
BC         4.45
AB         7.18
SK         24.54
MB         21.25
ON         7.51
QC         3.89
NB         10.20
PE         7.21
NS         9.91
NL         11.93
(a) Are we dealing with a population or a sample? Explain.
(b) Using technology draw a histogram showing the distribution of incarceration rates. Use
class widths of 5 starting at 0.
(c) Determine the mean, median, and standard deviation.
(d) There is a substantial difference between the mean and median. Why is this so?
2. Below you will find a list of Prime Ministers of Canada since Confederation in 1867. We
have also been supplied with their age upon first taking office as PM.
Prime Minister (PM)            First Term Starts   Age
John A. MacDonald              1867                52
Alexander Mackenzie            1873                51
John Abbott                    1891                70
John Sparrow Thompson          1892                48
Mackenzie Bowell               1894                70
Charles Tupper                 1896                74
Wilfrid Laurier                1896                54
Robert Borden                  1911                57
Arthur Meighen                 1920                46
William Lyon Mackenzie King    1921                47
Richard Bennett                1930                60
Louis St-Laurent               1948                66
John Diefenbaker               1957                61
Lester Pearson                 1963                65
Pierre Trudeau                 1968                48
Joe Clark                      1979                39
John Turner                    1984                55
Brian Mulroney                 1984                45
Kim Campbell                   1993                46
Jean Chretien                  1993                59
Paul Martin                    2003                65
Stephen Harper                 2006                46
(a) Are we dealing with a population or a sample? Explain.
(b) Using technology draw a histogram showing the distribution of ages for PMs first taking
office. Use class widths of 5 starting at 35.
(c) Determine the mean PM age for first taking office.
(d) Determine the standard deviation.
(e) Determine the median.
(f) What can you conclude based on the histogram and standard deviation?
3. Cholesterol is a waxy, fat-like substance found in all cells of the body. Our bodies need it to
make hormones, vitamin D, and substances used in digestion. However, cholesterol,
specifically low density lipoprotein (LDL) cholesterol, in high amounts is dangerous to one's
health. The following chart looks at various cholesterol ranges and their classifications. The
units of measure are millimoles per litre (mmol/L).
LDL Cholesterol Levels    Classification
below 2.6                 desirable
from 2.6 to 3.3           near optimal
from 3.4 to 4.1           borderline
from 4.2 to 4.9           high
above 4.9                 too high
Dr. Gillis is looking through the records for all her male patients over the last year who are
between the ages of 50 and 60 years. They have all had blood work and she records all the
LDL cholesterol levels for these patients in the chart below.
4.1   5.2   3.6   2.9   3.4   2.7   5.1   5.3   2.4   2.6
2.5   2.8   2.5   3.0   3.5   4.6   3.8   4.9   4.8   3.3
4.4   3.2   2.4   3.0   2.3   3.7   4.2   3.7   3.3   3.4
(a) Using technology draw a histogram showing the distribution of LDL cholesterol levels.
Use class widths of 0.8 starting at 1.8.
(b) Determine the mean LDL cholesterol levels for Dr. Gillis' male patients between the ages
of 50 and 60 years.
(c) Determine the standard deviation.
(d) Determine the median.
(e) What can you conclude based on the histogram and standard deviation?
Distributions
A frequency polygon is the shape that is formed when midpoints of the tops of the bars on a
histogram are joined by straight lines.
In this case, the frequency polygon forms a bell-shaped curve that is associated with a population
that follows a normal distribution. Many variables observed in nature, including heights,
weights, and reaction times, follow normal distributions. Consider the heights of female students
at college. There are a few women who are less than 5 feet tall, a few who are taller than 6 feet,
but the majority of the women are probably between 5’3” and 5’8”. We would expect a normal
distribution for the heights of women attending college.
Let’s consider a population that results in a normal distribution. The normal curve will be
centered about the population mean ( µ ). The standard deviation ( σ ) determines the extent to
which the curve spreads out. If we look at the two normal distributions supplied below, labelled
A and B, we can see that both distributions are centered around the same value, 65. That means
that the mean for both of these populations is 65. The standard deviations, although not
supplied, are not the same. The standard deviation for normal distribution A must be lower than
that for distribution B because curve A is narrower, meaning that the data points are more
clustered around the mean.
[Figure: normal distributions A and B, both centered at 65]
Please note that the horizontal axis is labeled x. This indicates that we are looking at the
distribution of the individual data points denoted by the symbol x.
Do not assume that we have to have a perfectly symmetrical bell-shaped distribution to have a
normal distribution. The histogram on the right would create a frequency polygon which is only
almost symmetrical, but we would still say that we are dealing with a normal distribution.
For this course, most of our time will be spent examining situations that follow normal
distributions. However, it is important to understand that other types of distributions exist.
These other types are shown below. A uniform distribution occurs when every class has equal
frequency. A skewed distribution occurs when one tail is much larger than the other tail. A
bimodal distribution occurs when two classes with the largest frequencies are separated by at
least one class.
[Figures: Uniform Distribution, Skewed Left Distribution, Skewed Right Distribution, Bimodal Distribution]
Question
1. Based on the situation, what type of distribution (normal, uniform, bimodal,…) would you
likely obtain?
Distribution Type
(a) You randomly select 100 students at an elementary school and
each must report their grade level. There are two classes at each
grade level and between 22 and 26 students in each class. What
would the distribution of grade levels look like?
(b) Two groups of athletes are running the 100 m dash. One group
is comprised of males 12 years of age or younger, and the other
is comprised of males between 16 and 20 years of age. You
randomly select 150 athletes and ask them to report their time
for the 100 m dash. What would the distributions of times look
like?
(c) Mrs. Chopra teaches one of the three grade six classes.
Normally the administration tries to distribute the strongest math
students evenly between the three classes. That did not occur
this year and now Mrs. Chopra has a large portion of strong
math students in her class. If her class was asked to complete a
fair math test, what would the distribution of marks look like?
(d) You randomly select 100 females between the ages of 20 and 29
and record their heights. What would the distribution of heights
look like?
(e) A college instructor had what he described as an average class of
students. From his perspective there were a few weak students,
a few strong students but the majority of the students were of
average ability. He gave the class an extremely challenging test
where only the strongest students could maintain good marks,
ranging from 75% to 95%. The rest of the students did poorly
where many resoundingly failed the test. What would the
distribution of marks for this test look like?
(f) You spin the following spinner 300
times recording how many times you
obtain each of the results (1, 2, 3, 4).
What would the distribution of results
look like?
[Figure: spinner with sections labelled 1, 2, 3, and 4]
(g) A nursing student working at the children's hospital looks at the
birth weights of all babies born in the hospital during June, July,
and August. What would the distribution of birth weights look
like?
(h) Eastern American Toads, common in Nova Scotia, enter the
world as small dark polliwogs, become miniature toads, and
finally mature to be adult toads. What would the distribution of
ages for Eastern American Toads of all forms (polliwogs to
adults) look like?
(i) A personal trainer at a coed gym recorded the maximum
resistance people would set on a particular piece of exercise
equipment over a one month period. What would the
distribution of resistance settings look like?
(j) A kinesiologist is recording the grip strength of 250 randomly
selected males between the ages of 25 and 35. What would the
distribution of grip strengths look like?
Normal Distributions and the 68-95-99.7 Rule
In the last section we learned about symmetrical bell-shaped distributions called normal
distributions. We also mentioned that the normal curve will be centered about the population mean
( µ ), and that the standard deviation ( σ ) determines the extent to which the curve spreads out.
Lower standard deviations result in taller, narrower curves. There is something else that is
important to learn about the normal curve. It is the 68-95-99.7 rule.
According to the 68-95-99.7 rule, in any bell-shaped distribution, the following holds true.
• Approximately 68% of the data points will lie within one standard deviation of the mean.
• Approximately 95% of the data points will lie within two standard deviations of the
mean.
• Approximately 99.7% of the data points will lie within three standard deviations of the
mean.
Let's describe this rule again using the proper symbols that we use for populations. According to
the 68-95-99.7 rule, in any bell-shaped distribution of a population, the following holds true.
• Approximately 68% of the data points are between µ − σ and µ + σ .
• Approximately 95% of the data points are between µ − 2σ and µ + 2σ .
• Approximately 99.7% of the data points are between µ − 3σ and µ + 3σ .
Let’s see how this rule applies to a population with a normal distribution where the population
mean ( µ ) is 40 and the population standard deviation ( σ ) is 10. This distribution is shown
below. Notice that it is centered about the mean.
For this population we would expect that approximately 68% of the data points would be
between 30 ( µ − σ or 40-10) and 50 ( µ + σ or 40+10). We would expect that approximately
95% of the data points would be between 20 ( µ − 2σ ) and 60 ( µ + 2σ ). Finally we would
expect approximately 99.7% of the data points to be between 10 ( µ − 3σ ) and 70 ( µ + 3σ ).
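The three intervals for this population can be generated directly from µ and σ. The sketch below assumes Python and the `NormalDist` class from its standard `statistics` module, which is not a tool used in this resource. It also reports the share of a perfectly normal population that falls in each interval; those shares come out close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with the word "approximately".

```python
from statistics import NormalDist

mu, sigma = 40, 10
population = NormalDist(mu, sigma)

intervals = []
shares = []
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    # Share of a normal population lying between low and high
    share = population.cdf(high) - population.cdf(low)
    intervals.append((low, high))
    shares.append(share)
    print(f"within {k} standard deviation(s): {low} to {high}, about {share:.1%}")
```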
Let's take what we just learned and expand upon it. Consider the following statements for a
normal population.
• If 68% of the data points are found between µ − σ and µ + σ , then 34% of the data
points would be between µ and µ + σ .
• If 68% of the data points are found between µ − σ and µ + σ , then 34% of the data
points would be between µ − σ and µ .
[Figure: normal curve showing 68% of the data between µ − σ and µ + σ , with 34% on each side of µ ]
If we extend this line of thinking, we can state the following.
• If 95% of the data points are found between µ − 2σ and µ + 2σ , then 47.5% of the data
points would be between µ and µ + 2σ .
• If 95% of the data points are found between µ − 2σ and µ + 2σ , then 47.5% of the data
points would be between µ − 2σ and µ .
• If 99.7% of the data points are found between µ − 3σ and µ + 3σ , then 49.85% of the
data points would be between µ and µ + 3σ .
• If 99.7% of the data points are found between µ − 3σ and µ + 3σ , then 49.85% of the
data points would be between µ − 3σ and µ .
Hopefully it makes sense that 50% of the data points should be above the mean, and 50% of the
data points must be below the mean.
It should also be noted that these values (68%, 95%, 99.7%, 34%, 47.5%,…) can be expressed as
probabilities. Probability is the chance that something will happen - how likely it is that some
event will occur. Referring back to our normal distribution, there is a 0.68 probability that a
randomly selected data point can be found within one standard deviation of the mean (i.e. from
µ − σ to µ + σ ).
Example 1
For a normal population with a mean of 15 and standard deviation of 2, what percentage of the
data points would measure
(a) between 15 and 19?
(b) between 13 and 21?
(c) between 11 and 13?
Answers:
(a) This question could be restated. It would read, “What percentage of the data points
would be between µ and µ + 2σ ?” (Reason: 15 is µ , and 19 is 2 σ to the right of µ )
[Figure: normal curve with the region from 15 ( µ ) to 19 ( µ + 2σ ) shaded and labelled 47.5%]
Therefore approximately 47.5% of the data points will be between 15 and 19.
(b) This question could be restated. It would read, “What percentage of the data points
would be between µ − σ and µ + 3σ ?”
[Figure: normal curve with 34% shaded between 13 ( µ − σ ) and 15 ( µ ), and 49.85% shaded between 15 ( µ ) and 21 ( µ + 3σ )]
Therefore approximately 83.85% (34% + 49.85%) of the data points will be between 13
and 21.
(c) This question could be restated. It would read, “What percentage of the data points
would be between µ − 2σ and µ − σ ?”
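[Figure: normal curve marking 11 ( µ − 2σ ), 13 ( µ − σ ), and 15 ( µ ); 47.5% lies between 11 and 15, and 34% lies between 13 and 15]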
Therefore approximately 13.5% (47.5%-34%) of the data points will be between 11 and
13.
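The reasoning in Example 1 (add the pieces when the interval crosses the mean, subtract when both ends are on the same side) can be sketched as a small lookup. This is only an illustration in Python; the names `HALF_PERCENT`, `signed_percent_from_mean`, and `percent_between` are hypothetical, and the sketch only handles values that are exactly 0, 1, 2, or 3 standard deviations from the mean.

```python
# Percentage of data between the mean and k standard deviations,
# taken from the 68-95-99.7 rule (half of 68, 95 and 99.7).
HALF_PERCENT = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def signed_percent_from_mean(x, mu, sigma):
    """Percent between the mean and x; negative when x is below the mean."""
    k = (x - mu) / sigma
    if k != int(k) or abs(k) > 3:
        raise ValueError("x must be 0, 1, 2 or 3 whole standard deviations from the mean")
    k = int(k)
    percent = HALF_PERCENT[abs(k)]
    return percent if k >= 0 else -percent

def percent_between(low, high, mu, sigma):
    """Approximate percentage of data points between low and high."""
    return round(signed_percent_from_mean(high, mu, sigma)
                 - signed_percent_from_mean(low, mu, sigma), 2)

print(percent_between(15, 19, mu=15, sigma=2))   # part (a)
print(percent_between(13, 21, mu=15, sigma=2))   # part (b)
print(percent_between(11, 13, mu=15, sigma=2))   # part (c)
```

Treating percentages below the mean as negative lets one subtraction cover both cases: an interval that crosses the mean ends up added, and an interval on one side ends up subtracted.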
Example 2
The quality control officer at a cereal factory knows that the mean weight for the cereal in their
regular size box is 461 grams with a standard deviation of 6 grams.
(a) What is the probability of randomly choosing a cereal box off the assembly line that weighs
between 461 grams and 467 grams?
(b) What is the probability of randomly choosing a cereal box off the assembly line that weighs
between 455 grams and 479 grams?
(c) What is the probability of randomly choosing a cereal box off the assembly line that weighs
between 443 grams and 449 grams?
(d) What is the probability of randomly choosing a cereal box off the assembly line that weighs
more than 455 grams?
(e) If we randomly chose 800 boxes, how many would we expect to be between 449 grams and
473 grams?
Answers:
(a) Attack this logically.
• We were told that µ is 461, and that σ is 6.
• We were told that we are dealing with boxes between 461 and 467 grams. Notice
that 467 is 6 (or one standard deviation) away from 461 ( µ ). That means that 467 is
actually µ + σ .
• Let's find the percentage of data points that are between µ and µ + σ . The answer is
34%.
• Now convert that percentage to a probability. The probability is 0.34.
(b) Think logically.
• 455 is one standard deviation to the left of the mean, and therefore can be expressed
as µ − σ .
• 479 is three standard deviations to the right of the mean and therefore can be
expressed as µ + 3σ .
• We actually need to find the percentage of boxes that are between µ − σ and
µ + 3σ .
• We know that 34% of the data points are between µ − σ and µ . We also know that
49.85% of the data points are between µ and µ + 3σ . Therefore we can conclude
that 83.85% (34% + 49.85%) of the data points are between µ − σ and µ + 3σ .
• Convert 83.85% to a probability of 0.8385. Based on this number, we can say that
there is a very high chance that a randomly selected cereal box will have a weight
between 455 g and 479 g.
(c) Think logically.
• 443 is three standard deviations to the left of the mean, and therefore can be
expressed as µ − 3σ .
• 449 is two standard deviations to the left of the mean, and therefore can be expressed
as µ − 2σ .
• We actually need to find the percentage of boxes that are between µ − 3σ and
µ − 2σ .
• We know that 49.85% of the data points are between µ − 3σ and µ . We also know
that 47.5% of the data points are between µ − 2σ and µ . Therefore we can conclude
that 2.35% (49.85% - 47.5%) of the data points are between µ − 3σ and µ − 2σ .
• Convert 2.35% to a probability of 0.0235. Based on this number, we can say that
there is a very slight chance that a randomly selected cereal box will have a weight
between 443 g and 449 g.
(d) Think logically.
• 34% of the data points are between 455 ( µ − σ ) and 461 ( µ ).
• 50% of the data points are greater than 461 ( µ ).
• Therefore 84% of the data is greater than 455. This gives us a probability of 0.84.
(e) The number 449 is µ − 2σ . The number 473 is µ + 2σ . We know that 95% of the data
points should be within two standard deviations of the mean. As a
probability, it is expressed as 0.95.
0.95 × 800 = 760
Of the 800 randomly selected cereal boxes, we would expect 760 boxes to be between
449 g and 473 g.
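Parts (d) and (e) involve two ideas that are easy to mix up: adding the 50% that lies on one side of the mean, and turning a probability into an expected count. The short Python sketch below restates that arithmetic; the variable names are chosen for this illustration only.

```python
mu, sigma = 461, 6            # grams, from Example 2

# Values taken from the 68-95-99.7 rule
MEAN_TO_ONE_SD = 0.34         # between the mean and one standard deviation
WITHIN_TWO_SD = 0.95          # within two standard deviations of the mean

# (d) More than 455 g: 455 is mu - sigma, so add the 34% just below
#     the mean to the 50% that lies above the mean.
assert 455 == mu - sigma
p_more_than_455 = MEAN_TO_ONE_SD + 0.50

# (e) Expected number of boxes, out of 800, between 449 g and 473 g.
assert (449, 473) == (mu - 2 * sigma, mu + 2 * sigma)
expected_boxes = WITHIN_TWO_SD * 800

print(round(p_more_than_455, 2), round(expected_boxes))
```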
Questions
1. Use the 68-95-99.7 rule on a distribution of data points with a population mean of 230 and a
population standard deviation of 15 to answer the following questions. You may wish to
draw and label a normal distribution curve to assist you with each of these questions. This is
what we did in Example 1.
(a) What percentage of the data points would measure between 215 and 245?
(b) What percentage of the data points would measure between 230 and 260?
(c) What percentage of the data points would measure between 215 and 230?
(d) What percentage of the data points would measure between 185 and 230?
(e) What percentage of the data points would measure between 200 and 245?
(f) What percentage of the data points would measure between 215 and 275?
(g) What is the probability that a randomly selected data point would be between 185 and
260?
(h) What is the probability that a randomly selected data point would be between 245 and
260?
(i) What is the probability that a randomly selected data point would be between 185 and
200?
(j) What is the probability that a randomly selected data point would be between 245 and
275?
(k) What is the probability that a randomly selected data point would be less than 245?
(l) What is the probability that a randomly selected data point is greater than 200?
(m) What is the probability that a randomly selected data point is less than 215?
2. A company monitored the production of 2000 bagels for a one day period. They determined
that the mean weight (population mean) of the bagels was 104 grams with a standard
deviation of 3 grams. Assume the distribution of bagel weights is bell-shaped. You may
choose to draw and label a normal distribution curve to assist you with each of these
questions.
(a) How many of the 2000 bagels were within 9 grams of the mean?
(b) How many of the 2000 bagels were within 3 grams of the mean?
(c) How many of the 2000 bagels are between 98 grams and 104 grams?
(d) How many of the 2000 bagels are between 101 grams and 110 grams?
(e) How many of the 2000 bagels are between 107 grams and 110 grams?
(f) How many of the 2000 bagels are between 98 grams and 110 grams?
(g) How many of the 2000 bagels are between 95 grams and 101 grams?
(h) How many of the 2000 bagels are between 98 grams and 113 grams?
(i) How many of the 2000 bagels are between 95 grams and 104 grams?
(j) How many of the 2000 bagels are between 110 grams and 113 grams?
(k) How many of the 2000 bagels are less than 98 grams?
Z-Score
In the last section, the problems used numbers that were always 1, 2, or 3 standard deviations
from the mean. For example in question 1 (e), we were told that the population mean was 230
and the population standard deviation was 15, and then we were asked to find the percentage of the
data points that were between 200 and 245. The number 200 is exactly two standard deviations
below the mean, while the number 245 is exactly one standard deviation above the mean. What
if we were asked to find the percentage of data points that would be between 197 and 251?
These two values are not 1, 2, or 3 standard deviations from the mean; rather, they are located
some fractional amount of the standard deviation away from the mean. Because of this, the
technique that we learned in the previous section will not work. We need another approach; we
are going to use z-scores.
In statistics, the z-score (also called the standard score) indicates how many standard deviations
a data point is above or below the mean. It is found using the following formula.
$$z = \frac{x - \mu}{\sigma}$$
where x is the data point (also called an observation or raw value), µ is
the population mean, and σ is the population standard deviation.
Example 1
A population, which results in a bell-shaped distribution, has a mean of 26.1 and standard
deviation of 2.3. How many standard deviations from the mean is each of these data points?
(a) 28.9
(b) 24.7
Answers:
(a)
$$z = \frac{x - \mu}{\sigma}$$
$$z = \frac{28.9 - 26.1}{2.3}$$
$$z = 1.22$$
The data point 28.9 is 1.22 standard deviations from the mean of 26.1. The z-score is
positive because the data point is larger than the mean (i.e. to the right of the mean).
(b)
$$z = \frac{x - \mu}{\sigma}$$
$$z = \frac{24.7 - 26.1}{2.3}$$
$$z = -0.61$$
The data point 24.7 is 0.61 standard deviations from the mean of 26.1. The z-score is
negative because the data point is smaller than the mean (i.e. to the left of the mean).
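The z-score formula translates directly into a one-line function. The sketch below is an illustration in Python (the function name `z_score` is an assumption of this example) and repeats the two calculations from Example 1, rounding to two decimal places as the example does.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

print(round(z_score(28.9, mu=26.1, sigma=2.3), 2))   # part (a)
print(round(z_score(24.7, mu=26.1, sigma=2.3), 2))   # part (b)
```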
What we have just learned regarding z-scores does not help us answer questions like the one
introduced at the beginning of this section.
Original Question:
A population, which results in a bell-shaped distribution, has a mean of 230 and
standard deviation of 15. What percentage of data points would be between 197 and
251?
Using the z-score we can now determine how many standard deviations the data points 197 and
251 are away from the mean, 230. This, however, does not tell us the percentage of data points
that are between 197 and 251. We need to learn about area under the standard normal curve.
The mathematics necessary to understand how one determines the area under the standard
normal curve is well beyond the scope of this course. At this level all we need to know is that
the standard normal curve is centered at 0 (i.e. has a mean of 0), has a standard deviation of 1,
that the total area under this curve is equal to 1, and that area is equal to the probability that a
randomly selected data point falls within that interval. We use the standard normal curve to
understand other populations that are normally distributed, even though these populations have
different means and standard deviations.
Standard Normal Curve: µ = 0 , σ = 1 , Area Under the Complete Curve = 1
If we look at the standard normal curve on the right, we notice that we have gone 2 standard
deviations to the left and right of the mean (represented by the -2.0 and 2.0). The area under
the curve within this interval (i.e. the shaded region on the diagram) is 0.9544. This area is
equivalent to the probability that a randomly selected data point falls within that interval.
This makes sense when we remember that we had already learned that there is a 95% chance that a
randomly selected data point is within two standard deviations of the mean.
[Figure: standard normal curve shaded from -2.0 to 2.0; Area = 0.9544]
If we look at the next diagram, we have gone 1.2 standard deviations to the left of the mean and
1.6 standard deviations to the right of our mean on the standard normal curve. In this case, the
area under the curve in that interval is 0.8301. That means that there is a 0.8301 probability
that a randomly selected data point will fall within that interval.
[Figure: standard normal curve shaded from -1.2 to 1.6; Area = 0.8301]
In the last two diagrams, we supplied the areas under the curves in the defined intervals but how
do we determine these areas when they are not supplied? We have to use a chart and a procedure
that is very similar to what we used in the last section. The chart allows us to determine
areas/probabilities from a specific standard deviation to the mean. The easiest way to show how to
use the chart is through worked examples.
Example 2
A population, which results in a bell-shaped distribution, has a mean of 250 and standard
deviation of 30. What is the probability that a measurement from a randomly selected item is
between 250 and 272?
Answer:
Start by considering the interval from 250 to 272. The 250 is equivalent to the population
mean ( µ ). The 272 is 22 units to the right of the mean; we need to determine how many
standard deviations this value (272) is away from the mean. This is when we use z-scores.
z = (x − µ)/σ
z = (272 − 250)/30
z ≈ 0.73
We can now rephrase the original question. We are really trying to find the probability that a
randomly selected data point is between µ and µ + 0.73σ .
Now let's put this in the context of our standard
normal curve, which is drawn on the right.
Remember on our standard normal curve, the
mean is 0 and the standard deviation is 1. We
are going to find the area under this curve from
0 ( µ ) to 0.73 ( µ + 0.73σ ). The area under this
curve in this interval has been shaded on our
diagram. We can use our knowledge of the
standard normal curve to understand other populations that are normally distributed, even
though these populations have different means and standard deviations. The area under our
standard normal curve from 0 to 0.73 is equivalent to the area under our original normal
distribution from µ (250) to µ + 0.73σ (272).
To find the area under our standard normal curve, we go to the Areas Under the Standard
Normal Curve chart found in the back of this resource (page 96). We have reproduced a
portion of this chart below. We work with the row labeled 0.7 and the column labeled 0.03
(Reason: 0.7 + 0.03 = 0.73). We find that this row and column intersect at 0.2673.


z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
...
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
...
That means that the area under the standard normal curve between 0 ( µ ) and 0.73
( µ + 0.73σ ) is 0.2673. In terms of our original normal distribution, it means that there is a
0.2673 probability that a randomly selected data point will be between 250 ( µ ) and 272
( µ + 0.73σ ).
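The steps of Example 2 (compute the z-score, round it to two decimal places, then read the area from 0 to z) can be mirrored in a few lines of Python. This is a sketch under assumptions rather than part of the resource: statistics.NormalDist (Python 3.8+) stands in for the printed chart, and the function names z_score and area_from_mean are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def area_from_mean(z):
    """Area under the standard normal curve between 0 and z (what the chart lists)."""
    return abs(NormalDist().cdf(z) - 0.5)

# Example 2: mean 250, standard deviation 30, interval from 250 to 272.
z = round(z_score(272, 250, 30), 2)   # rounded to two decimals, as the chart requires
area = area_from_mean(z)

print(z, round(area, 4))
```

Rounding z before the lookup is a deliberate choice here so that the result lines up with the chart entry for row 0.7, column 0.03.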
Example 3
Data for a population was normally distributed with a mean of 167 and standard deviation of 18.
What is the probability that a randomly selected data point from this population is between 144
and 181?
Answer:
This question is more challenging than the last one because neither of the values supplied
(144 or 181) is the population mean. The lower limit, 144, is below the mean, while the
upper limit, 181, is above the mean.
We need to find out how far above and below the mean these two values are, measured in
standard deviations. That means we need to work out the z-scores.
For the upper limit (181): z = (x − µ)/σ = (181 − 167)/18 ≈ 0.78
For the lower limit (144): z = (x − µ)/σ = (144 − 167)/18 ≈ −1.28
Our question can now be rephrased as "What is the probability that a randomly selected data
point from this population is between µ − 1.28σ and µ + 0.78σ ?"
To tackle this, we need to work with the
standard normal curve and have to break the
question into parts. We start by finding the
area/probability on our standard normal curve
from -1.28 ( µ − 1.28σ ) to 0 ( µ ), then find the
area/probability from 0 ( µ ) to 0.78
( µ + 0.78σ ), and finally we add the two
areas/probabilities.
Area/Probability between µ − 1.28σ and µ
(Find 1.28 on the chart.)


z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
...
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
Area/Probability between µ and µ + 0.78σ
(Find 0.78 on the chart.)


z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
...
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.3997 + 0.2823 = 0.6820
For the standard normal curve, the area from -1.28 to 0.78 is 0.6820.
In terms of our original normal distribution, there is a 0.6820 probability that a randomly
selected data point from this population is between 144 ( µ − 1.28σ ) and 181 ( µ + 0.78σ ).
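Example 3 splits the interval at the mean and adds the two pieces. A minimal Python sketch of that same split is shown here, assuming statistics.NormalDist (Python 3.8+) in place of the chart; the variable names are our own and the code is an illustration only.

```python
from statistics import NormalDist

mean, sd = 167, 18
lower, upper = 144, 181

# z-scores rounded to two decimals, as in the worked example.
z_lower = round((lower - mean) / sd, 2)
z_upper = round((upper - mean) / sd, 2)

std = NormalDist()                      # standard normal curve
area_left = 0.5 - std.cdf(z_lower)      # area from z_lower up to 0
area_right = std.cdf(z_upper) - 0.5     # area from 0 up to z_upper
probability = area_left + area_right

print(z_lower, z_upper, round(probability, 4))
```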
The Different Cases
The biggest struggle with these questions is the determination of the areas since the chart only
shows areas from 0 ( µ ) to the specified z value. There are five different cases we may
encounter, two of which we have already examined in Examples 2 and 3.
Case 1
This occurs when we need to find the area/probability between a
given z value and 0 ( µ ). With these questions we simply use the
chart once. This is what we did in Example 2.
Case 2
This occurs when we need to find the area/probability between two given z values that are on
either side of 0 ( µ ). With these questions, we find two separate areas/probabilities and add them
together. This is what we did in Example 3.
[Diagram: area between the two z values = area from the negative z value to 0 + area from 0 to the positive z value]
Case 3
This occurs when we need to find the area/probability between two given z values that are on
the same side of 0 ( µ ). With these questions, we find two separate areas/probabilities and subtract
the smaller from the larger.
[Diagram: area between the two z values = larger area measured from 0 - smaller area measured from 0]
Case 4
This occurs when we need to find the area/probability to the right of a positive z value, or to the
left of a negative z value. With these questions, we take the area to the right (or left) of 0 (This
area is always equal to 0.5 because it is half the area of our standard normal curve) and subtract
the area from 0 to the z value.
[Diagram: area beyond the z value = half of the curve (area always equals 0.5) - area from 0 to the z value]
Case 5
This occurs when we need to find the area/probability to the right of a negative z value, or to the
left of a positive z value. With these questions, we take the area to the right (or left) of 0 (This
area is always equal to 0.5 because it is half the area of our standard normal curve) and add the
area from 0 to the z value.
[Diagram: required area = half of the curve (area always equals 0.5) + area from 0 to the z value]
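The five cases can be summarized in code. The sketch below is one possible arrangement, not the resource's own method: chart_area is a hypothetical stand-in for looking up the printed chart (it uses statistics.NormalDist from Python 3.8+), and the other function names are likewise our own.

```python
from statistics import NormalDist

def chart_area(z):
    """Stand-in for the chart: area between 0 and |z| on the standard normal curve."""
    return NormalDist().cdf(abs(z)) - 0.5

def area_between(z_low, z_high):
    """Cases 1 to 3: area between two z values."""
    if z_low <= 0 <= z_high:
        # Values on either side of 0 (Case 2; Case 1 when one value is 0): add.
        return chart_area(z_low) + chart_area(z_high)
    # Values on the same side of 0 (Case 3): subtract the smaller from the larger.
    return abs(chart_area(z_high) - chart_area(z_low))

def area_right_of(z):
    """Case 4 when z is positive, Case 5 when z is negative."""
    return 0.5 - chart_area(z) if z >= 0 else 0.5 + chart_area(z)

def area_left_of(z):
    """Case 4 when z is negative, Case 5 when z is positive."""
    return 0.5 - chart_area(z) if z <= 0 else 0.5 + chart_area(z)

print(round(area_between(0, 0.73), 4))       # Example 2 (Case 1)
print(round(area_between(-1.28, 0.78), 4))   # Example 3 (Case 2)
```

Because the chart only lists areas measured from 0, every case reduces to one or two lookups combined with addition, subtraction, or the fixed half-curve area of 0.5.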
Example 4
Porphyrin is a pigment in blood protoplasm. In the population of healthy adults, the
concentration of porphyrin is normally distributed with mean µ = 38 mg/dL and standard
deviation σ = 12 mg/dL.
(a) What is the probability that a randomly selected healthy adult would have a porphyrin
concentration between 43 mg/dL and 54 mg/dL?
(b) What is the probability that a randomly selected healthy adult would have a porphyrin
concentration less than 47 mg/dL?
Answers:
(a) Both 43 and 54 are above the mean (38). We need to find out how far above the mean these
two values are, measured in standard deviations. That means we need to determine the z-scores.
For 43: z = (x − µ)/σ = (43 − 38)/12 ≈ 0.42
For 54: z = (x − µ)/σ = (54 − 38)/12 ≈ 1.33
Based on this work the question can be rephrased. "What is the probability that a
randomly selected healthy adult would have a porphyrin concentration between
µ + 0.42σ and µ + 1.33σ ?"
Now let's put this in the context of our standard normal
curve. We need to find the area under the curve (which
is equivalent to the probability) from 0.42 to 1.33.
Notice that both of these values are to the right of 0 ( µ
on our standard normal curve). That means that we are
dealing with Case 3.
• Find the area/probability from 0 to 0.42. From the chart we find that the answer is 0.1628.
• Find the area/probability from 0 to 1.33. From the chart we find that the answer is 0.4082.
• Now subtract the smaller area/probability from the larger.
0.4082 - 0.1628 = 0.2454
There is a 0.2454 probability that a randomly selected healthy adult would have a
porphyrin concentration between 43 mg/dL and 54 mg/dL.
(b) We start by finding how much 47 is above the mean (38) in terms of standard deviations.
z = (x − µ)/σ
z = (47 − 38)/12
z = 0.75
The question can now be rephrased. "What is the probability that a randomly selected
healthy adult would have a porphyrin concentration less than µ + 0.75σ ?"
Let's put this in the context of our standard normal
curve. We need to find the area under the curve (which
is equivalent to the probability) below 0.75. Notice we
are trying to find the area under the curve to the left of a
positive z value; this is Case 5.
• Find the area/probability less than 0. It is always 0.5 because we are dealing with
exactly half of our standard normal curve.
• Find the area/probability from 0 to 0.75. From the chart we find that the answer is 0.2734.
• Now add the two areas/probabilities.
0.5 + 0.2734 = 0.7734
There is a 0.7734 probability that a randomly selected healthy adult would have a
porphyrin concentration less than 47 mg/dL.
Checking Your Answers on the TI-83 or TI-84 (Optional)
The normalcdf command (normal cumulative distribution function command) allows one to
determine the probability that a data point will fall within an interval for a known normal
distribution. This command is found using the DISTR button.
normalcdf(lower limit, upper limit, mean, standard deviation)
In part (a) of Example 4, we wanted to find the probability that a
randomly selected healthy adult would have a porphyrin concentration
between 43 mg/dL and 54 mg/dL. To do this we enter normalcdf(43,
54, 38, 12) into the calculator. It generates the probability 0.2472. This
is very close to the 0.2454 we worked out by hand. The calculator
actually produced a more accurate answer because we had to round off
our z-scores to two decimal places when working things out by hand.
For questions where there is only one endpoint, it is recommended that
one go 5 (or more) standard deviations above or below the mean. This
happened in part (b) of Example 4 where we had to find the probability
that a randomly selected healthy adult would have a porphyrin
concentration less than 47 mg/dL. Five standard deviations below the
mean is -22 (38 - 5 × 12). We would enter normalcdf(-22, 47, 38, 12)
into the calculator. It generates the probability 0.7734.
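Learners without a TI-83 or TI-84 can run a similar check in Python. The sketch below is an assumption-laden illustration and not part of the resource: it defines its own function named normalcdf, with the same argument order as the calculator command, on top of statistics.NormalDist (Python 3.8+).

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Probability that a normally distributed value falls between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Example 4(a): between 43 and 54 mg/dL, with mean 38 and standard deviation 12.
part_a = normalcdf(43, 54, 38, 12)

# Example 4(b): less than 47 mg/dL; the lower limit sits 5 standard deviations below the mean.
part_b = normalcdf(38 - 5 * 12, 47, 38, 12)

print(part_a, part_b)
```

Both results should agree with the calculator values quoted above to about three decimal places. No z-scores are rounded along the way, which is the same reason the calculator's answer for part (a) differs slightly from the hand calculation.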
Questions
1. A population, which results in a bell-shaped distribution, has a mean of 42.7 and standard
deviation of 7.9. How many standard deviations from the mean is each of these data points?
(a) 37.6
(b) 53.2
2. It may surprise you but professors at universities do not spend all their time teaching
graduate and undergraduate students. A significant amount of time is spent on research. So
what percentage of time do professors spend teaching and on teaching-related activities?
The NEA Almanac of Higher Education reported that the mean percentage of time spent on
teaching activities is about 51% with a standard deviation of 25%. If we are dealing with a
bell-shaped distribution, determine the z-scores corresponding to the following professors'
percentage of time devoted to teaching activities.
(a) Dr. B. Pletner, 68%
(b) Dr. R. Dawson, 43%
3. An NSCC instructor examined the results from a common exam offered at all campuses. She
discovered that the marks were normally distributed. She calculated the z-scores for her six
learners. These are shown below.
Tylena, 0.93
Hamid, -1.13
Meera, -0.42
Beverly, 0.00
Elliott, 1.27
Marcus, 0.58
(a) Which of these learners scored above the mean?
(b) Which of these learners scored below the mean?
(c) Which of the learners scored exactly at the mean?
(d) Which of her learners obtained the best mark? Based on the information provided, can
you determine the mark?
(e) Can you tell if every one of her learners passed the test? Explain.
4. The concentration of red blood cells in whole blood is measured in millions per cubic
millimetre. Within the population of healthy females, the red blood cell concentration is
normally distributed with a mean of 4.8 million/mm³ and a standard deviation of 0.3
million/mm³.
(Hint: Each of these five parts corresponds to one of the five cases we described earlier for
area under the standard normal curve. You may wish to draw the standard normal curve as was
done in the worked examples to assist you with each part of this question.)
(a) What is the probability that a randomly selected healthy female would have a red blood
cell concentration between 4.8 and 5.3 million/mm³?
(b) What is the probability that a randomly selected healthy female would have a red blood
cell concentration between 4.4 and 5.0 million/mm³?
(c) What is the probability that a randomly selected healthy female would have a red blood
cell concentration between 5.2 and 5.5 million/mm³?
(d) What is the probability that a randomly selected healthy female would have a red blood
cell concentration less than 4.6 million/mm³?
(e) What is the probability that a randomly selected healthy female would have a red blood
cell concentration greater than 4.3 million/mm³?
5. A community examined the response times of their police department over a three year
period. They discovered that the distribution of response times was bell-shaped and that the
mean response time was 8.2 minutes with a standard deviation of 1.9 minutes. For a
randomly received emergency call to the police department in that three year period, what is
the likelihood that the response time will be:
(a) greater than 8.2 minutes?
(b) between 6.0 and 8.2 minutes?
(c) less than 9.3 minutes?
(d) between 6.4 and 7.7 minutes?
(e) between 4.2 and 8.8 minutes?
(f) greater than 9.7 minutes?
6. A consumer magazine reports that the average life of a refrigerator before replacement is 14
years with a standard deviation of 2.5 years. Assume that the distribution of refrigerator life
spans is approximately normal. What is the probability that someone will keep a
refrigerator:
(a) between 11 years and 16 years?
(b) greater than 15 years?
(c) less than 14 years?
(d) between 10 years and 13 years?
(e) greater than 12 years?
(f) between 8 years and 14 years?
Growth Charts
One of the most common uses of standard deviations is in the production of growth charts used
in the health sciences. These charts show the wide range of values for a particular measurement
(e.g. weight, height, head circumference,…) for different ages. Normally we would use
standard deviation to describe the spread of these measurements, but many growth charts use
percentiles. Although the charts use percentiles, it is important to note that standard deviations
were used in the construction of these percentiles.
Each standard deviation represents a fixed
percentile. For example −3 σ is the 0.13th
percentile, −2 σ the 2.28th percentile, −1 σ
the 15.87th percentile, 0 σ the 50th percentile,
+1 σ the 84.13th percentile, +2 σ the 97.72nd
percentile, and +3 σ the 99.87th percentile.
You are not expected to know these values.
Growth charts don't use percentiles like 0.13,
2.28 or 15.87, rather they use whole numbers
like 3, 5, 10, 25, and so on.
Source: Wikimedia Commons, Author: Mwtoews
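The link between standard deviations and percentiles can be reproduced with a short calculation. The following Python sketch is illustrative only and assumes an exactly normal distribution; it uses statistics.NormalDist (Python 3.8+), and the variable names are our own. The second part goes the other way and finds how many standard deviations from the mean each whole-number growth chart percentile would sit under that assumption.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Percentile reached at -3, -2, ..., +3 standard deviations from the mean.
percentile_at = {z: round(100 * std.cdf(z), 2) for z in range(-3, 4)}
for z, p in percentile_at.items():
    print(f"{z:+d} standard deviations -> percentile {p}")

# Reverse direction: z value matching each percentile curve drawn on the growth charts.
chart_percentiles = [3, 5, 10, 25, 50, 75, 90, 95, 97]
z_at = {p: std.inv_cdf(p / 100) for p in chart_percentiles}
for p, z in z_at.items():
    print(f"percentile {p} -> z = {z:.2f}")
```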
Percentiles rank the position of an individual by indicating what percent of the reference
population the individual would equal or exceed. For example, on the weight growth charts, a
30-month-old boy whose weight is at the 25th percentile, weighs the same or more than 25
percent of the reference population of 30-month-old boys, and weighs less than 75 percent of the
30-month-old boys in the reference population.
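The definition of a percentile given above can be written as a small calculation. This is a sketch only: the function name percentile_rank is hypothetical, and the list of weights is made up for illustration and is not taken from the CDC growth charts.

```python
def percentile_rank(value, reference):
    """Percent of the reference values that `value` equals or exceeds."""
    at_or_below = sum(1 for r in reference if r <= value)
    return 100 * at_or_below / len(reference)

# Made-up weights (kg) for 20 boys of the same age; illustration only.
hypothetical_weights = [11.2, 11.6, 11.8, 12.0, 12.1, 12.3, 12.5, 12.7, 12.9, 13.0,
                        13.1, 13.3, 13.5, 13.6, 13.8, 13.9, 14.0, 14.2, 14.4, 15.1]

rank = percentile_rank(12.1, hypothetical_weights)
print(rank)  # 5 of the 20 made-up weights are 12.1 kg or less
```

With this made-up reference group, a boy weighing 12.1 kg weighs the same as or more than 25 percent of the group and less than the remaining 75 percent.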
It is important to understand that the growth charts are best used to follow a child's growth over
time or to find a pattern of his/her growth. Should one be concerned if a child consistently is in a
low percentile for a particular measure? For example, should a parent be concerned if from the
ages of 10 months to 32 months their girl ranks between the 5th and 10th percentile for weight?
The answer is no; she is exhibiting normal growth. Should one be concerned with a sudden drop
or sudden increase in a percentile value for a particular measure? For example, should a parent
be concerned if their son dropped from the 90th percentile for weight at the age of 6 months to
the 25th percentile at the age of 12 months? The answer is yes; such a large drop may indicate a
problem.
On the growth charts we will be using, there are nine lines/curves. The bottom line represents
the 3rd percentile and the top line represents the 97th percentile. The other lines from bottom to
top are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentile. We have included these growth
charts in the appendix, found at the end of this resource. We will need to use these charts to
answer all the questions in this section. All of these charts are from the 2000 CDC Growth
Charts for the United States: Methods and Development (Kuczmarski RJ, Ogden CL, Guo SS, et
al. 2000 CDC growth charts for the United States: Methods and development. National Center
for Health Statistics. Vital Health Stat 11(246). 2002). We should apologize ahead of time that we
have only supplied growth charts for boys. The growth charts for boys are blue and those for
girls are pink. Unfortunately charts in pink do not reproduce well in a black and white resource
so we had to omit them.
Example 1
Using the weight growth chart for boys, answer the following.
(a) In what percentile is a 3 month old boy weighing 12 pounds (or 5.44 kg)? What does
this percentile mean?
(b) What weight would one expect for a four month old boy who is in the 75th percentile for
weight?
(c) What range of weights would one expect for two month old boys who are between the 3rd
and 97th percentile for weight?
(d) What range of ages would one expect for boys whose weights are 12 pounds yet stay within
the 3rd and 97th percentile for their age?
Answers:
(a) On the vertical axis, find 12 pounds and
on the horizontal axis, find 3 months.
Plot the point (3, 12) on the coordinate
system. This point lies on the fourth
curve from the bottom (i.e. the 25th
percentile curve). It means that this 3
month old, 12 pound boy weighs as much
as or more than 25 percent of the boys of
the same age.
(b) On the horizontal axis, find 4 months.
Move up until we intersect the sixth
curve from the bottom (i.e. the 75th
percentile curve). This point
corresponds with a weight of 16 pounds
(or approximately 7.26 kg).
(c) A two month old boy in the 3rd percentile
would only weigh approximately 8.8
pounds. A two month old boy in the 97th
percentile weighs approximately 14.5
pounds. Therefore we would expect that
weights between 8.8 pounds and 14.5
pounds would cover all two month old
boys between the 3rd and 97th percentile.
(d) A one month old boy could weigh as
much as 12 pounds if he is in the 97th
percentile. A boy a little more than 4
months old could weigh as little as 12 pounds
if he is in the 3rd percentile. Therefore,
boys between 1 month and a little more
than 4 months of age could weigh 12
pounds yet still be within the 3rd and 97th
percentile for their age.
Questions
1. In what percentile for head circumference is a 12 month old boy with a head circumference
of 46.2 cm? Explain what this percentile means.
2. In what percentile for length is a 31 month old boy with a length of 99 cm (or 39 inches)?
Explain what this percentile means.
3. For each case, determine the percentile ranking.
(a) 33 month old boy, length = 36 inches
(b) 21 month old boy, weight = 31 pounds
(c) 30 month old boy, weight = 26 pounds
(d) 23 month old boy, head circumference = 19.5 inches
(e) 10 month old boy, length = 28.5 inches
(f) 33 month old boy, head circumference = 19.75 inches (or approximately 51 cm)
(g) 10 month old boy, weight = 24.5 pounds (or approximately 11.3 kg)
(h) 28 month old boy, length = 33.5 inches (or approximately 86 cm)
4. For each case, determine the measure.
(a) What weight would one expect for a twelve month old boy who is in the 5th percentile for
weight?
(b) What length would one expect for a 20 month old boy who is in the 50th percentile for
length?
(c) What head circumference would one expect for a 10 month old boy who is in the 97th
percentile for head circumference?
5. What range of lengths would one expect for 15 month old boys who are between the 3rd and
97th percentile for length?
6. What range of head circumferences would one expect for 30 month old boys who are
between the 3rd and 97th percentile for head circumference?
7. What range of ages would one expect for boys whose lengths are 31 inches yet stay within
the 3rd and 97th percentile for their age?
8. What range of ages would one expect for boys whose head circumferences are 16.25 inches
yet stay within the 3rd and 97th percentile for their age?
9. What range of weights would one expect for 33 month old boys who are between the 25th
and 75th percentile for weight?
10. What range of lengths would one expect for 22 month old boys who are between the 10th and
90th percentile for length?
11. What range of ages would one expect for boys whose weights are 21 pounds yet stay within
the 5th and 90th percentile for their age?
12. What range of ages would one expect for boys whose lengths are 29 inches yet stay within
the 25th and 75th percentile for their age?
13. Look at the weights of a particular boy over a 12 month period. Do you have concerns
regarding his weight? Explain.
Months 0 2 4 6 8 10 12
Weight (kg) 4.55 5.89 6.80 7.58 7.82 8.16 8.42
Putting It Together
In this unit we looked at the following.
• Populations and Samples
• Categorical and Numerical Data
• Bar Graphs, Double Bar Graphs, Stacked Bar Graphs, Histogram, Circle Graphs and Line
Graphs
• Mean, Trimmed Mean, Median, and Mode
• Box and Whisker Plots (with and without technology)
• Standard Deviation (with and without technology)
• Distributions (Normal, Skewed, Bimodal, Uniform)
• The 68-95-99.7 Rule for Normal Distributions
• Z-Scores
• Growth Charts
Questions:
1. The manager of the community sportsplex wanted to know how the 1386 members might
feel about the discussion concerning an addition to the existing building that included a 25
metre, 8 lane pool. He asked 230 randomly selected members if they were willing to pay an
additional $35 a year on their membership fee to have these new features. Describe the
population and the sample for this situation.
2. For each of the following, state whether the data collection would result in a categorical data
set or numerical data set. If the data is numerical, indicate whether we are dealing with
discrete or continuous data.
(a) The number of pets in Nova Scotian households
(b) The type of MP3 player owned by adults.
(c) The diameter of the trunk of spruce trees growing in a particular
valley.
(d) The size of T-shirts worn by boys between the ages of 16 and 18
years
(e) The number of children traveling more than 1.5 kilometres to
school.
(f) The time to complete a driver’s license renewal at a specific
Access Nova Scotia location
3. The 5-year survival rates for six different types of cancers have been supplied in the graph
below.
[Graph: Survival Rate % (vertical axis, 0 to 100) for six cancer types (Brain, Ovary, Colorectal, Breast, Skin Melanoma, Prostate), comparing 1992 to 1994 with 2004 to 2006]
Source: Canadian Cancer Registry
(a) What was the approximate survival rate for colorectal cancer between 1992 and 1994?
(b) What was the approximate survival rate for breast cancer between 2004 and 2006?
(c) By approximately how much did the survival rate for ovarian cancer improve from 1992-1994
to 2004-2006?
(d) If approximately 22 200 Canadian women were diagnosed with breast cancer in 2006,
then how many are expected to survive?
(e) What type of graph (bar, double bar, stacked bar, circle,…) are we dealing with here?
(f) Can you conclude that there were fewer cases of brain cancer than prostate cancer based
on this graph? Why or why not?
4. A major fast food chain that specializes in pizzas had all its stores report on the topping
selected by all customers for their pizzas. This data was used to construct the circle graph
below. It is also important to know that this chain sold 564 000 pizzas over a one year period
amongst all of their establishments.
[Circle graph: pepperoni 42%, sausage 19%, vegetable 15%, mushroom 14%, other 6%, onions 4%]
(a) Are we dealing with a sample or a population? Explain.
(b) What percentage of customers ordered vegetables on their pizza?
(c) What percentage of customers ordered sausage and/or onion on their pizzas?
(d) What percentage of customers ordered sausage and onion on their pizzas?
(e) How many pizzas with pepperoni topping were sold during this year?
(f) How many pizzas with sausage and/or mushroom toppings were sold during this year?
(g) What is the ratio of pizzas with mushroom toppings to pizzas with pepperoni toppings?
(h) There were 107 160 pizzas with a particular topping. What topping was it?
5. The following graph shows the number of infant deaths in Canada from 1999 to 2007.
[Line graph: Number of Infant Deaths (vertical axis, 1,720 to 1,900 in steps of 20) versus Year (horizontal axis, 1998 to 2008)]
Source: Statistics Canada
What are your thoughts regarding the scale used on the vertical axis of this line graph?
6. Below you have been provided with data tables. Indicate what type of graph (histogram,
line, circle, bar, double bar, or stacked bar graph) you would use for this data.
(a) Graph Type: ___________________
Brand of Car    Canadian Market Share (Sept 2011)
Toyota          9.9%
GM              12.8%
Honda           8.1%
Ford            16.7%
Chrysler        15.8%
Volkswagen      4.8%
Hyundai         13.1%
Other           18.8%

(b) Graph Type: ___________________
Canadian Police-reported Crimes    2008      2009
Impaired Driving                   84 759    88 630
Abduction                          464       429
Arson                              13 270    13 372
Counterfeiting                     1015      798
Theft over $5000                   16 743    15 573
Fraud                              90 932    90 623
Uttering Threats                   78 500    78 407
Extortion                          1385      1701

(c) Graph Type: ___________________
Cause for Lateness       Frequency
Snoozing after Alarm     83
Car Problems             23
Missed Public Transit    47
Family Crisis            62
Stuck in Traffic         113
Other                    59

(d) Graph Type: ___________________
Time    Height of Projectile in Metres
0       2.0
1       22.1
2       32.4
3       32.9
4       23.6
5       4.5

(e) Graph Type: ___________________
Mean Amount of Sleep in Hours    Number of People
5 - 6                            26
6 - 7                            74
7 - 8                            103
8 - 9                            57
9 - 10                           21

(f) Graph Type: ___________________
Department     Jan Profit ($)    Feb Profit ($)    Mar Profit ($)
Automotive     4045              5612              6289
Toys           2045              2549              3283
Electronics    6845              2248              1867
Sporting G.    2567              1217              1506
Footwear       4753              5608              6099
Men's          1598              2286              1894
Women's        3725              4589              4635

7. An airline company randomly selected eighteen suitcases from domestic flights and recorded
their weights in kilograms.
16.2 11.3 15.7 14.7 15.1 19.6 16.0 14.1 3.9
18.0 14.8 16.3 13.6 11.9 12.4 14.8 13.5 19.7
(a) Although the airline collected a sample, describe the population in this situation.
(b) Would a histogram or bar graph be used with this data set?
(c) Calculate the mean, median, mode, and 5% trimmed mean without using the STAT
feature on a TI-83/84 calculator.
8. Mr. Tetford's and Mrs. Gatien's learners wrote the same math test. The test was out of 30.
The results for the two classes are shown below.
Mr. Tetford's Class: 26 26 29 22 23 19 25 27 23 27 24 20 25
Mrs. Gatien's Class: 25 27 23 21 23 22 20 24 20 30 21 24 20 22
(a) Construct box and whisker plots for each set of data without using a graphing calculator.
[Number line for the box and whisker plots, marked 5, 10, 15, 20, 25, 30]
(b) What range of marks would place a learner in the top 50% of Mr. Tetford's class?
(c) What range of marks would place a learner in the bottom 25% of Mrs. Gatien's class?
(d) What range of marks would place a learner in the top 25% of Mrs. Gatien's class?
(e) How do the two classes compare in terms of marks on this math test?
9. A study looked at the concentration of iron in the bloodstream of ten randomly selected high
performance female athletes. The following data was collected. The concentrations are
measured in grams per decilitre (g/dl).
15.3 14.2 13.6 11.9 14.8 12.6 14.6 13.9 14.2 12.9
(a) Are we dealing with a population or a sample?
(b) Calculate the mean without using the STAT features on your calculator. Use the
appropriate symbol.
(c) Calculate the standard deviation without using the STAT features on your calculator.
[Blank work table; first column headed x_i]
10. If you were collecting a random sample in each situation, what type of distribution (normal,
uniform, bimodal, skewed) would you likely obtain?
Distribution Type
(a) Hodgkin’s lymphoma is a type of cancer that originates from
white blood cells. This disease typically affects people either in
early adulthood or when they are 55 years of age or older. You
randomly select 250 patients with Hodgkin’s lymphoma and ask
them to report the age of their initial diagnosis. What would the
distribution of ages likely look like?
(b) Most people make under $40,000 a year, but some make quite a
bit more, with a smaller number making many millions of
dollars a year. What would the distribution of yearly earnings
likely look like?
(c) James is working as a biologist for the summer and measuring
the circumferences of randomly selected maple trees in a natural
growth forest. What would the distribution of circumferences
likely look like?
(d) You use the random number generator on your calculator to find
500 random whole numbers between 1 and 10. What would the
distribution of numbers likely look like?
11. The body mass indexes of all 6000 new recruits to the armed forces were measured. The mean
was 23.0 kg/m² and the standard deviation 2.5 kg/m². Assume that the distribution of body mass
indexes was bell-shaped. (Hint: Use the 68-95-99.7% rule to solve these questions, rather
than z-scores and the standard normal curve.)
(a) How many new recruits had body mass indexes between 23.0 kg/m² and 25.5 kg/m²?
(b) How many new recruits had body mass indexes between 18.0 kg/m² and 23.0 kg/m²?
(c) How many new recruits had body mass indexes between 15.5 kg/m² and 30.5 kg/m²?
(d) How many new recruits had body mass indexes between 20.5 kg/m² and 28.0 kg/m²?
(e) How many new recruits had body mass indexes between 18.0 kg/m² and 30.5 kg/m²?
(f) How many new recruits had body mass indexes between 15.5 kg/m² and 25.5 kg/m²?
(g) How many new recruits had body mass indexes between 25.5 kg/m² and 28.0 kg/m²?
(h) How many new recruits had body mass indexes between 15.5 kg/m² and 18.0 kg/m²?
(i) How many new recruits had body mass indexes greater than 23.0 kg/m²?
(j) How many new recruits had body mass indexes greater than 20.5 kg/m²?
(k) How many new recruits had body mass indexes less than 28.0 kg/m²?
(l) How many new recruits had body mass indexes greater than 25.5 kg/m²?
(m) How many new recruits had body mass indexes less than 18.0 kg/m²?
12. Data collected over the last 100 years indicates that the average daily temperature for a
particular location in August is 26°C with a standard deviation of 3°C. If we are dealing with
a bell-shaped distribution, determine the z-scores corresponding to each of these
temperatures.
(a) 31°C
(b) 24°C
13. Scores on the Wechsler Adult Intelligence Scale (i.e. an IQ test) for 20 to 34 year old adults
are approximately normal with a mean of 110 and a standard deviation of 25. For a
randomly selected adult within that age group, determine (without using a graphing
calculator) the likelihood that their IQ will be:
(a) between 104 and 128?
(b) between 80 and 110?
(c) greater than 110?
(d) less than 132?
(e) between 90 and 107?
(f) greater than 150?
14. In what percentile for head circumference is an 11-month-old boy with a head circumference
of 44.4 cm? Explain what this percentile means.
15. What weight would one expect for a 24 month old boy who is in the 25th percentile for
weight?
16. What range of lengths would one expect for 28 month old boys who are between the 3rd and
97th percentile for lengths?
17. What range of ages would one expect for boys whose lengths are 25 inches yet stay within
the 3rd and 97th percentile for their age?
18. What range of head circumferences would one expect for 25 month old boys who are
between the 10th and 90th percentile for head circumference?
Areas Under the Normal Curve (z-Table)
The values inside the table represent the areas under the normal curve for values between 0 and a
z-score. For example, to determine the area under the curve between 0 and 1.37, look in the
intersecting cell for the row labeled 1.3 and the column labeled 0.07. The area is 0.4147.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
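The table entries can be checked numerically. The following is a minimal sketch, assuming Python's standard `math.erf` function; the helper name `area_zero_to_z` is made up for this illustration and is not part of the course materials.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    # The area from 0 to z equals 0.5 * erf(z / sqrt(2)).
    return 0.5 * math.erf(z / math.sqrt(2))

# Row 1.3, column 0.07 of the table
print(round(area_zero_to_z(1.37), 4))  # 0.4147
```

Rounding the result to four decimal places should reproduce the cell in the table; a small difference in the last digit is possible for other cells because the table itself is rounded.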
Post-Unit Reflections
What is the most valuable or important
thing you learned in this unit?
What part did you find most interesting or
enjoyable?
What was the most challenging part, and
how did you respond to this challenge?
How did you feel about this topic when
you started this unit?
How do you feel about this topic now?
Of the skills you used in this unit, which
is your strongest skill?
What skill(s) do you feel you need to
improve, and how will you improve them?
How does what you learned in this unit fit
with your personal goals?
Answers
Populations and Samples (pages 1 to 2)
1. Population: all the taxpayers in this community (4127)
Sample: the 300 randomly selected taxpayers
2. Population: all the used bricks that the contractor purchased (6000)
Sample: the 200 randomly selected bricks that were examined to determine usability
3. Population: all of the employed workers in Nova Scotia (453 000)
Sample: the 1200 randomly selected employed workers who participated in the survey and
reported their annual gross income
4. Population: all of the adults who received a high school diploma from NSSAL between 2001
and 2009
Sample: the 240 randomly selected NSSAL graduates who participated in the interview
Tables (pages 3 to 4)
1. Star Wars: Episode 0
2. Star Wars: Episode 0
3. Terminator: Rise of the Toasters
4. Jaws: The Teething Years
5. Transformers: The Horse and Buggy Years
6. A graph of some fashion
7. It is far easier to use this graph to answer the questions on the previous page.
8. Population
Types of Data (pages 5 to 6)
1. (a) numerical (continuous)   (b) categorical
(c) categorical                 (d) numerical (continuous)
(e) numerical (discrete)        (f) numerical (continuous)
(g) categorical                 (h) numerical (continuous)
(i) numerical (discrete)        (j) numerical (continuous)
(k) numerical (discrete)        (l) categorical
(m) numerical (continuous)
(n) categorical
Bar Graphs and Histograms (pages 7 to 14)
1. (a) baseball
(b) approximately 78 million fans
(c) football
(d) little less than 20 million fans
(e) bar graph
2. (a) double bar graph
(b) Germany
(c) 37 medals
(d) Norway
(e) 2 medals
(f) 7 medals
(g) 116 medals
(h) 121 medals
3. (a) histogram
(b) numerical, continuous
(c) approximately 52 000 RNs (24 000 + 28 000)
(d) approximately 18 000 RNs (36 000 - 18 000)
(e) three classes: 45 to 49 years, 50 to 54 years, and 55 to 59 years
(f) shortage of RNs in the future
4. (a) stacked bar graph
(b) no
(c) little more than 1300 cases
(d) approximately 550 cases
(e) approximately 300 cases (850 - 550)
(f) consult visits 2005/2006: 460 (540 - 80)
consult visits 2006/2007: 660 (750 - 90)
660 - 460 = 200 cases
(g) inpatient days decreased significantly but consult visits increased by a similar amount
5. (a)
(b) 16 2/3 %
(c) numerical, continuous
(d) sample
Circle Graphs and Line Graphs (pages 15 to 19)
1. (a) automobile accidents
(b) 3 times
(c) 288
(d) 60
(e) (ii) 12/7
(f) home injuries
2. (a) Jan - Feb 08, Aug - Sept 08, Jan - Feb 09, Jan - Feb 10, Oct - Nov 10
(b) Oct 08
(c) May 09
(d) $15 000 million ($15 billion)
3. (a) 40%
(b) 7/12
(c) 242 starts
(d) 340
4. (a) 13th day, $7.40
(b) $11.40 per share
(c) 15th day, $2.50 per share
First Impression/Second Impressions (pages 20 to 23)
(More detailed responses are required than what is supplied below.)
Part 1 - The perspective of the circle graph that was initially presented can lead one to believe
that the three brands of ice cream are favored equally; this is not the case.
Part 2 - One may initially assume that the population of Trois-Rivieres is 4 to 5 times that of
Lethbridge if one did not consider the scale on the vertical axis. On the first bar graph, the
vertical axis starts at 50 000, rather than 0 (as it does on the second graph).
Part 3 - Because the first graph deals with percentages, we do know what percentage of patrons
for each ride were male and female. However, we are unable to see how the rides compared to
each other in terms of attracting patrons. This only occurred when we were able to examine the
second graph which plotted number of people on the vertical axis.
Part 4 - The first graph may have made individuals believe that the average price of a domestic
airfare was fluctuating wildly. This occurs when one fails to look at the scale on the vertical
axis. In the first graph, the scale starts at $160, rather than $0 (as it does in the second graph).
What Type of Graph Should Be Used? (pages 24 to 25)
1. Double Bar Graph (or Stacked Bar Graph)
2. Circle Graph (or Bar Graph)
3. Line Graph
4. Histogram
5. Stacked Bar Graph
6. Circle Graph (or Bar Graph)
7. Bar Graph
8. Double Bar Graph
9. Histogram
10. Line Graph
Mean, Median, Mode, and Trimmed Mean (pages 26 to 33)
1. (a) sample
(b) x̄ = 6.2 Median = 6 Mode = 7
(c) There are no outliers.
2. (a) population
(b) numerical
(c) µ = 159.44 Median = 157 No Mode
3. (a) sample
(b) x̄ = 35 (34.6) Median = 31 Mode = 23 and 27 (bimodal)
5% Trimmed Mean: x̄(T) = 31 (30.6)
10% Trimmed Mean: x̄(T) = 31 (30.9)
(c) Trimmed means are appropriate because the outlier 115 exists within the data set.
(d) Four data points from the bottom and four data points from top of the data set
4. (a) x̄ = 268 (267.875) Median = 254 (253.5) Mode = 267 x̄(T) = 255 (255.409)
(b) Median and Trimmed Mean
(c) Histogram
5. This score system was likely implemented to eliminate the effect of a single rogue judge who
would inflate or deflate the score of a particular athlete.
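The effect of trimming on a rogue score can be sketched in a few lines of Python. This is an illustration only: the judges' scores are made up, the function name `trimmed_mean` is hypothetical, and the sketch assumes that the stated percentage of data points is dropped from each end of the sorted data, with any fractional count rounded down.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the data points from each end."""
    data = sorted(values)
    # Number of points removed from the bottom and from the top (assumed to round down)
    k = int(len(data) * percent / 100)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical scores from ten judges; 4.0 is the rogue score
scores = [9.0, 8.5, 8.5, 8.0, 8.5, 4.0, 8.5, 9.0, 8.0, 8.5]

print(sum(scores) / len(scores))   # ordinary mean, pulled down by the rogue score
print(trimmed_mean(scores, 10))    # 10% trimmed mean, drops one score from each end
```

With these made-up scores the ordinary mean is 8.05, while the 10% trimmed mean is about 8.44, which is closer to what most judges awarded.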
Box and Whisker Plots (pages 34 to 40)
1. (a) minimum: 6
lower quartile: 11
median: 17
upper quartile: 21
maximum: 30
(b) minimum: 33
lower quartile: 40
median: 44
upper quartile: 48
maximum: 52
(c) minimum: 24
lower quartile: 25.5
median: 30
upper quartile: 35
maximum: 40
(d) minimum: 28
lower quartile: 35
median: 36.5
upper quartile: 38
maximum: 41
2. (a) minimum: 7
lower quartile: 10.5
median: 18
upper quartile: 20.5
maximum: 22
(b) The median, upper quartile and maximum for Mr. Porter's class are equal to those for
Mr. Churchill's class. That means that in both classes students with slower reaction times
(i.e. worse than the median) were performing at approximately the same level. When
we compare students with faster reaction times (i.e. better than the median), however,
we notice a difference between the two classes. Because Mr. Churchill's class has a
smaller minimum and lower quartile, we can say that his faster reaction time students in
general out-performed Mr. Porter's faster reaction time students.
(c) Mrs. Lowe's Class
minimum: 6
lower quartile: 10
median: 14
upper quartile: 18
maximum: 20
Mr. Vroom's Class
minimum: 6
lower quartile: 15
median: 18
upper quartile: 23
maximum: 23
With the exception of the minimum, all other values are lower (faster reaction times) for
Mrs. Lowe's class. That means that the majority of Mrs. Lowe's students out-performed
Mr. Vroom's students in the reaction time experiment.
(d) Mrs. Burchill's Class
minimum: 5
lower quartile: 10
median: 12.5
upper quartile: 16
maximum: 21
Mr. Rhodenizer's Class
minimum: 6
lower quartile: 9
median: 13
upper quartile: 16
maximum: 22
The two box-and-whisker plots are very similar. One can conclude that the students
performed at about the same level on the reaction time experiment.
Using Technology to Make Box-and-Whisker Plots (pages 41 to 45)
1. (a)             Tanya   Barb   Suzette
minimum:           2       6      4
lower quartile:    8       12     7
median:            20      17     10
upper quartile:    24      20     21
maximum:           25      25     30
(b) Tanya's Mean: 16.2   Barb's Mean: 15.9   Suzette's Mean: 13.9
(c) Frequency
Class      Tanya   Barb   Suzette
0 to 5     3       0      1
5 to 10    1       2      7
10 to 15   2       4      2
15 to 20   1       4      1
20 to 25   5       3      2
25 to 30   3       1      2
30 to 35   0       0      1
(d) Tanya
(e) Tanya
(f) Barb
(g) 24 to 25 points
(h) 6 to 12 points
(i) 10 to 30 points
2. (a) Mean Time: 12.0
(b) minimum: 10.6
lower quartile: 11.2
median: 12.05
upper quartile: 12.5
maximum: 16.2
(c) Class      Frequency
10 to 11   4
11 to 12   10
12 to 13   12
13 to 14   3
14 to 15   0
15 to 16   0
16 to 17   1
(d) no
(e) 10.6 to 12.05 seconds
(f) 12.5 to 16.2 seconds
(g) 10.6 to 11.2 seconds
(h) Track Meet A
3. Class A
minimum: 20.2
lower quartile: 23.5
median: 26.85
upper quartile: 28.1
maximum: 29.4
Class B
minimum: 17.2
lower quartile: 19.2
median: 22.15
upper quartile: 27.7
maximum: 32.7
Although the median for Class B is much lower (and in the normal range), we have far more
extremes in this class. There are a significant number in Class B who are underweight or
obese; that is why the box and whiskers are so much larger when plotting this class's BMI
data. For Class A the data is more clustered together, with all individuals falling within
the normal and overweight ranges, although more than half are in the overweight category.
Standard Deviation (pages 46 to 50)
1. σ = 2.89
2. σ = 0.41
3. (a) σ = 1.49 and σ = 2.49
(b) The standard deviation is lower for the first data set. That means this data is not as
spread out as the data in the second data set.
4. (a) 183
(b) 182
(c) numerical data set
(d) σ = 4.90
(e) The average heights of these two groups of learners are the same; however, the standard
deviation for Barb’s group is much lower. That means that there is less variation in
heights among Barb’s male learners compared to the other instructor’s learners. The
heights of her learners are more clustered around the mean.
(f) The standard deviations are almost the same for the two groups of male learners;
however, the mean height for Barb’s group is higher. We can conclude that the average
height of male learners in Barb’s math courses is three centimeters more than that of the third
instructor’s male students. The variation in heights within each of the two groups is essentially
the same.
5. Histogram (i) matches with (c).
Histogram (ii) matches with (b).
Histogram (iii) matches with (d).
Histogram (iv) matches with (a).
6. Answers will vary.
Using Technology to Calculate Population Standard Deviation (pages 52 to 56)
1. (a) population
(b)
(c) µ = 14.1 , median: 9.91 , σ = 11.2 (Units: young persons out of 10 000 young persons)
(d) The mean is high because the incarceration rate for the Northwest Territories is so much
higher than the other rates.
2. (a) population
(b)
(c) µ = 55.6 years
(d) σ = 9.5 years
(e) median: 54.5 years
(f) The data does not cluster well around the mean.
3. (a)
(b)
(c)
(d)
(e)
µ = 3.6 mmol/L
σ = 0.90 mmol/L
median: 3.4 mmol/L
Most of the patients are clustered in the near optimal and borderline ranges. There are a
few who are in the desirable range, and a few more in the high and too high ranges.
Distributions (pages 57 to 59)
1. (a) uniform        (b) bimodal
(c) skewed right      (d) normal
(e) skewed left       (f) uniform
(g) normal            (h) skewed left
(i) bimodal           (j) normal
Normal Distributions and the 68-95-99.7 Rule (pages 60 to 67)
1.  Hint:                             Calculation:       Answer:
(a) Between µ − σ and µ + σ       -                  68%
(b) Between µ and µ + 2σ          -                  47.5%
(c) Between µ − σ and µ           -                  34%
(d) Between µ − 3σ and µ          -                  49.85%
(e) Between µ − 2σ and µ + σ      47.5% + 34%        81.5%
(f) Between µ − σ and µ + 3σ      34% + 49.85%       83.85%
(g) Between µ − 3σ and µ + 2σ     49.85% + 47.5%     0.9735
(h) Between µ + σ and µ + 2σ      47.5% - 34%        0.135
(i) Between µ − 3σ and µ − 2σ     49.85% - 47.5%     0.0235
(j) Between µ + σ and µ + 3σ      49.85% - 34%       0.1585
(k) Less than µ + σ               50% + 34%          0.84
(l) Greater than µ − 2σ           47.5% + 50%        0.975
(m) Less than µ − σ               50% - 34%          0.16
2.  Hint:                             Calculation:       Percentage:   Answer:
(a) Between µ − 3σ and µ + 3σ     -                  99.7%         1994
(b) Between µ − σ and µ + σ       -                  68%           1360
(c) Between µ − 2σ and µ          -                  47.5%         950
(d) Between µ − σ and µ + 2σ      34% + 47.5%        81.5%         1630
(e) Between µ + σ and µ + 2σ      47.5% - 34%        13.5%         270
(f) Between µ − 2σ and µ + 2σ     -                  95%           1900
(g) Between µ − 3σ and µ − σ      49.85% - 34%       15.85%        317
(h) Between µ − 2σ and µ + 3σ     47.5% + 49.85%     97.35%        1947
(i) Between µ − 3σ and µ          -                  49.85%        997
(j) Between µ + 2σ and µ + 3σ     49.85% - 47.5%     2.35%         47
(k) Less than µ − 2σ              50% - 47.5%        2.5%          50
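The additions and subtractions in the Calculation columns above all follow one pattern, which can be sketched in Python. This is a sketch under stated assumptions: the names `HALF_BAND`, `signed`, `percent_between`, `percent_below` and `count_between` are invented for this illustration, and only whole numbers of standard deviations from -3 to 3 are handled, as in the 68-95-99.7 rule.

```python
# Percent of a bell-shaped distribution between the mean and k standard deviations
HALF_BAND = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def signed(k):
    """Half-band percentage, negative when k is below the mean."""
    return HALF_BAND[abs(k)] if k >= 0 else -HALF_BAND[abs(k)]

def percent_between(lo, hi):
    """Percent of data between mu + lo*sigma and mu + hi*sigma (lo < hi)."""
    return signed(hi) - signed(lo)

def percent_below(k):
    """Percent of data less than mu + k*sigma."""
    return 50.0 + signed(k)

def count_between(lo, hi, population):
    """Number of individuals expected between the two boundaries."""
    return round(percent_between(lo, hi) / 100 * population)

print(count_between(-3, 3, 2000))               # between mu - 3 sigma and mu + 3 sigma
print(count_between(-3, -1, 2000))              # between mu - 3 sigma and mu - sigma
print(round(percent_below(-2) / 100 * 2000))    # less than mu - 2 sigma
```

Subtracting a negative half band is the same as adding it, which is why boundaries on opposite sides of the mean lead to a sum and boundaries on the same side lead to a difference.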
Z-Scores (pages 68 to 79)
1. (a) -0.65
(b) 1.33
2. (a) 0.68
(b) -0.32
3. (a) Tylena, Elliott, Marcus
(b) Meera, Hamid
(c) Beverly
(d) Elliott, no
(e) No, they may have all passed if the mean mark was very high or the majority could have
failed if the mean mark was very low. Without the mean and standard deviation we
cannot tell who passed and who failed.
4. (a) 0.4525
(b) 0.4082 + 0.2486 = 0.6568
(c) 0.4901 - 0.4082 = 0.0819
(d) 0.5 - 0.2486 = 0.2514
(e) 0.5 + 0.4525 = 0.9525
5. (a) 0.5
(b) 0.3770
(c) 0.5 + 0.2190 = 0.7190
(d) 0.3289 - 0.1026 = 0.2263
(e) 0.4826 + 0.1255 = 0.6081
(f) 0.5 - 0.2852 = 0.2148
6. (a) 0.3849 + 0.2881 = 0.6730
(b) 0.5 - 0.1554 = 0.3446
(c) 0.5
(d) 0.4452 - 0.1554 = 0.2898
(e) 0.2881 + 0.5 = 0.7881
(f) 0.4918
Growth Charts (pages 80 to 84)
1. 50th percentile; The head circumference for this 12 month old boy is equal to or greater than
the head circumference of 50% of the boys of the same age.
2. 95th percentile; The length of this 31 month old boy is equal to or greater than the length of
95% of the boys of the same age.
3. (a) 25th percentile
(b) 90th percentile
(c) 10th percentile
(d) 75th percentile
(e) Between the 25th and 50th percentile
(f) Between the 50th and 75th percentile
(g) Between the 90th and the 95th percentile
(h) Between the 5th and 10th percentile
4. (a) 19 pounds (approximately 8.6 kg)
(b) 33 inches (approximately 83.7 cm)
(c) 19 inches (approximately 48.2 cm)
5. 29 inches (approximately 73.6 cm) to 33.5 inches (approximately 85.1 cm)
6. 18.25 inches (approximately 46.3 cm) to 20.5 inches (approximately 52 cm)
7. 10 to 21 months
8. 1 to 6 months
9. 28.5 pounds (approximately 12.9 kg) to 33 pounds (approximately 15 kg)
10. 32 inches (approximately 81.3 cm) to 35.5 inches (approximately 90.2 cm)
11. 6 to 17 months
12. 9 to 12 months
13. (Hint: Change to Percentiles) Should be concerned; the boy went from 97th percentile for
weight at birth to the 3rd percentile for weight by the age of 12 months
Putting It Together (pages 85 to 95)
1. Population: all 1386 members of the sportsplex
Sample: the 230 randomly selected members
2. (a) Numerical, Discrete
(b) Categorical
(c) Numerical, Continuous
(d) Categorical
(e) Numerical, Discrete
(f) Numerical, Continuous
3. (a) 56%
(b) 87%
(c) 4%
(d) 19 314 (if you use a survival rate of 87%)
(e) double bar
(f) No, The graph does not show the number of cases. It only shows survival rates.
4. (a) population because all stores had to report toppings selected by all customers.
(b) 15%
(c) 23%
(d) Cannot determine based on the information supplied.
(e) 236 880 pizzas
(f) 186 120 pizzas
(g) 1/3
(h) sausage
5. The scale used makes one initially feel that there were drastic fluctuations in the number of
infant deaths between 2004 and 2007. This is not the case.
6. (a) circle graph
(b) double bar graph
(c) bar graph
(d) histogram
(e) line graph
(f) stacked bar graph
7. (a) Population: All suitcases on domestic flights
(b) Histogram
(c) x̄ = 14.5 kg, Median = 14.8 kg, Mode = 14.8 kg, x̄(T) = 14.9 kg
8. (a)             Mr. Tetford's Class   Mrs. Gatien's Class
Minimum:           19                    20
Lower Quartile:    22.5                  21
Median:            25                    22.5
Upper Quartile:    26.5                  24
Maximum:           29                    30
(b) 25 to 29
(c) 20 to 21
(d) 24 to 30
(e) Although Mrs. Gatien's class' lowest and highest marks are better than those for Mr.
Tetford's class, the middle 50% of her learners obtained marks between 21 and 24, while
the middle 50% of Mr. Tetford's learners obtained marks between 22.5 and 26.5 (actually
between 23 and 26 because half points were not awarded on the test). Mr. Tetford's class
outperformed Mrs. Gatien's class on this particular test.
9. (a) sample
(b) 13.8 g/dl
(c) 1.01 g/dl
10. (a) Bimodal
(b) Skewed (left)
(c) Normal
(d) Uniform
11. (a) 2040   (b) 2850
(c) 5982       (d) 4890
(e) 5841       (f) 5031
(g) 810        (h) 141
(i) 3000       (j) 5040
(k) 5850       (l) 960
(m) 150
12. (a) 1.67
(b) -0.67
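The two z-scores above come from subtracting the mean from the temperature and dividing by the standard deviation. A minimal Python sketch of that arithmetic follows; the function name `z_score` is chosen for this illustration only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# August temperatures: mean 26 degrees C, standard deviation 3 degrees C
print(round(z_score(31, 26, 3), 2))  # 1.67
print(round(z_score(24, 26, 3), 2))  # -0.67
```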
13. (a) 0.0948 + 0.2642 = 0.3590
(b) 0.3849
(c) 0.50
(d) 0.3106 + 0.5 = 0.8106
(e) 0.2881 - 0.0478 = 0.2403
(f) 0.5 - 0.4452 = 0.0548
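These table-based answers can be cross-checked with software. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the variable `iq` and the helper `prob_between` are names introduced here for illustration, not part of the course.

```python
from statistics import NormalDist

# IQ scores for 20 to 34 year old adults: mean 110, standard deviation 25
iq = NormalDist(mu=110, sigma=25)

def prob_between(dist, low, high):
    """Probability that a value from dist falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)

print(round(prob_between(iq, 104, 128), 4))  # part (a)
print(round(prob_between(iq, 80, 110), 4))   # part (b)
print(round(iq.cdf(132), 4))                 # part (d), less than 132
print(round(1 - iq.cdf(150), 4))             # part (f), greater than 150
```

Because each z-table entry is already rounded to four decimal places, a software result may differ from the table-based answer in the last digit.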
14. 10th percentile; The head circumference for this 11 month old boy is equal to or greater than
the head circumference of 10% of the boys of the same age.
15. 26 pounds (or 11.8 kg)
16. 33 inches to 38.5 inches
17. 2 months to approximately 6.7 months
18. 18.5 inches to almost 20 inches