Download Chapter 7 - ClassNet

Chapter 07
7/6/07
9:37 AM
Page 289
What You’ll Learn
To collect and analyse one-variable data from primary and secondary sources
And Why
People often make business and personal decisions based on data. Collecting data in an
unbiased way and analysing data effectively are important life skills.
Key Words
• one-variable data
• discrete data
• continuous data
• categorical data
• distribution
• population
• sample
• bias
• measures of central tendency
• standard deviation
Activate Prior Knowledge
Interpreting Circle Graphs
Prior Knowledge for 7.1
A circle graph is also known as a pie chart. It displays data by dividing a circle into sectors
that represent parts of a whole proportionally.
Example
This circle graph shows the results of a survey on the method of communication with
friends used most often by Ontario secondary students.
[Circle graph: Method of Communication. Internet chat or MSN 38%; In person 37%; Telephone 12%; Text message, e-mail 7%; Cell phone 6%]
a) From the graph, which method is the most popular?
b) What percent of students prefer to use a cell phone?
c) The survey was completed by 15 600 students.
How many prefer to communicate in person?
Solution
a) Online chat or MSN has the greatest percent, 38%,
so it is the most popular.
b) 6% of students prefer to use a cell phone.
c) 37%, or 0.37, of the students surveyed prefer to communicate in person.
0.37 × 15 600 = 5772
So, 5772 students prefer to communicate in person.
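The calculation in part c) can also be checked with a short program. This is only a sketch: the function name count_from_percent is made up for illustration, and the percents are the ones read from the Method of Communication circle graph.

```python
def count_from_percent(percent, total):
    """Return how many individuals a given percent of the total represents."""
    return round(percent / 100 * total)


survey_total = 15_600  # students who completed the survey

# 37% of the students prefer to communicate in person
in_person = count_from_percent(37, survey_total)
print("In person:", in_person)
```

The same function works for any sector of the graph, as long as the percent and the total come from the same survey.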
✓ Check
1. The same 15 600 students in the Guided Example were also asked this
question: “If you had $1000 to give to charity, which type
would you choose?” The results are shown in the circle graph.
[Circle graph: Charity Type. Health 31%; Other 21%; International aid 18%; Wildlife/animals 17%; Arts, culture, sports 13%]
a) Which choice is the most popular?
What percent of students chose this type of donation?
b) How many students chose to donate to wildlife and animals?
2. A circle graph shows each category of data as a percent
of the complete set of data.
Why do you think this is a good way to display data?
What type of data would not fit in a circle graph?
Bar Graphs and Pictographs
Prior Knowledge for 7.1
A bar graph has horizontal or vertical bars that represent the data.
The graph compares the data in categories, such as the average rainfall of different months.
A pictograph is similar to a bar graph, but uses pictures or symbols to compare data.
Example
This table shows the average rainfall of Toronto, to the nearest millimetre,
for each month of the year over a period of 40 years.
Month           Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Rainfall (mm)    47   46   58   66   67   66   72   82   70   63   67   62
a) Draw a bar graph and a pictograph to represent the same data.
b) About how much rain falls in the summer months, July and August?
Solution
a) [Bar graph: Average Rainfall in Toronto. Horizontal axis: Month; vertical axis: Rainfall (mm), from 0 to 90. The length of a bar represents the amount of rain.]
[Pictograph: Average Rainfall in Toronto. Horizontal axis: Month. Each symbol represents 10 mm of rainfall.]
b) Add the average rainfalls for July and August.
About 154 mm of rain falls in July and August.
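A pictograph can be roughed out in plain text before drawing it. The sketch below uses the Toronto rainfall data from the Example. The choice of "*" for 10 mm and "." for a leftover amount under 10 mm is an assumption made for illustration, not the symbol used in the printed pictograph.

```python
rainfall_mm = {
    "Jan": 47, "Feb": 46, "Mar": 58, "Apr": 66, "May": 67, "Jun": 66,
    "Jul": 72, "Aug": 82, "Sep": 70, "Oct": 63, "Nov": 67, "Dec": 62,
}


def pictograph_row(value, per_symbol=10, symbol="*"):
    """One symbol for each full `per_symbol` units; '.' marks a leftover part."""
    full, rest = divmod(value, per_symbol)
    return symbol * full + ("." if rest else "")


for month, mm in rainfall_mm.items():
    print(f"{month} {pictograph_row(mm)}")

# Part b): add the average rainfalls for July and August
summer_total = rainfall_mm["Jul"] + rainfall_mm["Aug"]
print("July and August:", summer_total, "mm")
```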
✓ Check
1. Here is a record of the types of books read by students in English classes during 1 year.
Draw a bar graph and a pictograph to represent the data.
Genre        Mystery  Comic  Poetry  Romance  Biography
Books read   22       43     12      54       34
2. Do you prefer to use a bar graph or a pictograph to represent the data in question 1?
Explain when you would use a bar graph and when you would use a pictograph.
Organizing Data into Intervals
Prior Knowledge for 7.1
During a survey, you may collect data that spreads over a wide range.
To display and analyse the data, you need to group the data into appropriate intervals.
To group the data, use a suitable range to divide the data into a reasonable number of
intervals. Then determine the data that should go into each interval.
Example
The heights of 15 players of the 2006 Toronto Raptors basketball team are listed.
7′ 0″    6′ 10″   6′ 3″    6′ 0″    6′ 9″
6′ 7″    6′ 9″    5′ 11″   7′ 0″    6′ 6″
6′ 7″    6′ 10″   6′ 10″   6′ 5″    6′ 2″
Determine the number of players in each interval:
Under 6′, 6′ – 6′ 5″, 6′ 6″ – 6′ 11″, 7′ and over
Solution
Create a tally chart to help count the number of players in each interval.
Height            Tally       Number of players
Under 6′          |           1
6′ – 6′ 5″        ||||        4
6′ 6″ – 6′ 11″    ||||| |||   8
7′ and over       ||          2
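The tally can be checked by converting each height to inches and sorting it into an interval. This is a sketch only: the interval labels and the helper name interval_for are chosen here for illustration.

```python
# Heights of the 15 players as (feet, inches) pairs, in the order listed
heights = [
    (7, 0), (6, 10), (6, 3), (6, 0), (6, 9),
    (6, 7), (6, 9), (5, 11), (7, 0), (6, 6),
    (6, 7), (6, 10), (6, 10), (6, 5), (6, 2),
]


def interval_for(feet, inches):
    """Return the label of the interval that a height falls into."""
    total_inches = feet * 12 + inches
    if total_inches < 72:       # shorter than 6 ft
        return "under 6 ft"
    if total_inches <= 77:      # 6 ft 0 in to 6 ft 5 in
        return "6 ft 0 in to 6 ft 5 in"
    if total_inches <= 83:      # 6 ft 6 in to 6 ft 11 in
        return "6 ft 6 in to 6 ft 11 in"
    return "7 ft and over"


counts = {}
for feet, inches in heights:
    label = interval_for(feet, inches)
    counts[label] = counts.get(label, 0) + 1

for label, count in counts.items():
    print(label, count)
```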
✓ Check
1. In 2005, the populations of the world’s 30 largest cities were:
35 327 000    19 013 000    18 498 000    18 336 000    18 333 000
15 334 000    14 299 000    13 349 000    13 194 000    12 665 000
12 560 000    12 146 000    11 819 000    11 469 000    11 286 000
11 146 000    11 135 000    10 849 000    10 677 000    10 672 000
9 854 000     9 760 000     9 592 000     9 346 000     8 711 000
8 180 000     7 615 000     7 594 000     7 352 000     7 182 000
Determine the number of cities in each interval:
Under 10 million, 10 million – 14 999 999, 15 million – 19 999 999, 20 million and over
2. These are the heights of 28 players of the 2006 Toronto Maple Leafs hockey team.
6′ 6″    5′ 11″   6′ 2″    6′ 5″    6′ 1″    6′ 1″    6′ 2″
6′ 7″    6′ 1″    6′ 4″    6′ 4″    6′ 2″    5′ 10″   6′ 1″
6′ 0″    5′ 11″   6′ 0″    6′ 4″    6′ 0″    6′ 1″    6′ 1″
6′ 0″    6′ 5″    5′ 10″   5′ 10″   6′ 0″    5′ 10″   6′ 5″
a) Choose the intervals and organize the heights.
b) Explain how you chose the intervals.
c) Suppose you were organizing the data for the heights of students in your class.
What intervals would you use? Explain your choice.
Math in the Media: Be Informed!
Information in newspapers, magazines, and on Internet
sites often involves numbers. But not everything you read
is unbiased, or even true.
SOME COMMON MISTAKES AND MISLEADING PRACTICES
Misuse of language
• The words average or typical are sometimes used without identifying
whether the number used is the mean, median, or mode.
• Survey questions can be worded so results favour one opinion.
Distorted visuals
• In some pictographs or 3-D graphs, the sizes of parts of the graph can
make differences between numbers appear greater or less than they are.
• When axes do not start at 0, it is easy to conclude that differences
between numbers are greater than they are.
Questionable sources
• Do the data come from a random, unbiased sample?
• When you see the word expert, ask yourself what makes
the person an expert. Is he or she an expert in the appropriate field?
• Are the presented data facts or are they opinions?
Just because many people believe something does not make it true.
Try this after you have completed Section 7.6.
Look at these data. How might they be misleading?
[Graphic: Purchasing power of the Canadian dollar, 1980 to 2000]
[Graphic: Value of Average Canadian $151,850]
7.1 Organizing and Representing Data
Many people find a visual display of data easier to interpret than a set of numbers.
Looking at the shape of a bar graph or the size of the sectors in a circle graph
is a way to begin analysing the data.
Investigate
Exploring Shapes of Histograms
In a histogram, each bar represents the data within an interval.
There are no gaps between bars.
The interval “9 up to 10” includes 9 and all numbers greater than 9 but less than 10.
Work with a partner or in a group. You will need grid paper.
Four sets of data are given. Each data set is organized in intervals.
Draw a histogram for one data set.
Make sure each data set is graphed by at least one group.
Data Set 1: Fuel consumption of mid-size cars, 2006
Fuel consumption rating (L/100 km)   Number of models available
9 up to 10                           9
10 up to 11                          11
11 up to 12                          35
12 up to 13                          23
13 up to 14                          13
14 up to 15                          7
Data Set 2: Heights of high school students
Height (cm)     Number of students
150 up to 155   3
155 up to 160   7
160 up to 165   17
165 up to 170   10
170 up to 175   10
175 up to 180   16
180 up to 185   10
185 up to 190   6
Data Set 3: Ages of cars in a parking lot
Age (years)   Number of cars
0 up to 3     20
3 up to 6     24
6 up to 8     22
9 up to 11    18
12 up to 14   20
Data Set 4: Population of Canada by age, 1994
Age (years)   Population (thousands)
0 up to 19    7930
20 up to 39   9590
40 up to 59   7040
60 up to 79   3920
80 and over   650
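Before drawing on grid paper, it can help to rough out a histogram in plain text. The sketch below does this for Data Set 1. The function name text_histogram and the "#" symbol are illustrative choices, and a text layout like this only approximates a real histogram, whose bars touch.

```python
# Data Set 1: (interval in L/100 km, number of models available)
data_set_1 = [
    ("9 up to 10", 9),
    ("10 up to 11", 11),
    ("11 up to 12", 35),
    ("12 up to 13", 23),
    ("13 up to 14", 13),
    ("14 up to 15", 7),
]


def text_histogram(table, symbol="#"):
    """Return one text line per interval, with one symbol per unit of frequency."""
    width = max(len(label) for label, _ in table)
    return [f"{label:<{width}} | {symbol * freq} {freq}" for label, freq in table]


for line in text_histogram(data_set_1):
    print(line)
```

Each bar is written sideways, so the shape of the distribution is read from top to bottom instead of left to right.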
➢ Describe the shape of the histogram you graphed.
➢ Which statement best describes the data you graphed?
Explain your choice.
• There are two distinct peaks in the data.
• Most of the data are clustered in the middle of the graph.
• The data are evenly distributed across the intervals.
• Most of the data are in the upper intervals.
• Most of the data are in the lower intervals.
What does the statement you chose tell you about the data?
Reflect
Share your graph and analysis with a group that used a different data
set. Compare your results. Then discuss this question.
➢ What can the shape of a histogram tell you about the data set
that it represents?
➢ How could you predict the shape of the histogram
by looking at the data set?
➢ Repeat this with another group that used another data set.
Connect the Ideas
Types of data
One-variable data describe one piece of information about a
person, place, or thing.
Each piece of one-variable data is one number or word.
Data that involve numbers are called numeric data.
Numeric data may be discrete or continuous.
Discrete data consist of values from a countable set of possibilities.
Examples of discrete data are the number of siblings a person has,
the year a person was born, or the number of courses a person
is taking at school.
Continuous data consist of values from a range.
A person’s height, the length of time a competitor takes to run a race, or
the distance a person commutes to work are examples of continuous data.
Data that are grouped by categories are called categorical data.
Examples include the colours of cars in a parking lot, yes or no responses
on a questionnaire, or favourite types of music.
Data sets
Sometimes a data set consists of a list of numbers or words.
Favourite Colours of Students in My Class
blue     red      red      black    blue     red      purple
green    green    blue     blue     black    black    red
black    black    blue     yellow   green    blue     purple
Other times, the data are in a table, with tally marks or
numbers showing how many pieces of data are in each
interval or category. These are called frequency tables.
The number of pieces of data in each interval or category
is called the frequency.
Favourite Colours of Students in My Class
Colour   Tally     Frequency
black    |||||     5
blue     ||||| |   6
green    |||       3
red      ||||      4
purple   ||        2
yellow   |         1
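A frequency table can also be built from a list by counting. The sketch below uses Counter from Python's standard collections module on the favourite-colour list. The variable names are illustrative.

```python
from collections import Counter

# Favourite colours of the 21 students, in the order listed
colours = [
    "blue", "red", "red", "black", "blue", "red", "purple",
    "green", "green", "blue", "blue", "black", "black", "red",
    "black", "black", "blue", "yellow", "green", "blue", "purple",
]

frequency = Counter(colours)  # maps each colour to how often it appears

# Print the table from the most common colour to the least common
for colour, count in frequency.most_common():
    print(f"{colour:<8}{count}")
```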
Types of graphs
The type of graph you draw depends on the type of data being represented.
Circle graphs and pictographs can represent categorical data
or discrete data. Circle graphs show the parts that make up a whole.
[Circle graph: Favourite Colours of People in My Class. Blue 28%; Black 24%; Red 19%; Green 14%; Purple 10%; Yellow 5%]
[Pictograph: Number of Students at Pinewood High with a Driver’s Licence. Rows: Grade 9, Grade 10, Grade 11, Grade 12. Each symbol represents 5 students.]
Bar graphs can represent categorical or discrete data.
[Bar graph: Favourite Colours of People in My Class. Bars: Yellow, Purple, Red, Green, Blue, Black; axis: Number of people, from 0 to 6]
A histogram is a type of bar graph that shows numeric data
grouped in intervals.
There are no gaps between the maximum value of one interval
and the minimum value of the next.
There is no overlap between the intervals.
So, the bars have no spaces between them.
[Histogram: Household Incomes in Toronto. Horizontal axis: Annual income (thousands of $), from 0 to over 100; vertical axis: Number of households (thousands), from 0 to 250]
Distributions
The shape of a histogram shows the distribution of the data.
Because some shapes are common, they are given special names.
Choosing data at random means choosing so that every member of the
data set has the same chance of being chosen.

Uniform Distribution
The data are approximately evenly distributed across the range.
If you choose one piece of data at random, you are just as likely
to get a low number as a high number.
[Histogram: Time of Birth for All Babies Born in One Day in Canada. Horizontal axis: Time of day, from midnight to midnight; vertical axis: Frequency]

Normal or Bell-Shaped Distribution
Most of the data are clustered in the middle. The graph tails off the
farther you are away from the middle. If you choose one piece of
data at random, you are likely to get a number near the middle.
[Histogram: 2006 Toronto Marathon Results. Horizontal axis: Time (h:min); vertical axis: Frequency]

Skewed Left
This is named for the tail on the left.
Most of the data are at the high end of the range. The graph tails off
to the left. If you choose one piece of data at random, you are more
likely to get a high number than a low number.
[Histogram: Vacant Apartments. Horizontal axis: Monthly rent ($); vertical axis: Number of apartments]

Skewed Right
This is named for the tail on the right.
Most of the data are at the low end of the range. The graph tails off to
the right. If you choose one piece of data at random, you are more
likely to get a low number than a high number.
[Histogram: Household Size in Canada, 2000. Horizontal axis: Number of people in household, from 1 to 6 or more; vertical axis: Number of households (millions)]

Bimodal
The histogram has two distinct peaks. If you choose one piece of
data at random, you are likely to get a number from one of the peaks.
[Histogram: Time Between Eruptions of Old Faithful Geyser During 14-Day Period. Horizontal axis: Time (min), from 40 to 120; vertical axis: Frequency]
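One rough way to look for peaks in a histogram is to count the bars that are taller than both of their neighbours. The sketch below is a crude heuristic written for illustration, not a formal test for a bimodal distribution: small bumps in nearly uniform data also count as peaks. It is tried on the frequencies of Data Set 1 and Data Set 2 from Investigate.

```python
def count_peaks(frequencies):
    """Count the bars that are strictly taller than both neighbouring bars."""
    peaks = 0
    last = len(frequencies) - 1
    for i, freq in enumerate(frequencies):
        # A missing neighbour at either end is treated as lower than any bar
        left = frequencies[i - 1] if i > 0 else float("-inf")
        right = frequencies[i + 1] if i < last else float("-inf")
        if freq > left and freq > right:
            peaks += 1
    return peaks


fuel_consumption = [9, 11, 35, 23, 13, 7]          # Data Set 1
student_heights = [3, 7, 17, 10, 10, 16, 10, 6]    # Data Set 2

print("Data Set 1 peaks:", count_peaks(fuel_consumption))
print("Data Set 2 peaks:", count_peaks(student_heights))
```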
Practice
1. Is each piece of data numeric or categorical?
For those that are numeric, identify each as continuous or discrete.
a) A person’s eye colour
b) A person’s age
c) A person’s birth month
d) A person’s mass
e) A person’s height
f) A person’s favourite pet
g) Whether a person agrees with the statement
“Hockey is more fun than baseball.”
2. Copy this graphic organizer.
Write the words categorical, continuous,
data, discrete, and numeric
in the appropriate boxes to show the relationship
between the types of data.
3. Describe the shape of each histogram.
Then name the distribution that each graph most closely matches.
a) [Histogram: Toronto Blue Jays Batting Averages, 2006. Horizontal axis: Batting average; vertical axis: Number of players]
b) [Histogram: Time to Get to School. Horizontal axis: Time (min); vertical axis: Number of students]
Circle graphs are often used to display categorical data.
If the data are provided in a list, you need to make a frequency table before graphing.
Example
These data show 24 high school students’ answers to the question:
“What are your plans after graduation?”
college      work         college      college      college      work
university   college      university   college      work         work
college      work         college      university   college      college
university   work         college      college      college      college
Display the data in a circle graph.
You need a compass and a protractor to draw a circle graph.
Solution
First, make a frequency table for the data.
Destination   Tally              Frequency (number of students)
College       ||||| ||||| ||||   14
University    ||||               4
Work          ||||| |            6
The sectors in a circle graph proportionally represent parts of a whole.
When the data are provided as fractions or percents, begin by
determining each sector angle.
Determine the fraction of students that selected each destination.
Multiply each fraction by 360° to determine the sector angle.
14 out of 24 students responded “college”; 14/24 × 360° = 210°
So, the sector representing college will have an angle of 210°.
4 out of 24 students responded “university”; 4/24 × 360° = 60°
So, the sector representing university will have an angle of 60°.
The rest of the circle will represent the response “work.”
You can check the angle by calculating.
6 out of 24 students responded “work”; 6/24 × 360° = 90°
So, the sector representing work will have an angle of 90°.
Draw the circle graph.
Label the sectors, and colour each sector a different colour.
[Circle graph: Planned Destinations of 24 High School Students. College 58%; Work 25%; University 17%]
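The sector angles in this Example can be computed for any frequency table. The sketch below multiplies each category's fraction of the total by 360. The function name sector_angles is an illustrative choice.

```python
def sector_angles(frequencies):
    """Return the circle-graph sector angle, in degrees, for each category."""
    total = sum(frequencies.values())
    return {category: count * 360 / total
            for category, count in frequencies.items()}


plans = {"College": 14, "University": 4, "Work": 6}
angles = sector_angles(plans)

for category, angle in angles.items():
    percent = round(plans[category] / sum(plans.values()) * 100)
    print(f"{category}: {angle} degrees, about {percent}%")
```

The angles should always add to 360°, which is a quick check on the arithmetic.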
4. A survey of how high school students get to school had these results:
car 37.5%, public transit 37.5%, walk 17.5%, cycle 7.5%
a) Draw a circle graph to illustrate the data.
b) Write a question someone could answer using the graph.
5. These data show the sizes of T-shirts sold by a band at a concert.
S, M, L, and XL represent small, medium, large, and extra large.
S    XL   XL   XL   L    L    M    XL   XL   L
S    L    XL   XL   XL   M    L    L    S    XL
L    XL   L    XL   L    XL   S    XL   XL   M
a) Make a frequency table for the data.
b) Draw a circle graph to illustrate the data.
c) How could the graph help the band when it orders T-shirts for the next tour?
6. The table shows the prices of houses for sale
in Thunder Bay in January 2007.
House price ($)          Number for sale
50 000 up to 100 000     5
100 000 up to 150 000    9
150 000 up to 200 000    12
200 000 up to 250 000    17
250 000 up to 300 000    10
300 000 up to 350 000    4
350 000 up to 400 000    2
400 000 up to 450 000    1
a) Are house prices discrete or continuous data?
Explain.
b) Predict the shape of a histogram for these data.
Explain your prediction.
c) Draw a histogram for these data.
How does the shape compare to your prediction
in part b?
7. These are the birth months of a group of people.
Jul   Oct   Sep   Sep   Oct   Jul   Aug   Jun   Mar   Mar
Jul   Aug   Apr   May   Feb   Dec   Jul   Dec   Sep   Sep
May   May   Jan   Jun   Jul   Apr   Aug   May   Mar   Oct
a) Are the data discrete, continuous, or categorical?
How do you know?
b) Make a frequency table for the data.
c) Draw a circle graph, bar graph, pictograph, or histogram.
Explain how you decided which type of graph to draw.
d) Write a question someone could answer using the graph.
8. Assessment Focus These are daily high temperatures in Waterloo for May
in a recent year. Each temperature is recorded in degrees Celsius (⬚C).
19.0   19.8   23.3   21.1   15.2   9.9    17.2   21.7
21.2   23.9   16.3   13.6   17.1   16.1   15.3   14.8
19.0   11.9   12.8   15.0   10.7   7.8    16.1   22.3
23.5   18.9   24.7   27.6   32.7   31.4   27.9
a) Are temperatures discrete or continuous data? Explain.
b) What are the greatest and least temperatures in the data set?
c) Make a frequency table for the data. Use at least 6 intervals.
Explain how you chose the intervals.
d) Graph the data. Explain how you decided which type of graph to draw.
Describe the shape of the graph.
e) Suppose you could graph similar data for July or November.
How do you think each graph might compare to this one?
Explain your thinking.
9. Would you use a circle graph, bar graph, histogram, or pictograph
to display data about each topic? Explain your choice.
a) The prices of houses for sale in your town or neighbourhood
b) The most popular car colours in the world
c) The per capita carbon dioxide emissions in different countries
d) The hours of television watched each week by Canadians of different ages
e) The amount of Ontario’s electricity generated from nuclear, coal, hydro,
natural gas, and renewable sources
10. Take it Further Suppose you collected data about each topic
and drew a histogram. Which distribution would you expect the data to have?
Explain your thinking.
a) A class set of marks out of 100 on a test
b) The foot lengths of a group of females
c) The foot lengths of a mixed group of males and females
d) The numbers of people in the families of students in your class
Explain how the type of graph you draw depends on the type of data you are representing.
Include examples in your explanation.
7.2 Organizing and Representing Data Using Technology
Statisticians deal with large amounts of data every day.
They use technology to organize and display these data.
A common tool for organizing data is a spreadsheet.
Spreadsheets allow you to graph the same data set
in many different ways.
Inquire
Organizing and Representing Data Using a Spreadsheet
You will need Microsoft Excel.
Work with a partner.
A school council is organizing extra-curricular clubs.
To find out about students’ interests, the council conducted a survey
about the number of hours each week the students spend doing certain activities.
• Using the Internet or playing computer games
• Watching TV
• Reading
• Volunteering in the community
• Playing sports
• Spending time with friends
• Other
In Excel and many other spreadsheet programs, a graph is called a chart.
The council randomly selected 26 students for the survey.
Open the file Freetime.xls.
The spreadsheet contains
the results of the survey.
Each row contains the responses
of one student.
The columns contain
different types
of one-variable data.
Below the survey data, there are
four summary tables,
which you will graph in Excel.
1. The summary table with title Frequency of favourite activities
shows the number of students who spent the greatest amount
of their free time on each of the activities in the survey.
a) Are favourite activities categorical, discrete, or continuous data?
Justify your answer.
b) Hold down the mouse button
while you drag over the data
in the table, except for the Total row.
This selects the data.
From the Insert menu, select Chart.
Click Finish to insert the graph.
What kind of graph appears
on the screen?
What is the most common
favourite activity?
How does the graph show this?
How does the table show it?
c) Right-click on the blank area
around the graph.
From the menu that appears, select Chart Type.
In the Chart Type box, select Pie.
Click OK.
What kind of graph is shown now?
With the chart selected, select Chart Options from the Chart menu.
Click on the Data Labels tab, then check off Value and Percentage.
Click OK.
What does this graph show more clearly than the previous graph?
d) Right-click on the blank area around the graph.
From the menu that appears, select Chart Type.
What other types of graphs would be appropriate for these data?
Choose one of these types and graph the data.
Does the new graph provide any new information?
Explain your thinking.
e) Which graph do you think best represented the data?
Explain your choice.
You may need to drag the graph so that it does not cover the table.
2. The summary table with title
Distribution of free hours shows the number of students
with different amounts of free time each week.
a) Are the amounts of free time
categorical, discrete, or continuous data?
Justify your answer.
b) Select the data in the table, except for the Total row.
From the Insert menu, select Chart.
Click Finish to insert the graph.
What kind of graph appears?
Describe the distribution of the data.
Why might this happen?
c) Right-click on the blank area around the graph.
From the menu that appears, select Chart Type.
What other types of graphs would be appropriate for these data?
Explain your thinking.
Choose one of these types and graph the data.
Does the new graph provide any new information? Explain.
d) Which graph do you think better represents the data?
How is it better?
3. The summary table with title
Distribution of number of activities
shows the number of students who spend time
doing different numbers of the activities
in the survey.
a) Are the numbers of activities categorical,
discrete, or continuous data?
Justify your answer.
b) Choose a graph that you think will represent
the data best. Explain your choice.
Graph the data.
Do most of the students spend time on many
different activities or only a few?
Explain how you know.
4. The summary table with title
Total time spent on each activity
shows the total amount of time spent
by all students doing each activity.
a) Are the total amounts of time categorical,
discrete, or continuous data?
Justify your answer.
b) Choose a graph that you think will represent
the data best. Explain your choice.
Graph the data.
What is the most popular activity?
How does the graph show this?
c) Graph the data from question 1
about the frequency
of favourite activities the same way.
Do the two graphs give the same information
about the popularity of different activities?
Explain your answer.
5. What clubs do you think the school council
should organize?
Explain how to use your graphs to convince
the council that you are correct.
Practice
1. The Canadian Union of Farmers warns of a crisis in rural Canada
due to the decreasing numbers of family farms.
A family farm is a farm that is owned and operated by one family.
Its operating costs are generally less than those of a large farm
run as an agribusiness or a collective.
This table shows the number of farms in Ontario
with different operating costs in 1996 and 2001.
Year
Under
$50 000
$50 000 – $100 000 –
$99 999
$199 999
1996
1329
2427
2001
402
1164
$200 000 –
$349 999
$350 000 –
$499 999
$500 000 – $1 000 000 – $1 500 000
$999 999
$1 499 999
and over
11 151
17 962
10 770
14 857
4530
4494
6794
13 791
9453
15 060
5698
7366
a) Are operating costs categorical, discrete, or continuous data?
Enter the data into a spreadsheet or load the spreadsheet file Farms.xls.
b) Use a spreadsheet to make a bar graph for the data for each year.
c) Compare the bar graphs for the two years. How are the shapes different?
What does your answer tell you about how farms in Ontario changed
from 1996 to 2001? Do the graphs support the argument that the number
of family farms is decreasing? Justify your answer.
d) Change the two bar graphs to another type of graph and compare them.
How does the information this graph shows compare with your bar graph?
Explain the reason for your choice of the type of graph.
2. In 2006, members of the Students’ Assembly
on Political Reform met to discuss changing the
way members of provincial parliament (MPPs)
are elected.
The first table shows the numbers of winning
candidates in the 2003 election who received
different percents of the votes in their ridings.
For example, 3 MPPs won with between 35%
and 40% of the votes in their ridings.
The second table shows the number of MPPs
elected from each political party in 2003.
a) Are the percent of votes categorical, discrete,
or continuous data?
b) What kind of data are the political parties?
Enter the data into a spreadsheet or load the
spreadsheet file Election.xls.
c) Use a spreadsheet to graph the percent votes
data. Explain your choice of graph.
What does the shape of the graph tell you
about the data?
d) About what percent of the MPPs won with
less than 50% of the votes?
Explain your strategy for answering this question.
e) What do you know about the votes received by the other parties
in the ridings represented by MPPs who received less than 50% of the votes?
f) Use a spreadsheet to graph the party totals data.
Explain your choice of graph.
About what fraction of the MPPs elected came from each of the three parties?
g) In 2007, the Students’ Assembly on Political Reform recommended that each party’s
share of the vote determine the party’s share of the seats in parliament.
Do you agree with this idea?
Use these data and graphs to support your answer.
Reflect
➢ Is it better to use technology to graph some data sets?
Explain using examples from this section.
➢ How can a graph help you interpret data?
Explain using examples from this section.
7.3 Sampling Techniques
One way to determine the population of fish or reptiles in a habitat
would be to count all of them.
This is usually not possible and could harm some species.
So, biologists use a mark-and-recapture technique.
The biologist catches some animals, marks them in a non-destructive
way, then releases them. Later, another sample is caught.
The ratio of marked animals to all the animals in the second sample should be
approximately the same as the ratio in the total population.
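The equal-ratio idea gives a simple estimate: the total population is about the number marked in the first catch, times the size of the second sample, divided by the number of marked animals in the second sample. The sketch below applies that proportion. The function name and the sample numbers are made up for illustration and do not come from a real study.

```python
def estimate_population(marked_first, second_sample_size, marked_in_second):
    """Estimate a population size from one mark-and-recapture experiment.

    Uses the proportion
        marked_first / population = marked_in_second / second_sample_size
    """
    if marked_in_second == 0:
        raise ValueError("no marked individuals in the second sample")
    return marked_first * second_sample_size / marked_in_second


# Hypothetical numbers: 10 animals marked, then a second sample of 12
# animals in which 3 carry a mark
estimate = estimate_population(10, 12, 3)
print("Estimated population:", estimate)
```

Repeating the second sample several times and comparing the estimates shows how much a single estimate can vary.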
Investigate
Estimating Using Mark-and-Recapture
Work with a partner or group to simulate a mark-and-recapture
experiment.
You will be given a bag containing between 30 and 50 slips of paper.
➢ Without looking, reach into the bag and take some slips of paper.
Mark an X on each one.
Record the number of slips drawn, then return them to the bag.
➢ Shake the bag, then take a handful of slips from it.
Count the number of marked slips and the total number of slips
in the handful.
➢ Estimate the total number of slips of paper in the bag.
Explain how you made your estimate.
➢ Return the slips. Draw and estimate a few more times.
Reflect
➢ Use your estimates from each draw.
What is your best prediction for the number of slips in the bag?
Explain your thinking.
➢ Take out all the slips from the bag and count them.
How accurate was your prediction?
Explain your thinking.
Connect the Ideas
Population and
sample
The population of a city is all the people who live in it.
Similarly, the population of any set is all the objects in the set.
The members of the population are called individuals.
Collecting data about every individual in a population
is called a census.
Conducting a census can be costly and time consuming.
It may even be physically impossible.
In product testing, items may be damaged by the testing,
so a census is impractical.
Usually, data are collected for a smaller set of individuals selected
from the population. This is called a sample.
If a sample is not typical of the population it represents,
it is called biased.
A good sample should be of a suitable size compared to the population
size and as unbiased as possible.
Random samples
In all forms of random sampling, every individual in the population
has the same likelihood of being chosen.
There are different random sampling techniques.
Simple random sampling
The sampling you did in Investigate was simple random sampling.
Individuals are randomly chosen from the entire population.
Stratified sampling
Data sets are grouped before sampling.
Data can be grouped based on a characteristic such as income or
location.
A few individuals from each group are then chosen at random.
Data are collected for these individuals.
Cluster sampling
The population is grouped so that each group is representative
of the whole population.
Groups are chosen at random, and data are collected for every
individual in the selected groups.
Systematic sampling
Every nth individual from a population is chosen.
[Diagram: the numbers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 … 24 … 34 …, with three jumps of +10 marked]
For example, to sample every 10th name on a list of names, you could
randomly begin with the 4th name, then continue with the 14th, 24th,
and so on.
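The four random sampling techniques can be compared with a short simulation. The sketch below uses Python's standard random module. The population of 100 numbered students split into four classes is hypothetical, and the function names are chosen for illustration.

```python
import random


def simple_random_sample(population, size, rng):
    """Choose `size` individuals at random from the whole population."""
    return rng.sample(population, size)


def stratified_sample(groups, per_group, rng):
    """Choose `per_group` individuals at random from every group."""
    return {name: rng.sample(members, per_group)
            for name, members in groups.items()}


def cluster_sample(groups, number_of_groups, rng):
    """Choose whole groups at random and keep every individual in them."""
    chosen = rng.sample(sorted(groups), number_of_groups)
    return [person for name in chosen for person in groups[name]]


def systematic_sample(population, step, rng):
    """Start at a random position, then take every `step`-th individual."""
    start = rng.randrange(step)
    return population[start::step]


rng = random.Random(7)  # fixed seed so the sketch gives repeatable results
students = [f"student {i}" for i in range(1, 101)]
classes = {f"class {letter}": students[start:start + 25]
           for letter, start in zip("ABCD", range(0, 100, 25))}

print(simple_random_sample(students, 10, rng))
print(stratified_sample(classes, 3, rng))
print(len(cluster_sample(classes, 2, rng)))
print(systematic_sample(students, 10, rng))
```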
Other types of
samples
There are other, non-random sampling techniques.
With these techniques, individuals in the population do not all
have the same likelihood of being chosen.
Convenience sampling
Only individuals who are easy to sample are chosen.
During elections, researchers often ask people leaving polling stations,
“Who did you vote for?” These exit polls are a convenience sample.
Judgement sampling
The person doing the sampling chooses a sample based
on her or his knowledge of the population.
The sample chosen may not be representative of the population
if the person is biased in her or his selection.
Voluntary sampling
Only individuals who volunteer to participate are included in the sample.
Phone-in polls used by television and radio programs are examples
of voluntary sampling.
Suppose a downtown merchant’s association wants to determine
people’s opinions on parking availability in the city. The association
could conduct a survey using one of these sampling techniques:
• Leave questionnaires in various locations in town for people
to pick up and fill in if they wish.
• Use a random number generator to select names from the
phone book.
• Choose random pages from the phone book and sample
every household on the pages.
• Begin with the 45th name in the phone book and choose
every 100th name after that: 45, 145, 245, 345, . . .
• Randomly choose people from each neighbourhood in the city,
making the sample size in each neighbourhood proportional
to the number of people who live in the neighbourhood.
• Survey people who are shopping downtown.
Practice
1. Describe two reasons people collect data from a sample rather than the population.
2. Refer to the parking survey described in Connect the Ideas.
a) What is the population?
b) Use the sampling techniques described in Connect the Ideas.
Decide which technique each survey uses. Give reasons for your choices.
c) Which samples might be biased? Explain your thinking.
d) Which sample would you recommend the association uses for its survey?
Justify your choice.
e) Suppose the association hopes to convince the city to provide more parking spaces.
Would your answer to part d change? Explain your thinking.
3. For each situation, identify the population.
Recommend whether to collect data from a sample or the population.
Give reasons for your recommendation.
a) The capacity of a battery is the number of hours it will work
at a particular rate of current.
A battery manufacturer wants to test the batteries produced each day
to ensure that they have an appropriate capacity.
b) A student wants to determine the sport that is most popular among her classmates.
c) An environmental group wants to determine people’s opinions
about pesticide use in a city.
d) A college placement officer wants to survey drug companies in his province
to determine which companies will hire students on workterms.
e) The student council wants to determine which local band students
would pay to see at a school concert.
4. For each situation in question 3 for which you recommended
using a sample, suggest a sampling technique.
Identify the type of sample and your reasons for suggesting it.
Selecting a sample can sometimes involve several steps.
Each step should be as unbiased as possible.
Example
A company plans to survey Ontarians’ attitudes toward sport utility vehicles
and trucks.
To ensure it gets a representative view, the company wants every person
in the province to have the same chance of being selected.
It decides to use a stratified sample.
The company selects 3 cities at random, then randomly selects 200 people from
each city. The cities selected are:
• Guelph, population 126 000
• Peterborough, population 75 000
• Windsor, population 208 000
a) What is wrong with selecting 200 people from each city?
b) What else might bias the results of this survey?
How could this be corrected?
Solution
a) The cities’ populations are different, so the same number of people should
not be selected in each city.
To determine how many people to select for each city:
• calculate the fraction of the total population for each city; then
• multiply by the total number of samples wanted, in this case 600.

City            Population    Fraction of total population     Sample size (fraction of total × 600)
Guelph          126 000       126 000/409 000, or 0.308        0.308 × 600 ≐ 185
Peterborough     75 000        75 000/409 000, or 0.183        0.183 × 600 ≐ 110
Windsor         208 000       208 000/409 000, or 0.509        0.509 × 600 ≐ 305
TOTAL           409 000                                        600

The company should have selected 185 people in Guelph, 110 people in
Peterborough, and 305 people in Windsor.
b) Only people who live in cities were included in this sample.
According to government statistics, the population of Ontario
is about 85% urban and 15% rural.
So, there are almost 6 times as many urban residents as rural.
Since 600 urban residents are surveyed, it would reduce the bias
to randomly survey 100 rural residents as well.
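The calculation in part a can be checked with a short Python sketch. The function name proportional_sample_sizes is made up for this illustration. Note that rounding each city separately happens to give a total of 600 here; with other numbers the rounded sizes might not add up exactly.

```python
def proportional_sample_sizes(populations, total_sample):
    # Fraction of the total population for each group, times the sample wanted.
    total = sum(populations.values())
    return {name: round(total_sample * size / total)
            for name, size in populations.items()}

cities = {"Guelph": 126_000, "Peterborough": 75_000, "Windsor": 208_000}
sizes = proportional_sample_sizes(cities, 600)
print(sizes)  # {'Guelph': 185, 'Peterborough': 110, 'Windsor': 305}
```

The same function could be used for a question such as choosing 50 students in proportion to the number of students in each grade.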
5. Suppose you want to sample 50 students from a high school.
For each situation below, calculate how many students should be sampled from each grade
so that the numbers in the sample are proportional to the number of students in each grade.
a) The school has 220 Grade 9 students, 180 Grade 10 students, 160 Grade 11 students,
and 190 Grade 12 students.
b) Use the enrollment numbers from your school or estimates, if the numbers are not available.
6. Assessment Focus Students in a social studies class are to write a biography of a person
selected from a list of names. The names are organized in four groups.
Group 1: Alexander Graham Bell, Marie Curie, Thomas Edison, Albert Einstein, Isaac Newton
Group 2: Salvador Dali, Frida Kahlo, Pablo Picasso, Henri Matisse, Andy Warhol
Group 3: Margaret Atwood, Robertson Davies, Margaret Laurence, Rohinton Mistry, Alice Munro
Group 4: Norman Bethune, Tommy Douglas, Nellie McClung, Louis Riel, Tecumseh
a) How many individuals are in the total population?
b) How many individuals are in each group? How do the groups appear to be organized?
c) Suppose the teacher decides to reduce the choices to 8 names.
Describe how he could do this using a simple random sample,
a stratified random sample, and a judgement sample.
d) Write each name on a slip of paper. Try your ideas from part c, using the slips of paper.
Record the 8 names in your sample each time.
Which technique do you think ensures the most variety for students?
Explain your thinking.
7. Take it Further Explain why the groups of names as arranged in question 6
are not suitable for a cluster sample.
Explain how to reorganize the groups so a cluster sample could be used.
Explain the difference between a sample and a population.
Include examples from your school or community in which you might collect data
from a sample and from a population.
7.4 Designing and Using a Questionnaire
Suppose you need data about the opinions and habits of people in your school or community.
You will not usually find these data by researching in published sources of data.
You may need to collect your own data.
Inquire
Collecting Data with a Questionnaire
Work with a partner or in a group.
1. Investigating what makes a good question
➢ Questions can often be asked in an open way or with choices
that help organize the responses.
Which type of response in these examples do you think will be easier
to organize and use?
Explain your thinking.
Open: How much did you spend on entertainment last week? _________
With choices: How much did you spend on entertainment last week?
Less than $16 ____ $16–$30 ____ Over $30 ____

Open: What is your favourite subject? __________
With choices: Which is your favourite subject? (choose one)
Math ____ Science ____ English ____ Tech ____ Other ____ Don’t have one ____

Open: Should school cafeterias be banned from selling certain foods? If so, which foods?
______________________________
With choices: Foods such as pop, chips, and French fries should be banned in school cafeterias.
Strongly disagree ____ Disagree ____ Agree ____ Strongly agree ____ No opinion ____
➢ Try to avoid questions that may bias the results of a survey or provide vague responses.
Here are some ways in which questions can be flawed. A flawed question may:
• lead people to respond in a particular way because of its wording;
• lead people to respond in a particular way by not providing
enough information or alternatives;
• be too general; or
• ask several things at once without allowing for this in the answers.
➢ Identify the better question in each pair.
Explain what is wrong with the question you do not choose.
Pair 1
Question A: Police officers, who perform vital services in our community, should receive a pay raise.
Agree ____ Disagree ____
Question B: Which statement best describes your opinion about police salaries?
Police salaries are lower than they should be. ____
Police salaries are higher than they should be. ____
Police salaries are appropriate. ____
Don’t know, no opinion ____

Pair 2
Question A: Have you purchased dog food in the last 3 months?
Yes ____ No ____
If yes, what type was it?
Canned ____ Dry ____ Both ____
Question B: Have you ever purchased canned or dried dog food?
Yes ____ No ____ Unsure ____

Pair 3
Question A: Do you think the minimum wage in Ontario should be raised to $10/h?
Yes ____ No ____ Don’t know, no opinion ____
Question B: Do you support the efforts of Ontario anti-poverty groups to raise the minimum wage to $10/h?
Yes ____ No ____ Don’t know, no opinion ____
2. Designing the questionnaire
➢ Choose a topic. If possible, choose a topic where the information you collect
might help to achieve some goal or to make a decision.
• Suppose you were interested in starting an environmental club.
You could ask questions that would help you decide
whether people would join the club and
what days and times would be best for meetings.
• Or, suppose the student council is planning a fund-raising dance.
You could ask questions that would help determine the types of music
people would like to hear and how much they would pay for a ticket.
➢ Write the questions. Keep these guidelines in mind:
• Make the questions short and easy to understand.
• Respect the respondents’ privacy: do not ask questions that are too personal.
• Avoid questions that may offend or provoke an emotional response.
If people refuse to participate because they are offended,
it may bias your results.
• Avoid biased questions.
• Include a few questions that collect demographic data
such as age, gender, and grade.
➢ Organize your questions in a logical way.
➢ Write a brief introduction that explains the survey and how you will use the results.
Your introduction should grab the attention of potential respondents
and encourage their participation.
Make it clear that the responses will be anonymous.
➢ Test your questionnaire on people in another group to check
that the questions are clear. Change any questions
that were misunderstood or caused confusion.
3. Collecting the data
➢ Use the questionnaire you have written.
➢ What is the population you hope to survey?
Use your knowledge of sampling techniques to plan
how to select people for your survey.
Write a brief explanation of your strategy.
➢ Decide how you will conduct the survey:
• Will you meet with each respondent, ask
the questions orally, and record the answers yourself?
• Will you hand out the questionnaires for
respondents to complete independently?
What are some benefits and drawbacks of each approach?
➢ Carry out your survey.
4. Displaying and analysing the data
➢ Decide which types of graphs are most appropriate
to represent the data you have collected.
Create a visual display to represent the data, either with paper and pencil,
or with a spreadsheet.
➢ Make your decision about the idea that inspired the survey.
If you need more data, explain what you have learned so far
and what steps you could follow to obtain more data.
Reflect
➢ Were you able to get data for an appropriate sample?
If not, how did this affect your results?
➢ Were the results what you expected or hoped for? Explain.
➢ How could you improve your questionnaire?
Mid-Chapter Review
7.1
1. Is each type of data numeric or categorical? Identify data that are numeric
as continuous or discrete.
a) Age
b) Zodiac sign
c) Marital status
d) Height
e) Income
f) Gender
2. Ioana is shopping for a used car.
a) List two categorical choices she might consider.
b) List two numeric choices she might consider.
3. a) Make a frequency table for this set of data. Explain how you chose the
intervals.
Scores in a golf tournament
281 272 269 278 273 277 282
283 292 269 277 278 280 275
284 288 274 295 296 283 300
289 296 295 294 301 306 299
b) Draw a histogram to display the data.
Describe the distribution.
7.1, 7.2
4. These data show how students answered the question “What is your favourite
season?”
Summer 310
Winter 125
Spring 28
Fall 12
a) Graph the data. Explain how you chose which type of graph to draw.
b) Write a question you could answer using your graph.
7.3
5. For each situation, identify the population. Recommend whether to
collect data from a sample or the entire population. Explain your thinking.
a) A radio station wants to determine what type of music it should play to
attract 18- to 24-year-old listeners.
b) A company wants to test the quality of the fuses it makes.
c) A teacher wants to determine which of two field trips her class prefers.
6. Thom is conducting a survey about school library use. What sampling
technique is he using in each case?
a) He asks the first 30 people entering the cafeteria at lunchtime each day for
a week.
b) He leaves questionnaires in the library and cafeteria.
c) He chooses one class in each grade and asks students in those classes.
d) He randomly selects 10% of the people in each grade to ask.
7. In question 6, which sample would you recommend Thom use?
Explain your thinking.
7.4
8. Write a survey question about each topic.
Include several intervals or category choices as answers for each question.
a) Favourite sport
b) Hours spent participating in sports each week
c) Hours spent doing volunteer work each week
d) A topic of your choice
7.5 Measures of Central Tendency and Spread
Quality control technicians use measures of central tendency and
measures of spread to analyse and compare data and to make predictions.
Investigate
Determining Mode, Mean, and Median
Work with a partner.
You will need a scientific calculator.
The mode of a set of data is the number that occurs most often.
The mean of a set of data is the number you get if you divide the
total evenly amongst the set of numbers.
The median of a set of data is a number such that, when the data
are arranged in order, half the data is above the number and half is below.
The annual salaries of employees at two small companies are shown.
Company A salaries ($)
Company B salaries ($)
20 000
25 000
30 000
40 000
25 000
25 000
40 000
40 000
25 000
35 000
50 000
50 000
50 000
85 000
60 000
80 000
130 000
➢ What is the difference between the highest and lowest salaries
at each company?
➢ What is the mode salary at each company?
➢ What is the median salary at each company?
How did you determine each median?
➢ What is the mean salary at each company?
How did you determine each mean?
➢ You are asked to describe a typical salary at each company.
Would you use the mode, median, mean, or some other value?
Explain.
Reflect
➢ Compare your strategy for determining the median salary
at Company A with your strategy for Company B.
How are the strategies the same? How are they different? Why?
➢ Suppose you are offered an entry-level job at each company.
The job duties and benefits are similar. The salaries fit these data.
Which job would you take? Give reasons for your choice.
Connect the Ideas
Measures of
central tendency
The mode, mean, and median are measures of central tendency
for a data set.
A measure of central tendency is sometimes called an average.
Mode
The mode is the number that occurs most often.
There may be no mode or there may be more than one mode.
At a recent concert, J.J. sold these sizes of band T-shirts:
21 small, 16 medium, 50 large, and 14 extra-large
J.J. sold more large T-shirts than any other size.
The mode size sold that night was large.
The mode is usually the best measure when the data represent measures
such as shoe sizes or other clothing sizes.

Mean
To determine the mean, add the numbers, then divide the sum
by the number of numbers.
Lila recently bought 5 CDs that cost:
$14.95, $9.99, $9.99, $13.95, and $12.95
The total cost was:
$14.95 + $9.99 + $9.99 + $13.95 + $12.95 = $61.83
$61.83 ÷ 5 ≐ $12.37
The mean price Lila paid was about $12.37.
The mean is usually the best measure when no data in the set are
significantly different from the other numbers.

Median
To determine the median, arrange the numbers in order.
The median is the middle number.
For an even number of numbers, the median is the
mean of the two middle numbers.
Walid is a real estate agent.
Arranged from least to greatest,
the prices of the last 6 houses he sold were:
$185 500, $194 900, $219 900, $245 000,
$259 900, and $749 500
The 2 middle prices are: $219 900 and $245 000
The mean of these prices is: ($219 900 + $245 000) ÷ 2 = $232 450
The median price of these houses was $232 450.
The median is usually the best measure when data in the set are
significantly different.
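If you have a computer available, the three examples above can be checked with Python’s built-in statistics module. This is a sketch for checking your work, not part of the original lesson; the variable names are chosen just for this illustration.

```python
from statistics import mean, median, multimode

# J.J.'s T-shirt sales: one entry per shirt sold
shirt_sizes = (["small"] * 21 + ["medium"] * 16
               + ["large"] * 50 + ["extra-large"] * 14)
# Lila's CD prices, in dollars
cd_prices = [14.95, 9.99, 9.99, 13.95, 12.95]
# Walid's house prices, in dollars
house_prices = [185_500, 194_900, 219_900, 245_000, 259_900, 749_500]

modes = multimode(shirt_sizes)          # a list, since there can be more than one mode
mean_price = round(mean(cd_prices), 2)  # rounded to the nearest cent
median_house = median(house_prices)     # mean of the two middle prices

print(modes)         # ['large']
print(mean_price)    # 12.37
print(median_house)  # 232450.0
```

Notice that the one very expensive house ($749 500) does not affect the median, which is why the median is the better measure for these prices.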
Measures of spread
The measures of spread are the range and the standard deviation.
The range is the difference between the greatest number
and the least number in a data set.
The standard deviation tells how widely spread around the mean the
data in a set are. If the data points are all close to the mean,
then the standard deviation is close to 0.
We use measures of spread to compare 2 or more sets of data.
Petra scored 5 goals in 1 game, 2 goals in each of 4 games, 1 goal in each of
4 games, and no goals in 1 game. Her teammate Hasieba scored 3 goals in
each of 2 games, 2 goals in each of 3 games, and 1 goal in each of 5 games.
Based on these games, which player is the more consistent goal scorer?
Calculate the mean number of goals scored per game for each player.
Determine the mean
                                   Petra                     Hasieba
Total number of goals              5 + (2 × 4) + 4 = 17      (3 × 2) + (2 × 3) + 5 = 17
Total number of games              10                        10
Mean (total number of goals ÷
total number of games)             17 ÷ 10 = 1.7             17 ÷ 10 = 1.7
The mean is the same for each girl. To compare their consistency, we
need to calculate the range and standard deviation.
Determine the range
                                   Petra                     Hasieba
Greatest number of goals           5                         3
Least number of goals              0                         1
Range (greatest number −
least number)                      5 − 0 = 5                 3 − 1 = 2
Calculate the standard deviation
To calculate the standard deviation:
• Calculate the mean.
• Subtract the mean from each data value.
• Square each difference.
• Add the squared numbers.
• Divide the sum by one less than the
number of data items.
• Determine the square root of the result.
When you have data
for a sample, divide by
1 less than the number
of items. When you
have data for an entire
population, divide by
the number of items.
Organize the calculations in charts.
Each mean is 1.7.
Petra
Each data value    Data value − mean    Square of difference
5                  5 − 1.7 = 3.3        10.89
2                  2 − 1.7 = 0.3        0.09
2                  2 − 1.7 = 0.3        0.09
2                  2 − 1.7 = 0.3        0.09
2                  2 − 1.7 = 0.3        0.09
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
0                  0 − 1.7 = −1.7       2.89

Hasieba
Each data value    Data value − mean    Square of difference
3                  3 − 1.7 = 1.3        1.69
3                  3 − 1.7 = 1.3        1.69
2                  2 − 1.7 = 0.3        0.09
2                  2 − 1.7 = 0.3        0.09
2                  2 − 1.7 = 0.3        0.09
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
1                  1 − 1.7 = −0.7       0.49
Because these data are sample results for part of the hockey season,
divide by n − 1.
For Petra
Sum of squared differences:
10.89 + (4 × 0.09) + (4 × 0.49) + 2.89 = 16.1
Sum divided by 1 less than the number of data items:
16.1 ÷ (10 − 1) = 16.1 ÷ 9
Take the square root: √(16.1 ÷ 9) ≐ 1.3375
So, the standard deviation for Petra’s goals is about 1.3.
For Hasieba
Sum of squared differences:
(2 × 1.69) + (3 × 0.09) + (5 × 0.49) = 6.1
Sum divided by 1 less than the number of data items:
6.1 ÷ (10 − 1) = 6.1 ÷ 9
Take the square root: √(6.1 ÷ 9) ≐ 0.8233
So, the standard deviation for Hasieba’s goals is about 0.8.
Interpret the measures of spread
Hasieba has the smaller range and lower standard deviation.
The spread of the data for Hasieba is smaller.
The data are closer to the mean.
This means Hasieba is the more consistent goal scorer.
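The six steps for calculating a standard deviation can also be written as a short Python function. This is one possible sketch; the function name standard_deviation and its sample option are made up for this illustration.

```python
from math import sqrt

def standard_deviation(values, sample=True):
    n = len(values)
    mean = sum(values) / n                            # calculate the mean
    squared = [(x - mean) ** 2 for x in values]       # subtract the mean, then square
    total = sum(squared)                              # add the squared numbers
    divisor = n - 1 if sample else n                  # n - 1 for a sample, n for a population
    return sqrt(total / divisor)                      # square root of the result

petra = [5, 2, 2, 2, 2, 1, 1, 1, 1, 0]
hasieba = [3, 3, 2, 2, 2, 1, 1, 1, 1, 1]

print(round(standard_deviation(petra), 4))    # 1.3375
print(round(standard_deviation(hasieba), 4))  # 0.8233
```

Calling the function with sample=False divides by the number of items instead, which is the calculation for an entire population.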
You can use a graphing calculator to determine the mean
and standard deviation of a data set.
For example, to determine the standard deviation for Petra’s goals,
follow these steps on a TI-83 or TI-84 graphing calculator.
Press 2nd [{] to begin a list.
Enter each number of goals, using commas to separate the
numbers. Then press 2nd [}] STO→ 2nd [L1] ENTER.
The data are now stored in L1 for the next step in the
calculations.
To display the statistical calculation menu, press STAT ►.
Because you are analysing one-variable data, you will
use the first set of calculations.
Press 1.
Press 2nd [L1] ENTER.
A list of statistical data about the numbers in
L1 is displayed.
x̄ is the mean
Sx is the standard deviation for a sample
σx is the standard deviation for a population
The standard deviation for
Petra’s goals is about 1.3.
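If you do not have a graphing calculator, Python’s statistics module gives similar results. In this sketch, stdev plays the role of Sx and pstdev plays the role of σx; the variable names are chosen to match the calculator display and are not part of the module.

```python
from statistics import mean, pstdev, stdev

petra_goals = [5, 2, 2, 2, 2, 1, 1, 1, 1, 0]

x_bar = mean(petra_goals)      # the mean
s_x = stdev(petra_goals)       # standard deviation for a sample (divides by n - 1)
sigma_x = pstdev(petra_goals)  # standard deviation for a population (divides by n)

print(x_bar)
print(round(s_x, 4))
print(round(sigma_x, 4))
```

The sample value is always a little greater than the population value, because the sum of squared differences is divided by a smaller number.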
Practice
You may use a graphing calculator to calculate measures.
1. Compare the mode, mean, and median of each data set.
Which measure best represents the data? Give reasons for your choice.
a) Cost of tickets: $5, $6, $2, $4, $6, $5, $5
b) Number of prizes in a package: 7, 2, 8, 4, 0, 9
c) Lengths of timber rattlesnakes: 82 cm, 90 cm, 150 cm, 112 cm, 184 cm
2. Determine the mean, range, and standard deviation for each set of data.
Explain what each measure of spread tells about the data.
a) Number of points scored in some games: 5, 2, 1, 10, 12, 8, 4, 7, 3, 14
b) Number of games won in a tournament: 7, 7, 6, 7, 5, 6
3. The mean annual temperature in Windsor, Ontario is about 9.4°C.
The temperature range is about −25°C to 35°C.
The mean annual temperature in Edinburgh, Scotland is about 8.3°C.
The temperature range is about 0°C to 20°C.
Which city would you say has the milder climate?
Justify your answer.
4. Use the data from Connect the Ideas about Hasieba’s goals.
Use a graphing calculator to determine the standard deviation.
5. Astrid recorded the prices of gas at a station near her school.
78.3¢ 72.4¢ 76.6¢ 79.3¢
Gabe recorded the prices of gas at a station near his home.
71.9¢ 76.3¢ 71.2¢ 74.6¢ 78.3¢ 76.3¢
They calculated measures of central tendency and measures of spread.

                      Astrid’s data     Gabe’s data
Mode                  no mode           76.3¢
Mean                  76.65¢            about 74.77¢
Median                77.45¢            75.45¢
Range                 6.9¢              7.1¢
Standard deviation    about 3.04¢       about 2.76¢

Use the measures of central tendency
and spread to compare the cost of gas at these stations.
6. Choose two measures in Astrid and Gabe’s calculations in question 5.
Explain how you can estimate to show whether they are reasonable.
7. Assessment Focus A coach is taking members of the high school
cross-country team to OFSAA in Ottawa on October 27th.
He researched minimum daily temperatures in previous years.
Year    Oct 25    Oct 26    Oct 27    Oct 28
2006    5°C       4°C       3°C       4°C
2004    6°C       8°C       6°C       4°C
2001    11°C      6°C       4°C       2°C

a) Determine the measures of central tendency for each year.
b) Determine the measures of spread for each year.
c) Which year has the least standard deviation?
How could you predict this by looking at the data?
d) Why do you think the coach would research temperatures for
more than one year?
Data are sometimes provided in a frequency table or histogram.
Example
A company is testing two egg carton designs to see which could better withstand
a drop from a specified height. The results are shown in the table.
Broken eggs                       0    1    2    3    4    5    6
Carton A: Number of cartons       2   12   22   28   25    8    3
Carton B: Number of cartons       0    5   27   36   28    3    1

a) Without calculating, which appears to be the better carton? Explain.
b) Draw a histogram for the number of broken eggs in each carton.
Which appears to be the better carton? Explain.
c) Calculate the mean and standard deviation for the number
of broken eggs for each carton. Which appears to be the better carton?
Solution
a) While Carton A had 2 results where no eggs broke, it also had 11 results where
5 or 6 eggs broke. The results for Carton B were more consistent.
So, Carton B appears to be more reliable, and thus the better carton.
b) [Histograms: “Test Results for Carton A” and “Test Results for Carton B”; horizontal axis: Number of broken eggs, 0 to 6; vertical axis: Number of cartons]
The data for Carton B appear to be more clustered around the centre
of the histogram than the data for Carton A.
This shows Carton B is more reliable, and thus the better carton.
Use σx, the standard deviation for a population.
c) Use a TI-83 or TI-84 graphing calculator. Use the technique described
in Connect the Ideas to store the whole numbers from 0 to 6 in L1.
To store the frequencies for Carton A in L2, press 2nd [{].
Enter the numbers of cartons from the first row of the table,
separated by commas. Then press 2nd [}] STO→ 2nd [L2] ENTER.
Press STAT ► 1 2nd [L1] , 2nd [L2] ENTER to display the mean 2.98
and standard deviation 1.3113.
To store the frequencies for Carton B in L3, press 2nd [{].
Enter the numbers of cartons from the second row of the table,
separated by commas. Then press 2nd [}] STO→ 2nd [L3] ENTER.
Press STAT ► 1 2nd [L1] , 2nd [L3] ENTER to display the mean 3
and standard deviation 0.9798.
The means are very similar, but the standard deviation for Carton B
is less than the standard deviation for Carton A.
So, Carton B is more reliable, and thus is the better carton.
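The results in part c can also be checked without a calculator. The sketch below multiplies each number of broken eggs by its frequency, and divides by the total number of cartons, since the population standard deviation is used. The function name frequency_stats is made up for this illustration.

```python
from math import sqrt

def frequency_stats(values, frequencies):
    # Mean and population standard deviation for data in a frequency table.
    n = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / n
    variance = sum(f * (v - mean) ** 2 for v, f in zip(values, frequencies)) / n
    return mean, sqrt(variance)

broken_eggs = [0, 1, 2, 3, 4, 5, 6]
carton_a = [2, 12, 22, 28, 25, 8, 3]
carton_b = [0, 5, 27, 36, 28, 3, 1]

mean_a, sd_a = frequency_stats(broken_eggs, carton_a)
mean_b, sd_b = frequency_stats(broken_eggs, carton_b)

print(round(mean_a, 2), round(sd_a, 4))  # 2.98 1.3113
print(round(mean_b, 2), round(sd_b, 4))  # 3.0 0.9798
```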
8. During rush hour, a high-occupancy vehicle (HOV) lane is reserved for vehicles
carrying at least three people. At other times anyone can use the lane.
Without calculating, determine which set of data is likely to have
the greatest standard deviation and which the least standard deviation.
Explain your thinking.
[Histograms: People in Each Car in HOV Lane; horizontal axis: Number of people, 1 to 5; vertical axis: Number of cars]
a) 8:00 a.m. to 8:30 a.m.
b) 3:30 p.m. to 4:00 p.m.
c) 8:00 p.m. to 8:30 p.m.
9. A company has three machines that manufacture bolts.
Each bolt should have length 150 mm.
A quality control technician takes a sample of 25 bolts produced on
each machine and measures the lengths.
Bolt length (mm)                 148   149   150   151   152
Machine A: Number of bolts         2     4    13     5     1
Machine B: Number of bolts         1     3    18     3     0
Machine C: Number of bolts         4     5     7     6     3

a) Without calculating, predict which set of data is likely to have the greatest standard
deviation and which the least standard deviation. Explain your thinking.
b) Calculate the mean and standard deviation for each set of data.
How do the results compare with your predictions in part a?
c) Which machine appears to be the most reliable producer of 150 mm long bolts?
Which appears to be the least reliable?
10. Take it Further Explain how well
each mean describes a typical member
of the population it represents.
Data set                                          Mean    Standard deviation
Hourly salaries of employees ($)                  20      8
Monthly bonuses for sales representatives ($)     200     8
Why do you think mean, median, and mode are called measures of central tendency?
Why do you think range and standard deviation are called measures of spread?
7.6 Analysing Data
Technology can be used to calculate measures of central tendency and measures of spread.
This makes it easier to focus on the interpretation of these measures and how
appropriate they are to describe a data set.
Inquire
Determining Measures of Central Tendency
and Spread Using a Spreadsheet
You will need Microsoft Excel.
Open the file sunriseandsunset.xls.
1. This spreadsheet shows the length
of each day in June and December
in Yellowknife, NWT in a recent year.
The daylight time is shown in hours,
minutes, and seconds.
a) How long was there daylight
on June 15?
b) How long was there daylight on the
shortest day in December?
2. Select any five empty cells in a column.
From the Format menu, choose Cells....
On the Number tab, under Category:
choose Custom.
Under Type:, choose h:mm:ss.
Then click OK.
3. a) Enter the formula for mode in a cell
you formatted: =MODE(B2:B31)
For what cells did the formula
determine the mode?
b) Tell what data the mode describes.
c) What does the value in the cell for the mode tell you?
Does this make sense? Why or why not?
A formula in a spreadsheet
always starts with an equals sign.
4. a) Enter the formula for median in a cell you formatted: =MEDIAN(B2:B31)
How does the formula show what cells the median describes?
b) Tell what data the median describes.
c) What is the median for the data?
5. a) Enter the formula for mean in a cell
you formatted:
=AVERAGE(B2:B31)
b) Tell the value of the mean and the
data it describes.
The formula for mean
uses the word
AVERAGE.
6. a) Enter the formula for range in a cell
you formatted:
=MAX(B2:B31)-MIN(B2:B31)
Explain how this formula determines
the range.
b) Tell the value of the range and the
data it describes.
7. a) Enter the formula for standard
deviation of a population in a cell
you formatted:
=STDEVP(B2:B31)
How does the formula show what cells the standard deviation describes?
b) Tell the value of the standard deviation and the data it describes.
8. a) Repeat question 2 for five other cells.
b) Use a similar process to question 3 for the mode of the lengths of days in December.
What cells will you reference in your formula?
c) What does the value in the cell for the mode tell you?
Does this make sense? Why or why not?
9. Repeat questions 4 to 7 to determine the median, mean, range, and standard
deviation for the length of days in December.
Remember to change the cells you reference in your formulas.
10. a) Determine the median and mean average number of hours of daylight each month.
Use these averages to describe the differences in daylight times in June and December.
Which measure better describes these differences?
b) What do standard deviations for June and December tell you
about the daylight times in these months?
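The spreadsheet formulas in this Inquire have close counterparts in Python’s statistics module. The sketch below is only an illustration: the five daylight times are made-up values, not the data in sunriseandsunset.xls, and the helper as_hms is a hypothetical name for a function that shows seconds in the h:mm:ss format.

```python
from datetime import timedelta
from statistics import mean, median, multimode, pstdev

# Hypothetical daylight lengths; NOT the values in sunriseandsunset.xls
daylight = [
    timedelta(hours=19, minutes=50),
    timedelta(hours=19, minutes=54),
    timedelta(hours=19, minutes=57),
    timedelta(hours=19, minutes=57),
    timedelta(hours=20),
]
seconds = [d.total_seconds() for d in daylight]

def as_hms(total_seconds):
    # Show a number of seconds in the h:mm:ss format.
    return str(timedelta(seconds=round(total_seconds)))

summary = {
    "MODE": [as_hms(s) for s in multimode(seconds)],  # every most-frequent value
    "MEDIAN": as_hms(median(seconds)),
    "AVERAGE": as_hms(mean(seconds)),
    "MAX-MIN": as_hms(max(seconds) - min(seconds)),   # the range
    "STDEVP": as_hms(pstdev(seconds)),                # population standard deviation
}
for name, value in summary.items():
    print(name, value)
```

Working in seconds and converting back at the end plays the same role as formatting the cells as h:mm:ss in the spreadsheet.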
Practice
1. Josef researched these data about the cause of all identified forest fires in a recent year.
Open the file forestfires.xls.
a) Select any five empty cells in a column of the file forestfires.xls.
From the Format menu, choose Cells... .
On the Number tab, under Category: choose General.
b) Determine the measures of central tendency and measures of spread for the
number of forest fires due to human activities.
Use STDEV for these data, since they cover only the fires that were identified.
c) Repeat part b for forest fires due to lightning.
d) Repeat part b for forest fires due to unknown cause.
e) How can you use the standard deviation to interpret the mean?
f) Explain how you could use the measures you calculated to develop awareness
of the need for fire safety.
2. Hilda researched the maximum depths of all the oceans and of the deepest seas.
Open the file oceansandseas.xls.
a) Select any five empty cells
in a column. From the Format
menu, choose Cells... .
On the Number tab, under
Category: choose General.
b) Determine each measure
of central tendency for the oceans.
c) Determine each measure
of spread for the oceans.
d) Repeat parts b and c for the seas.
e) Use the measures you calculated
to compare the data for oceans and the data for seas.
Use STDEVP for the
oceans and STDEV for
the seas.
3. The yield of a crop is the number of bushels that are
produced for every acre of land farmed.
Open the file cropyields.xls.
It shows the yields of different crops in each of 10 years.
a) Without calculating, which crop
appears to produce the most
consistent yields?
b) Which appears to have the
greatest variation in yield?
c) Use measures of central
tendency and measures of
spread to check
your predictions.
How might this information
be useful to a farmer?
Reflect
➢ Explain your strategy for naming the cells to be included in a formula.
Suppose you have data in the first 18 rows of Column A.
Describe how to use your strategy to enter a formula to determine
the standard deviation for data in these cells.
➢ Choose an example in this section. Explain how the measures of
central tendency and measures of spread can help you compare
data.
Dice Choice
Materials
• 10 dice
• graphing calculator or Microsoft Excel
Play in a group of 2 to 4.
➢ Roll 10 dice.
➢ Each player writes the digit that
appears on each die.
➢ Then each player decides how to use
the digits to write five 2-digit numbers.
Here are two examples of numbers players might create
for the digits on these dice.
41, 46, 31, 52, 52
26, 31, 41, 42, 55
Since you are calculating the standard deviation for all the numbers you
wrote, divide by the total number of data items.
➢ Each player uses technology to determine
the standard deviation for her or his 2-digit numbers.
➢ The player with the lower or lowest standard deviation
scores 1 point.
➢ Roll the dice to continue.
➢ The first player to score 4 points wins.
➢ Is there a strategy that can help you win?
If so, describe it.
If not, explain why you cannot develop a strategy.
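One way to score a round with technology is sketched below in Python. It uses the two example sets of numbers given above. The labels "Player 1" and "Player 2" are assumptions added for illustration, and the sketch does not handle a tie.

```python
import statistics

# The two example sets of 2-digit numbers from the game description.
player_1 = [41, 46, 31, 52, 52]
player_2 = [26, 31, 41, 42, 55]

# pstdev divides by the total number of data items (population standard deviation).
sd_1 = statistics.pstdev(player_1)
sd_2 = statistics.pstdev(player_2)

print("Player 1:", round(sd_1, 2))
print("Player 2:", round(sd_2, 2))
print("Point goes to:", "Player 1" if sd_1 < sd_2 else "Player 2")
```

The function pstdev matches the game rule because every number a player wrote is included, so the numbers form a population and not a sample.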
7.7 Designing and Conducting an Experiment
A questionnaire involves asking people about their opinions or habits.
An experiment involves counting or measuring physical
properties to test an idea or answer a question.
Inquire
Conducting an Experiment to Collect Data
Work with a partner or in a group.
Designing an experiment
When you plan an experiment, think about these questions.
➢ What factors might influence your results?
How can you consider these factors when you design your experiment?
➢ How many observations will you make or for how long will you observe?
➢ What materials will you need?
➢ How will you record your observations?
Use your answers to these questions to help plan the experiment.
A good experimental plan should include these items:
➢ The question you are investigating
➢ A list of the materials you will need
➢ The steps you will follow, in as much detail as possible
➢ Any tables you might need for recording your observations
For example, suppose you want to explore how quickly after exercise a person’s
heart rate returns to its resting rate.
You will have to consider these issues:
➢ A person’s age may affect the result.
Will you collect data for a variety of ages
or just one age group?
➢ The amount of time the person exercises
may affect the result.
How will you ensure all the people
in your experiment exercise for the same amount
of time?
➢ The type of exercise may affect the result.
How will you ensure all the people
in your experiment do the same exercise?
➢ What materials will you need?
➢ From how many people should you collect data?
1. Suppose you are to conduct a heart rate experiment
like the one described above.
Answer each question that was posed above.
Write a plan for the experiment.
2. Suppose you want a bike lane on the street where your school is located.
You design a questionnaire asking people whether they will use a bike lane.
You also want to measure how much bike traffic the street has now.
a) Why should you allow for each of these factors
in your experimental design?
• time of day
• weather
• day of the week
How could you do this?
b) Write a plan for the experiment.
3. Can a person balance on one foot longer
with eyes open or with eyes closed?
a) What are some issues that you will need
to consider when designing an experiment
to answer this question?
b) Write a plan for the experiment.
4. Which brand of orange juice do students prefer in a taste test?
a) What are some issues that you will have to consider
when designing an experiment to answer this question?
b) Write a plan for the experiment.
5. Choose one of the experiments you planned.
a) Compare your plan with the plan developed
by another group. Discuss any differences you notice.
b) Revise your plan if you see ways to improve it.
Conducting the experiment
➢ Choose one of the experiments you planned, or design
a new experiment about another topic.
What question will you try to answer with the data
you collect?
➢ If your experiment involves having people perform tasks,
use your knowledge of sampling techniques to plan how you will
get data about a representative sample of people.
If your experiment involves observing and counting things that occur
without you planning them, think about where, when,
and how you make your observations.
If your experiment
is a taste test, ask
participants about food
allergies.
➢ Gather any materials you need. Carry out your experiment.
Displaying and analysing the data
➢ Decide which types of graphs are most appropriate for the data you have collected.
Create a visual display to represent the data, either by hand or with a spreadsheet.
➢ If the data are numeric, which measures of central tendency or spread
best represent the data? Explain your choice of measures.
➢ Answer the question that inspired the experiment.
If you need more data, explain what you have learned so far
and what steps you could follow to obtain more data.
Reflect
➢ Were you able to get data for an appropriate sample or find an
appropriate time and place to make your observations?
If not, how did this affect your results?
➢ How could you improve the design of your experiment?
7.8 Collecting Data from Secondary Sources
A student writing an essay, a business person preparing a proposal
to win a new client, or a charitable organization completing a grant
application all have one thing in common.
They need to know how to collect, analyse, and display data to support their cases.
Inquire
Collecting and Analysing Data
To complete both parts of this Inquire, you will need a computer with access
to the Internet and E-STAT.
If you do not have access to E-STAT, you can complete Part 2 of this Inquire
using the Internet or printed materials.
You will also need Microsoft Excel.
Part 1: Collecting Data Using E-STAT
➢ Go to www.statcan.ca. Click English.
Select Learning Resources from the menu on the left.
Click on E-STAT in the yellow box on the right.
Then click Accept and Enter.
If you are working at home, you will need
to enter the user name and password assigned to your school.
You should see a table of contents on your screen.
You may need to scroll
down.
➢ Click on Environment in the Land and Resources section.
You will be using tables from Human Activity and the Environment,
Annual Statistics 2006 and later. Click on the link to this document.
In the box that appears, click on View HTML.
➢ The next screen shows a table of contents. Click on Tables.
The table you need is in Section 4: Socio-economic response to environmental conditions.
Click on the link to this section.
Then click on the HTML link for Table 4.12.
The table shows Canadian recycling data for 2002.
Right click on the table and Export to Microsoft Excel.
If you cannot download the file, enter the data for Ontario, British Columbia,
and Nova Scotia into a Microsoft Excel spreadsheet.
This screen shows the downloaded Excel file.
➢ Before you explore these data, think about recycling in your community.
What different materials can you recycle at home or school?
Which two or three materials make up the biggest part of the material you recycle?
➢ Use the spreadsheet data to make a circle graph of materials recycled
in Ontario, by mass.
You will need to copy and paste the labels from Column A into a new
section of the spreadsheet.
Then copy and paste the Ontario data so that they are in a column
beside the labels you pasted.
What were the top three materials recycled in Ontario, by mass?
➢ British Columbia is the only other province with a complete set of data.
Make a circle graph for this province.
Again, you will have to copy and paste so that the labels are beside the data.
Compare the two graphs.
What were the top three materials recycled in British Columbia, by mass?
➢ Two categories of data are missing for Nova Scotia.
Make a circle graph for this province.
Again, you will have to copy and paste so that the labels are beside the data.
Replace the Xs in the Copper and aluminum and Other metals categories
with zeros before graphing. This ensures the colours used for each category
will match those in the other two graphs for ease of comparison.
What were the top three materials recycled in Nova Scotia, by mass?
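The idea of replacing the Xs with zeros before graphing can also be sketched in Python. The masses below are made-up placeholders, not the values in Table 4.12, and the category names Newsprint, Cardboard, and Glass are assumptions added for illustration.

```python
# Hypothetical masses in tonnes; placeholders, not the values in Table 4.12.
# "X" marks a category with no published value, as in the Nova Scotia column.
recycled = {
    "Newsprint": 30,
    "Cardboard": 50,
    "Glass": 20,
    "Copper and aluminum": "X",
    "Other metals": "X",
}

# Replace each X with zero so every category stays in the data set.
cleaned = {name: (0 if mass == "X" else mass) for name, mass in recycled.items()}

# Percent of the total for each category: the size of each sector in a circle graph.
total = sum(cleaned.values())
percents = {name: 100 * mass / total for name, mass in cleaned.items()}

# Top three materials by mass.
top_three = sorted(cleaned, key=cleaned.get, reverse=True)[:3]

print(percents)
print(top_three)
```

Keeping the zero entries means each province's data lists the same categories in the same order, which is why the colours match from graph to graph.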
➢ What might you conclude about recycling programs in different parts
of the country? Explain your thinking.
➢ Write another question that someone could answer
using one or more of your graphs.
➢ Choose another topic to research for which data are available in E-STAT.
Think of a question, collect data to determine the answer,
and graph the data you find.
Part 2: Collecting Data Using Other Websites or Printed Materials
With clever searching on the Internet, you can find data on almost any topic.
Governments and international organizations,
such as the United Nations Statistics Division, are usually
trustworthy sources of data.
When you collect data,
either electronically or in
print, it is important to
consider how reliable the
source of the data is.
Other sources of data are the websites of professional sports organizations,
the International Olympic Committee, and the Census at School section
of the Statistics Canada website.
A search engine is a program that finds information by searching for keywords you enter.
It returns a list of websites where the keywords were found.
Here are some tips for using search engines to find data.
• Be as specific as possible in the keywords you enter.
• Use the + symbol before each word if you want only results that include all
the words you have entered.
• Use the – symbol before a word if you do not want any results that contain
this word.
• If you want a series of words to appear together in a particular order, type
quotation marks before the first word and after the last word.
Sometimes data can be downloaded from websites as a spreadsheet file
or as a CSV (comma separated value) file that can be used in any spreadsheet.
If there are only a few pieces of data or you use data from printed sources,
write down the data on paper or enter them in a spreadsheet.
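A CSV file is plain text, so programs other than spreadsheets can read it too. Here is a minimal sketch in Python. It assumes a few lines of made-up CSV text in place of a downloaded file, and the column names Team and Wins are placeholders.

```python
import csv
import io

# Hypothetical CSV text standing in for a downloaded file; the values are placeholders.
csv_text = """Team,Wins
Team A,12
Team B,9
Team C,15
"""

# csv.DictReader uses the first row as column names.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# Values arrive as text, so convert them before doing arithmetic.
wins = [int(row["Wins"]) for row in rows]

print(rows[0]["Team"], wins)
```

For a file saved on a computer, open(filename, newline="") would take the place of io.StringIO(csv_text).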
➢ Look back at the topics suggested in question 9 in Section 7.1.
Choose one of these topics or use a different topic of your choice.
Pose a problem you can try to solve with the data you collect.
If appropriate, predict what you think the answer will be.
Questions I could answer by collecting second-hand data
The Environment
• Which countries produce the most carbon dioxide per capita?
• How is most of Ontario’s electricity generated?
Sports
• How many Stanley Cups has each NHL team won?
• How many minutes do professional basketball players play per game?
Interesting Facts
• What are the five most popular car colours in the world?
• How much of their gross national product do different countries spend on education?
Students in Canada
• How many cigarettes do Canadian teens smoke each week?
• What percent of Canadian students are left handed?
➢ Use the Internet or printed sources to find the data you need.
Record the web addresses or names of your data sources
as references.
If you are having difficulty finding data, you may need
to choose a different topic.
It is often better to
change topics than to try
locating data which may
not exist.
➢ Decide which types of graphs are most appropriate for the data you find.
Create a visual display to represent the data, either by hand or with a spreadsheet.
➢ If the data are numeric, which measures of central tendency or spread
best represent the data? Explain your choice of measures.
➢ Solve the problem you posed. If you need to find more data,
explain what you have learned so far and
what steps you could follow to obtain more data.
Reflect
➢ What difficulty might someone have collecting data?
How could they deal with this difficulty?
➢ Does using second-hand data have advantages over collecting
your own data? If so, what are they? If not, why not?
➢ Why is it important to collect data from reliable sources?
Chapter Review
What Do I Need to Know?
Types of Data and Graphs
One-variable data describe one piece of information about a person, place, or thing.
Data that involve numbers are called numeric data. They may be discrete or continuous.
• Data that are grouped by categories are called categorical.
The type of graph you draw depends on the type of data being represented.
• Circle graphs and pictographs can represent categorical data or discrete data.
[Figures: three graphs. "Hours Spent Listening to Music Each Week by Ontarians": on the radio 8.2 h; on CD, mp3, or cassette 7.4 h; on television 2.3 h; on the internet 1.9 h. "Car Thefts in Selected Canadian Cities, 2001": number of stolen cars for Toronto, Ottawa, Calgary, Edmonton, Montreal, Hamilton, Vancouver, and Winnipeg. "Courses Taken by a First Year Apprentice Millwright": Welding 1, Trade theory 1, Trade practice 1, Electrical 1, Drawings and schematics. A pictograph key reads "represents 8 hours".]
• A histogram is a type of bar graph that shows numeric data that have
been grouped in intervals.
The shape of a histogram provides information about how the data are distributed.
[Figures: five histograms illustrating the distribution shapes Uniform Distribution, Normal Distribution, Skew Left, Skew Right, and Bimodal. Titles: "Age of Passengers Riding in a Subway Car", "Age of Audience Members at Movie", "Edmonton Oilers Shifts Played per Game, 2005/2006", "Household Incomes in Canada, 2001", "Minutes per Game Played by Toronto Raptors Players". Horizontal axes: Age (years), Number of shifts, Annual income (thousands of $), Time played (min). Vertical axes: Number of people, Number of players, Number of households (thousands).]
Sampling Techniques
The population of a data set is all the pieces of data in the set.
A sample is a smaller set selected from the population.
There are different techniques for selecting a sample.
With a random sampling technique, each member of the population has the same
chance of being selected. This is not true with other techniques.
Random sampling techniques
• Simple random sampling
• Stratified sampling
• Cluster sampling
• Systematic sampling
Other techniques
• Convenience sampling
• Judgement sampling
• Voluntary sampling
Measures of Central Tendency and Spread
The mode, mean, and median are measures of central tendency for a data set.
They are used to describe a typical or average value for a data set.
• The mode is the number that occurs most often.
• To determine the mean, add the numbers,
then divide the sum by the number of numbers.
• To determine the median, arrange the numbers in order.
The median is the middle number.
For an even number of numbers, the median is
the mean of the two middle numbers.
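The three measures can be checked with technology. Here is a minimal sketch in Python that uses one of the example sets of numbers from the Dice Choice game earlier in the chapter. The variable names are chosen only for illustration.

```python
import statistics

# One of the example sets from the Dice Choice game.
numbers = [41, 46, 31, 52, 52]

mode = statistics.mode(numbers)      # value that occurs most often
mean = statistics.mean(numbers)      # sum divided by the number of numbers
median = statistics.median(numbers)  # middle number after sorting

# With an even number of numbers, median() averages the two middle numbers.
even_median = statistics.median([26, 31, 41, 42])

print(mode, mean, median, even_median)
```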
The measures of spread are the range and the standard deviation.
• The range is the difference between the greatest number
and the least number in a data set.
• The standard deviation tells how widely spread
around the mean the data in a set are.
To calculate the standard deviation:
➢ Calculate the mean.
➢ Subtract the mean from each data value.
➢ Square each difference.
➢ Add the squared numbers.
➢ Divide the sum by one less than the number of data items
if the data are for a sample.
➢ Divide by the number of data items if the data are for an entire population.
➢ Determine the square root of the result.
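The steps above can be followed one at a time in a short program. Here is a minimal sketch in Python. The function name standard_deviation and the parameter is_sample are assumptions made for illustration, and the numbers are one of the example sets from the Dice Choice game.

```python
import math

def standard_deviation(data, is_sample):
    """Follow the listed steps; is_sample chooses which divisor to use."""
    mean = sum(data) / len(data)                      # calculate the mean
    differences = [value - mean for value in data]    # subtract the mean
    squares = [d ** 2 for d in differences]           # square each difference
    total = sum(squares)                              # add the squared numbers
    divisor = len(data) - 1 if is_sample else len(data)
    return math.sqrt(total / divisor)                 # square root of the result

numbers = [26, 31, 41, 42, 55]
print(round(standard_deviation(numbers, is_sample=False), 2))
print(round(standard_deviation(numbers, is_sample=True), 2))
```

Dividing by one less than the number of data items always gives the larger result, so the sample standard deviation is greater than the population standard deviation for the same numbers.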
What Should I Be Able to Do?
7.1
1. Is each type of data numeric or categorical? Identify those data that are
numeric as continuous or discrete.
a) A yes/no response on a questionnaire
b) The fuel consumption rating of a vehicle
c) The colour options for a new car
d) A person’s shoe size
e) The type of transportation a person uses to get to work
f) The distance a person travels to get to work
2. a) Make a frequency table for this set of data. Explain how you choose the
intervals.
Heights of trees in a woodlot (m)
18.0  21.3  17.1  23.5  19.8
17.9  17.0  21.5  19.2  19.0
20.6  19.5  14.5  12.4  24.0
15.4  17.6  22.8  13.6  21.7
b) Draw a histogram to display the data.
Describe the distribution.
7.1
7.2
3. These data show the geographic origins of international students at the
University of Toronto in a recent year.
Region                Number of undergraduate students
Asia                  2577
Americas              650
Europe                487
Middle East           359
Oceania and Africa    245
Graph the data. Explain how you decided which type of graph to draw.
7.3
4. Identify each population below, then recommend whether to collect data from
a sample or the entire population.
If you recommend a sample, suggest a sampling technique. Explain the reason
for your suggestion.
a) Surveying the residents of a condominium to determine their
opinions about a proposed renovation
b) Surveying students at your school to determine whether they would
participate in a fundraiser for a local hospital
c) Testing chocolate bars produced each day in a factory to check
for peanut cross-contamination
5. A company wants to survey 500 of its employees about job satisfaction.
The company employs 860 people in British Columbia, 1100 people in
Ontario, and 560 people in New Brunswick. How many employees
should be sampled in each province so that the number in each provincial
sample is proportional to the number of employees in that province?
7.4
6. Suppose you want to determine data about the geographic origins of students
at your school.
a) Would you do a census or collect data from a sample? Why? If you suggest
using a sample, recommend an appropriate sampling technique.
b) Write a question you could include on a questionnaire to collect these
data.
7. Which question would you use on a questionnaire? Explain your choice.
a) How do you get to school on a typical day? _________________________
b) How do you usually travel to school (select one): walk ____ bike ____
car ____ public transit ____ other (please specify) ____
7.5
8. Calculate the mean, median, and mode heights for the tree data in question 2.
Which measure do you think best represents the data? Explain your choice.
9. Lila had 10 members of a high school volleyball team and 10 people randomly
selected from a shopping mall try serving a ball 10 times each. She counted the
number of successful serves for each person. Lila calculated the mean and the
standard deviation for each group.
Which group do you think would have a greater standard deviation? Why?
7.5
7.6
10. A travel agent is gathering data to help a client plan a trip. He found data on
the maximum temperatures in a few cities for one week during the
previous year.
          North Bay   Vancouver   Halifax   Winnipeg
May 11    17°C        16°C        15°C      10°C
May 12    19°C        13°C        16°C      15°C
May 13    15°C        15°C        17°C      13°C
May 14    16°C        17°C        20°C      14°C
May 15    14°C        21°C        24°C      14°C
May 16    17°C        28°C        16°C      23°C
May 17    20°C        20°C        18°C      19°C
May 18    21°C        19°C        18°C      15°C
a) Determine the measures of central tendency for North Bay.
b) Determine the measures of spread for North Bay.
c) Repeat parts a and b for each of the other cities.
d) Choose one of the cities. Which measure of central tendency do you
think best describes the average weather? Why is it best?
e) What do the measures of spread tell about the temperatures?
f) Did you use a spreadsheet for parts a or b?
Explain the reason for your choice.
7.7
11. How many sit-ups can a typical Canadian teenager do in 1 min?
a) What are some issues that you would have to consider when
designing an experiment to answer this question?
b) Write a plan for the experiment.
Include an explanation of how you would select people to participate
in the experiment.
7.8
12. Suppose you need to find data about each subject. Explain how you would search
for the data.
a) The maximum temperatures for your community or region for one month
last year
b) The population of each province in 1981, 1991, and 2001
c) The distance the average Canadian commutes to work or
the number of minutes it takes the average Canadian to commute
to work
Practice Test
Multiple Choice: Choose the correct answer for questions 1 and 2. Justify each choice.
1. What is the name for a data set made up of some of the individuals in a target group?
A. population
B. sample
C. distribution
D. census
2. Which type of graph would not be suitable to display eye colours of students in a class?
A. circle graph
B. bar graph
C. histogram
D. pictograph
Show your work for questions 3 to 6.
3. Communication The points scored in each game by a basketball player are given.
11   8  17   3  22  13   8
16  10  18  10  19  12  10
 9  20   6  13  15  20   5
20  13  14  12   7   9  10
19  20  21  17  14   8  16
a) Make a frequency table for these data. Explain how you choose the intervals.
b) Draw a histogram to display the data. Describe the shape of the distribution.
4. Knowledge and Understanding Use the data in question 3.
a) Calculate the measures of central tendency and the range.
b) Which measure of central tendency do you think best represents the data?
Explain your choice.
c) What additional information would the standard deviation provide?
5. Thinking A group wants to determine Ontarians’ opinions about raising
the minimum wage. What sampling technique is used in each of these samples?
Which sample do you think would best represent public opinion? Explain.
a) Phone the human resources managers at the 500 largest companies
in the province.
b) Select several cities and rural areas.
Telephone randomly selected households in each place.
c) Ask people at employment centres in 10 cities across the province.
d) Advertise on radio and in newspapers asking people to phone
with their opinions.
6. Application Two sprinters’ times in seconds for running 100 m are given.
Who would you choose to run the final leg of your relay team?
Give reasons for your choice.
Kate: 13.22, 11.39, 13.53, 12.99, 11.18, 12.34, 13.05, 11.36, 11.46, 14.13
Fiona: 12.50, 12.66, 12.25, 12.31, 12.37, 12.56, 12.74, 13.11, 12.19, 12.61