Mathematics A30
Module 1
Lesson 1
Statistics
Introduction
The word statistics is a very common word. Announcers at sports events are always eager
to give you the statistics for every possible situation! Local, provincial and federal
governments use statistics in many ways to show trends in the economy or population
growth and decline.
Businesses of all types keep statistics as a measure of what has happened in the past in
hopes of making things better in the future.
Statistics don't mean very much if they can't be analyzed properly. This lesson will show
you different ways of analyzing data by measures of central tendency. The data will also
be organized so that the methods of analysis will be easier.
The graphing calculator will be used as an extension when creating box and whisker plots.
An appendix at the end of this lesson will take you through the steps necessary when
creating a box and whisker plot using the graphing calculator.
Part of the assignment will be activity based in that you will be responsible for organizing
and analyzing a set of data with the knowledge and skills that you have attained from the
lesson.
Objectives
After completing this lesson you will be able to
• list and describe the methods used to collect data.
• obtain data for real-world situations by using simulations.
• obtain data for real-world situations by using the Monte Carlo simulation.
• review the methods for determining the measures of central tendency.
• construct box and whisker plots from simulated data.
• define and utilize the concept of percentiles (including the first, second, and
third quartiles).
• solve related problems using statistical inference.
1.1 Introduction to Statistics and Data Collection
Data analysis is a big part of many businesses and institutions. People are always trying
to determine bigger and better ways to do things. Situations can only get better if people
know what has happened in the past.
The principal goal of data analysis is to draw information from data. It is an opportunity
to investigate and to explore information that is given in data. Data is often open to a
variety of alternative explanations and sometimes a problem may have no single correct
answer.
Statistics is the mathematical study of data. All statistics begin with the collection of
data, whether for politics, sport, research, or industry. In order to analyze data, you must
go through the process of collecting, organizing and displaying that data. Descriptive
statistics are sets of data that have gone through this process.
Once data has been collected, organized and displayed it can then be useful in making
conclusions and predictions.
The statistical process follows these steps:
1. Collect Data
2. Organize and Display
3. Analyze Results
4. Use Results for Predictions and Decisions
There are many different ways to collect data.
• household surveys
• censuses
• telephone surveys
• experiments
• questionnaires
These different ways of data collection can be categorized by the following methods of
gathering statistical information.
• Census
  • You gather information by surveying the entire population being studied.
  • Every person or item in the population is surveyed.
  • Often this method is too time-consuming and costly to be the best way to collect the data.
• Simulation
  • Information is gathered using experimental or indirect methods.
  • This is used as a substitute when normal sampling methods are not practical or safe.
• Sample
  • A section of the population is used to obtain representative information about the topic that is being studied.
  • It is important that the sample should represent the entire population, so a random sample is chosen.
  • Validity is a measure of how well a sample represents the entire population. It is important that other factors have not skewed or altered the results.
  • Samples are often named by the method used to collect the data.
    Clustered sample
    • a sample chosen from a particular portion of the population
    Stratified sample
    • a random sample which is based on dividing the population into groups based on a common feature
    Destructive sample
    • a sampling method often used to test products in manufacturing, where the item tested is destroyed or can't be returned to the population
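To see how a random sample and a stratified sample differ in practice, here is a minimal Python sketch. The household labels, the three regions, and the function names are made up for illustration; they are assumptions and do not come from the lesson.

```python
import random

# Hypothetical population: 12 households, each tagged with a region.
population = [
    ("H01", "north"), ("H02", "north"), ("H03", "north"), ("H04", "north"),
    ("H05", "south"), ("H06", "south"), ("H07", "south"), ("H08", "south"),
    ("H09", "east"), ("H10", "east"), ("H11", "east"), ("H12", "east"),
]

def simple_random_sample(items, size, rng):
    """Every item has the same chance of being chosen."""
    return rng.sample(items, size)

def stratified_sample(items, per_group, rng):
    """Divide the population by a common feature, then sample randomly in each group."""
    groups = {}
    for name, feature in items:
        groups.setdefault(feature, []).append((name, feature))
    chosen = []
    for feature in sorted(groups):
        chosen.extend(rng.sample(groups[feature], per_group))
    return chosen

rng = random.Random(1)  # fixed seed so the run can be repeated
print(simple_random_sample(population, 3, rng))
print(stratified_sample(population, 1, rng))
```

The random sample may happen to miss a region entirely, while the stratified sample always includes one household from each region.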
Example 1
For each of the following questions, determine the population being studied, the sample group, the type of sample, and 2 sources of sample error.
a) Inspection of grapefruits by slicing 3 grapefruits from each of 400 crates.
b) Determining the number of eligible voters in Saskatchewan by doing a door to door survey.
c) Determining how many households in Estevan have cable TV by phoning every 20th phone number listed in the Estevan exchange.
d) Polling Saskatchewan readers of Sports Illustrated to determine which sport is the most popular for people to watch.
Solution:
a) Inspection of grapefruits
• Population: contents of 400 crates
• Sample group: 1 200 grapefruits
• Type of sample: destructive sample
• Errors:
  • sample size may be inadequate
  • visual inspection may have more validity
b) Number of eligible voters
• Population: every Saskatchewan household
• Sample group: every Saskatchewan household
• Type of sample: census
• Errors:
  • people may be missed if they are not at home
  • people with more than one residence may be listed twice
c) Cable TV watchers
• Population: all households in Estevan
• Sample group: every 20th name in the Estevan phone book
• Type of sample: random sample
• Errors:
  • businesses aren't separated from residences
  • households with no phones or with unlisted numbers will be missed
d) Most popular sport
• Population: all households in Saskatchewan
• Sample group: Sports Illustrated subscribers
• Type of sample: stratified sample
• Errors:
  • many people who watch sports are not subscribers to Sports Illustrated
  • Saskatchewan residents may be biased toward Canadian sports
Simulations and The Monte Carlo Method
Collecting data for some real-world situations is very difficult because of time or
money constraints, or because sampling methods are not practical or safe. One example of
this would be using crash test dummies to test safety standards in a vehicle. In these
instances, a simulation of an experiment can be conducted to acquire data that will help
in analyzing the situation. A simulation is a procedure used to answer questions about a
real-world situation by setting up an experiment that will portray the actual outcome.
Simulations that involve an element of chance are referred to as Monte Carlo simulations.
A Monte Carlo simulation often involves the following methods:
• rolling a die
• tossing a coin
• using a table of random numbers
• using calculators having a random number function
The most important part of a Monte Carlo simulation is designing an experiment to
simulate the data that is needed to analyze the problem.
Example 2
Design an experiment to estimate the probability of a family with four
children having all girls.
Solution:
Determine the possible methods.
1. Toss a coin
   • Heads - Boy
   • Tails - Girl
2. Roll a die
   • Even number - Boy
   • Odd number - Girl
3. Table of random numbers
   • Even number - Boy
   • Odd number - Girl
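The coin-toss method can also be carried out with a computer's random number function. The following is a minimal Python sketch, assuming (as the coin model does) that a boy and a girl are equally likely; the function names and the seed are illustrative choices, not part of the lesson.

```python
import random

def all_girls(rng):
    """One set of four tosses: True only if all four come up tails (girl)."""
    return all(rng.choice("HT") == "T" for _ in range(4))

def estimate(sets, seed=0):
    """Fraction of sets of four tosses that were all tails."""
    rng = random.Random(seed)
    yes = sum(all_girls(rng) for _ in range(sets))
    return yes / sets

for sets in (50, 500, 5000):
    print(sets, estimate(sets))

# Under the equal-chance assumption the theoretical value is (1/2)^4.
print("theoretical:", (1 / 2) ** 4)
```

With only 50 sets the estimate can be well off; with more sets it tends to settle closer to the theoretical value of 0.0625.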
Activity 1.1
• Toss a coin to determine the probability of a family with four children having all girls.
• Toss the coin in sets of four. If the four tosses are all tails, enter yes in the table. For any other combination of heads and tails, enter no.
• Conduct 50 sets of four tosses.
[Recording table: Tosses | Total]
Record the total number of "yes" responses after
• 10 sets of tosses.
• 20 sets of tosses.
• 30 sets of tosses.
• 40 sets of tosses.
• 50 sets of tosses.
The probability of an event happening is the number of favourable outcomes divided
by the number of total outcomes.
What is the probability of a family of four children having all girls?
What would you expect would be the result if you conducted the experiment with
100 tosses? 200 tosses?
A table of random numbers can also be used to simulate a situation. One of these tables
follows. Even though the digits are grouped in fives, any number of digits can be read
at a time.
Example 3 will use the table of random numbers to simulate an experiment.
Table of Random Numbers
98299 62016 50712 82841 02554 08614 13873
26431 74576 00780 98575 63110 43438 65725
31098 87395 04669 71690 14985 32489 79669
06439 71610 48862 10140 88016 92085 78448
43274 42808 51823 07702 92051 34118 32520

87432 34188 31878 63911 18507 86670 87577
30871 83554 99380 51768 13119 83438 69628
31501 77509 40649 50717 54631 59867 50972
28672 56618 89969 34523 54664 46204 84909
60639 68932 15851 92143 50695 76752 32355

27401 61516 02216 72766 24451 43244 87838
07287 31010 57412 94994 49382 62978 58593
05701 54732 77813 00137 50294 01227 25906
22244 54854 60381 14293 65107 41155 74234
27165 09574 37637 87392 22281 05975 79677

16420 98769 01353 52927 45502 53549 33937
28451 02557 18510 49678 10030 66868 59299
56787 68652 62632 39065 63005 74245 98425
55096 86106 61924 51072 30070 30295 24517
56464 00665 84951 73419 54955 38884 24567

30206 80922 18604 80628 56145 32658 26792
43084 50326 69059 07899 71179 41125 99900
08753 14697 61528 32169 63282 20012 64673
95492 69161 22953 71860 41650 53453 09010
82720 08850 99900 03601 57596 33940 24921
Example 3
Use the table of random numbers to estimate the probability that a student
guessing the answers on a 10-question true-false test will get 7 or more
correct.
Solution:
Read the problem.
On the test, each of the ten responses is independent of the others. Each answer is
either correct or not correct.
The goal is to find how often 7, 8, 9 or 10 of the ten responses are correct.
Develop a plan.
Use the table of random numbers.
• A correct response is an even digit.
• An incorrect response is an odd digit.
Start anywhere in the table and select a group of ten digits across to the right.
Record the number of even digits in the group of ten.
Carry out the plan.
Starting at              Random Numbers   Number of Even Digits
1st number, 1st row      98299 62016      6
6th number, 3rd row      32489 79669      5
2nd number, 5th row      42808 51823      7
5th number, 9th row      54664 46204      9
3rd number, 10th row     15851 92143      3
1st number, 6th row      87432 34188      6
5th number, 11th row     24451 43244      7
3rd number, 12th row     57412 94995      3
3rd number, 18th row     62632 39065      6
1st number, 21st row     30206 80922      8
6th number, 15th row     05975 79677      2
1st number, 23rd row     08753 14697      4
There were four times that the number of even digits in the group of ten was 7 or more.
The probability of guessing 7 or more correct answers is:
$P(A) = \frac{\text{number of outcomes in the event}}{\text{total number of possible outcomes}} = \frac{4}{12} = \frac{1}{3}$
Write a concluding statement.
The estimated probability of guessing 7 or more correct answers on a true-false test with
10 questions is 1/3, or about 0.33.
The more trials that you conduct, the more accurate you will be in determining the
actual probability of being able to guess correctly 7 or more questions on a true-false
test.
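The effect of running more trials can be checked with a short program. This is a minimal Python sketch, assuming each guess is correct with probability 1/2; the function names and the seed are illustrative. It also computes the exact value by counting the ways to get 7, 8, 9 or 10 correct out of the 2^10 equally likely answer patterns.

```python
import random
from math import comb

def guesses_correct(rng, questions=10):
    """Number of correct answers when every answer is a 50/50 guess."""
    return sum(rng.random() < 0.5 for _ in range(questions))

def estimate_seven_or_more(trials, seed=0):
    """Fraction of simulated tests with 7 or more correct answers."""
    rng = random.Random(seed)
    hits = sum(guesses_correct(rng) >= 7 for _ in range(trials))
    return hits / trials

# Exact value: (C(10,7) + C(10,8) + C(10,9) + C(10,10)) / 2^10 = 176 / 1024
exact = sum(comb(10, k) for k in range(7, 11)) / 2 ** 10

for trials in (12, 120, 12000):
    print(trials, estimate_seven_or_more(trials))
print("exact:", exact)
```

The exact value is 176/1024, about 0.17, so an estimate of 1/3 from only 12 trials is noticeably high. Estimates from thousands of trials usually land much closer to 0.17.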
Activity 1.12
This activity will be handed in with Assignment 1.
Your favourite fast food restaurant distributes one of 6 different toys with each
Kid's Meal. How many times do you have to order a Kid's Meal so that you are
able to collect all 6 toys?
• Use one die.
• Shake the die.
• For each trial, cross out the number that appears on the die.
• Determine the total number of shakes that it takes to complete each row.
• Every number will have to be crossed out at least once.
• Each row represents one trial.
• If you do not have a die, use 6 pieces of numbered paper in a container.
Trial   Toss the Die          Total Shakes
#1      1  2  3  4  5  6
#2      1  2  3  4  5  6
#3      1  2  3  4  5  6
#4      1  2  3  4  5  6
#5      1  2  3  4  5  6
#6      1  2  3  4  5  6
#7      1  2  3  4  5  6
#8      1  2  3  4  5  6
• You may perform more trials to get a more accurate result.
On average, how many times do you have to buy a Kid's Meal before you get
all six of the toys?
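If you want to compare your eight trials with a much larger number of trials, the die can be replaced by a random number function. This is a minimal Python sketch, assuming each of the 6 toys is equally likely with every meal; the function names and the seed are illustrative.

```python
import random

def meals_until_complete(rng, toys=6):
    """Count the meals bought until every toy has appeared at least once."""
    seen = set()
    meals = 0
    while len(seen) < toys:
        seen.add(rng.randint(1, toys))  # one roll of the die
        meals += 1
    return meals

def average_meals(trials, seed=0):
    """Average number of meals over many simulated trials."""
    rng = random.Random(seed)
    return sum(meals_until_complete(rng) for _ in range(trials)) / trials

for trials in (8, 80, 8000):
    print(trials, average_meals(trials))
```

Notice how much the average from 8 trials can differ from the average of 8000 trials.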
Exercise 1.1
1.
Design an experiment to estimate the probability that a family with 3 children will
have 2 boys and a girl.
2.
The A-side final of a hockey league championship is a best of seven series. Theodore
and Langenburg are evenly matched. If a team must win four games to win the
championship, estimate the probability that the series will last seven games using
random numbers.
3.
There are 20 questions on a multiple choice quiz. Each multiple choice question has
four responses, a, b, c, and d. Design an experiment to predict the probability that a
student, guessing at the answers, will have at least half the questions correct.
1.2 Measures of Central Tendency
Often it is useful to find a single number that can best represent the entire sample or set
of data that is being studied. This single number is called a statistic, and the measure of
central tendency is one example of such a statistic. Three different measures of central
tendency will be studied in this course. Each has its advantages and disadvantages. The
most appropriate choice of a measure depends on the situation and the data that is
presented. It is not unusual for all three measures to be recorded and comparisons made
between the three.
Mean
The first measure of central tendency is the mean. This is the arithmetic average of the
numbers in the sample group.
The symbol for the mean of a set of data is $\bar{x}$ (read "x bar").
The average of a set of numbers is the sum of the numbers divided by the total
number of numbers.
Example 1
The following data represents the results of a Mathematics A30 midterm
test:
62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38, 97, 63, 90, 89, 62, 58, 74, 66,
75, 90, 89, 63.
Determine the mean.
Solution:
• There are 25 test scores.
• The sum of the test scores is 1853.
Determine the mean.
$\bar{x} = \frac{1853}{25} \approx 74.1$
The mean of the results of a Mathematics A30 midterm exam is 74.1%.
A scientific calculator can be used to average numbers.
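The same calculation can be done in a few lines of code. This is a minimal Python sketch using the test scores from Example 1; the variable names are illustrative.

```python
scores = [62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38,
          97, 63, 90, 89, 62, 58, 74, 66, 75, 90, 89, 63]

total = sum(scores)   # sum of the numbers
count = len(scores)   # how many numbers there are
mean = total / count  # arithmetic average

print(total, count, round(mean, 1))  # 1853 25 74.1
```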
Median
The second measure of central tendency is the median. This is the middle number when
all of the data is arranged from smallest to largest.
When you are given a set of data, the data can be arranged in an array. This is another
way to say that the data is arranged from smallest to largest values.
Example 2
The ages of the mathematics teachers at a school are as follows:
27, 61, 35, 38, 58, 50, 48, 33, 37.
Order the data in an array and determine the median.
Solution:
Arrange the data in order from smallest to largest.
27, 33, 35, 37, 38, 48, 50, 58, 61
Determine the middle number.
The median of the data is 38.
The median age of the mathematics teachers at the school is 38 years old.
• If there are 20 numbers in the sample group, there will be 2 numbers that are the middle numbers, the 10th number and the 11th number. If this is the case, the median is the average of these two middle numbers.
This is the same for all sample groups that have an even number of values.
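Both cases, an odd and an even number of values, can be handled by one short function. This is a minimal Python sketch; the function name is illustrative, and the second list (the teachers' ages with the oldest teacher left out) is made up only to show the even case.

```python
def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)  # arrange the data in an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: one middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two

ages = [27, 61, 35, 38, 58, 50, 48, 33, 37]
print(median(ages))  # 38

# Hypothetical even-sized group: the same ages without the 61.
print(median([27, 35, 38, 58, 50, 48, 33, 37]))  # 37.5
```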
In Example 2 there were not very many numbers, and it was easy to order them
from smallest to largest. If you have a sample group where there is a lot of data, a
stem and leaf plot can help you to organize the data.
The following table shows a stem and leaf plot, where the stem is used to organize the
"first" or tens digits, and the leaf is used to record the "last" or units digits. Once the table
is completed it is much easier to write the data in an array.
The data from Example 2 was as follows:
27, 61, 35, 38, 58, 50, 48, 33, 37.
This data is shown in the following table.
Stem   Leaf
2      7
3      5, 8, 3, 7
4      8
5      8, 0
6      1
These values can now be written in an array.
If the data is repeated in the sample group it is still represented in the stem and leaf plot.
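The stem and leaf plot above can be built step by step in code. This is a minimal Python sketch for two-digit whole numbers; the function name is illustrative. Leaves are kept in the order the data was given, as in the table, and are sorted only when the array is written.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem) and record the units digits (leaves)."""
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))

ages = [27, 61, 35, 38, 58, 50, 48, 33, 37]
plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))

# Sorting the leaves within each stem gives the array.
array = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in sorted(leaves)]
print(array)  # [27, 33, 35, 37, 38, 48, 50, 58, 61]
```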
Example 3
The following data is the same data as in Example 1, representing the results
of a Mathematics A30 midterm test:
62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38, 97, 63, 90, 89, 62, 58, 74, 66,
75, 90, 89, 63.
Enter the data in a stem leaf plot, organize the data in an array and then
determine the median of the data.
Solution:
Enter the data in a stem and leaf plot.
Stem   Leaf
3      8
4      4
5      7, 8
6      2, 3, 2, 6, 3
7      4, 3, 2, 9, 4, 5
8      6, 2, 7, 9, 9
9      3, 0, 7, 0, 0
Arrange the data in an array.
38, 44, 57, 58, 62, 62, 63, 63, 66, 72, 73, 74, 74, 75, 79, 82, 86, 87, 89, 89, 90, 90, 90,
93, 97
Determine the median or the middle term.
• There are 25 terms.
• The middle term will be the 13th term.
38, 44, 57, 58, 62, 62, 63, 63, 66, 72, 73, 74, [74], 75, 79, 82, 86, 87, 89, 89, 90, 90, 90,
93, 97
The bracketed value is the 13th term.
The median of the test scores on the Mathematics A30 midterm test is 74.
Mode
The mode is the third way to measure central tendency. The mode is the most frequently
occurring number or result in the sample group. This is perhaps the least reliable way
of determining central tendency, as will be shown in the example on test scores.
Example 4
The following data is the same data as in Examples 1 and 3, representing the
results of a Mathematics A30 midterm test:
62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38, 97, 63, 90, 89, 62, 58, 74, 66,
75, 90, 89, 63.
Determine the mode.
Solution:
Determine the number that is repeated the most often.
• The score 90 is repeated three times.
The mode of this data is 90.
Is the mode a good measure of central tendency in this example?
Which measure would you say best describes the data?
Is there much difference between the median and the mean in this situation?
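Counting how often each score occurs can be done with a tally. This is a minimal Python sketch; the function name is illustrative, and returning an empty list when no value repeats is a design choice made here, not a rule from the lesson.

```python
from collections import Counter

def modes(values):
    """Return the most frequently occurring value(s); empty if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38,
          97, 63, 90, 89, 62, 58, 74, 66, 75, 90, 89, 63]
print(modes(scores))            # [90]
print(Counter(scores)[90])      # 3
```

The tally also shows why the mode is fragile here: several scores occur twice, so one more 62 or 74 would change the answer completely.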
Example 5
Data was collected for the monthly precipitation in the grassland area of
Shaunavon. The following table shows the results. Determine the mean,
median and mode of the data. Which value is the best measure of central
tendency?
Month                        J    F    M    A    M    J    J    A    S    O    N    D
Monthly Precipitation (cm)   1.2  2.0  2.1  2.8  4.0  8.0  5.1  3.9  3.1  2.1  1.5  1.4
Solution:
Determine the mean.
• The sum of the data is 37.2.
• There are 12 numbers in the sample group.
$\bar{x} = \frac{37.2}{12} = 3.1$
The mean is 3.1 cm.
Determine the median.
Construct a stem and leaf plot.
Stem   Leaf
1      .2, .5, .4
2      .0, .1, .8, .1
3      .9, .1
4      .0
5      .1
8      .0
Arrange the data in an array.
1.2, 1.4, 1.5, 2.0, 2.1, [2.1, 2.8], 3.1, 3.9, 4.0, 5.1, 8.0
The two bracketed values are the middle values.
Find the average of the two middle values.
$\text{Median} = \frac{2.1 + 2.8}{2} = 2.45$
The median is 2.45 cm.
Determine the mode.
• The value, 2.1, is repeated 2 times in the sample group.
• No other value has been repeated.
The mode is 2.1 cm.
Which value would be the best indicator of central tendency?
Mode
• The values are spread out and the 2.1 is only repeated 2 times, therefore this is not a good measure.
Mean
• There is an extreme value of 8.0 cm that is not close to the other values. This increases the mean and perhaps skews this value a little.
Median
• This is the best measure of central tendency because, as the average of the middle two terms, it is not affected by the extreme value of 8.0 cm.
The following are the three most common measures of central tendency:
• Mean: This is the arithmetic average of the numbers in the sample group.
• Median: This is the middle number when all the data is arranged in order from smallest to largest.
• Mode: This is the most frequently occurring number or result in the sample group.
Exercise 1.2
1.
The mature height of species of trees in Saskatchewan recommended for
shelterbelts is as follows:
Species            Mature Height (m)
Green Ash          15
Manitoba Maple     14
American Elm       20
Siberian Elm       10
Willow             14
Poplar             20
Caragana           4
Villosa Lilac      4
Chokecherry        7
Buffaloberry       5
Honeysuckle        4
White Spruce       20
Colorado Spruce    25
Scots Pine         20
Determine the three measures of central tendency for this data.
Which value is the best measure of central tendency?
2.
The estimated price of crops in Saskatchewan for 1995 is shown in the following
table.
Crop              Price ($/t)
Winter Wheat      140
Spring Wheat      150
Durum             180
Oats              90
Barley            85
Fall Rye          90
Spring Rye        90
Flax              255
Canola            305
Mixed Grains      85
Mustard Seed      265
Sunflower Seed    275
Lentils           360
Field Peas        165
Canary Seed       240
Determine the three measures of central tendency for this data. Which value is the
best measure of central tendency?
3.
The winning team must win four out of seven games. The scores are shown in the
table.
Game   Team A   Team B
1      12       1
2      7        8
3      10       3
4      1        2
5      3        4
6      2        1
7      0        1
Explain why the mean score is not a good measure to determine the winner.
1.3 Box and Whisker Plots
When analyzing data, it is often helpful to see the full range of results. Graphs are one of
the most common ways used to display data in a meaningful manner.
The box and whisker plot is one way of showing statistical data. Once the data has been
arranged and plotted, it is easier to analyze the results.
It is very important to understand how to determine the median value of a set of data.
This is the measure of central tendency that will guide the construction of the box and
whisker plots.
The data from the examples in Section 1.2 on the results of a Mathematics A30 midterm
test will again be used to illustrate the process used to develop a box and whisker plot.
The results of a Mathematics A30 midterm test:
62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38, 97, 63, 90, 89, 62, 58, 74, 66,
75, 90, 89, 63
The following steps are used when organizing data to be entered on a box and whisker
plot:
1. Organize the data into an array and determine the median.
38, 44, 57, 58, 62, 62, 63, 63, 66, 72, 73, 74, 74, 75, 79, 82, 86, 87, 89, 89, 90, 90, 90,
93, 97
• The median of the data is 74.
• This value is also called the 2nd quartile.
• The data has now been divided into two parts. There are 12 values in the upper half of the data and 12 values in the lower half of the data.
2. Determine the median of the upper half of the data.
75, 79, 82, 86, 87, [89, 89], 90, 90, 90, 93, 97
The two bracketed values are the middle terms.
• The average of the middle two terms is 89, and therefore the median of the upper terms is 89.
• Another name for this value is the 3rd quartile.
3. Determine the median of the lower half of the data.
38, 44, 57, 58, 62, [62, 63], 63, 66, 72, 73, 74
The two bracketed values are the middle terms.
• The average of the middle two terms is 62.5 and therefore the median of the lower terms is 62.5.
• Another name for this value is the 1st quartile.
4. Draw a number line which includes the entire range of data from the smallest number to the largest number.
• For this example the number line will have to go from 38 to 97.
• Label the number line with an even scale.
5. Plot the data on the number line by placing dots on the line or above the line to indicate each score.
6.
• Draw a vertical line through the 2nd quartile or the median which is 74.
• Draw a vertical line through the 3rd quartile or the upper median which is 89.
• Draw a vertical line through the 1st quartile or the lower median which is 62.5.
• Draw a box around the values between the 1st and 3rd quartiles.
7.
• Draw whiskers extending from each of the quartiles to the lowest and highest values.
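The five values needed to draw the plot (lowest value, 1st quartile, median, 3rd quartile, highest value) can be found with a short program that follows the same steps. This is a minimal Python sketch of the median-of-halves method used in this lesson; the function names are illustrative, and calculators or software may use slightly different quartile rules.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (lowest, 1st quartile, median, 3rd quartile, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

scores = [62, 57, 86, 74, 93, 44, 73, 90, 72, 79, 82, 87, 38,
          97, 63, 90, 89, 62, 58, 74, 66, 75, 90, 89, 63]
print(five_number_summary(scores))  # (38, 62.5, 74, 89.0, 97)
```

The box runs from 62.5 to 89 with a line at 74, and the whiskers reach down to 38 and up to 97.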
How can you use the box and whisker plot to analyze the data?
• 50% of the data is contained in the box.
• 25% of the data is between the upper quartile and the upper extreme.
• 25% of the data is between the lower quartile and the lower extreme.
A box and whisker plot can, at a glance, give you a quick impression of certain important
features of the set of data.
Location of the data:
• The median shows the centre of the data.
Spread of the data:
• The length of the box shows the spread of the middle 50% of the data.
• The length of each whisker shows the spread of the values in the upper and lower 25% of the data.
Symmetry of the data:
• The symmetry of the box with respect to the median will usually determine the symmetry of the data with respect to the median.
How can you analyze the data from the box and whisker plot of the results on the
Mathematics A30 test?
• The median is a good indication of the centre of the data.
• The values in the box are spread out showing that there is a range of values in the middle 50% of the data.
• The lower 25% of the data is spread out more because of a lower extreme.
• The data within the box is fairly symmetrical, showing that the entire sample group is symmetrical as well. This means that the median is a good indication of central tendency.
A graphing calculator can create a box and whisker plot. The appendix at the
end of this lesson (page 49) will explain how to do this with the TI-83 Plus
graphing calculator.
• Use this procedure to enter the above data from the question on the test scores.
• Set the WINDOW at
  Xmin = 30
  Xmax = 100
  Xscl = 10
• Use the TRACE function to determine the upper and lower extremes, the 1st and 3rd quartiles and the median.
Example 1
The number of passes attempted and completed by a Saskatchewan
Roughrider quarterback in each of the games in the 1996 season is given in
the table.
a) Show each of the sets of data in a box and whisker plot.
b) Analyze the data.
Game            Passes Attempted   Passes Completed
vs. Calgary     44                 21
vs. Ottawa      33                 19
vs. Edmonton    25                 16
vs. Hamilton    38                 22
vs. Toronto     28                 14
vs. Calgary     33                 17
vs. Montreal    25                 16
vs. Edmonton    33                 19
vs. Montreal    33                 23
vs. Winnipeg    25                 12
vs. Toronto     33                 20
vs. Ottawa      23                 11
vs. B.C.        29                 14
vs. Winnipeg    27                 11
vs. Hamilton    31                 22
vs. Winnipeg    26                 13
vs. B.C.        22                 13
vs. Calgary     39                 20
Solution:
a) Make an array of the attempted passes. Use a stem and leaf plot if necessary.
22, 23, 25, 25, 25, 26, 27, 28, 29, 31, 33, 33, 33, 33, 33, 38, 39, 44
Determine the 1st, 2nd and 3rd quartiles.
• 1st quartile: 25
• 2nd quartile: 30
• 3rd quartile: 33
Create a box and whisker plot.
Make an array of the completed passes.
11, 11, 12, 13, 13, 14, 14, 16, 16, 17, 19, 19, 20, 20, 21, 22, 22, 23
Determine the 1st, 2nd and 3rd quartiles.
• 1st quartile: 13
• 2nd quartile: 16.5
• 3rd quartile: 20
Create a box and whisker plot.
Both box and whisker plots are shown on the same scale so that a comparison
between the data can be made.
Use the graphing calculator to create a box and whisker plot for each set of
data.
• Set the WINDOW at:
  Xmin = 0
  Xmax = 60
  Xscl = 10
b) Analyze the data.
• The number of completions is less than the number of passes attempted.
• The spread of the data for the middle 50% of the data is about the same for both the attempted passes and the completed passes.
• The number of passes in the upper 25% of the data is more spread out than in the upper 25% for the number of completions.
• Both sets of data are fairly symmetrical.
It could be concluded that during the 1996 season, more passes by Saskatchewan
Roughrider quarterbacks did not result in more completions.
Percentiles
When studying box and whisker plots it was necessary to determine the median and the
upper and lower quartile numbers.
The following observations about percent could be made from these values:
• 25% of the data was below the 1st quartile.
• 50% of the data was below the 2nd quartile.
• 75% of the data was below the 3rd quartile.
Percentiles allow you to determine the position of a number in a distribution of data.
• The 1st quartile is at the 25th percentile.
• The 2nd quartile is at the 50th percentile.
• The 3rd quartile is at the 75th percentile.
Other points can also mark the percentages of distribution, such as 5%, 15%, 62%, etc.
Each of these values also states the percent of the data that is below.
Percentile
The nth percentile, Pn, of a distribution is the number below which n% of the
values fall.
Example 2
The following list contains the shoe sizes of some of the teachers at a school.
9, 4, 11, 6, 10, 7.5, 7.5, 7.5, 7, 6, 7.5, 8.5, 7.5, 9, 6.5, 8, 7, 8
Organize the data to determine P10, P25, P60, P75, and P90.
Solution:
Read the problem.
A list of shoe sizes is given.
Determine the 10th, 25th, 60th, 75th and 90th percentiles.
Develop a plan.
Arrange the data in an array and determine the median.
The 1st quartile is at the 25th percentile.
The 3rd quartile is at the 75th percentile.
Find the other percentiles by determining how many values are below them.
Carry out the plan.
Create a stem and leaf plot.
Stem   Leaf
4      .0
6      .0, .0, .5
7      .5, .5, .5, .0, .5, .5, .0
8      .5, .0, .0
9      .0, .0
10     .0
11     .0
Write the data in an array.
4, 6, 6, 6.5, 7, 7, 7.5, 7.5, 7.5, 7.5, 7.5, 8, 8, 8.5, 9, 9, 10, 11
Determine the median.
4, 6, 6, 6.5, 7, 7, 7.5, 7.5, [7.5, 7.5], 7.5, 8, 8, 8.5, 9, 9, 10, 11
The bracketed values are the middle two values.
The median is 7.5. This is the second quartile.
Determine the 1st quartile.
The 1st quartile is the median of the lower nine values.
4, 6, 6, 6.5, [7], 7, 7.5, 7.5, 7.5
The value of the 1st quartile is 7.
• Therefore P25 is 7.
Determine the 3rd quartile.
The 3rd quartile is the median of the upper nine values.
7.5, 7.5, 8, 8, [8.5], 9, 9, 10, 11
• Therefore P75 is 8.5.
Determine P10.
• 10% of the values are below this value.
• 10% of 18 values is 1.8.
• Only one value will be below the 10th percentile.
4, [6], 6, 6.5, 7, 7, 7.5, 7.5, 7.5, 7.5, 7.5, 8, 8, 8.5, 9, 9, 10, 11
• Therefore P10 is 6 (the bracketed value).
Determine P60.
• 60% of the values are below this value.
• 60% of 18 values is 10.8.
• 10 values will be below the 60th percentile.
4, 6, 6, 6.5, 7, 7, 7.5, 7.5, 7.5, 7.5, [7.5], 8, 8, 8.5, 9, 9, 10, 11
• Therefore P60 is 7.5.
Determine P90.
• 90% of the values are below this value.
• 90% of 18 values is 16.2.
• 16 values will be below the 90th percentile.
4, 6, 6, 6.5, 7, 7, 7.5, 7.5, 7.5, 7.5, 7.5, 8, 8, 8.5, 9, 9, 10, 11 with the 17th value, [10], marked
• Therefore P90 is 10.
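The counting rule used in this example can be written as a short function: find n% of the number of values, round down to get how many values sit below the percentile, and take the next value in the array. This is a minimal Python sketch of that reading of the rule; the function name is illustrative, and calculators and software often use other percentile rules that give slightly different answers.

```python
def percentile(values, n):
    """Value in the ordered data that has (n% of the count, rounded down) values before it.

    Intended for 0 < n < 100.
    """
    ordered = sorted(values)
    below = (len(ordered) * n) // 100  # e.g. 10% of 18 is 1.8, so 1 value is below
    return ordered[below]

shoe_sizes = [9, 4, 11, 6, 10, 7.5, 7.5, 7.5, 7, 6, 7.5, 8.5, 7.5, 9, 6.5, 8, 7, 8]
for n in (10, 25, 60, 75, 90):
    print(f"P{n} =", percentile(shoe_sizes, n))
```

For the shoe sizes this gives P10 = 6, P25 = 7, P60 = 7.5, P75 = 8.5 and P90 = 10, matching the values found by hand.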
Example 3
The following graph shows the percentiles of the height and weight of girls
from the ages of 2 to 18.
a) Using the percentile chart, determine what percentile Sacha, a 5 year old girl, would be for a weight of 18 kg and a height of 114 cm. Explain what this means.
b) A public health nurse tells a father that Paula, his 11 year old daughter, is at the 75th percentile for weight and at the 30th percentile for height. What are her weight and height? Describe her physical stature.
[Percentile charts: height and weight of girls, ages 2 to 18]
Solution:
a)
• She would be at the 50th percentile for weight and the 90th percentile for height.
• 50% of other girls her age would be lighter than Sacha.
• Only 10% of other girls her age would be taller than Sacha.
• Sacha would more than likely be tall and slim.
b)
• Paula would weigh 43 kg.
• Paula would be 141 cm tall.
• Paula would more than likely be short and chubby.
Exercise 1.3
1. In Exercise 1.2, questions 1 and 2 showed data from the mature height of trees, and the estimated price of crops in Saskatchewan.
a) Create a box and whisker plot for each of these sets of data.
b) Determine P10, P40, P75, and P85 for each set of data.
2. Four boys and their fathers attended the public health clinic. The following information was gathered and put into a table. Complete the percentiles for each child.
Name       Age   Weight (kg)   Percentile   Height (cm)   Percentile
Mark       2     15                         88
Kristian   7     23                         62
Joey       12    35                         160
Blair      16    73                         173
1.4 Data Analysis
Tourism Saskatchewan uses data from Statistics Canada and Customs Canada to monitor
and analyze the number of vehicles that enter Saskatchewan from the United States.
These vehicles carry tourists that will be staying in Saskatchewan for a limited amount of
time.
This section presents data obtained from this study and shows you how to
analyze the data using all of the different methods that have been described in this
lesson.
The following table shows data from two different ports of entry from the United States
into Saskatchewan. One is Northgate and the other is Oungre. The number of vehicles
entering Saskatchewan is broken down by month.
Month        Northgate   Oungre
January          154        270
February         182        286
March            389        294
April            247        356
May              447        615
June             893      1 037
July           1 864      1 092
August         1 454        860
September        454        740
October          393        446
November         250        325
December         280        324
Determine the mean number of vehicles entering each port monthly.
Northgate: $\bar{x} = \frac{7007}{12} \approx 584$
Oungre: $\bar{x} = \frac{6645}{12} \approx 554$
The mean number of vehicles entering Northgate each month is 584 and entering Oungre
each month is 554.
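The totals and means above can be checked with a short Python sketch. The list names `northgate` and `oungre` and the helper `mean` are chosen here only for illustration; the values are the monthly counts from the table, January to December.

```python
# Monthly vehicle counts from the table, January to December.
northgate = [154, 182, 389, 247, 447, 893, 1864, 1454, 454, 393, 250, 280]
oungre = [270, 286, 294, 356, 615, 1037, 1092, 860, 740, 446, 325, 324]


def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


for name, data in (("Northgate", northgate), ("Oungre", oungre)):
    # The mean is shown to the nearest whole vehicle, as in the lesson.
    print(f"{name}: total = {sum(data)}, mean = {mean(data):.0f}")
```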
Determine the modes.
There is no mode value for either of the sets of data.
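A quick way to confirm that neither set has a mode is to count how often each value occurs. The sketch below is one possible approach; the function name `modes` is hypothetical, and it treats a set in which every value occurs exactly once as having no mode.

```python
from collections import Counter

northgate = [154, 182, 389, 247, 447, 893, 1864, 1454, 454, 393, 250, 280]
oungre = [270, 286, 294, 356, 615, 1037, 1092, 860, 740, 446, 325, 324]


def modes(values):
    """Return the most frequent value(s), or [] if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print("Northgate modes:", modes(northgate))
print("Oungre modes:", modes(oungre))
```

Both lists come back empty because no monthly count is repeated at either port.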
Determine the medians by creating a stem and leaf plot.
Northgate:
Stem | Leaf
   1 | 54, 82
   2 | 47, 50, 80
   3 | 89, 93
   4 | 47, 54
   8 | 93
  14 | 54
  18 | 64
Write the data in an array.
154, 182, 247, 250, 280, 389, 393, 447, 454, 893, 1 454, 1 864
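The stem and leaf plot for Northgate can also be built programmatically. In this sketch, which assumes whole-number data, the stem is the number of hundreds and the leaf is the remaining two digits. The function name `stem_and_leaf` is made up for the example, and the leaves are sorted as they are entered.

```python
from collections import defaultdict

northgate = [154, 182, 389, 247, 447, 893, 1864, 1454, 454, 393, 250, 280]


def stem_and_leaf(values):
    """Group whole numbers by stem (hundreds) with two-digit leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 100)  # e.g. 1454 -> stem 14, leaf 54
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(northgate)
for stem, leaves in plot.items():
    print(f"{stem:>2} | " + ", ".join(f"{leaf:02d}" for leaf in leaves))

# Reading the plot from top to bottom gives the data as an ordered array.
array = sorted(northgate)
print(array)
```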
Determine the median and 1st and 3rd quartiles.
154, 182, 247, 250, 280, 389, 393, 447, 454, 893, 1454, 1864
• The 2nd quartile (median) is 391 vehicles per month.
• The 3rd quartile is 674 vehicles per month.
• The 1st quartile is 249 vehicles per month.
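The quartiles above are the medians of the lower and upper halves of the array. A minimal Python sketch of that approach follows. The function names are hypothetical, and leaving the middle value out of both halves when the number of values is odd is an assumption of this sketch, not something stated in the lesson.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (1st quartile, median, 3rd quartile) by the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # values below the median
    upper = ordered[-half:]  # values above the median
    return median(lower), median(ordered), median(upper)


northgate = [154, 182, 389, 247, 447, 893, 1864, 1454, 454, 393, 250, 280]
q1, q2, q3 = quartiles(northgate)
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
```

For Northgate the sketch gives 248.5, 391 and 673.5, which the lesson reports to the nearest whole vehicle as 249, 391 and 674.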
Oungre:
Stem | Leaf
   2 | 70, 86, 94
   3 | 56, 25, 24
   4 | 46
   6 | 15
   7 | 40
   8 | 60
  10 | 37, 92
Write the data in an array.
270, 286, 294, 324, 325, 356, 446, 615, 740, 860, 1037, 1092
Determine the median and 1st and 3rd quartiles.
270, 286, 294, 324, 325, 356, 446, 615, 740, 860, 1037, 1092
• The 2nd quartile (median) is 401 vehicles per month.
• The 3rd quartile is 800 vehicles per month.
• The 1st quartile is 309 vehicles per month.
Represent each set of data on a box and whisker plot.
• Keep the same scale for both box and whisker plots so that the data can be
  compared.
Northgate: [box and whisker plot]
Oungre: [box and whisker plot]
Use your graphing calculator to show these two box and whisker plots.
Enter the data for Northgate in L1.
Enter the data for Oungre in L2.
Use the appendix at the end of this lesson to guide you through the steps.
The graphing calculator will also show you the value of the median, and upper and
lower quartiles by using the TRACE function key.
Analyze the data.
• Both sets of data have similar central tendencies.
• The mean and median for both ports are very similar.
• The box and whisker plots give a better indication of differences.
• The data for the port of Northgate is spread out further, as shown by the length of
  the upper whisker. Northgate has a couple of months that are really busy; in the
  other months the flow of traffic is fairly steady.
• The data for the port of Oungre is not spread out as far. The values are not
  symmetrical: the lower end of the graph is bunched up, which means those
  values are similar. The upper 50% of the data are spread out evenly.
Can you think of any other conclusions that can be made from this data?
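One way to put numbers on the observations about spread is to compute the lengths of the box and of each whisker for both ports. The sketch below is only illustrative. The names `five_number_summary` and `spread` are made up, and the quartiles are taken as medians of the lower and upper halves of the data, as in this section.

```python
from statistics import median

northgate = [154, 182, 389, 247, 447, 893, 1864, 1454, 454, 393, 250, 280]
oungre = [270, 286, 294, 356, 615, 1037, 1092, 860, 740, 446, 325, 324]


def five_number_summary(values):
    """Minimum, 1st quartile, median, 3rd quartile and maximum."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return (ordered[0], median(ordered[:half]), median(ordered),
            median(ordered[-half:]), ordered[-1])


def spread(values):
    """Lengths of the parts of a box and whisker plot."""
    low, q1, med, q3, high = five_number_summary(values)
    return {
        "range": high - low,
        "box length": q3 - q1,
        "lower whisker": q1 - low,
        "upper whisker": high - q3,
    }


for name, data in (("Northgate", northgate), ("Oungre", oungre)):
    print(name, spread(data))
```

With these assumptions the upper whisker for Northgate (1190.5) is about four times as long as the one for Oungre (292), which matches the comment about Northgate's busy summer months.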
Exercise 1.4
1. Two other ports of entry used by United States tourists entering Saskatchewan have
   been chosen. Analyze the data in the same manner as in this section.

   Month        Coronach   Estevan
   January          83        346
   February         85        367
   March           214        604
   April           248        536
   May             353        619
   June            330        720
   July            391        878
   August          372        717
   September       332        621
   October         143        613
   November        144        605
   December        131        661
Appendix
Using the TI-83 Plus Graphing Calculator for Box and Whisker Plots
1. Clear off any data that has been previously entered into the calculator:
   • Press Y= to clear any functions that have been entered.
   • Press STAT, ENTER and three columns will appear. To clear these columns,
     use the arrows to move to the top of the column, press CLEAR, then use the
     down arrow to go back to the column. It should now be empty. Do this for
     all the columns that have data in them.
2. To enter your data into the calculator:
   • Press STAT, ENTER. (This takes you back to the three columns, which
     should now be empty.)
   • To enter your data, move the cursor to the first line in L1.
   • Type in the first number, then press ENTER. The cursor will then move
     down to line two. Repeat this until you have all your data entered. If you
     have more than one set of data, move the cursor to the first line in L2
     and repeat the process.
3. To calculate the Mean:
   • Press 2nd, MODE (QUIT) to get back to the original screen.
   • Press 2nd, STAT, use the arrow to move over to MATH, then use the down
     arrow to get to mean (3) and press ENTER. You will see mean( and a flashing
     cursor. Press 2nd, 1, ), ENTER. The mean will appear on the right. This is
     the mean for the data in column 1 (L1).
4. To calculate the Median:
   • Repeat the steps for finding the mean, except move the arrow down to
     median.
5. To construct a Box and Whisker Plot:
   • You need to specify the values for the x and y axes. To do this press
     WINDOW. The cursor will be waiting on Xmin. Type your value, then
     ENTER. It will then move to Xmax. Type your value, then ENTER. It will
     then move to Xscl. This is the amount you want your x values to go up by.
     Type your value, then ENTER. Your max/min values correspond to your
     data. The y-values are not necessary for a box and whisker plot.
   • Press Stat Plot (2nd, Y=).
   • Press ENTER, ENTER (turns the stat plot on). Use the arrows to move
     down to Type, then across to the fifth picture, which is a box and whisker plot.
     Press ENTER, then move down to Xlist and across to the list that
     corresponds to your graph. (You can graph three at a time.) Do this for each
     list that you have entered.
   • To draw the box and whisker plot, press GRAPH.
   • Using the TRACE button you can determine the values of the median,
     1st quartile and 3rd quartile. Simply move the cursor left and right with the
     arrow keys.
Answers to Exercises
Exercise 1.1
Answers may vary. Here are some suggestions.
1. The simulation can be done with a method similar to the one
   in Example 2 except that three coins may be tossed at once. A
   “yes” is recorded when the coins land with two heads and one
   tail up.
2. Select groups of seven digits at random from the table of random
   numbers, as was done in Example 3. Let an even digit represent
   a win and an odd digit a loss. If the first game played is
   represented by the digit on the left, any series of 4 wins or 4
   losses before the last game is recorded as a NO.
   $\text{Probability of going seven games} = \frac{\text{number of yes's}}{\text{number of seven digit groups}}$
3. Select a row at random from the random numbers table and,
   going from left to right, select the first two digits that are either
   1, 2, 3 or 4 and bypass the other digits.
   The first digit represents the correct answer of Question 1 and
   the second digit represents a guess at the answer of Question 1.
   If the two digits are the same, the guess is correct. The third
   digit represents the correct answer for Question 2 and the fourth
   digit represents the guess for Question 2, etc.
   One trial is completed when 20 guesses are made. Do about 10
   or 20 trials.
   $\text{Probability that at least half the choices are correct} = \frac{\text{number of trials in which at least half are correct}}{\text{number of trials}}$
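The suggestion for question 2 can be tried on a computer instead of with the table of random numbers. The following is a rough sketch only: Python's `random` module stands in for the table, the function names and the seed are arbitrary choices, and an even digit is a win and an odd digit a loss, as in the answer above.

```python
import random


def goes_seven_games(digits):
    """digits: seven random digits, read left to right; even = win, odd = loss."""
    first_six = digits[:6]
    wins = sum(1 for d in first_six if d % 2 == 0)
    losses = len(first_six) - wins
    # The series ends early (a NO) if either team has 4 wins before game 7.
    return wins < 4 and losses < 4


def estimate_probability(trials, rng):
    """Number of yes's divided by the number of seven digit groups."""
    yes = 0
    for _ in range(trials):
        group = [rng.randrange(10) for _ in range(7)]
        if goes_seven_games(group):
            yes += 1
    return yes / trials


rng = random.Random(1)  # arbitrary seed so that a run can be repeated
estimate = estimate_probability(10_000, rng)
print(f"Estimated probability of going seven games: {estimate:.3f}")
```

Because five of the ten digits are even, this simulation treats the teams as evenly matched. In that case the exact probability of a 3-3 tie after six games is 20/64, about 0.31, so an estimate from many trials should land close to that value.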
Exercise 1.2
1. Mean 13, Median 14, Mode 20
   The mean or the median is a good measure of central tendency.
2. Mean 185, Median 165, Mode 90
   The mean is a good measure of central tendency.
3. Team A lost the series but their average score over the seven
   games is much higher than that of team B.
Exercise 1.3
1. Height of Trees for Shelterbelts:
   a. Create an array.
      4, 4, 4, 5, 7, 10, 14, 14, 15, 20, 20, 20, 20, 25
      1st quartile = 5, 2nd quartile (median) = 14, 3rd quartile = 20
   b. P10: 10% of 14 is 1.4.
      One value lies below the 10th percentile. Therefore
      P10 = 4.
      P40 = 10
      P75 = 20
      P85 = 20
   Estimated Price of Crops in Saskatchewan for 1995:
   a. Create an array.
      85, 85, 90, 90, 90, 140, 150, 165, 180, 240, 255, 265, 275, 305, 360
      median = 165
   b. P10: 10% of 15 = 1.5
      One value lies below the 10th percentile. Therefore
      P10 = 85.
      P40 = 150
      P75 = 265
      P85 = 275
2.
   Name       Age   Weight (kg)   Percentile   Height (cm)   Percentile
   Mark         2       15           90th           88          75th
   Kristian     7       23           50th           62           5th
   Joey        12       35           25th          160          92nd
   Blair       16       73           82nd          173          50th
Exercise 1.4
1. Coronach:
   Mean: 235.5
   Mode: no mode
   Median: 231
   1st quartile: 137
   3rd quartile: 342.5
2. Estevan:
   Mean: 607.25
   Mode: no mode
   Median: 616
   1st quartile: 570
   3rd quartile: 689
• The means show that on average, the traffic through
  Estevan is much greater than through Coronach.
• For both ports, the box and whisker plots are fairly
  symmetrical. This shows that either the mean or the
  median is a good indicator of central tendency.
• The data for Coronach is closer together, showing a steady
  stream of traffic through this port.
• The middle 50% of data for Estevan is close together, but
  the whiskers extend out farther, showing months where
  the traffic is extremely heavy or, at the lower end, not
  heavy at all.
• You may also have more conclusions from this data.
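The numerical answers for Exercise 1.4 can be checked with Python's standard `statistics` module. This is a sketch under the same assumption used in Section 1.4, namely that the quartiles are the medians of the lower and upper halves of the ordered data. The helper name `summary` is made up for the example.

```python
from statistics import mean, median

# Monthly vehicle counts from the Exercise 1.4 table, January to December.
coronach = [83, 85, 214, 248, 353, 330, 391, 372, 332, 143, 144, 131]
estevan = [346, 367, 604, 536, 619, 720, 878, 717, 621, 613, 605, 661]


def summary(values):
    """Mean, median and quartiles (median-of-halves) of a data set."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "q1": median(ordered[:half]),
        "q3": median(ordered[-half:]),
    }


for name, data in (("Coronach", coronach), ("Estevan", estevan)):
    print(name, summary(data))
```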
Mathematics A30
Module 1
Assignment 1
Staple here to the upper left corner of your assignment.

Before submitting your assignment, please complete the following procedures:
1. Write your name and address and the course name and assignment number
   in the upper right corner of the first page of each assignment.
2. Number all the pages and place them in order.
3. Complete the required information details on this address sheet.
4. Staple this address sheet to the appropriately numbered assignment.
   Use one address sheet for each assignment.
5. Staple the appropriately numbered Assignment Submission Sheet to the
   upper left corner, on top of this address sheet.

Print your name and address, with postal code. This address
sheet will be used when mailing back your corrected assignment.
Student Number:
Name:
Course Number: 8404
Assignment Number: 01
Course Title: Mathematics A30
Street Address or P.O. Box:
City/Town, Province:
Postal Code:
Country:
Distance-Learning Teacher’s Name:
Mark Assigned:
Assignment 1
Values Question
Part A can be answered in the space provided. You also have the option to do the
remaining questions in this assignment on separate lined paper. If you choose this option,
please complete all of the questions on the separate paper.
A.
(8) 1. Hand in Activity 1.12.
(8) 2. The production of flaxseed in Saskatchewan for the years 1980-1995, in thousands
       of tonnes, was as follows. (The data has been organized in an array.)
       122, 123, 128, 145, 146, 160, 191, 206, 212, 230, 237, 260, 263, 306, 312, 407
       a. Label the 1st and 3rd quartiles of the data and the median of the data.
       b. Underline all the data in the first quarter (the data below the first quartile)
          and double underline all the data in the fourth quarter (the data above the 3rd
          quartile).
(8) 3. The population of sheep and lambs in Saskatchewan for the years
       1982-1995, in thousands, was as follows:

       Year   '82  '83  '84  '85  '86  '87  '88  '89  '90  '91  '92  '93  '94  '95
       Pop.    74   66   62   57   53   59   64   72   83   92   85   82   79   83

       a. Use a stem-and-leaf plot to order the data.
       b. Draw a box and whisker plot for the data.
(8) 4. a. Describe in detail a simulation using one die to select one person from a
          group of 2 persons. In your simulation does each person have an equal
          chance of being selected?
       b. Describe in detail a simulation using a table of random numbers to select
          three persons from a group of 12 persons.
(8) 5. Katrina got 75% on a mathematics exam. The marks for her class, including
       her own, were:
       75, 75, 53, 64, 70, 79, 92
       55, 65, 65, 44, 60, 89
       71, 69, 68, 61, 60, 60
       Calculate her percentile rank in the class.
(8) 6. For the given data determine P45, P50, and P90.
       21, 13, 35, 19, 35, 10, 17, 56, 35, 20, 50, 22, 13, 9,
       39, 22, 0, 2, 24, 38, 6, 5, 44, 22, 24, 25, 3, 13,
       6, 12, 13, 33, 59, 10, 7, 63, 9, 25, 1, 15
(8) 7. Determine the mean, median, and mode for the data in Question 6.
(8) 8. Select a population of 100 digits from the table of random numbers in Section 1.1
       and determine the frequency with which each digit occurs. State the probability of
       each digit occurring if a digit is selected at random from the table.
       Be sure to organize your data in an appropriate table.
Answer Part B and Part C in the space provided. Evaluation of your solution to each
problem will be based on the following.
• A correct mathematical method for solving the problem is shown.
• The final answer is accurate and a check of the answer is shown where asked for by
  the question.
• The solution is written in a style that is clear, logical, well organized, uses proper
  terms, and states a conclusion.
(30) B.
1.
The grade twelve students at Grassy Side High wrote an exam on probability and
after an additional review of the material wrote a similar exam one week later. The
scores for both exams are listed.
EXAM #1 (March 20)
67, 0, 37, 63, 59, 95, 15, 61,
59, 20, 29, 32, 41, 50, 45, 35,
100, 71, 65, 43, 70, 70, 41, 43,
71, 29, 57, 46, 89, 38, 66, 56,
91, 46, 30, 50, 96, 65, 40, 50,
64, 62, 5, 60, 97, 85, 5, 91,
90, 89, 26, 55, 54, 95, 45, 50,
53, 85, 50, 47, 50, 61, 55, 50
EXAM #2 (March 27)
95, 64, 85, 84, 35, 34, 51, 97,
40, 46, 55, 96, 52, 50, 86, 81,
64
a. Show a stem and leaf plot for the data of each exam.
b. Write the data for each exam in an array.
c. Represent each set of data in a box-and-whisker plot.
d. How many students are above or equal to the 90th percentile in each of the
   exams?
e. Determine P60 for each exam.
f. If a student made 70% in the first exam, in what percentile is this mark? If
   this student hoped to remain in at least the same percentile in the next exam,
   what score would have to be made on the next exam?
(6) C. 1. (STUDENT JOURNAL)
Each lesson assignment will include a question called STUDENT JOURNAL.
For each assignment you will be asked to present written material on some aspect
of the course.
For this assignment:
a. Write one or two paragraphs introducing yourself including a reason for
   taking this course and what you would like to learn from this course.
b. Summarize the content of this lesson into a single page. Do this in a way
   which may be used to review at a later date.

Total: _____ / 100