Unit 2: Data Analysis
Normal Distributions
Objectives:
B.1 To describe and illustrate normal and skewed distributions using real world examples.
B.2 To calculate the standard deviation of a set of data using the formula for a population:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
B.3 To use the standard deviation to interpret data represented by a normal distribution.
Measures of Central Tendency
Statistics is the branch of mathematics concerned with manipulating groups of numerical
facts so as to present significant information about the subject or source of the data.
As members of the information age we are subjected to statistics on a daily basis. Graphs of
stock prices, probabilities of developing cancer, even your average in math class are all examples
of statistical analysis.
Lists of numbers, such as test scores, are often represented on a graph such as a histogram or
a frequency distribution.
For example, the heights in centimeters of the 12 players on a basketball team are given in the
table below:

Height (cm)   Frequency
175           1
180           2
181           2
184           3
185           2
188           1
191           1

[Figures: a histogram and a frequency distribution graph, both titled "High School Basketball Team", with Height (cm) on the horizontal axis and Frequency on the vertical axis.]
Each of the diagrams above gives the same information. The vertical axis on both graphs lists
the number of times that each measurement appears in the table.
Three measures of central tendency may be used;
1. The Mean is the average of all of the numbers. Here it is 183.2 cm.
2. The Median is the middle number in the list. Here it is 184 cm.
3. The Mode is the number that appears most often in the list. Here it is 184 cm.
It is also useful to compare the spread of a set of data:
The Range is the difference between the lowest number (here 175) and the highest
number (here 191). The range of the basketball players' heights is 191 - 175 = 16 cm.
The Deviation From the Mean is the difference between an individual data point and the
mean.
Graphing the distribution of data.
In a normal distribution of data, the shape of the frequency distribution graph represents a
bell shaped curve;
[Figure: bell shaped curve of Frequency against Domain, with the mean, median and mode marked at the center.]
Notice that the mean, median and mode all occur at the center of the bell curve.
In a Skewed distribution, the mean and mode are not at the center.
A distribution that is Skewed to the Right, or Positively Skewed, has a mean that is higher
in value than the median.
This distribution is characterized by extreme values to the right. For example, a graph of the
heights of university athletes would be skewed to the right by the heights of the
basketball players.
[Figure: positively skewed curve of Frequency against Domain, with the mode, median and mean marked.]
A distribution that is Skewed to the Left, or Negatively Skewed, has a mean that is lower
in value than the median.
This distribution is characterized by extreme values to the left. A graph of the marks in math
class might be skewed to the left if one or two students did not attend any classes.
[Figure: negatively skewed curve of Frequency against Domain, with the mode, median and mean marked.]
In our example of the high school basketball team, we can see that the data is slightly skewed
to the left because the mean(183.2 cm) is less than the median value(184). The data is skewed
this way because one student whose height is 175 cm is considerably shorter than the rest of the
students.
Finding Mean, Median and Mode.
The mean can be found by adding all of the data points and dividing by the number of data
points. This can be written in sigma (Σ) notation as:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{or} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $\bar{x}$ is the average or mean of all x, and $x_i$ is the ith data point of n points in total.
The expression $\sum_{i=1}^{n} x_i$ means $(x_1 + x_2 + x_3 + \dots + x_n)$, or the sum of all values of x.
The Median is found by listing the data and finding the middle term when arranged in order
of size. If there is an even number of data points, the average of the two middle values is
calculated.
Example: for the data $(x_1, x_2, x_3, x_4, x_5)$, $x_3$ would be the median.
For the data $(x_1, x_2, x_3, x_4, x_5, x_6)$, $\frac{x_3 + x_4}{2}$ would be the median.
The mode is easily found by selecting the value which appears most often in the data. If no
value appears more often than the others, then there is no mode.
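The three definitions above can be checked with a short Python sketch. This is only an illustration, not part of the original unit: the helper name `central_tendency` is hypothetical, and the data are the basketball heights from the table earlier in this unit. The "no mode" rule follows the convention stated above.

```python
from statistics import mean, median, multimode

# Heights (cm) of the 12 basketball players, expanded from the frequency table.
heights_cm = [175, 180, 180, 181, 181, 184, 184, 184, 185, 185, 188, 191]


def central_tendency(data):
    """Return (mean, median, modes) for a list of numbers.

    modes is an empty list when no value appears more often than the others.
    """
    modes = multimode(data)  # every value tied for the highest count
    distinct = set(data)
    if len(distinct) > 1 and len(modes) == len(distinct):
        modes = []  # all values equally common: no mode
    return mean(data), median(data), modes


avg, mid, modes = central_tendency(heights_cm)
print(f"mean = {avg:.1f} cm, median = {mid} cm, mode(s) = {modes}")
```

For an even number of data points, `median` averages the two middle values, which matches the rule given above.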
Example: At an independent testing agency, the noise levels were measured from the operator’s
seat of several different makes of self propelled swathers. The raw data is as follows, all
measurements are in decibels;
92, 88, 84, 84, 90, 90, 87, 89, 87, 91, 95, 90, 87, 89, 90, 81
a. List this data in the form of a Histogram.
b. Calculate the mean, median, and mode for this data.
c. Is this a normal distribution or is the data skewed?
Solution:
a. [Figure: histogram titled "Swather Noise Levels", with Noise (dB) from 80 to 98 on the horizontal axis and Frequency from 1 to 4 on the vertical axis.]
b. Mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{16}(92 + 88 + \dots + 90 + 81) = \frac{1414}{16} = 88.375$$
Median: (81, 84, 84, 87, 87, 87, 88, 89, 89, 90, 90, 90, 90, 91, 92, 95)
$$\frac{89 + 89}{2} = 89$$
Mode: 90
c. Since the mean is less than the median, this data is
skewed to the left or negatively skewed.
TI-82 Calculator:
To enter data into the lists press [STAT] 1 (choose Edit). Enter each data point into the list
and press [ENTER] to confirm. Data can also be entered using the curly brackets {} and [STO];
make sure the terms are separated by commas.
To clear a list from the edit screen, press the up arrow until the list name (L1, L2…) is
highlighted, then press [CLEAR] and [ENTER].
To calculate the mean or median for a list, return to the main screen, press [2nd]-[STAT] [►]
(math menu) and choose 3:mean( or 4:median( from the list. Enter the list name ([2nd]-1 to 6)
and press [ENTER].
To sort a list, press [2nd]-[STAT] and choose 1:SortA(. Enter the list name ([2nd]-1 to 6) and
press [ENTER].
Example: From our swather example press [STAT] 1 and then enter the data into L1. Press
[ENTER] after each value. Alternate method: Enter {92,88,84,…,95}[STO] L1. Curly
brackets are [2nd][(] and [2nd][)], and L1 is [2nd][1].
To find the mean press [2nd][STAT][►] to get to the List Math menu. Choose 3:mean( and
press [ENTER], then indicate which list by pressing [2nd][1] to get L1. The result should be
88.375.
To find the median repeat the last sequence, choosing 4:median( instead of 3:mean(. The result
should be 89.
To find the mode one can sort the list by pressing [2nd][STAT][1] to choose SortA( and then
indicate which list by pressing [2nd][1] to get L1. You can then examine the list by pressing
[STAT][1]. 90 is the number with the most entries.
Practice Questions 1: Measures of Central Tendency
1. In a Math 30B class, the marks were as follows;
64, 68, 73, 47, 58, 76, 73, 82, 66, 55, 62, 71, 59, 62, 79, 86, 73, 65, 96, 68, 75, 78, 61, 74
Find the range, mean, median and mode for this data. Is this a normal distribution or is it
skewed?
2. The fifth hole in the Pleasant Valley Mini Golf course is an exceptionally tricky one. On one
long weekend the attendant, a very bored mathematics student, collected the following scores
from the players on hole number 5.
Saturday: 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6,6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10,
10, 10, 11, 11, 11, 11
Sunday: 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8,
8, 8, 9, 9, 9, 9, 10
Monday: 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7.
a. Draw a frequency distribution graph for each of these days.
b. Find the range, mean, median and mode for each day.
c. Are these normal distributions or skewed distributions?
d. What inferences might be drawn from this data?
3. An RCMP officer clocking speeds in a radar trap records the following speeds in km/h;
55, 45, 40, 50, 52, 56, 58, 48, 50, 30, 35, 62, 70, 52, 53, 36, 55, 52, 60
a. Find the range, mean median and mode for this data.
b. Is this data in a normal distribution or skewed? Why?
4. The table below lists the percentage change in employment by industry in
Saskatchewan from 1990 to 1996. This data was gathered by the Statistics Canada
Labour Force Survey, 1997.

Change In Employment by Industry, Saskatchewan, 1990 to 1996
Industry                      Percent Change
Manufacturing                 +4.3%
Other Primary Industries      +4.1%
Health and Social Services    +3.9%
Trade                         +2.4%
Transport and Utilities       +1.9%
Accommodation & Food          +1.6%
Business Services             +1.6%
Education                     +1.4%
Logging and Forestry          +1.3%
Other Services                +0.8%
Finance, Insurance, Realty    -0.2%
Construction                  -2.4%

a. Find the range, mean, median and mode for this data.
b. Which measure of central tendency would you use to describe this data if you were,
i) Premier of the province? Why?
ii) Leader of the opposition? Why?
5. The percentage of water content for some common foods is given in the table below:

Percentage of Water Content For Some Common Foods
Food                        Percent Water
Cucumber                    95%
Tomato (raw)                94%
Celery                      94%
Milk (skim)                 91%
Orange                      89%
Milk (whole)                87%
Apple                       84%
Banana                      76%
Egg (raw)                   74%
Spaghetti (cooked)          64%
Cheese (cheddar)            37%
Bread (white)               35%
Bread (white toasted)       24%
Bacon (broiled, drained)    8%
Crackers (saltine)          4%
Lard                        0%

a. State the range, mean, median and mode for this data.
b. Which measure of central tendency would be best to use in a nutritional
information brochure? Why?
6. The mean age of 25 students in a class is 17.2. When the 31 year old teacher enters the room
what is the mean age of the people in the room?
7. A marathon runner traveling at a steady pace notices that the number of runners who pass her
is the same as the number she has passed. Is her pace the mean, median or mode of the speed
of the runners?
Variance and Standard Deviation
In a collection of numbers it may be useful to obtain a measure of the Deviation from the
mean score. The average deviation can be found by finding the sum of the deviations of the
values and dividing by the number of values. In summation (sigma) notation:
$$\text{Average Deviation} = \frac{\sum (x - \bar{x})}{n}$$
For our basketball team described earlier,
$$\text{Average Deviation} = \frac{(175 - 183.2) + (180 - 183.2) + (180 - 183.2) + \dots + (191 - 183.2)}{12}$$
$$= \frac{(-8.2) + (-3.2) + (-3.2) + (-2.2) + (-2.2) + 0.8 + 0.8 + 0.8 + 1.8 + 1.8 + 4.8 + 7.8}{12} = \frac{-0.4}{12} \approx -0.03$$
For most sets, this number will be very close to zero because the deviations above the
mean will cancel the deviations below the mean. For that reason, we find it more useful to
find the Variance, the mean of the squares of the deviations:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
$$= \frac{(-8.2)^2 + (-3.2)^2 + (-3.2)^2 + (-2.2)^2 + (-2.2)^2 + 0.8^2 + 0.8^2 + 0.8^2 + 1.8^2 + 1.8^2 + 4.8^2 + 7.8^2}{12} = \frac{189.68}{12} \approx 15.81$$
Notice that the effect of squaring each difference is to produce a positive result.
The Standard Deviation (S.D. or σ) is the positive square root of the variance:
$$S.D. = \sigma = \sqrt{\text{Variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{15.81} \approx 3.98$$
Example:Two Math 30B classes scored the following marks on an exam;
Class A: 63, 73, 77, 44, 76, 89, 56, 23, 81, 52, 67, 60, 84, 65, 73, 57, 66, 85, 75, 73, 72
Class B: 63, 78, 55, 76, 81, 66, 92, 83, 78, 77, 81, 73, 81, 72, 51, 54, 62, 62, 63, 77
a. Find the mean, median and mode for each class.
b. Find the range for each class.
c. Find the standard deviation from each class.
d. Using your knowledge of statistics, comment about the class performance
Solution: To set up the standard deviation it is best to use a table. The Variation column is
the squared deviation, $(x - \bar{x})^2$.

          Class A                            Class B
n         Mark    Deviation   Variation      Mark    Deviation   Variation
1         23      -44.19      1952.76        51      -20.25      410.06
2         44      -23.19      537.78         54      -17.25      297.56
3         52      -15.19      230.74         55      -16.25      264.06
4         56      -11.19      125.22         62      -9.25       85.56
5         57      -10.19      103.84         62      -9.25       85.56
6         60      -7.19       51.70          63      -8.25       68.06
7         63      -4.19       17.56          63      -8.25       68.06
8         65      -2.19       4.80           66      -5.25       27.56
9         66      -1.19       1.42           72      0.75        0.56
10        67      -0.19       0.04           73      1.75        3.06
11        72      4.81        23.14          76      4.75        22.56
12        73      5.81        33.76          77      5.75        33.06
13        73      5.81        33.76          77      5.75        33.06
14        73      5.81        33.76          78      6.75        45.56
15        75      7.81        61.00          78      6.75        45.56
16        76      8.81        77.62          81      9.75        95.06
17        77      9.81        96.24          81      9.75        95.06
18        81      13.81       190.72         81      9.75        95.06
19        84      16.81       282.58         83      11.75       138.06
20        85      17.81       317.20         92      20.75       430.56
21        89      21.81       475.68
Total     1411    0           4651.24        1425    0           2343.75
Average   67.19   0           221.49         71.25   0           117.19

Class A:
$$\text{Mean} = \frac{\sum x}{n} = \frac{1411}{21} \approx 67.19$$
Median = 72
Mode = 73
Range = 89 - 23 = 66
Class B:
$$\text{Mean} = \frac{\sum x}{n} = \frac{1425}{20} = 71.25$$
$$\text{Median} = \frac{73 + 76}{2} = 74.5$$
Mode = 81
Range = 92 - 51 = 41
a. Class A: $\bar{x}$ = 67.19, Median = 72, Mode = 73
Class B: $\bar{x}$ = 71.25, Median = 74.5, Mode = 81
b. Class A: Range = 66, Class B: Range = 41
c. Class A:
$$S.D. = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{4651.24}{21}} \approx 14.88$$
Class B:
$$S.D. = \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{2343.75}{20}} \approx 10.83$$
d. Class A shows a wider range of abilities than class B. Either the students in class A are
inconsistent in their work or attendance, or the teacher of class B is better at conveying the
material. In both classes the data is negatively skewed because a few students did poorly.
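The table method used in this example can be sketched in Python as a check on the arithmetic. This is a minimal illustration rather than part of the original unit; the function names `population_variance` and `population_sd` are hypothetical, and the data are the Class A and Class B marks given in the example.

```python
from math import sqrt

class_a = [63, 73, 77, 44, 76, 89, 56, 23, 81, 52, 67,
           60, 84, 65, 73, 57, 66, 85, 75, 73, 72]
class_b = [63, 78, 55, 76, 81, 66, 92, 83, 78, 77,
           81, 73, 81, 72, 51, 54, 62, 62, 63, 77]


def population_variance(data):
    """Mean of the squared deviations from the mean (divide by n)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def population_sd(data):
    """Positive square root of the population variance."""
    return sqrt(population_variance(data))


sd_a = population_sd(class_a)
sd_b = population_sd(class_b)
print(f"Class A: variance = {population_variance(class_a):.2f}, S.D. = {sd_a:.2f}")
print(f"Class B: variance = {population_variance(class_b):.2f}, S.D. = {sd_b:.2f}")
```

The results should agree with the hand calculation to two decimal places. Tiny differences are possible because the table rounds each deviation before squaring it.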
[Figures: frequency graphs titled "Class A Marks" and "Class B Marks", with Mark from 20 to 100 on the horizontal axis and Frequency on the vertical axis.]
TI-82 Calculator:
To find the variance or standard deviation on the calculator, enter the list of data as described
in the earlier TI-82 instructions. Press [STAT][►] to access the calc menu. Select 1:1-Var Stats.
Enter the name of the list that you wish to obtain stats for (L1 to L6) and press [ENTER]. The
calculator will then display:
x̄ = ............the mean for the list,
Σx = ...........the sum of all data points,
Σx² = ..........the sum of the squares of the data points,
Sx = ...........the standard deviation for a sample,
σx = ...........the standard deviation for a population (the one we want),
n = ............the number of data points,
minX = .........the minimum value,
Q1 = ...........the first quartile,
Med = ..........the median,
Q3 = ...........the third quartile, and
maxX = .........the maximum value.
You will have to use the up and down arrows to see all of the information.
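The difference between the sample and population standard deviations in that display can be illustrated with Python's standard `statistics` module. This is only a sketch for comparison, not a description of how the calculator works internally; the variable names `s_x` and `sigma_x` are chosen here to mirror the calculator's labels, and the data are the swather noise levels from the earlier example.

```python
from math import sqrt
from statistics import pstdev, stdev

# Swather noise levels in decibels.
noise_db = [92, 88, 84, 84, 90, 90, 87, 89, 87, 91, 95, 90, 87, 89, 90, 81]
n = len(noise_db)

sigma_x = pstdev(noise_db)  # population S.D.: divides by n (the one this unit uses)
s_x = stdev(noise_db)       # sample S.D.: divides by n - 1, so it is slightly larger

print(f"population S.D. = {sigma_x:.3f} dB, sample S.D. = {s_x:.3f} dB")
# The two values differ only by the factor sqrt(n / (n - 1)).
print(f"ratio = {s_x / sigma_x:.4f}, sqrt(n / (n - 1)) = {sqrt(n / (n - 1)):.4f}")
```

For large lists the two values are nearly equal, but for short lists such as this one the choice matters, which is why the text singles out the population value.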
Practice Questions 2: Standard Deviation
1. Boards from a rail shipment are selected at random and measured. The following measures
(in meters) are obtained:
4.01, 3.96, 4.05, 3.92, 3.95, 3.98, 4.08, 4.03, 4.03, 3.98
To pass inspection,the boards’ average length must be within 1 cm of 4.0 m, and the standard
deviation must not exceed 0.048 m. Does this shipment pass inspection?
2. The number of hours on different days that a machine is in operation is given below:
12 h, 18 h, 15 h, 6 h, 4 h, 17 h, 10 h, 13 h, 10 h, 7 h, 16 h, 11 h, 4 h
a. Calculate the mean.
b. Find the standard deviation
3. A commuter recorded the number of minutes spent waiting for a bus on each working day
for two weeks:2, 4, 13, 5, 8, 5, 7, 11, 7, 8
a. Calculate the mean.
b. Find the standard deviation
4. Because of water shortages during a drought in a Canadian city, watering of lawns and
gardens was permitted only from 06:00 to 09:00 and 19:00 to 22:00. The peak consumption
was recorded (as a percentage of total capacity) each watering period for a week in July:
88, 95, 65, 94, 67, 94, 75, 93, 77, 100, 85, 100, 85, 100
a. Calculate the mean.
b. Find the standard deviation
c. Water pressure problems occur when 95% of capacity is reached, on what percentage
of days did this happen?
5. Determine the standard deviation for the following sets of data.
a.
Value       3    4    5    6    7
Frequency   2    8    9    6    3
b.
Value       8    10   12   16   25
Frequency   2    3    5    4    1
Standard Deviation and Normal Distribution
When data with a normal distribution is plotted on a graph in a frequency distribution it
forms the familiar Bell Curve with the mean located at the highest point of the curve.
In the following graph of a normal distribution the graph is divided into standard deviations
on either side of the mean.
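[Figure: "Normal Distribution" bell curve of Relative Frequency, marked from -3 S.D. to +3 S.D. about the mean $\bar{x}$. On each side of the mean the successive bands hold 34.13%, 13.59% and 2.15% of the data, so that 1 S.D. = 68.26%, 2 S.D. = 95.44% and 3 S.D. = 99.74%.]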
This graph has the properties that:
• 68.26% of the data is within 1 standard deviation of the mean.
-The area between the mean and 1 S.D. will hold 1/2 of 68.26% or 34.13%
-The area between the mean and -1 S.D. will also hold 34.13% of the data.
• 95.44% of the data is within 2 standard deviations of the mean.
-The area between 1 S.D. and 2 S.D. will hold 13.59% of the data.
-The area between -1 S.D. and -2 S.D. will hold 13.59% of the data.
• 99.74% of the data is located within 3 standard deviations of the mean.
-The area between 2 S.D. and 3 S.D. will hold 2.15% of the data.
-The area between -2 S.D. and -3 S.D. will hold 2.15% of the data.
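The percentages in this list can be reproduced numerically. The sketch below is an illustration only; it assumes the standard result that the fraction of a normal distribution within k standard deviations of the mean is erf(k/√2), and the helper name `fraction_within` is hypothetical.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    total = 100 * fraction_within(k)
    # Band between (k - 1) S.D. and k S.D. on one side of the mean.
    band = 100 * (fraction_within(k) - fraction_within(k - 1)) / 2
    print(f"within {k} S.D.: {total:.2f}%   one-sided band: {band:.2f}%")
```

The computed totals differ from the figures quoted above by about one hundredth of a percent, because the text builds its values from a four-decimal table.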
Example: Mr. Krusties Cookies are randomly sampled to see how many chocolate chips each
cookie contains. According to the findings, the mean number of chips per cookie is 7.3 with
a standard deviation of 2.3. If we assume that the entire sample falls within a normal
distribution find:
a. The percentage of cookies with more than 7.3 chips.
b. The percentage of cookies with fewer than 5 chips.
c. The percentage of cookies with more than 2.7 chips and fewer than 11.9 chips.
Solution: a. Because the normal distribution is symmetrical about the mean, exactly
50% (or 1/2) of the cookies will have more than 7.3 chips.
b. Those cookies with fewer than 5 chips are more than 1 S.D. less than the mean
(7.3 - 2.3 = 5). Since 50% of the cookies are greater than the mean and 34.13% of the
cookies are between the mean and -1 S.D., then 100% - (50% + 34.13%) = 15.87% of the
cookies have fewer than 5 chips.
c. The cookies with more than 2.7 chips and fewer than 11.9 chips are
those within 2 S.D. of the mean (7.3 ± 4.6). That would be 95.44%.
Example: In a recent provincial exam, the marks for 30B Math had an average of 62% with a
standard deviation of 6%. If 2500 students wrote the exam:
a. How many students scored above 62%?
b. How many students failed the exam?
c. How many students scored higher than 80%?
Solution: To calculate a percentage of a number, convert the percentage to a decimal and multiply.
a. If 62% is the mean score then half or 50% of the students scored higher.
50% of 2500 = 0.50 × 2500 = 1250.
1250 students scored above 62%.
b. 50% is 2 S.D. below the mean score (62% - 2 × 6%), so the percentage of students below that would be:
100% - (13.59% + 34.13% + 50%) = 2.28%.
2.28% of 2500 = 0.0228 × 2500 = 57 students.
57 students failed the exam.
c. 80% is 3 S.D. above the mean score (62% + 3 × 6%), so the percentage of students above that would be:
100% - (2.15% + 13.59% + 34.13% + 50%) = 0.13%.
0.13% of 2500 = 0.0013 × 2500 = 3.25, or about 3 students.
About 3 students scored above 80%. NB. By symmetry, the number of students above 74% (2 S.D. above the mean) is 57, the same answer as for b.
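Parts a and b of this example can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch under the example's own assumption that the marks are normally distributed; the variable names are illustrative only.

```python
from statistics import NormalDist

# Exam marks: mean 62%, standard deviation 6%, 2500 students.
exam = NormalDist(mu=62, sigma=6)
students = 2500

# cdf(x) is the fraction of marks at or below x.
above_mean = (1 - exam.cdf(62)) * students
failed = exam.cdf(50) * students  # below 50%, i.e. more than 2 S.D. below the mean

print(f"scored above 62%: {above_mean:.0f} students")
print(f"scored below 50%: {failed:.0f} students")
```

Using the cumulative distribution directly avoids adding up the bands by hand, and it also works for marks that are not a whole number of standard deviations from the mean.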
Practice Questions 3. Standard Deviation and Normal Distributions.
1. The mean score on a math exam was 65 and the standard deviation was 10. If the data is
normally distributed:
a. What percentage of students scored between 55 and 75?
b. What percentage of students scored between 45 and 85?
c. if 60 students wrote the exam, how many scored between 55 and 75?
2. Consumer testing has shown that the life of a hair dryer under daily use averages 6.5 years.
The data is normally distributed with a standard deviation of 1.5 years. If a retail store sells
5000 of the hair dryers with a 2 year guarantee, how many will they have to replace?
3. The mean mass of game fish in lake Magalloway is determined to be 2.5 kg with a standard
deviation of 0.75 kg. There are about 1000 fish in the lake:
a. How many are between 1 kg and 4 kg?
b. If fish with a mass of less than 1.75 kg must be released, how many of the fish in the
lake must be thrown back?
4. An I.Q. test was given to all members of the armed forces. The results were normally
distributed with a mean of 110 and a standard deviation of 15.
a. What percentage of scores were above 125?
b. What percentage of scores were below 80?
c. If 75 000 personnel took the test, how many scored above 140?
d. How many scored between 110 and 125?
5. An egg producer has determined that the mass of eggs produced by his chickens averages
150 g with a standard deviation of 12.5 g. In his daily production of about 1440 eggs:
a. How many mass more than 175 g?
b. How many mass between 137.5 g and 162. 5 g?
6. The time required to register for university is found to average 33 minutes with a S.D. of 6
minutes. What percentage of registrations will last;
a. more than 45 minutes?
b. Less than 27 minutes?
c. between 21 and 39 minutes?
Z-Scores and Data Analysis
Objectives:
B.4 To define and calculate z-scores using the formula $z = \frac{x - \bar{x}}{\sigma}$ or $z = \frac{x - \bar{x}}{S.D.}$
B.5 To be able to use z-scores as an aid in interpreting data.
B.6 To solve real-world problems using statistical inference.
Notes:
In order to compare different scores within one set of data or to compare scores from
different sets of data with different means and standard deviations we can use z-scores.
For any value (x) in a set of data, the z-score, (z), can be determined by subtracting the mean
($\bar{x}$) from the value, (x), and dividing by the standard deviation (σ or S.D.):
$$z = \frac{x - \bar{x}}{\sigma} \quad \text{or} \quad z = \frac{x - \bar{x}}{S.D.}$$
The z-score gives the distance from the mean as a multiple of the standard deviation.
Example: 16 students entered a free throw competition. Each competitor attempted 20 shots.
Their scores were as follows:
6, 9, 7, 8, 10, 15, 7, 11, 9, 14, 12, 11, 13, 10, 11, 7
Determine the z-score for Ryan who scored 7 and for Janet who scored 15.
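Solution: Calculate the mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{6 + 9 + \dots + 7}{16} = \frac{160}{16} = 10$$
Find the standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{(6 - 10)^2 + (9 - 10)^2 + \dots + (7 - 10)^2}{16}} = \sqrt{\frac{106}{16}} \approx 2.57$$
Ryan's z-score:
$$z = \frac{x - \bar{x}}{\sigma} = \frac{7 - 10}{2.57} = \frac{-3}{2.57} \approx -1.17$$
Janet's z-score:
$$z = \frac{x - \bar{x}}{\sigma} = \frac{15 - 10}{2.57} = \frac{5}{2.57} \approx 1.95$$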
From the example above we can see that Ryan was 1.17 standard deviations below the
average and Janet was 1.95 standard deviations above average.
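The same z-scores can be computed in a few lines of Python. This is a minimal sketch, not part of the original unit; the helper name `z_score` is hypothetical, and it uses the population standard deviation, as the text does.

```python
from statistics import mean, pstdev

# Free throw scores for the 16 competitors.
scores = [6, 9, 7, 8, 10, 15, 7, 11, 9, 14, 12, 11, 13, 10, 11, 7]


def z_score(x, data):
    """Distance of x from the mean of data, in population standard deviations."""
    return (x - mean(data)) / pstdev(data)


ryan = z_score(7, scores)
janet = z_score(15, scores)
print(f"Ryan: z = {ryan:.2f}, Janet: z = {janet:.2f}")
```

Because the code keeps the unrounded standard deviation (about 2.574), Janet's z-score prints as 1.94, whereas dividing by the rounded value 2.57 gives the 1.95 quoted in the text.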
We can use z scores and the table for areas under the standard normal distribution curve to
calculate percentiles or probabilities:
Table 2.1 below lists the fraction of scores between the mean ($\bar{x}$) and the score
that is z standard deviations from the mean.
For example, for a person with a z-score of 1.95:
We look on the row that is labeled 1.9 in the column labeled 0.05 for the value 0.4744.
47.44% of the scores lie between the mean and this score, so the score is better than 50% + 47.44% = 97.44% of the scores.
Table 2.1: Area Under Normal Distribution Curve (between the mean $\bar{x}$ and z).

z-score  0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0      0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1      0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0754
0.2      0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3      0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4      0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5      0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6      0.2258  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2518  0.2549
0.7      0.2580  0.2612  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8      0.2881  0.2910  0.2939  0.2967  0.2996  0.3023  0.3051  0.3078  0.3106  0.3133
0.9      0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0      0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1      0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2      0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3      0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4      0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5      0.4332  0.4345  0.4356  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6      0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7      0.4554  0.4564  0.4574  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8      0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9      0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0      0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1      0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2      0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3      0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4      0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5      0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6      0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7      0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8      0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9      0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0      0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
3.1      0.4990  0.4991  0.4991  0.4991  0.4992  0.4992  0.4992  0.4992  0.4993  0.4993
3.2      0.4993  0.4993  0.4994  0.4994  0.4994  0.4994  0.4994  0.4995  0.4995  0.4995
3.3      0.4995  0.4995  0.4995  0.4996  0.4996  0.4996  0.4996  0.4996  0.4996  0.4997
3.4      0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4997  0.4998
3.5      0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998  0.4998
3.6      0.4998  0.4998  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.7      0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.8      0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999  0.4999
3.9      0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000  0.5000
Z-score questions come in two varieties:
In the first type we are asked to find out which score is better as compared to the mean in two
data sets.
Example: The annual daily mean temperature in Saskatoon is 3.5°C with a standard deviation of
6.75°C. The annual daily mean temperature in Regina is 3.1°C with a standard deviation of
10.6°C. One spring day, the temperature is 11°C in Saskatoon and 12°C in Regina. Which
city experienced a better than average day with respect to the mean temperature?
Solution: Calculate the z-score for each city:
$$\text{Regina: } z = \frac{x - \bar{x}}{\sigma} = \frac{12 - 3.1}{10.6} = \frac{8.9}{10.6} \approx 0.84$$
$$\text{Saskatoon: } z = \frac{x - \bar{x}}{\sigma} = \frac{11 - 3.5}{6.75} = \frac{7.5}{6.75} \approx 1.11$$
The temperature in Regina was only 0.84 standard deviations above the mean while
Saskatoon was 1.11 standard deviations above the mean. Saskatoon was having a better day.
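This kind of comparison is easy to sketch in Python when the mean and standard deviation are already known. The snippet is an illustration only; the function name `z_score` and the dictionary layout are assumptions made for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# (temperature on the day, annual mean, standard deviation), all in degrees Celsius
cities = {
    "Regina": (12, 3.1, 10.6),
    "Saskatoon": (11, 3.5, 6.75),
}

z_by_city = {name: z_score(*values) for name, values in cities.items()}
for name, z in z_by_city.items():
    print(f"{name}: z = {z:.2f}")

# The city with the larger z-score had the more unusual (warmer than average) day.
warmest = max(z_by_city, key=z_by_city.get)
print(f"Further above its own average: {warmest}")
```

Although Regina was one degree warmer, its temperatures vary more, so the same day counts as less exceptional there.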
In the second type we may be asked to use the table of the area under the normal distribution
curve to calculate the probability of a certain score. The curriculum guide is unclear about
whether or not this kind of question should be included.
Example 1: Determine the probability of the following events using z-scores.
a. Greater than z = -2.1
b. Between z = -1.3 and z = 1.3
Solution: a. From table 2.1, we see that the area between z = -2.1 and the mean is 0.4821. This
corresponds to a 48.21% probability between -2.1 S.D. and the mean.
[Figure: normal curve marked at z = -2.1 and at the mean $\bar{x}$.]
The probability is, therefore, 48.21% + 50% = 98.21%. Remember that the area above
the mean equals exactly half or 50% of the scores.
b. From table 2.1 we see that the area between z = 1.3 and the mean is 0.4032. The area
between z = -1.3 and the mean will also be 0.4032.
[Figure: normal curve marked at z = -1.3, at the mean $\bar{x}$, and at z = 1.3.]
The probability is 40.32% + 40.32% = 80.64%.
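Table lookups like these can be cross-checked with the standard normal distribution in Python's `statistics` module. This is a sketch for verification only; the helper name `area_mean_to_z` is hypothetical and simply reproduces what Table 2.1 tabulates.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def area_mean_to_z(z):
    """Area under the curve between the mean and z (the quantity listed in Table 2.1)."""
    return abs(standard_normal.cdf(z) - 0.5)


# a. Probability of a z-score greater than -2.1
p_greater = 1 - standard_normal.cdf(-2.1)

# b. Probability of a z-score between -1.3 and 1.3
p_between = standard_normal.cdf(1.3) - standard_normal.cdf(-1.3)

print(f"table entry for z = 1.95: {area_mean_to_z(1.95):.4f}")
print(f"P(z > -2.1) = {100 * p_greater:.2f}%")
print(f"P(-1.3 < z < 1.3) = {100 * p_between:.2f}%")
```

The cumulative distribution function gives the area to the left of z, so subtracting 0.5 recovers the "mean to z" area that the table uses.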
Example 2: The manufacturer of watch batteries uses random testing to determine the life of the
batteries. They discovered that they would last 2.3 years on average with a standard
deviation of 0.79. What is the probability that a consumer can get a battery that dies in less
than 1 year?
Solution: The z-score for a battery that lasts one year is:
$$z = \frac{x - \bar{x}}{\sigma} = \frac{1 - 2.3}{0.79} = \frac{-1.3}{0.79} \approx -1.65$$
From table 2.1 we see that the area between z = -1.65 and the mean is 0.4505. From the
diagram below we can see that the probability is 50% - 45.05% = 4.95%.
[Figure: normal curve marked at z = -1.65 and at the mean $\bar{x}$.]
Practice Questions 4: z-scores and Standard Distributions
1. On a recent Biology 30 final exam, the provincial average was 68.3% with a standard
deviation of 11.3%. Calculate the z-scores for the following students:
a. Mark who earned 81%
b. Freddie who earned 54%
c. Leslie who scored 71%
d. Annette who scored 92%
2. Calculate the percentile for Freddie and Annette from question 1. (Percentile is the
percentage of students who scored the same or worse.)
3. In the last season the LittleRock Rockers Hockey team scored the following number of goals
per game:
3, 7, 6, 2, 9, 5, 11, 3, 5, 6, 6, 9, 2, 1, 0, 6, 5, 7, 4, 5, 2, 3, 7, 6, 2, 1
a. Calculate the range, mean median and mode for this data.
b. Calculate the standard deviation for this data.
c. Find the z-score for the playoffs where they scored 6 goals, 2 goals and 1 goal.
d. What was the z-score for their best game?
4. For the first three units of Mathematics B30, Elaine’s marks were 81%, 69%, and 73%
respectively. The mean and standard deviation of the unit marks in her class were;
Unit I: $\bar{x}$ = 75%, σ = 8%
Unit II: $\bar{x}$ = 63%, σ = 7%
Unit III: $\bar{x}$ = 74%, σ = 8%
Compared to the rest of the students taking the course, in which unit did Elaine do the
best?
5. Find the probability of the following:
a. a z-score of more than 2.1
b. a z-score of more than -1.5
c. a z-score between -2.5 and +1.3
6. Consumer reports show that one brand of compact cars is able to travel an average of
120000 km before brake service is required. If the standard deviation is 15000 km, what is
the probability that a car will need brake servicing before 100000 km?
7. If standardized testing shows that Canadians have an average I.Q. of 100 with a standard
deviation of 15, how many Canadians (out of 30 000 000) would be likely to have an I. Q.
higher than 150?
8. In a recent study, a science student tested a new type of plant food on Geranium plants. He
collected the following data;
Group A growth; 8.1 cm, 7.3 cm, 2.5 cm, 9.3 cm, 7.1 cm, 7.5 cm, 8.0 cm, 7.6 cm, 6.9 cm,
7.5 cm.
Group B growth; 4.3 cm, 5.2 cm, 6.5 cm, 4.9 cm, 5.5 cm, 5.9 cm, 6.8 cm, 4.0 cm, 9.0 cm,
5.7 cm, 6.2 cm.
a. Calculate the mean, median, and range for each data set.
b. Are the data skewed?
c. Find the standard deviation for each data set.
d. Calculate the z-score for the greatest and least growth in each data set.
e. At the end of the study, the student found one plant with no label. If its growth was
6.0 cm, which data set is it most likely to belong to? (lowest z-score)
f. If Group A used the new plant food, what conclusions can the student make?
Unit 2 Solutions
Practice Questions 1: page 24
1. Range = 49, Mean = 69.63, Median = 69.5, Mode = 73. This is very close to normal but
skewed slightly right.
2. a. [Figures: frequency graphs for Saturday, Sunday and Monday, with Score on the horizontal axis and Frequency on the vertical axis.]
b. Sat: Range = 9, Mean = 7.35, Median = 8, no Mode
Sun: Range = 8, Mean = 8.3, Median = 6, Mode = 6
Mon: Range = 6, Mean = 4.2, Median = 4, Mode = 5
c. Sat: Skewed left. Sun: Skewed right. Mon: Not skewed (much).
d. Varies.
3. a. Range = 40, Mean = 50.47, Median = 52, Mode = 52
b. Skewed very slightly to the left because the mean is less than the median.
4. a. Range = 19.6%, Mean = 0.415%, Median = 1.6%, Mode = 1.6%
b. i) Premier prefers Median or Mode because it sounds more prosperous.
ii) Opposition prefers Mean because it sounds worse.
5. a. Range = 95%, Mean = 56.7%, Median = 74%, no Mode b. Varies
6. Mean is 17.73 years.
7. She is the median because she is the middle runner.
Practice Questions 2: Page 28
1. σ = 0.047, Yes the shipment passes.
2. a. Mean = 11 h b. σ = 4.57 h
3. a. Mean = 7 min b. σ = 3.10 min
4. a. Mean = 87% b. σ = 11.47% c. Reached 100% on 4/7 or 42% of days.
5. a. σ = 1.10 b. σ = 4.18
Practice Questions 3: Page 30
1. a. 68.26% b. 95.44% c. 41 students
2. 0.13% of 5000 = 6.5 dryers replaced.
3. a. 95.4% = 954 fish b. 15.87% = 159 fish
4. a. 15.87% b. 2.28% c. 2.28% of 75000 = 1,710 d. 34.13% of 75 000 = 25,598
5. a. 2.28% of 1440 = 33 eggs b. 68.26% of 1440 = 983
6. a. 2.28% b. 15.87% c. 81.85%
Practice Questions 4: Page 34
1. a. 1.5929 b. -1.2655 c. 0.2389 d. 2.0973
2. Freddie 10th, Annette 98th
3. a. Range = 11, Mean = 4.73, Median = 5, Mode = 6 b. σ = 2.70
c. z1 = 0.47, z2 = -1.01, z3 = -1.38 d. 2.32
4. Unit I - z = 0.75, Unit II - z = 0.86, Unit III - z = -0.13. She did best on Unit II.
5. a. 1.8% b. 93.32% c. 92.4%
6. z = -1.3, 6.98%
7. z = 3.33, 0.05% of 30 000 000 = 15,000 people
8. a. Group A: Mean = 7.25 cm, Median = 7.5 cm, Range = 6.8 cm
Group B: Mean = 5.81 cm, Median = 5.7 cm, Range = 5 cm
b. Group A skewed slightly left. Group B skewed very slightly right.
c. Group A: σ = 1.62, Group B: σ = 1.30
d. Group A: Most z = 1.27, Least z = -2.93 Group B: Most z = 2.45, Least z = -1.39
e. zA = -0.77, zB = 0.146 The plant probably belongs to group B.
f. The student can conclude that the new plant food is effective.
Unit 2: Review
1. Statistics is the branch of mathematics concerned with manipulating groups of numerical
facts so as to present significant information about the subject or source of the data.
2. Lists of data may be represented in tables, Histograms, or Frequency Distributions:
[Figure: the same data drawn as a Histogram and as a Frequency Distribution; vertical axis Frequency, horizontal axis Domain.]
In each case, the vertical axis represents the number of scores at each value in the domain.
3. Measures of central tendency include;
Mean - the mathematical average of the data.
Median - the middle value in the data when listed in order. If there are an even number of
data points, the median is the average of the two middle values.
Mode - The value that appears most often.
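As a quick check of these definitions, the sketch below computes the mean, median, mode, and range for the candy package masses listed in Review Question 2. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Masses (g) of the 10 candy packages from Review Question 2.
masses = [86, 91, 89, 88, 92, 90, 93, 90, 91, 90]

mean = statistics.mean(masses)        # sum of the values divided by how many there are
median = statistics.median(masses)    # middle value; averages the two middle values when n is even
modes = statistics.multimode(masses)  # list of the most frequent value(s)
data_range = max(masses) - min(masses)

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
print("range:", data_range)
```

`multimode` returns a list so that data with several equally common values, or with no repeated value at all, can be handled without raising an error.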
4. In a Normal Distribution of data, the shape of a frequency distribution graph is a
bell-shaped curve. The mean, median and mode are all located at the center or highest point of the
curve.
5. In a Skewed distribution, the mean and median have different values.
Skewed to the right or Positively Skewed data has the mean greater than the median.
Skewed to the left or Negatively Skewed data has the mean less than the median.
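The mean-versus-median rule in item 5 can be sketched as a small function. The function name `skew_direction` is hypothetical, and the rule only gives the direction of the skew, not how strong it is. The data are the Group B growth values from Practice Questions 4, question 8.

```python
import statistics

def skew_direction(data):
    """Classify skew by comparing the mean with the median (rule from item 5)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "right (positive)"
    if mean < median:
        return "left (negative)"
    return "none"

# Group B growth (cm) from Practice Questions 4, question 8.
group_b = [4.3, 5.2, 6.5, 4.9, 5.5, 5.9, 6.8, 4.0, 9.0, 5.7, 6.2]
print(skew_direction(group_b))
```

For Group B the mean (about 5.82 cm) is slightly above the median (5.7 cm), so the rule reports a right skew, in line with the solution key.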
6. Measures of deviation include;
Range - the difference between the maximum and minimum values.
Variance - the average of the squares of the deviations from the mean:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$$
Where $x$ = individual values, $\bar{x}$ = the mean, $n$ = the number of values.
Standard Deviation (S.D. or $\sigma$) - the square root of the variance:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Where $x$ = individual values, $\bar{x}$ = the mean, $n$ = the number of values.
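The population formulas for variance and standard deviation in item 6 translate almost line for line into code. The sketch below is one possible implementation, with hypothetical function names, applied to the Group B growth data from Practice Questions 4, question 8.

```python
import math

def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

# Group B growth (cm) from Practice Questions 4, question 8.
group_b = [4.3, 5.2, 6.5, 4.9, 5.5, 5.9, 6.8, 4.0, 9.0, 5.7, 6.2]

print("variance:", round(population_variance(group_b), 3))
print("standard deviation:", round(population_sd(group_b), 2))
```

The standard deviation comes out at about 1.30 cm, which agrees with the value given for Group B in the solutions.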
7. The area under the curve of a Normal Distribution represents the probability of a score
falling in that area of the data. It is distributed so that exactly 50% of the data falls above the
mean and 50% below the mean. If the domain is divided into standard deviations on either
side of the mean, the percentage of data falling in each section is as follows:
• 68.26% of the data is within 1 standard deviation of the mean.
-The area between the mean and 1 S.D. will hold 1/2 of 68.26% or 34.13%
-The area between the mean and -1 S.D. will also hold 34.13% of the data.
• 95.44% of the data is within 2 standard deviations of the mean.
-The area between 1 S.D. and 2 S.D. will hold 13.59% of the data.
-The area between -1 S.D. and -2 S.D. will hold 13.59% of the data.
• 99.74% of the data is located within 3 standard deviations of the mean.
-The area between 2 S.D. and 3 S.D. will hold 2.15% of the data.
-The area between -2 S.D. and -3 S.D. will hold 2.15% of the data.
This figure will be provided for every exam.
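[Figure: Normal Distribution. Vertical axis: Relative Frequency. Horizontal axis marked at -3 SD, -2 SD, -1 SD, the mean, +1 SD, +2 SD, +3 SD. Areas on each side of the mean: 34.13% between the mean and 1 SD, 13.59% between 1 SD and 2 SD, 2.15% between 2 SD and 3 SD. Totals: 1 SD = 68.26%, 2 SD = 95.44%, 3 SD = 99.74%.]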
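The percentages in the figure can be combined to answer "how many would you expect" questions. The sketch below uses the figure's band values to find the share of data lying more than k standard deviations above the mean, then compares the within-k totals against values computed from the error function `math.erf`. The function names are hypothetical, and the computed values differ from the figure's in the last digit because the figure is built from rounded table entries.

```python
import math

# Band percentages read from the figure, measured outward from the mean.
BANDS = [34.13, 13.59, 2.15]

def percent_beyond(k):
    """Percent of data more than k SD above the mean (k = 1, 2 or 3), using the figure."""
    return 50.0 - sum(BANDS[:k])

def percent_within_exact(k):
    """Percent of data within k SD of the mean, computed from the error function."""
    return 100.0 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, "SD: beyond =", round(percent_beyond(k), 2),
          "% within (computed) =", round(percent_within_exact(k), 2), "%")

# Review Question 3: an I.Q. above 130 is more than 2 SD above the mean of 100.
expected_students = 750 * percent_beyond(2) / 100
print("expected students above 130:", round(expected_students))
```

By symmetry, the same `percent_beyond` value also gives the share of data more than k standard deviations below the mean.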
8. Data from different sets can be compared using z-scores;
$$z = \frac{x - \bar{x}}{\sigma}$$
Where $x$ = the individual value, $\bar{x}$ = the mean, $\sigma$ = the standard deviation.
Z-scores represent the number of standard deviations that a value falls away from the mean.
A table of areas under the normal distribution curve (Table 2.1, Page 32) can be used to find the
probability, or the percentage as a decimal, of the scores that lie between a particular value and the
mean. If z-scores are to be used to find probabilities, a table will be provided.
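A short sketch of the z-score calculation follows. It works through the bank waiting time from Review Question 9a and the unlabelled plant from Practice Questions 4, question 8e, using the means and standard deviations given in the solution key. The function names are hypothetical, and `math.erf` stands in for the printed table of areas, which is an assumption made here for illustration rather than the method used in the course.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def area_mean_to_z(z):
    """Area under the normal curve between the mean and z (stand-in for the table)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

# Review Question 9a: a 2.5 min wait when the mean is 5 min and the S.D. is 1.25 min.
z_wait = z_score(2.5, 5, 1.25)
print("z for a 2.5 min wait:", z_wait)
print("area between that wait and the mean:", round(area_mean_to_z(z_wait), 4))

# Unlabelled plant with 6.0 cm growth, using the means and S.D.s from the solution key.
z_a = z_score(6.0, 7.25, 1.62)
z_b = z_score(6.0, 5.81, 1.30)
closer_group = "A" if abs(z_a) < abs(z_b) else "B"
print("zA =", round(z_a, 2), " zB =", round(z_b, 3), " closer to group", closer_group)
```

The group is chosen by the z-score with the smaller absolute value, since that is the group whose mean the plant sits closest to when distance is measured in standard deviations.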
N.B. There is a second method for calculating the standard deviation that is not used in this course. The Standard Deviation
for a Sample is;
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
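To see how much the two formulas differ, the sketch below runs both on the Group B growth data from Practice Questions 4, question 8. It assumes Python's standard `statistics` module, where `pstdev` divides by n (the population formula used in this course) and `stdev` divides by n - 1 (the sample formula).

```python
import statistics

# Group B growth (cm) from Practice Questions 4, question 8.
group_b = [4.3, 5.2, 6.5, 4.9, 5.5, 5.9, 6.8, 4.0, 9.0, 5.7, 6.2]

sigma = statistics.pstdev(group_b)  # population formula: divide by n
s = statistics.stdev(group_b)       # sample formula: divide by n - 1

print("population S.D.:", round(sigma, 2))
print("sample S.D.:", round(s, 2))
```

The sample value is always the larger of the two, and the gap shrinks as n grows. Many calculators offer both, so it is worth checking which key is being used.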
Unit 2 Review Questions
1. A Television manufacturer advertises that the mean life of picture tubes in new TV sets is
10,000 h with a standard deviation of 1000 h. A local hotel has purchased 200 of the TV sets.
If we assume a normal distribution;
a. What percentage of the picture tubes should last more than 11,000 h?
b. What percentage of the picture tubes should last less than 8,000 h?
c. How many TVs should need to be repaired or replaced in the first 7,000 h?
2. A machine used to package candies in 90 g packages is thought to be faulty. A sample of 10
packages is randomly selected and their actual masses in grams are;
86, 91, 89, 88, 92, 90, 93, 90, 91, 90
a. Find the range, mean, median and mode.
b. Calculate the standard deviation.
c. If the machine is supposed to be within a standard deviation of 1.3 g, does this one
need repairs?
3. Scores on an I.Q. test are normally distributed with a mean of 100 and a standard deviation
of 15. How many students in a group of 750 would you expect to have an I.Q. higher than
130?
4. The life of a household blender is normally distributed with a mean of 7 years and a standard
deviation of 1.5 years. What is the probability that a blender will last;
a. longer than 8.5 years
b. less than 5.5 years.
c. between 4 and 5.5 years?
5. The ages of stunt persons are normally distributed with a mean of 28 years and a standard
deviation of 1.5 years.
a. What is the probability that a stunt person is over 31 years old?
b. Nine fingered Mary is 39 years old. What is her z score?
c. What is the probability that Mary will still be a stunt person in 5 years?
6. The mean height of North American women is 162 cm with a standard deviation of 4 cm. In
a group of 500 randomly chosen women, how many would you expect to be over 166 cm
tall?
7. The mean reaction time taken to apply the brakes on a car is 0.75 s with a standard deviation
of 0.05 s.
a. Calculate Edna’s z-score if her reaction time is 0.65 s.
b. What percentage of the population is faster than Edna?
c. Find the probability that a person has a reaction time between 0.70 s and 0.85 s.
8. Calculate the Standard deviation for the following sets of data.
a. 20, 24, 27, 29, 25, 21, 22
b. 6, 8, 10, 12, 9, 7, 7
9. The mean waiting time at a bank is 5 min with a standard deviation of 1.25 min.
a. What is the z-score for a person who waits 2.5 min?
b. What is the probability that a customer will wait between 3.75 min and 6.25 min?
10. The total points scored in the Hyper Bowl final in each of the last five years were; 54, 50, 64,
35, and 47. Calculate the standard deviation.
11. A company claims that their aerosol cans each contain 175 g of spray disinfectant. A can
will not spray properly if it is more than 15 g over full, and the customer is cheated if
the can is not full. A random sample of cans from one shipment was tested and found to
have the following weights; 175, 175, 179, 175, 175, 159, 180, 175, 177, 180.
a. Find the Standard deviation for the sample.
b. If 95% of the shipment is supposed to be within 2 standard deviations of the mean,
should this one be accepted?
Unit 2 Review Solutions
1. a. 16% b. 2.5% c. 0.26 TV’s (none)
2. a. Mean = 90 g b. σ = 2 g c. Yes
3. 17 Students
4. a. 15.87% = 0.1587 b. 15.87% = 0.1587 c. 13.59% = 0.1359
5. 2.28% = 0.0228
6. 80
7. a. z = -2 b. 2.28% c. 81.85% = 0.8185
8. a. σ = 3.3 b. σ = 2.1
9. a. z = -2 b. 68.26% = 0.6826
10. σ = 10.56
11. a. σ = 6.0 g b. No.