Download Chapter 1

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Regression toward the mean wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
© 2012 McGraw-Hill
Ryerson Limited
© 2009 McGraw-Hill Ryerson Limited
1
Lind
Marchal
Wathen
Waite
© 2012 McGraw-Hill Ryerson Limited
2
Learning Objectives
LO 1 Calculate the arithmetic mean, weighted mean,
median, mode, and geometric mean.
LO 2 Explain the characteristics, uses, advantages, and
disadvantages of each measure of central location.
LO 3 Identify the position of the mean, the median, and the
mode for both symmetric and skewed distributions.
LO 4 Compute and explain the range, the mean deviation,
the variance, and the standard deviation.
LO 5 Explain the characteristics, uses, advantages, and
disadvantages of each measure of dispersion.
© 2012 McGraw-Hill Ryerson Limited
3
Learning Objectives
LO 6 Explain Chebyshev’s theorem and the Empirical Rule
as they relate to a set of observations.
LO 7 Compute and explain the coefficient of skewness and
the coefficient of variation.
LO 8 Compute and interpret quartiles, deciles, and
percentiles.
LO 9 Construct and interpret box plots.
© 2012 McGraw-Hill Ryerson Limited
4
Introduction
© 2012 McGraw-Hill Ryerson Limited
5
Numerical Descriptive Measures
Measures of Location
•
•
•
•
•
Arithmetic Mean
Weighted Mean
Median
Mode
Geometric Mean
Measures of Dispersion
• Range
• Mean Absolute
Deviation
• Variance
• Standard Deviation
© 2012 McGraw-Hill Ryerson Limited
6
LO
1
THE POPULATION MEAN
© 2012 McGraw-Hill Ryerson Limited
7
Population Mean
For ungrouped data, the population mean is the sum of all
the population values divided by the total number of
population values.
The formula for Population Mean is :
Sum of all the values in the population
Population Mean =
Number of values in the population
LO
1
© 2012 McGraw-Hill Ryerson Limited
8
Population Mean
The mean of a population using mathematical symbols is:
Where:
μ represents the population mean. It is the Greek lowercase
letter “mu.”
N is the number of items in the population.
X represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the
operation of adding.
ΣX is the sum of the X values.
LO
1
© 2012 McGraw-Hill Ryerson Limited
9
Example – Population Mean
There are 15 teams in the Eastern Conference of the NHL.
Listed below is the number of goals scored by each team in
the 2010–2011 season (www.nhl.com).
Team
Goals Scored
Team
Goals Scored
Philadelphia Flyers
256
Washington Capitals
219
Boston Bruins
244
Atlanta Thrashers
218
Tampa Bay Lightning
241
Toronto Maple Leafs
213
Buffalo Sabres
240
Montreal Canadiens
213
Carolina Hurricanes
231
Florida Panthers
191
Pittsburgh Penguins
228
Ottawa Senators
190
New York Islanders
225
New Jersey Devils
171
New York Rangers
224
What is the arithmetic mean number of goals scored?
LO
1
© 2012 McGraw-Hill Ryerson Limited
10
Solution – Population Mean
X


N
256  244  ...  190  171

15
3304

15
 220
LO
1
© 2012 McGraw-Hill Ryerson Limited
11
LO
1
THE SAMPLE MEAN
© 2012 McGraw-Hill Ryerson Limited
12
Sample Mean
For ungrouped data, the sample mean is the sum of all the
sample values divided by the total number of sampled
values.
The formula for Sample Mean is :
Sum of all the values in the sample
Sample Mean =
Number of values in the Sample
LO
1
© 2012 McGraw-Hill Ryerson Limited
13
Sample Mean
The mean of a sample using mathematical symbols is:
Where:
X is the sample mean. It is read “X bar.”
n is the number sample.
LO
1
© 2012 McGraw-Hill Ryerson Limited
14
Example – Sample Mean
Microcorp is doing research about the number of minutes
spent on social networking website by internet users. A
random sample of 14 internet users showed the following
number of minutes used last month.
85
93
105
87
120
97
120
79
114
107
115
88
109
113
What is the arithmetic mean number of minutes used?
LO
1
© 2012 McGraw-Hill Ryerson Limited
15
Solution – Sample Mean
X

X
n
85  93  105  ...  113

14
1432

14
 102.28
LO
1
© 2012 McGraw-Hill Ryerson Limited
16
The Mean In Excel
1. From the tool bar, select
the Paste Function, or
use Insert, Function.
2. From the Function
Category list, select
Statistical. In the
Function name list,
select AVERAGE. Click
OK. A dialog box
opens.
3. Enter the range A2:A14
in the Number1 box.
The answer appears in
the dialog box. Click OK.
LO
1
© 2012 McGraw-Hill Ryerson Limited
17
LO
2
THE PROPERTIES OF
THE ARITHMETIC MEAN
© 2012 McGraw-Hill Ryerson Limited
18
Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a
mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean
is zero.
LO
2
© 2012 McGraw-Hill Ryerson Limited
19
You Try It Out!
The annual growth of a
sample of manufacturing
industries at Northernhouse
are:
$70000, $72000, $65000,
$75000 and $68000.
a) Find the sample mean.
b) Is the mean you computed in
(a) a statistic or a parameter?
Why?
c) What is your best estimate of
the population mean?
LO
2
All 511 employees in
Micronet are considered the
population. Their working
hours in a week are 45, 40,
38, 37, 42, 41, 35, 44
a) Compute the mean working
hours in a week.
b) Is the mean you computed in
(a) a statistic or a parameter?
Why?
© 2012 McGraw-Hill Ryerson Limited
20
LO
1
THE WEIGHTED MEAN
© 2012 McGraw-Hill Ryerson Limited
21
Weighted Mean
The weighted mean of a set of numbers designated X1,
X2, ..., Xn, with corresponding weights w1, w2, ...,wn, is
computed from the following formula:
LO
1
© 2012 McGraw-Hill Ryerson Limited
22
Example – Weighted Mean
A car manufacturer recorded the speeds of newly designed
models. The grades follow. What is the mean grade of the
speed?
Grade on Speed
Number of cars
LO
1
5
4
6
5
7
13
© 2012 McGraw-Hill Ryerson Limited
8
11
9
8
10
7
23
Solution – Weighted Mean
4(5)  5(6)  13(7)  11(8)  8(9)  7(10)
4  5  13  11  8  7
371

48
 7.73
Xw 
LO
1
© 2012 McGraw-Hill Ryerson Limited
24
You Try It Out!
NewEra Bakery sold 100 cakes for the regular price of $200.
For Christmas the cake price was reduced to $150 and 130
were sold. On New Year’s, the price was reduced to $100
and 175 cakes were sold.
a) What was the weighted mean price of cake?
b) NewEra Bakery spent $150 a cake for the 400 cakes. Comment on
the Bakery’s profit per cake if a baker receives a $20 commission for
each one sold.
LO
1
© 2012 McGraw-Hill Ryerson Limited
25
LO
1
THE MEDIAN
© 2012 McGraw-Hill Ryerson Limited
26
The Median
The median is the midpoint of the values after they have
been ordered from the smallest to the largest.
1. There are as many values above the median as below it
in the data array.
2. For an even set of values, the median will be the
arithmetic average of the two middle numbers.
LO
1
© 2012 McGraw-Hill Ryerson Limited
27
Example – The Median
LO
The ages for a sample of
seven bank employees are:
The weights of six patients,
in kilograms, are:
31, 35, 29, 30, 32, 33,34
66, 63, 70, 65,64,62
Find the median age.
Find the median weight.
1
© 2012 McGraw-Hill Ryerson Limited
28
Solution – The Median
Arranging the data in
ascending order gives:
Arranging the data in
ascending order gives:
29,30,31,32,33,34,35
62,63,64,65,66,70
Thus the median is 32.
LO
1
Thus the median is 64.5.
© 2012 McGraw-Hill Ryerson Limited
29
LO
2
THE PROPERTIES OF
THE MEDIAN
© 2012 McGraw-Hill Ryerson Limited
30
Properties of the Median
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and
is therefore a valuable measure of central tendency
when such values occur.
3. It can be computed for ratio-level, interval-level, and
ordinal-level data.
LO
2
© 2012 McGraw-Hill Ryerson Limited
31
The Median In Excel
1. From the tool bar, select the Paste Function, or use
Insert, Function.
2. From the Function category list, select Statistical. In
the Function name list, select MEDIAN. Click OK. A
dialogue box opens.
3. Enter the range in the Number1 box. The answer
appears in the dialogue box. Click OK.
LO
2
© 2012 McGraw-Hill Ryerson Limited
32
LO
1
THE MODE
© 2012 McGraw-Hill Ryerson Limited
33
The Mode
The mode is the value of the observation that appears
most frequently.
CHART 3–1 Number of Respondents Favoring Various Bath Oils
LO
1
© 2012 McGraw-Hill Ryerson Limited
34
Example – The Mode
Average earnings of Canadians 15 years and older with
university degrees in selected cities are shown below.
(Statistics Canada, Census of Population, 2001) What is the modal
average earning?
LO
1
City
Salary ($)
City
Salary ($)
Calgary
58000
Regina
46000
Edmonton
45000
Saint John
46000
Halifax
43000
Greater Sudbury
50000
Hamilton
52000
Toronto
56000
London
48000
Winnipeg
43000
Montreal
47000
Vancouver
46000
Ottawa-Gatineau
55000
Victoria
42000
© 2012 McGraw-Hill Ryerson Limited
35
Solution – The Mode
A perusal of the earnings reveals that $46 000 appears
more often (three times) than any other amount. Therefore,
the mode is $46 000.
LO
1
© 2012 McGraw-Hill Ryerson Limited
36
You Try It Out!
The average weekly vehicle
insurance benefits for 2008,
by category, are: $433,
$325, $176, $469, $245,
and $287.
The number of work
stoppages in the
manufacturing industry for
selected months are 8, 5,
12, 7, 9, and 0.
a)
a)
b)
What is the median monthly
benefit?
How many observations are
below the median? Above it?
b)
c)
LO
1
What is the median number
of stoppages?
How many observations are
below the median? Above it?
What is the modal number of
work stoppages?
© 2012 McGraw-Hill Ryerson Limited
37
The Mode In Excel
1. Select the Paste Function.
2. In the Function name list, select MODE. Click OK. A
dialogue box opens.
3. Enter the range in the Number1 box. The answer
appears in the dialogue box. Click OK.
LO
2
© 2012 McGraw-Hill Ryerson Limited
38
Excel Solutions
1. Open Excel and the Excel file Table 02-1 from the
DataSets on the CD provided.
2. From the menu bar, select Tools, Data Analysis, and
Descriptive Statistics; then click OK. In Excel 2007,
select Data in place of Tools.
3. Enter A1 : A97 as the Input Range. For Grouped By,
select Columns to indicate that your data is in a
column; select Labels in First Row to indicate that you
have the label List Price in the first cell of the input
range. Place the output in the same worksheet by
entering C3 in the Output Range.
4. Select the Summary statistics box; click OK.
LO
2
© 2012 McGraw-Hill Ryerson Limited
39
Excel Solutions
LO
2
© 2012 McGraw-Hill Ryerson Limited
40
LO
3
THE RELATIVE POSITIONS OF
THE MEAN, MEDIAN, AND
MODE
© 2012 McGraw-Hill Ryerson Limited
41
The Relative Positions of the Mean, Median
and the Mode
CHART 3-2 A Symmetric Distribution
CHART 3-3 A Positively Skewed Distribution
CHART 3-4 Negatively Skewed Distribution
LO
3
© 2012 McGraw-Hill Ryerson Limited
42
You Try It Out!
The monthly profits of Hi-Mark corporation were
organized into a frequency distribution. The mean of
monthly profits was computed to be $105 600, the
median $104 700, and the mode $104 200.
a)
b)
LO
3
Sketch the profit in the form of a smoothed frequency polygon.
Note the location of the mean, median, and mode on the
X-axis.
Is the distribution symmetrical, positively skewed, or negatively
skewed? Explain.
© 2012 McGraw-Hill Ryerson Limited
43
LO
1
THE GEOMETRIC MEAN
© 2012 McGraw-Hill Ryerson Limited
44
The Geometric Mean
Useful in finding the average change of percentages,
ratios, indexes, or growth rates over time
Has a wide application in business and economics because
we are often interested in finding the percentage changes
in sales, salaries, or economic figures, such as the GDP,
which compound or build on each other
LO
1
© 2012 McGraw-Hill Ryerson Limited
45
The Geometric Mean
Will always be less than or equal to the arithmetic mean
Defined as the nth root of the product of n values
The formula for the geometric mean is written:
LO
1
© 2012 McGraw-Hill Ryerson Limited
46
Example – The Geometric Mean
The profit on investment earned by Super Constructions
Company for five successive years was: 20 percent, 30
percent, 30 percent, 50 percent and 300 percent. What is
the geometric mean rate of profit on investment?
LO
1
© 2012 McGraw-Hill Ryerson Limited
47
Solution– The Geometric Mean
GM  5 (1.2)(1.3)(0.7)(1.5)(4.0)
 5 6.552
 1.456
LO
1
© 2012 McGraw-Hill Ryerson Limited
48
Another Application
Another use of the geometric mean is to determine the
percent increase in sales, production or other business or
economic series from one time period to another.
LO
1
© 2012 McGraw-Hill Ryerson Limited
49
Example – Another Application
The population of Alberta grew from 2974807 in January
2001 to 3290350 in January 2007. What was the average
annual rate of percentage increase during the period?
LO
1
© 2012 McGraw-Hill Ryerson Limited
50
Solution – Another Application
3 290 350
GM 
1
2 974 807
 1.0169  1
 0.0169
6
LO
1
© 2012 McGraw-Hill Ryerson Limited
51
You Try It Out!
The percent increase in
profits, for the last five years
at Cure Medico are: 5.61,
7.85, 11.22, 19.77 and
23.87.
a)
b)
c)
LO
1
Find the geometric percent
increase.
Find the arithmetic percent
increase.
Is the arithmetic mean equal
to or greater than the
geometric mean?
Sale of Shine Bulbs
increased from 65 000 units
in 2006 to 132 745 units in
2010.
a) Find the geometric mean
annual percent increase.
© 2012 McGraw-Hill Ryerson Limited
52
Why Study Dispersion?
A measure of location, such as the mean or the median,
only describes the centre of the data, but it does not tell us
anything about the spread of the data.
For example, if your nature guide told you that the river
ahead averaged 1 m in depth, would you want to wade
across on foot without additional information? Probably not.
You would want to know something about the variation in
the depth.
A second reason for studying the dispersion in a set of data
is to compare the spread in two or more distributions.
© 2012 McGraw-Hill Ryerson Limited
53
Samples of Dispersions
CHART 3-5 Histogram of Years of Employment at Hammond Iron Works Inc.
CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plans
© 2012 McGraw-Hill Ryerson Limited
54
LO
4
MEASURES OF DISPERSION
© 2012 McGraw-Hill Ryerson Limited
55
Range
The range is the difference between the largest and the
smallest value in a data set.
LO
4
© 2012 McGraw-Hill Ryerson Limited
56
LO
5
ADVANTAGES and
DISADVANTAGES of RANGE
© 2012 McGraw-Hill Ryerson Limited
57
Advantages and Disadvantages of Range
1) Only two values are used in its calculation
2) It is influenced by extreme values.
3) It is easy to compute and understand.
LO
5
© 2012 McGraw-Hill Ryerson Limited
58
Example – Range
Below are the hourly production rates of Vision Quest
computers at the Kanata and Waterloo plants. Find the
range for each of the plants.
CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plants
LO
5
© 2012 McGraw-Hill Ryerson Limited
59
Solution – Range
Kanata: 52 – 48
=4
Waterloo: 60 – 40
= 20
LO
5
© 2012 McGraw-Hill Ryerson Limited
60
Mean Deviation
The mean deviation is the arithmetic mean of the absolute
values of the deviations from the arithmetic mean.
LO
4
© 2012 McGraw-Hill Ryerson Limited
61
LO
5
ADVANTAGES AND
DISADVANTAGES OF MEAN
DEVIATION
© 2012 McGraw-Hill Ryerson Limited
62
Advantages and Disadvantages of Mean
Deviation
1) All values are used in the calculation.
2) It is easy to understand because it is the average of the
deviations from the mean.
3) Major drawback is the use of absolute values.
LO
5
© 2012 McGraw-Hill Ryerson Limited
63
Example – Mean Deviation
The number of chocolate drinks sold at the Choco-Coffee
shop in Ottawa between 5 and 8 p.m. for a sample of 6
days last year were 30, 50, 60, 70, 80 and 100. Determine
the mean deviation.
X  X
Number of Chocolate
Drinks Sold Daily
30
(30 – 65) = –35
Absolute
Deviation
35
50
(50 – 65) = –15
15
60
(60 – 65) = –5
5
70
(70 – 65) = 5
5
80
(80 – 65) = 15
15
100
(100 – 65) = 35
35
Total
LO
5
110
© 2012 McGraw-Hill Ryerson Limited
64
Solution – Mean Deviation
XX

MeanDeviation 
n
110
6
 18.33

LO
5
© 2012 McGraw-Hill Ryerson Limited
65
You Try It Out!
The heights of a group of newly arrived plants in a nursery
are given below (in centimetres)
10
a)
b)
c)
LO
5
15
14
16
25
13
17
23
What is the range of the heights?
Compute the arithmetic mean height.
Compute the mean deviation of the heights.
© 2012 McGraw-Hill Ryerson Limited
66
LO
4
VARIANCE AND STANDARD
DEVIATION
© 2012 McGraw-Hill Ryerson Limited
67
Population Variance
The Population Variance is the arithmetic mean of the
squared deviations from the population mean.
Where:
σ2 is the symbol for the population variance. It is usually
referred to as “sigma squared.”
μ is the arithmetic mean of the population.
X is the value of each observation in the population.
N is the number of observations in the population.
LO
4
© 2012 McGraw-Hill Ryerson Limited
68
LO
5
ADVANTAGES AND
DISADVANTAGES OF
POPULATION VARIANCE
© 2012 McGraw-Hill Ryerson Limited
69
Advantages and Disadvantages of Population
Variance
1. Advantage: All values are used in the calculation
2. Disadvantage: The units are awkward, the square of the
original units
LO
5
© 2012 McGraw-Hill Ryerson Limited
70
Steps in computing Population Variance
1. Begin by finding the mean.
2. Next, find the difference between each observation and
the mean.
3. Square the difference.
4. Sum all of the squared differences.
5. Divide the sum by the total number of observations in
the population.
LO
5
© 2012 McGraw-Hill Ryerson Limited
71
Example – Population Variance
The ages of all the patients in the isolation ward of
Mountainview Hospital are 35, 24, 11, 42, and 23 years.
Determine the population variance.
Age (X)
 X  
X  
35
24
11
42
23
135
8
-3
-16
15
-4
0*
64
9
256
225
16
570
2
*Sum of the deviations from mean must equal 0.
LO
5
© 2012 McGraw-Hill Ryerson Limited
72
Example – Population Variance
X


N
135

6
 27
2 
 X  
2
N
570

6
 95
LO
5
© 2012 McGraw-Hill Ryerson Limited
73
Population Standard Deviation
The population standard deviation σ is the square root of
the population variance.
LO
5
© 2012 McGraw-Hill Ryerson Limited
74
You Try It Out!
An office of Fine Tune Telecommunications hired seven
trainee engineers this year. Their monthly starting salaries
were: $5356; $5651; $5423; $5534; $5467; $5289 and
$5670.
a)
b)
c)
d)
LO
5
Compute the population mean.
Compute the population variance.
Compute the population standard deviation.
Another location hired eight trainee engineers. Their mean monthly
salary was $5650, and the standard deviation was $480. Compare
the two groups.
© 2012 McGraw-Hill Ryerson Limited
75
Sample Variance
The population standard deviation σ is the square root of
the population variance.
It tends to underestimate the population variance σ 2. The
use of (n – 1) in the denominator provides the appropriate
correction for this tendency.
LO
5
© 2012 McGraw-Hill Ryerson Limited
76
Sample Variance
Where:
s2 is the sample variance.
X is the vale of each observation in the sample.
X is the mean of the sample.
n is the number of observations in the sample.
LO
5
© 2012 McGraw-Hill Ryerson Limited
77
Example – Sample Variance
The sample of prices of different cuts of beef available at a
restaurant is given below.
$13, $21, $15, $19, and $22.
What is the sample variance?
LO
5
© 2012 McGraw-Hill Ryerson Limited
78
Solution – Sample Variance
Deviation Method
Note the sample mean is: 90 = 18
5
Price ($)
X
13
21
15
19
22
90
LO
5
$
X  X
–5
3
–3
1
4
0
$2
X  X
25
9
9
1
16
60
2
s
2
X  X



2
n 1
60
5 1
 15$2

© 2012 McGraw-Hill Ryerson Limited
79
Solution – Sample Variance
Continued
Direct Method
Price ($)
LO
5
2
$
2
X
13
X
169
21
441
15
225
19
361
22
484
90
1680
s2 

 X2 
 X 
n 1
n
 90 
1680 
5 1
2
2
5
60
5 1
 15$2

© 2012 McGraw-Hill Ryerson Limited
80
LO
5
SAMPLE STANDARD DEVIATION
© 2012 McGraw-Hill Ryerson Limited
81
Sample Standard Deviation
The sample standard deviation is the square root of the
sample variance.
LO
5
© 2012 McGraw-Hill Ryerson Limited
82
Example – Sample Standard Deviation
The sample variance in the previous example involving
hourly quantity for cuts of beef is $152. What is the sample
standard deviation?
LO
5
© 2012 McGraw-Hill Ryerson Limited
83
Solution – Sample Standard Deviation
The sample standard deviation is $3.87, found by 15 . Note
again that the sample variance is in terms of dollars
squared, but taking the square root of 15 gives us $3.87,
which is in the same units (dollars) as the original data.
LO
5
© 2012 McGraw-Hill Ryerson Limited
84
You Try It Out!
A sample of seven employees who will stop working after a few
months is given below:
3, 6, 2, 7, 1, 4 and 3.
a)
b)
LO
5
What is the sample variance?
Compute the sample standard deviation.
© 2012 McGraw-Hill Ryerson Limited
85
LO
6
INTERPRETATION AND USES OF
THE STANDARD DEVIATION
© 2012 McGraw-Hill Ryerson Limited
86
Chebyshev’s Theorem
For any set of observations (sample or population), the
proportion of the values that lie within k standard deviations
of the mean is at least 1-1/k2, where k is any constant
greater than 1.
Allows us to determine the minimum proportion of the
values that lie within a specified number of standard
deviations of the mean.
Concerned with any set of values; that is the distribution of
values can have any shape.
LO
6
© 2012 McGraw-Hill Ryerson Limited
87
Example – Chebyshev’s Theorem
The average number of students present today in each
class is 89.9, and the standard deviation is 11.31. At least
what percent of students lie within plus and minus 3.5
standard deviations?
LO
6
© 2012 McGraw-Hill Ryerson Limited
88
Solution – Chebyshev’s Theorem
1
1
1

1

2
k2
 3.5 
1
12.25
 0.92
 1
LO
6
© 2012 McGraw-Hill Ryerson Limited
89
LO
6
THE EMPIRICAL RULE
© 2012 McGraw-Hill Ryerson Limited
90
The Empirical Rule
For a symmetrical, bell-shaped frequency distribution:
1) Approximately 68 percent
of the observations will lie
within ±1 standard
deviation of the mean.
2) About 95 percent of the
observations will lie within
±2 standard deviations of
the mean.
3) Practically all (99.7 percent) CHART 3–7 A Symmetrical Bell-shaped Curve
Showing the relationship between the standard
Deviation and the Observations
will lie within ±3 standard
deviations of the mean.
LO
6
© 2012 McGraw-Hill Ryerson Limited
91
Example – The Empirical Rule
A sample of the rental rates at the Street Park apartments
approximates a symmetrical, bell-shaped distribution. The
sample mean is $600; the standard deviation is $24. Using
the Empirical Rule, answer these questions:
1. About 68 percent of the rental rates are between what two
amounts?
2. About 95 percent of the rental rates are between what two
amounts?
3. Almost all of the rental rates are between what two amounts?
LO
6
© 2012 McGraw-Hill Ryerson Limited
92
Solution – The Empirical Rule
1. About 68 percent are between
$576 ($600 - $24) and $624 ($600 + $24).
2. About 95 percent are between
$552 ($600 - 2($24)) and $648 ($600 + 2($24)).
3. Almost all (99.7 percent) are between
$528($600 - 3($24)) and $672 ($600 + 3($24)).
LO
6
© 2012 McGraw-Hill Ryerson Limited
93
You Try It Out!
The Quality Metal Company is one of several domestic
manufacturers of PVC pipe. The quality control department
sampled 700 20m lengths. At a point 2 m from the end of
the pipe they measured the outside diameter. The mean
was 2.4 m and the standard deviation 0.2 m.
a) If the shape of the distribution is not known, at least what percent of
the observations will lie between 2.05 m and 2.35 m?
b) If we assume that the distribution of diameters is symmetrical and
bell-shaped, about 68 percent of the observations will be between
what two values?
c) If we assume that the distribution of diameters is symmetrical and
bell-shaped, about 95 percent of the observations will be between
what two values?
LO
6
© 2012 McGraw-Hill Ryerson Limited
94
LO
7
RELATIVE DISPERSION
© 2012 McGraw-Hill Ryerson Limited
95
Relative Dispersion
In order to make a meaningful comparison of different
measures, we need to convert each of these measures to a
relative value – that is, a percent.
The coefficient of variation is the ratio of the standard
deviation to the arithmetic mean, expressed as a
percentage.
LO
7
© 2012 McGraw-Hill Ryerson Limited
96
Example – Relative Dispersion
A study of the amount of money spent on the maintenance
of a car and the years of service of the car resulted in these
statistics: The mean amount spent was $150; the standard
deviation was $30; the mean number of years of service
was 15 years; the standard deviation was 1.5 years.
Compare the relative dispersion in the two distributions
using the coefficient of variation.
LO
7
© 2012 McGraw-Hill Ryerson Limited
97
Solution – Relative Dispersion
Money Spent
Years of Service
s
CV  (100)
X
$30
CV 
(100)
$150
CV  20%

s
CV  (100)
X
$1.5
CV 
(100)
$15
CV  10%
There is more dispersion relative to the mean in the
distribution of money spent compared with the distribution
of years of service (because 20 percent 10 percent).

LO
7
© 2012 McGraw-Hill Ryerson Limited
98
You Try It Out!
A large group of management trainees was given two types
of aptitude tests, a quantitative aptitude test and a
management aptitude test. The arithmetic mean score on
the quantitative aptitude test was 400, with a standard
deviation of 20. The mean was 50 and the standard
deviation for the management aptitude test was 10.
Compare the relative dispersion in the two groups.
LO
7
© 2012 McGraw-Hill Ryerson Limited
99
LO
7
SKEWNESS
© 2012 McGraw-Hill Ryerson Limited
100
Skewness
In a symmetric set of observations the mean and median
are equal and the data values are evenly spread around
these values. The data values below the mean and median
are a mirror image of those above.
A set of values is skewed to the right or positively
skewed if there is a single peak and the values extend
much further to the right of the peak than to the left of the
peak. In this case the mean is larger than the median.
LO
7
© 2012 McGraw-Hill Ryerson Limited
101
Skewness
In a negatively skewed distribution there is a single peak
but the observations extend further to the left, in the
negative direction, than to the right. In a negatively skewed
distribution the mean is smaller than the median.
A bimodal distribution will have two or more peaks.
LO
7
© 2012 McGraw-Hill Ryerson Limited
102
Shapes of Frequency Polygons
CHART 3 – 8 Shapes of Frequency Polygons
LO
7
© 2012 McGraw-Hill Ryerson Limited
103
Pearson’s Coefficient of Skewness
The major characteristics of the coefficient of skewness
are:
1) It can range from –3.00 up to 3.00.
2) A value of 0 indicates a symmetric distribution.
It is computed as follows:
LO
7
© 2012 McGraw-Hill Ryerson Limited
104
Example – Skewness
Following are the earnings per share, in dollars, for a
sample of 16 software companies for the year 2008.
$0.08 0.12
0.44
0.52
4.55
7.93
8.62 11.15 14.88 17.43 13.13
7.36
1.10
1.19
2.49
1.18
The mean is $5.76. The standard deviation is $5.85. The
median is $3.52. Find the coefficient of skewness using
Pearson’s estimate.
LO
7
© 2012 McGraw-Hill Ryerson Limited
105
Solution – Skewness
3($5.76  $3.52)
sk 
$5.85
 1.143
LO
7
© 2012 McGraw-Hill Ryerson Limited
106
Skewness In Excel
LO
7
© 2012 McGraw-Hill Ryerson Limited
107
You Try It Out!
A sample of ten customer care technicians employed in the
customer service department of a large telecommunication
company received the following number of calls yesterday:
85, 78, 90, 97, 86, 72, 95, 89, 90, and 75.
a)
b)
c)
LO
7
Find the mean, median, and the standard deviation.
Compute the coefficient of skewness using Pearson’s method.
What is your conclusion regarding the skewness of the data?
© 2012 McGraw-Hill Ryerson Limited
108
LO
8
OTHER MEASURES OF
DISPERSION
© 2012 McGraw-Hill Ryerson Limited
109
Other Measures of Dispersion
Quartiles divide a set of observations into four equal parts.
Deciles divide a set of observations into 10 equal parts.
Percentiles divide a set of observations into 100 equal
parts.
LO
8
© 2012 McGraw-Hill Ryerson Limited
110
Example – Percentiles
Listed below are the commissions earned, in dollars, last
month by a sample of 17 brokers at a real estate office.
$2345 1657 1876 1235 2789 2354 1987 2309 1985
2001
1967 1976 2343 2654 2313 1768 2650
Locate the median, the first quartile, and the third quartile
for the commissions earned.
LO
8
© 2012 McGraw-Hill Ryerson Limited
111
Solution - Percentiles
The first step is to organize the data from the smallest
commission to the largest.
$1235 1657 1768 1876 1967 1976 1985 1987 2001
2309 2313 2343 2345 2354 2650 2654 2789
25
L25  (17  1)
 4.5  $1967
100
50
L50  (17  1)
 9  $2001
100
L75  (17  1)
LO
8
75
 13.5  $2345
100
© 2012 McGraw-Hill Ryerson Limited
112
You Try It Out!
The quality control department of a tomato sauce company
is responsible for checking the mass of the 250g jar of
tomato sauce. The masses of a sample of eight jars
produced last hour are:
251 249 246 248 250 250 243 242 245
a)
b)
LO
8
What is the median mass?
Determine the masses corresponding to the first and third
quartiles.
© 2012 McGraw-Hill Ryerson Limited
113
LO
9
BOX PLOT
© 2012 McGraw-Hill Ryerson Limited
114
Box Plots
A box plot is a graphical display, based on quartiles, that
helps to picture a set of data
Five pieces of data are needed to construct a box plot:
1.
2.
3.
4.
5.
LO
9
Minimum value
First quartile
Median
Third quartile
Maximum value
© 2012 McGraw-Hill Ryerson Limited
115
EXAMPLE – Box Plots
Stuart’s Pizza offers free delivery of its pizza within 15 km.
Stuart, the owner, wants some information on the time it
takes for delivery. How long does a typical delivery take?
Within what range of times will most deliveries be
completed? For a sample of 20 deliveries, he determined
the following information:
Minimum value
= 15 minutes
1st Quartile (Q1)
= 16 minutes
Median
= 18 minutes
3rd Quartile (Q3)
= 26 minutes
Maximum value
= 31 minutes
Develop a box plot for the delivery times.
LO
9
© 2012 McGraw-Hill Ryerson Limited
116
Solution – Box Plots
LO
9
© 2012 McGraw-Hill Ryerson Limited
117
You Try It Out!
The following box plot is given.
What are the median, the largest and smallest values, and
the first and third quartiles? Would you agree that the
distribution is symmetrical?
LO
9
© 2012 McGraw-Hill Ryerson Limited
118
THE MEAN, MEDIAN, AND
STANDARD DEVIATION
OF GROUPED DATA
© 2012 McGraw-Hill Ryerson Limited
119
The Mean of Grouped Data
To approximate the arithmetic mean of data organized into
a frequency distribution, we begin by assuming the
observations in each class are represented by the midpoint
of the class
Where:
X is the designation for each sample mean.
X is the midpoint of each class.
f is the frequency in each class.
fX is the frequency in each class times the midpoint of the
class.
 fX is the sum of these products.
n is the total number of frequencies.
© 2012 McGraw-Hill Ryerson Limited
120
Example – The Mean of Grouped Data
The computations for the
arithmetic mean of data
grouped into a frequency
distribution will be shown
based on the real estate data,
from the south-east area of
Edmonton. The frequency
distribution for the list prices is
given here. Determine the
arithmetic mean of the listed
prices.
List Price
($ thousands)
Frequency
$250 to under $375
33
375 to under 500
41
500 to under 625
11
625 to under 750
5
750 to under 875
4
875 to under 1000
1
1000 to under 1125
1
Total
96
© 2012 McGraw-Hill Ryerson Limited
121
Solution – The Mean of Grouped Data
List Prices, South-East Area of Edmonton
List Price
($ thousands)
Frequency
f
Midpoint ($)
X
fX ($)
$250 to under $375
33
$312.5
$10312.5
375 to under 500
41
437.5
17937.5
500 to under 625
11
562.5
6187.5
625 to under 750
5
687.5
3437.5
750 to under 875
4
812.5
3250.0
875 to under 1000
1
937.5
937.5
1000 to under 1125
1
1062.5
1062.5
Total
96
X
 fX
n

$43125.0
$43125
 $449.2(thousands)
96
© 2012 McGraw-Hill Ryerson Limited
122
The Median of Grouped Data
If the data is grouped some of the raw data values may not
be available, and so we cannot necessarily determine the
exact value of the median.
But we can estimate the median by first finding the position
of the median (which class it falls in), and then calculating
an estimate of the median within this median class.
Where:
L is the lower limit of the median class.
N is the size of the population.
f is the frequency of the median class.
fc is the cumulative frequencies up to but excluding the median class
i is the class width of the median class
© 2012 McGraw-Hill Ryerson Limited
123
Example – The Median of Grouped Data
Professor Law lists the following quiz marks for his class of
30 students. Determine the median quiz mark. Note the
marks have already been sorted from lowest to highest.
Grade on Quiz
0
1
2
3
4
5
6
7
8
9
10
Number of
Students
0
0
1
0
1
2
4
12
5
3
2
© 2012 McGraw-Hill Ryerson Limited
124
Solution – The Median of Grouped Data
Find the position of the median mark.
50
L50  (30  1)
 15.5  7
100
Quiz
Mark
Number of
Students
fc
0
0
0
1
0
0
2
1
1
3
0
1
4
1
2
5
2
4
6
4
8
7
12
20
8
5
25
9
3
28
10
2
30
© 2012 McGraw-Hill Ryerson Limited
125
The Standard Deviation of Grouped Data
If the data of interest are in grouped form (in a frequency
distribution), the sample standard deviation can be
approximated by substituting ∑fX2 for ∑X2 and ∑fX for ∑X.
Where:
s
X
f
n
is the symbol for the sample standard deviation.
is the midpoint of a class.
is the class frequency.
is the total number of sample observations.
© 2012 McGraw-Hill Ryerson Limited
126
Solution – The Standard Deviation of Grouped
Data
List Prices, South-East Area of Edmonton
fX ($)
fX2
List Price
($ thousands)
Frequency
f
Midpoint ($)
X
$250 to under $375
33
$312.5
$10312.5
3222656.25
375 to under 500
41
437.5
17937.5
7847656.25
500 to under 625
11
562.5
6187.5
3480468.75
625 to under 750
5
687.5
3437.5
2363281.25
750 to under 875
4
812.5
3250.0
2640625.00
875 to under 1000
1
937.5
937.5
878906.25
1000 to under 1125
1
1062.5
1062.5
1128906.24
Total
96
$43125.0
21562499.99
(43125)2
21562499.9996
s
 151.83
96  1
© 2012 McGraw-Hill Ryerson Limited
127
You Try It Out!
The net incomes of a sample
of large art dealers were
organized into the table given
to the right.
a)
b)
c)
d)
What is the table called?
Based on the distribution, what
is the estimate of the
arithmetic mean net income?
Based on the distribution, what
is the estimate of the median
net income?
Based on the distribution, what
is the estimate of the standard
deviation?
Net Income
($Millions)
Number
of
Importers
$4 to under $8
2
8 to under 12
3
12 to under 16
12
16 to under 20
7
20 to under 24
5
© 2012 McGraw-Hill Ryerson Limited
128
Chapter Summary
I. A measure of central location is a value used to describe
the centre of a set of data.
A. The arithmetic mean is the most widely reported
measure of central location.
1. It is calculated by adding the values of the
observations and dividing by the total number of
observations.
a. The formula for a population mean of
ungrouped or raw data is:
X

[3–1]
N
b. The formula for the mean of a sample is:
X
X=
[3–2]
n
© 2012 McGraw-Hill Ryerson Limited
129
Chapter Summary
c
The formula for the sample mean of data in a
frequency distribution is X = fX .
[3–16]
n
2. The major characteristics of the arithmetic mean
are:
a. At least the interval scale of measurement is
required.
b. All the data values are used in the calculation.
c. A set of data has only one mean. That is, it is
unique.
d. The sum of the deviations from the mean
equals 0.
© 2012 McGraw-Hill Ryerson Limited
130
Chapter Summary
B. The weighted mean is found by multiplying each
observation by it’s corresponding weight.
1. The formula for determining the weighted mean
is: X w  w1 X1  w2 X 2  w3 X 3   wn X n
[3–3]
w1  w2  w3 
 wn
2. It is a special case of the arithmetic mean.
C. The median is the value in the middle of a set of
ordered data.
1. To find the median, sort the observations from
smallest to largest and identify the middle value.
2. The major characteristics of the median are:
a. At least the ordinal scale of measurement is
required.
© 2012 McGraw-Hill Ryerson Limited
131
Chapter Summary
b. It is not influenced by extreme values.
c. Fifty percent of the observations are larger
than the median.
d. It is unique to a set of data.
3. The formula for the median of grouped data is:
N
 fc
[3–17]
2
Median = L 
(i )
f
D. The mode is the value that occurs most often in a
set of data.
1. The mode can be found for nominal-level data.
2. A set of data can have more than one mode.
© 2012 McGraw-Hill Ryerson Limited
132
Chapter Summary
E. The geometric mean is the nth root of the product of
n values.
1. The formula for the geometric mean is:
[3–4]
GM = n ( X1 )( X 2 )( X 3 ) ( X n )
2. The geometric mean is also used to find the rate
of change from one period to another.
[3–5]
Value at end of period
GM =
n
Value at beginning of period
1
3. The geometric mean is always equal to or less
than the arithmetic mean.
© 2012 McGraw-Hill Ryerson Limited
133
Chapter Summary
II. The dispersion is the variation or spread in a set of data.
A. The range is the difference between the largest and
the smallest value in a set of data.
1. The formula for the range is:
Range  Largest value  Smallest value
[3–6]
2. The major characteristics of the range are:
a. Only two values are used in its calculation.
b. It is influenced by extreme values.
c. It is easy to compute and to understand.
B. The mean absolute deviation is the sum of the
absolute values of the deviations from the mean
divided by the number of observations.
© 2012 McGraw-Hill Ryerson Limited
134
Chapter Summary
1. The formula for computing the mean absolute
deviation is: MD =   X  X |
[3–7]
n
2. The major characteristics of the mean absolute
deviation are:
a. It is not unduly influenced by large or small
values.
b. All observations are used in the calculation.
c. The absolute values are difficult to work with.
C. The variance is the mean of the squared deviations
from the arithmetic mean.
1. The formula for the population variance is:
 X   2
2
[3–8]
 
N
© 2012 McGraw-Hill Ryerson Limited
135
Chapter Summary
2. The formula for the sample variance is:
 X  X )2
s 
n 1
2
[3–10]
3. The major characteristics of the variance are:
a. All observations are used in the calculation.
b. It is not unduly influenced by extreme
observations.
c. The units are somewhat difficult to work with;
they are the original units squared.
D. The standard deviation is the square root of the
variance.
© 2012 McGraw-Hill Ryerson Limited
136
Chapter Summary
1. The major characteristics of the standard
deviation are:
a. It is in the same units as the original data.
b. It is the square root of the average squared
deviation from the mean.
c. It cannot be negative.
d. It is the most widely reported measure of
dispersion.
2. The formula for the sample standard deviation is:
[3–12]
X )2
2
X 
s=
n 1
n
© 2012 McGraw-Hill Ryerson Limited
137
Chapter Summary
3. The formula for the standard deviation of
fX )2
grouped data is:
[3–18]
fX 2 
s=
n1
n
III. We interpret the standard deviation using two
measures.
A. Chebyshev’s theorem states that regardless of the
shape of the distribution, at least 1 - 1/k2 of the
observations will be within k standard deviations of
the mean, where k is greater than 1.
B. The Empirical Rule states that for a bell-shaped
distribution about 68 percent of the values will be
within 1 standard deviation of the mean, about 95
percent within 2, and virtually all within 3.
© 2012 McGraw-Hill Ryerson Limited
138
Chapter Summary
IV. The coefficient of variation is a measure of relative
dispersion.
A. The formula for the coefficient of variation is:
s
CV = (100)
[3–13]
X
B. It reports the variation relative to the mean.
C. It is useful for comparing distributions with different
units.
V. The coefficient of skewness measures the symmetry of
a distribution.
A. In a positively skewed set of data the long tail is to
the right.
B. In a negatively skewed distribution the long tail is to
the left.
© 2012 McGraw-Hill Ryerson Limited
139
Chapter Summary
C. There are two formulas for the coefficient of
skewness.
1. The formula developed by pearson is:
sk =
3( X  Median)
s
[3–14]
2. The coefficient of skewness computed by
statistical software is:
  X  X 3 
n
 
sk =
 
( n  1)( n  2)   s  


VI. Measures of location also describe the spread in a set
of observations.
A. A quartile divides a set of observations into four
equal parts.
© 2012 McGraw-Hill Ryerson Limited
140
Chapter Summary
1. Twenty-five percent of the observations are less
than the first quartile, 50 percent are less than
the second quartile (the median), and 75 percent
are less than the third quartile.
2. The interquartile range is the difference between
the third and the first quartile.
B. Deciles divide a set of observations into ten equal
parts and percentiles into 100 equal parts.
© 2012 McGraw-Hill Ryerson Limited
141
Chapter Summary
C. A box plot is a graphic display of a set of data.
1. A box is drawn enclosing the regions between
the first and third quartiles.
a. A line is drawn inside the box at the median
value.
b. Dotted line segments are drawn from the third
quartile to the largest value to show the
highest 25 percent of the values and from the
first quartile to the smallest value to show the
lowest 25 percent of the values.
2. A box plot is based on five statistics: the
maximum and minimum values, the first and third
quartiles, and the median.
© 2012 McGraw-Hill Ryerson Limited
142