```© 2012 McGraw-Hill
Ryerson Limited
Lind, Marchal, Wathen, Waite
Learning Objectives
LO 1 Calculate the arithmetic mean, weighted mean,
median, mode, and geometric mean.
LO 2 Explain the characteristics, uses, advantages, and
disadvantages of each measure of central location.
LO 3 Identify the position of the mean, the median, and the
mode for both symmetric and skewed distributions.
LO 4 Compute and explain the range, the mean deviation,
the variance, and the standard deviation.
LO 5 Explain the characteristics, uses, advantages, and
disadvantages of each measure of dispersion.
LO 6 Explain Chebyshev’s theorem and the Empirical Rule
as they relate to a set of observations.
LO 7 Compute and explain the coefficient of skewness and
the coefficient of variation.
LO 8 Compute and interpret quartiles, deciles, and
percentiles.
LO 9 Construct and interpret box plots.
Introduction
Numerical Descriptive Measures
Measures of Location
• Arithmetic Mean
• Weighted Mean
• Median
• Mode
• Geometric Mean
Measures of Dispersion
• Range
• Mean Absolute Deviation
• Variance
• Standard Deviation
LO 1 THE POPULATION MEAN
Population Mean
For ungrouped data, the population mean is the sum of all
the population values divided by the total number of
population values.
The formula for the population mean is:
$$\text{Population mean} = \frac{\text{Sum of all the values in the population}}{\text{Number of values in the population}}$$
The mean of a population using mathematical symbols is:
$$\mu = \frac{\sum X}{N}$$
Where:
μ represents the population mean. It is the Greek lowercase letter “mu.”
N is the number of items in the population.
X represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
ΣX is the sum of the X values.
Example – Population Mean
There are 15 teams in the Eastern Conference of the NHL.
Listed below is the number of goals scored by each team in
the 2010–2011 season (www.nhl.com).
Team                   Goals Scored    Team                   Goals Scored
[name missing]         256             Washington Capitals    219
Boston Bruins          244             Atlanta Thrashers      218
Tampa Bay Lightning    241             Toronto Maple Leafs    213
Buffalo Sabres         240             [name missing]         213
Carolina Hurricanes    231             Florida Panthers       191
Pittsburgh Penguins    228             Ottawa Senators        190
New York Islanders     225             New Jersey Devils      171
New York Rangers       224
What is the arithmetic mean number of goals scored?
Solution – Population Mean
$$\mu = \frac{\sum X}{N} = \frac{256 + 244 + \dots + 190 + 171}{15} = \frac{3304}{15} \approx 220$$
LO 1 THE SAMPLE MEAN
Sample Mean
For ungrouped data, the sample mean is the sum of all the
sample values divided by the total number of sampled
values.
The formula for the sample mean is:
$$\text{Sample mean} = \frac{\text{Sum of all the values in the sample}}{\text{Number of values in the sample}}$$
The mean of a sample using mathematical symbols is:
$$\bar{X} = \frac{\sum X}{n}$$
Where:
X̄ is the sample mean. It is read “X bar.”
n is the number of values in the sample.
Example – Sample Mean
Microcorp is doing research about the number of minutes
spent on social networking websites by internet users. A
random sample of 14 internet users showed the following
number of minutes used last month.
85 93 105 87 120 97 120
79 114 107 115 88 109 113
What is the arithmetic mean number of minutes used?
Solution – Sample Mean
$$\bar{X} = \frac{\sum X}{n} = \frac{85 + 93 + 105 + \dots + 113}{14} = \frac{1432}{14} = 102.29$$
The Mean In Excel
1. From the tool bar, select the Paste Function, or use Insert, Function.
2. From the Function Category list, select Statistical. In the Function name list, select AVERAGE. Click OK. A dialog box opens.
3. Enter the range A2:A14 in the Number1 box. The answer appears in the dialog box. Click OK.
LO 2 THE PROPERTIES OF THE ARITHMETIC MEAN
Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a
mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean
is zero.
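Property 4 follows directly from the definition of the mean. A short derivation for a sample of n values, written with the sample mean $\bar{X} = \sum X / n$:
$$\sum (X - \bar{X}) = \sum X - n\bar{X} = \sum X - n\left(\frac{\sum X}{n}\right) = 0$$
As a quick check with the made-up values 3, 8, and 4 (an illustration, not data from the text), the mean is 5 and the deviations –2, 3, and –1 sum to 0.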
You Try It Out!
The annual growth of a
sample of manufacturing
industries at Northernhouse
are:
\$70000, \$72000, \$65000,
\$75000 and \$68000.
a) Find the sample mean.
b) Is the mean you computed in
(a) a statistic or a parameter?
Why?
c) What is your best estimate of
the population mean?
All 511 employees in
Micronet are considered the
population. Their working
hours in a week are 45, 40,
38, 37, 42, 41, 35, 44
a) Compute the mean working
hours in a week.
b) Is the mean you computed in
(a) a statistic or a parameter?
Why?
LO 1 THE WEIGHTED MEAN
Weighted Mean
The weighted mean of a set of numbers designated X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$
Example – Weighted Mean
A car manufacturer recorded the speeds of newly designed
speed?
Speed    Number of cars
5        4
6        5
7        13
8        11
9        8
10       7
Solution – Weighted Mean
$$\bar{X}_w = \frac{4(5) + 5(6) + 13(7) + 11(8) + 8(9) + 7(10)}{4 + 5 + 13 + 11 + 8 + 7} = \frac{371}{48} = 7.73$$
You Try It Out!
NewEra Bakery sold 100 cakes for the regular price of \$200.
For Christmas the cake price was reduced to \$150 and 130
were sold. On New Year’s, the price was reduced to \$100
and 175 cakes were sold.
a) What was the weighted mean price of cake?
b) NewEra Bakery spent \$150 a cake for the 400 cakes. Comment on
the Bakery’s profit per cake if a baker receives a \$20 commission for
each one sold.
LO 1 THE MEDIAN
The Median
The median is the midpoint of the values after they have
been ordered from the smallest to the largest.
1. There are as many values above the median as below it
in the data array.
2. For an even number of values, the median will be the
arithmetic average of the two middle numbers.
Example – The Median
The ages for a sample of seven bank employees are:
31, 35, 29, 30, 32, 33, 34
Find the median age.
The weights of six patients, in kilograms, are:
66, 63, 70, 65, 64, 62
Find the median weight.
Solution – The Median
Arranging the ages in ascending order gives:
29, 30, 31, 32, 33, 34, 35
Thus the median is 32.
Arranging the weights in ascending order gives:
62, 63, 64, 65, 66, 70
Thus the median is 64.5.
LO 2 THE PROPERTIES OF THE MEDIAN
Properties of the Median
1. There is a unique median for each data set.
2. It is not affected by extremely large or small values and
is therefore a valuable measure of central tendency
when such values occur.
3. It can be computed for ratio-level, interval-level, and
ordinal-level data.
The Median In Excel
1. From the tool bar, select the Paste Function, or use
Insert, Function.
2. From the Function category list, select Statistical. In
the Function name list, select MEDIAN. Click OK. A
dialogue box opens.
3. Enter the range in the Number1 box. The answer
appears in the dialogue box. Click OK.
LO 1 THE MODE
The Mode
The mode is the value of the observation that appears
most frequently.
CHART 3–1 Number of Respondents Favoring Various Bath Oils
Example – The Mode
Average earnings of Canadians 15 years and older with
university degrees in selected cities are shown below
(Statistics Canada, Census of Population, 2001). What is the modal
average earning?
City               Salary (\$)    City               Salary (\$)
Calgary            58000          Regina             46000
Edmonton           45000          Saint John         46000
Halifax            43000          Greater Sudbury    50000
Hamilton           52000          Toronto            56000
London             48000          Winnipeg           43000
Montreal           47000          Vancouver          46000
Ottawa-Gatineau    55000          Victoria           42000
Solution – The Mode
A perusal of the earnings reveals that \$46 000 appears
more often (three times) than any other amount. Therefore,
the mode is \$46 000.
You Try It Out!
The average weekly vehicle insurance benefits for 2008, by category, are: \$433, \$325, \$176, \$469, \$245, and \$287.
a) What is the median monthly benefit?
b) How many observations are below the median? Above it?
The number of work stoppages in the manufacturing industry for selected months are 8, 5, 12, 7, 9, and 0.
a) What is the median number of stoppages?
b) How many observations are below the median? Above it?
c) What is the modal number of work stoppages?
The Mode In Excel
1. Select the Paste Function.
2. In the Function name list, select MODE. Click OK. A
dialogue box opens.
3. Enter the range in the Number1 box. The answer
appears in the dialogue box. Click OK.
Excel Solutions
1. Open Excel and the Excel file Table 02-1 from the
DataSets on the CD provided.
2. From the menu bar, select Tools, Data Analysis, and
Descriptive Statistics; then click OK. In Excel 2007,
select Data in place of Tools.
3. Enter A1:A97 as the Input Range. For Grouped By,
select Columns to indicate that your data is in a
column; select Labels in First Row to indicate that you
have the label List Price in the first cell of the input
range. Place the output in the same worksheet by
entering C3 in the Output Range.
4. Select the Summary statistics box; click OK.
LO 3 THE RELATIVE POSITIONS OF THE MEAN, MEDIAN, AND MODE
The Relative Positions of the Mean, Median
and the Mode
CHART 3-2 A Symmetric Distribution
CHART 3-3 A Positively Skewed Distribution
CHART 3-4 Negatively Skewed Distribution
You Try It Out!
The monthly profits of Hi-Mark corporation were
organized into a frequency distribution. The mean of
monthly profits was computed to be \$105 600, the
median \$104 700, and the mode \$104 200.
a) Sketch the profit in the form of a smoothed frequency polygon. Note the location of the mean, median, and mode on the X-axis.
b) Is the distribution symmetrical, positively skewed, or negatively skewed? Explain.
LO 1 THE GEOMETRIC MEAN
The Geometric Mean
Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.
Has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.
Will always be less than or equal to the arithmetic mean.
Defined as the nth root of the product of n values.
The formula for the geometric mean is written:
$$GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)}$$
Example – The Geometric Mean
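The profit on investment earned by Super Constructions
Company for five successive years was: 20 percent, 30
percent, –30 percent, 50 percent and 300 percent. What is
the geometric mean rate of profit on investment?
Solution – The Geometric Mean
Each percentage is first converted to a growth factor (1.2 for a 20 percent gain, 0.7 for a 30 percent loss).
$$GM = \sqrt[5]{(1.2)(1.3)(0.7)(1.5)(4.0)} = \sqrt[5]{6.552} = 1.456$$
The geometric mean rate of profit is 1.456 – 1 = 0.456, or 45.6 percent.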
Another Application
Another use of the geometric mean is to determine the
percent increase in sales, production or other business or
economic series from one time period to another.
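One way to write this rate of increase, assuming n is the number of periods between the two values:
$$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$$
For instance, with illustrative figures that are not from the text, a value that grows from 100 to 121 over n = 2 periods gives $\sqrt{121/100} - 1 = 1.1 - 1 = 0.10$, or 10 percent per period.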
Example – Another Application
The population of Alberta grew from 2974807 in January
2001 to 3290350 in January 2007. What was the average
annual rate of percentage increase during the period?
Solution – Another Application
$$GM = \sqrt[6]{\frac{3\,290\,350}{2\,974\,807}} - 1 = 1.0169 - 1 = 0.0169$$
The population grew by about 1.69 percent per year.
You Try It Out!
The percent increase in profits, for the last five years at Cure Medico are: 5.61, 7.85, 11.22, 19.77 and 23.87.
a) Find the geometric percent increase.
b) Find the arithmetic percent increase.
c) Is the arithmetic mean equal to or greater than the geometric mean?
Sale of Shine Bulbs increased from 65 000 units in 2006 to 132 745 units in 2010.
a) Find the geometric mean annual percent increase.
Why Study Dispersion?
A measure of location, such as the mean or the median,
only describes the centre of the data, but it does not tell us
anything about the spread of the data.
For example, if your nature guide told you that the river
across on foot without additional information? Probably not.
You would want to know something about the variation in
the depth.
A second reason for studying the dispersion in a set of data
is to compare the spread in two or more distributions.
Samples of Dispersions
CHART 3-5 Histogram of Years of Employment at Hammond Iron Works Inc.
CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plants
LO 4 MEASURES OF DISPERSION
Range
The range is the difference between the largest and the
smallest value in a data set.
The major characteristics of the range are:
1) Only two values are used in its calculation.
2) It is influenced by extreme values.
3) It is easy to compute and understand.
Example – Range
Below are the hourly production rates of Vision Quest
computers at the Kanata and Waterloo plants. Find the
range for each of the plants.
CHART 3-6 Hourly Production of Computers at the Kanata and Waterloo Plants
Solution – Range
Kanata: 52 – 48 = 4
Waterloo: 60 – 40 = 20
Mean Deviation
The mean deviation is the arithmetic mean of the absolute
values of the deviations from the arithmetic mean.
$$MD = \frac{\sum |X - \bar{X}|}{n}$$
The major characteristics of the mean deviation are:
1) All values are used in the calculation.
2) It is easy to understand because it is the average of the absolute deviations from the mean.
3) Its major drawback is the use of absolute values, which are difficult to work with.
Example – Mean Deviation
The number of chocolate drinks sold at the Choco-Coffee
shop in Ottawa between 5 and 8 p.m. for a sample of 6
days last year were 30, 50, 60, 70, 80 and 100. Determine
the mean deviation.
Number of Chocolate
Drinks Sold Daily       X – X̄                  Absolute Deviation
30                      (30 – 65) = –35        35
50                      (50 – 65) = –15        15
60                      (60 – 65) = –5         5
70                      (70 – 65) = 5          5
80                      (80 – 65) = 15         15
100                     (100 – 65) = 35        35
Total                                          110
Solution – Mean Deviation
$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{110}{6} = 18.33$$
You Try It Out!
The heights of a group of newly arrived plants in a nursery are given below (in centimetres).
10 15 14 16 25 13 17 23
a) What is the range of the heights?
b) Compute the arithmetic mean height.
c) Compute the mean deviation of the heights.
LO 4 VARIANCE AND STANDARD DEVIATION
Population Variance
The population variance is the arithmetic mean of the
squared deviations from the population mean.
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Where:
σ² is the symbol for the population variance. It is usually referred to as “sigma squared.”
μ is the arithmetic mean of the population.
X is the value of each observation in the population.
N is the number of observations in the population.
Variance
1. Advantage: All values are used in the calculation
2. Disadvantage: The units are awkward, the square of the
original units
Steps in computing Population Variance
1. Begin by finding the mean.
2. Next, find the difference between each observation and
the mean.
3. Square the difference.
4. Sum all of the squared differences.
5. Divide the sum by the total number of observations in
the population.
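The five steps can be traced on a small made-up population (the values 2, 4, 6, and 8 are an illustration, not data from the text):
$$\mu = \frac{2 + 4 + 6 + 8}{4} = 5$$
$$X - \mu: \quad -3, \; -1, \; 1, \; 3$$
$$(X - \mu)^2: \quad 9, \; 1, \; 1, \; 9$$
$$\sum (X - \mu)^2 = 20$$
$$\sigma^2 = \frac{20}{4} = 5$$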
Example – Population Variance
The ages of all the patients in the isolation ward of
Mountainview Hospital are 35, 24, 11, 42, and 23 years.
Determine the population variance.
Age (X)    X – μ    (X – μ)²
35         8        64
24         –3       9
11         –16      256
42         15       225
23         –4       16
135        0*       570
*Sum of the deviations from mean must equal 0.
Solution – Population Variance
There are five patients, so N = 5.
$$\mu = \frac{\sum X}{N} = \frac{135}{5} = 27$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{570}{5} = 114$$
Population Standard Deviation
The population standard deviation σ is the square root of
the population variance.
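In symbols, using the same notation as the population variance:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Because of the square root, σ is expressed in the same units as the original data. As an illustration with a made-up variance, $\sigma^2 = 25$ square years gives $\sigma = 5$ years.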
You Try It Out!
An office of Fine Tune Telecommunications hired seven
trainee engineers this year. Their monthly starting salaries
were: \$5356; \$5651; \$5423; \$5534; \$5467; \$5289 and
\$5670.
a) Compute the population mean.
b) Compute the population variance.
c) Compute the population standard deviation.
d) Another location hired eight trainee engineers. Their mean monthly salary was \$5650, and the standard deviation was \$480. Compare the two groups.
Sample Variance
The sample variance is the sum of the squared deviations from the sample mean divided by n – 1. If the sum were divided by n, as in the population formula, the result would tend to underestimate the population variance σ². The use of (n – 1) in the denominator provides the appropriate correction for this tendency.
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Where:
s² is the sample variance.
X is the value of each observation in the sample.
X̄ is the mean of the sample.
n is the number of observations in the sample.
Example – Sample Variance
The sample of prices of different cuts of beef available at a
restaurant is given below.
\$13, \$21, \$15, \$19, and \$22.
What is the sample variance?
Solution – Sample Variance
Deviation Method
Note the sample mean is: 90/5 = 18
Price X (\$)    X – X̄ (\$)    (X – X̄)² (\$²)
13              –5            25
21              3             9
15              –3            9
19              1             1
22              4             16
90              0             60
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{60}{5 - 1} = 15 \text{ (dollars squared)}$$
Solution – Sample Variance
Continued
Direct Method
Price X (\$)    X² (\$²)
13              169
21              441
15              225
19              361
22              484
90              1680
$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} = \frac{1680 - \frac{(90)^2}{5}}{5 - 1} = \frac{60}{5 - 1} = 15 \text{ (dollars squared)}$$
LO 5 SAMPLE STANDARD DEVIATION
Sample Standard Deviation
The sample standard deviation is the square root of the
sample variance.
Example – Sample Standard Deviation
The sample variance in the previous example involving the prices of cuts of beef is 15 (dollars squared). What is the sample standard deviation?
Solution – Sample Standard Deviation
The sample standard deviation is \$3.87, found by √15. Note again that the sample variance is in terms of dollars squared, but taking the square root of 15 gives us \$3.87, which is in the same units (dollars) as the original data.
You Try It Out!
A sample of seven employees who will stop working after a few
months is given below:
3, 6, 2, 7, 1, 4 and 3.
a) What is the sample variance?
b) Compute the sample standard deviation.
LO 6 INTERPRETATION AND USES OF THE STANDARD DEVIATION
Chebyshev’s Theorem
For any set of observations (sample or population), the
proportion of the values that lie within k standard deviations
of the mean is at least 1 – 1/k², where k is any constant
greater than 1.
Allows us to determine the minimum proportion of the
values that lie within a specified number of standard
deviations of the mean.
Concerned with any set of values; that is the distribution of
values can have any shape.
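To see how the bound behaves, it can be evaluated for a few values of k (the choices k = 2 and k = 3 are illustrative):
$$k = 2: \quad 1 - \frac{1}{2^2} = 0.75$$
$$k = 3: \quad 1 - \frac{1}{3^2} \approx 0.889$$
So at least 75 percent of any set of observations lies within 2 standard deviations of the mean, and at least 88.9 percent within 3.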
Example – Chebyshev’s Theorem
The average number of students present today in each
class is 89.9, and the standard deviation is 11.31. At least
what percent of students lie within plus and minus 3.5
standard deviations?
Solution – Chebyshev’s Theorem
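$$1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$$
At least 92 percent of the values lie within 3.5 standard deviations of the mean.
LO 6 THE EMPIRICAL RULE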
The Empirical Rule
For a symmetrical, bell-shaped frequency distribution:
1) Approximately 68 percent of the observations will lie within ±1 standard deviation of the mean.
2) About 95 percent of the observations will lie within ±2 standard deviations of the mean.
3) Practically all (99.7 percent) will lie within ±3 standard deviations of the mean.
CHART 3–7 A Symmetrical Bell-shaped Curve Showing the Relationship between the Standard Deviation and the Observations
Example – The Empirical Rule
A sample of the rental rates at the Street Park apartments
approximates a symmetrical, bell-shaped distribution. The
sample mean is \$600; the standard deviation is \$24. Using
the Empirical Rule, answer these questions:
1. About 68 percent of the rental rates are between what two
amounts?
2. About 95 percent of the rental rates are between what two
amounts?
3. Almost all of the rental rates are between what two amounts?
Solution – The Empirical Rule
1. About 68 percent are between
\$576 (\$600 - \$24) and \$624 (\$600 + \$24).
2. About 95 percent are between
\$552 (\$600 - 2(\$24)) and \$648 (\$600 + 2(\$24)).
3. Almost all (99.7 percent) are between
\$528(\$600 - 3(\$24)) and \$672 (\$600 + 3(\$24)).
You Try It Out!
The Quality Metal Company is one of several domestic
manufacturers of PVC pipe. The quality control department
sampled 700 20m lengths. At a point 2 m from the end of
the pipe they measured the outside diameter. The mean
was 2.4 m and the standard deviation 0.2 m.
a) If the shape of the distribution is not known, at least what percent of
the observations will lie between 2.05 m and 2.35 m?
b) If we assume that the distribution of diameters is symmetrical and
bell-shaped, about 68 percent of the observations will be between
what two values?
c) If we assume that the distribution of diameters is symmetrical and
bell-shaped, about 95 percent of the observations will be between
what two values?
LO 7 RELATIVE DISPERSION
Relative Dispersion
In order to make a meaningful comparison of different
measures, we need to convert each of these measures to a
relative value – that is, a percent.
The coefficient of variation is the ratio of the standard
deviation to the arithmetic mean, expressed as a
percentage.
$$CV = \frac{s}{\bar{X}}(100)$$
Example – Relative Dispersion
A study of the amount of money spent on the maintenance
of a car and the years of service of the car resulted in these
statistics: The mean amount spent was \$150; the standard
deviation was \$30; the mean number of years of service
was 15 years; the standard deviation was 1.5 years.
Compare the relative dispersion in the two distributions
using the coefficient of variation.
Solution – Relative Dispersion
Money Spent:
$$CV = \frac{s}{\bar{X}}(100) = \frac{\$30}{\$150}(100) = 20\%$$
Years of Service:
$$CV = \frac{s}{\bar{X}}(100) = \frac{1.5 \text{ years}}{15 \text{ years}}(100) = 10\%$$
There is more dispersion relative to the mean in the
distribution of money spent compared with the distribution
of years of service (because 20 percent > 10 percent).
You Try It Out!
A large group of management trainees was given two types
of aptitude tests, a quantitative aptitude test and a
management aptitude test. The arithmetic mean score on
the quantitative aptitude test was 400, with a standard
deviation of 20. On the management aptitude test, the mean was 50 and the standard deviation was 10.
Compare the relative dispersion in the two groups.
LO 7 SKEWNESS
Skewness
In a symmetric set of observations the mean and median
are equal and the data values are evenly spread around
these values. The data values below the mean and median
are a mirror image of those above.
A set of values is skewed to the right or positively
skewed if there is a single peak and the values extend
much further to the right of the peak than to the left of the
peak. In this case the mean is larger than the median.
In a negatively skewed distribution there is a single peak
but the observations extend further to the left, in the
negative direction, than to the right. In a negatively skewed
distribution the mean is smaller than the median.
A bimodal distribution will have two peaks.
CHART 3–8 Shapes of Frequency Polygons
Pearson’s Coefficient of Skewness
The major characteristics of the coefficient of skewness are:
1) It can range from –3.00 up to 3.00.
2) A value of 0 indicates a symmetric distribution.
It is computed as follows:
$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$
Example – Skewness
Following are the earnings per share, in dollars, for a
sample of 16 software companies for the year 2008.
\$0.08 0.12 0.44 0.52 4.55 7.93 8.62 11.15
14.88 17.43 13.13 7.36 1.10 1.19 2.49 1.18
The mean is \$5.76. The standard deviation is \$5.85. The
median is \$3.52. Find the coefficient of skewness using
Pearson’s estimate.
Solution – Skewness
$$sk = \frac{3(\bar{X} - \text{Median})}{s} = \frac{3(\$5.76 - \$3.52)}{\$5.85} = 1.149$$
Skewness In Excel
You Try It Out!
A sample of ten customer care technicians employed in the
customer service department of a large telecommunication
company received the following number of calls yesterday:
85, 78, 90, 97, 86, 72, 95, 89, 90, and 75.
a) Find the mean, median, and the standard deviation.
b) Compute the coefficient of skewness using Pearson’s method.
c) What is your conclusion regarding the skewness of the data?
LO 8 OTHER MEASURES OF DISPERSION
Other Measures of Dispersion
Quartiles divide a set of observations into four equal parts.
Deciles divide a set of observations into 10 equal parts.
Percentiles divide a set of observations into 100 equal
parts.
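These measures are located by position in the ordered data. A common convention, and the one that matches the (n + 1) factor used for the commissions data in this chapter, places the Pth percentile at
$$L_P = (n + 1)\frac{P}{100}$$
where n is the number of observations. The first quartile is the 25th percentile, the median the 50th, and the third quartile the 75th. As an illustration with an assumed sample size of n = 11, the median sits at position $L_{50} = (11 + 1)(50/100) = 6$, the sixth value in the ordered list. When the location is fractional, the value is found by interpolating between the two neighbouring observations.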
Example – Percentiles
Listed below are the commissions earned, in dollars, last
month by a sample of 17 brokers at a real estate office.
\$2345 1657 1876 1235 2789 2354 1987 2309 1985
2001 1967 1976 2343 2654 2313 1768 2650
Locate the median, the first quartile, and the third quartile
for the commissions earned.
Solution - Percentiles
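The first step is to organize the data from the smallest
commission to the largest.
\$1235 1657 1768 1876 1967 1976 1985 1987 2001
2309 2313 2343 2345 2354 2650 2654 2789
The location of each value is (n + 1)P/100. A location of 4.5 lies halfway between the 4th and 5th ordered values, and a location of 13.5 lies halfway between the 13th and 14th.
$$L_{25} = (17 + 1)\frac{25}{100} = 4.5, \quad Q_1 = 1876 + 0.5(1967 - 1876) = \$1921.50$$
$$L_{50} = (17 + 1)\frac{50}{100} = 9, \quad \text{Median} = \$2001$$
$$L_{75} = (17 + 1)\frac{75}{100} = 13.5, \quad Q_3 = 2345 + 0.5(2354 - 2345) = \$2349.50$$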
You Try It Out!
The quality control department of a tomato sauce company
is responsible for checking the mass of the 250g jar of
tomato sauce. The masses of a sample of nine jars
produced last hour are:
251 249 246 248 250 250 243 242 245
a) What is the median mass?
b) Determine the masses corresponding to the first and third quartiles.
LO 9 BOX PLOT
Box Plots
A box plot is a graphical display, based on quartiles, that
helps to picture a set of data.
Five pieces of data are needed to construct a box plot:
1. Minimum value
2. First quartile
3. Median
4. Third quartile
5. Maximum value
EXAMPLE – Box Plots
Stuart’s Pizza offers free delivery of its pizza within 15 km.
Stuart, the owner, wants some information on the time it
takes for delivery. How long does a typical delivery take?
Within what range of times will most deliveries be
completed? For a sample of 20 deliveries, he determined
the following information:
Minimum value     = 15 minutes
1st Quartile (Q1) = 16 minutes
Median            = 18 minutes
3rd Quartile (Q3) = 26 minutes
Maximum value     = 31 minutes
Develop a box plot for the delivery times.
Solution – Box Plots
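One way to lay out the plot from Stuart's five values, assuming the standard convention that the box spans the quartiles and the whiskers run to the extremes:
Box: from Q1 = 16 to Q3 = 26 minutes, with a line inside the box at the median, 18 minutes.
Whiskers: from the box out to the minimum, 15 minutes, and to the maximum, 31 minutes.
$$\text{Interquartile range} = Q_3 - Q_1 = 26 - 16 = 10 \text{ minutes}$$
The middle 50 percent of deliveries take between 16 and 26 minutes. The median lies closer to Q1 than to Q3, and the upper whisker is longer than the lower one, which suggests the delivery times are positively skewed.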
You Try It Out!
The following box plot is given.
What are the median, the largest and smallest values, and
the first and third quartiles? Would you agree that the
distribution is symmetrical?
THE MEAN, MEDIAN, AND STANDARD DEVIATION OF GROUPED DATA
The Mean of Grouped Data
To approximate the arithmetic mean of data organized into
a frequency distribution, we begin by assuming the
observations in each class are represented by the midpoint
of the class.
$$\bar{X} = \frac{\sum fX}{n}$$
Where:
X̄ is the designation for the sample mean.
X is the midpoint of each class.
f is the frequency in each class.
fX is the frequency in each class times the midpoint of the class.
ΣfX is the sum of these products.
n is the total number of frequencies.
Example – The Mean of Grouped Data
The computations for the arithmetic mean of data grouped into a frequency distribution will be shown based on the real estate data, from the south-east area of Edmonton. The frequency distribution for the list prices is given here. Determine the arithmetic mean of the listed prices.
List Price (\$ thousands)    Frequency
\$250 to under \$375         33
375 to under 500             41
500 to under 625             11
625 to under 750             5
750 to under 875             4
875 to under 1000            1
1000 to under 1125           1
Total                        96
Solution – The Mean of Grouped Data
List Prices, South-East Area of Edmonton
List Price (\$ thousands)    Frequency f    Midpoint X (\$)    fX (\$)
\$250 to under \$375         33             \$312.5            \$10312.5
375 to under 500             41             437.5              17937.5
500 to under 625             11             562.5              6187.5
625 to under 750             5              687.5              3437.5
750 to under 875             4              812.5              3250.0
875 to under 1000            1              937.5              937.5
1000 to under 1125           1              1062.5             1062.5
Total                        96                                \$43125.0
$$\bar{X} = \frac{\sum fX}{n} = \frac{\$43125}{96} = \$449.2 \text{ (thousands)}$$
The Median of Grouped Data
If the data is grouped, some of the raw data values may not
be available, and so we cannot necessarily determine the
exact value of the median.
But we can estimate the median by first finding the position
of the median (which class it falls in), and then calculating
an estimate of the median within this median class.
$$\text{Median} = L + \frac{\frac{N}{2} - f_c}{f}(i)$$
Where:
L is the lower limit of the median class.
N is the size of the population.
f is the frequency of the median class.
fc is the cumulative frequency up to but excluding the median class.
i is the class width of the median class.
Example – The Median of Grouped Data
Professor Law lists the following quiz marks for his class of
30 students. Determine the median quiz mark. Note the
marks have already been sorted from lowest to highest.
Quiz Mark    Number of Students
0            0
1            0
2            1
3            0
4            1
5            2
6            4
7            12
8            5
9            3
10           2
Solution – The Median of Grouped Data
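Find the position of the median mark.
$$L_{50} = (30 + 1)\frac{50}{100} = 15.5$$
The cumulative frequency rises from 8 to 20 at a mark of 7, so the 15th and 16th marks are both 7 and the median quiz mark is 7.
Quiz Mark    Number of Students    fc
0            0                     0
1            0                     0
2            1                     1
3            0                     1
4            1                     2
5            2                     4
6            4                     8
7            12                    20
8            5                     25
9            3                     28
10           2                     30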
The Standard Deviation of Grouped Data
If the data of interest are in grouped form (in a frequency
distribution), the sample standard deviation can be
approximated by substituting ∑fX² for ∑X² and ∑fX for ∑X.
$$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$$
Where:
s is the symbol for the sample standard deviation.
X is the midpoint of a class.
f is the class frequency.
n is the total number of sample observations.
Solution – The Standard Deviation of Grouped
Data
List Prices, South-East Area of Edmonton
List Price (\$ thousands)    Frequency f    Midpoint X (\$)    fX (\$)      fX²
\$250 to under \$375         33             \$312.5            \$10312.5    3222656.25
375 to under 500             41             437.5              17937.5      7847656.25
500 to under 625             11             562.5              6187.5       3480468.75
625 to under 750             5              687.5              3437.5       2363281.25
750 to under 875             4              812.5              3250.0       2640625.00
875 to under 1000            1              937.5              937.5        878906.25
1000 to under 1125           1              1062.5             1062.5       1128906.25
Total                        96                                \$43125.0    21562500.00
$$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}} = \sqrt{\frac{21562500 - \frac{(43125)^2}{96}}{96 - 1}} = 151.83$$
You Try It Out!
The net incomes of a sample of large art dealers were organized into the table given below.
Net Income (\$ Millions)    Number of Importers
\$4 to under \$8            2
8 to under 12               3
12 to under 16              12
16 to under 20              7
20 to under 24              5
a) What is the table called?
b) Based on the distribution, what is the estimate of the arithmetic mean net income?
c) Based on the distribution, what is the estimate of the median net income?
d) Based on the distribution, what is the estimate of the standard deviation?
Chapter Summary
I. A measure of central location is a value used to describe
the centre of a set of data.
A. The arithmetic mean is the most widely reported
measure of central location.
1. It is calculated by adding the values of the
observations and dividing by the total number of
observations.
a. The formula for a population mean of ungrouped or raw data is:
$$\mu = \frac{\sum X}{N}$$ [3–1]
b. The formula for the mean of a sample is:
$$\bar{X} = \frac{\sum X}{n}$$ [3–2]
c. The formula for the sample mean of data in a frequency distribution is:
$$\bar{X} = \frac{\sum fX}{n}$$ [3–16]
2. The major characteristics of the arithmetic mean
are:
a. At least the interval scale of measurement is
required.
b. All the data values are used in the calculation.
c. A set of data has only one mean. That is, it is
unique.
d. The sum of the deviations from the mean
equals 0.
B. The weighted mean is found by multiplying each
observation by its corresponding weight.
1. The formula for determining the weighted mean is:
$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + w_3 X_3 + \dots + w_n X_n}{w_1 + w_2 + w_3 + \dots + w_n}$$ [3–3]
2. It is a special case of the arithmetic mean.
C. The median is the value in the middle of a set of
ordered data.
1. To find the median, sort the observations from
smallest to largest and identify the middle value.
2. The major characteristics of the median are:
a. At least the ordinal scale of measurement is
required.
b. It is not influenced by extreme values.
c. Fifty percent of the observations are larger
than the median.
d. It is unique to a set of data.
3. The formula for the median of grouped data is:
$$\text{Median} = L + \frac{\frac{N}{2} - f_c}{f}(i)$$ [3–17]
D. The mode is the value that occurs most often in a
set of data.
1. The mode can be found for nominal-level data.
2. A set of data can have more than one mode.
E. The geometric mean is the nth root of the product of
n values.
1. The formula for the geometric mean is:
$GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)}$   [3–4]
2. The geometric mean is also used to find the rate
of change from one period to another.
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$   [3–5]
3. The geometric mean is always equal to or less
than the arithmetic mean.
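Formulas [3–4] and [3–5] can be sketched as below. The growth factors and period values are hypothetical, the function names are illustrative, and the sketch assumes all values are positive.

```python
import math


def geometric_mean(xs):
    """nth root of the product of n positive values (formula [3-4])."""
    if any(x <= 0 for x in xs):
        raise ValueError("all values must be positive")
    return math.prod(xs) ** (1 / len(xs))


def rate_of_change(begin, end, n):
    """Average rate of change per period over n periods (formula [3-5])."""
    return (end / begin) ** (1 / n) - 1


# Hypothetical growth factors for two periods (5% and 15% increases).
growth_factors = [1.05, 1.15]
gm = geometric_mean(growth_factors)
am = sum(growth_factors) / len(growth_factors)

# Hypothetical value that grows from 1000 to 8000 over 3 periods.
rate = rate_of_change(1000, 8000, 3)

print(gm, am, rate)
```

Here the geometric mean comes out slightly below the arithmetic mean of 1.10, and the rate of change is about 1.0, that is, roughly 100 percent per period.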
II. The dispersion is the variation or spread in a set of data.
A. The range is the difference between the largest and
the smallest value in a set of data.
1. The formula for the range is:
Range = Largest value − Smallest value   [3–6]
2. The major characteristics of the range are:
a. Only two values are used in its calculation.
b. It is influenced by extreme values.
c. It is easy to compute and to understand.
B. The mean absolute deviation is the sum of the
absolute values of the deviations from the mean
divided by the number of observations.
1. The formula for computing the mean absolute
deviation is:
$MD = \frac{\sum |X - \bar{X}|}{n}$   [3–7]
2. The major characteristics of the mean absolute
deviation are:
a. It is not unduly influenced by large or small
values.
b. All observations are used in the calculation.
c. The absolute values are difficult to work with.
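The range [3–6] and the mean absolute deviation [3–7] can be computed side by side. The sample is hypothetical and the function names are illustrative.

```python
def value_range(xs):
    """Largest value minus smallest value (formula [3-6])."""
    return max(xs) - min(xs)


def mean_absolute_deviation(xs):
    """Mean of the absolute deviations from the arithmetic mean ([3-7])."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


# Hypothetical sample, invented for illustration only.
sample = [2, 4, 6, 8, 10]

sample_range = value_range(sample)
sample_md = mean_absolute_deviation(sample)

print(sample_range, sample_md)
```

The range looks only at the two extreme values, while the mean absolute deviation uses every observation.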
C. The variance is the mean of the squared deviations
from the arithmetic mean.
1. The formula for the population variance is:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$   [3–8]
2. The formula for the sample variance is:
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$   [3–10]
3. The major characteristics of the variance are:
a. All observations are used in the calculation.
b. It is not unduly influenced by extreme
observations.
c. The units are somewhat difficult to work with;
they are the original units squared.
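The difference between the population formula [3–8] and the sample formula [3–10] is only the divisor, which a short sketch makes visible. The data are hypothetical; the results are cross-checked against `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics


def population_variance(xs):
    """Mean of the squared deviations from the mean, dividing by N ([3-8])."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1 ([3-10])."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


# Hypothetical observations, invented for illustration only.
observations = [2, 4, 6, 8, 10]

pop_var = population_variance(observations)
samp_var = sample_variance(observations)

print(pop_var, samp_var)
print(statistics.pvariance(observations), statistics.variance(observations))
```

For these five values the sum of squared deviations is 40, so the population variance is 8 and the sample variance is 10, both in squared units.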
D. The standard deviation is the square root of the
variance.
1. The major characteristics of the standard
deviation are:
a. It is in the same units as the original data.
b. It is the square root of the average squared
deviation from the mean.
c. It cannot be negative.
d. It is the most widely reported measure of
dispersion.
2. The formula for the sample standard deviation is:
$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$   [3–12]
3. The formula for the standard deviation of
grouped data is:
$s = \sqrt{\frac{\sum fX^2 - \frac{(\sum fX)^2}{n}}{n - 1}}$   [3–18]
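As a hedged illustration of formula [3–18], the sketch below applies it to the importers' net income distribution given before this summary. It assumes that each class is represented by its midpoint, and the function name `grouped_mean_and_std` is illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each class of net income,
# in $ millions, taken from the importers table.
classes = [(4, 8, 2), (8, 12, 3), (12, 16, 12), (16, 20, 7), (20, 24, 5)]


def grouped_mean_and_std(class_table):
    """Estimate the mean and sample standard deviation of grouped data.

    Assumption: X for each class is the class midpoint.
    """
    n = sum(f for _, _, f in class_table)
    sum_fx = 0.0
    sum_fx2 = 0.0
    for lower, upper, f in class_table:
        midpoint = (lower + upper) / 2
        sum_fx += f * midpoint
        sum_fx2 += f * midpoint ** 2
    mean = sum_fx / n                                   # formula [3-16]
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)    # inside of [3-18]
    return mean, math.sqrt(variance)


grouped_mean, grouped_std = grouped_mean_and_std(classes)
print(round(grouped_mean, 2), round(grouped_std, 2))
```

Under the midpoint assumption the estimates are roughly 15.38 for the mean and 4.44 for the standard deviation, both in $ millions. They are estimates because the individual values inside each class are not known.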
III. We interpret the standard deviation using two
measures.
A. Chebyshev’s theorem states that regardless of the
shape of the distribution, at least $1 - 1/k^2$ of the
observations will be within k standard deviations of
the mean, where k is greater than 1.
B. The Empirical Rule states that for a bell-shaped
distribution about 68 percent of the values will be
within 1 standard deviation of the mean, about 95
percent within 2, and virtually all within 3.
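Chebyshev's bound can be compared with the observed share of values in a data set. The sketch below uses a hypothetical, deliberately skewed data set and the population standard deviation; the function names are illustrative.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def fraction_within(xs, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    inside = [x for x in xs if abs(x - mean) <= k * sd]
    return len(inside) / len(xs)


# Hypothetical skewed data, invented for illustration only.
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

for k in (2, 3):
    print(k, chebyshev_lower_bound(k), fraction_within(skewed, k))
```

For k = 2 the theorem guarantees at least 75 percent, and 90 percent of these values actually fall in that interval. The Empirical Rule's 68, 95, and 99.7 percent figures are tighter but apply only to bell-shaped distributions, so they should not be expected to hold for data like these.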
IV. The coefficient of variation is a measure of relative
dispersion.
A. The formula for the coefficient of variation is:
$CV = \frac{s}{\bar{X}}(100)$   [3–13]
B. It reports the variation relative to the mean.
C. It is useful for comparing distributions with different
units.
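A small sketch of formula [3–13] shows why the coefficient of variation helps when units or magnitudes differ. Both data sets are hypothetical and the function name is illustrative.

```python
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation as a percentage of the mean ([3-13])."""
    return statistics.stdev(xs) / statistics.fmean(xs) * 100


# Hypothetical data in different units, invented for illustration only.
years_of_service = [10, 20, 30]          # years
monthly_output = [100, 110, 120]         # units produced

cv_service = coefficient_of_variation(years_of_service)
cv_output = coefficient_of_variation(monthly_output)

print(cv_service, cv_output)
```

Both samples have a standard deviation of 10 in their own units, yet relative to the mean the first is far more dispersed (about 50 percent versus about 9 percent).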
V. The coefficient of skewness measures the symmetry of
a distribution.
A. In a positively skewed set of data the long tail is to
the right.
B. In a negatively skewed distribution the long tail is to
the left.
C. There are two formulas for the coefficient of
skewness.
1. The formula developed by Pearson is:
$sk = \frac{3(\bar{X} - \text{Median})}{s}$   [3–14]
2. The coefficient of skewness computed by
statistical software is:
$sk = \frac{n}{(n - 1)(n - 2)} \left[ \sum \left( \frac{X - \bar{X}}{s} \right)^3 \right]$
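The two skewness formulas can be compared on the same data. This is a sketch under stated assumptions: the data are hypothetical, the function names are illustrative, and s is taken to be the sample standard deviation in both formulas.

```python
import statistics


def pearson_skewness(xs):
    """Pearson's coefficient: 3 * (mean - median) / s (formula [3-14])."""
    mean = statistics.fmean(xs)
    return 3 * (mean - statistics.median(xs)) / statistics.stdev(xs)


def software_skewness(xs):
    """n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations."""
    n = len(xs)
    if n < 3:
        raise ValueError("at least three observations are required")
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    cubed = sum(((x - mean) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * cubed


# Hypothetical data with a long right tail, and a symmetric comparison set.
right_tailed = [1, 2, 3, 4, 10]
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(right_tailed), software_skewness(right_tailed))
print(pearson_skewness(symmetric), software_skewness(symmetric))
```

Both measures are positive for the right-tailed data and essentially zero for the symmetric data, although their magnitudes differ because the formulas are not equivalent.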
VI. Measures of location also describe the spread in a set
of observations.
A. A quartile divides a set of observations into four
equal parts.
1. Twenty-five percent of the observations are less
than the first quartile, 50 percent are less than
the second quartile (the median), and 75 percent
are less than the third quartile.
2. The interquartile range is the difference between
the third and the first quartile.
B. Deciles divide a set of observations into ten equal
parts and percentiles into 100 equal parts.
C. A box plot is a graphic display of a set of data.
1. A box is drawn enclosing the regions between
the first and third quartiles.
a. A line is drawn inside the box at the median
value.
b. Dotted line segments are drawn from the third
quartile to the largest value to show the
highest 25 percent of the values and from the
first quartile to the smallest value to show the
lowest 25 percent of the values.
2. A box plot is based on five statistics: the
maximum and minimum values, the first and third
quartiles, and the median.
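The quartiles, the interquartile range, and the five statistics behind a box plot can be sketched with `statistics.quantiles`. The data are hypothetical. The choice of `method="exclusive"`, which locates the p-th percentile at position (n + 1)p/100 and interpolates between neighbours, is an assumption about which quartile convention is intended; other software may give slightly different values.

```python
import statistics

# Hypothetical observations, invented for illustration only.
data = sorted([12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 60])

# n=4 returns the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

# Spread of the middle 50 percent of the observations.
interquartile_range = q3 - q1

# The five statistics on which a box plot is based.
five_number_summary = (min(data), q1, q2, q3, max(data))

# Deciles and percentiles use the same call with n=10 and n=100.
deciles = statistics.quantiles(data, n=10, method="exclusive")

print(five_number_summary, interquartile_range)
```

With 11 observations the quartile positions fall on the 3rd, 6th, and 9th ordered values, so no interpolation is needed in this example.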