Chapter 3
1
◦ Methods for organizing, displaying and describing data using tables, graphs and summary measures
• Raw data is made more manageable
• Raw data is presented in a logical form
• Patterns can be seen from organised data

• Frequency tables
• Graphical techniques
• Measures of Central Tendency
• Measures of Spread (variability)








• Organize and display data using tables and graphs
  a) presentation of qualitative data
  b) presentation of quantitative data
• Describe the characteristics of a data set using statistical measures
  a) measures of central tendency
  b) measures of dispersion
  c) measures of skewness
  d) box-and-whisker plot
  e) population vs sample
Definition:
Data recorded in the sequence in which they are collected, and before they are processed or ranked, are called raw data.

Ex: Ages of 50 students
21 18 25 22 25 19 20 19 28 23
24 19 31 21 18 25 22 19 20 37
29 19 23 22 27 34 19 18 22 23
26 25 23 21 21 27 22 19 20 25
37 25 23 19 21 33 23 26 21 24






Qualitative Data
Data that cannot be measured but can be classified into different categories
Example: gender, student status, nationality, race
Quantitative Data
Data that can be measured numerically
Example: income, heights, gross sales, prices of homes, number of cars owned and number of accidents


a) Organizing qualitative data
◦ (i) Frequency distributions
◦ (ii) Relative frequency and percentage distributions
b) Graphing qualitative data
◦ (i) Bar graphs
◦ (ii) Pie charts
A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.
A sample of 20 employees from large companies was selected and these employees were asked how stressful their jobs were. In the responses below, very represents very stressful, somewhat means somewhat stressful and none stands for not stressful at all.

somewhat  none      somewhat  very      none
very      somewhat  none      somewhat  somewhat
very      somewhat  very      somewhat  somewhat
somewhat  very      very      none      very
Frequency Distribution of Stress on Job

Stress on job   Tally        Frequency (f)
Very            |||| ||      7
Somewhat        |||| ||||    9
None            ||||         4
                             Sum = 20
$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$
$\text{Percentage} = (\text{Relative frequency}) \times 100$
Relative frequency and percentage distributions of stress on job

Stress on job   Relative frequency   Percentage (%)
Very            7/20 = 0.35          0.35(100) = 35
Somewhat        9/20 = 0.45          0.45(100) = 45
None            4/20 = 0.20          0.20(100) = 20
                Sum = 1.00           Sum = 100%
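The stress-on-job distributions can be reproduced with a short Python sketch. The list of responses is rebuilt from the frequencies in the table (7 very, 9 somewhat, 4 none), and the variable names are illustrative only.

```python
from collections import Counter

# Responses rebuilt from the frequency table: 7 very, 9 somewhat, 4 none
responses = ["very"] * 7 + ["somewhat"] * 9 + ["none"] * 4

frequency = Counter(responses)
total = sum(frequency.values())

# relative frequency = frequency of the category / sum of all frequencies
relative = {category: f / total for category, f in frequency.items()}
# percentage = relative frequency * 100
percentage = {category: rf * 100 for category, rf in relative.items()}

for category in ("very", "somewhat", "none"):
    print(f"{category:<10}{frequency[category]:>3}"
          f"{relative[category]:>7.2f}{percentage[category]:>6.0f}%")
```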
A graph made of bars, with the categories on the horizontal axis and the frequencies (or relative frequencies) on the vertical axis, is called a bar graph.
[Figure: bar graph for the categories heart, cancer, stroke, CLRD and accident; vertical axis from 0 to 60]
A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.
[Figure: pie chart for the categories heart, cancer, stroke, CLRD and accident]
Numerical Data
1. Array
2. Types of quantitative data
3. Frequency Distributions
   a. Histogram
   b. Polygon
   c. Ogive
   d. Stem & Leaf
1. Organizes data to focus on major features
i. Ascending
Example: 1, 2, 3, 4, 5, ...
ii. Descending
Example: 10, 9, 8, 7, 6, ...
iii. Range (difference between the largest and smallest values)
Example: largest height is 74 inches
smallest height is 60 inches
range is 74 - 60 = 14 inches




• Quickly notice lowest and highest values in the data
• Easily divide data into sections
• Easily see values that occur frequently
• Observe variability in the data
Raw Data: Yards Produced by 30 Carpet Looms
16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3
(ungrouped data)
Data Array: Daily Production in Yards of 30 Carpet Looms
15.2 15.4 15.6 15.6 15.6 15.7 15.7 15.8 15.8 15.8
15.9 15.9 15.9 15.9 16.0 16.0 16.0 16.0 16.1 16.2
16.2 16.3 16.3 16.3 16.4 16.4 16.6 16.8 16.8 16.9
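A minimal Python sketch of turning the carpet-loom raw data into a data array and reading off the range. The variable names are illustrative; the range is rounded to one decimal place to hide floating-point noise.

```python
# Raw data: yards produced by 30 carpet looms
raw = [
    16.2, 15.4, 16.0, 16.6, 15.9, 15.8, 16.0, 16.8, 16.9, 16.8,
    15.7, 16.4, 15.2, 15.8, 15.9, 16.1, 15.6, 15.9, 15.6, 16.0,
    16.4, 15.8, 15.7, 16.2, 15.6, 15.9, 16.3, 16.3, 16.0, 16.3,
]

ascending = sorted(raw)                    # data array, smallest first
descending = sorted(raw, reverse=True)     # data array, largest first

# range = largest value - smallest value
data_range = round(max(raw) - min(raw), 1)

print(ascending[0], ascending[-1], data_range)
```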




Discrete data - integer values 0, 1, 2, ...
Example: number of children, number of cars
Continuous data - any value within a range, e.g. 256.312 grams
Example: weight, length, time, area, price
A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class. Data presented in the form of a frequency distribution are called grouped data.
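Weekly Earnings (RM)   Number of Employees (f)
401 - 600              9
601 - 800              12
801 - 1000             39
1001 - 1200            15
1201 - 1400            9
1401 - 1600            6

variable: Weekly Earnings (RM)
classes: the six intervals in the first column
third class: 801 - 1000
lower limit of sixth class: 1401
upper limit of sixth class: 1600
frequency column: Number of Employees (f)
frequencies: the counts in the second column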

$\text{Class boundary} = \frac{\text{upper limit} + \text{lower limit of next class}}{2}$
Ex: Upper boundary of first class
(600 + 601)/2 = 600.5
Lower boundary of second class
(600 + 601)/2 = 600.5
Upper boundary of one class = Lower boundary of next class

Class width = upper boundary - lower boundary
Example:
Width of first class
600.5 - 400.5 = 200
Width of second class
800.5 - 600.5 = 200

$\text{Class midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2}$
Ex: Midpoint of the first class
(401 + 600)/2 = 500.5
Ex: Midpoint of the second class
(601 + 800)/2 = 700.5
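The boundary, width and midpoint rules can be checked with a small Python sketch. It assumes every pair of adjacent classes is separated by the same gap (1 for the weekly earnings table); `describe_classes` is a hypothetical helper name.

```python
# Class limits of the weekly earnings table (RM)
classes = [(401, 600), (601, 800), (801, 1000),
           (1001, 1200), (1201, 1400), (1401, 1600)]

def describe_classes(limits):
    """Return (lower boundary, upper boundary, width, midpoint) per class.

    Assumes the gap between one upper limit and the next lower limit
    is the same for all classes.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - half_gap
        upper_boundary = upper + half_gap
        width = upper_boundary - lower_boundary
        midpoint = (lower + upper) / 2
        rows.append((lower_boundary, upper_boundary, width, midpoint))
    return rows

rows = describe_classes(classes)
for (lower, upper), row in zip(classes, rows):
    print(lower, upper, row)
```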
Height (cm)   Number of Students
60 - 62       10
63 - 65       18
66 - 68       42
69 - 71       27
72 - 74       8
Total         105

(The Height column holds the class intervals; the Number of Students column holds the frequencies.)

i. First class limits. Lower class limit = 60, Upper class limit = 62
ii. First class boundary. Lower boundary = 59.5, Upper boundary = 62.5
iii. Class width. Example: c = 62.5 - 59.5 = 3
iv. First class midpoint = (60 + 62)/2 = 61
v. Class frequency = number of students
Weekly Earnings (RM)   Number of Employees (f)
400 - 600              9
600 - 800              12
800 - 1000             39
1000 - 1200            15
1200 - 1400            9
1400 - 1600            6

Class limit = Class boundary
Raw Data:
15.2 15.2 15.3 15.3 15.3 15.3 15.3 15.4 15.4 15.4
15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.5 15.5
15.5 15.5 15.5 15.5 15.6 15.6 15.6 15.7 15.7 15.7

Frequency Distribution and Relative Frequency Distribution

Class   Tallies        Frequency (1)   Relative Freq. (1) ÷ 30   Cumulative Relative Frequency
15.2    //             2               0.07                      0.07
15.3    ////           5               0.17                      0.23
15.4    //// //// /    11              0.37                      0.60
15.5    //// /         6               0.20                      0.80
15.6    ///            3               0.10                      0.90
15.7    ///            3               0.10                      1.00
Total                  30              1.00
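A short Python sketch of the relative and cumulative relative frequency columns. It cumulates the counts first and divides afterwards, so the cumulative column is not affected by rounding of the individual relative frequencies; the variable names are illustrative.

```python
from itertools import accumulate

classes = [15.2, 15.3, 15.4, 15.5, 15.6, 15.7]
frequency = [2, 5, 11, 6, 3, 3]

n = sum(frequency)

# relative frequency = class frequency / total frequency
relative = [f / n for f in frequency]

# running totals of the counts, then divided by n
cumulative_frequency = list(accumulate(frequency))
cumulative_relative = [F / n for F in cumulative_frequency]

for c, f, rf, crf in zip(classes, frequency, relative, cumulative_relative):
    print(f"{c:<6}{f:>4}{rf:>7.2f}{crf:>7.2f}")
```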

When constructing a frequency distribution table, we need to make the following three major decisions:
• Number of Classes
• Class Width
• Lower Limit of the First Class / Starting Point

• Number of Classes
  $k = 1 + 3.3 \log n$ (log to base 10)
• Class Width
  $i \geq \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of classes } (k)}$
• Lower Limit of the First Class / Starting Point
  ◦ Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.

1. Determine the Class Interval Size (or Class Width)
Example: Given the following data
100 74 84 95 95 110 99 87
100 108 85 103 99 83 91 91
84 110 113 105 100 98 100 108
100 98 100 107 79 86 123 107
87 105 88 85 99 101 93 99
Range: R = 123 - 74 = 49

• Number of Classes
  $k = 1 + 3.3 \log n = 1 + 3.3 \log 40 = 6.3 \approx 6$

• Class Width
  $i \geq \frac{\text{Largest Value} - \text{Smallest Value}}{\text{Number of classes } (k)} = \frac{49}{6} \approx 8.17$, so $i \geq 9$ when rounded up to a whole number
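The two decisions above can be sketched in Python for the 40 observations of the example. The logarithm is taken to base 10, which matches the value 6.3 on the slide; rounding the width up with `math.ceil` is one way to read the "≥ 9" step.

```python
import math

data = [
    100, 74, 84, 95, 95, 110, 99, 87,
    100, 108, 85, 103, 99, 83, 91, 91,
    84, 110, 113, 105, 100, 98, 100, 108,
    100, 98, 100, 107, 79, 86, 123, 107,
    87, 105, 88, 85, 99, 101, 93, 99,
]

n = len(data)

# number of classes: k = 1 + 3.3 log n, rounded to the nearest whole number
k_exact = 1 + 3.3 * math.log10(n)
k = round(k_exact)

# class width: at least (largest - smallest) / k, rounded up
data_range = max(data) - min(data)
min_width = math.ceil(data_range / k)

print(n, round(k_exact, 1), k, data_range, min_width)
```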
Grouped Frequency Distribution (6 classes)

Class       Frequency (1)   Relative Frequency (1) ÷ 40   Cumulative %
71 - 80
81 - 90
91 - 100
101 - 110
111 - 120
121 - 130

Class Interval Midpoint (first class): (71 + 80)/2 = 75.5
Upper Limit (third class): 100
Lower Limit (third class): 91
Class width = 130.5 - 120.5 = 10
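The columns of the grouped distribution can be filled in by counting the 40 observations of the example into the six classes. This is a sketch only; the counts it prints come from the data listed earlier, not from the original slide.

```python
data = [
    100, 74, 84, 95, 95, 110, 99, 87,
    100, 108, 85, 103, 99, 83, 91, 91,
    84, 110, 113, 105, 100, 98, 100, 108,
    100, 98, 100, 107, 79, 86, 123, 107,
    87, 105, 88, 85, 99, 101, 93, 99,
]
class_limits = [(71, 80), (81, 90), (91, 100),
                (101, 110), (111, 120), (121, 130)]

n = len(data)

# frequency of a class = number of values between its limits (inclusive)
frequency = [sum(lower <= x <= upper for x in data)
             for lower, upper in class_limits]
relative = [f / n for f in frequency]

# cumulative percentage up to and including each class
cumulative_percent = []
running = 0
for f in frequency:
    running += f
    cumulative_percent.append(100 * running / n)

for (lower, upper), f, rf, cp in zip(class_limits, frequency,
                                     relative, cumulative_percent):
    print(f"{lower:>3} - {upper:<4}{f:>4}{rf:>8.3f}{cp:>8.1f}")
```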

Class Boundary - is given by the midpoint of the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.
• A histogram is a certain kind of graph that can be drawn for a frequency distribution, a relative frequency distribution or a percentage distribution.
• To draw a histogram, mark the classes on the horizontal axis and the frequencies (or relative frequencies or percentages) on the vertical axis.
• A histogram is called a frequency histogram, a relative frequency histogram or a percentage histogram depending on the vertical axis.
Class         Frequency
15.2 - 15.5   2
15.5 - 15.8   5
15.8 - 16.1   11
16.1 - 16.4   6
16.4 - 16.7   3
16.7 - 17.0   3

[Figure: frequency histogram of the classes above; vertical axis: Frequency, 0 to 12]
• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
• A polygon has the class midpoints on the horizontal axis and the frequencies, relative frequencies or percentages on the vertical axis.
• A polygon is called a frequency polygon, a relative frequency polygon or a percentage polygon depending on the vertical axis.
[Figure: frequency polygon; horizontal axis: Production Level in Yards, 15.0 to 17.1; vertical axis: Frequency, 0 to 12]
• An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
• Each value is divided into two portions (a stem and a leaf). The leaves for each stem are shown separately in a display.
• An advantage of a stem and leaf display is that we do not lose information on individual observations.
• It is used only for quantitative data.
The following are scores of 30 college students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Construct a stem and leaf display.
1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called the leaf.
4. The stems are arranged in increasing order.

stem | leaves
5    | 2
6    |
7    | 5
8    |
9    |
The complete stem and leaf display for scores is shown below:
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
From the figure, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5.
The leaves for each stem are ranked in increasing order as below:
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
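A minimal Python sketch that builds the ranked stem and leaf display for the 30 test scores. It assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

scores = [
    75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
    69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
    83, 84, 77, 64, 71, 87, 72, 92, 57, 98,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), ranked."""
    display = defaultdict(list)
    for value in sorted(values):          # sorting ranks stems and leaves
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```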




Diastolic blood pressure of 120 people.
60 Type A people vs. 60 Type B people
Type A: Extremely hostile, competitive, impatient
53, 57, 58, 59, 59, 60, …
Type B: Laid back people
51, 52, 59, 59, 60, …
Type A
5 | 37899
6 | 00001111
6 | 2223333
6 | 444455555
6 | 666777778888
7 | 0000111
7 | 333444789
8 | 011

Type B
5 | 1299
6 | 0001122233333
6 | 4445555555777
6 | 888889
7 | 0000111
7 | 222333466899
8 | 0000
9 | 3

[Figure: the Type A display repeated with its modes marked]
• distinguish among the measures of central tendency, measures of dispersion and measures of skewness.
• calculate values for common measures of location, including the arithmetic mean, median and mode.
• calculate values for common measures of dispersion, including range, variance, standard deviation and quartile deviation.
• calculate values for measures of skewness.
Statistical Measures
• Measure of central tendency (measure of location): to show where the centre of the data is
• Measure of dispersion (measure of spread): to show how spread out the data are around the centre
• Measure of skewness (measure of asymmetry): to show whether the frequency distribution is symmetrical about the mean or skewed
MEASURE OF CENTRAL TENDENCY
1. Mean (average, $\bar{x}$)
- Add all observations
- Divide this sum by the number of observations
a) Set of values
$\bar{x} = \frac{\sum x}{n}$
b) Simple frequency distribution
$\bar{x} = \frac{\sum fx}{\sum f}$
c) Grouped frequency
$\bar{x} = \frac{\sum fx}{\sum f}$ (x = class midpoint)
MEASURE OF CENTRAL TENDENCY
◦ Advantages
  • it is widely understood.
  • the value of every item is included in the computation of the mean.
  • it is well suited to further statistical analysis.
◦ Disadvantages
  • its value may not correspond to any actual value.
  • it might be distorted by extremely high or low values.
MEASURE OF CENTRAL TENDENCY
Example
a. The arithmetic mean (mean) of the numbers 8, 3, 5, 12 and 10 is ...
b. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1, the mean is ...
c. Find the mean of the following frequency distribution

Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8
10 - 12   6
13 - 15   3
16 - 18   1
MEASURE OF CENTRAL TENDENCY
a. $\bar{x} = \frac{\sum x}{n} = \frac{8 + 3 + 5 + 12 + 10}{5} = 7.6$
b. $\bar{x} = \frac{\sum fx}{\sum f} = \frac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = 5.7$
c.
Class     f     x (midpoint)   fx
1 - 3     1     2              2
4 - 6     4     5              20
7 - 9     8     8              64
10 - 12   6     11             66
13 - 15   3     14             42
16 - 18   1     17             17
$\sum f = 23$, $\sum fx = 211$

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{211}{23} = 9.17$
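The three mean calculations can be checked with a small Python sketch. The helper names `mean` and `weighted_mean` are illustrative; for the grouped case the class midpoints stand in for the individual values, as in the worked solution.

```python
def mean(values):
    """Mean of a set of values: sum(x) / n."""
    return sum(values) / len(values)

def weighted_mean(xs, fs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

# a. set of values
mean_a = mean([8, 3, 5, 12, 10])

# b. simple frequency distribution
mean_b = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# c. grouped frequency distribution, x = class midpoint
limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
frequencies = [1, 4, 8, 6, 3, 1]
midpoints = [(lower + upper) / 2 for lower, upper in limits]
mean_c = weighted_mean(midpoints, frequencies)

print(mean_a, mean_b, round(mean_c, 2))
```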
MEASURE OF CENTRAL TENDENCY
2. Median (middle value of a distribution or array)
- Arrange the observations in order of increasing size
- Find the number of observations and the middle observation
- Identify the median as this middle value
a) Set of data
b) Simple frequency distribution
   n odd: the median is the $\frac{n + 1}{2}$th observation
   n even: the median is the mean of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th observations
   (n = sample size)
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method
MEASURE OF CENTRAL TENDENCY
(i) Graphical Method
[Figure: graphical method for the median; Median = 700]
MEASURE OF CENTRAL TENDENCY
(ii) Interpolation Method
$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) C_m$
Where:
$L_m$ = the lower boundary of the class containing the median.
$n$ = the total frequencies.
$F_{m-1}$ = the cumulative frequency in the classes immediately preceding the class containing the median.
$f_m$ = the frequency in the class containing the median.
$C_m$ = the width of the class in which the median lies.
MEASURE OF CENTRAL TENDENCY
◦ Advantages
  • it is unaffected by extremely high or low values.
  • can be used when certain end values of a set or distribution are difficult, expensive or impossible to obtain, particularly appropriate to ‘life’ data.
  • can be used with non-numeric data if desired, providing the measurements can be naturally ordered.
  • will often assume a value equal to one of the original data.
◦ Disadvantages
  • it is difficult to handle theoretically in more advanced statistical work, so its use is restricted to analysis at a basic level.
  • it fails to reflect the full range of values.
MEASURE OF CENTRAL TENDENCY
Example
a. The times taken to inspect five units coming from a production line are recorded as 13, 14, 11, 17 and 11 minutes. What is the median?
b. Find the median of the following frequency distribution

Class       Frequency
118 - 126   3
127 - 135   5
136 - 144   9
145 - 153   12
154 - 162   5
163 - 171   4
172 - 180   2
MEASURE OF CENTRAL TENDENCY
a. Ranked data: 11, 11, 13, 14, 17
   Position of the median: $\frac{n + 1}{2} = \frac{5 + 1}{2} = 3$
   median = 13
b.
Class       f     F
118 - 126   3     3
127 - 135   5     8
136 - 144   9     17
145 - 153   12    29
154 - 162   5     34
163 - 171   4     38
172 - 180   2     40

   Median class: $\frac{n}{2} = \frac{40}{2} = 20$, so the median lies in the class 145 - 153.
   $\text{median} = L_m + \left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) C_m = 144.5 + \left(\frac{\frac{40}{2} - 17}{12}\right)(153.5 - 144.5) = 146.75 \approx 147$
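The interpolation method for the median can be sketched in Python as follows. `grouped_median` is a hypothetical helper; it assumes equal gaps between adjacent class limits so that the boundaries can be derived from the limits.

```python
def grouped_median(limits, frequencies):
    """Median of grouped data: L_m + ((n/2 - F_{m-1}) / f_m) * C_m."""
    n = sum(frequencies)
    half_gap = (limits[1][0] - limits[0][1]) / 2
    cumulative = 0                      # F_{m-1}: cumulative frequency so far
    for (lower, upper), f in zip(limits, frequencies):
        if cumulative + f >= n / 2:     # first class whose F reaches n/2
            lower_boundary = lower - half_gap
            width = (upper + half_gap) - lower_boundary
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f

limits = [(118, 126), (127, 135), (136, 144), (145, 153),
          (154, 162), (163, 171), (172, 180)]
frequencies = [3, 5, 9, 12, 5, 4, 2]

median = grouped_median(limits, frequencies)
print(median)
```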
MEASURE OF CENTRAL TENDENCY
3. Mode (value which occurs most often)
- Draw a frequency table for the data
- Identify the mode as the most frequent value
a) Set of data
b) Simple frequency distribution
   Mode = value that appears most frequently
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method
MEASURE OF CENTRAL TENDENCY
(i) Graphical Method
[Figure: histogram of No. of cars (0 to 16) against Mileage (km), with classes 110 - 120 to 170 - 180; Mode = 146]
MEASURE OF CENTRAL TENDENCY
(ii) Interpolation Method
$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) C$
Where:
L = The lower class boundary of the class containing the mode.
C = The class width for the class containing the mode.
D1 = Difference between the largest frequency and the frequency immediately preceding it (f0 - f-).
D2 = Difference between the largest frequency and the frequency immediately following it (f0 - f+).
MEASURE OF CENTRAL TENDENCY
Mode
• the mode of a set of data is that value which occurs most often or, equivalently, has the largest frequency.
◦ Advantages
  • it is a more appropriate average to use in situations where it is useful to know the most common value.
  • easy to understand, not difficult to calculate and can be used when a distribution has open-ended classes.
  • it is not affected by extreme values.
◦ Disadvantages
  • it ignores dispersion around the modal value and it does not take all the values into account.
  • it is unsuitable for further statistical analysis.
  • although it ignores extreme values, it is thought to be too much affected by the most popular class when a distribution is significantly skewed.
MEASURE OF CENTRAL TENDENCY
Example
a. Find the mode of the following frequency distribution

Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8
10 - 12   6
13 - 15   3
16 - 18   1
MEASURE OF CENTRAL TENDENCY
Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8     (mode class)
10 - 12   6
13 - 15   3
16 - 18   1

$\text{mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) C = 6.5 + \left(\frac{8 - 4}{(8 - 4) + (8 - 6)}\right)(9.5 - 6.5) = 8.5$
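A Python sketch of the interpolation method for the mode. `grouped_mode` is a hypothetical helper; it assumes a single modal class, equal gaps between adjacent class limits, and treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(limits, frequencies):
    """Mode of grouped data: L + (D1 / (D1 + D2)) * C."""
    i = frequencies.index(max(frequencies))        # modal class
    half_gap = (limits[1][0] - limits[0][1]) / 2
    lower_boundary = limits[i][0] - half_gap       # L
    width = (limits[i][1] + half_gap) - lower_boundary   # C
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - before                   # D1
    d2 = frequencies[i] - after                    # D2
    return lower_boundary + d1 * width / (d1 + d2)

limits = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
frequencies = [1, 4, 8, 6, 3, 1]

mode = grouped_mode(limits, frequencies)
print(mode)
```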
MEASURE OF DISPERSION
1. Range
maximum value – minimum value
MEASURE OF DISPERSION
2. Standard deviation
- Calculate the mean value
- Find the deviation of each observation from this mean
- Square these deviations
- Add the squares
- Divide this sum by the number of observations
- Take the square root of the value obtained
a) Set of data
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
b) Simple frequency distribution
$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
c) Grouped frequency
$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$
where x = class mid-point
MEASURE OF DISPERSION
Comparing standard deviation
MEASURE OF DISPERSION
3. Variance
variance = (standard deviation)$^2$
a) Set of data
$v = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$
b) Simple frequency distribution
$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
c) Grouped frequency
$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$
where x = class mid-point
MEASURE OF DISPERSION
Example
a. Find the variance and standard deviation of the following data:

Class       Frequency
0 - 4.9     3
5 - 9.9     5
10 - 14.9   7
15 - 19.9   6
20 - 24.9   2
MEASURE OF DISPERSION
Class       f    x       x^2        fx       fx^2
0 - 4.9     3    2.45    6.0025     7.35     18.0075
5 - 9.9     5    7.45    55.5025    37.25    277.5125
10 - 14.9   7    12.45   155.0025   87.15    1085.0175
15 - 19.9   6    17.45   304.5025   104.7    1827.015
20 - 24.9   2    22.45   504.0025   44.9     1008.005
$\sum f = 23$, $\sum fx = 281.35$, $\sum fx^2 = 4215.5575$

$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{4215.5575}{23} - \left(\frac{281.35}{23}\right)^2 = 183.2851 - 149.6367 = 33.6484 \approx 33.65$
$s = \sqrt{s^2} \approx 5.8$
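The grouped variance and standard deviation can be reproduced with a short Python sketch. It follows the formula used on the slides (division by Σf) and takes x as the class midpoint; the variable names are illustrative.

```python
import math

limits = [(0, 4.9), (5, 9.9), (10, 14.9), (15, 19.9), (20, 24.9)]
frequencies = [3, 5, 7, 6, 2]

# x = class midpoint
midpoints = [(lower + upper) / 2 for lower, upper in limits]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

# s^2 = sum(f x^2) / sum(f) - (sum(f x) / sum(f))^2
variance = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2
std_dev = math.sqrt(variance)

print(sum_f, round(sum_fx, 2), round(sum_fx2, 4))
print(round(variance, 2), round(std_dev, 1))
```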
MEASURE OF DISPERSION
4. Chebyshev’s Theorem
- By using the mean and standard deviation, we can find the proportion or percentage of the total observations that fall within a given interval about the mean using Chebyshev’s theorem.
For any number k greater than 1, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values lie within k standard deviations of the mean.
[Figure: at least $(1 - 1/k^2)$ of the values lie in the shaded area between $\mu - k\sigma$ and $\mu + k\sigma$]
MEASURE OF DISPERSION
Example
The average systolic blood pressure for 4000 women who were
screened for high blood pressure was found to be 187 with a
standard deviation of 22. Using Chebyshev’s theorem, find at
least what percentage of women in this group have a systolic
blood pressure between 143 and 231.
MEASURE OF DISPERSION
Solution:
$\mu = 187$ and $\sigma = 22$
To find the percentage of blood pressures between 143 and 231:
143 - 187 = -44
231 - 187 = 44
k is obtained by dividing the distance between the mean and each end of the interval by the standard deviation:
$k = \frac{44}{22} = 2$
$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 1 - 0.25 = 0.75$
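A minimal Python sketch of the Chebyshev calculation. `chebyshev_lower_bound` is a hypothetical helper; it assumes the interval is symmetric about the mean and rejects k ≤ 1, where the theorem says nothing useful.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum proportion of values in [low, high] by Chebyshev's theorem."""
    if (mean - low) != (high - mean):
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / std_dev          # distance from the mean in std devs
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2

bound = chebyshev_lower_bound(mean=187, std_dev=22, low=143, high=231)
print(f"at least {bound:.0%}")
```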
MEASURE OF DISPERSION
At least 75% of the women have systolic blood pressure between 143 and 231.
[Figure: at least 75% of the values lie between $\mu - 2\sigma = 143$ and $\mu + 2\sigma = 231$, with $\mu = 187$]
MEASURE OF DISPERSION
5. Empirical Rule
- The empirical rule applies only to a specific type of distribution called a bell-shaped distribution, also known as a normal curve.
• 68% of the observations lie within one standard deviation of the mean
• 95% of the observations lie within two standard deviations of the mean
• 99.7% of the observations lie within three standard deviations of the mean
[Figure: bell-shaped curve with the 68%, 95% and 99.7% regions; horizontal axis marked at $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$]
MEASURE OF DISPERSION
Example 1
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
MEASURE OF DISPERSION
Solution:
$\bar{x} = 40$ and $s = 12$
To find the percentage of ages between 16 and 64:
16 - 40 = -24
64 - 40 = 24
Dividing the distance, 24, by the standard deviation, 12, shows that the distance is equal to 2s:
$\frac{24}{12} = 2$
so $16 = \bar{x} - 2s$ and $64 = \bar{x} + 2s$.
Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.
MEASURE OF DISPERSION
Example 2
Assume that the incomes of all single-parent households last year produce a bell-shaped distribution with a mean of RM23,500 and a standard deviation of RM4,500. Determine the range of incomes that contains
68% of the households = (RM19,000, RM28,000)
95% of the households = (RM14,500, RM32,500)
99.7% of the households = (RM10,000, RM37,000)
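The three income ranges follow directly from the empirical rule, as this Python sketch shows. `empirical_intervals` is a hypothetical helper and the percentages are the approximate ones quoted for a bell-shaped distribution.

```python
def empirical_intervals(mean, std_dev):
    """Intervals mean ± k*std_dev for k = 1, 2, 3 (bell-shaped data only)."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * std_dev, mean + k * std_dev)
            for k, label in coverage.items()}

intervals = empirical_intervals(mean=23_500, std_dev=4_500)
for label, (low, high) in intervals.items():
    print(f"{label:>6}: (RM{low:,}, RM{high:,})")
```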
MEASURE OF DISPERSION
6. Coefficient of variation
$CV = \frac{\text{standard deviation } (s)}{\bar{x}} \times 100\%$
• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.
• Investopedia on the coefficient of variation (CV): In the investing world, the coefficient of variation allows you to determine how much volatility (risk) you are assuming in comparison to the amount of return you can expect from your investment. In simple language, the lower the ratio of standard deviation to mean return, the better your risk-return tradeoff.
MEASURE OF DISPERSION
Comparing coefficient of variation
The higher the coefficient of variation, the more dispersed are the data.
MEASURE OF DISPERSION
Example 2
                     New Car     Used Car
Mean                 RM20,100    RM5,485
Standard deviation   RM6,125     RM2,730
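The slide gives only the means and standard deviations, so the following Python sketch works out the two coefficients of variation as an illustration. `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100%."""
    return std_dev / mean * 100

cv_new = coefficient_of_variation(std_dev=6_125, mean=20_100)
cv_used = coefficient_of_variation(std_dev=2_730, mean=5_485)

print(f"New car:  {cv_new:.2f}%")
print(f"Used car: {cv_used:.2f}%")
# The larger CV marks the relatively more dispersed prices.
print("More dispersed:", "used cars" if cv_used > cv_new else "new cars")
```

Although the standard deviation of new-car prices is larger in ringgit, the used-car prices vary more relative to their mean.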
MEASURE OF DISPERSION
7. Quartile Deviation
- Quartiles are defined as the values which quarter the data
- Q1: first quartile, the value below which 25% of the observations lie
- Q2: second quartile, the value below which half of the data lie (median)
- Q3: third quartile, the value below which 75% of the observations lie

$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$
$\text{Inter-quartile range} = Q_3 - Q_1$

a) Set of data
b) Simple frequency distribution
   $Q_1$ is the $\frac{n + 1}{4}$th observation
   $Q_3$ is the $\frac{3(n + 1)}{4}$th observation
c) Grouped frequency
   (i) Graphical method
   (ii) Interpolation method
MEASURE OF DISPERSION
(i) Graphical Method
[Figure: cumulative frequency F plotted against x; Q1 and Q3 are read off the x-axis at F = n/4 and F = 3n/4]
MEASURE OF DISPERSION
(ii) Interpolation Method
$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) C_{Q_1}$
Where:
$L_{Q_1}$ = the lower boundary of the class containing Q1.
$n$ = the total frequencies.
$F_{Q_1 - 1}$ = the cumulative frequency in the classes immediately preceding the class containing Q1.
$f_{Q_1}$ = the frequency in the class containing Q1.
$C_{Q_1}$ = the width of the class in which Q1 lies.
MEASURE OF DISPERSION
$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) C_{Q_3}$
Where:
$L_{Q_3}$ = the lower boundary of the class containing Q3.
$n$ = the total frequencies.
$F_{Q_3 - 1}$ = the cumulative frequency in the classes immediately preceding the class containing Q3.
$f_{Q_3}$ = the frequency in the class containing Q3.
$C_{Q_3}$ = the width of the class in which Q3 lies.
MEASURE OF DISPERSION
Example
a. Find the quartile deviation of the following data:

Class       Frequency
0 - 9.9     5
10 - 19.9   19
20 - 29.9   38
30 - 39.9   43
40 - 49.9   34
50 - 59.9   17
60 - 69.9   4
MEASURE OF DISPERSION
Class       f     F
0 - 9.9     5     5
10 - 19.9   19    24
20 - 29.9   38    62
30 - 39.9   43    105
40 - 49.9   34    139
50 - 59.9   17    156
60 - 69.9   4     160

Position of Q1: $\frac{n}{4} = \frac{160}{4} = 40$, so Q1 lies in the class 20 - 29.9.
Position of Q3: $\frac{3n}{4} = \frac{3(160)}{4} = 120$, so Q3 lies in the class 40 - 49.9.

$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) C_{Q_1} = 19.95 + \left(\frac{\frac{160}{4} - 24}{38}\right)(10) = 24.16$
$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) C_{Q_3} = 39.95 + \left(\frac{\frac{3(160)}{4} - 105}{34}\right)(10) = 44.36$

Therefore the quartile deviation is
$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{44.36 - 24.16}{2} = 10.1$
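The interpolation for Q1 and Q3 is the same calculation with a different target position, which the following Python sketch makes explicit. `grouped_quantile` is a hypothetical helper; it assumes equal gaps between adjacent class limits.

```python
def grouped_quantile(limits, frequencies, fraction):
    """Interpolated quantile of grouped data; fraction = 0.25 gives Q1."""
    n = sum(frequencies)
    target = fraction * n                       # n/4 for Q1, 3n/4 for Q3
    half_gap = (limits[1][0] - limits[0][1]) / 2
    cumulative = 0                              # F of the preceding classes
    for (lower, upper), f in zip(limits, frequencies):
        if cumulative + f >= target:
            lower_boundary = lower - half_gap
            width = (upper + half_gap) - lower_boundary
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f

limits = [(0, 9.9), (10, 19.9), (20, 29.9), (30, 39.9),
          (40, 49.9), (50, 59.9), (60, 69.9)]
frequencies = [5, 19, 38, 43, 34, 17, 4]

q1 = grouped_quantile(limits, frequencies, 0.25)
q3 = grouped_quantile(limits, frequencies, 0.75)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(quartile_deviation, 1))
```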
MEASURE OF SKEWNESS
• Skewness is the degree of asymmetry
• Method to describe data distribution
• Data which are not symmetrical may be either positively or negatively skewed.
[Figure: two curves, one labelled negative skewness and one labelled positive skewness]
MEASURE OF SKEWNESS
[Figure: three histograms marking the positions of the mean, median and mode]
Symmetric Histogram: Mean = Median = Mode
Positive Skewed Histogram: Mode < Median < Mean
Negative Skewed Histogram: Mean < Median < Mode
MEASURE OF SKEWNESS
Example
a. What type of distribution is described by the following information?
   Mean = 56, Median = 58.1, Mode = 63
   Answer: Negatively skewed
b.
   1 | 1 1 2 2 3 3 4 5 6 7
   2 | 3 4 4 5 6 6
   3 | 1 1 2 2 2 3
   4 | 0 0 1
   Based on the stem-and-leaf plot above, find the
   i) median,
   ii) mode,
   iii) mean and
   iv) describe the shape of the distribution.
   Answer: i) 24  ii) 32  iii) 23.76  iv) Negatively skewed distribution
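The answers to part b can be checked with Python's `statistics` module. The dictionary below is a transcription of the stem-and-leaf plot (stem = tens digit, leaves = units digits); the variable names are illustrative.

```python
import statistics

# Stem-and-leaf plot from part b: stem -> leaves
display = {
    1: [1, 1, 2, 2, 3, 3, 4, 5, 6, 7],
    2: [3, 4, 4, 5, 6, 6],
    3: [1, 1, 2, 2, 2, 3],
    4: [0, 0, 1],
}

# Rebuild the individual values, e.g. stem 2 and leaf 4 give 24
values = [10 * stem + leaf
          for stem, leaves in display.items()
          for leaf in leaves]

median = statistics.median(values)
mode = statistics.mode(values)
mean = statistics.mean(values)

print(len(values), median, mode, round(mean, 2))
# mean < median < mode is the ordering the slides associate with negative skew
print(mean < median < mode)
```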
MEASURE OF SKEWNESS
c.
Class       Frequency
0 - 100     5
100 - 200   19
200 - 300   38
300 - 400   43
400 - 500   34

Based on the distribution table
i) construct a histogram, and
ii) describe the shape of the distribution.
MEASURE OF SKEWNESS
[Figure: Curve A, Curve B and Curve C]
Curve A: Positively Skewed
Curve B: Negatively Skewed
BOX-AND-WHISKER PLOT
A plot that shows the center, spread and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
[Figure: box-and-whisker plot marked, from left to right, Minimum, Q1, Q2, Q3, Maximum]
BOX-AND-WHISKER PLOT
Example
The following data are the incomes (in thousands of dollars) for a sample of 12 households.
35 29 44 72 34 64 41 50 104 39 58 54
Construct a box-and-whisker plot for these data.
BOX-AND-WHISKER PLOT
Solution:
Step 1: Rank the data and find the median, Q1, Q3 and IQR
29 34 35 39 41 44 50 54 58 64 72 104
$\text{median} = \frac{44 + 50}{2} = 47$
$Q_1 = \frac{35 + 39}{2} = 37$
$Q_3 = \frac{58 + 64}{2} = 61$
$IQR = Q_3 - Q_1 = 61 - 37 = 24$
Step 2: Determine the lower and upper inner fences
$1.5 \times IQR = 1.5 \times 24 = 36$
Lower inner fence $= Q_1 - 36 = 37 - 36 = 1$
Upper inner fence $= Q_3 + 36 = 61 + 36 = 97$
Step 3: Determine the smallest and the largest values in the data set within the two inner fences
Smallest value = 29
Largest value = 72
Step 4: Draw the box from the first quartile to the third quartile and mark the median inside it
[Figure: box from Q1 = 37 to Q3 = 61 with the median at 47; horizontal axis from 25 to 105]
Step 5: Draw the whiskers from the box to the smallest value (29) and the largest value (72) within the two inner fences, and mark any outlier with *
[Figure: completed box-and-whisker plot with whiskers at 29 and 72 and an outlier (*) at 104; horizontal axis from 25 to 105]
Outlier: a value that falls outside the two inner fences (a value that is very small or very large relative to the rest of the data).
The data are skewed to the right.
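The five steps can be traced in Python. This sketch takes the quartiles as the medians of the lower and upper halves of the ranked data, which is the method the worked solution uses; other quartile conventions (including the defaults of some libraries) can give slightly different values.

```python
import statistics

incomes = [35, 29, 44, 72, 34, 64, 41, 50, 104, 39, 58, 54]

# Step 1: rank the data, then find the median and the quartiles
ranked = sorted(incomes)
half = len(ranked) // 2
median = statistics.median(ranked)
q1 = statistics.median(ranked[:half])     # median of the lower half
q3 = statistics.median(ranked[-half:])    # median of the upper half
iqr = q3 - q1

# Step 2: inner fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Step 3: whisker ends are the extreme values inside the fences
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]

# Values outside the fences are outliers
outliers = [x for x in ranked if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr)
print(lower_fence, upper_fence, whisker_low, whisker_high, outliers)
```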
BOX-AND-WHISKER PLOT
S < 0: Negatively Skewed
S = 0: Symmetric (Not Skewed)
S > 0: Positively Skewed
[Figure: box plots for Left-Skewed, Symmetric and Right-Skewed data, each marked with Q1, Q2 and Q3]
BOX-AND-WHISKER PLOT
• Median close to the center of the box -- symmetrical
• Median to the left of the center of the box -- positive skewed
• Median to the right of the center of the box -- negative skewed
• Whiskers are the same length -- symmetrical
• Right whisker is longer than the left whisker -- positive skewed
• Left whisker is longer than the right whisker -- negative skewed
BOX-AND-WHISKER PLOT
• A bimodal distribution has two modes.
• All classes occur with approximately the same frequency in a uniform distribution.
• An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.
POPULATION VERSUS SAMPLE
Measurement          Sample       Population
Mean                 $\bar{x}$    $\mu$
Standard deviation   $s$          $\sigma$
Variance             $s^2$        $\sigma^2$
POPULATION VERSUS SAMPLE
1. The following are the ages of all eight employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees. (POPULATION)
2. The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership:
10 5 19 8 3
Find the median. (SAMPLE)
POPULATION VERSUS SAMPLE
Example
3. The data in the table below refer to the 2002 payrolls (in millions of dollars) of five MLB teams.

MLB Team               2002 Total Payroll (millions of dollars)
Anaheim Angels         62
Atlanta Braves         93
New York Yankees       126
St Louis Cardinals     75
Tampa Bay Devil Rays   34

Find the variance and standard deviation of these data. (SAMPLE)
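Exercises 1 to 3 can be worked with Python's `statistics` module. One point is an assumption rather than something the slides state: for data labelled SAMPLE, `statistics.variance` and `statistics.stdev` divide by n - 1, whereas the variance formula given earlier in this chapter divides by n, which corresponds to `statistics.pvariance`. Both are computed for the payroll data so the difference is visible.

```python
import statistics

# 1. Population: ages of all eight employees
ages = [53, 32, 61, 27, 39, 44, 49, 57]
population_mean = statistics.mean(ages)

# 2. Sample: weight lost (pounds) by five health club members
weight_lost = [10, 5, 19, 8, 3]
sample_median = statistics.median(weight_lost)

# 3. Sample: 2002 payrolls (millions of dollars) of five MLB teams
payrolls = [62, 93, 126, 75, 34]
sample_variance = statistics.variance(payrolls)   # divides by n - 1
sample_std_dev = statistics.stdev(payrolls)
divide_by_n_variance = statistics.pvariance(payrolls)   # divides by n

print(population_mean, sample_median)
print(sample_variance, round(sample_std_dev, 2), divide_by_n_variance)
```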
POPULATION VERSUS SAMPLE
Example
4. Following are the 2002 earnings (in thousands of dollars) before taxes for all six employees of a small company.
48.50 38.40 65.50 22.6
Calculate the variance and standard deviation for these data. (POPULATION)