Download Chapter 3

Document related concepts
no text concepts found
Transcript
Business Statistics, 5th ed.
by Ken Black
Chapter 3
Discrete Distributions
Descriptive
Statistics
PowerPoint presentations prepared by Lloyd Jaisingh,
Morehead State University
Learning Objectives
• Distinguish between measures of central
tendency, measures of variability, measures
of shape, and measures of association.
• Understand the meanings of mean, median,
mode, quartile, percentile, and range.
• Compute mean, median, mode, percentile,
quartile, range, variance, standard deviation,
and mean absolute deviation on ungrouped
data.
• Differentiate between sample and
population variance and standard deviation.
Learning Objectives -- Continued
• Understand the meaning of standard
deviation as it is applied by using the
empirical rule and Chebyshev’s theorem.
• Compute the mean, mode, standard
deviation, and variance on grouped data.
• Understand skewness, kurtosis, and box and
whisker plots.
• Compute a coefficient of correlation and
interpret it.
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about the center, or middle part,
of a group of numbers.
• Common Measures of central tendency
– Mode
– Median
– Mean
– Percentiles
– Quartiles
Mode
• The most frequently occurring value in a
data set
• Applicable to all levels of data
measurement (nominal, ordinal, interval,
and ratio)
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more
than two modes
Mode -- Example
• The mode is 44.
• 44 is the most frequently
occurring data value.
35
41
44
45
37
41
44
46
37
43
44
46
39
43
44
46
40
43
44
46
40
43
45
48
Median
• Middle value in an ordered array of
numbers
• Applicable for ordinal, interval, and ratio
data
• Not applicable for nominal data
• Unaffected by extremely large and
extremely small values
Median: Computational Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median
is the middle term of the ordered array.
– If there is an even number of terms, the median
is the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is
given by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
•
•
•
•
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, which is 15.
If the 22 is replaced by 100, the median is
15.
• If the 3 is replaced by -103, the median is
15.
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms,
14.5.
• If the 21 is replaced by 100, the median is
14.5.
• If the 3 is replaced by -88, the median is 14.5.
Arithmetic Mean
•
•
•
•
•
Commonly called ‘the mean’
Is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set,
including extreme values
• Computed by summing all values in the
data set and dividing the sum by the number
of values in the data set
Population Mean
X X  X  X ...  X



1
2
N
N
24  13  19  26  11

5
93

5
 18. 6
3
N
Sample Mean
X X  X  X ...  X

X

1
2
3
n
n
57  86  42  38  90  66

6
379

6
 63.167
n
Percentiles
• Measures of central tendency that divide a
group of data into 100 parts
• At least n% of the data lie below the nth
percentile, and at most (100 - n)% of the data
lie above the nth percentile
• Example: 90th percentile indicates that at least
90% of the data lie below it, and at most 10%
of the data lie above it
• The median and the 50th percentile have the
same value.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
Percentiles: Computational Procedure
• Organize the data into an ascending ordered
array.
Where
• Calculate the
P = percentile
P
percentile location:
i= percentile
i
100
(n)
location
n= sample size
• Determine the percentile’s location and its
value.
• If i is a whole number, the percentile is the
average of the values at the i and (i + 1)
positions.
• If i is not a whole number, the percentile is at
the whole number part of (i + 1) in the ordered
array.
Percentiles: Example
• Raw Data: 14, 12, 19, 23, 5, 13, 28, 17
• Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28
• Location of
30
30th percentile:
i
(8)  2.4
100
• The location index, i, is not a whole number; i + 1
= 2.4 + 1 = 3.4; the whole number portion is 3; the
30th percentile is at the 3rd location of the array;
the 30th percentile is 13.
Quartiles
• Measures of central tendency that divide a group
of data into four subgroups
• Q1: 25% of the data set is below the first quartile
• Q2: 50% of the data set is below the second
quartile
• Q3: 75% of the data set is below the third quartile
• Q1 is equal to the 25th percentile
• Q2 is located at 50th percentile and equals the
median
• Q3 is equal to the 75th percentile
• Quartile values are not necessarily members of the
data set
Quartiles: Example
• Ordered array: 106, 109, 114, 116, 121, 122,
125, 129
109114
25
• Q1
i
(8)  2
100
Q1 
2
 1115
.
• Q 2:
50
i
(8)  4
100
116121
Q2 
 1185
.
2
• Q 3:
75
i
(8)  6
100
122125
Q3 
 1235
.
2
Variability
No Variability in Cash Flow (same amounts)
Mean
Mean
Variability in Cash Flow (different amounts)
Mean
Mean
Variability
Variability
No Variability
Measures of Variability:
Ungrouped Data
• Measures of variability describe the spread
or the dispersion of a set of data.
• Common Measures of Variability
– Range
– Interquartile Range
– Mean Absolute Deviation
– Variance
– Standard Deviation
– Z scores
– Coefficient of Variation
Range
• The difference between the largest and the
smallest values in a set of data
• Simple to compute
35
41
44
• Ignores all data points except 37 41 the44
two extremes
37
43
44
• Example:
Range
39
43 = 44
Largest - Smallest
=
40
43
44
48 - 35 = 13
40
43
45
45
46
46
46
46
48
Interquartile Range
• Range of values between the first and third
quartiles
• Range of the middle 50% of the ordered data
set
• Less influenced by extremes
Interquartile Range  Q 3  Q1
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean:  = 13
• Deviations (x - ) from the mean: -8, -4, 3, 4, 5
-4
-8
0
5
10

+3
15
+4
+5
20
Mean Absolute Deviation
• Average of the absolute deviations from the
mean
X
5
9
16
17
18
X  
X  
-8
-4
+3
+4
+5
0
+8
+4
+3
+4
+5
24
M . A. D. 

24
5
 4.8

X 
N
Population Variance
• Average of the squared deviations from the
arithmetic mean
X
5
9
16
17
18
X   X
-8
-4
+3
+4
+5
0


64
16
9
16
25
130
2
 X   
2

2

130

5
 26 .0
N
Population Standard Deviation
• Square root of the
variance
 X   
2

2

N
130

5
 2 6 .0
 


2
2 6 .0
 5 .1
Sample Variance
• Average of the squared deviations from the
arithmetic mean
X
2,398
1,844
1,539
1,311
7,092
X  X X
625
71
-234
-462
0
 X

390,625
5,041
54,756
213,444
663,866
2
S
2

 X  X
n1
6 6 3 ,8 6 6

3
 2 2 1 , 2 8 8 .6 7

2
Sample Standard Deviation
• Square root of the
sample variance
 X  X 
2
S
2

n1
6 6 3 ,8 6 6

3
 2 2 1, 2 8 8 .6 7
S 

S
2
2 2 1, 2 8 8 .6 7
 4 7 0 .4 1
Uses of Standard Deviation
• Indicator of financial risk
• Quality Control
– construction of quality control charts
– process capability studies
• Comparing populations
– household incomes in two cities
– employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
Annualized Rate of Return
Financial
Security


A
15%
3%
B
15%
7%
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion

C V  100

Coefficient of Variation
1  29

 2  84


4.6
1


CV 
1
1
100
1
4.6
100

29
 15.86

10
2


CV 
2
2
100
2
10
100

84
 11.90
Empirical Rule
• Data are normally distributed (or approximately
normal)
Distance from
the Mean
  1
  2
  3
Percentage of Values
Falling Within Distance
68
95
99.7
Chebyshev’s Theorem
• Applies to all distributions
1
P(  k  X    k )  1  2
k
for k > 1
Chebyshev’s Theorem
• Applies to all distributions
Number
of
Standard
Deviations
K=2
K=3
K=4
Distance from
the Mean
  2
  3
  4
Minimum Proportion
of Values Falling
Within Distance
1-1/22 = 0.75
1-1/32 = 0.89
1-1/42 = 0.94
Measures of Central Tendency
and Variability: Grouped Data
• Measures of Central Tendency
– Mean
– Median
– Mode
• Measures of Variability
– Variance
– Standard Deviation
Mean of Grouped Data
• Weighted average of class midpoints
• Class frequencies are the weights
 


 fM
 f
 fM
N
f 1M
1
 f 2M
f1 f
2
2
 f 3 M 3      f iM
 f 3      fi
i
Calculation of Grouped Mean
Class Interval Frequency Class Midpoint
20-under 30
6
25
30-under 40
18
35
40-under 50
11
45
50-under 60
11
55
60-under 70
3
65
70-under 80
1
75
50
fM 2150



50
f
 43. 0
fM
150
630
495
605
195
75
2150
Median of Grouped Data
N
 cfp
W 
Median  L  2
fmed
Where:
L  the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data -- Example
Cumulative
Class Interval Frequency Frequency
20-under 30
6
6
30-under 40
18
24
40-under 50
11
35
50-under 60
11
46
60-under 70
3
49
70-under 80
1
50
N = 50
N
 cfp
W 
Md  L  2
fmed
50
 24
10
 40  2
11
 40.909
Mode of Grouped Data
• Midpoint of the modal class
• Modal class has the greatest frequency
30  40
Class Interval Frequency
Mode 
 35
20-under 30
6
2
30-under 40
18
40-under 50
50-under 60
60-under 70
70-under 80
11
11
3
1
Variance and Standard Deviation
of Grouped Data
Population
Sample
 f  M   S
 
N
2
2
 

2
2

S 

M  X 
2
f
n1
S
2
Population Variance and Standard
Deviation of Grouped Data
Class Interval
f
M
fM
20-under 30
30-under 40
40-under 50
50-under 60
60-under 70
70-under 80
6
18
11
11
3
1
50
25
35
45
55
65
75
150
630
495
605
195
75
2150

2


M  
M 
M
-18
-8
2
12
22
32
2
f
N
7200

 144
50

2
f

1944
1152
44
1584
1452
1024
7200
324
64
4
144
484
1024

M
2

2
 144  12
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
–
–
–
–
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Symmetrical and Skewness
0.30
0.4
0.25
0.3
0.20
0.15
0.2
0.10
0.1
0.05
0.00
0
0.0
-4
-3
-2
Symmetrical
-1
0
1
2
0
0
3
2
4
6
8
10
12
Right or Positively
Skewed
12
10
8
6
4
2
0
0
0.70
0.75
0.80
0.85
0.90
0.95
1.00
Left or Negatively
Skewed
Relationship of Mean, Median and Mode
Relationship of Mean, Median and Mode
Relationship of Mean, Median and Mode
Coefficient of Skewness
• Summary measure for skewness
Sk 
3   M d 

• If Sk < 0, the distribution is negatively skewed
(skewed to the left).
• If Sk = 0, the distribution is symmetric (not
skewed).
• If Sk > 0, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness

1
M
d1

S
1
1
 23

 26
M
d2
 12.3


2
3 1 

M

d1
1
3 23  26

12.3
 0.73

S
2
2
 26

 26
M
d3
 12.3


3
3 2 

M
d2
2
3 26  26

12.3
0


S
3
3
 29
 26
 12.3


3 3 

M

d3
3
3 29  26

12.3
 0.73
Types of Kurtosis
Platykurtic Distribution
Leptokurtic Distribution
Mesokurtic Distribution
Box and Whisker Plot
• Five specific values are used:
–
–
–
–
–
Median, Q2
First quartile, Q1
Third quartile, Q3
Minimum value in the data set
Maximum value in the data set
• Inner Fences
– IQR = Q3 - Q1
– Lower inner fence = Q1 - 1.5 IQR
– Upper inner fence = Q3 + 1.5 IQR
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
Box and Whisker Plot
Minimum
Q1
Q2
Q3
Maximum
Measures of Association
• Measures of association are statistics that
yield information about the relatedness of
numerical variables.
• Correlation is a measure of the degree of
relatedness of variables.
Pearson Product-Moment
Correlation Coefficient
r


SSXY
 SSX  SSY 
  X  X Y  Y 
  X  X   Y Y 
X  Y 


 XY 
n
2




2
 X


2
X
2
n


Y
  Y 2 
n


 
2


1 r  1
Three Degrees of Correlation
r<0
r>0
r=0
Computation of r for
the Economics Example (Part 1)
Day
1
2
3
4
5
6
7
8
9
10
11
12
Summations
Interest
X
7.43
7.48
8.00
7.75
7.60
7.63
7.68
7.67
7.59
8.07
8.03
8.00
92.93
Futures
Index
Y
221
222
226
225
224
223
223
226
226
235
233
241
2,725
X2
55.205
55.950
64.000
60.063
57.760
58.217
58.982
58.829
57.608
65.125
64.481
64.000
720.220
Y2
48,841
49,284
51,076
50,625
50,176
49,729
49,729
51,076
51,076
55,225
54,289
58,081
619,207
XY
1,642.03
1,660.56
1,808.00
1,743.75
1,702.40
1,701.49
1,712.64
1,733.42
1,715.34
1,896.45
1,870.99
1,928.00
21,115.07
Computation of r
for the Economics Example (Part 2)
r


X   Y 


XY 

n



2


Y 
 X
2
2
X  n   Y  n 


 92.93 2725
21,115.07  
12
2


92
.
93
 720.22  
  619 ,207   2725
12
12








.815
2



 
2


Scatter Plot and Correlation Matrix
for the Economics Example
245
Futures Index
240
235
230
225
220
7.40
7.60
7.80
8.00
Interest
Interest
Interest
Futures Index
Futures Index
1
0.815254
1
8.20
Copyright 2008 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation
of this work beyond that permitted in section 117
of the 1976 United States Copyright Act without
express permission of the copyright owner is
unlawful. Request for further information should
be addressed to the Permissions Department, John
Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not
for distribution or resale. The Publisher assumes
no responsibility for errors, omissions, or damages
caused by the use of these programs or from the
use of the information herein.
Related documents