Descriptive Statistics – Central Tendency & Variability
Chapter 3 (Part 2)
MSIS 111
Prof. Nick Dedeke
Learning Objectives
- Distinguish between measures of central tendency, measures of variability, measures of shape, and measures of association.
- Compute variance, standard deviation, and mean absolute deviation on ungrouped data.
- Understand the meaning of standard deviation as it is applied by using the empirical rule and Chebyshev’s theorem.
- Introduction to skewness and box and whisker plots.
Measures of Central Tendency: Ungrouped Data
Measures of central tendency yield information about the center, or middle part, of a group of numbers.
Common measures of central tendency
- Mode
- Median
- Mean
- Percentiles
- Quartiles
Exercise: Computing Central Tendency Measures using Frequency Tables
We want to choose one of the two suppliers. We have data about their lateness in delivery (data is in hours). Which one has better statistical measures of central tendency?

Supplier 1                      Supplier 2
Xi     Fi     Fi * Xi           Xi     Fi     Fi * Xi
1      2      2                 0      2      0
4      4      16                1      0      0
6      3      18                4      3      12
10     3      30                6      5      30
12     2      24                10     4      40
Σ      n=14   90                Σ      n=14   82
Response: Computing Central Tendency Measures using Frequency Tables
Which one has better statistical measures of central tendency?

Supplier 1                      Supplier 2
Xi     Fi     Fi * Xi           Xi     Fi     Fi * Xi
1      2      2                 0      2      0
4      4      16                1      0      0
6      3      18                4      3      12
10     3      30                6      5      30
12     2      24                10     4      40
Σ      n=14   90                Σ      n=14   82

Supplier 1
Mode = 4 hours
Median position = (14 + 1)/2 = 15/2 = 7.5
Median value = 6 hours (the 7th and 8th ordered values are both 6)
Mean = 90/14 = 6.43 hours

Supplier 2
Mode = 6 hours
Median position = (14 + 1)/2 = 15/2 = 7.5
Median value = 6 hours (the 7th and 8th ordered values are both 6)
Mean = 82/14 = 5.86 hours
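The hand calculation above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper `expand` and the variable names are illustrative assumptions, not part of the course material.

```python
from statistics import mean, median, multimode

def expand(freq_table):
    """Turn {value: frequency} into the full, ordered list of observations."""
    return [x for x, f in sorted(freq_table.items()) for _ in range(f)]

# Lateness in hours -> number of deliveries (the two frequency tables)
supplier_1 = {1: 2, 4: 4, 6: 3, 10: 3, 12: 2}
supplier_2 = {0: 2, 1: 0, 4: 3, 6: 5, 10: 4}

for name, table in (("Supplier 1", supplier_1), ("Supplier 2", supplier_2)):
    data = expand(table)
    print(name,
          "n =", len(data),
          "mode =", multimode(data),        # list of the most frequent value(s)
          "median =", median(data),         # averages the two middle values when n is even
          "mean =", round(mean(data), 2))
```

Expanding the table into raw observations keeps the code close to the definitions; for large frequencies, working directly with the Fi * Xi products would be cheaper.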
Measures of Dispersion: Variability
[Figure: two cash-flow charts, each marked with its mean: no variability in cash flow (same amounts) versus variability in cash flow (different amounts)]
Measures of Variability: Ungrouped Data
Measures of variability describe the spread or the dispersion of a set of data.
Common measures of variability
- Range
- Interquartile Range
- Mean Absolute Deviation
- Variance
- Standard Deviation
- Z scores
- Coefficient of Variation
Range
The difference between the largest and the smallest values in a set of data
- Simple to compute
- Ignores all data points except the two extremes
- Weakness: depends only on two extreme values

Data:
35   41   44   45
37   41   44   46
37   43   44   46
39   43   44   46
40   43   44   46
40   43   45   48

Example:
Range = Largest - Smallest = 48 - 35 = 13
Interquartile Range
- Range of values between the first and third quartiles
- Range of the middle 50% of the ordered data set
- Less influenced by extremes
Interquartile Range = Q3 - Q1
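A minimal sketch of both measures on the 24 values from the Range slide. It assumes Python's `statistics.quantiles` with its default interpolation method; textbooks locate quartiles in slightly different ways, so a hand-computed Q1 or Q3 may differ a little from what this prints.

```python
from statistics import quantiles

data = [35, 37, 37, 39, 40, 40, 41, 41, 43, 43, 43, 43,
        44, 44, 44, 44, 44, 45, 45, 46, 46, 46, 46, 48]

# Range uses only the two extreme values.
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print("Range =", data_range)
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```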
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: μ = 13
Deviations (Xi - μ) from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 showing each deviation (-8, -4, +3, +4, +5) from μ = 13]
Mean Absolute Deviation
Average of the absolute deviations from the mean (μ = 13)

X      X - μ    |X - μ|
5      -8       +8
9      -4       +4
16     +3       +3
17     +4       +4
18     +5       +5
Σ      0        24

$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
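The table above can be reproduced in a few lines. This is an illustrative sketch in plain Python; the variable names are assumptions made for the example.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                    # mean = 13
deviations = [x - mu for x in data]           # -8, -4, +3, +4, +5
abs_deviations = [abs(d) for d in deviations]

# Signed deviations always cancel out, which is why absolute values are used.
print("Sum of deviations:", sum(deviations))
mad = sum(abs_deviations) / len(data)
print("M.A.D. =", mad)
```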
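Variance and Standard Deviation of Grouped Data
(f = class frequency, M = class midpoint)

Population:
$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample:
$$s^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad s = \sqrt{s^2}$$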
Population Variance
Average of the squared deviations from the arithmetic mean (μ = 13)

X      X - μ    (X - μ)²
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Σ      0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
Population Standard Deviation
Square root of the variance
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$
Mathematically, 26.0 has two square roots, +5.1 and -5.1; the standard deviation is the positive one.
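As a quick check of the population formulas, the sketch below computes the variance from the definition and compares it with `pvariance` and `pstdev` from Python's standard `statistics` module (both divide by N, not n - 1).

```python
import math
from statistics import pstdev, pvariance

data = [5, 9, 16, 17, 18]
N = len(data)
mu = sum(data) / N

# Population variance: average of the squared deviations from the mean.
sigma_sq = sum((x - mu) ** 2 for x in data) / N
sigma = math.sqrt(sigma_sq)   # math.sqrt returns the non-negative root

print("Variance =", sigma_sq)
print("Standard deviation =", round(sigma, 1))
print("Library check:", pvariance(data), round(pstdev(data), 1))
```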
Computing Dispersion Measures for a Sample
Mean = Σ(Fi * Xi) / ΣFi = 1655/15 = 110.33

Xi     Fi     Fi * Xi
55     2      110
60     1      60
100    3      300
125    5      625
140    4      560
Σ      15     1655
Computing Dispersion Measures for Ungrouped Samples (Formula 1)
Mean (μ) = Σ(Fi * Xi) / ΣFi = 1655/15 = 110.33
Variance (s²) = Σ(Fi * (Xi - μ)²) / (n - 1) = 13573.335/(15 - 1) = 969.52
Standard deviation (s) = 31.137 inches

Xi     Fi     Fi * Xi   (Xi - μ)    (Xi - μ)²    Fi * (Xi - μ)²
55     2      110       -55.33      3061.409     6122.818
60     1      60        -50.33      2533.109     2533.109
100    3      300       -10.33      106.709      320.127
125    5      625       14.67       215.209      1076.045
140    4      560       29.67       880.309      3521.236
Σ      15     1655                               13573.335
Computing Dispersion Measures for Ungrouped Samples (Formula 2)
Var (s²) = (Σ(Fi * Xi²) - (Σ(Fi * Xi))²/n) / (n - 1)
         = (196175 - 1655²/15) / (15 - 1)
         = (196175 - 182601.67) / 14
         = 969.52
Standard deviation (s) = 31.137 inches

Xi     Fi     Fi * Xi   (Xi)²     Fi * (Xi)²
55     2      110       3025      6050
60     1      60        3600      3600
100    3      300       10000     30000
125    5      625       15625     78125
140    4      560       19600     78400
Σ      15     1655                196175
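Formula 1 (squared deviations) and Formula 2 (the computational shortcut) are algebraically the same quantity. The sketch below, with illustrative variable names, evaluates both on this frequency table to show that they agree.

```python
import math

# Xi -> Fi, from the frequency table
table = {55: 2, 60: 1, 100: 3, 125: 5, 140: 4}

n = sum(table.values())                             # 15
sum_fx = sum(f * x for x, f in table.items())       # sum of Fi * Xi
sum_fx2 = sum(f * x * x for x, f in table.items())  # sum of Fi * Xi^2
mean = sum_fx / n

# Formula 1: weighted squared deviations from the mean, divided by n - 1.
var_1 = sum(f * (x - mean) ** 2 for x, f in table.items()) / (n - 1)

# Formula 2: computational form, no deviations needed.
var_2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

sd = math.sqrt(var_2)
print(f"mean = {mean:.2f}, s^2 (F1) = {var_1:.2f}, s^2 (F2) = {var_2:.2f}, s = {sd:.3f}")
```

Formula 2 avoids rounding the mean before squaring, which is why hand calculations with it tend to accumulate less rounding error.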
Exercise: Dispersion Measures
Var (s²) = (Σ(Fi * Xi²) - (Σ(Fi * Xi))²/n) / (n - 1)
Standard deviation (s) =

Xi     Fi     Fi * Xi   (Xi)²     Fi * (Xi)²
5      2
6      1
10     3
12     2
14     1
Σ
Exercise: Variability Measures with Frequency Tables
Which worker is more efficient?

Worker 1: Time in hours to do work      Worker 2: Time in hours to do work
Xi     Fi                               Xi     Fi
5      2                                5      0
6      1                                6      3
10     3                                10     4
12     2                                12     1
14     1                                14     1
Σ      n=9                              Σ      n=9

Worker 1
Mode =
Median position =
Median value =
Mean =

Worker 2
Mode =
Median position =
Median value =
Mean =
In-Class Exercise
For the supplier selection problem.
Calculate the standard deviation for
supplier 1.
Put your names on the paper that you
use.
Exercise: Computing Standard Deviation using Frequency Tables
Supplier 2 (mean = 5.86 hours)

Xi     Fi     Fi * Xi   (Xi - Mean)              (Xi - Mean)²   Fi * (Xi - Mean)²
0      2      0         (0 - 5.86) = -5.86       34.340         68.679
1      0      0         (1 - 5.86) = -4.86       23.620         0.000
4      3      12        (4 - 5.86) = -1.86       3.460          10.379
6      5      30        (6 - 5.86) = +0.14       0.020          0.098
10     4      40        (10 - 5.86) = +4.14      17.140         68.558
Σ      n=14   82                                                147.714

Mode = 6 hours
Median position = 15/2 = 7.5
Median value = 6 hours
Mean = 82/14 = 5.86 hours
Variance (s²) = 147.714/(14 - 1) = 11.363 hours²
Standard deviation (s) = 3.371 hours
Exercise: Computing Standard Deviation using Frequency Tables
Supplier 1 (mean = 6.43 hours)

Xi     Fi     Fi * Xi   (Xi - Mean)              (Xi - Mean)²   Fi * (Xi - Mean)²
1      2      2         (1 - 6.43) = -5.43       29.485         58.970
4      4      16        (4 - 6.43) = -2.43       5.905          23.620
6      3      18        (6 - 6.43) = -0.43       0.185          0.555
10     3      30        (10 - 6.43) = +3.57      12.745         38.235
12     2      24        (12 - 6.43) = +5.57      31.025         62.050
Σ      n=14   90                                                183.430

Mode = 4 hours
Median position = 15/2 = 7.5
Median value = 6 hours
Mean = 90/14 = 6.43 hours
Variance (s²) = 183.430/(14 - 1) = 14.110 hours²
Standard deviation (s) = 3.756 hours
Which supplier is better? Why?
We want to choose one of the two suppliers. We have data about their lateness in delivery (data is in hours).

                    Supplier 1       Supplier 2
Mode                4 hours          6 hours
Median position     7.5              7.5
Median              6 hours          6 hours
Mean                6.43 hours       5.86 hours
Variance            14.11 hours²     11.36 hours²
Stand. deviation    3.76 hours       3.37 hours
Which supplier is better? Supplier 2. Why?
We want to choose one of the two suppliers. We have data about their lateness in delivery (data is in hours).

                    Supplier 1       Supplier 2
Mode                4 hours          6 hours
Median position     7.5              7.5
Median              6 hours          6 hours
Mean                6.43 hours       5.86 hours
Variance            14.11 hours²     11.36 hours²
Stand. deviation    3.76 hours       3.37 hours
Mean ± 1 s.d.       2.67 to 10.19    2.49 to 9.23
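The comparison can be recomputed directly from the two frequency tables. This is a sketch using the sample functions in Python's `statistics` module (which divide by n - 1); the `expand` helper and dictionary layout are assumptions for illustration. Because it works from the unrounded mean, its output may differ from hand calculations that round intermediate values.

```python
from statistics import mean, stdev, variance

def expand(freq_table):
    """Turn {value: frequency} into the full list of observations."""
    return [x for x, f in freq_table.items() for _ in range(f)]

# Lateness in hours -> number of deliveries
suppliers = {
    "Supplier 1": {1: 2, 4: 4, 6: 3, 10: 3, 12: 2},
    "Supplier 2": {0: 2, 1: 0, 4: 3, 6: 5, 10: 4},
}

summary = {}
for name, table in suppliers.items():
    data = expand(table)
    m = mean(data)
    s = stdev(data)  # sample standard deviation
    summary[name] = {"mean": m, "variance": variance(data), "sd": s,
                     "low": m - s, "high": m + s}
    print(f"{name}: mean = {m:.2f} h, s^2 = {variance(data):.2f} h^2, "
          f"s = {s:.2f} h, mean ± 1 s.d. = {m - s:.2f} to {m + s:.2f} h")
```

A lower mean indicates less lateness on average, and a lower standard deviation indicates more consistent delivery; both matter when choosing a supplier.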
Grouped Data Examples

Class interval     Freq (Fi)   M            Fi * M       Fi * M²
[1 – 3) inch       16          2 inches     32 inches    64 inches²
[3 – 5) inch       2           4 inches     8 inches     32 inches²
[5 – 7) inch       4           6 inches     24 inches    144 inches²
[7 – 9) inch       3           8 inches     24 inches    192 inches²
[9 – 11) inch      9           10 inches    90 inches    900 inches²
[11 – 13) inch     6           12 inches    72 inches    864 inches²
Σ                  40                       250          2,196

Var (s²) = (Σ(Fi * Mi²) - (Σ(Fi * Mi))²/n) / (n - 1)
         = (2196 - 250²/40) / 39
         = (2196 - 1562.5) / 39
         = 16.24 inches²
Standard deviation (s) = 4.03 inches
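For grouped data the class midpoint M stands in for every observation in the class. The sketch below applies the computational formula to the class intervals of this example; the list-of-tuples layout is an illustrative assumption.

```python
import math

# ((lower bound, upper bound), frequency) for each class, in inches
classes = [((1, 3), 16), ((3, 5), 2), ((5, 7), 4),
           ((7, 9), 3), ((9, 11), 9), ((11, 13), 6)]

n = sum(f for _, f in classes)
midpoints = [(lo + hi) / 2 for (lo, hi), _ in classes]
freqs = [f for _, f in classes]

sum_fm = sum(f * m for f, m in zip(freqs, midpoints))       # sum of Fi * M
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # sum of Fi * M^2

# Sample variance and standard deviation from the grouped table.
var = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
sd = math.sqrt(var)
print(f"n = {n}, sum FM = {sum_fm}, sum FM^2 = {sum_fm2}")
print(f"s^2 = {var:.2f} inches^2, s = {sd:.2f} inches")
```

The result is an approximation of the variance of the raw data, since the individual values inside each class are not known.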
Grouped Data Exercise

Class interval       Freq (Fi)   M      Fi * M    Fi * M²
[1 – 4) inches       4
[4 – 8) inches       4
[8 – 12) inches      6
[12 – 16) inches     12
[16 – 20) inches     8
[20 – 24) inches     6
Σ                    40

Var (s²) = (Σ(Fi * Mi²) - (Σ(Fi * Mi))²/n) / (n - 1) =
Standard deviation (s) =
In-Class exercise: Grouped data
Uses of Standard Deviation
- Indicator of financial risk
- Quality control
  - construction of quality control charts
  - process capability studies
- Comparing populations
  - household incomes in two cities
  - employee absenteeism at two plants
Standard Deviation as an Indicator of Financial Risk

                      Annualized Rate of Return
Financial Security    μ        σ
A                     15%      3%
B                     15%      7%
Measures of Shape
Skewness
- Absence of symmetry
- Extreme values in one side of a distribution
Kurtosis
- Peakedness of a distribution
- Leptokurtic: high and thin
- Mesokurtic: normal shape
- Platykurtic: flat and spread out
Box and Whisker Plots
- Graphic display of a distribution
- Reveals skewness
Relationship of Mean, Median and
Mode
Empirical Rule
If data are normally distributed (or approximately normal):

Distance from the Mean    Percentage of Values Falling Within Distance
μ ± 1σ                    68
μ ± 2σ                    95
μ ± 3σ                    99.7
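The percentages in the empirical rule are rounded values from the normal distribution. As a sketch, they can be recovered with the error function in Python's `math` module, since the share of a normal distribution within k standard deviations of the mean equals erf(k / √2). The function name below is an assumption for illustration.

```python
import math

def normal_share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu ± {k} sigma: {100 * normal_share_within(k):.2f}% of values")
```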
Chebyshev’s Theorem
Applies to all distributions
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{for } k > 1$$
Chebyshev’s Theorem
Applies to all distributions

Number of Standard Deviations   Distance from the Mean   Minimum Proportion of Values Falling Within Distance
K = 2                           μ ± 2σ                   1 - 1/2² = 0.75
K = 3                           μ ± 3σ                   1 - 1/3² = 0.89
K = 4                           μ ± 4σ                   1 - 1/4² = 0.94
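The table values follow directly from the bound 1 - 1/k². A minimal sketch, with an illustrative function name:

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(f"K = {k}: at least {chebyshev_bound(k):.2f} of values within mu ± {k} sigma")
```

These are lower bounds that hold for any distribution, so they are looser than the empirical-rule percentages, which assume normality.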
Box and Whisker Plot
Five specific values are used:
- Median, Q2
- First quartile, Q1
- Third quartile, Q3
- Minimum value in the data set
- Maximum value in the data set
Inner Fences
- IQR = Q3 - Q1
- Lower inner fence = Q1 - 1.5 IQR
- Upper inner fence = Q3 + 1.5 IQR
Outer Fences
- Lower outer fence = Q1 - 3.0 IQR
- Upper outer fence = Q3 + 3.0 IQR
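The fence formulas can be sketched as follows. The data set is hypothetical (it is not from the slides), and the quartiles come from `statistics.quantiles` with its default method, so quartiles located by a textbook rule may differ slightly.

```python
from statistics import quantiles

def fences(data):
    """Return Q1, Q3, IQR and the inner and outer fences for a data set."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    return {
        "q1": q1, "q3": q3, "iqr": iqr,
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

# Hypothetical sample with one unusually large value
sample = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]
f = fences(sample)

beyond_inner = [x for x in sample if x < f["inner"][0] or x > f["inner"][1]]
beyond_outer = [x for x in sample if x < f["outer"][0] or x > f["outer"][1]]

print("Inner fences:", f["inner"])
print("Outer fences:", f["outer"])
print("Beyond inner fences:", beyond_inner)
print("Beyond outer fences:", beyond_outer)
```

Values outside the fences are candidates for closer inspection as possible outliers.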
Box and Whisker Plot
[Figure: box and whisker plot with Minimum, Q1, Q2, Q3, and Maximum marked]
Exercises