Descriptive Statistics – Central
Tendency & Variability
Chapter 3 (Part 2)
MSIS 111
Prof. Nick Dedeke
Learning Objectives
Distinguish between measures of central
tendency, measures of variability, measures of
shape, and measures of association.
Compute variance, standard deviation, and
mean absolute deviation on ungrouped data.
Differentiate between sample and population
variance and standard deviation.
Learning Objectives -- Continued
Understand the meaning of standard
deviation as it is applied by using the
empirical rule and Chebyshev’s theorem.
Compute the mean, mode, standard
deviation, and variance on grouped data.
Understand skewness, kurtosis, and box and
whisker plots.
Measures of Central Tendency:
Ungrouped Data
Measures of central tendency yield
information about the center, or middle part,
of a group of numbers.
Common measures of central tendency
• Mode
• Median
• Mean
• Percentiles
• Quartiles
Mode
The most frequently occurring value in a data
set
Applicable to all levels of data measurement
(nominal, ordinal, interval, and ratio)
Bimodal -- Data sets that have two modes
Multimodal -- Data sets that contain more
than two modes
Mode -- Example
Data (24 values):
35  41  44  45
37  41  44  46
37  43  44  46
39  43  44  46
40  43  44  46
40  43  45  48
The mode is 44.
44 is the most frequently occurring data value (it appears 5 times).
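The mode in the example above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the slides.

```python
from collections import Counter
from statistics import multimode

# The 24 observations from the mode example slide.
data = [35, 41, 44, 45, 37, 41, 44, 46, 37, 43, 44, 46,
        39, 43, 44, 46, 40, 43, 44, 46, 40, 43, 45, 48]

counts = Counter(data)     # value -> number of occurrences
modes = multimode(data)    # every value tied for the highest count

print("modes:", modes)
print("occurrences of 44:", counts[44])
```

A result list with two entries would indicate a bimodal data set, and more than two a multimodal one.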
Median
Middle value in an ordered array of
numbers
Applicable for ordinal, interval, and ratio
data
Not applicable for nominal data
Unaffected by extremely large and
extremely small values
Median: Computational Procedure
First Procedure
• Arrange the observations in an ordered array.
• If there is an odd number of terms, the median is the middle term of the ordered array.
• If there is an even number of terms, the median is the average of the middle two terms.
Second Procedure
• The median’s position in an ordered array is given by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
There are 17 terms in the ordered array.
Position of median = (n+1)/2 = (17+1)/2 = 9
The median is the 9th term, which is 15.
If the 22 is replaced by 100, the median is
15.
If the 3 is replaced by -103, the median is 15.
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms (14 and 15): (14 + 15)/2 = 14.5.
NOTE
• If the 21 is replaced by 100, the median is 14.5.
• If the 3 is replaced by -88, the median is 14.5.
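Both median examples, and the claim that extreme values do not move the median, can be reproduced with a minimal sketch like the one below. The helper name `median_position` is an assumption made for illustration; `statistics.median` is the standard-library function.

```python
from statistics import median

# Ordered arrays from the odd-term and even-term examples.
odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]  # dropping the 22 gives the 16-term array


def median_position(n):
    """1-based position of the median in an ordered array of n terms."""
    return (n + 1) / 2


print(median_position(len(odd_array)), median(odd_array))
print(median_position(len(even_array)), median(even_array))

# Replace the largest value (22) with 100: the median should not change.
with_outlier = sorted(odd_array[:-1] + [100])
print(median(with_outlier))
```

A fractional position such as 8.5 means the median is the average of the 8th and 9th terms.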
Arithmetic Mean
Commonly called ‘the mean’
Is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set,
including extreme values
Computed by summing all values in the data
set and dividing the sum by the number of
values in the data set
Population Mean
Data for total population:
57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$
$$\mu = \frac{57 + 57 + 86 + 86 + 42 + 42 + 43 + 56 + 57 + 42 + 42 + 43}{12} = \frac{653}{12} = 54.4167$$
Mean for a Sample of 3
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$
$$\bar{X} = \frac{57 + 86 + 42}{3} = \frac{185}{3} = 61.667$$
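As a quick check of the population and sample means, here is a small sketch. The function name `arithmetic_mean` is illustrative only.

```python
# Population of 12 values and the sample of 3 values from the slides.
population = [57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43]
sample = [57, 86, 42]


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


mu = arithmetic_mean(population)   # population mean
x_bar = arithmetic_mean(sample)    # sample mean

print(round(mu, 4), round(x_bar, 3))
```

The computation is identical in both cases; only the interpretation (whole population versus a sample drawn from it) differs.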
Example: Computing Central Tend. Measures using Frequency Tables
Xi      Fi      Fi * Xi
55       2        110
60       1         60
100      3        300
125      5        625
140      4        560
Σ       15       1655
Mean = (Σ Fi*Xi)/(Σ Fi) = 1655/15 = 110.33
Mode = 125
Median position = (15+1)/2 = 8th
Median value = 125
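The frequency-table example can be worked through in code as follows. This is a sketch under the assumption that the table pairs the values 55, 60, 100, 125, 140 with frequencies 2, 1, 3, 5, 4; the variable names are illustrative.

```python
from statistics import median

values = [55, 60, 100, 125, 140]   # Xi
freqs = [2, 1, 3, 5, 4]            # Fi

n = sum(freqs)                                          # sum of Fi
weighted_sum = sum(f * x for x, f in zip(values, freqs))  # sum of Fi * Xi
mean = weighted_sum / n

# Mode: the value with the largest frequency.
mode = max(zip(values, freqs), key=lambda pair: pair[1])[0]

# Median: expand the table back into an ordered array and take the middle term.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
median_value = median(expanded)

print(n, weighted_sum, round(mean, 2), mode, median_value)
```

Expanding the table is the simplest way to find the median for small data sets; for large frequencies a cumulative-frequency scan would avoid building the full list.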
Exercise: Computing Central Tend. Measures using Frequency Tables
Xi      Fi      Fi * Xi
1        2
4        4
6        3
10       3
12       2
Σ      n=14
Mean = (Σ Fi*Xi)/(Σ Fi) =
Mode =
Median position =
Median value =
Exercise: Central Tendency Measures for Grouped Data
Class interval       Frequency (Fi)    Midpoint (Mi)
[1 – 3) inches            16                 2
[3 – 5) inches             2                 4
[5 – 7) inches             4                 6
[7 – 9) inches             3                 8
[9 – 11) inches            9                10
[11 – 13) inches           6                12
Σ                         40
Modal class:
Median position:
Median class:
Example: Central Tendency Measures for Grouped Data
Class interval       Frequency (Fi)    Midpoint (Mi)    (Fi)*(Mi)
[1 – 3) inches            16                 2              32
[3 – 5) inches             2                 4               8
[5 – 7) inches             4                 6              24
[7 – 9) inches             3                 8              24
[9 – 11) inches            9                10              90
[11 – 13) inches           6                12              72
Σ                         40                               250
Find the mean for the distribution:
Mean: = (Σ Fi*Mi)/n = 250/40 = 6.25 inches
Exercise: Central Tendency Measures for Grouped Data
Class interval       Frequency (Fi)    Midpoint (Mi)    (Fi)*(Mi)
[1 – 2) inches             2
[2 – 3) inches             2
[3 – 4) inches             4
[4 – 5) inches             2
[5 – 6) inches             1
Find the mean for the distribution:
Mean: = (Σ Fi*Mi)/n =        inches
Exercise: Computing Central Tend. Measures using Frequency Tables
We want to choose one of the two suppliers. We have data about their lateness in delivery (data is in hours). Which one has better statistical measures of central tendency?
Supplier 1
Xi      Fi      Fi * Xi
1        2          2
4        4          8
6        3         18
10       3         30
12       2         24
Σ      n=14        82
Supplier 2
Xi      Fi      Fi * Xi
0        2          0
1        0          0
4        3         12
6        5         30
10       4         40
Σ      n=14        82
Measures of Dispersion: Variability
[Figure: two cash-flow charts, each with the mean marked. One shows no variability in cash flow (same amounts); the other shows variability in cash flow (different amounts).]
Measures of Variability:
Ungrouped Data
Measures of variability describe the spread or
the dispersion of a set of data.
Common Measures of Variability
• Range
• Interquartile Range
• Mean Absolute Deviation
• Variance
• Standard Deviation
• Z scores
• Coefficient of Variation
Range
The difference between the largest and the smallest values in a set of data
Simple to compute
Ignores all data points except the two extremes
Example:
35  41  44  45
37  41  44  46
37  43  44  46
39  43  44  46
40  43  44  46
40  43  45  48
Range = Largest - Smallest = 48 - 35 = 13
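The range example reduces to one subtraction, sketched below with illustrative variable names.

```python
# The 24 observations from the range example.
data = [35, 41, 44, 45, 37, 41, 44, 46, 37, 43, 44, 46,
        39, 43, 44, 46, 40, 43, 44, 46, 40, 43, 45, 48]

largest = max(data)
smallest = min(data)
data_range = largest - smallest   # only the two extremes are used

print(largest, smallest, data_range)
```

Because only the two extremes enter the calculation, changing any of the other 22 values leaves the range untouched.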
Interquartile Range
Range of values between the first and third
quartiles
Range of the middle 50% of the ordered
data set
Less influenced by extremes
Interquartile Range = Q3 - Q1
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: μ = 13
Deviations (Xi - μ) from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 showing each data point's deviation (-8, -4, +3, +4, +5) from the mean μ = 13.]
Mean Absolute Deviation
Average of the absolute deviations from the mean
X       X - μ     |X - μ|
5        -8         +8
9        -4         +4
16       +3         +3
17       +4         +4
18       +5         +5
Σ         0         24
$$\text{M.A.D.} = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$$
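The mean absolute deviation table can be reproduced step by step. The sketch below follows the same columns (deviation, absolute deviation); names are illustrative.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)                 # mean of the data set
deviations = [x - mu for x in data]        # X - mu; these always sum to zero
abs_deviations = [abs(d) for d in deviations]
mad = sum(abs_deviations) / len(data)      # mean absolute deviation

print(mu, deviations, sum(abs_deviations), mad)
```

The raw deviations cancel out, which is why the absolute values (or, for the variance, the squares) are needed to measure spread.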
Population Variance
Average of the squared deviations from the arithmetic mean
X       X - μ     (X - μ)²
5        -8          64
9        -4          16
16       +3           9
17       +4          16
18       +5          25
Σ         0         130
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
Population Standard Deviation
Square root of the variance
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$$
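The population variance and standard deviation can be computed by hand and compared against Python's standard library. This is a minimal sketch; `pvariance` and `pstdev` are the population versions in the `statistics` module, and the other names are illustrative.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [5, 9, 16, 17, 18]
N = len(data)
mu = sum(data) / N

# Manual computation: average of the squared deviations from the mean.
squared_deviations = [(x - mu) ** 2 for x in data]
variance_manual = sum(squared_deviations) / N
std_dev_manual = sqrt(variance_manual)

# Library computation for comparison.
variance_lib = pvariance(data)
std_dev_lib = pstdev(data)

print(variance_manual, round(std_dev_manual, 1))
print(variance_lib, round(std_dev_lib, 1))
```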
Sample Variance
Sum of the squared deviations from the arithmetic mean, divided by n - 1 (rather than n)
X          X - X̄      (X - X̄)²
2,398       625       390,625
1,844        71         5,041
1,539      -234        54,756
1,311      -462       213,444
Σ 7,092       0       663,866
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
Sample Standard Deviation
Square root of the sample variance
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$$
$$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$$
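The sample versions differ from the population versions only in the divisor, n - 1 instead of N. A short sketch using the four sample values from the slides follows; `variance` and `stdev` are the sample functions in the `statistics` module, and the remaining names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

sample = [2398, 1844, 1539, 1311]
n = len(sample)
x_bar = sum(sample) / n

# Manual computation with the n - 1 divisor.
sum_sq = sum((x - x_bar) ** 2 for x in sample)
s2_manual = sum_sq / (n - 1)
s_manual = sqrt(s2_manual)

# Library computation for comparison (both use n - 1).
s2_lib = variance(sample)
s_lib = stdev(sample)

print(x_bar, sum_sq, round(s2_manual, 2), round(s_manual, 2))
```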
Uses of Standard Deviation
Indicator of financial risk
Quality Control
• construction of quality control charts
• process capability studies
Comparing populations
• household incomes in two cities
• employee absenteeism at two plants
Exercise: Computing Standard Deviation using Frequency Tables
Which one has better statistical measures of central tendency?
Supplier 2 (mean = 5.86 hours)
Xi      Fi      Fi * Xi
0        2          0
1        0          0
4        3         12
6        5         30
10       4         40
Σ      n=14        82
Exercise: Computing Standard Deviation using Frequency Tables
Which one has better statistical measures of central tendency?
Supplier 1 (mean = 5.86 hours)
Xi      Fi      Fi * Xi
1        2          2
4        4          8
6        3         18
10       3         30
12       2         24
Σ      n=14        82
Mode = 4 hours
Median position = 15/2 = 7.5; Median value = 6 hours
Mean = 82/14 = 5.86 hours
Which supplier is better? Why?
Standard Deviation as an Indicator of Financial Risk
Annualized Rate of Return
Financial Security      μ        σ
A                      15%       3%
B                      15%       7%
Variance and Standard Deviation of Grouped Data
Population
$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$
Sample
$$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}, \qquad S = \sqrt{S^2}$$
Population Variance and Standard Deviation of Grouped Data
Class Interval     f     M     fM     M - μ    (M - μ)²    f(M - μ)²
20-under 30        6    25    150      -18       324        1944
30-under 40       18    35    630       -8        64        1152
40-under 50       11    45    495        2         4          44
50-under 60       11    55    605       12       144        1584
60-under 70        3    65    195       22       484        1452
70-under 80        1    75     75       32      1024        1024
Σ                 50          2150                           7200
$$\mu = \frac{\sum fM}{N} = \frac{2150}{50} = 43$$
$$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$$
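The grouped-data calculation above treats every observation in a class as if it sat at the class midpoint. The sketch below walks through the same columns; the tuple layout and variable names are assumptions made for illustration.

```python
from math import sqrt

# (lower bound, upper bound, frequency) for each class interval in the table.
classes = [(20, 30, 6), (30, 40, 18), (40, 50, 11),
           (50, 60, 11), (60, 70, 3), (70, 80, 1)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]   # M
freqs = [f for _, _, f in classes]                     # f

N = sum(freqs)
sum_fM = sum(f * m for f, m in zip(freqs, midpoints))
mu = sum_fM / N                                        # grouped mean

# Sum of f * (M - mu)^2, then divide by N for the population variance.
sum_f_sq = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints))
variance = sum_f_sq / N
std_dev = sqrt(variance)

print(N, sum_fM, mu, sum_f_sq, variance, std_dev)
```

For the sample version, the divisor `N` in the variance line would be replaced by `n - 1`.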
Measures of Shape
Skewness
• Absence of symmetry
• Extreme values in one side of a distribution
Kurtosis
• Peakedness of a distribution
• Leptokurtic: high and thin
• Mesokurtic: normal shape
• Platykurtic: flat and spread out
Box and Whisker Plots
• Graphic display of a distribution
• Reveals skewness
Relationship of Mean, Median and
Mode
Empirical Rule
Data are normally distributed (or approximately normal)
Distance from the Mean     Percentage of Values Falling Within Distance
μ ± 1σ                     68
μ ± 2σ                     95
μ ± 3σ                     99.7
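The empirical rule percentages can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+); the dictionary name `shares` is illustrative. The exact proportions differ slightly from the rounded figures in the table.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

These proportions hold only for normal (or approximately normal) data, which is the stated condition of the rule.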
Chebyshev’s Theorem
Applies to all distributions
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2} \quad \text{for } k > 1$$
Chebyshev’s Theorem
Applies to all distributions
Number of Standard Deviations    Distance from the Mean    Minimum Proportion of Values Falling Within Distance
K = 2                            μ ± 2σ                    1 - 1/2² = 0.75
K = 3                            μ ± 3σ                    1 - 1/3² = 0.89
K = 4                            μ ± 4σ                    1 - 1/4² = 0.94
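The minimum proportions in the table come directly from the bound 1 - 1/k². A small sketch follows; the function name `chebyshev_min_proportion` is hypothetical.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for k > 1")
    return 1 - 1 / k ** 2


bounds = {k: chebyshev_min_proportion(k) for k in (2, 3, 4)}

for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.2f} of the values")
```

Unlike the empirical rule, these are lower bounds that hold for any distribution, so they are looser than the normal-distribution percentages.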
Box and Whisker Plot
Five specific values are used:
• Median, Q2
• First quartile, Q1
• Third quartile, Q3
• Minimum value in the data set
• Maximum value in the data set
Inner Fences
• IQR = Q3 - Q1
• Lower inner fence = Q1 - 1.5 IQR
• Upper inner fence = Q3 + 1.5 IQR
Outer Fences
• Lower outer fence = Q1 - 3.0 IQR
• Upper outer fence = Q3 + 3.0 IQR
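The fence formulas can be wrapped in a small helper. In the sketch below, the function name `fences` and the quartile values Q1 = 10 and Q3 = 20 are hypothetical, chosen only to illustrate the arithmetic; they do not come from the slides.

```python
def fences(q1, q3):
    """Inner and outer fences of a box and whisker plot from Q1 and Q3."""
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


# Hypothetical quartiles, for illustration only.
example = fences(q1=10, q3=20)
print(example)
```

The helper takes the quartiles as inputs rather than computing them, because different quartile conventions can give slightly different Q1 and Q3 for the same data.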
Box and Whisker Plot
[Figure: box and whisker plot labeled, from left to right, Minimum, Q1, Q2, Q3, Maximum.]
Exercises