Asian School of Business
PG Programme in Management (2005-06)
Course: Quantitative Methods in Management I
Instructor: Chandan Mukherjee
Session 2: Summarising a Distribution
Modern/EDA Terminology   | Classical Terminology
Cluster, Level, Centre   | Central Tendency, Location
Scatter, Spread          | Dispersion
Shape                    | Skewness
Tails                    | Kurtosis
Numerical Summaries (Descriptive Statistics)
Feature | Mean based summary | Order based summary
Level   | Arithmetic Mean    | Median
Spread  | Standard Deviation | Midspread
The order based summaries are resistant to extreme
values, i.e. they are not unduly influenced by a small part of
the data. That is why they are called Resistant
Summaries.
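A minimal sketch of what resistance means in practice. It borrows the five data values from the variance example in these notes and, as an assumption made purely for illustration, replaces the largest value (20) with a hypothetical 200.

```python
from statistics import mean, median

# Five data values from the variance example in these notes.
data = [6, 7, 8, 9, 20]
# Hypothetical variant: the largest value 20 is replaced by 200.
extreme = [6, 7, 8, 9, 200]

mean_before, mean_after = mean(data), mean(extreme)
median_before, median_after = median(data), median(extreme)

# The mean is pulled from 10 to 46 by a single observation,
# while the median stays at 8.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

One changed observation out of five shifts the mean a long way and leaves the median where it was.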
Numerical Summaries (Descriptive Statistics)
• Both the mean based and the order based summaries
of the spread of a distribution (Standard Deviation and
Midspread) are scale dependent.
• That is why we need to neutralise the scale effect by
dividing each by its respective summary of the Centre.
• Coefficient of Variation = Standard Deviation / Mean
• Relative Midspread = Midspread / Median
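The scale effect can be illustrated with a short sketch. The rescaling factor of 100 is a hypothetical choice (for instance, the same amounts expressed in a unit 100 times smaller), and the helper name `coefficient_of_variation` is introduced here only for illustration. The standard deviation uses the divisor n, matching the definition of variance in these notes.

```python
from math import isclose
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation (divisor n) divided by the mean."""
    return pstdev(values) / mean(values)

data = [6, 7, 8, 9, 20]
rescaled = [100 * x for x in data]  # hypothetical change of unit

# The standard deviation changes with the unit of measurement ...
sd_ratio = pstdev(rescaled) / pstdev(data)
print("SD ratio after rescaling:", round(sd_ratio, 6))

# ... but the coefficient of variation does not.
print("CV original:", round(coefficient_of_variation(data), 4))
print("CV rescaled:", round(coefficient_of_variation(rescaled), 4))
```

The same reasoning applies to the Relative Midspread, since both the Midspread and the Median are multiplied by the same factor when the unit changes.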
Definitions
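Variance = Average Squared Distance from the Mean

$$\text{Variance} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \qquad \text{where} \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Serial No. | Data (X) | X–Mean | (X–Mean)^2
1          | 6        | -4     | 16
2          | 7        | -3     | 9
3          | 8        | -2     | 4
4          | 9        | -1     | 1
5          | 20       | 10     | 100
Total      | 50       | 0      | 130
Average    | 10       |        | 26 = Variance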
DEFINITION (contd.)
Standard Deviation (SD) = Square root of Variance
= √26
= 5.099
Coefficient of Variation = SD/Mean = 5.099/10 = 0.510 (or 51.0%)
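The worked table can be retraced step by step. This is a minimal sketch that follows the definitions above, with the divisor n rather than n − 1; the variable names are chosen here for illustration.

```python
from math import sqrt

data = [6, 7, 8, 9, 20]
n = len(data)

mean = sum(data) / n                                  # 50 / 5
squared_deviations = [(x - mean) ** 2 for x in data]  # 16, 9, 4, 1, 100
variance = sum(squared_deviations) / n                # 130 / 5, divisor n as in the definition
sd = sqrt(variance)                                   # square root of 26
cv = sd / mean                                        # coefficient of variation

print(f"mean={mean}, variance={variance}, SD={sd:.3f}, CV={cv:.4f}")
```

In Python's standard library, `statistics.pvariance` and `statistics.pstdev` use the same divisor n, whereas `statistics.variance` and `statistics.stdev` divide by n − 1 and would give larger values for these data.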
DEFINITION (contd.)
Median = The value that divides the ordered data values
into two equal halves
To compute Median:
• Sort the data in ascending order
• Divide the number of observations (data values) by 2
• If the result is an integer, say 9, then the median is the
average of the 9th and 10th observations
• If the result is not an integer, say 9.5, then round it up to
the next integer above, i.e. 10 in this case. The 10th
observation is the median
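The steps above can be sketched as a small function. The name `median_by_rule` is hypothetical, and the four-value list in the second call is an assumed example (the five values from the variance table with the last one dropped) used only to exercise the integer case.

```python
from math import ceil

def median_by_rule(values):
    """Median following the steps above; positions are counted from 1."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    half = len(ordered) / 2
    if half.is_integer():
        k = int(half)
        # Average of the k-th and (k+1)-th observations.
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round up and take that single observation.
    return ordered[ceil(half) - 1]

# 5 / 2 = 2.5, rounded up to 3: the third ordered value.
print(median_by_rule([6, 7, 8, 9, 20]))
# 4 / 2 = 2: the average of the 2nd and 3rd ordered values.
print(median_by_rule([6, 7, 8, 9]))
```

For both cases the result agrees with `statistics.median` from the standard library.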
DEFINITION (contd.)
Example: Finding the Median
52 observations in ascending order (read across, 13 per row):
  0.07    1.99    3.25    6.93    8.57    9.15   13.95   33.97   34.46   34.52   36.33   39.15   39.67
 40.40   40.55   40.67   44.87   46.08   48.43   50.11   51.02   51.97   54.16   55.07   57.99   60.25
 63.36   67.95   70.93   82.20   94.43   99.41  102.51  113.70  114.09  119.17  121.25  126.28  126.54
128.40  133.02  141.80  150.11  156.63  162.75  193.56  200.19  220.06  282.59  302.75  405.21  445.63
Average of 26th and 27th observations = 61.80
DEFINITION (contd.)
Quartiles = The three values that divide the ordered observations
into four equal parts
25% of the observations lie below the First (Lower) Quartile
50% of the observations lie below the Second (Middle) Quartile
75% of the observations lie below the Third (Upper) Quartile
The Second or the Middle Quartile is obviously the Median
DEFINITION (contd.)
To compute the Lower and the Upper Quartile:
• Sort the data in ascending order
• Divide the total number of observations by 4
• If the result is an integer, say 12, then take the average
of the 12th and the 13th observations counting up from the
lowest observation as the Lower Quartile. Similarly, take the
average of the 12th and the 13th observations counting down
from the highest observation as the Upper Quartile
• If the result is not an integer, say 12.7, then round it up
to the next integer above, i.e. 13 in this case. The Lower
Quartile is the 13th observation from the lowest, and the
Upper Quartile is the 13th observation from the highest
DEFINITION (contd.)
Example: Finding the Lower & Upper Quartiles
52 observations in ascending order (read across, 13 per row):
  0.07    1.99    3.25    6.93    8.57    9.15   13.95   33.97   34.46   34.52   36.33   39.15   39.67
 40.40   40.55   40.67   44.87   46.08   48.43   50.11   51.02   51.97   54.16   55.07   57.99   60.25
 63.36   67.95   70.93   82.20   94.43   99.41  102.51  113.70  114.09  119.17  121.25  126.28  126.54
128.40  133.02  141.80  150.11  156.63  162.75  193.56  200.19  220.06  282.59  302.75  405.21  445.63
Lower Quartile = (39.67 + 40.40)/2 = 40.04
Upper Quartile = (126.54 + 128.40)/2 = 127.47
DEFINITION (contd.)
Midspread = Upper Quartile – Lower Quartile
The range that holds the middle 50% of the observations
Relative Midspread = Midspread / Median
= (127.47 – 40.04)/61.80
= 1.41
Five Number Summary
Five numbers can comprehensively summarise the
features of a distribution without being unduly
affected by a small part of the data
Minimum (MN)
Lower Quartile (LQ)
Median (MD)
Upper Quartile (UQ)
Maximum (MX)
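The quartile rule, the Midspread and the Five Number Summary can be put together in one sketch, applied to the 52 observations from the examples above. The function names are hypothetical. The median is taken from `statistics.median`, which agrees with the median rule in these notes for both even and odd numbers of observations.

```python
from math import ceil
from statistics import median

# The 52 ordered observations listed in the examples above.
DATA = [
    0.07, 1.99, 3.25, 6.93, 8.57, 9.15, 13.95, 33.97, 34.46, 34.52,
    36.33, 39.15, 39.67, 40.40, 40.55, 40.67, 44.87, 46.08, 48.43, 50.11,
    51.02, 51.97, 54.16, 55.07, 57.99, 60.25, 63.36, 67.95, 70.93, 82.20,
    94.43, 99.41, 102.51, 113.70, 114.09, 119.17, 121.25, 126.28, 126.54,
    128.40, 133.02, 141.80, 150.11, 156.63, 162.75, 193.56, 200.19, 220.06,
    282.59, 302.75, 405.21, 445.63,
]

def quartiles_by_rule(values):
    """Lower and Upper Quartile following the rule in these notes."""
    ordered = sorted(values)
    quarter = len(ordered) / 4
    if quarter.is_integer():
        k = int(quarter)
        # k-th and (k+1)-th observations counted from the lowest ...
        lower = (ordered[k - 1] + ordered[k]) / 2
        # ... and counted from the highest.
        upper = (ordered[-k] + ordered[-k - 1]) / 2
    else:
        k = ceil(quarter)
        lower = ordered[k - 1]
        upper = ordered[-k]
    return lower, upper

def five_number_summary(values):
    """Return (MN, LQ, MD, UQ, MX)."""
    lq, uq = quartiles_by_rule(values)
    return min(values), lq, median(values), uq, max(values)

mn, lq, md, uq, mx = five_number_summary(DATA)
midspread = uq - lq
relative_midspread = midspread / md

print(f"MN={mn} LQ={lq:.3f} MD={md:.3f} UQ={uq:.3f} MX={mx}")
print(f"Midspread={midspread:.3f} Relative Midspread={relative_midspread:.2f}")
```

The unrounded quartiles and median (40.035, 61.805, 127.47) match the figures quoted in these notes to within rounding in the second decimal place.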
Five Number Summary is Comprehensive:
The Grand Summary of a Distribution
[Figure: a distribution with the Lower Tail and the Upper Tail labelled]
Identifying the Extreme Values:
Who Are The Outliers?
Here is a rule of thumb (based on theory):
Step = 1.5 times Midspread
Lower Fence = Lower Quartile – Step
Upper Fence = Upper Quartile + Step
All observations below the Lower Fence are Negative
Outliers
All observations above the Upper Fence are Positive
Outliers
Cotton & Blended Yarn Companies:
Five Number Summary & Fences
MN          | 0.07
LQ          | 40.04
MD          | 61.80
UQ          | 127.47
MX          | 445.63
Midspread   | 87.43
Step        | 131.14
Lower Fence | -91.10
Upper Fence | 258.61
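The fence rule can be applied to these figures with a short sketch. The quartiles are taken as given from the summary above, the data are the 52 observations from the earlier examples, and the function name `fences` and its `multiplier` parameter are assumptions introduced for illustration.

```python
# The 52 ordered observations listed in the earlier examples.
DATA = [
    0.07, 1.99, 3.25, 6.93, 8.57, 9.15, 13.95, 33.97, 34.46, 34.52,
    36.33, 39.15, 39.67, 40.40, 40.55, 40.67, 44.87, 46.08, 48.43, 50.11,
    51.02, 51.97, 54.16, 55.07, 57.99, 60.25, 63.36, 67.95, 70.93, 82.20,
    94.43, 99.41, 102.51, 113.70, 114.09, 119.17, 121.25, 126.28, 126.54,
    128.40, 133.02, 141.80, 150.11, 156.63, 162.75, 193.56, 200.19, 220.06,
    282.59, 302.75, 405.21, 445.63,
]

LOWER_QUARTILE = 40.04   # LQ as quoted in the summary above
UPPER_QUARTILE = 127.47  # UQ as quoted in the summary above

def fences(lower_quartile, upper_quartile, multiplier=1.5):
    """Return (Lower Fence, Upper Fence) using Step = multiplier x Midspread."""
    midspread = upper_quartile - lower_quartile
    step = multiplier * midspread
    return lower_quartile - step, upper_quartile + step

lower_fence, upper_fence = fences(LOWER_QUARTILE, UPPER_QUARTILE)

# Observations outside the fences are flagged as outliers.
negative_outliers = [x for x in DATA if x < lower_fence]
positive_outliers = [x for x in DATA if x > upper_fence]

print("Lower Fence:", lower_fence)
print("Upper Fence:", upper_fence)
print("Negative outliers:", negative_outliers)
print("Positive outliers:", positive_outliers)
```

The computed fences can differ from the quoted ones in the last decimal place, because the Step here is not rounded before use. Four observations lie above the Upper Fence (282.59, 302.75, 405.21 and 445.63) and none lies below the Lower Fence, which is negative while the smallest observation is 0.07.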
Box & Whisker Plot
[Figure: box and whisker plot of Gross Fixed Asset for Cotton & Blended Yarn Companies, marking MN (.07), Lower Fence, LQ, MD, UQ, Upper Fence and MX (445.63), with the label "Outliers!"]
Comparing Two Distributions of Gross Fixed Asset:
Fabrics and Yarn Companies
[Figure: GF Asset (Crores), axis from 0 to 2000, shown for Fabrics and Yarn]
Summary Of The Points