Lesson 2:
Descriptive Statistics
Ka-fu Wong © 2004
ECON1003: Analysis of Economic Data
Lesson2-1
Outline
Mean
Median
Mode
Measures of dispersion
Variance
Interpretation and uses of standard deviation
Working with mean and standard deviation
Population Parameters and Sample Statistics
• A population parameter is a number calculated from all the population measurements that describes some aspect of the population.
• The population mean, denoted μ, is a population parameter and is the average of the population measurements.
• A point estimate is a one-number estimate of the value of a population parameter.
• A sample statistic is a number calculated using sample measurements that describes some aspect of the sample.
The Mean
Population: X1, X2, …, XN
Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
Sample: x1, x2, …, xn
Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population Mean
• For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
$$\mu = \frac{\sum X}{N}$$
where μ is the population mean,
N is the total number of observations,
X is a particular value, and
Σ indicates the operation of adding.
Sample Mean
• For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$$\bar{X} = \frac{\sum X}{n}$$
where n is the total number of values in the sample.
This sample mean is also referred to as the arithmetic mean, the simple mean, or simply the sample average.
EXAMPLE
• A sample of five executives received the following bonus last year ($000):
14.0, 15.0, 17.0, 16.0, 15.0
$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
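A minimal Python sketch of the sample mean formula, applied to the executive bonus data above (values in $000). The function name `sample_mean` is an illustrative choice, not something from the course.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)

bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]  # bonuses in $000
print(sample_mean(bonuses))  # 15.4
```

Python's standard library also provides `statistics.mean`, which returns the same value for this data.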
Population and Sample Proportions
Population: X1, X2, …, XN
Population proportion: p
Sample: x1, x2, …, xn
Sample proportion:
$$\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where xi = 1 if the characteristic is present and 0 if not.
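Because each observation is coded 1 or 0, the sample proportion is simply the sample mean of the indicator values. The sketch below assumes a hypothetical sample of eight observations; the data and the name `sample_proportion` are made up for illustration.

```python
def sample_proportion(indicators):
    """Sample proportion p-hat: the mean of 0/1 indicator values."""
    if any(x not in (0, 1) for x in indicators):
        raise ValueError("each observation must be coded 0 or 1")
    return sum(indicators) / len(indicators)

# Hypothetical data: 1 = characteristic present, 0 = absent
sample = [1, 0, 1, 1, 0, 0, 1, 0]
print(sample_proportion(sample))  # 0.5
```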
EXAMPLE
• A sample of five executives received the following bonus last year ($000):
7.0, 15.0, 17.0, 16.0, 15.0
• Changing the first observation from 14.0 to 7.0 will change the sample mean:
$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
$$\bar{X} = \frac{\sum X}{n} = \frac{7.0 + \dots + 15.0}{5} = \frac{70}{5} = 14$$
Weighted Mean
• The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$
EXAMPLE
• During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks sold.
$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$
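A minimal Python sketch of the weighted mean formula, using the prices and counts from the cabana example, with $1.15 as the fourth price since that is the value used in the computation. The function name `weighted_mean` is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * X_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

prices = [0.50, 0.75, 0.90, 1.15]  # price of each kind of drink ($)
counts = [5, 15, 15, 15]           # drinks sold at each price (the weights)
print(round(weighted_mean(prices, counts), 2))  # 0.89
```

The counts act as weights, so the $0.50 price, sold only five times, pulls the mean down less than the other three prices do.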
The Median
• The median is the midpoint of the values after they have been ordered from the smallest to the largest.
• There are as many values above the median as below it in the data array.
• For an even number of values, the median is the arithmetic average of the two middle numbers.
EXAMPLE
• The ages for a sample of five college students are:
21, 25, 19, 20, 22
• Arranging the data in ascending order gives:
19, 20, 21, 22, 25. Thus the median is 21.
• The heights of four basketball players, in inches, are:
76, 73, 80, 75
• Arranging the data in ascending order gives:
73, 75, 76, 80. Thus the median is (75 + 76)/2 = 75.5.
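The two cases above (odd and even number of values) can be sketched in Python as follows; `median` here is a hand-written illustration of the rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    if not values:
        raise ValueError("the median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([21, 25, 19, 20, 22]))  # 21
print(median([76, 73, 80, 75]))      # 75.5
```

The standard library's `statistics.median` follows the same rule.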
The Mode
• The mode is the value of the observation that appears most frequently.
• EXAMPLE: The exam scores for ten students are:
81, 93, 84, 75, 68, 87, 81, 75, 81, 87.
Because the score of 81 occurs most often (three times), it is the mode.
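A short Python sketch that finds the mode by counting occurrences. It returns every value tied for the highest count, because a data set can have more than one mode; the name `modes` is illustrative (the standard library's `statistics.multimode` behaves similarly).

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(modes(scores))           # [81]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```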
Properties of Mean, Median, and Mode
Property                    Mean     Median   Mode
Uniqueness                  Yes      Yes      No
Effect of extreme values    Strong   Small    Maybe
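The "effect of extreme values" row can be checked on the executive bonus data: lowering one bonus from 14.0 to 7.0 moves the mean but leaves the median unchanged. A sketch using Python's `statistics` module:

```python
from statistics import mean, median

original = [14.0, 15.0, 17.0, 16.0, 15.0]
changed = [7.0, 15.0, 17.0, 16.0, 15.0]  # first observation lowered from 14.0 to 7.0

print(mean(original), median(original))  # 15.4 15.0
print(mean(changed), median(changed))    # 14.0 15.0
```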
Measures of dispersion
1. Range
2. Mean Deviation
3. Variance and standard deviation
4. Coefficient of variation
Range
• The range is the difference between the largest and the smallest value.
• Only two values are used in its calculation.
• It is strongly influenced by extreme values.
• It is easy to compute and understand.
Mean Deviation
• The mean deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic mean.
• All values are used in the calculation.
• It is not unduly influenced by large or small values.
• Absolute values are difficult to manipulate algebraically.
$$MD = \frac{\sum |X - \bar{X}|}{n}$$
The mean deviation is also known as the mean absolute deviation (MAD).
EXAMPLE: Range and Mean Deviation
• The weights of a sample of crates containing books for the bookstore (in pounds) are:
103, 97, 101, 106, 103
Find the range and the mean deviation.
Range = 106 − 97 = 9
EXAMPLE: Range and Mean Deviation
The first step is to find the mean weight:
$$\bar{X} = \frac{\sum X}{n} = \frac{510}{5} = 102$$
The mean deviation is:
$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = \frac{12}{5} = 2.4$$
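The range and mean deviation for the crate weights can be sketched in Python as below. The name `value_range` is used to avoid shadowing Python's built-in `range`; both function names are illustrative.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Mean (absolute) deviation: average of |X - mean| over all values."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

weights = [103, 97, 101, 106, 103]  # crate weights in pounds
print(value_range(weights))     # 9
print(mean_deviation(weights))  # 2.4
```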
Population Variance
• The population variance is the arithmetic mean of the squared deviations from the population mean.
• All values are used in the calculation.
• It is more likely to be influenced by extreme values than the mean deviation.
• The units are awkward: they are the square of the original units.
The Variance
Population: X1, X2, …, XN
Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Sample: x1, x2, …, xn
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Note that in the sample variance formula the sum of squared deviations is divided by (n − 1) instead of n in order to yield an unbiased estimator of the population variance.
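The unbiasedness claim can be checked exactly on a tiny population. The sketch below treats the four values 2, 18, 34, 42 (the Dunn family ages from the population variance example) as the whole population, enumerates every ordered sample of size 2 drawn with replacement, and averages both candidate estimators. Sampling with replacement is an assumption made here so that the observations are independent; exact fractions avoid rounding error.

```python
from fractions import Fraction
from itertools import product

population = [2, 18, 34, 42]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance, divisor N

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))  # all N**n equally likely ordered samples
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2)             # 236
print(avg_div_n_minus_1)  # 236  (divisor n - 1: unbiased)
print(avg_div_n)          # 118  (divisor n: biased downward)
```

On average, dividing by n understates the population variance by the factor (n − 1)/n; dividing by n − 1 removes that bias.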
EXAMPLE: Population variance
• The ages of the Dunn family are:
2, 18, 34, 42
What is the population variance?
$$\mu = \frac{\sum X}{N} = \frac{96}{4} = 24$$
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(2 - 24)^2 + \dots + (42 - 24)^2}{4} = \frac{944}{4} = 236$$
EXAMPLE: Population Standard Deviation
• The population standard deviation (σ) is the square root of the population variance.
• In the last example, the population variance is 236. Hence, the population standard deviation is 15.36, found by
$$\sigma = \sqrt{\sigma^2} = \sqrt{236} = 15.36$$
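Python's `statistics` module computes the population versions (divisor N) with `pvariance` and `pstdev`. Applied to the Dunn family ages, treated as the whole population:

```python
from statistics import pvariance, pstdev

ages = [2, 18, 34, 42]  # the Dunn family, treated as the whole population
print(float(pvariance(ages)))  # 236.0
print(round(pstdev(ages), 2))  # 15.36
```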
EXAMPLE: Sample variance
The hourly wages earned by a sample of five students are:
$7, $5, $11, $8, $6.
Find the variance.
$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$
EXAMPLE: Sample Standard Deviation
• The sample standard deviation is the square root of the sample variance.
• In the last example, the sample variance is 5.30. Hence, the sample standard deviation is 2.30, found by
$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$
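The sample versions (divisor n − 1) are `statistics.variance` and `statistics.stdev`. Applied to the hourly wage data:

```python
from statistics import variance, stdev

wages = [7, 5, 11, 8, 6]  # hourly wages in dollars
print(round(variance(wages), 2))  # 5.3
print(round(stdev(wages), 2))     # 2.3
```

Using `pvariance` here instead would divide by 5 rather than 4 and give 4.24, which is the distinction drawn in the sample variance formula.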
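Sample Variance For Grouped Data
• The formula for the sample variance for grouped data is:
$$\begin{aligned}
s^2 &= \frac{\sum f(x - \bar{x})^2}{\sum f - 1} \\
&= \frac{\sum f(x^2 - 2x\bar{x} + \bar{x}^2)}{n - 1} \\
&= \frac{\sum fx^2 - 2\bar{x}\sum fx + \bar{x}^2 \sum f}{n - 1} \\
&= \frac{\sum fx^2 - 2n\bar{x}^2 + n\bar{x}^2}{n - 1} \\
&= \frac{\sum fx^2 - n\bar{x}^2}{n - 1}
\end{aligned}$$
where $n = \sum f$ and $\sum fx = n\bar{x}$.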
EXAMPLE: Sample Variance For Grouped Data
• During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the variance of the price of the drinks.
$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f - 1} = \frac{5(0.50 - 0.89)^2 + 15(0.75 - 0.89)^2 + 15(0.90 - 0.89)^2 + 15(1.15 - 0.89)^2}{(5 + 15 + 15 + 15) - 1} = \frac{2.07}{50 - 1} \approx 0.042$$
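A sketch that computes the grouped-data sample variance both from the definition and from the shortcut Σfx² − nx̄², using the cabana prices (with $1.15, the price used in the computation) and their frequencies. The function name is illustrative.

```python
def grouped_sample_variance(values, freqs):
    """Sample variance for grouped data, computed two equivalent ways."""
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(values, freqs)) / n
    # Definition: sum of f(x - xbar)^2, divided by (n - 1)
    by_definition = sum(f * (x - xbar) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    # Shortcut: (sum of f x^2 - n xbar^2), divided by (n - 1)
    shortcut = (sum(f * x * x for x, f in zip(values, freqs)) - n * xbar ** 2) / (n - 1)
    return by_definition, shortcut

prices = [0.50, 0.75, 0.90, 1.15]
counts = [5, 15, 15, 15]
definition, shortcut = grouped_sample_variance(prices, counts)
print(round(definition, 3), round(shortcut, 3))  # 0.042 0.042
```

The two forms agree here. In floating-point code the shortcut can lose precision when the mean is large relative to the spread (it subtracts two nearly equal numbers), so the definitional form is usually the safer choice.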
Interpretation and Uses of the Standard Deviation
• Chebyshev's theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least
$$1 - \frac{1}{k^2}$$
where k is any constant greater than 1.
Chebyshev's theorem
Minimum proportion of the values within k standard deviations of the mean, 1 − 1/k²:
k    Coverage
1    0%
2    75.00%
3    88.89%
4    93.75%
5    96.00%
6    97.22%
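A sketch that reproduces the coverage column from 1 − 1/k² and checks the bound on one data set, the Dunn family ages. The helper names are illustrative, and `share_within` uses the population standard deviation (`pstdev`), an assumption under which the bound holds exactly for any finite data set.

```python
from statistics import fmean, pstdev

def chebyshev_bound(k):
    """Lower bound 1 - 1/k**2 on the share of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1 - 1 / k ** 2)  # only informative for k > 1

def share_within(values, k):
    """Actual share of values lying within k population standard deviations of the mean."""
    m, s = fmean(values), pstdev(values)
    return sum(abs(x - m) <= k * s for x in values) / len(values)

for k in range(1, 7):
    print(k, f"{chebyshev_bound(k):.2%}")

# The bound holds for any data set, for example the Dunn family ages:
ages = [2, 18, 34, 42]
for k in (2, 3):
    assert share_within(ages, k) >= chebyshev_bound(k)
```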
Interpretation and Uses of the Standard Deviation
• Empirical Rule: For any symmetrical, bell-shaped distribution:
  – About 68% of the observations will lie within 1s of the mean,
  – About 95% of the observations will lie within 2s of the mean, and
  – Virtually all the observations will be within 3s of the mean.
The empirical rule is also known as the normal rule.
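The percentages in the empirical rule come from the normal distribution. Assuming the data are exactly normal, the coverage within k standard deviations is Φ(k) − Φ(−k), which Python's `statistics.NormalDist` (Python 3.8+) can evaluate:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
for k in (1, 2, 3):
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, f"{coverage:.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

Compared with Chebyshev's guarantees (0%, 75%, 88.89%), these figures are much higher because they rely on the bell-shaped assumption.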
[Figure: Bell-shaped curve showing the relationship between σ and μ; the horizontal axis is marked from μ − 3σ to μ + 3σ]
Why are we concerned about dispersion?
• Dispersion is used as a measure of risk.
• Consider two assets with the same expected (mean) return of 0%:
  – Asset 1 returns −2%, 0%, or +2%.
  – Asset 2 returns −4%, 0%, or +4%.
• The dispersion of returns of the second asset is larger than that of the first. Thus, the second asset is riskier.
• Thus, knowledge of dispersion is essential for investment decisions, and so is knowledge of expected (mean) returns.
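A sketch comparing the two assets numerically. It assumes the three listed returns are equally likely, so the population standard deviation (`pstdev`) is used as the measure of dispersion.

```python
from statistics import fmean, pstdev

asset_1 = [-2.0, 0.0, 2.0]  # possible returns in %, assumed equally likely
asset_2 = [-4.0, 0.0, 4.0]

for name, returns in (("asset 1", asset_1), ("asset 2", asset_2)):
    print(name, fmean(returns), round(pstdev(returns), 2))
# asset 1 0.0 1.63
# asset 2 0.0 3.27
```

The means are identical, and the second asset's standard deviation is exactly twice the first's because its returns are the first asset's returns multiplied by 2.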
Relative Dispersion
• The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, expressed as a percentage:
$$CV = \frac{s}{\bar{X}} (100\%)$$
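A minimal sketch of the coefficient of variation, applied to the hourly wage data from the sample variance example. It uses the sample standard deviation s, as in the formula; the function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = sample standard deviation / sample mean, expressed as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100

wages = [7, 5, 11, 8, 6]  # hourly wages from the sample variance example
print(round(coefficient_of_variation(wages), 1))  # 31.1
```

Because the CV is unit-free, it can compare the dispersion of data measured in different units or on different scales.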
Sharpe Ratio and Relative Dispersion
• The Sharpe ratio is often used to measure the performance of investment strategies, with an adjustment for risk.
• If X is the return of an investment strategy in excess of the market portfolio, the inverse of the CV (that is, X̄/s) is the Sharpe ratio.
• An investment strategy with a higher Sharpe ratio is preferred.
http://www.stanford.edu/~wfsharpe/art/sr/sr.htm
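Following the definition above (X is the return in excess of the market portfolio), the Sharpe ratio is X̄/s. The sketch below uses two hypothetical strategies with made-up excess returns that have the same mean but different dispersion; using the sample standard deviation is an assumption. Other common definitions measure excess return over a risk-free rate instead.

```python
from statistics import mean, stdev

def sharpe_ratio(excess_returns):
    """Mean excess return divided by its sample standard deviation (inverse of the CV)."""
    return mean(excess_returns) / stdev(excess_returns)

# Hypothetical excess returns (in %) of two strategies over the market portfolio
strategy_a = [1.0, 2.0, 0.0, 1.0]
strategy_b = [3.0, -1.0, 2.0, 0.0]
print(round(sharpe_ratio(strategy_a), 3))  # 1.225
print(round(sharpe_ratio(strategy_b), 3))  # 0.548
```

Both strategies average a 1% excess return, but strategy A achieves it with less dispersion, so it has the higher Sharpe ratio and would be preferred.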
Skewness
• Skewness is a measure of the lack of symmetry of a distribution.
• The coefficient of skewness can range from −3.00 up to 3.00.
• A value of 0 indicates a symmetric distribution.
• It is computed as follows:
$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$
or
$$sk = \frac{n}{(n - 1)(n - 2)} \sum \left( \frac{X - \bar{X}}{s} \right)^3$$
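A sketch of both skewness formulas, applied to the hourly wage data from the sample variance example. Both use the sample standard deviation s; the function names are illustrative.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """First formula: 3(mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

def sample_skewness(values):
    """Second formula: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

wages = [7, 5, 11, 8, 6]  # hourly wages from the sample variance example
print(round(pearson_skewness(wages), 2))  # 0.52
print(round(sample_skewness(wages), 2))   # 1.03
```

Both coefficients are positive, which is consistent with the mean (7.4) lying above the median (7): the single high wage of $11 stretches the right tail.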
Why are we concerned about skewness?
• Skewness measures the degree of asymmetry in risk:
  – Upside risk
  – Downside risk
• Consider the distribution of asset returns:
  – Right skewed implies higher upside risk than downside risk.
  – Left skewed implies higher downside risk than upside risk.
Symmetric Distribution
Zero skewness: mode = median = mean
[Figure: density distribution; the height may be interpreted as relative frequency]
The area under the density distribution is 1, just as the relative frequencies sum to 1. Thus the median always splits the density distribution into two equal areas.
Right Skewed Distribution
Positively skewed (skewed to the right): Mean and Median are to the right of the Mode.
Mode < Median < Mean
Left Skewed Distribution
Negatively skewed (skewed to the left): Mean and Median are to the left of the Mode.
Mean < Median < Mode
Working with Mean and Standard Deviation
Set    Data                                  Mean     St Dev
(1)    19, 20, 21                            20.00    0.82
(2)    -1, 0, 1                              0.00     0.82
(3)    19, 20, 20, 21                        20.00    0.71
(4)    38, 40, 42                            40.00    1.63
(5)    57, 60, 63                            60.00    2.45
(6)    19, 19, 20, 20, 21, 21                20.00    0.82
(7)    3, 5, 8                               5.33     2.05
(8)    4, 7, 9                               6.67     2.05
(9)    7, 12, 17                             12.00    4.08
(10)   12, 20, 21, 27, 32, 35, 45, 56, 72    35.56    18.04
St Dev is the population standard deviation (sum of squared deviations divided by n).
• (2) = (1) − mean(1): Mean(2) = 0; Stdev(2) = Stdev(1).
• (3) = (1) with one extra observation equal to mean(1): Mean(3) = Mean(1); Stdev(3) < Stdev(1).
• (4) = (1) × 2; (5) = (1) × 3: Mean(4) = Mean(1) × 2; Mean(5) = Mean(1) × 3; Stdev(4) = Stdev(1) × 2; Stdev(5) = Stdev(1) × 3.
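• (6) = (1) with every value repeated the same number of times (here, twice): Mean(6) = Mean(1); Stdev(6) = Stdev(1).
• (9) = (7) + (8), adding corresponding values: Mean(9) = Mean(7) + Mean(8).
• (10) = (7) × (8), taking all nine products of a value in (7) with a value in (8): Mean(10) = Mean(7) × Mean(8).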
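The relationships between the data sets can be checked numerically. This sketch uses Python's `statistics` module with `pstdev` (divisor n), because that reproduces the St Dev values in the table; set (10) is built as all nine pairwise products of (7) and (8), which is the construction that gives a mean of 35.56. The dictionary labels are illustrative.

```python
from itertools import product
from statistics import fmean, pstdev

set1 = [19, 20, 21]
set7, set8 = [3, 5, 8], [4, 7, 9]

sets = {
    "(2) = (1) - mean(1)": [x - fmean(set1) for x in set1],
    "(3) = (1) plus its mean": set1 + [fmean(set1)],
    "(4) = (1) * 2": [2 * x for x in set1],
    "(6) = (1) repeated twice": set1 * 2,
    "(9) = (7) + (8)": [a + b for a, b in zip(set7, set8)],
    "(10) = all products": [a * b for a, b in product(set7, set8)],
}
for label, data in sets.items():
    print(f"{label:28s} mean={fmean(data):6.2f}  st dev={pstdev(data):5.2f}")
```

Note that the standard deviations do not simply add: Stdev(9) is 4.08, while Stdev(7) + Stdev(8) is about 4.11.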
Lesson 2:
Descriptive Statistics
- END -