5. STANDARD DEVIATIONS
The standard deviation is the most widely used measure of
dispersion (fluctuation, risk, volatility).
Eg 1: Consider the monthly returns of Intel and Hershey Foods,
over a period of 10 years.
[Figure: "Intel and Hershey Returns". Side-by-side boxplots of monthly returns (vertical axis, from -0.4 to 0.4) by company (Intel, Hershey).]
The boxplots indicate that Intel has a higher mean return, but
is much more variable than Hershey.
Which stock is a better investment?
To answer this, we need a more refined measure of risk. The
IQR is not adequate here, since it ignores the most extreme
50% of the returns.
These are the months of the largest profits and losses!
The standard deviation uses all available data, and therefore
provides a more sensitive measure of volatility.
For a population x1, ···, xN we define the population standard
deviation
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \]
• σ = Root Mean Squared deviation of the population values from
their mean μ.
Explanation:
1) For each data point xi compute the deviation xi − μ
2) Square the deviations to make them nonnegative
3) Average the squared deviations
4) Take the square root to retain the original units of measurement
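The four steps above can be sketched in Python. This is only an illustration, not part of the original notes: the function name `population_sd` and the data values are assumptions chosen to make the arithmetic easy to follow.

```python
import math

def population_sd(values):
    """Population standard deviation: root mean squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n                     # population mean
    deviations = [x - mu for x in values]    # step 1: deviation of each point from the mean
    squared = [d ** 2 for d in deviations]   # step 2: square to make them nonnegative
    mean_squared = sum(squared) / n          # step 3: average the squared deviations
    return math.sqrt(mean_squared)           # step 4: square root restores the original units

# Hypothetical population values, not taken from the notes
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # prints 2.0
```

Here the mean is 5, the squared deviations sum to 32, their average is 4, and the square root is 2.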
• The standard deviation is a very important concept in Statistics
since it is the basic tool for summarizing the amount of
randomness in a situation.
The population variance is the square of the population standard
deviation:
\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \]
• Despite its name, the variance does not measure fluctuation; the
S.D. does.
To see why, consider:
1) If we double all values, what happens to the amount of
“fluctuation” or “volatility”? To the S.D.? To the variance?
2) If the data are prices in dollars, what are the units of μ? Of σ?
Of σ²?
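One way to explore question 1 is to double a small data set and compare the results. The sketch below uses Python's standard `statistics` module; the price values are an assumption made up for illustration.

```python
import statistics

prices = [20, 18, 22, 22, 23]         # hypothetical prices, in dollars
doubled = [2 * p for p in prices]     # every value doubled

sd_ratio = statistics.pstdev(doubled) / statistics.pstdev(prices)
var_ratio = statistics.pvariance(doubled) / statistics.pvariance(prices)

print(sd_ratio)    # about 2: the S.D. doubles, like the fluctuation itself
print(var_ratio)   # about 4: the variance quadruples
```

The S.D. scales the same way the data do and stays in dollars, while the variance scales with the square and is in squared dollars.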
• Variance is nevertheless sometimes used in Finance for
measuring risk.
Given a sample x1, ···, xn from the population, we ask: How
much do the data values fluctuate from their mean, x̄?
Measure the dispersion of a sample by the sample standard
deviation:
\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
The square of the sample standard deviation,
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
is called the sample variance.
Again, s² does not measure dispersion; s does.
• Reason for using n − 1: To ensure that s² neither
underestimates nor overestimates σ², on the average.
• Aside from the n − 1, there is a strong similarity between
the definitions for s and σ.
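The "on the average" claim can be checked exactly on a tiny example by listing every possible sample. The sketch below assumes a made-up population of six values and samples of size 3 drawn with replacement (so that every ordered sample is equally likely); none of these choices come from the notes.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4, 5, 6]    # hypothetical population
n = 3                              # sample size (an assumption for this sketch)

sigma2 = pvariance(population)     # population variance, divides by N

# All equally likely ordered samples of size n, drawn with replacement
samples = list(product(population, repeat=n))

avg_s2 = mean(variance(s) for s in samples)       # divisor n - 1
avg_biased = mean(pvariance(s) for s in samples)  # divisor n

print(sigma2)      # population variance
print(avg_s2)      # average of s^2 over all samples: matches sigma2
print(avg_biased)  # dividing by n instead gives a smaller average
```

With the divisor n, the average comes out at (n − 1)/n times σ², which is why that version underestimates the population variance.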
Eg 2: Your firm spends $20 Million per year on advertising,
and top management is wondering if that figure is
appropriate. Is your expenditure in line with those of your
five competitors (similar in size to you), which average $21
million per year?
Preliminary Solution: We need more information. Suppose
their expenditures (in millions of dollars) are 20, 18, 22, 22,
23. The average is x̄ = 105/5 = 21 as claimed. The sample
variance is
\[ s^2 = \frac{1}{4}\left[(20-21)^2 + (18-21)^2 + (22-21)^2 + (22-21)^2 + (23-21)^2\right] = \frac{1}{4}[1 + 9 + 1 + 1 + 4] = \frac{16}{4} = 4. \]
The sample standard deviation is s = √4 = 2 ($2 million).
How does this help us put an expenditure of $20 million into
context? We’ll return to this later.
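The hand calculation in Eg 2 can be double-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. This is a convenience sketch, not the method used in the notes; the variable names are assumptions.

```python
import statistics

# Competitors' advertising expenditures from Eg 2, in millions of dollars
expenditures = [20, 18, 22, 22, 23]

x_bar = statistics.mean(expenditures)     # sample mean
s2 = statistics.variance(expenditures)    # sample variance (divides by n - 1)
s = statistics.stdev(expenditures)        # sample standard deviation

print(x_bar, s2, s)
```

The results agree with the values worked out by hand: a mean of 21, a sample variance of 4, and a sample standard deviation of 2.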
Risk-Adjusted Returns and Portfolio Analysis
In Finance, the risk-adjusted return for an investment is defined as
the ratio of the mean to the standard deviation. This can be useful
for selecting a portfolio.
In Example 1, we considered the monthly returns for Intel Corp and
Hershey Foods. Let’s also consider a portfolio consisting of an equal
dollar amount of the stocks.
[Figure: "Intel, Hershey, and Portfolio Returns". Side-by-side boxplots of returns (vertical axis, from -0.4 to 0.4) by type (Intel, Hershey, Portfolio).]
The investments performed as follows:
                              Intel    Hershey   Portfolio
Mean Return, x̄               0.0169    0.0089     0.0129
Standard Deviation, s        0.1349    0.0735     0.0826
Risk-Adjusted Return, x̄/s    0.1253    0.1211     0.1562
Intel had a higher mean return (1.7% per month) than Hershey
(0.9% per month), but Intel was much more volatile. In terms of
risk-adjusted return, the two stocks are roughly equivalent, and
both are inferior to the portfolio.
Note that the portfolio provides a mean return which is the
average of that of the two stocks, but achieves (through
diversification) a standard deviation close to that of the less risky
stock, Hershey.
The main point of diversification is to reduce risk, and here
the portfolio achieved this goal.
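The risk-adjusted returns reported above can be reproduced from the means and standard deviations. The sketch below uses only the figures given in the notes; the function and variable names are assumptions.

```python
# (mean monthly return, standard deviation), as reported in the notes
investments = {
    "Intel":     (0.0169, 0.1349),
    "Hershey":   (0.0089, 0.0735),
    "Portfolio": (0.0129, 0.0826),
}

def risk_adjusted_return(mean_return, sd):
    """Ratio of the mean return to the standard deviation."""
    return mean_return / sd

ratios = {name: risk_adjusted_return(m, s) for name, (m, s) in investments.items()}
for name, ratio in ratios.items():
    print(f"{name:<10}{ratio:.4f}")

# Investment with the highest ratio of mean return to risk
best = max(ratios, key=ratios.get)
print("Highest risk-adjusted return:", best)
```

The ratios round to 0.1253, 0.1211, and 0.1562, so the portfolio comes out ahead of either stock on its own.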
Z-Scores
Suppose x is a value, or “score”, which we wish to compare to a
data set with mean x̄ (or μ) and standard deviation s (or σ).
x may or may not be a part of the data set.
Define the z-score for x as
\[ z = \frac{x - \bar{x}}{s} \quad \text{or} \quad z = \frac{x - \mu}{\sigma} \]
• The z-score tells us how many standard deviations x is from the
mean.
• The further z is from zero, the more “atypical” x is, relative to
the given data set. In fact, the “empirical rule” states that for
roughly bell-shaped distributions:
about 68% of the data values will have z-scores between ±1,
about 95% between ±2, and
about 99.7% (i.e., almost all) between ±3.
Thus, for example, roughly 95% of the data should lie within two
standard deviations of the mean, that is, in the interval
(x̄ − 2s, x̄ + 2s)
The empirical rule is only exactly correct for normal distributions
(using known population means and variances), but it often
provides a useful approximation, even for non-normal
distributions.
It is worth memorizing the empirical rule.
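For an exactly normal distribution, the three percentages in the empirical rule can be computed directly. The sketch below uses the standard identity P(|Z| < k) = erf(k/√2) for a standard normal Z; the function name `normal_within` is an assumption.

```python
import math

def normal_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These are the values that the rule rounds to 68%, 95%, and 99.7%.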
Eg 2, contd.: Our firm spent $20 Million on Advertising. In
relation to the data set of 5 firms, having x̄ = 21, s = 2, our
z-score is
\[ z = \frac{x - \bar{x}}{s} = \frac{20 - 21}{2} = -0.5. \]
So our expenditures are only one-half of a standard deviation
below the mean, and are therefore quite typical.
Eg: Minitab gives the sample mean and sample standard
deviation for the 1987 baseball salaries as
x̄ = $1,447,690 and s = $1,865,460
(Minitab is treating the data set as a sample, although it's really
a population!)
The salary for Sammy Sosa was $8,000,000, which is z = 3.512
standard deviations above the mean.
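The z-scores in the advertising and salary examples follow from the same one-line formula. Below is a minimal sketch; the function name `z_score` is an assumption, and the inputs are the figures quoted in the notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_ads = z_score(20, 21, 2)                            # Eg 2: advertising, $ millions
z_salary = z_score(8_000_000, 1_447_690, 1_865_460)   # baseball salary example

print(z_ads)                 # -0.5
print(round(z_salary, 3))    # 3.512
```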
Eg: Consider the daily returns on the S&P 500, from July 2, 1962
to Dec 29, 1995 (n = 8431).
Minitab’s descriptive statistics gives the following output.
Descriptive Statistics
Variable      N      Mean    Median    TrMean     StDev   SE Mean
returns    8431   0.00032   0.00036   0.00031   0.00859   0.00009
Variable    Minimum   Maximum         Q1        Q3
returns    -0.20467   0.09099   -0.00394   0.00457
Using more advanced features of Minitab, we find that the
proportions of all observations within 1, 2, and 3 standard deviations
of the mean are 0.771, 0.954, and 0.989. So far, the data seem to fit
the Empirical Rule fairly well.
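The same kind of count can be done outside Minitab. The sketch below is not the Minitab procedure: it is a plain Python version, and because the S&P 500 returns are not reproduced in these notes it runs on the five expenditures from Eg 2 as a stand-in. The function name `proportion_within` is an assumption.

```python
import statistics

def proportion_within(data, k):
    """Fraction of data values lying within k sample standard deviations of the mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    # Values exactly k standard deviations away are counted as inside
    inside = [x for x in data if abs(x - x_bar) <= k * s]
    return len(inside) / len(data)

# Stand-in data: the Eg 2 expenditures ($ millions), NOT the S&P 500 returns
data = [20, 18, 22, 22, 23]
for k in (1, 2, 3):
    print(k, proportion_within(data, k))
```

With so few values the proportions are coarse (0.8, 1.0, 1.0); on a large data set such as the 8431 daily returns, the comparison with 68%, 95%, and 99.7% is more meaningful.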
The daily return on the S&P 500 for the day of the October 19,
1987 stock market crash was −0.20467. The corresponding
z-score for that day was −23.864. This is an amazingly large
z-score, as we will discuss later.
Variability in Practice
Eg: The natural variability in temperatures.
The New York Times, January 3, 1991
Overadjustment Increases Variation: Deming’s “Funnel” Experiment.
(In-Class Demo).
Color Distribution of M&M's Varies from Bag to Bag.
[M&M Lab].