1. Probability theory
2. Estimation and confidence intervals
3. Testing statistical hypotheses
Topic 1. Estimation and Hypothesis Testing
Laurent E. Calvet
HEC Paris
Fall 2014
Statistical methods in business and finance
Definition
Statistics is the science of collecting, organizing, analyzing, and
interpreting data to assist in making more effective decisions.
Why study statistical methods?
In the good old days: investors and CEOs relied on their gut
to make critical decisions...
Today: stakes are too high and the competition is too fierce
to rely on your gut.
Trend: toward data-based decision-making in a variety of
fields: management, economics, medicine, law, sports...
Financial data
Security prices
- An impressive number of assets are available to investors
around the world: bonds, stocks, mutual funds, exchange
traded funds, hedge funds, options, futures, swaps, swaptions,
collateralized debt obligations...
- Since the advent of high-frequency trading, the time between
two consecutive trades on some securities is of the order of a
microsecond (= 10⁻⁶ second).
- High-frequency traders, mutual fund managers, quantitative
hedge funds, derivative traders and long-term investors use
security price data to design trading strategies.
See Michael Lewis, Flash Boys: A Wall Street Revolt (2014), and
Scott Patterson, The Quants (2011).
Financial data (cont.)
Corporate finance
- Financial statements
- Corporate announcements
- Analyst reports
Household finance
- In many countries, surveys on household finances and
brokerage data are available.
- In Nordic countries, administrative datasets now provide
extensive information on the finances of every resident.
- See, e.g., Calvet, Campbell, and Sodini (2007) and Calvet and
Sodini (2014a, 2014b). Available at:
http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=75695
Wanted: More financial data!
The financial crisis has been blamed on the lack of data
available to policymakers and regulators.
The Dodd-Frank Wall Street Reform and Consumer
Protection Act (signed into law by President Obama in July
2010) established the Office of Financial Research within
the Treasury Department.
Its mission: improve the quality of financial data available to
policymakers and researchers.
http://www.treasury.gov/initiatives/wsr/ofr/Pages/default.aspx
The hope is that more data will help mitigate systemic
risk.
Wanted: Better financial models!
The financial crisis has also been blamed on poor pricing
and risk management models, which do not accurately
reflect the statistical properties of the data.
Example: The Formula that Killed Wall Street, Wired
Magazine, February 2009:
http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
A new generation of models is currently under development.
One example is the Markov-switching multifractal (Calvet
and Fisher 2004, 2008, 2012):
http://en.wikipedia.org/wiki/Markov_switching_multifractal
Used by financial institutions such as the Bank of England to
assess market risk.
Objectives
This lecture is a brief review of basic statistical concepts:
1. Probability theory
2. Estimation
3. Hypothesis testing
1. Probability theory
Basic definitions
An experiment is the process of observing the outcome of a
chance event.
The sample space, denoted S, is the set of all possible
outcomes.
Example
Consider the experiment of tossing a coin.
Outcomes are heads and tails.
The sample space is S = {H, T}.
Random variable
Definition
A random variable X is a function from the sample space S into
the real line.
Random variables are usually denoted by uppercase letters
(e.g. X ).
Lowercase x represents a realization of X .
Probability distribution
The distribution of a discrete random variable X is
characterized by the probability mass function (pmf),
P(X = x) = f (x) .
The distribution of a continuous random variable is
represented by the probability density function (pdf)
f : R → R+ , which satisfies:
\[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]
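The integral definition can be checked numerically. The Python sketch below assumes, purely for illustration, that f is the standard normal density (the slide does not fix a particular f); the helper names `normal_pdf` and `prob_between` are hypothetical. It approximates P(a ≤ X ≤ b) with a midpoint rule and compares the result with the closed-form value from the standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of N(mu, sigma^2); used here only as an example pdf."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def prob_between(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b), the integral of pdf over [a, b], by the midpoint rule."""
    width = (b - a) / steps
    return width * sum(pdf(a + (i + 0.5) * width) for i in range(steps))

approx = prob_between(normal_pdf, -1.96, 1.96)
exact = NormalDist().cdf(1.96) - NormalDist().cdf(-1.96)
print(f"midpoint rule: {approx:.6f}, closed form: {exact:.6f}")
```

Both numbers should be close to 0.95, which is why 1.96 appears later as the critical value for 95% intervals.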
2. Estimation and confidence intervals
2.1 Point estimation
2.2 Interval estimation
2.3 Confidence intervals for the mean
Goal: suppose that observations x1 , . . . , xn are independent
realizations of fθ .
Question
How can we estimate θ?
Example
Using the sample we would like to estimate the mean µ and
standard deviation σ of a normal distribution.
Statistic
Let X1 , . . . , Xn denote independent and identically distributed
(i.i.d.) random variables, Xi ∼ fθ for all i .
Definition
A statistic is any function, possibly vector valued, of the random
sample X1 , . . . , Xn .
Example
The sample mean \( \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \) is a statistic.
Remark: a statistic is a random variable/vector.
Estimator
Definitions
A (point) estimator of θ is a function of the random variables
X_1, ..., X_n ∼ i.i.d. f_θ.
A (point) estimate is the realized value of the estimator given the
observations x_1, ..., x_n.
We usually denote by θ̂ an estimator of the parameter θ.
θ̂ is a statistic and a random variable (vector).
Question
How well does θ̂ estimate the parameter θ?
Bias
Definition
The bias of an estimator θ̂ is the difference between the expected
value of θ̂ and the target parameter θ:
bias(θ̂, θ) = E(θ̂) − θ .
θ̂ is said to be an unbiased estimator of θ if
E(θ̂) = θ .
Sample mean and sample variance
Proposition
Consider X_1, ..., X_n ∼ i.i.d. f(x) such that E(X_1) = µ and
Var(X_1) = σ². Then,
1. X̄ is an unbiased estimator of µ: E(X̄) = µ;
2. Var(X̄) = σ²/n;
3. \( \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \) is an unbiased estimator of σ².
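The three claims of the proposition can be illustrated by simulation. The sketch below is a Monte Carlo check under assumed values (normal data with µ = 2, σ = 3, n = 5, none of which come from the lecture): it draws many samples, then averages the sample mean and the two variance estimators across replications.

```python
import random
import statistics

random.seed(0)                 # fixed seed so the run is reproducible
mu, sigma, n = 2.0, 3.0, 5     # illustrative values, not from the lecture
replications = 20_000

means, unbiased_vars, biased_vars = [], [], []
for _ in range(replications):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    unbiased_vars.append(statistics.variance(sample))   # divides by n - 1
    biased_vars.append(statistics.pvariance(sample))    # divides by n

avg_mean = statistics.fmean(means)            # should be near mu
var_of_mean = statistics.pvariance(means)     # should be near sigma^2 / n
avg_unbiased = statistics.fmean(unbiased_vars)  # should be near sigma^2
avg_biased = statistics.fmean(biased_vars)      # should fall below sigma^2

print(f"average of sample means: {avg_mean:.3f} (mu = {mu})")
print(f"variance of sample means: {var_of_mean:.3f} (sigma^2/n = {sigma**2 / n})")
print(f"average of (n-1)-divisor variance: {avg_unbiased:.3f} (sigma^2 = {sigma**2})")
print(f"average of n-divisor variance: {avg_biased:.3f}")
```

Dividing by n instead of n − 1 shrinks the estimate by the factor (n − 1)/n on average, which is the bias that the n − 1 divisor removes.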
Example
The sample measure is unlikely to match exactly the population
parameter.
Theorem
Consider X_1 ∼ N(µ_1, σ_1²) and X_2 ∼ N(µ_2, σ_2²). If X_1 and X_2 are
independent, then for any a, b, c ∈ R,
\[ aX_1 + bX_2 + c \sim N(a\mu_1 + b\mu_2 + c,\; a^2\sigma_1^2 + b^2\sigma_2^2). \]
Corollary
Consider X_1, ..., X_n ∼ i.i.d. N(µ, σ²). Then,
\[ \bar{X} \sim N(\mu, \sigma^2/n) \quad \text{and} \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1). \]
2.2 Interval estimation
To simplify notation we denote the random sample by
X = (X1 , . . . , Xn )
and the set of realizations by
x = (x1 , . . . , xn ) .
Definition
An interval estimate of θ is a pair of functions L(x) and U(x)
such that L(x) ≤ U(x) . The random interval
[L(X), U(X)]
is called an interval estimator.
Confidence level and confidence interval
Definitions
The probability that the interval estimator [L(X), U(X)] contains
the true parameter θ is called the confidence level.
If the confidence level of the interval estimator is 1 − α, then the
interval estimate [L(x), U(x)] is called a (1 − α) confidence
interval for θ. It is denoted by
CI (θ, 1 − α) .
Illustrative example of point and interval estimates
In the US, on all new cars, a fuel economy estimate is displayed on
the window sticker as required by the Environmental Protection
Agency (EPA):
2.3 Confidence intervals for the mean
Consider the realizations x_1, ..., x_n of a random sample
X_1, ..., X_n ∼ i.i.d. N(µ, σ²).
Case 1: Known σ
Recall that (X̄ − µ)/(σ/√n) ∼ N(0, 1).
For a given α ∈ [0, 1], we know that:
\[ P\left( -z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \right) = 1 - \alpha, \]
where z_{α/2} is the (1 − α/2)-quantile of N(0, 1).
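Rearranging the inequalities inside this probability isolates µ, which is the step that turns the standardized statistic into an interval. A short derivation, using only the symbols defined above:

```latex
\begin{align*}
-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}
&\iff -z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},
\end{align*}
so that
\[
P\left( \bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha .
\]
```

The random quantity in this statement is X̄, not µ: the interval endpoints vary from sample to sample, while µ is fixed.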
Case 1: Known σ (cont.)
Confidence interval for µ
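A (1 − α)-confidence interval for µ is
\[ CI(\mu, 1 - \alpha) = \left[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right], \]
where σ/√n is often called the standard error of the mean. For example,
\[ CI(\mu, 95\%) = \left[ \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \right], \qquad CI(\mu, 99\%) = \left[ \bar{x} \pm 2.576\,\frac{\sigma}{\sqrt{n}} \right]. \]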
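The known-σ interval is straightforward to compute. The sketch below uses Python's standard-library `statistics.NormalDist` to obtain z_{α/2}; the function name `z_confidence_interval` and the input values (x̄ = 10, σ = 2, n = 25) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) of the confidence interval for mu when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}, the (1 - alpha/2)-quantile
    half_width = z * sigma / math.sqrt(n)          # z_{alpha/2} times the standard error
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical inputs, not taken from the lecture.
low95, high95 = z_confidence_interval(sample_mean=10.0, sigma=2.0, n=25, confidence=0.95)
low99, high99 = z_confidence_interval(sample_mean=10.0, sigma=2.0, n=25, confidence=0.99)
print(f"95% CI: [{low95:.3f}, {high95:.3f}]")
print(f"99% CI: [{low99:.3f}, {high99:.3f}]")
```

Raising the confidence level from 95% to 99% widens the interval, since the multiplier grows from about 1.96 to about 2.576.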
Case 2: Unknown σ
Can we replace σ with the sample standard deviation?
The answer is yes, but (X̄ − µ)/(σ̂/√n) is not exactly normal.
Theorem
Let X_1, ..., X_n ∼ i.i.d. N(µ, σ²) be a random sample. Consider
the estimator \( \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 \) of σ². Then,
\[ \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} \sim t_{n-1}, \]
where t_{n−1} denotes the Student's t distribution with (n − 1)
degrees of freedom.
Case 2: Unknown σ (cont.)
The probability density function of a Student t is known (no need
to learn it by heart).
Definition: Student t (William Gosset, 1908, Biometrika)
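A random variable X has a Student t distribution with k degrees
of freedom if its density is
\[ f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left( 1 + \frac{x^2}{k} \right)^{-\frac{k+1}{2}}, \quad x \in \mathbb{R}, \]
where \( \Gamma(y) = \int_0^{\infty} t^{y-1} e^{-t}\,dt \).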
Case 2: Unknown σ (cont.)
For large k, the Student t distribution gets very close to N (0, 1).
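The convergence can be seen numerically by comparing the two densities on a grid. The sketch below implements the Student t density from the definition (using `math.lgamma` for the Gamma terms) and reports the largest gap to the N(0, 1) density on [−5, 5]; the grid and the chosen values of k are arbitrary illustrations.

```python
import math

def student_t_pdf(x, k):
    """Density of the Student t distribution with k degrees of freedom."""
    # log of Gamma((k+1)/2) / (sqrt(k*pi) * Gamma(k/2)), computed in logs for stability
    log_norm = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_norm - (k + 1) / 2 * math.log1p(x * x / k))

def standard_normal_pdf(x):
    """Density of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

grid = [i / 10 for i in range(-50, 51)]   # points -5.0, -4.9, ..., 5.0
max_gap = {}
for k in (1, 5, 30, 1000):
    max_gap[k] = max(abs(student_t_pdf(x, k) - standard_normal_pdf(x)) for x in grid)
    print(f"k = {k:4d}: largest density gap on [-5, 5] = {max_gap[k]:.5f}")
```

The gap shrinks as k grows, which is why the normal critical values are a reasonable approximation in large samples but not in small ones.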
Case 2: Unknown σ (cont.)
Confidence interval for µ
A (1 − α)-confidence interval for µ is
\[ CI(\mu, 1 - \alpha) = \left[ \bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} \right], \]
where s is the sample standard deviation and t_{n−1,α/2} is the
(1 − α/2)-quantile of the t_{n−1} distribution.
Excel functions: TDIST (tail probabilities) and TINV (two-tailed critical values)
Application: Expected return on equity
Question
You have been asked by your company’s CFO to compute the
expected return µ on the company’s stock (also known as cost of
equity). You have downloaded the yearly returns on the company’s
stock over the past 9 years. You have computed that the sample
mean return is 15% and that the sample standard deviation is
45%.
Compute a 95% confidence interval for µ.
What do you conclude?
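One way to work this question numerically is sketched below. It assumes the yearly returns are i.i.d. normal, as the t-interval requires, and uses only the Python standard library: the t quantile is obtained by integrating the Student t density with Simpson's rule and bisecting on the resulting CDF. The helper names are hypothetical; in practice a library routine such as `scipy.stats.t.ppf` would normally replace them.

```python
import math

def student_t_pdf(x, k):
    """Density of the Student t distribution with k degrees of freedom."""
    log_norm = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_norm - (k + 1) / 2 * math.log1p(x * x / k))

def student_t_cdf(x, k, steps=2000):
    """P(T <= x) for x >= 0: 0.5 by symmetry plus Simpson's rule on [0, x]."""
    h = x / steps
    total = student_t_pdf(0.0, k) + student_t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, k)
    return 0.5 + total * h / 3

def student_t_quantile(p, k, lo=0.0, hi=100.0):
    """Quantile for p in [0.5, 1), found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if student_t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Inputs from the question: 9 yearly returns, mean 15%, standard deviation 45%.
n, mean_return, s = 9, 0.15, 0.45
t_crit = student_t_quantile(0.975, n - 1)        # t_{8, 0.025}
half_width = t_crit * s / math.sqrt(n)
lower, upper = mean_return - half_width, mean_return + half_width
print(f"t critical value: {t_crit:.3f}")
print(f"95% CI for mu: [{lower:.1%}, {upper:.1%}]")
```

Under these assumptions the interval runs from roughly −20% to +50%. One possible reading is that nine yearly observations say very little about the expected return: the interval is wide and includes zero.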
3. Testing statistical hypotheses
3.1 Five-step procedure for testing a hypothesis
3.2 Testing for a population mean
3.3 The p-value in hypothesis testing
Goal: from a sample of observations we would like to answer
questions concerning characteristics of the population.
Definition
A hypothesis is a statement about a population parameter subject
to verification.
Question
How can we verify/determine whether a hypothesis is reasonable?
3.1 Five-step procedure for testing a hypothesis
Step A: State the null and alternative hypotheses
We select two complementary hypotheses, called the null
hypothesis H0 and the alternative hypothesis Ha .
H0 is a statement about the value of a population parameter
that is initially assumed to be true.
Ha is a claim that is contradictory to H0 .
Example:
H0 : µ = µ0 .
If Ha states a direction (e.g. Ha: µ > µ0 or Ha: µ < µ0), the
test is called one-tailed. If no direction is specified
(Ha: µ ≠ µ0), the test is two-tailed.
Step B: Select a significance level
A test can produce two types of errors.
Type I error: Rejecting H0 when H0 is true.
Type II error: Accepting H0 when H0 is false.
Definition
The probability of making a type I error is denoted by α and is
called the significance level of the test.
We must decide on α. Traditionally, we choose α = 0.05 in
finance.
The probability of a type II error is denoted by β. We call 1 − β
the power of the test.
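The meaning of α and 1 − β can be illustrated by simulation. The sketch below assumes a two-tailed test of a normal mean with known σ, based on the standardized sample mean, and uses hypothetical values (µ0 = 0, σ = 1, n = 20, and an alternative mean of 0.5). It estimates how often H0 is rejected when it is true (type I error rate) and when it is false (power).

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)                       # fixed seed so the run is reproducible
mu0, sigma, n = 0.0, 1.0, 20         # hypothetical values; H0: mu = mu0
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

def rejection_rate(true_mu, trials=20_000):
    """Share of simulated samples in which the two-tailed test rejects H0."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_mu=mu0)   # H0 true: rejections are type I errors
power = rejection_rate(true_mu=0.5)         # H0 false: rejections are correct
print(f"estimated type I error rate: {type_i_rate:.3f} (nominal alpha = {alpha})")
print(f"estimated power at mu = 0.5: {power:.3f} (so beta is about {1 - power:.3f})")
```

The first estimate should sit near the chosen α. The second depends on how far the true mean is from µ0 and on the sample size.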
Step C: Select the test statistic
Definition
A test statistic is a statistic used to determine whether to reject
the null hypothesis.
Step D: Formulate the decision rule
We determine a region of rejection delimited by critical values.
Definition
A critical value is a dividing point between the region where H0 is
rejected (called rejection region) and the region where it is not
rejected.
Step E: Make a decision
Calculate the observed value of the test statistic using data.
Decision based on critical values:
Is the observed value in the rejection region?
- Yes: reject H0.
- No: do not reject H0.
3.2 Testing for a population mean
Consider data from the random sample X_1, ..., X_n ∼ N(µ, σ²).
The null hypothesis is H0: µ = µ0.
The alternative hypothesis is Ha: µ ≠ µ0.
Known σ: We use the Z test statistic:
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{H_0}{\sim} N(0, 1). \]
Unknown σ: We use the T test statistic:
\[ T = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}. \]
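For the known-σ case, the five steps reduce to a few lines of code. The sketch below is a minimal illustration with hypothetical inputs (x̄ = 10.5, µ0 = 10, σ = 2, n = 64) and a hypothetical function name; it computes the Z statistic, the two-tailed critical value, and the resulting decision.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject?) for H0: mu = mu0 against Ha: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # observed test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # boundary of the rejection region
    return z, z_crit, abs(z) > z_crit

# Hypothetical data, not taken from the lecture.
z, z_crit, reject = two_sided_z_test(sample_mean=10.5, mu0=10.0, sigma=2.0, n=64)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

The unknown-σ case has the same structure, with the sample standard deviation in place of σ and a t_{n−1} quantile in place of the normal one.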
Limitation of the critical value approach
We can reach the same conclusion for very different
observed values of the test statistic!
Example: In the case of a two-tailed test with critical value 1.96:
we reject H0 for z = 2.03 as well as for z = 5.6;
we accept H0 for z = 0.27 as well as for z = 1.93.
Question
How confident are we in rejecting the null hypothesis?
3.3 The p-value in hypothesis testing
Additional information is usually reported on the strength of the
rejection or acceptance.
Definition
The p-value is the probability, calculated assuming that H0 is true,
of obtaining a test statistic value at least as contradictory to H0 as
the value actually obtained.
Small p-values give evidence that Ha is true.
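For a two-tailed Z test, the p-value is P(|Z| ≥ |z|) under H0, which equals erfc(|z|/√2) for a standard normal Z. The sketch below applies this to the four z values from the critical-value example (0.27, 1.93, 2.03 and 5.6); the function name is hypothetical.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1), i.e. 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))

p_values = {z: two_sided_p_value(z) for z in (0.27, 1.93, 2.03, 5.6)}
for z, p in p_values.items():
    print(f"z = {z:4.2f}: p-value = {p:.3g}")
```

The two rejections (z = 2.03 and z = 5.6) lead to the same decision at α = 0.05, but their p-values differ by several orders of magnitude. This is the extra information that the critical-value approach hides.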