Topic 1. Estimation and Hypothesis Testing
Laurent E. Calvet, HEC Paris, Fall 2014

1. Probability theory
2. Estimation and confidence intervals
3. Testing statistical hypotheses

Statistical methods in business and finance

Definition
Statistics is the science of collecting, organizing, analyzing, and interpreting data to assist in making more effective decisions.

Why study statistical methods?
- In the good old days: investors and CEOs relied on their gut to make critical decisions...
- Today: stakes are too high and the competition is too fierce to rely on your gut.
- Trend: toward data-based decision-making in a variety of fields: management, economics, medicine, law, sports...

Financial data

Security prices
- An impressive number of assets are available to investors around the world: bonds, stocks, mutual funds, exchange traded funds, hedge funds, options, futures, swaps, swaptions, collateralized debt obligations...
- Since the advent of high-frequency trading, the time between two consecutive trades on some securities is of the order of a microsecond (= $10^{-6}$ second).
- High-frequency traders, mutual fund managers, quantitative hedge funds, derivative traders and long-term investors use security price data to design trading strategies.

See Michael Lewis, Flash Boys: A Wall Street Revolt (2014), and Scott Patterson, The Quants (2011).

Financial data (cont.)
Corporate finance
- Financial statements
- Corporate announcements
- Analyst reports

Household finance
- In many countries, surveys on household finances and brokerage data are available.
- In Nordic countries, administrative datasets now provide extensive information on the finances of every resident.
- See, e.g., Calvet, Campbell, and Sodini (2007) and Calvet and Sodini (2014a, 2014b). Available at: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=75695

Wanted: More financial data!
- The financial crisis has been blamed on the lack of data available to policymakers and regulators.
- The Dodd-Frank Wall Street Reform and Consumer Protection Act (signed into law by President Obama in July 2010) established the Office of Financial Research within the Treasury Department.
- Its mission: improve the quality of financial data available to policymakers and researchers. http://www.treasury.gov/initiatives/wsr/ofr/Pages/default.aspx
- The hope is that more data will help mitigate systemic risk.

Wanted: Better financial models!
- The financial crisis has also been blamed on poor pricing and risk management models, which do not accurately reflect the statistical properties of the data.
- Example: The Formula that Killed Wall Street, Wired Magazine, February 2009: http://www.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
- A new generation of models is currently under development. One example is the Markov-switching multifractal (Calvet and Fisher 2004, 2008, 2012): http://en.wikipedia.org/wiki/Markov_switching_multifractal
- It is used by financial institutions such as the Bank of England to assess market risk.
Objectives
This lecture is a brief review of basic statistical concepts:
1. Probability theory
2. Estimation
3. Hypothesis testing

1. Probability theory

Basic definitions
An experiment is the process of observing the outcome of a chance event. The sample space, denoted $S$, is the set of all possible outcomes.

Example
Consider the experiment of tossing a coin. The outcomes are heads and tails. The sample space is $S = \{H, T\}$.

Random variable

Definition
A random variable $X$ is a function from the sample space $S$ into the real line.

Random variables are usually denoted by uppercase letters (e.g. $X$). Lowercase $x$ represents a realization of $X$.

Probability distribution
The distribution of a discrete random variable $X$ is characterized by the probability mass function (pmf), $P(X = x) = f(x)$.
The distribution of a continuous random variable is represented by the probability density function (pdf) $f : \mathbb{R} \to \mathbb{R}_+$, which satisfies:
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

2. Estimation and confidence intervals
2.1 Point estimation
2.2 Interval estimation
2.3 Confidence intervals for the mean

Goal: suppose that observations $x_1, \ldots, x_n$ are independent realizations of $f_\theta$.

Question
How can we estimate $\theta$?

Example
Using the sample, we would like to estimate the mean $\mu$ and standard deviation $\sigma$ of a normal distribution.

Statistic
Let $X_1, \ldots, X_n$ denote independent and identically distributed (i.i.d.) random variables, $X_i \sim f_\theta$ for all $i$.

Definition
A statistic is any function, possibly vector valued, of the random sample $X_1, \ldots, X_n$.

Example
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a statistic.

Remark: a statistic is a random variable (or random vector).

Estimator

Definitions
A (point) estimator of $\theta$ is a function of the random variables $X_1, \ldots, X_n \sim$ i.i.d. $f_\theta$.
A (point) estimate is the realized value of the estimator given the observations $x_1, \ldots, x_n$.

We usually denote by $\hat{\theta}$ an estimator of the parameter $\theta$. $\hat{\theta}$ is a statistic and a random variable (or random vector).

Question
How well does $\hat{\theta}$ estimate the parameter $\theta$?

Bias

Definition
The bias of an estimator $\hat{\theta}$ is the difference between the expected value of $\hat{\theta}$ and the target parameter $\theta$:
$$\operatorname{bias}(\hat{\theta}, \theta) = E(\hat{\theta}) - \theta.$$
$\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.
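The definition of bias can be explored numerically. The sketch below approximates $E(\hat{\theta}) - \theta$ by Monte Carlo for three estimators computed on normal samples: the sample mean, the variance estimator with divisor $n - 1$, and, for contrast, a variance estimator with divisor $n$. The parameter values, the number of replications, and the divisor-$n$ estimator are illustrative assumptions and do not come from the lecture.

```python
import random


def sample_mean(xs):
    return sum(xs) / len(xs)


def var_divisor_n_minus_1(xs):
    """Variance estimator with divisor n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def var_divisor_n(xs):
    """Variance estimator with divisor n (included only for contrast)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulate_bias(estimator, true_value, draw_sample, n_rep=20000, seed=0):
    """Monte Carlo approximation of bias = E(theta_hat) - theta."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(n_rep)]
    return sum(estimates) / n_rep - true_value


# Illustrative (assumed) population parameters and sample size.
MU, SIGMA, N = 1.0, 2.0, 5


def draw_normal_sample(rng):
    """One realization (x_1, ..., x_n) of an i.i.d. N(MU, SIGMA^2) sample."""
    return [rng.gauss(MU, SIGMA) for _ in range(N)]


bias_mean = simulate_bias(sample_mean, MU, draw_normal_sample)
bias_var_unbiased = simulate_bias(var_divisor_n_minus_1, SIGMA**2, draw_normal_sample)
bias_var_divisor_n = simulate_bias(var_divisor_n, SIGMA**2, draw_normal_sample)

print(f"bias of sample mean:            {bias_mean:+.3f}")
print(f"bias of variance (divisor n-1): {bias_var_unbiased:+.3f}")
print(f"bias of variance (divisor n):   {bias_var_divisor_n:+.3f}")
```

Up to simulation noise, the first two biases should be close to zero, while the divisor-$n$ estimator should be off by roughly $-\sigma^2/n$ (here $-0.8$).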
Sample mean and sample variance

Proposition
Consider $X_1, \ldots, X_n \sim$ i.i.d. $f(x)$ such that $E(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2$. Then:
1. $\bar{X}$ is an unbiased estimator of $\mu$: $E(\bar{X}) = \mu$;
2. $\operatorname{Var}(\bar{X}) = \sigma^2/n$;
3. $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is an unbiased estimator of $\sigma^2$.

Example
The sample measure is unlikely to match exactly the population parameter.

Theorem
Consider $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$. If $X_1$ and $X_2$ are independent, then for any $a, b, c \in \mathbb{R}$,
$$aX_1 + bX_2 + c \sim N(a\mu_1 + b\mu_2 + c,\; a^2\sigma_1^2 + b^2\sigma_2^2).$$

Corollary
Consider $X_1, \ldots, X_n \sim$ i.i.d. $N(\mu, \sigma^2)$. Then,
$$\bar{X} \sim N(\mu, \sigma^2/n) \quad \text{and} \quad \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

2.2 Interval estimation
To simplify notation, we denote the random sample by $\mathbf{X} = (X_1, \ldots, X_n)$ and the set of realizations by $\mathbf{x} = (x_1, \ldots, x_n)$.

Definition
An interval estimate of $\theta$ is a pair of functions $L(\mathbf{x})$ and $U(\mathbf{x})$ such that $L(\mathbf{x}) \le U(\mathbf{x})$. The random interval $[L(\mathbf{X}), U(\mathbf{X})]$ is called an interval estimator.
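Returning to the proposition on the sample mean: its first two claims follow from linearity of expectation and from the independence of the $X_i$. A short derivation, using only the assumptions $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$ stated there:

```latex
\begin{align*}
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\end{align*}
```

Independence is what lets the variance of the sum split into a sum of variances in the second line.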
Confidence level and confidence interval

Definitions
The probability that the interval estimator $[L(\mathbf{X}), U(\mathbf{X})]$ contains the true parameter $\theta$ is called the confidence level.
If the confidence level of the interval estimator is $1 - \alpha$, then the interval estimate $[L(\mathbf{x}), U(\mathbf{x})]$ is called a $(1 - \alpha)$ confidence interval for $\theta$. It is denoted by $CI(\theta, 1 - \alpha)$.

Illustrative example of point and interval estimates
In the US, a fuel economy estimate is displayed on the window sticker of all new cars, as required by the Environmental Protection Agency (EPA).
[Figure: EPA fuel economy window sticker]

2.3 Confidence intervals for the mean
Consider the realizations $x_1, \ldots, x_n$ of a random sample $X_1, \ldots, X_n \sim$ i.i.d. $N(\mu, \sigma^2)$.

Case 1: Known σ
Recall that $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$. For a given $\alpha \in (0, 1)$, we know that:
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha,$$
where $z_{\alpha/2}$ is the $(1 - \alpha/2)$-quantile of $N(0, 1)$.

Case 1: Known σ (cont.)

Confidence interval for µ
Rearranging the inequalities inside the probability to isolate $\mu$ gives the interval. A $(1 - \alpha)$ confidence interval for $\mu$ is
$$CI(\mu, 1 - \alpha) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right],$$
where $\sigma/\sqrt{n}$ is often called the standard error of the mean.
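As a minimal sketch, the known-σ interval can be computed with Python's standard library, where `statistics.NormalDist().inv_cdf` supplies the normal quantile $z_{\alpha/2}$. The function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for mu when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, the (1 - alpha/2)-quantile
    half_width = z * sigma / sqrt(n)          # z_{alpha/2} times the standard error
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical inputs: sample mean 10, known sigma 2, n = 25 observations.
low, high = z_confidence_interval(sample_mean=10.0, sigma=2.0, n=25, alpha=0.05)
print(f"95% CI for mu: [{low:.4f}, {high:.4f}]")
```

With `alpha=0.05` and `alpha=0.01` the quantile evaluates to about 1.96 and 2.576, the constants used for the 95% and 99% intervals.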
For the two most common confidence levels:
$$CI(\mu, 95\%) = \left[\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}\right], \qquad CI(\mu, 99\%) = \left[\bar{x} \pm 2.576\,\frac{\sigma}{\sqrt{n}}\right].$$

Case 2: Unknown σ
Can we replace $\sigma$ with the sample standard deviation? The answer is yes, but $(\bar{X} - \mu)/(\hat{\sigma}/\sqrt{n})$ is not exactly normal.

Theorem
Let $X_1, \ldots, X_n \sim$ i.i.d. $N(\mu, \sigma^2)$ be a random sample. Consider the estimator $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ of $\sigma^2$. Then,
$$\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} \sim t_{n-1},$$
where $t_{n-1}$ denotes the Student's t distribution with $(n - 1)$ degrees of freedom.

Case 2: Unknown σ (cont.)
The probability density function of a Student t is known (no need to learn it by heart).

Definition: Student t (William Gosset, 1908, Biometrika)
A random variable $X$ has a Student t distribution with $k$ degrees of freedom if its density is
$$f(x) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{x^2}{k}\right)^{-\frac{k+1}{2}}, \quad x \in \mathbb{R},$$
where $\Gamma(y) = \int_0^{\infty} t^{y-1} e^{-t}\,dt$.

Case 2: Unknown σ (cont.)
For large $k$, the Student t distribution gets very close to $N(0, 1)$.

Case 2: Unknown σ (cont.)
Confidence interval for µ
A $(1 - \alpha)$ confidence interval for $\mu$ is
$$CI(\mu, 1 - \alpha) = \left[\bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}\right],$$
where $s$ is the sample standard deviation (the realized value of $\hat{\sigma}$) and $t_{n-1,\alpha/2}$ is the $(1 - \alpha/2)$-quantile of the $t_{n-1}$ distribution.
Excel functions: TDIST returns tail probabilities of the t distribution; TINV(α, n − 1) returns the two-tailed critical value $t_{n-1,\alpha/2}$.

Application: Expected return on equity

Question
You have been asked by your company's CFO to compute the expected return $\mu$ on the company's stock (also known as the cost of equity). You have downloaded the yearly returns on the company's stock over the past 9 years. You have computed that the sample mean return is 15% and that the sample standard deviation is 45%. Compute a 95% confidence interval for $\mu$. What do you conclude?

3. Testing statistical hypotheses
3.1 Five-step procedure for testing a hypothesis
3.2 Testing for a population mean
3.3 The p-value in hypothesis testing

Goal: from a sample of observations, we would like to answer questions concerning characteristics of the population.

Definition
A hypothesis is a statement about a population parameter subject to verification.

Question
How can we verify/determine whether a hypothesis is reasonable?
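For the expected-return application of Section 2.3 (n = 9, sample mean 15%, sample standard deviation 45%), the unknown-σ interval can be sketched with the standard library alone. Python's standard library has no t quantile function, so the code below obtains one numerically from the Student t density given in the definition, using Simpson's rule and bisection. The helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical, and this numerical route is an illustrative assumption rather than the lecture's method (the slides point to Excel).

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, k):
    """Student t density with k degrees of freedom."""
    log_c = lgamma((k + 1) / 2) - lgamma(k / 2) - 0.5 * log(k * pi)
    return exp(log_c - (k + 1) / 2 * log(1 + x * x / k))


def t_cdf(x, k, steps=2000):
    """CDF by Simpson's rule on [0, |x|], using symmetry around 0 (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, k):
    """p-quantile of t_k by bisection; assumes 0.5 <= p < 1."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, k) < p:      # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Inputs from the application: 9 yearly returns, mean 15%, standard deviation 45%.
n, mean_ret, sd_ret, alpha = 9, 0.15, 0.45, 0.05
t_crit = t_quantile(1 - alpha / 2, n - 1)      # t_{8, 0.025}
half_width = t_crit * sd_ret / sqrt(n)         # critical value times s / sqrt(n)
ci = (mean_ret - half_width, mean_ret + half_width)
print(f"t critical value: {t_crit:.3f}")
print(f"95% CI for mu: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

The critical value is about 2.306 and the standard error is 45%/3 = 15%, so the interval is roughly [-19.6%, 49.6%]. One reading is that nine yearly observations pin down the expected return very poorly: the interval is wide and contains zero.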
3.1 Five-step procedure for testing a hypothesis

Step A: State the null and alternative hypotheses
We select two complementary hypotheses, called the null hypothesis $H_0$ and the alternative hypothesis $H_a$.
- $H_0$ is a statement about the value of a population parameter that is initially assumed to be true.
- $H_a$ is a claim that is contradictory to $H_0$.
Example: $H_0 : \mu = \mu_0$.
If $H_a$ states a direction (e.g. $H_a : \mu > \mu_0$ or $H_a : \mu < \mu_0$), the test is called one-tailed. If no direction is specified ($H_a : \mu \neq \mu_0$), the test is two-tailed.

Step B: Select a significance level
A test can produce two types of errors.
- Type I error: rejecting $H_0$ when $H_0$ is true.
- Type II error: accepting (i.e., failing to reject) $H_0$ when $H_0$ is false.

Definition
The probability of making a type I error is denoted by $\alpha$ and is called the significance level of the test.

We must decide on $\alpha$. Traditionally, we choose $\alpha = 0.05$ in finance.
The probability of a type II error is denoted by $\beta$; $1 - \beta$ is called the power of the test.

Step C: Select the test statistic

Definition
A test statistic is a statistic used to determine whether to reject the null hypothesis.
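To make α, β, and power concrete, the sketch below evaluates the probability of rejecting $H_0 : \mu = \mu_0$ as a function of the true mean. It assumes the two-tailed Z test with known σ from Section 3.2, under which $Z \sim N(\delta, 1)$ with $\delta = (\mu - \mu_0)\sqrt{n}/\sigma$ when the true mean is $\mu$. The function name and the numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rejection_probability(mu_true, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) for a two-tailed z test when the true mean is mu_true."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) * sqrt(n) / sigma   # mean of Z when the true mean is mu_true
    # Z ~ N(shift, 1); H0 is rejected when Z < -z_crit or Z > z_crit.
    return STD_NORMAL.cdf(-z_crit - shift) + 1 - STD_NORMAL.cdf(z_crit - shift)


# Hypothetical setting: mu0 = 0, sigma = 1, n = 25.
type_i = rejection_probability(mu_true=0.0, mu0=0.0, sigma=1.0, n=25)  # H0 true: alpha
power = rejection_probability(mu_true=0.5, mu0=0.0, sigma=1.0, n=25)   # H0 false: 1 - beta
beta = 1 - power
print(f"type I error probability: {type_i:.4f}")
print(f"power: {power:.4f}, type II error probability: {beta:.4f}")
```

When $H_0$ is true the rejection probability equals the chosen α; when the true mean is 0.5 in this setting, the power is about 0.71, so β is about 0.29.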
Step D: Formulate the decision rule
We determine a region of rejection delimited by critical values.

Definition
A critical value is a dividing point between the region where $H_0$ is rejected (called the rejection region) and the region where it is not rejected.

Step E: Make a decision
Calculate the observed value of the test statistic using data.
Decision based on critical values: is the observed value in the rejection region?
- Yes: reject $H_0$.
- No: do not reject $H_0$.

3.2 Testing for a population mean
Consider data from the random sample $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$. The null hypothesis is $H_0 : \mu = \mu_0$. The alternative hypothesis is $H_a : \mu \neq \mu_0$.

Known σ: We use the Z test statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{H_0}{\sim} N(0, 1).$$

Unknown σ: We use the T test statistic:
$$T = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}} \overset{H_0}{\sim} t_{n-1}.$$

Limitation of the critical value approach
We can reach the same conclusion for very different observed values of the test statistic!
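This limitation comes from the binary form of the decision rule. For the two-tailed tests of Section 3.2 at significance level $\alpha$, with $z_{\alpha/2}$ and $t_{n-1,\alpha/2}$ the quantiles defined in Section 2.3, the rule can be written as:

```latex
\text{Reject } H_0 \text{ if } |Z| > z_{\alpha/2} \quad (\sigma \text{ known}),
\qquad
\text{reject } H_0 \text{ if } |T| > t_{n-1,\alpha/2} \quad (\sigma \text{ unknown}).
```

The rule records only on which side of the critical value the statistic falls, not how far from it.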
Example: In the case of a two-tailed test with critical value 1.96:
- we reject $H_0$ for $z = 2.03$ as well as for $z = 5.6$;
- we do not reject $H_0$ for $z = 0.27$ as well as for $z = 1.93$.

Question
How confident are we in rejecting the null hypothesis?

3.3 The p-value in hypothesis testing
Additional information is usually reported on the strength of the rejection or acceptance.

Definition
The p-value is the probability, calculated assuming that $H_0$ is true, of obtaining a test statistic value at least as contradictory to $H_0$ as the value actually obtained.

Small p-values give evidence against $H_0$ and in favor of $H_a$.
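The four z values from the critical-value example can be revisited with p-values. The sketch below assumes a two-tailed Z test, for which "at least as contradictory to $H_0$" means $|Z| \ge |z|$ with $Z \sim N(0, 1)$; the function name is hypothetical.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): the p-value of a two-tailed z test."""
    return 2 * NormalDist().cdf(-abs(z))


# Observed statistics taken from the critical-value example.
p_values = {z: two_tailed_p_value(z) for z in (2.03, 5.6, 0.27, 1.93)}
for z, p in p_values.items():
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"z = {z:>4}: p-value = {p:.3g} ({decision} at the 5% level)")
```

Comparing the p-value with α = 0.05 reproduces the decisions of the critical-value approach, but the p-values also separate the cases: about 0.042 for z = 2.03 versus a value on the order of $10^{-8}$ for z = 5.6, and about 0.79 for z = 0.27 versus about 0.054 for z = 1.93.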