Basic Econometrics
Chapter 4: Classical Normal Linear Regression Model (CNLRM)
Iris Wang
[email protected]

Sampling distributions
• We have studied the expected value and variance of the OLS estimators.
• In order to do inference, we need to know the full sampling distribution of the estimator.
• To make this sampling distribution tractable, we now assume that the unobserved error term (u) is normally distributed in the population. This is often referred to as the normality assumption (Assumption 10).

Assumption 10: Normality
• We continue to make the assumptions introduced in the previous lecture (linear regression, no perfect collinearity, zero conditional mean, homoskedasticity, ...).
• And we add the following:
• Assumption 10 (Normality): The population error u is independent of the explanatory variables x1, x2, ..., xk, and is normally distributed with zero mean and variance σ²: u ~ Normal(0, σ²).

Recap: The normal distribution
• The normal distribution is very widely used in statistics & econometrics (one reason is that normality simplifies probability calculations).
• A normal random variable is a continuous random variable that can take on any value.
• [Figure: the bell-shaped probability density function (pdf) of the normal distribution]
• The mathematical formula for the pdf is f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)), where μ is the mean and σ² is the variance.

Why are we assuming normality?
• Answer: it implies that the OLS estimator follows a normal distribution too, and this makes it straightforward to do inference.
• Under the CLM assumptions (1-7), conditional on the sample values of the independent variables, β̂j ~ Normal(βj, Var(β̂j)). This result implies that (β̂j − βj)/sd(β̂j) ~ Normal(0, 1).
• In words: the deviation between the estimated value and the true parameter value, divided by the standard deviation of the estimator, is normally distributed with mean zero and variance equal to 1. (On p.100.)
• The assumptions 1-7 are called the classical linear model (CLM) assumptions.
• One immediate implication of the CLM assumptions is that, conditional on the explanatory variables, the dependent variable y has a normal distribution with constant variance (p.101).

How to justify the normality assumption?
• Central limit theorem (CLT): the residual u is the sum of many different factors, and by the CLT the sum of many random variables is normally distributed.
• This argument is not without weaknesses (e.g. it does not hold if u is not additive).
• Whether normality holds in a particular application is an empirical matter, which can be investigated.
• Sometimes a transformation (e.g. taking the log) yields a distribution that is closer to normal.

Example: CEO salary and return on equity
• Data: CEOSAL1.SAV (available on the course website).
• Salaries are expressed in thousands of USD.
• It would be interesting to compare the sample distributions of salary in the different scales.
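As a sketch of how such a comparison could be done, the snippet below plots histograms of salary in levels and in logs. It assumes the SPSS file has been exported to a CSV named ceosal1.csv with a column named salary; both names are hypothetical, not given on the slides.

```python
# Sketch: compare the distribution of CEO salaries in levels vs. logs.
# Assumes CEOSAL1.SAV has been exported to "ceosal1.csv" with a column
# "salary" (thousands of USD); file and column names are hypothetical.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("ceosal1.csv")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(df["salary"], bins=30)          # levels: typically right-skewed
ax1.set_title("Salary (levels, $1000s)")
ax2.hist(np.log(df["salary"]), bins=30)  # logs: usually closer to normal
ax2.set_title("log(salary)")
plt.show()
```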
[Figure: sample distributions of CEO salaries in levels and in logs]

Basic Econometrics
Chapter 5: Interval Estimation and Hypothesis Testing
Iris Wang
[email protected]

Confidence intervals
• Once we have estimated the population parameter βj and obtained the associated standard error, we can easily construct a confidence interval (CI) for βj.
• (β̂j − βj)/se(β̂j) has a t distribution with n − k − 1 degrees of freedom (df).
• Define a 95% confidence interval for βj as β̂j ± t0.025 · se(β̂j), where the constant t0.025 is the 97.5th percentile in the t distribution.
• The lower limit is β̂j − t0.025 · se(β̂j); the upper limit is β̂j + t0.025 · se(β̂j).
• Meaning of the CI: in 95 out of 100 cases, intervals constructed like the equation above will contain the true βj.

Confidence intervals (cont'd)
• The width of the CI is proportional to the standard error of the estimator: the larger the se, the larger the width of the CI.
• The larger the se, the greater the uncertainty about the true value of the unknown parameter.
• How is the confidence interval affected by an increase in the level of confidence (e.g. from 95% to 99%)? Why?

Don't forget the CLM assumptions!
• Estimates of the confidence interval will not be reliable if the CLM assumptions do not hold.

Example
• Data: wage1.sav. These data were originally obtained from the 1976 Current Population Survey in the US.
• SPSS output (dependent variable: wage):

                 B    Std. Error    Beta        t     Sig.   95% CI lower   95% CI upper
(Constant)   -0.892       0.686             -1.300    0.194       -2.239          0.456
educ          0.541       0.053     0.405   10.143    0.000        0.436          0.645

• Can you calculate these two CIs yourself according to the formula?

Hypothesis testing
• In Chapter 3 we learned that Assumptions 1-7 (linear regression, no perfect collinearity, zero conditional mean, homoskedasticity, ...) enable us to obtain mathematical formulas for the expected value and variance of the OLS estimators.
• To test a hypothesis, we need to know the full sampling distribution of the estimator.

1. Sampling distribution: illustration
• Suppose we want to make statements about a population consisting of (say) 10 million individuals.
• The model is as follows: Y = β0 + β1·x + u.
• Suppose we could draw (say) 100 samples from this population, where each sample consists of (say) 200 observations, and that we then estimated 100 different regressions (one for each sample).
• This would generate 100 different estimates of our parameter of interest β1, and they would form the distribution of our estimator.

Let's do this!
• Let's simulate 500 samples consisting of 200 individuals each. Our model is Y = β0 + β1·x + u, where u is normally distributed (and all other assumptions hold too).
• Since we are simulating data, we can choose the true parameters (this would obviously not be the case for real empirical applications). Let's choose β0 = 0 and β1 = 0.

[Figure: density histogram of the 500 estimates of β1, spread roughly between −0.2 and 0.2. Mean of b1: −0.005; std. dev. of b1: 0.072]
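Below is a minimal Python sketch of this simulation, assuming x and u are both standard normal (the slides do not specify the distribution of x). It also records how often each sample's 95% confidence interval covers the true β1, which ties back to the "95 out of 100 cases" interpretation of the CI above.

```python
# Sketch of the simulation above: 500 samples of n = 200 drawn from
# Y = b0 + b1*x + u with true b0 = 0, b1 = 0 and normal errors.
# (The distribution of x and the seed are arbitrary choices here.)
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 200, 500
b0, b1 = 0.0, 0.0
t025 = stats.t.ppf(0.975, df=n - 2)       # 97.5th percentile, df = n-k-1

estimates = []
covered = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = b0 + b1 * x + rng.normal(size=n)  # u ~ Normal(0, 1)
    res = stats.linregress(x, y)          # OLS with one regressor
    estimates.append(res.slope)
    lo = res.slope - t025 * res.stderr    # 95% CI for b1 in this sample
    hi = res.slope + t025 * res.stderr
    covered += (lo <= b1 <= hi)

print("mean of b1_hat:   ", np.mean(estimates))  # close to the true 0
print("std dev of b1_hat:", np.std(estimates))   # close to 1/sqrt(200) ~ 0.07
print("95% CI coverage:  ", covered / reps)      # close to 0.95
```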
2. Why do we need to know the sampling distribution of the OLS estimator?
• Recall the formula for the t statistic: t = (β̂j − βj)/se(β̂j).
• In other words, the difference between the parameter estimate and a given value of the true parameter, scaled by the standard error of the estimator, follows a t-distribution.
• This is very good news, because we know exactly what the t distribution looks like (statisticians have studied this distribution for many years).
• In particular, we know exactly how to compute probabilities using a t distribution, and this will be very useful when testing hypotheses (more on this shortly).
• Here's the answer to the question: if we don't know the sampling distribution of the OLS estimator, we can't be sure that (β̂ − β)/se(β̂) follows a t-distribution.
• In that case, this quantity could follow any distribution, in which case there's no way of doing the probability analysis that underlies hypothesis testing.

Testing the null hypothesis
• In most applications, testing H0: βj = 0 is of central interest (j corresponds to any of the k independent variables in the model).
• Since βj measures the partial effect of xj on the expected value of y after controlling for other factors, the null hypothesis means that xj has no effect on the expected value of y.

Example: Wage equation
• log(wage) = β0 + β1·education + u
• The null hypothesis H0: β1 = 0 means that education has no effect on hourly wage.
• Is this an economically interesting hypothesis?
• Now let's look at how we can carry out and interpret such a test.
• The test statistic we use to test H0: βj = 0 is called the t statistic, or the t ratio of β̂j, and is defined as t = β̂j/se(β̂j).
• As you can see, the t statistic is easy to compute: just divide your coefficient estimate by the standard error.
• SPSS (and most other econometrics software) will do this for you.
• Since the se is always positive, the t statistic always has the same sign as the coefficient estimate.

Two-tailed tests
• Consider a null hypothesis like H0: βj = 0 against a two-sided alternative H1: βj ≠ 0.
• In words, H1 is that xj has a ceteris paribus effect on y, which could be either positive or negative.

Now let's decide on a significance level
• Significance level = probability of rejecting H0 when it is in fact true (i.e. a mistake).
• Let's decide on a 5% significance level (the most common choice): hence, we are willing to mistakenly reject H0 when it is true 5% of the time.

Two-sided tests (cont'd)
• To find the critical value of t (denoted c), we first specify the significance level, say 5%.
• Since the test is two-tailed, c is then chosen to make the area in each tail equal 2.5%, i.e. c is the 97.5th percentile in the t distribution (again, with n − k − 1 degrees of freedom).
• The graph shows that, if df = 26, then c = 2.06.
• Econometric jargon: if H0: βj = 0 is rejected against a two-sided alternative, we may say that "xj is statistically significant at the 5% level". Thus we conclude that the effect of xj on y is not zero.
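A short sketch of these calculations in Python: it reproduces the two-tailed critical value for df = 26 and computes the t statistic for educ from the SPSS output shown earlier (estimate 0.541, standard error 0.053).

```python
# Sketch: two-tailed critical value and t statistic for the wage example.
from scipy import stats

c = stats.t.ppf(0.975, df=26)   # 97.5th percentile => 2.5% in each tail
print(round(c, 2))              # 2.06, as on the slide

# t statistic for educ: estimate / standard error. This gives ~10.21
# rather than SPSS's 10.143 because the displayed coefficients are rounded.
t_educ = 0.541 / 0.053
print(round(t_educ, 2), t_educ > c)   # reject H0: beta_1 = 0 at the 5% level
```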
Testing against one-sided alternatives
• The rule for rejecting H0 depends on: 1. the alternative hypothesis (H1); 2. the chosen significance level of the test.
• Let's begin by looking at a one-sided alternative of the form H1: βj > 0.
• Let's assume we decide to apply a 5% significance level, that is, α = 5%.

One-tail test
• Under H0 (βj = 0), the t statistic has a t distribution.
• Under H1 (βj > 0), the expected value of the t statistic is positive.
• Denote the critical value by c (see p.118).
• Rejection rule: H0 is rejected in favor of H1 at the 5% significance level if t > c.
• We've seen how to obtain the t statistic. But how do we obtain c? To obtain c, we only need the significance level and the degrees of freedom (df).
• Example: for df = 28 and significance level 5%, c = 1.701. If our t statistic is less than 1.701, we do not reject H0; but if our t statistic is higher than 1.701, we do reject H0.

A few points worth noting
• As the significance level falls, the critical value increases. Why?
• If H0 is rejected at (say) the 5% level, it is automatically rejected at the 10% level too.
• What is the critical value c for: a 10% significance level with df = 21? A 1% significance level with df = 120?
• Confirm that, as the df gets large, the critical values for the t-distribution get very close to the critical values for the standard normal distribution.

Example: The wage equation (Data: WAGE1.SAV)
• Based on the results below, test H0: β1 = 0 against H1: β1 > 0.

                 B    Std. Error    Beta        t     Sig.
(Constant)   -0.892       0.686             -1.300    0.194
educ          0.541       0.053     0.405   10.143    0.000
(Dependent variable: wage)

Testing other hypotheses about βj
• Although H0: βj = 0 is the most common hypothesis, we sometimes want to test whether βj is equal to some other given constant aj. Suppose the null hypothesis is H0: βj = aj.
• In this case the appropriate t statistic is t = (β̂j − aj)/se(β̂j).
• Now go back and test the hypothesis that the educ coefficient in the regression above is equal to 1 (against a two-sided alternative).

Computing p-values for t tests
• You have seen how the researcher chooses the significance level. There is no "correct" significance level.
• In practice, the 5% level is the most common one, but 10% is also frequently used (especially for small datasets), as is 1% (more common for large datasets).
• Given the observed value of the t statistic, what is the smallest significance level at which the null hypothesis would be rejected? This level is known as the p-value.
• Example: suppose t = 1.85 and df = 40. This results in a p-value of 0.0718.

p-values in SPSS
• Correct interpretation: the p-value is the probability of observing a t value as extreme as we did if the null hypothesis is true. ☺
• Wrong interpretation (not uncommon): "the p-value is the probability that the null hypothesis is true".
• Thus, small p-values are evidence against the null hypothesis. If the p-value is, say, 0.04, we might say there's significance at the 5% level (actually at the 4% level) but not at the 1% level (or the 3% or 2% level).
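As a check on these numbers, the sketch below computes the two-tailed p-value for t = 1.85 with df = 40, and carries out the suggested test of H0: β_educ = 1. The degrees of freedom assume WAGE1 has n = 526 observations and k = 1 regressor; the sample size is not stated on the slides.

```python
# Sketch: p-values for t tests, and testing H0: beta_educ = 1.
from scipy import stats

# Two-tailed p-value for t = 1.85 with df = 40 (the slide's example)
p = 2 * stats.t.sf(1.85, df=40)
print(round(p, 4))                    # ~0.0718

# Test H0: beta_educ = 1 against a two-sided alternative, using the
# reported estimate 0.541 and standard error 0.053. The df assumes
# n = 526 observations in WAGE1 (an assumption, not shown on the slide).
t_stat = (0.541 - 1) / 0.053
p_val = 2 * stats.t.sf(abs(t_stat), df=526 - 1 - 1)
print(round(t_stat, 2), p_val)        # t ~ -8.66: clearly reject H0
```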
SPSS output (dependent variable: wage):

                 B    Std. Error    Beta        t     Sig.
(Constant)   -0.892       0.686             -1.300    0.194
educ          0.541       0.053     0.405   10.143    0.000

Basic Econometrics
Chapter 6: Extensions of the Two-Variable Linear Regression Model
Iris Wang
[email protected]

Log-linear regression models
• In many cases, relationships between economic variables may be non-linear.
• However, we can distinguish between functional forms that are intrinsically non-linear and those that can be transformed into an equation to which we can apply ordinary least squares techniques.
• Of those non-linear equations that can be transformed, the best known is the multiplicative power function form (sometimes called the Cobb-Douglas functional form), which is transformed into a linear format by taking logarithms.

Production functions
• For example, suppose we have cross-section data on firms in a particular industry, with observations both on the output (Q) of each firm and on the inputs of labour (L) and capital (K).
• Consider the following functional form: Q = A·L^α·K^β.
• Taking logarithms gives lnQ = lnA + α·lnL + β·lnK, which is linear in the parameters.
• The parameters α and β can be estimated directly from a regression of the variable lnQ on lnL and lnK.
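To make the transformation concrete, here is a sketch that simulates Cobb-Douglas data (all parameter values and distributions below are invented for the illustration; no firm dataset is given on the slides) and recovers α and β by OLS on the logged variables.

```python
# Sketch: estimating Q = A * L**alpha * K**beta via the log-linear model
#   lnQ = lnA + alpha*lnL + beta*lnK + u.
# Data are simulated for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
L = rng.lognormal(mean=3.0, sigma=0.5, size=n)   # labour input
K = rng.lognormal(mean=4.0, sigma=0.5, size=n)   # capital input
A, alpha, beta = 2.0, 0.6, 0.3                   # chosen "true" parameters
Q = A * L**alpha * K**beta * np.exp(rng.normal(scale=0.1, size=n))

# Regress lnQ on lnL and lnK; the intercept estimates lnA (~0.69)
X = sm.add_constant(np.column_stack([np.log(L), np.log(K)]))
fit = sm.OLS(np.log(Q), X).fit()
print(fit.params)   # approx [lnA, alpha, beta] = [0.69, 0.6, 0.3]
```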