EC339: Lecture 7 — Chapters 4-5: Analytical Solutions to OLS

The Linear Regression Model
Postulate: the dependent variable, Y, is a function of the explanatory variable, X. However, the relationship is not deterministic: the value of Y is not completely determined by the value of X. Thus we incorporate an error term (residual) into the model, which turns Yi = f(Xi) into a statistical relationship:

    Yi = f(Xi) + ui

The Simple Linear Regression Model (SLR)
Remember, we are trying to predict Y for a given X. We assume a relationship that is linear in the parameters (i.e., the betas), holding all else equal (ceteris paribus). To account for our ERROR in prediction, we add an error term, typically written u or epsilon, representing ANYTHING ELSE that might cause a deviation between actual and predicted values. We are interested in determining the intercept (β0) and slope (β1):

    Ŷ = β0 + β1·X
    Y = Ŷ + u = β0 + β1·X + u

SLR Uses Multivariate Expectations
Univariate distributions: means, variances, standard deviations.
Multivariate distributions: correlation, covariance; marginal, joint, and conditional probabilities; and the conditional expectation

    E[Y | x] = Σj yj·fY|X(yj | x) = Σj yj·fX,Y(x, yj) / fX(x)

built from the conditional, joint, and marginal probability density functions.

Joint Distributions
Joint probability density functions consider how X and Y are distributed together:

    fX,Y(x, y) = P(X = x, Y = y)

INDEPENDENCE: when the outcomes of X and Y have no influence on one another, the joint probability equals the product of the marginal probability density functions:

    fX,Y(x, y) = fX(x)·fY(y)

Think about BINOMIAL DISTRIBUTIONS: each TRIAL is INDEPENDENT and has no effect on the subsequent trial. Also, think of a marginal distribution much like a histogram of a single variable.
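As an illustrative sketch (the joint table below is invented for demonstration, not taken from the lecture), the marginal densities can be recovered from a joint table by summing over the other variable, and independence can be checked cell by cell against the product of the marginals:

```python
import numpy as np

# Hypothetical joint pmf f_{X,Y}(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}
joint = np.array([[0.10, 0.20, 0.10],
                  [0.10, 0.20, 0.30]])

f_x = joint.sum(axis=1)   # marginal f_X(x): sum over y
f_y = joint.sum(axis=0)   # marginal f_Y(y): sum over x

# X and Y are independent iff f_{X,Y}(x, y) = f_X(x) * f_Y(y) in EVERY cell
independent = np.allclose(joint, np.outer(f_x, f_y))
print(f_x, f_y, independent)
```

Here the check fails, so this particular X and Y are dependent; replacing `joint` with `np.outer(f_x, f_y)` itself would make it pass.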
Conditional Distributions
Conditional probability density functions consider how Y is distributed GIVEN a certain value of X. The conditional probability of Y given X equals the joint probability of X and Y divided by the marginal probability of X occurring in the first place:

    fY|X(y | x) = fX,Y(x, y) / fX(x)

A joint probability is like finding the probability of a "high school graduate" with an hourly wage between "$8 and $10" when looking at education and wage data.

INDEPENDENCE: if X and Y are independent, the conditional distributions collapse to the marginal distributions, just as if conditioning provided no new information:

    fY|X(y | x) = fX(x)·fY(y) / fX(x) = fY(y)
    fX|Y(x | y) = fX(x)·fY(y) / fY(y) = fX(x)

Discrete Bivariate Distributions — Joint Probability Function
For example, assume we flip a coin 3 times, recording the number of heads (H). Let
X = number of heads on the last (3rd) flip,
Y = total number of heads in three flips.
There are 8 possible outcomes: S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. X takes on the values {0, 1}; Y takes on the values {0, 1, 2, 3}, giving eight possible joint outcomes, (X = 0, Y = 0) through (X = 1, Y = 3). Attaching a probability to each of the different joint outcomes gives us a discrete bivariate probability distribution, or joint probability function: f(x, y) gives the probability that the random variables X and Y assume the joint outcome (x, y).

Properties of Covariance
Whether X and Y are discrete or continuous,

    Cov(X, Y) = E[XY] − E[X]E[Y]

If X and Y are independent, then E[XY] = E[X]E[Y], so Cov(X, Y) = 0. Expectations of functions work as before. Since g(X) is a "function" of X (remember, X² is a "function" of X):

    E[g(X)] = Σj g(xj)·fX(xj)

and for a function of both variables:

    E[g(X, Y)] = Σh Σj g(xh, yj)·fX,Y(xh, yj)

Properties of Conditional Expectations

    E[Y | x] = Σj yj·fY|X(yj | x) = Σj yj·fX,Y(x, yj) / fX(x)
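The coin-flip example above can be worked out by brute-force enumeration; a minimal sketch, using the fact that all 8 outcomes are equally likely:

```python
from itertools import product

# Enumerate the 8 equally likely outcomes of three coin flips (1 = heads, 0 = tails)
outcomes = list(product([0, 1], repeat=3))   # each outcome has probability 1/8

# X = heads on the 3rd flip, Y = total number of heads
E_X  = sum(o[2] for o in outcomes) / 8
E_Y  = sum(sum(o) for o in outcomes) / 8
E_XY = sum(o[2] * sum(o) for o in outcomes) / 8

cov = E_XY - E_X * E_Y
print(E_X, E_Y, E_XY, cov)   # 0.5 1.5 1.0 0.25
```

The covariance is positive (0.25), which matches intuition: Y counts the third flip among its heads, so X and Y are not independent.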
Conditional Expectations (see Wooldridge)

    E[WAGE | EDUC] = 1.05 + 0.45·EDUC

This is a regression of wages (Y) on education (X), summing over all possible values of Y in the conditional expectation. E[WAGE | EDUC = 12] is the expected wage given that the years of education is 12, giving a value of $6.45, much like the predictions we have seen.

The Linear Regression Model
Ceteris paribus: all else held equal. Conditional expectations can be linear or nonlinear; we will only examine LINEAR functions here:

    E[Y | x] = Σj yj·fY|X(yj | x) = Σj yj·fX,Y(x, yj) / fX(x)

For any given level of X, many possible values of Y can exist. If Y is a linear function of X, then

    Yi = β0 + β1·Xi + ui

where u represents the deviation between the actual value of Y and the predicted value of Y (β0 + β1·Xi). We are interested in determining the intercept (β0) and slope (β1).

The Simple Linear Regression Model (SLR)
Thus, what we are looking for is the conditional expectation of Y given values of X. This is what we have called Y-hat thus far: we are trying to predict values of Y given values of X. To do this we must hold ALL OTHER FACTORS FIXED (ceteris paribus):

    E[Y | X] = Ŷ = β0 + β1·X,   Y = Ŷ + u

LINEAR POPULATION REGRESSION FUNCTION
We can assume that the EXPECTED VALUE of our error term is zero: E[u] = 0. If this value were NOT zero, we could make it zero by altering the INTERCEPT to absorb the nonzero mean. This makes no statement about how X and the errors are related. If u and X are unrelated linearly, their CORRELATION will equal zero. Correlation is not sufficient, though, since they could be related NONLINEARLY. The conditional expectation gives a sufficient condition, as it looks at ALL values of u given a value for X. This is the zero conditional mean error assumption:
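The $6.45 prediction quoted above can be reproduced directly from Wooldridge's estimated conditional expectation; a minimal sketch:

```python
# Estimated conditional expectation of wage given education (Wooldridge)
def expected_wage(educ):
    return 1.05 + 0.45 * educ

print(expected_wage(12))   # ≈ 6.45: expected hourly wage with 12 years of education
```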
    E[u | x] = E[u] = 0

The Linear Regression Model: Assumptions
Several assumptions must be made about the random error term. The mean error is zero, E(ui) = 0: errors above and below the regression line tend to balance out. Errors can arise due to human behavior (which may be unpredictable), the large number of explanatory variables that are not in the model, and imperfect measurement of the dependent variable.

The Simple Linear Regression Model (SLR)
Beginning with the simple linear regression, taking conditional expectations, and using our current assumptions gives us the POPULATION REGRESSION FUNCTION (notice: no hats over the betas, and y equals the predicted value plus an error):

    y = β0 + β1·x + u
    Taking the expected value, conditional on x:
    E[y | x] = β0 + β1·x + E[u | x]
    and using the assumption that E[u | x] = 0:
    E[y | x] = β0 + β1·x

The Linear Regression Model
The regression model asserts that the expected value of Y is a linear function of X:

    E(Yi) = β0 + β1·Xi

known as the population regression function. From a practical standpoint, not all of a population's observations are available, so we typically estimate the slope and intercept using sample data.

The Simple Linear Regression Model (SLR)
Knowing that E[u | x] = 0, and using the assumption that x and u are uncorrelated:

    Cov(x, u) = E[xu] − E[x]E[u] = 0

With E[u] = 0 and y = β0 + β1·x + u, this yields two population moment conditions:

    E[y − β0 − β1·x] = E[u] = 0
    E[x(y − β0 − β1·x)] = E[xu] = 0

WE NOW HAVE TWO EQUATIONS IN TWO UNKNOWNS (the betas are the unknowns). This is how the Method of Moments estimator is constructed.
The Linear Regression Model: Assumptions
Additional assumptions are necessary to develop confidence intervals and perform hypothesis tests.

    Var(ui) = σᵤ² for all i

Errors are drawn from a distribution with a constant variance (heteroskedasticity exists if this assumption fails).

    Cov(ui, uj) = 0 for all i ≠ j

ui and uj are independent: one observation's error does not influence another observation's error; errors are uncorrelated (serial correlation of the errors exists if this assumption fails).

    Cov(Xi, ui) = 0 for all i

The error term is uncorrelated with the explanatory variable, X.

    ui ~ N(0, σᵤ²)

The error term follows a normal distribution.

Ordinary Least Squares — Fit
The OLS residuals ûi = yi − β̂0 − β̂1·xi satisfy

    Σi ûi = 0   and   Σi xi·ûi = 0

Define:

    SST = Σi (yi − ȳ)², the Total Sum of Squares
    SSE = Σi (ŷi − ȳ)², the Explained Sum of Squares
    SSR = Σi ûi², the Sum of Squared Residuals

    SST = SSE + SSR

Showing the decomposition:

    SST = Σ(yi − ȳ)² = Σ[(yi − ŷi) + (ŷi − ȳ)]²
        = Σ[ûi + (ŷi − ȳ)]²
        = Σûi² + 2Σûi(ŷi − ȳ) + Σ(ŷi − ȳ)²
        = SSR + 2Σûi(ŷi − ȳ) + SSE

where Σûi(ŷi − ȳ) = 0, since the residuals and the predicted values are uncorrelated; hence SST = SSE + SSR.

Ordinary Least Squares — Fit: R-SQUARED GENERALIZED
In multiple regression you cannot simply square a single correlation, but the interpretation is exactly the same: R² equals the squared correlation between yi and ŷi.
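The decomposition SST = SSE + SSR, and the equality of R² with the squared correlation between y and ŷ, can be checked numerically; a sketch with made-up data (the numbers are illustrative, not from the lecture):

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS fit via the closed-form slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
SSR = np.sum(u_hat ** 2)                # sum of squared residuals

print(np.isclose(SST, SSE + SSR))                                    # decomposition holds
print(np.isclose(SSE / SST, np.corrcoef(y, y_hat)[0, 1] ** 2))       # R^2 = corr(y, y_hat)^2
```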
When the prediction depends on only one independent variable, this boils down to the correlation between x and y. Since SST = SSE + SSR,

    R² = SSE / SST = 1 − SSR / SST
       = Σ(ŷi − ȳ)² / Σ(yi − ȳ)² = 1 − Σûi² / Σ(yi − ȳ)²

Estimation (Three Ways — We will not discuss Maximum Likelihood)
We need a formal method to determine the line that "fits" the data well: the distance of the line from the observations should be minimized. Let

    Ŷi = β̂0 + β̂1·Xi

The deviation of an observation from the line is the estimated error, or residual:

    ûi = Yi − Ŷi

Ordinary Least Squares
Designed to minimize the magnitude of the estimated residuals by selecting an estimated slope and estimated intercept that minimize the sum of the squared errors. This is the most popular estimation method.

Ordinary Least Squares — Minimize the Sum of Squared Errors
Identifying the parameters (estimated slope and estimated y-intercept) that minimize the sum of the squared errors is a standard optimization problem in multivariable calculus: take the first derivatives with respect to the estimated slope and intercept coefficients, set both equations equal to zero, and solve the two equations.

Using a sample of data on X and Y, we minimize the sum of squared errors, which is itself a function Q(β̂0, β̂1) of the parameters:

    min over (β̂0, β̂1) of Σi ûi² = Σi (yi − β̂0 − β̂1·xi)²

Differentiating (using the chain rule) gives the first-order conditions:

    ∂Q/∂β̂0 = −2 Σi (yi − β̂0 − β̂1·xi) = 0
    ∂Q/∂β̂1 = −2 Σi xi(yi − β̂0 − β̂1·xi) = 0

which allow us to solve the system for our parameters. These are called the NORMAL EQUATIONS.
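The normal equations are just a 2×2 linear system in (β̂0, β̂1), so they can be solved directly; a sketch with made-up data (the numbers are illustrative):

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.1, 3.8, 5.2])
n = len(x)

# The two normal equations, written as A @ [b0, b1] = c:
#   n*b0       + (sum x)*b1   = sum y
#   (sum x)*b0 + (sum x^2)*b1 = sum x*y
A = np.array([[n,        x.sum()],
              [x.sum(),  (x ** 2).sum()]])
c = np.array([y.sum(), (x * y).sum()])
b0, b1 = np.linalg.solve(A, c)

# The fitted residuals satisfy both first-order conditions
u_hat = y - b0 - b1 * x
print(np.isclose(u_hat.sum(), 0), np.isclose((x * u_hat).sum(), 0))
```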
Ordinary Least Squares — Derived
From the second first-order condition:

    Σ xi(yi − β̂0 − β̂1·xi) = 0  ⟹  Σ xi·yi = β̂0·Σ xi + β̂1·Σ xi²

From the first first-order condition:

    Σ (yi − β̂0 − β̂1·xi) = 0  ⟹  Σ yi = n·β̂0 + β̂1·Σ xi  ⟹  β̂0 = ȳ − β̂1·x̄

Substituting this value for β̂0:

    Σ xi·yi = (ȳ − β̂1·x̄)·Σ xi + β̂1·Σ xi²
    n·Σ xi·yi − Σ xi·Σ yi = β̂1·(n·Σ xi² − (Σ xi)²)

    β̂1 = [n·Σ xi·yi − Σ xi·Σ yi] / [n·Σ xi² − (Σ xi)²]
        = Σ (xi − x̄)·yi / Σ (xi − x̄)²

Ordinary Least Squares
This results in the normal equations, which suggest an estimator for the intercept: β̂0 = ȳ − β̂1·x̄. The means of X and Y are ALWAYS on the regression line. They also yield the estimator for the slope of the line above. No other estimators will result in a smaller sum of squared errors.

SLR Assumption 1: Linear in Parameters (SLR.1)
Defines the POPULATION model. The dependent variable y is related to the independent variable x and the error (or disturbance) u as

    y = β0 + β1·x + u

where β0 and β1 are population parameters.

SLR Assumption 2: Random Sampling (SLR.2)
Use a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population model. This allows a sample restatement of SLR.1; we want to use DATA to estimate our parameters:

    yi = β0 + β1·xi + ui,  i = 1, 2, …, n

where β0 and β1 are the population parameters to be estimated.

SLR Assumption 3: Sample Variation in the Independent Variable (SLR.3)
The X values must vary.
The sample variance of X cannot equal zero:

    Σi (xi − x̄)² > 0

SLR Assumption 4: Zero Conditional Mean (SLR.4)

    E[u | x] = 0,  and for a RANDOM sample, E[ui | xi] = 0 for all i = 1, 2, …, n

For a random sample, the implication is that NO independent variable is correlated with ANY unobservable (remember: the error includes unobservable factors).

SLR Theorem 1: Unbiasedness of OLS
Estimators should equal the population value in expectation:

    E[β̂0] = β0  and  E[β̂1] = β1

With β̂1 = Σ(xi − x̄)·yi / Σ(xi − x̄)² and β̂0 = ȳ − β̂1·x̄, substitute yi = β0 + β1·xi + ui and examine the numerator:

    Σ(xi − x̄)·yi = β0·Σ(xi − x̄) + β1·Σ(xi − x̄)·xi + Σ(xi − x̄)·ui
                 = 0 + β1·Σ(xi − x̄)² + Σ(xi − x̄)·ui

so that

    β̂1 = β1 + Σ(xi − x̄)·ui / Σ(xi − x̄)²

Taking expectations gives E[β̂1] = β1. This holds because x and u are assumed to be uncorrelated (zero conditional mean). Thus our estimator equals the actual value of the parameter in expectation.

For the intercept, start from β̂0 = ȳ − β̂1·x̄ and ȳ = β0 + β1·x̄ + ū:

    β̂0 = β0 + β1·x̄ + ū − β̂1·x̄ = β0 + (β1 − β̂1)·x̄ + ū
    E[β̂0] = β0 + E[β1 − β̂1]·x̄ + E[ū] = β0

since E[β̂1] = β1 and the expected value of the errors is zero. Thus our estimator equals the actual value of the parameter in expectation.

SLR Assumption 5: Homoskedasticity (SLR.5)
The variance of the errors is INDEPENDENT of the values of X.
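Unbiasedness is a statement about repeated sampling, so it can be illustrated with a small Monte Carlo sketch: draw many samples from a known population model (the parameter values and sample sizes below are invented for demonstration) and average the OLS slope estimates.

```python
import numpy as np

# Population model y = 2 + 0.5*x + u, with E[u | x] = 0 by construction
rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5

estimates = []
for _ in range(5000):
    x = rng.uniform(0, 10, size=50)
    u = rng.normal(0, 1, size=50)
    y = beta0 + beta1 * x + u
    # Closed-form OLS slope for this sample
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

print(np.mean(estimates))   # averages out close to the true slope 0.5
```

Individual estimates scatter around 0.5, but their average converges to it, which is exactly what E[β̂1] = β1 asserts.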
Assumption SLR.5:

    Var(u | x) = σ²

Together, SLR.3, SLR.4, and SLR.5 imply that

    E[y | x] = β0 + β1·x  and  Var(y | x) = σ²

Method of Moments
Seeks to equate the moments implied by a statistical model of the population distribution to the actual moments found in the sample. Certain restrictions are implied in the population: E(u) = 0 and Cov(Xi, uj) = 0 for all i, j. This results in the same estimators as the least squares method.

Interpretation of the Regression Slope Coefficient
The coefficient β1 tells us the effect X has on Y: increasing X by one unit will change the mean value of Y by β1 units.

Units of Measurement and Regression Coefficients
The magnitude of the regression coefficients depends upon the units in which the dependent and explanatory variables are measured. For example, switching between cents and dollars rescales the coefficients by a factor of 100. Rescaling both the Y and X variables by the same factor will not affect the slope, although it will impact the y-intercept.

Models Including Logarithms
For a log-linear model, the slope represents the proportionate (percentage-like) change in Y arising from a unit change in X; the coefficient is the SEMI-elasticity of Y with respect to X.
For a log-log model, the slope represents the proportionate change in Y arising from a proportionate change in X; the coefficient is the elasticity of Y with respect to X. This is the CONSTANT ELASTICITY MODEL.
For a linear-log model, the slope is the unit change in Y arising from a proportionate change in X.

Regression in Excel (Ex. 2.11)
Step 1: Reorganize the data so that the variables are in adjacent columns.
Step 2: Data → Data Analysis → Regression.

Your estimated equation is as follows:

    log(salary) = 6.5055 + 0.0097·ceoten

The t-statistic shows that the coefficient on ceoten is insignificant at the 5% level.
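The semi-elasticity interpretation of the log-linear model above can be made concrete: one more year of CEO tenure multiplies predicted salary by exp(0.0097), i.e. raises it by roughly 100·β1 ≈ 0.97 percent. A minimal sketch (the tenure values 10 and 11 are arbitrary illustrations):

```python
import math

# Estimated log-linear equation from the Excel example
b0, b1 = 6.5055, 0.0097

def predicted_salary(ceoten):
    # Invert the log to get salary in levels
    return math.exp(b0 + b1 * ceoten)

# Percentage change in predicted salary from one extra year of tenure
pct_change = (predicted_salary(11) / predicted_salary(10) - 1) * 100
print(round(pct_change, 2))   # ≈ 0.97, matching 100*b1
```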
The p-value for ceoten is 0.128368, which is greater than 0.05, meaning that you could see a value this large about 13% of the time even if the true coefficient were zero. You are inherently testing the null hypothesis that the coefficient is equal to ZERO; YOU FAIL TO REJECT THE NULL HYPOTHESIS on β1 here.

[Figure: "X Variable 1 Line Fit Plot" — scatter of Y and Predicted Y against X Variable 1, with fitted line y = 0.0097x + 6.5055 and R² = 0.0132.]