MATHEMATICAL ECONOMICS
IV Semester COMPLEMENTARY COURSE
B Sc MATHEMATICS (2011 Admission)
UNIVERSITY OF CALICUT, SCHOOL OF DISTANCE EDUCATION
Calicut University P.O., Malappuram, Kerala, India 673 635

STUDY MATERIAL
Prepared by Sri. Shabeer K P, Assistant Professor, Dept. of Economics, Govt College Kodanchery.
Layout: Computer Section, SDE
© Reserved

CONTENTS
MODULE I: INTRODUCTION TO ECONOMETRICS
MODULE II: TWO VARIABLE REGRESSION MODEL
MODULE III: THE CLASSICAL NORMAL LINEAR REGRESSION MODEL
MODULE IV: EXTENSION OF TWO VARIABLE REGRESSION MODEL

MODULE I: INTRODUCTION TO ECONOMETRICS

1.1 Definition and Scope of Econometrics

Literally interpreted, econometrics means economic measurement. Econometrics deals with the measurement of economic relationships. It is a science which combines economic theory with economic statistics and tries, by mathematical and statistical methods, to investigate the empirical support for the general economic laws established by economic theory. Econometrics therefore makes concrete certain economic laws by utilising economics, mathematics and statistics. The term econometrics is formed from two words of Greek origin: 'oikonomia', meaning economy, and 'metron', meaning measure. Although measurement is an important part of econometrics, the scope of econometrics is much broader, as can be seen from the following quotations. In the words of Arthur S Goldberger, "econometrics may be defined as the social science in which the tools of economic theory, mathematics and statistical inference are applied to the analysis of economic phenomena".
Gerhard Tintner points out that "econometrics, as a result of certain outlook on the role of economics, consists of application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results". For H Theil, "econometrics is concerned with the empirical determination of economic laws". In the words of Ragnar Frisch, "the mutual penetration of quantitative econometric theory and statistical observation is the essence of econometrics". Thus, econometrics may be considered as the integration of economics, mathematics and statistics for the purpose of providing numerical values for the parameters of economic relationships and verifying economic theories. It is a special type of economic analysis and research in which the general economic theory, formulated in mathematical terms, is combined with empirical measurement of economic phenomena. Econometrics is the art and science of using statistical methods for the measurement of economic relations. In the practice of econometrics, economic theory, institutional information and other assumptions are relied upon to formulate a statistical model, or a set of statistical hypotheses, to explain the phenomena in question. Economic theory makes statements or hypotheses that are mostly qualitative in nature. Econometrics gives empirical content to most economic theory.

Econometrics differs from mathematical economics. The main concern of mathematical economics is to express economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory. As noted above, econometrics is mainly interested in the empirical verification of economic theory. The econometrician often uses the mathematical equations proposed by the mathematical economist, but puts these equations in such a form that they lend themselves to empirical testing.
Further, although econometrics presupposes the expression of economic relationships in mathematical form, unlike mathematical economics it does not assume that economic relationships are exact. On the contrary, econometrics assumes that economic relationships are not exact but stochastic. Econometric methods are designed to take into account random disturbances which create deviations from the exact behavioural patterns suggested by economic theory and mathematical economics.

Econometrics differs both from mathematical statistics and economic statistics. An economic statistician gathers empirical data, records or charts them, and then attempts to describe the pattern in their development over time and to detect relationships between various economic magnitudes. Economic statistics is mainly the descriptive aspect of economics. It does not provide explanations of the development of the various variables or measurement of the parameters of economic relationships. On the contrary, mathematical statistics deals with methods of measurement which are developed on the basis of controlled experiments in laboratories. Such statistical methods of measurement are not appropriate for economic relationships, which cannot be measured on the basis of evidence provided by controlled experiments, because such experiments cannot be designed for economic phenomena. For instance, in studying the economic behaviour of human beings one cannot change only one factor while keeping all other factors constant: in the real world, all variables change continuously and simultaneously, so controlled experiments are not possible. Econometrics uses statistical methods after adapting them to the problems of economic life. These adapted statistical methods are called econometric methods.
In particular, econometric methods are adjusted so that they become appropriate for the measurement of economic relationships which are stochastic, that is, which include random elements.

1.2 Methodology of Econometrics

Broadly speaking, traditional or classical econometric methodology consists of the following steps:

1) Statement of the theory or hypothesis
2) Specification of the mathematical model of the theory
3) Specification of the econometric model of the theory
4) Obtaining the data
5) Estimation of the parameters of the econometric model
6) Hypothesis testing
7) Forecasting or prediction
8) Using the model for control or policy purposes.

To illustrate the preceding steps, let us consider the well-known psychological law of consumption.

(1) Statement of theory or hypothesis

Keynes stated: "the fundamental psychological law ... is that men (women) are disposed, as a rule and on average, to increase their consumption as their income increases, but not as much as the increase in their income". In short, Keynes postulated that the marginal propensity to consume (MPC), that is, the rate of change in consumption as a result of a change in income, is greater than zero but less than one. That is, 0 < MPC < 1.

(2) Specification of the mathematical model of consumption

Economic theory may or may not indicate the precise mathematical form of the relationship or the number of equations to be included in the economic model. Specifying a mathematical model means specifying the mathematical equations that describe the relationships between economic variables as proposed by the economic theory. Although Keynes postulated a positive relationship between consumption and income, he did not specify the precise form of the functional relationship between the two.
For simplicity, a mathematical economist might suggest the following form of the Keynesian consumption function:

Y = β1 + β2X,  0 < β2 < 1  (1.1)

where Y = consumption expenditure, X = income, and β1 and β2, known as the parameters of the model, are the intercept and slope coefficients respectively. The slope coefficient β2 measures the MPC. This equation, which states that consumption is linearly related to income, is an example of a mathematical model of the relationship between consumption and income that is called the consumption function in economics. A model is simply a set of mathematical equations. If the model has only one equation, it is called a single equation model. If the model has more than one equation, it is called a multiple equation model. In equation (1.1), the variable appearing on the left side of the equality sign is called the dependent variable and the variables on the right side are called the independent or explanatory variables. Thus, in the Keynesian consumption function, consumption expenditure is the dependent variable and income is the explanatory variable.

(3) Specification of the econometric model of consumption

The purely mathematical model of the consumption function as in equation (1.1) is of limited interest to the econometrician because it assumes that there is an exact or deterministic relationship between consumption and income. But relationships between economic variables are generally inexact. Thus, if we obtain data on consumption and income, we could not expect all the observations to lie exactly on the straight line. This is because, in addition to income, other variables affect consumption expenditure. For example, the size of the family, the ages of the members of the family, family religion, etc. are likely to exert some influence on consumption.
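This scatter of observations off the deterministic line can be illustrated with a small simulation. The parameter values and income figures below are invented purely for illustration (any β1 and 0 < β2 < 1 would do), and the random term stands in for the combined influence of the omitted factors just mentioned:

```python
import random

# Hypothetical consumption function Y = b1 + b2*X as in (1.1);
# the parameter values are invented for illustration, with 0 < b2 < 1.
b1, b2 = 20.0, 0.6
random.seed(42)

incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

for x in incomes:
    exact = b1 + b2 * x                    # deterministic value on the line
    observed = exact + random.gauss(0, 6)  # other factors push families off it
    print(f"X = {x:3d}   on the line: {exact:6.1f}   observed: {observed:6.1f}")
```

Running this, the observed values cluster around the line b1 + b2*X without ever lying exactly on it, which is precisely the situation the econometrician must model.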
To allow for the inexact relationship between economic variables, the econometrician would modify the deterministic consumption function as follows:

Y = β1 + β2X + u  (1.2)

where u, known as the disturbance or error term, is a random or stochastic variable. The disturbance term u represents all those factors that affect consumption but are not taken into account explicitly. This equation is an example of an econometric model. More technically, it is an example of a linear regression model. The econometric consumption function hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), but that the relationship between the two is not exact; it is subject to individual variation. [Figure: the econometric consumption function (1.2), with consumption expenditure plotted against income and the disturbance u shown as a deviation from the regression line.]

(4) Obtaining Data

To estimate the econometric model given in equation (1.2), that is, to obtain the numerical values of β1 and β2, we need data. Three types of data may be available for empirical analysis: time series, cross-sectional and pooled data. A time series is a set of observations on the values that a variable takes at different times. That is, time series data give information about the numerical values of variables from period to period. Such data may be collected at regular time intervals such as daily, weekly, monthly, quarterly, annually, quinquennially or decennially. The data thus collected may be quantitative or qualitative. Qualitative variables, also called "dummy variables" or "categorical variables", can be every bit as important as quantitative variables. Thus, data on one or more variables collected over a period of time are called time series data. That is, when values of one or more variables for several time periods pertaining to a single economic entity are given, such a data set is called time series data.
Cross-sectional data are data on one or more variables collected at the same point of time. These data give information on the variables concerning individual agents at a given point of time. Pooled data are a combination of time series and cross-sectional data; that is, pooled data contain elements of both time series and cross-sectional data.

(5) Estimation of the econometric model

After the model has been specified and the data have been collected, the econometrician must proceed with its estimation. In other words, he must obtain the numerical estimates of the coefficients of the model. In our case, the task is to estimate the parameters of the consumption function, that is, β1 and β2. The numerical estimates of the parameters give empirical content to the consumption function. The statistical tool of regression analysis is the main tool used to obtain the estimates. Choice of the appropriate econometric technique for the estimation of the function, and critical examination of the assumptions of the chosen technique, is a crucial step.

(6) Hypothesis Testing

A hypothesis is a theoretical proposition that is capable of empirical verification or disproof. It may be viewed as an explanation of some event or events, which may be a true or a false explanation. Assuming that the fitted model is a reasonably good approximation of reality, we have to develop suitable criteria to find out whether the estimates obtained are in accordance with the expectations of the theory that is being tested. According to Milton Friedman, a theory or hypothesis that is not verifiable by appeal to empirical evidence may not be admissible as a part of scientific theory. Confirmation or refutation of economic theories on the basis of sample evidence is based on a branch of statistical theory known as statistical inference or hypothesis testing.
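The estimation step (5) can be sketched concretely. The closed-form OLS formulas used below are the standard ones derived in Module II; the consumption (Y) and income (X) figures are invented purely for illustration:

```python
# Hypothetical consumption (Y) and income (X) data, invented for illustration.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Standard OLS closed-form solutions (derived in Module II):
#   b2_hat = sum((Xi - x_bar)(Yi - y_bar)) / sum((Xi - x_bar)^2)
#   b1_hat = y_bar - b2_hat * x_bar
b2_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
         sum((x - x_bar) ** 2 for x in X)
b1_hat = y_bar - b2_hat * x_bar

print(f"estimated MPC (b2_hat)      = {b2_hat:.4f}")
print(f"estimated intercept (b1_hat) = {b1_hat:.4f}")
```

For these made-up data the estimated MPC comes out at about 0.51, which lies between zero and one and so is consistent with the Keynesian hypothesis of step (1).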
(7) Forecasting or Prediction

The objective of any econometric research is to obtain good numerical estimates of the coefficients of economic relationships and to use them for the prediction of the values of economic variables. Forecasting is one of the prime aims of econometric research. Econometric methods are used to estimate the parameters of the model, to test hypotheses concerning them, and to generate forecasts from the model. If the model confirms the hypothesis or theory under consideration, we may use it to predict the future values of the dependent, or forecast, variable Y on the basis of the known or expected values of the explanatory, or predictor, variable X. It is conceivably possible that the model is economically meaningful and statistically and econometrically correct for the sample period for which it has been estimated, and yet not suitable for forecasting. The reasons may be either (a) that the values of the explanatory variables used in the forecast are inaccurate, or (b) that the estimates of the coefficients are incorrect due to deficiencies of the sample data.

(8) Use of the model for control or policy purposes

Suppose that we have estimated the Keynesian consumption function. Then the government can use it for control or policy purposes, such as to determine the level of income that will guarantee a target amount of consumption expenditure. In other words, an estimated model may be used for control or policy purposes. By an appropriate fiscal and monetary policy mix, the government can manipulate the control variable X to produce the desired level of the target variable Y.

1.3 The Nature of Regression Analysis

The term regression was introduced by Francis Galton.
In a famous paper, "Family Likeness in Stature", Galton found that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move, or "regress", toward the average height of the population as a whole. In other words, the height of the children of unusually tall or unusually short parents tends to move toward the average height of the population. Galton's "law of universal regression" was confirmed by Karl Pearson, who collected more than a thousand records of heights of members of family groups. However, the modern interpretation of regression is quite different. Broadly speaking, we may say that regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the population mean or average value of the former in terms of the known or fixed values of the latter.

In regression analysis, we are concerned with what is known as statistical, not functional or deterministic, dependence among variables. In statistical relationships among variables we essentially deal with 'random' or 'stochastic' variables, that is, variables that have probability distributions. A random or stochastic variable is a variable that can take on any set of values, positive or negative, with a given probability. On the other hand, in functional or deterministic dependence we also deal with variables, but these variables are not random or stochastic. In other words, a relation between X and Y characterised as Y = f(X) is said to be deterministic or non-stochastic if for each value of the independent variable X there is one and only one corresponding value of the dependent variable Y.
On the other hand, a relation between X and Y is said to be stochastic if for a particular value of X there is a whole probability distribution of values of Y. Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. In the words of Kendall and Stuart, "a statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other". That is, a statistical relationship per se cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations.

1.4 Regression and Correlation

Closely related to, but conceptually very much different from, regression analysis is correlation analysis, where the primary objective is to measure the strength or degree of linear association between two variables. The correlation coefficient measures this strength of linear association. For example, we may be interested in finding the correlation coefficient between marks in mathematics and statistics examinations, between smoking and lung cancer, and so on. In regression analysis, we are not primarily interested in such a measure. Instead, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables. Thus, we want to know whether we can predict the average mark on a mathematics examination by knowing a student's marks on a statistics examination. In regression analysis, there is an asymmetry in the way the dependent and explanatory variables are treated. The dependent variable is assumed to be statistical, random or stochastic, that is, to have a probability distribution. On the other hand, the explanatory variables are assumed to have fixed values (in repeated sampling).
But in correlation analysis we treat any two variables symmetrically; there is no distinction between the dependent and explanatory variables. The correlation between marks in mathematics and statistics examinations is the same as that between marks in statistics and mathematics examinations. Moreover, both variables are assumed to be random. Most of correlation theory is based on the assumption of randomness of variables, whereas most of regression theory is based on the assumption that the dependent variable is stochastic but the explanatory variables are fixed or non-stochastic.

1.5 Two-Variable Regression Analysis

As noted above, regression analysis is largely concerned with estimating and/or predicting the population mean or average value of the dependent variable on the basis of the known values of the explanatory variable. We start with a simple linear regression model, that is, with the relationship between two variables, one dependent and one explanatory, related by a linear function. If we are studying the dependence of a variable on only a single explanatory variable, such as consumption expenditure on income, such a study is known as simple or two-variable regression analysis. Suppose we are interested in studying the relationship between weekly consumption expenditure Y and weekly after-tax or disposable family income X. More specifically, we want to predict the population mean level of weekly consumption expenditure knowing the family's weekly income. For each of the conditional probability distributions of Y we can compute its mean or average value, known as the conditional mean or conditional expectation, denoted E(Y/X = Xi) and read as "the expected value of Y given that X takes the specific value Xi", which for simplicity is written E(Y/Xi). An expected value is simply a population mean or average value. Each conditional mean E(Y/Xi) will be a function of Xi.
Symbolically,

E(Y/Xi) = f(Xi)  (1.3)

where f(Xi) denotes some function of the explanatory variable Xi; we shall assume that E(Y/Xi) is a linear function of Xi. Equation (1.3) is known as the two variable Population Regression Function (PRF), or Population Regression (PR) for short. It states merely that the population mean of the distribution of Y given Xi is functionally related to Xi. In other words, it tells how the mean or average response of Y varies with X. The form of the function f(Xi) is important because in real situations we do not have the entire population available for examination. Therefore, the functional form of the PRF is an empirical question. For example, an economist might hypothesize that consumption expenditure is linearly related to income. Thus, we assume that the PRF E(Y/Xi) is a linear function of Xi, say of the type

E(Y/Xi) = β1 + β2Xi  (1.4)

where β1 and β2 are unknown but fixed parameters known as the regression coefficients; β1 and β2 are also known as the intercept and slope coefficients respectively. Equation (1.4) is known as the linear population regression function, or simply the linear population regression. Some alternative expressions used are linear population regression model and linear population regression equation. In regression analysis our interest is in estimating a PRF like equation (1.4), that is, estimating the values of the unknowns β1 and β2 on the basis of observations on Y and X.

1.6 The Meaning of the term "Linear"

The term linear can be interpreted in two different ways, namely, linearity in the variables and linearity in the parameters. The first and perhaps more natural meaning of linearity is that the conditional expectation of Y is a linear function of Xi, as in equation (1.4). Geometrically, the regression curve in this case is a straight line.
In this interpretation, a regression function such as E(Y/Xi) = β1 + β2Xi² is not a linear function because the variable X appears with a power of 2. The second interpretation of linearity, that is, linearity in the parameters, is that the conditional expectation of Y, E(Y/Xi), is a linear function of the parameters, the β's; it may or may not be linear in the variable X. In this interpretation E(Y/Xi) = β1 + β2Xi² is a linear regression model, but E(Y/Xi) = β1 + √β2 Xi is not. The latter is an example of a regression model that is nonlinear in the parameters. Of the two interpretations of linearity, linearity in the parameters is the one relevant for the development of regression theory. Thus, for our analysis, the term linear regression will always mean a regression that is linear in the parameters, the β's; in other words, the parameters are raised to the first power only. It may or may not be linear in the explanatory variables, the X's. It can be noted that E(Y/Xi) = β1 + β2Xi is linear in both the parameters and the variable.

1.7 Stochastic Specification of the PRF

The stochastic nature of the regression model implies that for every value of X there is a whole probability distribution of values of Y. In other words, the value of Y can never be predicted exactly. The form of equation (1.4) implies that the relationship between consumption expenditure and income is exact, that is, that all the variation in Y is due solely to changes in X, and that there are no other factors affecting the dependent variable. Given the income level Xi, an individual family's consumption expenditure will cluster around the average consumption of all families at that Xi, that is, around the conditional expectation. Therefore, we can express the deviation of an individual Yi around its expected value as follows:

ui = Yi − E(Y/Xi)
or
Yi = E(Y/Xi) + ui  (1.5)

where the deviation ui is an unobservable random variable taking positive or negative values.
Technically, ui is known as the stochastic disturbance or stochastic error term. This is because ui is supposed to 'disturb' the exact linear relationship which is assumed to exist between X and Y. Equation (1.5) means that the expenditure of an individual family, given its income level, can be expressed as the sum of two components. The first, E(Y/Xi), is simply the mean consumption expenditure of all families with the same level of income; this component is known as the systematic or deterministic component. The second, ui, is the random or non-systematic component. In other words,

[variation in Yi] = [systematic variation] + [random variation]
or
[variation in Yi] = [explained variation] + [unexplained variation].

Since E(Y/Xi) is assumed to be linear in Xi, equation (1.5) may be written as

Yi = E(Y/Xi) + ui
Yi = β1 + β2Xi + ui  (1.6)

Equation (1.6) posits that the consumption expenditure of a family is linearly related to its income plus the disturbance term. Now if we take the conditional expectation, conditional upon the given X's, on both sides of equation (1.5), we obtain

E(Yi/Xi) = E[E(Y/Xi)] + E(ui/Xi)
E(Yi/Xi) = E(Y/Xi) + E(ui/Xi)  (1.7)

since the expected value of a constant is that constant itself. Again, since E(Yi/Xi) is the same thing as E(Y/Xi), equation (1.7) implies that

E(ui/Xi) = 0  (1.8)

Thus, the assumption that the regression line passes through the conditional means of Y implies that the conditional mean values of ui (conditional upon the given X's) are zero. The stochastic specification clearly shows that there are other variables besides income that affect consumption expenditure, and that an individual family's consumption expenditure cannot be fully explained by the variables included in the regression model alone.

1.8 The Significance of the Stochastic Disturbance Term

As noted above, the disturbance term ui is a substitute for all those variables that are omitted from the model but that collectively affect Y.
The reasons for not introducing those variables into the model, and the significance of the stochastic disturbance term ui, are explained below.

a) Vagueness of theory
The theory determining the behaviour of Y may be incomplete. We might know for certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or unsure about other variables affecting Y. Therefore, ui may be used as a substitute for all the variables excluded or omitted from the model.

b) Unavailability of data
Even if we know some of the excluded variables and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about these variables. For example, we could introduce family wealth as an explanatory variable in addition to the income variable to explain family consumption expenditure. But unfortunately, information about family wealth is not generally available.

c) Core variables vs. peripheral variables
Assume in the consumption-income example that besides income X1, the number of children per family X2, sex X3, religion X4 and education X5 also affect consumption expenditure. But it is quite possible that the joint influence of all or some of these variables is so small and non-systematic or random that, as a practical matter and for cost considerations, we do not introduce them into the model explicitly. Their combined effect can be treated as a random variable ui.

d) Intrinsic randomness in human behaviour
Human behaviour is not fully predictable. Even if we succeed in introducing all the relevant variables into the model, there is bound to be some intrinsic randomness in individual Y that cannot be explained no matter how hard we try. The disturbances, the u's, may very well reflect this intrinsic randomness.
e) Poor proxy variables
Although the regression model assumes that the variables Y and X are measured accurately, in practice the data may be plagued by errors of measurement. The deviations of the points from the true regression line may be due to errors of measurement of the variables, which are inevitable given the methods of collecting and processing statistical information. The disturbance term ui also represents these errors of measurement.

f) Principle of parsimony
Following Occam's razor, which states that descriptions be kept as simple as possible until proved inadequate, we would like to keep our regression model as simple as possible. If we can explain the behaviour of Y substantially with two or three explanatory variables, and if our theory is not strong enough to suggest what other variables might be included, there is no need to introduce other variables. Let ui represent all other variables.

g) Wrong functional form
Even if we have the theoretically correct variables explaining a phenomenon, and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the regressand and the regressors. We may have linearised a possibly nonlinear relationship, or we may have left some equations out of the model. Economic phenomena are much more complex than a single equation may reveal, no matter how many explanatory variables it contains. Thus, the disturbance term ui represents such errors, which may be due to a wrong specification of the functional relationship.

For all these reasons, the stochastic disturbances ui assume an extremely critical role in regression analysis.
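The role of ui as the gap between an observation and its conditional mean, equations (1.5) and (1.8), can be checked numerically on grouped data. The figures below are invented, with several families observed at each income level:

```python
# Hypothetical consumption figures for families grouped by income level.
# Within each group, u_i = Y_i - E(Y|X_i) is the deviation from the group
# (conditional) mean, and these deviations sum to zero by construction,
# mirroring E(u_i | X_i) = 0 in equation (1.8).
data = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
}

for income, ys in data.items():
    cond_mean = sum(ys) / len(ys)              # sample analogue of E(Y | X)
    deviations = [y - cond_mean for y in ys]   # the u_i for this income group
    print(income, round(cond_mean, 2), [round(u, 2) for u in deviations])
    # The deviations always average to zero within the group:
    assert abs(sum(deviations)) < 1e-9
```

Each family's expenditure is thus the group's conditional mean (the systematic part) plus its own deviation (the random part), which is exactly the decomposition in (1.5).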
1.9 The Sample Regression Function (SRF)

The linear relationship Yi = β1 + β2Xi + ui holds for the population of the values of X and Y, so we could obtain the numerical values of β1 and β2 only if we had all the conceivably possible values of X, Y and u which form the population of these variables. But in practice one rarely has access to the entire population of interest. Thus, in most practical situations we have only sample Y values corresponding to some fixed X's. Therefore, our task is to estimate the PRF on the basis of the sample information. Analogous to the PRF, we can develop the concept of the Sample Regression Function (SRF) to represent the sample regression line. The sample counterpart of equation (1.4) may be written as

Ŷi = β̂1 + β̂2Xi  (1.9)

where Ŷ is read as "Y-hat" or "Y-cap". In the above equation,

Ŷi = estimator of E(Y/Xi)
β̂1 = estimator of β1
β̂2 = estimator of β2

An estimator, also known as a sample statistic, is simply a rule or formula or method that tells us how to estimate the population parameter from the information provided by the sample at hand. The particular numerical value obtained by the estimator in an application is known as an estimate. Now we can express the SRF in its stochastic form as

Yi = β̂1 + β̂2Xi + ûi  (1.10)

where ûi denotes the sample residual term. Conceptually, ûi is analogous to ui and can be regarded as an estimate of ui. It is introduced in the SRF for the same reason as ui was introduced in the PRF. Thus, to sum up, the primary objective in regression analysis is to estimate the PRF

Yi = β1 + β2Xi + ui

on the basis of the SRF

Yi = β̂1 + β̂2Xi + ûi

But because of sampling fluctuations our estimate of the PRF based on the SRF is at best an approximate one. [Figure: the SRF Ŷi = β̂1 + β̂2Xi and the PRF E(Y/Xi) = β1 + β2Xi plotted together, with weekly consumption expenditure on the vertical axis and weekly income on the horizontal axis; for a given Xi, the observed Yi deviates from the SRF by ûi and from the PRF by ui, and the two lines cross at a point A.] For X = Xi, we have one sample observation Y = Yi.
In terms of the SRF, the observed Yi can be expressed as
Yi = Ŷi + ûi (1.11)
and in terms of the PRF, it can be expressed as
Yi = E(Y/Xi) + ui (1.12)
In the figure, Ŷi overestimates the true E(Y/Xi) for the Xi shown. At the same time, for any Xi to the left of point A, the SRF will underestimate the true PRF. Such over- and underestimation is inevitable because of sampling fluctuations. The important task is to devise a rule or method that will make the approximation as close as possible. That is, the SRF should be constructed so that β̂1 is as close as possible to the true β1 and β̂2 is as close as possible to the true β2, even though we never know the true β1 and β2.

MODULE II
TWO VARIABLE REGRESSION MODEL
2.1 The Method of Ordinary Least Squares
Our task is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF) as accurately as possible. Though there are several methods of constructing the SRF, the most popular method of estimating the PRF from the SRF is the method of Ordinary Least Squares (OLS). The method of ordinary least squares is attributed to the German mathematician Carl Friedrich Gauss. Under certain assumptions, the method of least squares has some very attractive statistical properties that have made it one of the most popular methods of regression analysis. The relationship between X and Y in the PRF is
Yi = β1 + β2Xi + ui
Since the PRF is not directly observable, we estimate it from the SRF
Yi = β̂1 + β̂2Xi + ûi = Ŷi + ûi
where Ŷi is the estimated conditional mean value of Yi. Now, to determine the SRF, we have
ûi = Yi − Ŷi = Yi − β̂1 − β̂2Xi (2.1)
which shows that the residuals ûi are simply the differences between the actual and the estimated Y values. Given n pairs of observations on Y and X, we would like to determine the SRF in such a manner that it is as close as possible to the actual Y.
To this end, we might adopt the following criterion: choose the SRF in such a way that the sum of the residuals, ∑ûi = ∑(Yi − Ŷi), is as small as possible. The rationale of this criterion is easy to understand: it is intuitively obvious that the smaller the deviations from the regression line, the better the fit of the line to the scatter of observations. Although intuitively appealing, this is not a good criterion, as can be seen from the following scatter diagram.
[Figure: a scatter of observations at X1, X2, X3 and X4 around a fitted SRF, with the residuals û1 and û4 lying much farther from the line than û2 and û3.]
If we adopt the criterion of minimising ∑ûi, then all the residuals receive the same weight in the sum, even though û1 and û4 are much more widely scattered around the SRF than û2 and û3. In other words, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF. A consequence of this is that the algebraic sum of the ûi can be small, or even zero, although the ûi are widely scattered around the SRF. That is, in summing these deviations the positive values offset the negative values, so that the final algebraic sum of the residuals might equal zero. To avoid this problem, the best solution is to square the deviations and minimise the sum of squares. That is, we adopt the least squares criterion, which states that the SRF should be fixed in such a way that
∑ûi² = ∑(Yi − Ŷi)² = ∑(Yi − β̂1 − β̂2Xi)² (2.2)
is as small as possible, where ûi² are the squared residuals. The reason for calling this method the least squares method is that it seeks the minimisation of the sum of squares of the deviations of the actual observations from the line.
By squaring ûi, this method gives more weight to residuals such as û1 and û4 in the figure above than to residuals such as û2 and û3. As noted previously, under the minimum-∑ûi criterion the sum can be small even though the ûi are widely spread around the SRF. This is not possible under the least squares method, because the larger the absolute value of ûi, the larger ∑ûi² will be. A further justification of the least squares method lies in the fact that the estimators obtained by it have some very desirable statistical properties.
It is obvious from equation (2.2) that
∑ûi² = f(β̂1, β̂2) (2.3)
That is, the sum of squared residuals is a function of the estimators β̂1 and β̂2. For any given set of data, choosing different values for β̂1 and β̂2 will give different û's and hence different values of ∑ûi². The method of least squares chooses β̂1 and β̂2 in such a manner that, for a given sample or set of data, ∑ûi² is as small as possible. In other words, for a given sample, the method of least squares provides us with unique estimators of β1 and β2 that give the smallest possible value of ∑ûi². This is accomplished through the application of differential calculus. The process of differentiation yields the following equations for estimating β1 and β2:
∑Yi = nβ̂1 + β̂2∑Xi (2.4)
∑XiYi = β̂1∑Xi + β̂2∑Xi² (2.5)
where n is the sample size. These simultaneous equations are known as the "normal equations".
2.1.1 Formal Derivation of the Normal Equations
We have to minimise the function
∑ûi² = ∑(Yi − β̂1 − β̂2Xi)²
with respect to β̂1 and β̂2. The necessary condition for a minimum is that the first derivatives of the function be equal to zero. That is,
∂∑ûi²/∂β̂1 = 0 and ∂∑ûi²/∂β̂2 = 0
To obtain these derivatives we apply the function-of-a-function (chain) rule of differentiation.
Then the partial derivative with respect to β̂1 is
∂∑ûi²/∂β̂1 = 2∑(Yi − β̂1 − β̂2Xi)(−1) = 0
which can be written as
∑Yi = nβ̂1 + β̂2∑Xi (2.4)
The partial derivative with respect to β̂2 is
∂∑ûi²/∂β̂2 = 2∑(Yi − β̂1 − β̂2Xi)(−Xi) = 0
which can be written as
∑XiYi = β̂1∑Xi + β̂2∑Xi² (2.5)
2.1.2 Derivation of Least Squares Estimators
We have the normal equations
∑Yi = nβ̂1 + β̂2∑Xi (2.4)
∑XiYi = β̂1∑Xi + β̂2∑Xi² (2.5)
Applying Cramer's rule to solve this system, the determinant of the coefficient matrix is
|A| = n∑Xi² − (∑Xi)²
To solve for β̂2, we form the matrix A2 by replacing the appropriate column of A with the constants, so that
|A2| = n∑XiYi − ∑Xi∑Yi
and therefore
β̂2 = [n∑XiYi − ∑Xi∑Yi] / [n∑Xi² − (∑Xi)²]
Dividing numerator and denominator by n, we have
β̂2 = [∑XiYi − nX̄Ȳ] / [∑Xi² − nX̄²]
β̂2 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)²
where X̄ and Ȳ are the sample means of X and Y. Defining xi = Xi − X̄ and yi = Yi − Ȳ, that is, letting lower-case letters denote deviations from the mean values, we have
β̂2 = ∑xiyi / ∑xi² (2.6)
To derive β̂1, let us reproduce the first normal equation (2.4):
∑Yi = nβ̂1 + β̂2∑Xi
Dividing both sides by n, we have
Ȳ = β̂1 + β̂2X̄
That is,
β̂1 = Ȳ − β̂2X̄ (2.7)
The estimators β̂1 and β̂2 obtained in this way are known as ordinary least squares estimators because they are derived from the least squares principle.
Example: To illustrate the use of the above formulae, we will estimate the supply function of commodity 'z'. We have 12 pairs of observations, as shown in the following table.
Observation   Quantity   Price
1             69         9
2             76         12
3             52         6
4             56         10
5             57         9
6             77         10
7             58         7
8             55         8
9             67         12
10            53         6
11            72         11
12            64         8
The following is the worksheet for the estimation of the supply function of commodity 'z'.
n     Yi     Xi     yi     xi     xiyi    xi²
1     69     9      6      0      0       0
2     76     12     13     3      39      9
3     52     6      -11    -3     33      9
4     56     10     -7     1      -7      1
5     57     9      -6     0      0       0
6     77     10     14     1      14      1
7     58     7      -5     -2     10      4
8     55     8      -8     -1     8       1
9     67     12     4      3      12      9
10    53     6      -10    -3     30      9
11    72     11     9      2      18      4
12    64     8      1      -1     -1      1
Sum   756    108    0      0      156     48

Ȳ = ∑Yi/n = 756/12 = 63
X̄ = ∑Xi/n = 108/12 = 9
β̂2 = ∑xiyi/∑xi² = 156/48 = 3.25
β̂1 = Ȳ − β̂2X̄ = 63 − (3.25)(9) = 33.75
Thus, the estimated supply function is Ŷi = 33.75 + 3.25Xi.
2.2 Numerical Properties of OLS Estimators
Numerical properties are those that hold as a consequence of the use of ordinary least squares, regardless of how the data were generated. The following are numerical properties of OLS estimators:
I. The OLS estimators are expressed solely in terms of the observable (that is, sample) quantities of X and Y. Therefore, they can be easily computed.
II. OLS estimators are point estimators. That is, given the sample, each estimator provides only a single (point) value of the relevant population parameter.
III. Once the OLS estimates are obtained from the sample data, the sample regression line can be easily obtained. The regression line thus obtained has the following properties:
1) It passes through the sample means of Y and X. This fact is obvious from equation (2.7), which can also be written as Ȳ = β̂1 + β̂2X̄.
[Figure: the SRF Ŷi = β̂1 + β̂2Xi passing through the point (X̄, Ȳ).]
2) The mean value of the estimated Y, that is, the mean of the Ŷi, is equal to the mean value of the actual Y. That is, the mean of Ŷi = Ȳ.
Proof:
Ŷi = β̂1 + β̂2Xi
Since β̂1 = Ȳ − β̂2X̄, we get
Ŷi = (Ȳ − β̂2X̄) + β̂2Xi = Ȳ + β̂2(Xi − X̄)
Applying summation,
∑Ŷi = nȲ + β̂2∑(Xi − X̄)
Dividing by n, we have
∑Ŷi/n = Ȳ + β̂2∑(Xi − X̄)/n
Since ∑(Xi − X̄) = 0, we get: mean of Ŷi = Ȳ.
3) The mean value of the residuals, ûi, is zero.
Proof: From our earlier first-order condition, we have
2∑(Yi − β̂1 − β̂2Xi)(−1) = 0
or
∑(Yi − β̂1 − β̂2Xi) = 0
But since ûi = Yi − β̂1 − β̂2Xi, the preceding equation reduces to ∑ûi = 0, so the mean residual is zero.
As a result of the preceding property, the sample regression
Yi = β̂1 + β̂2Xi + ûi (1.10)
can be expressed in an alternative form in which both Y and X are expressed as deviations from their mean values. Taking the sum on both sides of (1.10), we get
∑Yi = nβ̂1 + β̂2∑Xi + ∑ûi
Since ∑ûi = 0, we have
∑Yi = nβ̂1 + β̂2∑Xi (2.8)
Dividing equation (2.8) through by n, we obtain
Ȳ = β̂1 + β̂2X̄ (2.9)
Note that equation (2.9) is the same as equation (2.7). Subtracting (2.9) from (1.10), we obtain
Yi − Ȳ = β̂2(Xi − X̄) + ûi
or
yi = β̂2xi + ûi (2.10)
where yi and xi are deviations from their respective sample mean values. Equation (2.10) is known as the deviation form. Note that the intercept term β̂1 is absent; yet we can always recover it from equation (2.7). In the deviation form, the SRF can be written as
ŷi = β̂2xi (2.11)
4) The residuals ûi are uncorrelated with the predicted Yi. That is, ∑Ŷiûi = 0.
Proof: Working in deviation form,
∑ŷiûi = β̂2∑xiûi = β̂2∑xi(yi − β̂2xi) = β̂2∑xiyi − β̂2²∑xi²
Since β̂2 = ∑xiyi/∑xi² as per equation (2.6), the two terms cancel and ∑ŷiûi = 0; because ∑ûi = 0, this is equivalent to ∑Ŷiûi = 0.
5) The residuals ûi are uncorrelated with Xi; that is, ∑ûiXi = 0.
Proof: From our earlier normal equations, we have
2∑(Yi − β̂1 − β̂2Xi)(−Xi) = 0
That is, −2∑ûiXi = 0, or ∑ûiXi = 0.
2.3 The Classical Linear Regression Model: The Assumptions Underlying the Method of Least Squares
If our objective were to estimate β1 and β2 only, the method of OLS would be sufficient. But our aim is not only to obtain β̂1 and β̂2 but also to draw inferences about the true β1 and β2. We would like to know how close β̂1 and β̂2 are to their counterparts in the population, or how close Ŷi is to the true E(Y/Xi). The PRF Yi = β1 + β2Xi + ui shows that Yi depends on both Xi and ui.
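As a numerical check, the worked example and the residual properties proved above can be replicated in a short script. The following is an illustrative Python sketch (not part of the original text) that re-estimates the supply function of commodity 'z' and verifies that ∑ûi = 0, ∑ûiXi = 0 and ∑Ŷiûi = 0:

```python
# Re-estimating the supply function of commodity 'z' and verifying
# the numerical properties of the OLS residuals.
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]  # quantity
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]         # price

n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n                 # 63 and 9

# Deviation-form estimators, equations (2.6) and (2.7)
sum_xy = sum((xi - X_bar) * (yi - Y_bar) for xi, yi in zip(X, Y))  # 156
sum_x2 = sum((xi - X_bar) ** 2 for xi in X)                        # 48
b2 = sum_xy / sum_x2                                  # 3.25
b1 = Y_bar - b2 * X_bar                               # 33.75

Y_hat = [b1 + b2 * xi for xi in X]                    # fitted values
u_hat = [yi - yh for yi, yh in zip(Y, Y_hat)]         # residuals

print(b1, b2)                                      # 33.75 3.25
print(sum(u_hat))                                  # ~ 0 (property 3)
print(sum(u * xi for u, xi in zip(u_hat, X)))      # ~ 0 (property 5)
print(sum(u * yh for u, yh in zip(u_hat, Y_hat)))  # ~ 0 (property 4)
print(sum(Y_hat) / n, Y_bar)                       # both 63.0 (property 2)
```

The printed sums of residual products are zero up to floating-point rounding, which is exactly what properties 2 to 5 assert for any OLS fit.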
Therefore, unless we are specific about how Xi and ui are generated, there is no way we can make any statistical inference about Yi, β1 and β2. The classical linear regression model, which is the cornerstone of most econometric theory, makes 10 assumptions, explained below.
1. Linear regression model
The regression model is linear in the parameters, as shown in the PRF Yi = β1 + β2Xi + ui.
2. X values are fixed in repeated sampling
Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic. Our regression analysis is conditional regression analysis, that is, conditional upon the given values of the regressor X. The Xi's are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model. This means that, in taking a large number of samples on Y and X, the Xi values are the same in all the samples, but the ui and Yi do differ from sample to sample.
3. Zero mean value of the disturbance ui
Given the value of X, the mean or expected value of the random disturbance term ui is zero. Technically, the conditional mean value of ui is zero. Symbolically, we have
E(ui/Xi) = 0 (2.12)
The equation states that the mean value of ui conditional upon the given Xi is zero. This means that for each X, u may assume various values, some greater than zero and some smaller than zero; but if we consider all the possible values of u for any given value of X, they would have an average value of zero. This assumption implies that E(Yi/Xi) = β1 + β2Xi.
4. Homoscedasticity or equal variance of ui
Given the value of X, the variance of ui is the same for all observations. That is, the conditional variances of ui are identical. Symbolically, we have
var(ui/Xi) = E[ui − E(ui/Xi)]²
var(ui/Xi) = E(ui²/Xi), because of assumption 3
var(ui/Xi) = σ² (2.13)
where var stands for variance.
Equation (2.13) states that the variance of ui for each Xi is some positive constant number equal to σ². Technically, this represents the assumption of homoscedasticity, or equal spread, or equal variance. Stated differently, the equation means that the Y populations corresponding to the various X values have the same variance. The assumption also implies that the conditional variances of Yi are homoscedastic. That is,
var(Yi/Xi) = σ² (2.14)
5. No autocorrelation between the disturbances
Given any two X values, Xi and Xj (i ≠ j), the correlation between any two disturbances ui and uj (i ≠ j) is zero. Symbolically,
cov(ui, uj / Xi, Xj) = E{[ui − E(ui/Xi)][uj − E(uj/Xj)]}
cov(ui, uj / Xi, Xj) = E(uiuj / Xi, Xj), since E(ui/Xi) = E(uj/Xj) = 0
cov(ui, uj / Xi, Xj) = 0, by assumption (2.15)
where i and j are two different observations and cov means covariance. Equation (2.15) postulates that the disturbances ui and uj are uncorrelated. Technically, this is the assumption of no serial correlation or no autocorrelation. That is, the covariance of any ui with any other uj is equal to zero: the value which the random term assumes in one period does not depend on the value which it assumed in any other period.
6. Zero covariance between ui and Xi, or E(uiXi) = 0
Formally,
cov(ui, Xi) = E[ui − E(ui)][Xi − E(Xi)]
cov(ui, Xi) = E[ui(Xi − E(Xi))], since E(ui) = 0
cov(ui, Xi) = E(uiXi) − E(Xi)E(ui)
cov(ui, Xi) = E(uiXi), since E(ui) = 0
cov(ui, Xi) = 0, by assumption (2.16)
The above assumption states that the disturbance u and the explanatory variable X are uncorrelated. The rationale for this assumption is as follows: when we express the PRF as Yi = β1 + β2Xi + ui, we assume that the explanatory variable X and the term u, which represents the influence of all omitted variables, have separate and additive influences on Y. But if X and u are correlated, it is impossible to assess their individual effects on Y.
7. The number of observations n must be greater than the number of parameters to be estimated
Alternatively, the number of observations n must be greater than the number of explanatory variables. For instance, if we had only one pair of observations on Y and X, there would be no way to estimate the two unknowns β1 and β2; we need at least two pairs of observations to estimate the two unknowns.
8. Variability in X values
The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number. If all the X values were identical, then Xi = X̄ and the denominator of the equation β̂2 = ∑xiyi/∑xi² would be zero, making it impossible to estimate β2 and therefore β1. Variation in both Y and X is essential for regression analysis. In short, variables must vary.
9. The regression model is correctly specified
Alternatively, there is no specification bias or error in the model used in the empirical analysis. An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest. Some important questions that arise in the specification of the model include the following: (a) What variables should be included in the model? (b) What is the functional form of the model: is it linear in the parameters, the variables, or both? (c) What probabilistic assumptions are made about the Yi, the Xi and the ui entering the model?
10. There is no perfect multicollinearity
That is, there is no perfect linear relationship among the explanatory variables. If there is more than one explanatory variable in the relationship, it is assumed that they are not perfectly correlated with each other. Indeed, the regressors should not even be strongly correlated; they should not be highly multicollinear.
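Assumption 8 can be illustrated numerically. The sketch below uses hypothetical numbers (not from the text) to show that when the X values do not vary, the denominator ∑xi² of the slope estimator collapses to zero and β̂2 cannot be computed:

```python
# Assumption 8 in action: without variability in X, the OLS slope is
# undefined because the denominator sum(x_i^2) of beta2_hat is zero.
X = [5, 5, 5, 5]        # hypothetical sample with no variation in X
Y = [10, 12, 9, 11]     # hypothetical Y values

X_bar = sum(X) / len(X)
sum_x2 = sum((xi - X_bar) ** 2 for xi in X)
print(sum_x2)  # 0.0 -> beta2_hat = sum_xy / sum_x2 would divide by zero
```

Any attempt to divide by `sum_x2` here raises a `ZeroDivisionError`, which is the numerical counterpart of the statement that β̂2, and hence β̂1, cannot be estimated.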
2.4 Properties of Least Squares Estimators: The Gauss-Markov Theorem
As noted earlier, given the assumptions of the classical linear regression model, the least squares estimates possess some ideal or optimum properties. These properties are contained in the well-known Gauss-Markov theorem. To understand this theorem, we first need to consider the best linear unbiasedness property of an estimator, which is explained below.
An estimator is best when it has the smallest variance as compared with any other estimator obtained from other econometric methods. Symbolically, assume that θ has two estimators, θ̂1 and θ̂2. Then θ̂1 is best if
var(θ̂1) < var(θ̂2)
or
E[θ̂1 − E(θ̂1)]² < E[θ̂2 − E(θ̂2)]²
An estimator is linear if it is a linear function of the sample observations, that is, if it is determined by a linear combination of the sample data. Given the sample observations Y1, Y2, Y3, ..., Yn, a linear estimator has the form
k1Y1 + k2Y2 + k3Y3 + ... + knYn
where the ki's are constants.
The bias of an estimator is defined as the difference between its expected value and the true parameter:
Bias = E(θ̂) − θ
The estimator is unbiased if its bias is zero, that is, E(θ̂) = θ. This means that, over hypothetical repeated samples, the estimates are centred on the true value of the parameter: an unbiased estimator gives 'on the average' the true value of the parameter.
An estimator θ̂ is the best linear unbiased estimator (BLUE) if it is linear, unbiased and has the smallest variance as compared with all other linear unbiased estimators of the true θ. The BLU estimator has the minimum variance within the class of linear unbiased estimators of the true θ. An estimator, say the ordinary least squares estimator β̂2, is said to be BLUE of β2 if the following hold:
1) It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.
2) It is unbiased, that is, its average or expected value, E(β̂2), is equal to the true value β2.
3) It has the minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.
In the regression context it can be proved that the OLS estimators are BLUE. This is the essence of the Gauss-Markov theorem, which can be stated as follows: given the assumptions of the classical linear regression model, the least squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE.
2.4.1 Proof of the Gauss-Markov Theorem
1) Property of Linearity
The least squares estimates β̂1 and β̂2 are linear functions of the observed sample values of Yi. Given that the Xi's always appear with the same values in the hypothetical repeated sampling process, it can be shown that the least squares estimates depend on the values of Yi only; that is, β̂1 = f(Yi) and β̂2 = f(Yi).
We have
β̂2 = ∑xiyi / ∑xi²
= ∑xi(Yi − Ȳ) / ∑xi²
= (∑xiYi − Ȳ∑xi) / ∑xi²
= ∑xiYi / ∑xi², since ∑xi = 0
or
β̂2 = ∑kiYi, where ki = xi/∑xi²
The values of the X's are fixed in hypothetical repeated sampling. Hence the ki's are fixed constants from sample to sample and may be regarded as constant weights assigned to the individual values of Y. We write
β̂2 = ∑kiYi = k1Y1 + k2Y2 + ... + knYn = f(Y)
The estimate β̂2 is a linear function of the Y's, that is, a linear combination of the values of the dependent variable. Similarly,
β̂1 = Ȳ − β̂2X̄
= ∑Yi/n − X̄∑kiYi
= ∑(1/n − X̄ki)Yi
Since the X values are constants, X̄ and the ki are fixed constants from sample to sample. Thus β̂1 depends only on the values of Yi; that is, β̂1 is a linear function of the sample values of Y.
2) Property of Unbiasedness
The property of unbiasedness of β̂1 and β̂2 can be proved by establishing E(β̂1) = β1 and E(β̂2) = β2.
The meaning of this property is that, over hypothetical repeated samples, the estimates are on average equal to the true values of the parameters. We have
β̂2 = ∑kiYi = ∑ki(β1 + β2Xi + ui) = β1∑ki + β2∑kiXi + ∑kiui
We can prove that ∑ki = 0 and ∑kiXi = 1. That is,
∑ki = ∑(xi/∑xi²) = ∑xi/∑xi² = 0, since ∑xi = 0
Similarly,
∑kiXi = ∑xiXi/∑xi² = ∑(Xi − X̄)Xi/∑xi² = (∑Xi² − X̄∑Xi)/∑xi²
Since ∑xi² = ∑Xi² − X̄∑Xi, we have ∑kiXi = 1. Therefore,
β̂2 = β2 + ∑kiui
Taking expected values, and noting that the ki are fixed constants,
E(β̂2) = β2 + ∑kiE(ui) = β2, since E(ui) = 0
Therefore, β̂2 is an unbiased estimator of β2.
Similarly, we can prove that β̂1 is an unbiased estimator of β1. We have
β̂1 = ∑(1/n − X̄ki)Yi
Taking expected values,
E(β̂1) = ∑(1/n − X̄ki)E(Yi)
= ∑(1/n − X̄ki)(β1 + β2Xi)
= β1 + β2X̄ − X̄β1∑ki − X̄β2∑kiXi
= β1 + β2X̄ − β2X̄, since ∑ki = 0 and ∑kiXi = 1
= β1
Thus, β̂1 is an unbiased estimator of β1.
3) Property of Minimum Variance
In this section we prove the Gauss-Markov theorem, which states that the least squares estimates have the smallest variance as compared with any other linear unbiased estimators. First,
var(β̂2) = E[β̂2 − E(β̂2)]²
Since β̂2 = ∑kiYi, we get
var(β̂2) = var(∑kiYi) = ∑ki² var(Yi), since the ki are constants
We have var(Yi) = σ² and ki = xi/∑xi², so that
∑ki² = ∑xi²/(∑xi²)² = 1/∑xi²
and therefore
var(β̂2) = σ²/∑xi²
Similarly, it can be proved that
var(β̂1) = E[β̂1 − E(β̂1)]²
Since β̂1 = ∑(1/n − X̄ki)Yi and var(Yi) = σ²,
var(β̂1) = ∑(1/n − X̄ki)² var(Yi)
= σ²∑(1/n² − 2X̄ki/n + X̄²ki²)
= σ²[1/n − (2X̄/n)∑ki + X̄²∑ki²]
= σ²(1/n + X̄²/∑xi²), since ∑ki = 0 and ∑ki² = 1/∑xi²
Taking a common denominator,
var(β̂1) = σ²(∑xi² + nX̄²)/(n∑xi²)
Since ∑xi² = ∑Xi² − nX̄², we have ∑xi² + nX̄² = ∑Xi², so that
var(β̂1) = σ²∑Xi²/(n∑xi²)
Thus, we have obtained the variances of both ordinary least squares estimators.
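The unbiasedness and variance results just derived can be checked by simulation. The sketch below uses illustrative parameter values (β1 = 2, β2 = 3, σ = 1, X fixed at 1, ..., 10, none of which come from the text) and draws many samples with X held fixed, as assumption 2 requires, comparing the mean and variance of the β̂2 estimates with β2 and σ²/∑xi²:

```python
import random

# Monte Carlo check: with X fixed in repeated samples, the average of
# the slope estimates approaches beta2 and their variance approaches
# sigma^2 / sum(x_i^2). Parameter values are illustrative only.
random.seed(42)
X = list(range(1, 11))
X_bar = sum(X) / len(X)
x = [xi - X_bar for xi in X]            # deviations of X from its mean
sum_x2 = sum(a * a for a in x)          # 82.5 for X = 1..10
beta1, beta2, sigma = 2.0, 3.0, 1.0

estimates = []
for _ in range(20000):
    # A fresh sample: same X values, new normal disturbances u_i
    Y = [beta1 + beta2 * xi + random.gauss(0, sigma) for xi in X]
    Y_bar = sum(Y) / len(Y)
    b2 = sum(a * (yi - Y_bar) for a, yi in zip(x, Y)) / sum_x2
    estimates.append(b2)

mean_b2 = sum(estimates) / len(estimates)
var_b2 = sum((b - mean_b2) ** 2 for b in estimates) / len(estimates)
print(round(mean_b2, 2))                              # close to 3.0
print(round(var_b2, 4), round(sigma ** 2 / sum_x2, 4))  # both close to 0.0121
```

With 20,000 replications, the simulated mean of β̂2 matches β2 and the simulated variance matches σ²/∑xi² to within sampling noise, which is what unbiasedness and the variance formula predict.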
Now we need to prove that any other linear unbiased estimator of the true parameter, say an estimator β2* of β2 obtained from some other econometric method, has a larger variance than the OLS estimator β̂2; that is, var(β̂2) < var(β2*).
The new estimator β2* is by assumption a linear combination of the Yi's, that is, a weighted sum of the sample values of Y, with weights different from the weights ki = xi/∑xi² of the OLS estimator. Let
β2* = ∑ciYi, where ci = ki + di
'ki' being the weights defined earlier for the OLS estimator and 'di' an arbitrary set of weights. The new estimator β2* is also assumed to be an unbiased estimator of β2; that is, E(β2*) = β2. We have
β2* = ∑ciYi = ∑ci(β1 + β2Xi + ui) = β1∑ci + β2∑ciXi + ∑ciui
Taking expected values on both sides,
E(β2*) = β1∑ci + β2∑ciXi, since E(ui) = 0
For unbiasedness we therefore require ∑ci = 0 and ∑ciXi = 1, so that E(β2*) = β2.
Now, the variance of the new estimator β2* is
var(β2*) = var(∑ciYi) = ∑ci² var(Yi), since the ci are constants
= σ²∑(ki + di)²
= σ²∑ki² + σ²∑di² + 2σ²∑kidi
The cross term vanishes: ∑kidi = ∑ki(ci − ki) = ∑kici − ∑ki², and ∑kici = ∑xici/∑xi² = (∑ciXi − X̄∑ci)/∑xi² = 1/∑xi² = ∑ki², so ∑kidi = 0. Hence
var(β2*) = σ²∑ki² + σ²∑di² = var(β̂2) + σ²∑di²
Given that the di's are arbitrary weights not all equal to zero, the second term is positive, that is, σ²∑di² > 0. Therefore,
var(β2*) > var(β̂2)
Thus, in the group of linear unbiased estimators of the true β2, the least squares estimator has the minimum variance. In a similar way, we can prove that the least squares intercept coefficient β̂1 has the minimum variance.
2.5 The Coefficient of Determination: A Measure of Goodness of Fit
After the estimation of the parameters and the determination of the least squares regression line, we need to know how 'good' the fit of this line is to the sample observations of Y and X. That is, we need to measure the dispersion of the observations around the regression line.
This knowledge is essential: the closer the observations are to the line, the better the goodness of fit, and the better the explanation of the variation of Y by the changes in the explanatory variables. If all the observations were to lie on the sample regression line, we would obtain a perfect fit, but this is rarely the case. Generally, there will be some positive ûi and some negative ûi. What we hope for is that these residuals around the regression line are as small as possible. The square of the correlation coefficient, known as the coefficient of determination (r² in the two-variable case, R² in multiple regression), is a summary measure that tells how well the sample regression line fits the data. As a measure of goodness of fit, r² shows the percentage of the total variation of the dependent variable Y that is explained by the independent variable X. To compute the coefficient of determination r², we proceed as follows.
Let ∑yi² = ∑(Yi − Ȳ)² → the total variation of the actual Y values about their sample mean, which may be called the Total Sum of Squares (TSS). We compute the total variation of the dependent variable by comparing each value of Y with the mean value Ȳ and adding up all the resulting deviations. Note that in order to find the TSS we square the simple deviations, since by definition the sum of the simple deviations of any variable around its mean is identically equal to zero.
∑ŷi² = ∑(Ŷi − Ȳ)² → the variation of the estimated Y values about their mean, which may be called the sum of squares due to regression, or explained by regression, or simply the Explained Sum of Squares (ESS). This is the part of the total variation of Yi which is explained by the regression line. Thus, the ESS is the variation of the dependent variable that is explained by the regression line.
∑ûi² = ∑(Yi − Ŷi)² → the variation of the dependent variable which is not explained by the regression line and is attributed to the existence of the disturbance term. The Residual Sum of Squares (RSS) is the sum of the squared residuals, and it gives the total unexplained variation of the dependent variable Y around its mean. In summary,
yi = Yi − Ȳ → deviation of Yi from its mean
ŷi = Ŷi − Ȳ → deviation of the fitted value Ŷi from the mean
ûi = Yi − Ŷi → deviation of Yi from the regression line
Thus we have yi = ŷi + ûi. Taking the sum of squares on both sides,
∑yi² = ∑ŷi² + ∑ûi² + 2∑ŷiûi
Since ∑ŷiûi = 0, we get
∑yi² = ∑ŷi² + ∑ûi² (2.17)
that is,
TSS = ESS + RSS
The above equation shows that the total variation in the observed Y values about their mean value can be partitioned into two parts: one attributable to the regression line and the other to random forces, because not all actual Y observations lie on the fitted line. Now, dividing equation (2.17) by TSS on both sides, we obtain
1 = ESS/TSS + RSS/TSS
1 = ∑(Ŷi − Ȳ)²/∑(Yi − Ȳ)² + ∑ûi²/∑(Yi − Ȳ)²
We now define
r² = ESS/TSS = ∑(Ŷi − Ȳ)²/∑(Yi − Ȳ)²
or, alternatively,
r² = 1 − RSS/TSS
The quantity r² thus defined is known as the coefficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, r² measures the proportion or percentage of the total variation in Y explained by the regression model. For example, if r² = 0.90, the regression line explains 90 per cent of the total variation of the Y values around their mean; the remaining 10 per cent of the total variation in Y is unaccounted for by the regression line and is attributed to the factors included in the disturbance variable u. The following are two properties of r²:
1) r² is a non-negative quantity.
2) The limits of r² are 0 ≤ r² ≤ 1. An r² of 1 means a perfect fit, that is, Ŷi = Yi for each i.
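Using the supply-function example estimated in Module II (Ŷi = 33.75 + 3.25Xi), the decomposition TSS = ESS + RSS and the resulting r² can be computed directly. The following is an illustrative sketch; the r² figure it produces follows from the example data and is not quoted in the text:

```python
# TSS = ESS + RSS and r^2 for the estimated supply function
# Y_hat = 33.75 + 3.25 X of commodity 'z'.
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]

Y_bar = sum(Y) / len(Y)                  # 63
Y_hat = [33.75 + 3.25 * xi for xi in X]  # fitted values

TSS = sum((yi - Y_bar) ** 2 for yi in Y)               # total variation
ESS = sum((yh - Y_bar) ** 2 for yh in Y_hat)           # explained variation
RSS = sum((yi - yh) ** 2 for yi, yh in zip(Y, Y_hat))  # unexplained variation

print(TSS, ESS, RSS)   # 894.0 507.0 387.0 -> TSS = ESS + RSS
r2 = ESS / TSS
print(round(r2, 3))    # 0.567
```

For this sample, price explains roughly 57 per cent of the variation in quantity supplied around its mean, with the remaining 43 per cent attributed to the disturbance term.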
On the other hand, an r² of zero means that there is no relationship whatsoever between the regressand and the regressor (β̂2 = 0). In that case Ŷi = β̂1 = Ȳ; that is, the best prediction of any Y value is simply its mean value, and the regression line is therefore horizontal to the X axis. To summarise: if all the observations lie on the regression line, the total variation of Y is explained completely by the estimated regression line, there is no unexplained variation (RSS = 0), and hence r² = 1. If the regression line explains only part of the variation in Y, there is some unexplained variation (RSS > 0), and r² will be smaller than 1. Finally, if the regression line does not explain any part of the variation in Y, then ∑(Yi − Ŷi)² = ∑(Yi − Ȳ)², that is, RSS = TSS, and hence r² = 0.

MODULE III
THE CLASSICAL NORMAL LINEAR REGRESSION MODEL
3.1 The Probability Distribution of Disturbances
For the application of the method of ordinary least squares (OLS) to the classical linear regression model, we did not make any assumption about the probability distribution of the disturbances ui. The only assumptions made about ui were that they had zero expectation, were uncorrelated and had constant variance. With these assumptions we saw that the OLS estimators satisfy several desirable statistical properties, such as unbiasedness and minimum variance. If our objective is point estimation only, the OLS method will be sufficient. But point estimation is only one aspect of statistical inference, the other being hypothesis testing. Thus, our interest is not only in obtaining, say, β̂2 but also in using it to make statements or inferences about the true β2. That is, the goal is not merely to obtain the Sample Regression Function (SRF) but to use it to draw inferences about the Population Regression Function (PRF).
Since our objective is estimation as well as hypothesis testing, we need to specify the probability distribution of the disturbances ui. In Module II we proved that the OLS estimators of β1 and β2 are both linear functions of ui, which is random by assumption. Therefore, the sampling or probability distributions of the OLS estimators depend upon the assumptions made about the probability distribution of ui. Since the probability distributions of these estimators are necessary to draw inferences about their population values, the nature of the probability distribution of ui assumes an extremely important role in hypothesis testing. But since the method of OLS makes no assumptions about the probabilistic nature of ui, it is of little help for the purpose of drawing inferences about the PRF from the SRF. This gap can be filled if we assume that the u's follow some probability distribution. In the regression context, it is usually assumed that the u's follow the normal distribution.
3.2 The Normality Assumption
The classical normal linear regression model assumes that each ui is distributed normally with
Mean: E(ui) = 0 (3.1)
Variance: E(ui²) = σ² (3.2)
cov(ui, uj) = 0, i ≠ j (3.3)
These assumptions may also be compactly stated as
ui ~ N(0, σ²) (3.4)
Therefore, with the normality assumption, equation (3.3) means that ui and uj are not only uncorrelated but also independently distributed. Therefore, we can write equation (3.4) as

ui ~ NID(0, σ²) (3.5)

where NID stands for normally and independently distributed. There are several reasons for the use of the normality assumption, which are summarised below.
1) As noted earlier, ui represents the combined influence of a large number of independent variables that are not explicitly introduced in the regression model. We hope that the influence of these omitted or neglected variables is small or at best random. By the central limit theorem of statistics it can be shown that if there is a large number of independent and identically distributed random variables, then, with few exceptions, the distribution of their sum tends to a normal distribution as the number of such variables increases indefinitely. It is this central limit theorem that provides a theoretical justification for the assumption of normality of ui.
2) A variant of the central limit theorem states that even if the number of variables is not very large, or if these variables are not strictly independent, their sum may still be normally distributed.
3) With the normality assumption, the probability distributions of the OLS estimators can be easily derived, because one property of the normal distribution is that any linear function of normally distributed variables is itself normally distributed.
4) The normal distribution is a comparatively simple distribution involving only two parameters, namely the mean and the variance.
5) The assumption of normality is necessary for conducting the statistical tests of significance of the parameter estimates and for constructing confidence intervals. If this assumption is violated, the estimators β̂1 and β̂2 are still unbiased and best, but we cannot assess their statistical reliability by the classical tests of significance, because the latter are based on the normal distribution.
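Reason 1 above can be illustrated with a small simulation (a sketch, not from the text): a disturbance built as the sum of 12 independent uniform variables on (−0.5, 0.5), each with variance 1/12, has mean 0 and variance 1, and by the central limit theorem its distribution is approximately normal.

```python
import random

# Sketch of the CLT justification: a disturbance formed as the sum of many
# small independent influences behaves approximately like a N(0, 1) variable.
random.seed(42)  # fixed seed so the illustration is reproducible

def disturbance():
    # 12 uniforms on (-0.5, 0.5), each with variance 1/12, so the sum has variance 1
    return sum(random.uniform(-0.5, 0.5) for _ in range(12))

u = [disturbance() for _ in range(10_000)]
mean = sum(u) / len(u)
var = sum((x - mean) ** 2 for x in u) / len(u)
print(round(mean, 3), round(var, 3))  # close to 0 and 1
```

The sample mean and variance come out close to the theoretical 0 and 1, and a histogram of u would show the familiar bell shape even though each component is uniform, not normal.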
3.3 Properties of OLS Estimators under the Normality Assumption

With the assumption of normality, the OLS estimators have the following properties.
1. They are unbiased.
2. They have minimum variance. Combined with property 1, this means that they are minimum-variance unbiased, or efficient, estimators.
3. As the sample size increases indefinitely, the estimators converge to their true population values. That is, they are consistent.
4. β̂1 is normally distributed with

E(β̂1) = β1 and var(β̂1) = σ²β̂1 = (∑Xi²/n∑xi²)σ², or more compactly, β̂1 ~ N(β1, σ²β̂1) (3.6)

Then, by the properties of the normal distribution, the variable Z defined as Z = (β̂1 − β1)/σβ̂1 follows the standardised normal distribution, that is, the normal distribution with zero mean and unit variance, or Z ~ N(0, 1).
5. β̂2 is normally distributed with

E(β̂2) = β2 and var(β̂2) = σ²β̂2 = σ²/∑xi², or more compactly, β̂2 ~ N(β2, σ²β̂2) (3.7)

and Z = (β̂2 − β2)/σβ̂2 follows the standardised normal distribution.
6. (n − 2)σ̂²/σ² is distributed as the χ² (chi-square) distribution with n − 2 degrees of freedom.
7. (β̂1, β̂2) are distributed independently of σ̂².
8. β̂1 and β̂2 have the minimum variance in the entire class of unbiased estimators, whether linear or not. Therefore, we can say that the least squares estimators are best unbiased estimators (BUE).

3.4 The Method of Maximum Likelihood

Like OLS, the method of Maximum Likelihood (ML) is a method for obtaining estimates of the parameters of a population from a random sample. The method was developed by R A Fisher and is an important procedure of estimation in econometrics. In ML, we take a fixed random sample. This sample might have been generated by many different normal populations, each having its own parameters of mean and variance. Which of these possible alternative populations is most probable to have given rise to the observed n sample values? To answer this question we must estimate the joint probability of obtaining all the n values for each possible normal population.
Then we choose the population whose parameters maximise the joint probability of the observed sample values. The ML method chooses, among all possible estimates of the parameters, those values which make the probability of obtaining the observed sample as large as possible. The function which defines the joint (total) probability of any sample being observed is called the likelihood function of the variable X. The general expression of the likelihood function is

L(X1, X2, …, Xn; θ1, θ2, …, θk)

where θ1, θ2, …, θk denote the parameters of the function which we want to estimate. In the case of the normal distribution of X, the likelihood function in its general form is

L(X1, X2, …, Xn; μ, σ²)

The ML method consists of the maximisation of the likelihood function. Following the general condition of maximisation, the maximum value of the function is attained where the first derivatives of the function with respect to its parameters are equal to zero. The resulting values of the parameters are the maximum likelihood estimates of the population parameters. The various stages of the ML method are outlined below.
1) Form the likelihood function, which gives the total probability of the particular sample values being observed.
2) Take the partial derivatives of the likelihood function with respect to the parameters which we want to estimate and set them equal to zero.
3) Solve the equations of the partial derivatives for the unknown parameters to obtain their maximum likelihood estimates.

3.5 Maximum Likelihood Estimation of the Two Variable Regression Model

We already established that in the two variable regression model,

Yi = β1 + β2Xi + ui

the Yi are normally distributed with mean β1 + β2Xi and variance σ².
As a result, the joint probability density function, given the mean and the variance, can be written as f(Y1, Y2, …, Yn | β1 + β2Xi, σ²). As the Y's are independent, this joint density can be written as a product of n individual density functions:

f(Y1, Y2, …, Yn | β1 + β2Xi, σ²) = f(Y1 | β1 + β2X1, σ²) f(Y2 | β1 + β2X2, σ²) … f(Yn | β1 + β2Xn, σ²) (3.8)

where

f(Yi) = (1/σ√(2π)) exp{−(1/2)(Yi − β1 − β2Xi)²/σ²} (3.9)

which is the density function of a normally distributed variable with the given mean and variance. Substituting equation (3.9) into (3.8), we get

f(Y1, Y2, …, Yn | β1 + β2Xi, σ²) = (1/σⁿ(√(2π))ⁿ) exp{−(1/2σ²)∑(Yi − β1 − β2Xi)²} (3.10)

If Y1, Y2, …, Yn are known or given, but β1, β2 and σ² are not known, the function in (3.10) is called a likelihood function, denoted by LF(β1, β2, σ²) and written as

LF(β1, β2, σ²) = (1/σⁿ(√(2π))ⁿ) exp{−(1/2σ²)∑(Yi − β1 − β2Xi)²} (3.11)

In the method of maximum likelihood, we estimate the unknown parameters in such a manner that the probability of observing the given Y's is as high as possible. Therefore, we have to find the maximum of equation (3.11). This is a straightforward exercise in differential calculus. For differentiation, it is easier to express equation (3.11) in log form as

log LF = −n log σ − (n/2) log(2π) − (1/2σ²)∑(Yi − β1 − β2Xi)²

Differentiating with respect to β1 and setting the result equal to zero,

∂log LF/∂β1 = −(1/2σ²) ∑ 2(Yi − β1 − β2Xi)(−1) = 0

which simplifies to ∑Yi = nβ̂1 + β̂2∑Xi, the same as the first normal equation of least squares theory. Similarly,

∂log LF/∂β2 = −(1/2σ²) ∑ 2(Yi − β1 − β2Xi)(−Xi) = 0

which simplifies to ∑YiXi = β̂1∑Xi + β̂2∑Xi², the same as the second normal equation of least squares theory. Therefore, the ML estimators of the β's are the same as the OLS estimators.
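The equivalence of the ML and OLS estimators of the β's can be checked numerically. The sketch below (plain Python, with a small made-up data set) computes the OLS estimates from the normal equations and verifies that both score equations, the derivatives of log LF in the derivation above, vanish at those values:

```python
# Illustrative sketch with made-up data: the OLS estimates of beta1 and beta2
# also set the ML score equations (derivatives of log LF) to zero.
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]
n = len(X)

# OLS estimates from the normal equations
xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar

# Score equations: sum of residuals, and sum of residual times X (each up to 1/sigma^2)
resid = [y - b1 - b2 * x for x, y in zip(X, Y)]
score_b1 = sum(resid)
score_b2 = sum(r * x for r, x in zip(resid, X))

# The ML estimator of sigma^2 divides the RSS by n (OLS divides by n - 2)
sigma2_ml = sum(r * r for r in resid) / n

print(b1, b2, score_b1, score_b2)
```

Both scores come out as zero (up to floating-point rounding) at the OLS estimates. Note that the ML estimator of σ² divides the residual sum of squares by n rather than n − 2, so it is biased in small samples even though the estimates of the β's coincide with OLS.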
3.6 Two Variable Regression Model: Interval Estimation and Hypothesis Testing

3.6.1 Interval Estimation: Some Basic Ideas

In statistics, the reliability of a point estimator is measured by its standard error. Therefore, instead of relying on the point estimate alone, we may construct an interval around the point estimator, say within two or more standard errors on either side of it, such that this interval has, say, a 95% probability of including the true parameter value. This is roughly the idea behind interval estimation. Assume that we want to know how close β̂2 is to β2. For this purpose, we try to find two positive numbers, θ and α, the latter lying between 0 and 1, such that the probability that the random interval (β̂2 − θ, β̂2 + θ) contains the true β2 is 1 − α. Symbolically,

Pr(β̂2 − θ ≤ β2 ≤ β̂2 + θ) = 1 − α (3.12)

Such an interval is known as a confidence interval; 1 − α is known as the confidence coefficient, and α (0 < α < 1) is known as the level of significance. It is also the probability of committing a Type I error. A Type I error consists in rejecting a true hypothesis, whereas a Type II error consists in accepting a false hypothesis. In other words, whenever we incorrectly reject the null hypothesis, we make an error of Type I, and whenever we incorrectly accept the null hypothesis, we make an error of Type II. As a simile, a Type I error is 'convicting an innocent', while a Type II error is 'letting go a guilty'. Thus, the significance level α is the probability of making the wrong decision of rejecting the hypothesis when it is actually true, that is, the probability of committing a Type I error. We choose a level of significance for deciding whether to accept or reject our hypothesis. The probability of a Type II error may be denoted by β; then 1 − β is known as the power of the test, that is, the probability of rejecting a false hypothesis. "Significance level minimum and power maximum" is the procedure usually followed.
The end points of the confidence interval are known as confidence limits. Equation (3.12) shows that an interval estimator, in contrast to a point estimator, is an interval constructed in such a manner that it has a specified probability 1 − α of including within its limits the true value of the parameter. For example, if α = 0.05 or 5%, then the probability that the random interval includes the true β2 is 0.95 or 95%. Thus, the interval estimator gives a range of values within which the true β2 may lie. In practice, a level of significance (α) of 0.05 or 0.01 is customary.

3.6.2 Confidence Interval for Regression Coefficients

(1) If σ² is known

Since β̂2 ~ N(β2, σ²β̂2), the variable

Z = (β̂2 − β2)/σβ̂2 = (β̂2 − β2)√(∑xi²)/σ ~ N(0, 1) (3.13)

From the normal tables, P(Z > Zα/2) = α/2 and P(Z < −Zα/2) = α/2, (3.14)–(3.15), so that

P(−Zα/2 < Z < Zα/2) = 1 − α (3.16)

Substituting equation (3.13) into (3.16), we get

P(−Zα/2 < (β̂2 − β2)/σβ̂2 < Zα/2) = 1 − α (3.17)

To remove σβ̂2 from the denominator, we multiply all the elements by σβ̂2, and subtracting β̂2 from each element we get

P(−Zα/2 σβ̂2 − β̂2 < −β2 < Zα/2 σβ̂2 − β̂2) = 1 − α (3.18)

Rearranging, we have

P(β̂2 − Zα/2 σβ̂2 < β2 < β̂2 + Zα/2 σβ̂2) = 1 − α (3.19)

Therefore, [β̂2 − Zα/2 σβ̂2, β̂2 + Zα/2 σβ̂2] is the 100(1 − α)% confidence interval for β2. Substituting σβ̂2 = σ/√∑xi², as σ is known, we get

[β̂2 − Zα/2 σ/√∑xi², β̂2 + Zα/2 σ/√∑xi²] (3.20)

(2) If σ² is unknown

If Z ~ N(0, 1) and Z1 ~ χ² with k degrees of freedom, then Z/√(Z1/k) follows the t distribution with k degrees of freedom. Here

t = [(β̂2 − β2)√(∑xi²)/σ] / √[(n − 2)σ̂²/σ²(n − 2)] (3.21)–(3.22)

which simplifies to

t = (β̂2 − β2)/se(β̂2) = (β̂2 − β2)√(∑xi²)/σ̂ (3.23)

with n − 2 degrees of freedom. Proceeding in a similar way as earlier, we have

P(−tα/2 < t < tα/2) = 1 − α (3.24)

Substituting equation (3.23) into (3.24), we get

P(−tα/2 < (β̂2 − β2)/se(β̂2) < tα/2) = 1 − α (3.25)

Multiplying all the elements by se(β̂2) and subtracting β̂2, we have

P(−tα/2 se(β̂2) − β̂2 < −β2 < tα/2 se(β̂2) − β̂2) = 1 − α (3.26)

Rearranging, we have

P[β̂2 − tα/2 se(β̂2) < β2 < β̂2 + tα/2 se(β̂2)] = 1 − α (3.27)

Therefore, [β̂2 − tα/2 se(β̂2), β̂2 + tα/2 se(β̂2)], or β̂2 ± tα/2 se(β̂2), is the 100(1 − α)% confidence interval for β2.
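A numerical sketch of the interval β̂2 ± tα/2 se(β̂2) with made-up data; the 5% two-tail critical value for 3 degrees of freedom, t0.025 = 3.182, is taken from standard t tables:

```python
# Sketch: 95% confidence interval for beta2 when sigma^2 is unknown (made-up data).
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sum_x2 = sum((x - xbar) ** 2 for x in X)       # sum of squared deviations of X
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum_x2
b1 = ybar - b2 * xbar

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)                     # unbiased estimator of sigma^2
se_b2 = (sigma2_hat / sum_x2) ** 0.5           # se(beta2-hat) = sigma-hat / sqrt(sum x_i^2)

t_crit = 3.182                                 # t_{0.025}, n-2 = 3 df (table value)
lower, upper = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(round(lower, 4), round(upper, 4))
```

With only n = 5 observations the interval is wide; as noted in the text, it narrows as se(β̂2) falls.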
By following the same procedure, we can derive the confidence interval for β1 as well:

P[β̂1 − tα/2 se(β̂1) < β1 < β̂1 + tα/2 se(β̂1)] = 1 − α (3.28)

Therefore, [β̂1 − tα/2 se(β̂1), β̂1 + tα/2 se(β̂1)], or β̂1 ± tα/2 se(β̂1), is the 100(1 − α)% confidence interval for β1.

An important feature of the confidence intervals given in (3.27) and (3.28) may be noted. In both cases the width of the confidence interval is proportional to the standard error of the estimator. That is, the larger the standard error, the larger is the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater is the uncertainty of estimating the true value of the unknown parameter. Thus, the standard error of an estimator is often described as a measure of the precision of the estimator, that is, of how precisely the estimator measures the true population value.

3.6.3 Confidence Interval for σ²

As pointed out in the properties of the OLS estimators under the normality assumption (property 6), the variable

χ² = (n − 2)σ̂²/σ² ~ χ² with n − 2 degrees of freedom

Therefore, we can use the χ² distribution to establish a confidence interval for σ²:

Pr(χ²(1−α/2) ≤ χ² ≤ χ²(α/2)) = 1 − α (3.29)

Substituting for χ², we have

Pr(χ²(1−α/2) ≤ (n − 2)σ̂²/σ² ≤ χ²(α/2)) = 1 − α (3.30)

Taking the reciprocal of equation (3.30), rearranging, and multiplying all the elements by (n − 2)σ̂², we get

Pr[(n − 2)σ̂²/χ²(α/2) ≤ σ² ≤ (n − 2)σ̂²/χ²(1−α/2)] = 1 − α (3.31)

which gives the 100(1 − α)% confidence interval for σ².

3.6.4 Hypothesis Testing

Very often in practice we have to make decisions about a population on the basis of sample information. In attempting to arrive at such decisions, we make assumptions about the population.
Such assumptions about populations are called statistical hypotheses and, in general, are assumptions about the form of the distribution of the population or about the values of the parameters involved. If a hypothesis determines the population completely, that is, if it specifies the form of the distribution of the population and the values of all the parameters involved, it is called a simple hypothesis; otherwise it is called a composite hypothesis. Testing of hypotheses is a procedure by which we accept or reject a hypothesis on the basis of a sample taken from the population.

The hypothesis which is tested for possible rejection under the assumption that it is true is called the null hypothesis and is denoted by H0. Rejection of H0 naturally results in the acceptance of some other hypothesis, which is called the alternative hypothesis and is denoted by H1. The testable hypothesis is the null hypothesis. The term 'null' refers to the idea that there is no difference between the true value and the value we hypothesise. Since the null hypothesis is a testable hypothesis, there must also exist a counter-proposition to it in order to test the hypothesised proposition; this counter-proposition is the alternative hypothesis, which is also known as the maintained hypothesis. The theory of hypothesis testing is concerned with developing rules or procedures for deciding whether to reject or not reject the null hypothesis. There are two mutually complementary approaches for devising such rules, namely the confidence interval approach and the test of significance approach. Both these approaches predicate that the variable (statistic or estimator) under consideration has some probability distribution and that hypothesis testing involves making statements or assertions about the values of the parameters of such a distribution.
In the confidence interval approach to hypothesis testing, we construct a 100(1 − α)% confidence interval for the estimator, say β̂2. If β2 under H0 falls within this confidence interval, we accept H0; but if it falls outside this interval, we reject H0. When we reject the null hypothesis, we say that our finding is statistically significant. On the other hand, when we do not reject the null hypothesis, we say that our finding is not statistically significant.

An alternative but complementary approach to the confidence interval method of testing statistical hypotheses is the test of significance approach developed by R A Fisher and by Neyman and Pearson. Broadly speaking, a test of significance is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis. The key idea of tests of significance is that of a test statistic (estimator) and the sampling distribution of such a statistic under the null hypothesis. The decision to accept or reject H0 is made on the basis of the value of the test statistic obtained from the data at hand. In the language of significance tests, a statistic is said to be statistically significant if the value of the test statistic lies in the region of rejection of H0, also called the critical region; in this case the null hypothesis is rejected. By the same token, a test statistic is said to be statistically insignificant if its value lies in the region of acceptance of the null hypothesis; in this situation the null hypothesis is not rejected.

Thus, the first step in hypothesis testing is the formulation of the null hypothesis and its alternative. The next step consists of devising a criterion of test that would enable us to decide whether the null hypothesis is to be rejected or not. For this purpose the whole set of values of the population is divided into two regions, namely the acceptance region and the rejection region.
The acceptance region includes the values of the statistic which have a high probability of being observed under H0, and the rejection region, or critical region, includes those values which are highly unlikely to be observed under H0. The test is then performed with reference to the test statistic. The empirical tests that are used for testing hypotheses are called tests of significance. If the value of the test statistic falls in the critical region, the null hypothesis is rejected; if it falls in the acceptance region, the null hypothesis is not rejected. The following tables summarise the test of significance approach to hypothesis testing.

(1) Normal test of significance: σ² known

Suppose β̂2 ~ N(β2, σ²β̂2). The test statistic used is

Z = (β̂2 − β2*)/σβ̂2

Decision rules, Z test:

Type of hypothesis   H0: null hypothesis   H1: alternative hypothesis   Critical region
Two-tail             β2 = β2*              β2 ≠ β2*                     |Z| > Zα/2
Right-tail           β2 ≤ β2*              β2 > β2*                     Z > Zα
Left-tail            β2 ≥ β2*              β2 < β2*                     Z < −Zα

Here β2* is the hypothesised numerical value of β2, |Z| means the absolute value of Z, and Zα or Zα/2 is the critical Z value at the α or α/2 level of significance. The same procedure holds for testing hypotheses about β1.

(2) t-test of significance: σ² unknown

When σ² is unknown, we use the t distribution as the test statistic. Under the normality assumption the variable

t = (β̂2 − β2*)/se(β̂2) = (β̂2 − β2*)√(∑xi²)/σ̂

follows the t distribution with n − 2 degrees of freedom.

Decision rules, t test:

Type of hypothesis   H0: null hypothesis   H1: alternative hypothesis   Critical region
Two-tail             β2 = β2*              β2 ≠ β2*                     |t| > tα/2
Right-tail           β2 ≤ β2*              β2 > β2*                     t > tα
Left-tail            β2 ≥ β2*              β2 < β2*                     t < −tα

The same procedure holds for testing hypotheses about β1.

(3) Testing the significance of σ²: the chi-square test

For testing the significance of σ², as noted earlier, we use the chi-square test.
χ² = (n − 2)σ̂²/σ0² ~ χ² with n − 2 degrees of freedom

Decision rules, χ² test:

H0: null hypothesis   H1: alternative hypothesis   Critical region
σ² = σ0²              σ² ≠ σ0²                     χ² > χ²(α/2) or χ² < χ²(1−α/2)
σ² ≤ σ0²              σ² > σ0²                     χ² > χ²(α)
σ² ≥ σ0²              σ² < σ0²                     χ² < χ²(1−α)

Here σ0² is the value of σ² under the null hypothesis.

3.7 Regression Analysis and Analysis of Variance

In this section we analyse regression analysis from the point of view of the analysis of variance and develop a complementary way of looking at the statistical inference problem. We developed in the previous module

∑yi² = ∑ŷi² + ∑ûi² = β̂2²∑xi² + ∑ûi²

that is, TSS = ESS + RSS. In other words, the total sum of squares is composed of the explained sum of squares and the residual sum of squares. A study of these components of TSS is known as the analysis of variance (ANOVA) from the regression viewpoint. ANOVA is a statistical method developed by R A Fisher for the analysis of experimental data.

Associated with any sum of squares is its degrees of freedom (df), that is, the number of independent observations on which it is based. TSS has n − 1 df because we lose 1 df in computing the sample mean Ȳ. RSS has n − 2 df, and ESS has 1 df, which follows from the fact that ESS = β̂2²∑xi² is a function of β̂2 only, as ∑xi² is known. (Both statements hold only for the two variable regression model.) The following table presents the various sums of squares and their associated df; it is the standard form of the AOV table, sometimes also called the ANOVA table.

Source of variation        Sum of squares (SS)   Degrees of freedom   Mean sum of squares (MSS)
Due to regression (ESS)    ∑ŷi² = β̂2²∑xi²        1                    β̂2²∑xi²
Due to residuals (RSS)     ∑ûi²                  n − 2                ∑ûi²/(n − 2)
TSS                        ∑yi²                  n − 1

In the table, the MSS is obtained by dividing each SS by its df. From the table, consider

F = (MSS of ESS)/(MSS of RSS) = β̂2²∑xi² / [∑ûi²/(n − 2)] (3.32)

If we assume that the disturbances ui are normally distributed and that H0: β2 = 0 holds, it can be shown that the F of equation (3.32) follows the F distribution with 1 and n − 2 df.
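The t test of H0: β2 = 0 and the F statistic of (3.32) can be sketched together (made-up data; the critical value t0.025 with 3 df, 3.182, is a standard table value). In the two variable model, F is the square of the t statistic:

```python
# Sketch: two-tail t test of H0: beta2 = 0 and the ANOVA F statistic of eq. (3.32).
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sum_x2 = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum_x2
b1 = ybar - b2 * xbar

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))   # residual SS, n-2 df
ess = b2 ** 2 * sum_x2                                    # explained SS, 1 df
se_b2 = (rss / (n - 2) / sum_x2) ** 0.5

t_stat = b2 / se_b2                    # beta2* = 0 under the null
f_stat = (ess / 1) / (rss / (n - 2))   # eq. (3.32)
reject = abs(t_stat) > 3.182           # t_{0.025}, 3 df (table value)

print(round(t_stat, 4), round(f_stat, 4), reject)
```

Since F = t² here, the two-tail t test and the ANOVA F test of H0: β2 = 0 give identical verdicts in the two variable model.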
3.8 Application of Regression Analysis: The Problem of Prediction

On the basis of sample data we obtained the sample regression

Ŷi = β̂1 + β̂2Xi

where Ŷi is the estimator of the true E(Yi). We want to use it to predict or forecast Y corresponding to some given level of X. There are two kinds of predictions:
1) prediction of the conditional mean value of Y corresponding to a chosen X, say X0, that is, the point on the population regression line itself; this prediction is known as mean prediction.
2) prediction of an individual Y value corresponding to X0, which is known as individual prediction.

Mean Prediction

Given Xi = X0, the mean prediction E(Y0 | X0) is given by

E(Y0 | X0) = β1 + β2X0 (3.33)

We estimate it from equation (3.33) as

Ŷ0 = β̂1 + β̂2X0 (3.34)

Taking the expectation of equation (3.34), given X0, we get

E(Ŷ0) = E(β̂1) + E(β̂2)X0 = β1 + β2X0 (3.35)

because β̂1 and β̂2 are unbiased estimators. That is,

E(Ŷ0) = E(Y0 | X0) = β1 + β2X0 (3.36)

so Ŷ0 is an unbiased predictor of E(Y0 | X0). Now,

var(Ŷ0) = var(β̂1 + β̂2X0) (3.37)

Using the property that var(a + b) = var(a) + var(b) + 2cov(a, b), we obtain

var(Ŷ0) = var(β̂1) + X0² var(β̂2) + 2X0 cov(β̂1, β̂2) (3.38)

Using the formulas for the variances and covariance of β̂1 and β̂2, and rearranging and manipulating the terms, we obtain

var(Ŷ0) = σ²[1/n + (X0 − X̄)²/∑xi²] (3.39)

Replacing the unknown σ² by its estimator σ̂², it follows that the variable

t = [Ŷ0 − (β1 + β2X0)]/se(Ŷ0) (3.40)

follows the t distribution with n − 2 df. Therefore, the t distribution can be used to derive confidence intervals for the true E(Y0 | X0) and to test hypotheses about it in the usual manner. That is,

P[(β̂1 + β̂2X0) − tα/2 se(Ŷ0) < β1 + β2X0 < (β̂1 + β̂2X0) + tα/2 se(Ŷ0)] = 1 − α (3.41)

where se(Ŷ0) is obtained from equation (3.39).

Individual Prediction

We want to predict an individual Y corresponding to a given X value, say X0.
That is, we want to obtain

Y0 = β1 + β2X0 + u0 (3.42)

We predict this as Ŷ0 = β̂1 + β̂2X0. The prediction error Y0 − Ŷ0 is

Y0 − Ŷ0 = β1 + β2X0 + u0 − (β̂1 + β̂2X0) = (β1 − β̂1) + (β2 − β̂2)X0 + u0 (3.43)

Taking expectations on both sides of equation (3.43), we have

E(Y0 − Ŷ0) = E(β1 − β̂1) + E(β2 − β̂2)X0 + E(u0) = 0

since β̂1 and β̂2 are unbiased, X0 is a fixed number, and E(u0) = 0 by assumption. The variance of the prediction error is

var(Y0 − Ŷ0) = E[(β1 − β̂1) + (β2 − β̂2)X0 + u0]² (3.44)
= var(β̂1) + X0² var(β̂2) + 2X0 cov(β̂1, β̂2) + var(u0) (3.45)–(3.46)

since the cross-product terms involving u0 vanish in expectation. Using the variance and covariance formulas for β̂1 and β̂2, noting that var(u0) = σ², and slightly rearranging, we have

var(Y0 − Ŷ0) = σ²[1 + 1/n + (X0 − X̄)²/∑xi²] (3.47)

Further, it can be shown that Y0 − Ŷ0 follows the normal distribution. Substituting σ̂² for the unknown σ², it follows that

t = (Y0 − Ŷ0)/se(Y0 − Ŷ0) (3.48)

also follows the t distribution. Therefore, the t distribution can be used to draw inferences about the true Y0.

MODULE IV
EXTENSION OF TWO VARIABLE REGRESSION MODEL

4.1 Introduction

Some aspects of linear regression analysis can be easily introduced within the framework of the two variable linear regression model that we have been discussing so far. First we consider the case of regression through the origin, i.e., a situation where the intercept term, β1, is absent from the model. Then we consider the question of the functional form of the linear regression model; here we consider models that are linear in the parameters but not in the variables. Finally we consider the question of units of measurement, i.e., how the X and Y variables are measured and whether a change in the units of measurement affects the regression results.

4.2 Regression through the Origin

There are occasions when the two variable PRF assumes the following form:

Yi = β2Xi + ui (4.1)

In this model the intercept term is absent or zero, hence the name regression through the origin.
How do we estimate models like (4.1), and what special problems do they pose? To answer these questions, let us first write the SRF of (4.1), namely

Yi = β̂2Xi + ûi (4.2)

Now, applying the ordinary least squares (OLS) method to (4.2), we obtain the following formulas for β̂2 and its variance. We want to minimise

∑ûi² = ∑(Yi − β̂2Xi)² (4.3)

with respect to β̂2. Differentiating (4.3) with respect to β̂2, we obtain

d∑ûi²/dβ̂2 = 2∑(Yi − β̂2Xi)(−Xi) (4.4)

Setting (4.4) equal to zero and simplifying, we get

β̂2 = ∑XiYi/∑Xi² (4.5)

Now, substituting the PRF Yi = β2Xi + ui into this equation, we obtain

β̂2 = β2 + ∑Xiui/∑Xi² (4.6)

Note that E(β̂2) = β2. Therefore,

E(β̂2 − β2)² = E(∑Xiui/∑Xi²)² (4.7)

Expanding the right hand side of (4.7) and noting that the Xi are nonstochastic and the ui are homoscedastic and uncorrelated, we obtain

var(β̂2) = E(β̂2 − β2)² = σ²/∑Xi² (4.8)

where σ² is estimated by

σ̂² = ∑ûi²/(n − 1) (4.9)

It is interesting to compare these formulas with those obtained when the intercept term is included in the model:

β̂2 = ∑xiyi/∑xi² (4.10)
var(β̂2) = σ²/∑xi² (4.11)
σ̂² = ∑ûi²/(n − 2) (4.12)

The differences between the two sets of formulas should be obvious. First, in the model with the intercept term absent, we use raw sums of squares and cross products, but in the intercept-present model we use adjusted (from the mean) sums of squares and cross products. Second, the degrees of freedom for computing σ̂² is n − 1 in the first case and n − 2 in the second case.

Although the zero intercept model may be appropriate on occasion, there are some features of this model that need to be noted. First, ∑ûi, which is always zero for the model with the intercept term, need not be zero when that term is absent. In short, ∑ûi need not be zero for the regression through the origin. Suppose we want to impose the condition that ∑ûi = 0. In that case we have

∑Yi = β̂2∑Xi + ∑ûi = β̂2∑Xi (4.13)

This expression then gives

β̂2 = ∑Yi/∑Xi = Ȳ/X̄ (4.14)

But this estimator is not the same as equation (4.5).
And since the β̂2 of (4.5) is unbiased, the β̂2 of (4.14) cannot in general be unbiased. Incidentally, note from (4.4), after equating it to zero, that

∑ûiXi = 0 (4.15)

The upshot is that, in regression through the origin, we cannot have both ∑ûiXi and ∑ûi equal to zero; the only condition that is satisfied is that ∑ûiXi = 0. Recall that

Yi = Ŷi + ûi (4.16)

Summing this equation on both sides and dividing by n, we get

Ȳ = mean(Ŷi) + ū (4.17)

Since for the zero intercept model ∑ûi and ū need not be zero, it then follows that

Ȳ ≠ mean(Ŷi) (4.18)

in general. That is, the mean of the actual Y values need not be equal to the mean of the estimated Y values; the two mean values are identical only for the intercept-present model.

Second, r², the coefficient of determination, which is always non-negative for the conventional model, can on occasion turn out to be negative for the intercept-less model. Therefore, the conventionally computed r² may not be appropriate for the regression through origin model. Conventionally,

r² = 1 − RSS/TSS = 1 − ∑ûi²/∑yi² (4.19)

For the conventional, or intercept-present, model,

RSS = ∑ûi² = ∑yi² − β̂2²∑xi² ≤ ∑yi²

That is, for the conventional model RSS ≤ TSS, so r² can never be negative. For the zero intercept model it can be shown analogously that

RSS = ∑ûi² = ∑Yi² − β̂2²∑Xi² (4.20)

Now there is no guarantee that this RSS will always be less than TSS (which is measured from the mean), which suggests that RSS can be greater than TSS, implying that r², as conventionally defined, can be negative. The conventional r² is therefore not appropriate for the regression through origin model. But we can compute what is known as the raw r² for such models, defined as

raw r² = (∑XiYi)²/(∑Xi² ∑Yi²) (4.21)

Although this raw r² satisfies the relation 0 ≤ raw r² ≤ 1, it is not directly comparable to the conventional r² value. Because of these special features of this model, one needs to exercise great caution in using the zero intercept model. Unless there is a strong a priori expectation, one would be well advised to stick to the conventional intercept-present model.
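These features can be checked numerically. In the sketch below (made-up data), the through-origin residuals satisfy ∑ûiXi = 0 but ∑ûi ≠ 0, and the raw r² of (4.21) is computed from raw sums of squares:

```python
# Sketch: regression through the origin, residual properties and raw r^2.
X = [1, 2, 3, 4, 5]
Y = [2, 3, 5, 4, 6]

# Slope from raw sums of squares and cross products, eq. (4.5)
b2 = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
resid = [y - b2 * x for x, y in zip(X, Y)]

sum_resid = sum(resid)                               # need not be zero here
sum_resid_x = sum(r * x for r, x in zip(resid, X))   # always zero, eq. (4.15)

# Raw r^2, eq. (4.21)
raw_r2 = sum(x * y for x, y in zip(X, Y)) ** 2 / (
    sum(x * x for x in X) * sum(y * y for y in Y))

print(round(b2, 4), round(sum_resid, 4), round(sum_resid_x, 10), round(raw_r2, 4))
```

With these numbers ∑ûi comes out clearly nonzero while ∑ûiXi is zero up to rounding, illustrating why the mean of the fitted Y values need not equal Ȳ in the through-origin model.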
4.3 Functional Forms of Regression Models

So far we have considered models that are linear in the parameters as well as in the variables. Here we consider some commonly used regression models that may be nonlinear in the variables but are linear in the parameters, or that can be made so by suitable transformation of the variables. In particular we discuss the following regression models:
1. log-linear models
2. semilog models
3. reciprocal models

4.4 How to Measure Elasticity: The Log-Linear Model

Consider the following model, known as the exponential regression model:

Yi = β1 Xi^β2 e^ui (4.23)

which may be expressed alternatively as

ln Yi = ln β1 + β2 ln Xi + ui (4.24)

where ln = natural log, i.e., log to the base e, with e = 2.718…. If we write equation (4.24) as

ln Yi = α + β2 ln Xi + ui (4.25)

where α = ln β1, this model is linear in the parameters α and β2, linear in the logarithms of the variables Y and X, and can be estimated by OLS regression. Because of this linearity, such models are called log-log, double-log or log-linear models. If the assumptions of the classical linear regression model are fulfilled, the parameters of equation (4.25) can be estimated by the OLS method by letting

Yi* = α + β2 Xi* + ui (4.26)

where Yi* = ln Yi and Xi* = ln Xi. The OLS estimators α̂ and β̂2 so obtained will be best linear unbiased estimators of α and β2 respectively.

One attractive feature of the log-log model, which has made it popular in applied work, is that the slope coefficient β2 measures the elasticity of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X. Thus, if Y represents the quantity of a commodity demanded and X its unit price, β2 measures the price elasticity of demand. In the two variable model, the simplest way to decide whether the log-linear model fits the data is to plot the scatter diagram of ln Yi against ln Xi and see whether the scatter points lie approximately on a straight line.
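A small sketch of the log-log model, using made-up noiseless data generated from Y = 3X^0.5 (assumed values, purely illustrative), so the recovered elasticity should be exactly 0.5:

```python
import math

# Sketch: estimating a constant elasticity by OLS on the logs.
# Data generated from Y = 3 * X**0.5 with no disturbance, so beta2 should be 0.5.
X = [1, 2, 4, 8, 16]
Y = [3 * x ** 0.5 for x in X]

lx = [math.log(x) for x in X]
ly = [math.log(y) for y in Y]
n = len(X)

lxbar, lybar = sum(lx) / n, sum(ly) / n
b2 = sum((a - lxbar) * (b - lybar) for a, b in zip(lx, ly)) / sum(
    (a - lxbar) ** 2 for a in lx)
alpha = lybar - b2 * lxbar     # alpha = ln(beta1)
beta1 = math.exp(alpha)        # recover the scale parameter by taking the antilog

print(round(b2, 6), round(beta1, 6))
```

The slope on the logs recovers the elasticity 0.5 directly, and the antilog of the intercept recovers β1 = 3; with real (noisy) data the estimates would only approximate these values.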
4.5 Semilog Models: Log-Lin and Lin-Log Models

4.5.1 How to Measure the Growth Rate: The Log-Lin Model

Economists, business people and governments are often interested in finding out the rate of growth of certain economic variables, such as population, GDP, money supply, employment, etc. Suppose we want to find out the growth rate of personal consumption expenditure on services. Let Yt denote real expenditure on services at time t and Y0 the initial value of the expenditure on services. We may recall the well-known compound interest formula

Yt = Y0(1 + r)^t (4.27)

where r is the compound (that is, over time) rate of growth of Y. Taking the natural logarithm of equation (4.27), we can write

ln Yt = ln Y0 + t ln(1 + r) (4.28)

Now letting

β1 = ln Y0 (4.29)
β2 = ln(1 + r) (4.30)

we can write equation (4.28) as

ln Yt = β1 + β2t (4.31)

Adding the disturbance term to equation (4.31), we obtain

ln Yt = β1 + β2t + ut (4.32)

This model is like any other linear regression model in that the parameters β1 and β2 enter linearly. The only difference is that the regressand is the logarithm of Y and the regressor is 'time', which takes values of 1, 2, 3, etc.

Models like (4.32) are called semilog models because only one variable (in this case the regressand) appears in logarithmic form. For descriptive purposes, a model in which the regressand is logarithmic will be called a log-lin model; a model in which the regressand is linear but the regressor is logarithmic is called a lin-log model. Let us briefly examine the properties of the log-lin model. In this model the slope coefficient measures the constant proportional or relative change in Y for a given absolute change in the value of the regressor (in this case the variable t), that is,

β2 = (relative change in regressand)/(absolute change in regressor) (4.33)

If we multiply the relative change in Y by 100, equation (4.33) will then give the percentage change, or the growth rate, in Y for an absolute change in the regressor.
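As a numerical sketch of the log-lin growth model (made-up noiseless series growing at exactly 5% per period, so the fitted slope should equal ln 1.05), the instantaneous rate is 100·β̂2 and the compound rate is 100·(antilog(β̂2) − 1):

```python
import math

# Sketch: log-lin growth model. The data grow at exactly 5% per period,
# so the slope of ln(Y) on t should equal ln(1.05).
T = [1, 2, 3, 4, 5]
Y = [100 * 1.05 ** t for t in T]

ly = [math.log(y) for y in Y]
n = len(T)
tbar, lybar = sum(T) / n, sum(ly) / n
b2 = sum((t - tbar) * (y - lybar) for t, y in zip(T, ly)) / sum(
    (t - tbar) ** 2 for t in T)

instantaneous = 100 * b2             # percent per period, at a point in time
compound = 100 * (math.exp(b2) - 1)  # percent per period, compounded

print(round(instantaneous, 4), round(compound, 4))
```

The instantaneous rate (about 4.88%) is slightly below the compound rate (5%), which is recovered exactly here because the series is noiseless.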
That is, 100 times β2 gives the growth rate in Y; 100 times β2 is known in the literature as the semi-elasticity of Y with respect to X.

The slope coefficient of the growth model, β2, gives the instantaneous (at a point in time) rate of growth and not the compound (over a period of time) rate of growth. But the latter can easily be found from (4.32): take the antilog of the estimated β2, subtract 1 from it, and multiply the difference by 100.

Linear trend model: instead of estimating model (4.32), researchers sometimes estimate the following model:

Yt = β1 + β2 t + ut (4.34)

That is, instead of regressing the log of Y on time, they regress Y on time, where Y is the regressand under consideration. Such a model is called a linear trend model and the time variable t is known as the trend variable. If the slope coefficient is positive, there is an upward trend in Y, whereas if it is negative, there is a downward trend in Y.

4.5.2 The lin-log model

Unlike the growth model just discussed, in which we were interested in finding the percent growth in Y for an absolute change in X, suppose we now want to find the absolute change in Y for a percent change in X. A model that can accomplish this purpose can be written as

Yi = β1 + β2 ln Xi + ui (4.35)

For descriptive purposes we call such a model a lin-log model. Let us interpret the slope coefficient. As usual,

β2 = (change in Y) / (change in ln X) = (absolute change in Y) / (relative change in X)

The second step follows from the fact that a change in the log of a number is a relative change. Symbolically, we have

β2 = ∆Y / (∆X/X)

so that

∆Y = β2 (∆X/X) (4.36)

This equation states that the absolute change in Y (= ∆Y) is equal to the slope times the relative change in X. If the latter is multiplied by 100, then (4.36) gives the absolute change in Y for a percentage change in X.
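The lin-log interpretation can be checked numerically. In this minimal sketch the true slope is set to a hypothetical β2 = 500, so a 1% change in X should change Y by about 0.01 × 500 = 5 units; the data and parameter values are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical lin-log data with a true slope of b2 = 500.
X = rng.uniform(50.0, 500.0, size=300)
Y = 20.0 + 500.0 * np.log(X) + rng.normal(0.0, 5.0, size=300)

# Lin-log model (4.35): Y = b1 + b2 ln X + u, estimated by OLS.
b2, b1 = np.polyfit(np.log(X), Y, 1)

# A 1% change in X (dX/X = 0.01) changes Y by roughly 0.01 * b2 units.
print(f"estimated b2 = {b2:.1f}")
print(f"absolute change in Y per 1% change in X: {0.01 * b2:.2f}")
```

This is the multiplication-by-0.01 rule: the raw slope is in "units of Y per unit change in ln X", so it must be scaled down before being read as a per-percent effect.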
Thus if (∆X/X) changes by 0.01 unit (or 1%), the absolute change in Y is 0.01(β2); if in an application one finds that β2 = 500, the absolute change in Y is (0.01)(500) = 5.0. Therefore, when a regression like (4.35) is estimated by OLS, do not forget to multiply the value of the estimated slope coefficient by 0.01.

4.6 Reciprocal models

Models of the following type are known as reciprocal models:

Yi = β1 + β2 (1/Xi) + ui (4.37)

Although this model is nonlinear in the variable X, because X enters inversely or reciprocally, the model is linear in β1 and β2 and is therefore a linear regression model.

This model has a notable feature: as X increases indefinitely, the term β2(1/Xi) approaches zero (note that β2 is a constant) and Y approaches the limiting or asymptotic value β1. Therefore models like (4.37) have built into them an asymptote or limit value that the dependent variable will take when the value of the X variable increases indefinitely.

We conclude our discussion of reciprocal models by considering the logarithmic reciprocal model, which takes the following form:

ln Yi = β1 − β2 (1/Xi) + ui (4.38)

Such a model may be appropriate for short-run production functions.

4.7 Scaling and units of measurement

Here we consider how the Y and X variables are measured and whether a change in the units of measurement affects the regression results. Let

Yi = β1 + β2 Xi + ui (4.39)

Define

Yi* = w1 Yi (4.40)
Xi* = w2 Xi (4.41)

where w1 and w2 are constants, called the scale factors; w1 may be equal to w2 or may be different. From (4.40) and (4.41) it is clear that Yi* and Xi* are rescaled Yi and Xi. Thus if Yi and Xi are measured in billions of dollars and one wants to express them in millions of dollars, we will have Yi* = 1000 Yi and Xi* = 1000 Xi; here w1 = w2 = 1000. Now consider the regression using the Yi* and Xi* variables:

Yi* = β1* + β2* Xi* + ui* (4.42)

where Yi* = w1 Yi, Xi* = w2 Xi, and ui* = w1 ui (why?).
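The "(why?)" can be checked numerically: rescaling Y by w1 and X by w2 and re-running OLS leaves the fitted values rescaled by w1, so the residuals of the rescaled regression are exactly w1 times the original residuals. A minimal sketch with hypothetical data and arbitrary scale factors:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data, say measured in billions of dollars.
X = rng.uniform(1.0, 5.0, size=100)
Y = 2.0 + 0.8 * X + rng.normal(0.0, 0.2, size=100)

def ols_resid(x, y):
    """Residuals from a simple two-variable OLS regression of y on x."""
    b2, b1 = np.polyfit(x, y, 1)
    return y - (b1 + b2 * x)

w1, w2 = 1000.0, 100.0              # rescale Y and X by different factors
u_hat  = ols_resid(X, Y)            # residuals from (4.39)
u_star = ols_resid(w2 * X, w1 * Y)  # residuals from (4.42)

# The rescaled regression's residuals equal w1 times the original ones.
print(np.allclose(u_star, w1 * u_hat))   # prints True
```

This holds up to floating-point error because OLS estimates transform exactly under linear rescaling of the data, which is what the relationships derived next make precise.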
We want to find out the relationships between the following pairs:

1. β̂1 and β̂1*
2. β̂2 and β̂2*
3. var(β̂1) and var(β̂1*)
4. var(β̂2) and var(β̂2*)
5. σ̂² and σ̂*²
6. r²xy and r²x*y*

From least squares theory we know that

β̂1 = Ȳ − β̂2 X̄
β̂2 = ∑xi yi / ∑xi²
var(β̂1) = σ̂² ∑Xi² / (n ∑xi²)
var(β̂2) = σ̂² / ∑xi²
σ̂² = ∑ûi² / (n − 2)

where lowercase letters denote deviations from sample means. Applying OLS to (4.42), we obtain similarly

β̂1* = Ȳ* − β̂2* X̄*
β̂2* = ∑xi* yi* / ∑xi*²
var(β̂1*) = σ̂*² ∑Xi*² / (n ∑xi*²)
var(β̂2*) = σ̂*² / ∑xi*²
σ̂*² = ∑ûi*² / (n − 2)

From these results it is easy to establish relationships between the two sets of parameter estimates. All that one has to do is recall these definitional relationships: Yi* = w1 Yi (or yi* = w1 yi); Xi* = w2 Xi (or xi* = w2 xi); ui* = w1 ui; Ȳ* = w1 Ȳ and X̄* = w2 X̄. Making use of these definitions, the reader can easily verify that

β̂1* = w1 β̂1 (4.43)
β̂2* = (w1/w2) β̂2 (4.44)
σ̂*² = w1² σ̂² (4.45)
var(β̂1*) = w1² var(β̂1) (4.46)
var(β̂2*) = (w1/w2)² var(β̂2) (4.47)
r²xy = r²x*y* (4.48)

From the preceding results it should be clear that, given the regression results based on one scale of measurement, one can derive the results based on another scale of measurement once the scaling factors, the w's, are known. In practice, though, one should choose the units of measurement sensibly; there is little point in carrying all those zeros when expressing numbers in millions or billions of dollars.

From the results given in (4.43) through (4.48) one can easily derive some special cases. For instance, if w1 = w2, that is, the scaling factors are identical, the slope coefficient and its standard error remain unaffected in going from the (Yi, Xi) to the (Yi*, Xi*) scale, which should be intuitively clear. However, the intercept and its standard error are both multiplied by w1. But if the X scale is not changed (i.e. w2 = 1) and the Y scale is changed by the factor w1, the slope as well as the intercept coefficients and their respective standard errors are all multiplied by the same w1 factor. Finally, if the Y scale remains unchanged (i.e.
w1 = 1) but the X scale is changed by the factor w2, the slope coefficient and its standard error are multiplied by the factor (1/w2), but the intercept coefficient and its standard error remain unaffected.

It should be noted that the transformation from the (Yi, Xi) to the (Yi*, Xi*) scale does not affect the properties of the OLS estimators.

4.8 Regression on standardised variables

A variable is said to be standardised if we subtract the mean value of the variable from its individual values and divide the difference by the standard deviation of that variable. Thus, in the regression of Y on X, if we redefine these variables as

Yi* = (Yi − Ȳ) / SY (4.49)
Xi* = (Xi − X̄) / SX (4.50)

where Ȳ = sample mean of Y, SY = standard deviation of Y, X̄ = sample mean of X, and SX = standard deviation of X, the variables Yi* and Xi* are called standardised variables. An interesting property of a standardised variable is that its mean value is always zero and its standard deviation is always one. As a result, it does not matter in what units the regressand and regressor are measured. Therefore, instead of running the standard bivariate regression

Yi = β1 + β2 Xi + ui (4.51)

we could run the regression on the standardised variables as

Yi* = β1* + β2* Xi* + ui* (4.52)
    = β2* Xi* + ui* (4.53)

since it is easy to show that, in a regression involving a standardised regressand and regressor, the intercept term is always zero. The regression coefficients of the standardised variables, denoted by β1* and β2*, are known as the beta coefficients. Incidentally, notice that (4.53) is a regression-through-the-origin model.

How do we interpret the beta coefficients? The interpretation is that if the standardised regressor increases by one standard deviation, then on average the standardised regressand increases by β2* standard deviation units. Thus, unlike in the traditional model, we measure the effect not in terms of the original units in which Y and X are expressed, but in standard deviation units.
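The properties of the standardised regression can be verified numerically: the intercept comes out as (essentially) zero, and in the two-variable case the beta coefficient coincides with the correlation between X and Y. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data for the regression of Y on X.
X = rng.normal(10.0, 2.0, size=200)
Y = 3.0 + 1.5 * X + rng.normal(0.0, 1.0, size=200)

# Standardise each variable: subtract its mean, divide by its sample std (4.49)-(4.50).
Xs = (X - X.mean()) / X.std(ddof=1)
Ys = (Y - Y.mean()) / Y.std(ddof=1)

# OLS on the standardised variables, as in (4.52).
beta2, beta1 = np.polyfit(Xs, Ys, 1)

print(f"intercept        = {beta1:.6f}")   # essentially zero, as the text states
print(f"beta coefficient = {beta2:.4f}")   # equals the correlation between X and Y
```

The equality of the beta coefficient and the correlation coefficient holds because both standardised variables have unit variance, so the slope formula ∑xi* yi* / ∑xi*² reduces to the Pearson correlation.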