A CONCISE GUIDE TO THE SAS STATISTICAL PACKAGE
VERSION 9.3 and 9.4

Professor Thornton
Economics 415/514 Econometrics

INTRODUCTION

This guide provides an overview of the SAS statistical package, version 9.4, and explains a number of useful SAS commands and capabilities. It does not explain all SAS commands and capabilities. SAS is an extremely powerful statistical package. If you want to learn more about what it can do, consult the appropriate SAS user's manual or purchase one of the many SAS companion books available in bookstores, which explain various facets of the SAS system in more detail.

DATA SETS

In this guide, SAS commands are explained in the context of examples. The examples are based on the following three data sets. It is assumed that each data set is contained in a file on a memory stick in drive E. If your data files are on drive C, or on a memory stick in a different drive such as drive F, modify the examples accordingly (e.g., replace the letter E with the letter C or F). Using a memory stick is recommended.

WAGEDATA

The data file WAGEDATA consists of a cross-section of 49 workers. The variables are:

    WAGE = monthly wage
    EDUC = years of education beyond the eighth grade
    EXPER = years of experience
    AGE = age of worker
    GENDER = indicator variable for gender (1 if male, 0 if female)
    RACE = indicator variable for race (1 if white, 0 if nonwhite)
    CLERICAL = indicator variable for clerical worker (1 if clerical worker, 0 otherwise)
    MAINT = indicator variable for maintenance worker (1 if maintenance worker, 0 otherwise)
    CRAFTS = indicator variable for crafts worker (1 if crafts worker, 0 otherwise)

CPS85

The data file CPS85 consists of 526 randomly selected employed workers from the May 1985 Current Population Survey conducted by the Department of Commerce.
This is a survey of over 50,000 households conducted monthly, and it serves as the basis for the national employment and unemployment statistics. The variables are:

    ED = years of education
    SOUTH = dummy variable (1 if worker lives in south, 0 otherwise)
    NONWH = dummy variable (1 if worker is nonwhite, 0 otherwise)
    HISP = dummy variable (1 if worker is Hispanic, 0 otherwise)
    FE = dummy variable (1 if worker is female, 0 otherwise)
    MARR = dummy variable (1 if worker is married with spouse present in household, 0 otherwise)
    MARRFE = dummy variable (1 if worker is married female with spouse present in household, 0 otherwise)
    EX = years of labor market experience
    UNION = dummy variable (1 if worker has union job, 0 otherwise)
    WAGE = average hourly earnings in constant 2003 dollars
    AGE = age in years
    MANUF = dummy variable (1 if worker works in manufacturing industry, 0 otherwise)
    CONSTR = dummy variable (1 if worker works in construction industry, 0 otherwise)
    MANAG = dummy variable (1 if worker is managerial or administrative, 0 otherwise)
    SALES = dummy variable (1 if worker is in sales, 0 otherwise)
    CLER = dummy variable (1 if worker is clerical worker, 0 otherwise)
    SERV = dummy variable (1 if worker is a service worker, 0 otherwise)
    PROF = dummy variable (1 if worker is professional or technical, 0 otherwise)

MACROCON

The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The variables are:

    YEAR = year
    CONS = annual consumption spending in billions of dollars
    DISINC = annual disposable income in billions of dollars
    PRICE = consumer price index
    PRIME = the prime interest rate
    UN = unemployment rate

BACKGROUND INFORMATION

SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS allows you to read data in a variety of formats, transform the data for statistical analysis, analyze the data, and present the results.
A SAS program has two major components: Data Steps and Procedures. The data step allows you to read SAS datasets or raw data, perform transformations on the data, create new variables, and recode existing variables. The data step is the component of the program that creates SAS datasets. The procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and procedures are composed of one or more statements. A statement is usually identified by a keyword that suggests the statement's function (e.g., REG, MEANS, RUN). Every statement ends with a semicolon.

EXECUTING A SAS PROGRAM

A SAS program can be executed in different ways. The two most important ways are batch mode and interactive windows mode.

In batch mode, you use a text editor (such as Microsoft WordPad) to write a SAS program in an input file saved as a text document (.txt). You then tell SAS to execute the program in the input file and place the resulting output in an output file. You then use a text editor to view the output file.

In interactive windows mode, you type SAS statements in a Program Editor window. When SAS statements are executed, the output is displayed in an Output window. A Log window is also displayed that contains the log for any SAS statements that are executed. The Log window is very useful when writing SAS programs. The log is displayed whether the program works or not. It repeats the SAS statements that are executed, documents any SAS datasets that are created, and gives you warnings about potential problems with your program and error messages for mistakes such as incorrect syntax.

This guide explains how to create and execute SAS programs in interactive windows mode using the Program Editor.

CREATING A SAS DATASET

The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can be used to read raw data into a SAS dataset. This process is called importing.
The raw data used to create a SAS dataset can be in a number of different formats and locations. This guide explains how to import an Excel file, create a temporary SAS dataset, create a permanent SAS dataset that you can save for future use in a SAS library, and access a SAS dataset stored in the library.

Example #1

The Excel file WAGEDATA has 49 observations on 9 variables. The names of the variables are WAGE, EDUC, EXPER, AGE, GENDER, RACE, CLERICAL, MAINT, CRAFTS. You want to create a temporary SAS dataset named EARNINGS, and a SAS library named ECON415. You then want to save the temporary dataset EARNINGS as a permanent SAS dataset also named EARNINGS.

If you are using SAS 9.3, you can directly import an Excel file. If you are using SAS 9.4, you must first save the Excel file as a CSV (Comma delimited) file. To use Excel to save the Excel file as a CSV file, do the following. In Excel, on the menu bar in the upper left-hand corner, click File. Click Save As. In the Save as type box, scroll down the list of file types and click CSV (Comma delimited). Click Save.

Now launch SAS. On the menu bar in the upper left-hand corner, click File. Click Import Data… Under Select a data source from the list below, Microsoft Excel Workbook should appear; if not, find it in the list of choices. Click Next.

In the dialogue box next to Workbook, enter the name and location of the file you want to import. In this example: E:\wagedata.

In the dialogue box that appears, SAS asks What table do you want to import? This is the name of the worksheet in the Excel file you are importing. In the Excel file WAGEDATA, there is only one worksheet, named data. This should already appear in the box. If not, select it. Click Next.

In the dialogue box that appears, enter the name you want to give to the temporary SAS dataset you are creating. Enter the name earnings. Click Finish.
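The same import can also be written as a short program in the Program Editor instead of using the menus. The following is a minimal sketch for the CSV route. It assumes the CSV copy was saved as e:\wagedata.csv with the variable names in the first row; that file name is an assumption, so adjust the path to match your own file.

```sas
/* Sketch: read the comma-delimited copy of WAGEDATA into the      */
/* temporary dataset EARNINGS (stored in the WORK library).        */
/* Assumption: the file is e:\wagedata.csv; change the path as     */
/* needed for your drive and file name.                            */
PROC IMPORT DATAFILE="e:\wagedata.csv"
    OUT=work.earnings
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;   /* take variable names from the first row */
RUN;

/* List the variables and the number of observations to check the import */
PROC CONTENTS DATA=work.earnings;
RUN;
```

If the import worked, the PROC CONTENTS output should report 49 observations and the 9 variables listed above.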
To verify that you have successfully created a temporary SAS dataset named EARNINGS, click the Explorer button on the tool bar. Click the Work icon, and then the earnings icon.

To create a new SAS library named ECON415, on the menu bar in the upper left-hand corner click Tools. Click New Library. In the dialogue box next to Name, enter ECON415. In the box next to Path, enter E:\. Click OK.

Click the Explorer button on the tool bar. Click Work. Use the mouse to drag the file named EARNINGS from the folder named Work to the folder named Econ415. To verify that you have successfully created a permanent SAS dataset named EARNINGS, click the folder Econ415.

ACCESSING A PERMANENT SAS DATASET

The following examples explain how to load a permanent SAS dataset that you have created and create new temporary or permanent SAS datasets from it.

Example #2

You want to access the dataset named EARNINGS, which is stored in the library named ECON415 on a disk in drive E. You want to create a temporary SAS dataset named EARN1. In the Program Editor window, type the following statements.

    LIBNAME econ415 'e:';
    DATA earn1;
    SET econ415.earnings;
    RUN;

The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to access the permanent SAS dataset named EARNINGS that is located in the library named ECON415.

To verify that you have accessed EARNINGS and created EARN1, click the Libraries icon in the Explorer window. There is now an icon for ECON415. If you click ECON415, you will see an EARNINGS icon. If you click the Work icon, you will see an icon for the temporary dataset EARN1. Note that when you end your session, the temporary dataset EARN1 will be deleted.
If you want to store this new dataset permanently in the library named ECON415, replace the DATA statement above with the following DATA statement.

    DATA econ415.earn1;

If you want to store all changes made in the current session in the permanent SAS dataset named EARNINGS, replace the DATA statement above with the following DATA statement.

    DATA econ415.earnings;

In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS dataset EARNINGS with any changes that you make to the data during the current session.

CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS

Assignment statements and logical expressions can be used for many purposes, such as creating new variables from existing variables, recoding variables, and deleting observations from the current sample. Each of these is explained below.

ASSIGNMENT STATEMENTS

Assignment statements allow you to create new variables from existing variables. Assignment statements use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition), - (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and division, then addition and subtraction. The function for the natural logarithm is LOG.

Example #3

You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains all the variables in EARNINGS plus additional variables that you want to create.

    LIBNAME econ415 'e:';
    DATA earn1;
    SET econ415.earnings;
    logwage = log(wage);
    yearwage = wage*12;
    daywage = wage / 30;
    agesq = age**2;
    agecub = age**3;
    toteduc = educ + 8;
    RUN;

SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.

LOGICAL EXPRESSIONS

Logical expressions use conditional IF, THEN, ELSE statements, and comparison and logical operators.
The comparison operators are:

    Meaning                     Symbol    Mnemonic
    Equal to                    =         eq
    Greater than                >         gt
    Less than                   <         lt
    Greater than or equal to    >=        ge
    Less than or equal to       <=        le
    Not equal to                ^=        ne
    In                                    in
    Not in                                not in

The logical operators are:

    Meaning    Symbol    Mnemonic
    And        &         and
    Or         |         or

In the following example, a description of each logical expression and its use is given directly below the expression for ease of reference.

Example #4

You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new variables, recode existing variables, and delete observations from the sample to construct EARN1.

Commands

    LIBNAME econ415 'e:';
    DATA earn1;
    SET econ415.earnings;

This accesses the permanent SAS dataset named EARNINGS from the library named ECON415, and creates the temporary SAS dataset named EARN1.

    IF educ > 4 THEN college = 1;
    ELSE college = 0;

This creates a dummy variable named college that can take two values: 1 or 0. The IF THEN statement assigns a value of 1 to the variable college if the variable educ is greater than 4. The ELSE statement assigns a value of 0 to the variable college for all observations that do not have a value of 1.

    IF age > 50 THEN newage = 2;
    ELSE IF age > 25 THEN newage = 1;
    ELSE newage = 0;

This creates a multinomial variable called newage that can take three values: 2, 1, or 0. The IF THEN statement assigns a value of 2 to the variable newage if the variable age is greater than 50. The ELSE IF THEN statement assigns a value of 1 to the variable newage if the variable age is greater than 25 and less than or equal to 50. The ELSE statement assigns a value of 0 to the variable newage for all observations that do not have a value of 2 or 1. Note that only one ELSE statement is allowed per IF THEN statement.

    LENGTH sex $ 6;
    IF gender = 1 THEN sex = 'male';
    ELSE sex = 'female';

This creates a character variable named sex that can take two names: male or female. The LENGTH statement reserves six characters for sex; without it, SAS would set the length from the first value assigned (male, four characters) and truncate female to fema. The IF THEN statement assigns the name male to the variable sex if the variable gender is equal to 1.
The ELSE statement assigns the name female to the variable sex for all observations that do not have the name male.

    IF wage > 1300;

This keeps any observation for which the variable wage is greater than 1300. It deletes all observations for which wage is 1300 or less.

    IF exper = 1 THEN delete;

This deletes any observation for which the variable exper is equal to 1.

    IF exper = 3 and gender = 1 THEN delete;

This deletes any observation for which both the variable exper is equal to 3 and the variable gender is equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.

    IF educ = 11 or age >= 57 THEN delete;

This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater than or equal to 57.

    IF wage = . THEN delete;

SAS represents a missing value with a period (.). This deletes any observation for which the variable wage has a missing value.

    IF age = . THEN age = 65;

This assigns the value of 65 to the variable age for any observation for which age is missing.

    RUN;

DELETING VARIABLES FROM A SAS DATASET

Example #5

You want to create two new permanent SAS datasets from the permanent SAS dataset named EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain the variables WAGE, EDUC.

    LIBNAME econ415 'e:';
    DATA econ415.earnsub1;
    SET econ415.earnings;
    KEEP wage educ exper age;
    DATA econ415.earnsub2;
    SET econ415.earnsub1;
    KEEP wage educ;
    RUN;

An alternative program that accomplishes the same task replaces the first KEEP statement with the first DROP statement below, and the second KEEP statement with the second DROP statement.

    DROP gender race clerical maint crafts;
    DROP exper age;

The LIBNAME statement tells SAS to access and/or store permanent SAS datasets in the library named ECON415, which is located on the disk in drive E.
The first DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB1 and store it in the library named ECON415. The first SET statement tells SAS to access the permanent SAS dataset named EARNINGS located in the library named ECON415. The KEEP statement tells SAS to include the variables WAGE, EDUC, EXPER, AGE from the dataset EARNINGS in the dataset EARNSUB1 (equivalently, to exclude the variables GENDER, RACE, CLERICAL, MAINT, CRAFTS). Alternatively, the DROP statement tells SAS to exclude the variables GENDER, RACE, CLERICAL, MAINT, CRAFTS from the dataset EARNSUB1 (equivalently, to include the variables WAGE, EDUC, EXPER, AGE).

The second DATA statement tells SAS to create a new permanent SAS dataset named EARNSUB2 and store it in the library named ECON415. The second SET statement tells SAS to access the permanent SAS dataset named EARNSUB1 located in the library named ECON415. The KEEP statement tells SAS to include the variables WAGE and EDUC from the dataset EARNSUB1 in the dataset EARNSUB2. Alternatively, the DROP statement tells SAS to exclude the variables EXPER and AGE from the dataset EARNSUB2.

DISPLAYING A SAS DATASET

Example #6

You want to display the data in the permanent SAS dataset named EARNINGS.

    LIBNAME econ415 'e:';
    DATA earn1;
    SET econ415.earnings;
    PROC PRINT data=earn1;
    RUN;

The temporary SAS dataset EARN1, which contains the data from the permanent SAS dataset EARNINGS, will be displayed in the Output window.

DESCRIBING AND ANALYZING DATA

Examples #7 through #17 below involve describing and analyzing data. The data are contained in the Excel file CPS85, which is assumed to be located on a disk in drive E. Following the steps in Example #1, create a temporary SAS dataset named CPS85A. Save this temporary SAS dataset as a permanent SAS dataset named CPS85A in the library ECON415.
FREQUENCY DISTRIBUTIONS AND SCATTER DIAGRAMS

Example #7

You want to access the permanent SAS dataset named CPS85A, which is stored in the library named ECON415 on drive E. You want to display an absolute frequency distribution for the variables WAGE and ED, a relative frequency distribution for the variables WAGE and ED, and a scatter diagram for the variables WAGE and ED.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC UNIVARIATE NOPRINT;
    VAR wage ed;
    HISTOGRAM wage ed;
    PROC SGPLOT;
    SCATTER x = ed y = wage;
    RUN;

The LIBNAME, DATA and SET statements access the permanent SAS dataset named CPS85A and create the temporary SAS dataset named CPS85B. Note that this temporary dataset will be deleted when your session ends. The PROC UNIVARIATE statement and the option NOPRINT tell SAS to obtain the information required to construct a histogram and to suppress the printed tables of descriptive statistics. The VAR statement tells SAS to obtain the information for the variables WAGE and ED. The HISTOGRAM statement tells SAS to construct histograms for the variables WAGE and ED. The PROC SGPLOT statement tells SAS to construct a graph that plots data points. The SCATTER statement tells SAS to construct a scatter diagram. The x = ed tells SAS to measure the variable ED on the horizontal axis. The y = wage tells SAS to measure the variable WAGE on the vertical axis.

DESCRIPTIVE STATISTICS

Example #8

You want to access the permanent SAS dataset named CPS85A, which is stored in the library named ECON415 on a disk in drive E. You want to calculate the mean, variance, standard deviation, and coefficient of variation for the variables WAGE, ED, EX, FE, AGE, UNION. You also want to calculate the covariances and correlation coefficients for these variables.
    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC MEANS mean var std cv max min;
    VAR wage ed ex fe age union;
    PROC CORR COV;
    VAR wage ed ex fe age union;
    RUN;

The LIBNAME, DATA and SET statements access the permanent SAS dataset named CPS85A and create the temporary SAS dataset named CPS85B. The PROC MEANS statement and the options MEAN, VAR, STD, CV, MAX, MIN tell SAS to calculate the mean, variance, standard deviation, coefficient of variation, and maximum and minimum values. The VAR statement tells SAS to calculate these statistics for the variables WAGE, ED, EX, FE, AGE, and UNION only. If you omit the VAR statement, then SAS will calculate descriptive statistics for all numeric variables in the dataset. The PROC CORR COV statement tells SAS to calculate the correlation matrix and covariance matrix. The VAR statement tells SAS to calculate the correlation coefficients and covariances for the variables WAGE, ED, EX, FE, AGE, and UNION only.

If you want SAS to provide a full range of descriptive statistics, you can replace the PROC MEANS statement with the following statement.

    PROC UNIVARIATE;

SAS will provide a large number of different types of descriptive statistics for the variables WAGE, ED, EX, FE, AGE, UNION.

LINEAR REGRESSION

Example #9

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED. You also want to print the variance-covariance matrix for the parameter estimates.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed / covb;
    RUN;

The PROC REG statement tells SAS to run a linear regression using the OLS estimator. The MODEL statement tells SAS the dependent variable, the explanatory variable(s), and any optional output to print. The dependent variable is on the left-hand side of the equal sign and the explanatory variable(s) are on the right-hand side. The forward slash (/) separates the regression equation from the options.
The option covb tells SAS to display the variance-covariance matrix of the estimates in the Output window along with the standard regression results. If you do not give SAS any options, then you do not have to include the forward slash.

Example #10

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You also want to print the variance-covariance matrix for the parameter estimates.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed ex fe / covb;
    RUN;

This program is the same as the program for Example #9, except that two additional explanatory variables, EX and FE, are included in the MODEL statement.

Example #11

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You want to test the following hypotheses.

1) Education and experience have no joint effect on wage; that is, the coefficient of ED and the coefficient of EX are jointly equal to zero.
2) The marginal effects of ED and EX are equal; that is, the coefficients of ED and EX are equal.
3) The sum of the marginal effects of ED and EX is equal to 2; that is, the sum of the coefficients of ED and EX is 2.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed ex fe;
    TEST ed = 0, ex = 0;
    TEST ed = ex;
    TEST ed + ex = 2;
    RUN;

Note that one or more TEST statements can follow a MODEL statement. Because we are testing three different hypotheses for the same regression model, we have three TEST statements that follow the MODEL statement. Note that when you are testing a joint hypothesis (i.e., two or more restrictions jointly), you separate the equations that define the restrictions with commas in the TEST statement.

Example #12

You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE, and impose the restriction that the coefficients of ED and EX are equal.
Thus, your objective is to estimate a restricted model that imposes a restriction on the model parameters.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed ex fe;
    RESTRICT ed = ex;
    RUN;

The RESTRICT statement tells SAS to impose a restriction on the parameters of the statistical model. The restriction that you want to impose is given by the equation after the RESTRICT keyword. Note that the format of the RESTRICT statement is identical to the format of the TEST statement. SAS will display the parameter estimates for the restricted model in the Output window. In addition, it provides an estimate for a parameter called RESTRICT. This is the estimate of a Lagrange parameter that is introduced during the estimation process. If the coefficient of RESTRICT is zero, then the restricted and unrestricted estimates are not significantly different, which means that the restriction has no effect. In this example, a t-test rejects the null hypothesis that the coefficient of RESTRICT is zero. This indicates that imposing the restriction is not valid, and therefore that the estimate of the marginal effect of education is significantly different from the estimate of the marginal effect of work experience.

Example #13

You want to use the SAS dataset named CPS85A to run a linear regression of WAGE on ED, EX and FE. You want to check for multicollinearity among the explanatory variables. To do this, you want to run a regression of each explanatory variable on all remaining explanatory variables so you can calculate variance inflation factors. You also want to calculate the correlation coefficients for the explanatory variables.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed ex fe;
    MODEL ed = ex fe;
    MODEL ex = ed fe;
    MODEL fe = ed ex;
    PROC CORR;
    VAR ed ex fe;
    RUN;

You can use the R² statistic for the last three models to calculate variance inflation factors for ED, EX and FE.
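The calculation of a variance inflation factor from an auxiliary regression can be written out as follows. The notation is introduced here for illustration: R_j² denotes the R² from the regression of explanatory variable j on the remaining explanatory variables, and the numerical values are hypothetical, not results from CPS85A.

```latex
% Variance inflation factor for explanatory variable j
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}

% Hypothetical illustration (values are not taken from CPS85A)
R_j^2 = 0.80 \;\Rightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.80} = 5
\qquad
R_j^2 = 0.90 \;\Rightarrow\; \mathrm{VIF}_j = \frac{1}{1 - 0.90} = 10
```

A VIF of 1 means the variable is uncorrelated with the other explanatory variables; the closer R_j² is to 1, the larger the VIF.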
You can check the correlation matrix for high correlation coefficients between the explanatory variables. Note that SAS will display certain multicollinearity diagnostics, such as eigenvalues and condition indexes, if you use the following MODEL statement.

    MODEL wage = ed ex fe / collin;

Example #14

You want to use the SAS dataset CPS85A to estimate a varying slope parameter model where WAGE depends upon ED, EX, FE, and the interaction variable EDFE, which is the product of ED and FE. This interaction variable allows the coefficient of ED to depend upon FE.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    edfe = ed*fe;
    PROC REG;
    MODEL wage = ed ex fe edfe;
    RUN;

Note that to estimate this model, you must first create an interaction term for ED and FE.

Example #15

You want to use the SAS dataset CPS85A to estimate a log-linear functional form, where the logarithm of WAGE depends upon ED, EX, FE.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    logwage = log(wage);
    PROC REG;
    MODEL logwage = ed ex fe;
    RUN;

Note that to estimate this model, you must first create a new variable named LOGWAGE, which is the natural logarithm of the variable WAGE.

Example #16

You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You then want to estimate this model using the FGLS estimator (weighted least squares), assuming that the variance of the error term is a linear function of ED.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC REG;
    MODEL wage = ed ex fe;
    OUTPUT out=cps85b residual=resid;
    DATA cps85c;
    SET cps85b;
    residsq = resid**2;
    PROC REG;
    MODEL residsq = ed;
    OUTPUT out=cps85c predicted=varhat;
    DATA cps85d;
    SET cps85c;
    IF varhat <= 0 THEN varhat = residsq;
    sdhat = sqrt(varhat);
    w = 1/(sdhat**2);
    PROC REG;
    MODEL wage = ed ex fe;
    WEIGHT w;
    RUN;

In this program we use three DATA statements to create three temporary SAS datasets. The weight W is the reciprocal of the estimated error variance (the square of SDHAT), because the WEIGHT statement in PROC REG expects weights proportional to the reciprocal of the error variance, not of the standard deviation.
The OUTPUT statement that follows the MODEL statement for the regression of RESIDSQ on ED tells SAS to save the predicted values of RESIDSQ from this regression as the variable named VARHAT (predicted=varhat), and include this variable in the temporary SAS dataset named CPS85C (out=cps85c). The conditional IF THEN statement tells SAS to replace any value of the variable VARHAT that is negative or zero with the value of the variable RESIDSQ. We must do this because we cannot take the square root of a negative number or divide by zero. The function SQRT tells SAS to find the square root of the variable VARHAT. The WEIGHT statement that follows the last MODEL statement tells SAS to run a weighted least squares regression using the variable W as the weight. This is the FGLS estimator.

Example #17

You want to use the SAS dataset CPS85A to run an instrumental variables regression of WAGE on ED, EX, and FE using the two-stage least squares estimator. You assume ED is the endogenous explanatory variable. The instrumental variables are NONWH and MARR. You also want to calculate the F-statistic for the null hypothesis that NONWH and MARR have no joint effect on ED in the first-stage regression, to check the strength (relevance) of the instrumental variables NONWH and MARR.

    LIBNAME econ415 'e:';
    DATA cps85b;
    SET econ415.cps85a;
    PROC SYSLIN 2sls;
    ENDOGENOUS ed;
    INSTRUMENTS nonwh marr;
    MODEL wage = ed ex fe;
    RUN;
    DATA cps85c;
    SET econ415.cps85a;
    PROC REG;
    MODEL ed = ex fe nonwh marr;
    TEST nonwh = 0, marr = 0;
    RUN;

The PROC SYSLIN statement tells SAS that you are going to estimate at least one equation in a system of linear equations. The option 2SLS tells SAS to estimate the equation(s) using the two-stage least squares estimator. The ENDOGENOUS statement tells SAS the endogenous variable(s). The INSTRUMENTS statement tells SAS the variables that you will use as instrumental variables. The MODEL statement tells SAS the equation to estimate.
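The two stages behind this program can be written out as equations. This is a sketch in notation introduced here for illustration (the coefficient symbols π, β and the error terms v, u do not appear in the guide); the first-stage equation matches the regression MODEL ed = ex fe nonwh marr in the program.

```latex
% First stage: regress the endogenous variable on the exogenous
% explanatory variables and the instruments
ED_i = \pi_0 + \pi_1 EX_i + \pi_2 FE_i + \pi_3 NONWH_i + \pi_4 MARR_i + v_i

% Second stage: replace ED with its first-stage fitted value
WAGE_i = \beta_0 + \beta_1 \widehat{ED}_i + \beta_2 EX_i + \beta_3 FE_i + u_i

% Instrument relevance: null hypothesis tested by TEST nonwh = 0, marr = 0
H_0:\ \pi_3 = \pi_4 = 0
```

A large F-statistic for this null hypothesis suggests that the instruments are relevant. A commonly cited rule of thumb, which is not part of this guide, treats a first-stage F-statistic below about 10 as a sign of weak instruments.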
The second set of commands, beginning with the second DATA statement (DATA cps85c) and ending with the second RUN statement, contains the commands for the first-stage regression of ED on EX, FE, NONWH, MARR, and the calculation of the F-statistic that is used to check instrument strength or relevance.

Examples #18 and #19 use the data contained in the Excel file MACROCON, which is assumed to be located on a disk in drive E. Create a temporary SAS dataset named MACROCON. Save this temporary SAS dataset as a permanent SAS dataset named MACROCON in the library ECON415.

Example #18

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. Real consumption expenditures is defined as CONS divided by PRICE, with the appropriate adjustment for the decimal point. Real disposable income is defined as DISINC divided by PRICE, with the appropriate adjustment for the decimal point. You want to do a Lagrange multiplier test for second-order autocorrelation.

    LIBNAME econ415 'e:';
    DATA con1;
    SET econ415.macrocon;
    rcons = cons/(price/100);
    rdisinc = disinc/(price/100);
    PROC REG;
    MODEL rcons = rdisinc prime;
    OUTPUT out=con1 residual=resid;
    DATA con2;
    SET con1;
    resid1 = lag1(resid);
    resid2 = lag2(resid);
    PROC REG;
    MODEL resid = rdisinc prime resid1 resid2;
    RUN;

The assignment statements for RCONS and RDISINC tell SAS to create the new variables RCONS and RDISINC and save them in the temporary SAS dataset named CON1. The OUTPUT statement that follows the MODEL statement for the regression of RCONS on RDISINC and PRIME tells SAS to save the residuals from this regression as the variable named RESID, and include the variable named RESID in the temporary SAS dataset named CON1. The second DATA statement tells SAS to create a second temporary SAS dataset named CON2.
The SET statement tells SAS to include all of the variables in the temporary SAS dataset CON1 in the temporary SAS dataset named CON2. The assignment statement RESID1 = LAG1(RESID) tells SAS to create a new variable named RESID1 that is equal to the variable RESID lagged one period. The assignment statement RESID2 = LAG2(RESID) tells SAS to create a new variable named RESID2 that is equal to the variable RESID lagged two periods. The variables RESID1 and RESID2 are saved in the temporary SAS dataset CON2.

To calculate the Lagrange multiplier test statistic, take the unadjusted R² statistic from the regression of RESID on RDISINC, PRIME, RESID1, and RESID2 (R² = 0.31) and multiply it by the sample size (n = 35). Note that you lose two observations when running this regression because you have a variable that is lagged two periods. For this example, the Lagrange multiplier test statistic is LM = (0.31)(35) = 10.8.

Example #19

You want to use the SAS dataset named MACROCON to run a linear regression of real consumption expenditures (RCONS) on real disposable income (RDISINC) and PRIME. You want to estimate this model using the FGLS Cochrane-Orcutt estimator to correct for first-order autocorrelation.

    LIBNAME econ415 'e:';
    DATA con1;
    SET econ415.macrocon;
    rcons = cons/(price/100);
    rdisinc = disinc/(price/100);
    PROC AUTOREG itprint;
    MODEL rcons = rdisinc prime / nlag=1 iter converge=0.0001;
    RUN;

The PROC AUTOREG statement tells SAS to run a linear regression and correct for autocorrelation. The option ITPRINT tells SAS to print out each iteration that SAS performs so you can see how the estimate of the autocorrelation coefficient (ρ) changes. The MODEL statement tells SAS to run a linear regression of RCONS on RDISINC and PRIME. The / tells SAS that options follow. The option NLAG=1 tells SAS to correct for first-order autocorrelation. The ITER option tells SAS to use the Cochrane-Orcutt estimator, which involves doing iterations.
The CONVERGE=0.0001 option tells SAS to stop iterating when the estimates of ρ from two successive iterations differ by no more than 0.0001. If you do not include the CONVERGE option, SAS will use its own default value to decide when convergence is achieved.

It is important to note that SAS will print out the negative of the estimate of the autocorrelation coefficient, ρ. Thus, if SAS prints a negative value, the estimate of ρ is positive, indicating positive autocorrelation. If SAS prints a positive value, the estimate of ρ is negative, indicating negative autocorrelation.
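The sign convention can be made explicit with a short worked example. The symbols below are introduced here for illustration: x₁ and x₂ stand for RDISINC and PRIME, φ₁ stands for the first-order coefficient as PROC AUTOREG reports it, and the printed value of -0.85 is hypothetical, not a result from MACROCON.

```latex
% Regression with first-order autocorrelated errors (Example #19)
y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t,
\qquad u_t = \rho\, u_{t-1} + \varepsilon_t

% PROC AUTOREG writes the error process with the opposite sign
u_t = -\varphi_1 u_{t-1} + \varepsilon_t
\quad\Longrightarrow\quad \hat{\rho} = -\hat{\varphi}_1

% Hypothetical printout: reported coefficient of -0.85
\hat{\rho} = -(-0.85) = 0.85 \quad \text{(positive autocorrelation)}
```

In other words, reverse the sign of the coefficient in the SAS output before interpreting it as the estimate of ρ.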