A CONCISE GUIDE TO THE
SAS STATISTICAL PACKAGE
VERSION 9.3 and 9.4
Professor Thornton
Economics 415/514
Econometrics
INTRODUCTION
This guide provides an overview of the SAS statistical package, versions 9.3 and 9.4, and an explanation of a
number of useful SAS commands and capabilities. It does not explain all SAS commands and
capabilities. SAS is an extremely powerful statistical package. If you want to learn more about what
it can do, consult the appropriate SAS user's manual or one of the many SAS
companion books that explain various facets of
the SAS system in more detail.
DATA SETS
In this guide, SAS commands are explained in the context of examples. The examples are based on the
following three data sets. It is assumed that each data set is contained in a file on a memory stick in drive
E. If your data files are on drive C, or on a memory stick located in a different drive such as drive F,
modify the examples below accordingly (e.g., replace the letter E with the letter C or F). Using a memory
stick is recommended.
WAGEDATA
The data file WAGEDATA consists of a cross-section of 49 workers. The variables are WAGE =
monthly wage, EDUC = years of education beyond the eighth grade, EXPER = years of experience, AGE
= age of worker, GENDER = indicator variable for gender (1 if male, 0 if female), RACE = indicator
variable for race (1 if white, 0 if nonwhite), CLERICAL = indicator variable for clerical worker (1 if
clerical worker, 0 otherwise), MAINT = indicator variable for maintenance worker (1 if maintenance
worker, 0 otherwise), CRAFTS = indicator variable for crafts worker (1 if crafts worker, 0 otherwise).
CPS85
The data file CPS85 consists of 526 randomly selected employed workers from the May 1985 Current
Population Survey conducted by the Department of Commerce. This is a monthly survey of over 50,000
households, and it serves as the basis for the national employment and unemployment
statistics. The variables are ED = years of education, SOUTH = dummy variable (1 if worker lives in the
South, 0 otherwise), NONWH = dummy variable (1 if worker is nonwhite, 0 otherwise), HISP =
dummy variable (1 if worker is Hispanic, 0 otherwise), FE = dummy variable (1 if worker is female, 0
otherwise), MARR = dummy variable (1 if worker is married with spouse present in household, 0
otherwise), MARRFE = dummy variable (1 if worker is married female with spouse present in household,
0 otherwise), EX = years of labor market experience, UNION = dummy variable (1 if worker has union
job, 0 otherwise), WAGE = average hourly earnings in constant 2003 dollars, AGE = age in years,
MANUF = dummy variable (1 if worker works in manufacturing industry, 0 otherwise), CONSTR =
dummy variable (1 if worker works in construction industry, 0 otherwise), MANAG = dummy variable
(1 if worker is managerial or administrative, 0 otherwise), SALES = dummy variable (1 if worker is in
sales, 0 otherwise), CLER = dummy variable (1 if worker is clerical worker, 0 otherwise), SERV =
dummy variable (1 if worker is a service worker, 0 otherwise), PROF = dummy variable (1 if worker is
professional or technical, 0 otherwise).
MACROCON
The data file MACROCON consists of a time series of annual data for the period 1959 to 1995. The
variables are YEAR = year, CONS = annual consumption spending in billions of dollars, DISINC =
annual disposable income in billions of dollars, PRICE = consumer price index, PRIME = the prime
interest rate, UN = unemployment rate.
BACKGROUND INFORMATION
SAS is a statistical software package that can be used to read, manage, analyze, and present data. SAS
allows you to read data in a variety of different formats, transform the data to conduct statistical analyses,
analyze the data, and present the results.
A SAS program has two major components: data steps and procedures. The data step allows you to
read SAS datasets or raw data, perform transformations on the data, create new variables, and recode
existing variables. The data step is the component of the program that creates SAS datasets. The
procedure (usually referred to as PROC) allows you to analyze and present the data. Data steps and
procedures are composed of one or more statements. A statement is usually identified by a keyword
that suggests the statement's function (e.g., DATA, SET, MODEL, RUN). Every statement ends with a
semicolon.
EXECUTING A SAS PROGRAM
A SAS program can be executed in different ways. The two most important ways are batch mode and
interactive windows mode. In batch mode you use a text editor (such as Microsoft WordPad) to write a
SAS program in an input file in a text document (.txt). You then tell SAS to execute the program in the
input file and place the resulting output in an output file. You then use a text editor to view the output
file.
In interactive windows mode, you type SAS statements in a Program Editor window. When SAS
statements are executed the output is displayed in an Output window. A Log window is also displayed
that contains the log for any SAS statements that are executed. The log window is very useful in writing
SAS programs. The log is displayed whether the program works or not. It repeats the SAS statements
that are executed, documents any SAS datasets that are created, gives you warnings about potential
problems with your program, and error messages for mistakes such as incorrect syntax.
This guide explains how to create and execute SAS programs in interactive windows mode using the
Program Editor.
CREATING A SAS DATASET
The first step in SAS programming is to create a SAS dataset. SAS has a large number of tools that can
be used to read raw data into a SAS dataset. This process is called importing. The raw data used to
create a SAS dataset can be in a number of different formats and locations. This guide explains how to
import an Excel file, create a temporary SAS dataset, create a permanent SAS dataset that you can save
for future use in a SAS library, and access a SAS dataset stored in the library.
Example #1
The Excel File WAGEDATA has 49 observations on 9 variables. The names of the variables are WAGE,
EDUC, EXPER, AGE, GENDER, RACE, CLERICAL, MAINT, CRAFTS. You want to create a
temporary SAS dataset named EARNINGS, and a SAS library named ECON415. You then want to save
the temporary dataset EARNINGS as a permanent SAS dataset also named EARNINGS.
If you are using SAS 9.3, you can import an Excel file directly. If you are using SAS 9.4, you must first
save the Excel file as a CSV (Comma delimited) file. To do this in Excel, click File on the menu bar in the
upper left-hand corner, and then click Save As. In the Save as type box, scroll down the list of file types
and click CSV (Comma delimited). Click Save.
Now launch SAS. On the menu bar in the upper left-hand corner click File. Click Import Data…
Under Select a data source from the list below, Microsoft Excel Workbook should appear; if not, find it
in the list of choices. (If you saved the file as a CSV file, choose the comma-separated values entry in the
list instead.) Click Next. In the dialogue box next to Workbook, enter the name and location
of the file you want to import. In this example: E:\wagedata. In the dialogue box that appears, SAS asks
What table do you want to import? This is the name of the worksheet in the Excel file you are
importing. In the Excel file WAGEDATA, there is only one worksheet, named data. This should already
appear in the box. If not, select it. Click Next. In the dialogue box that appears, enter the name you want
to give to the temporary SAS dataset you are creating. Enter the name earnings. Click Finish. To verify
that you have successfully created a temporary SAS dataset named EARNINGS, click the Explorer button
on the tool bar. Click the Work icon, and then the earnings icon.
To create a new SAS library named ECON415, click Tools on the menu bar in the upper left-hand corner.
Click New Library. In the dialogue box next to Name, enter ECON415. In the box next to Path, enter E:\.
Click OK. Click the Explorer button on the tool bar. Click Work. Use the mouse to drag the file named
EARNINGS from the folder named Work to the folder named Econ415. To verify that you have
successfully created a permanent SAS dataset named EARNINGS, click on the folder Econ415.
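The menu steps in Example #1 can also be written as a short program. The following is a minimal sketch, not part of the original example. It assumes the CSV file was saved as E:\wagedata.csv (the file name and extension are assumptions) and that the first row of the file contains the variable names.

```sas
/* Read the comma-delimited file into the temporary dataset WORK.EARNINGS. */
/* The path e:\wagedata.csv is an assumed file name for illustration.      */
PROC IMPORT DATAFILE='e:\wagedata.csv'
            OUT=earnings
            DBMS=CSV
            REPLACE;
    GETNAMES=YES;   /* take the variable names from the first row */
RUN;

/* Assign the library ECON415 to drive E and save a permanent copy. */
LIBNAME econ415 'e:';
DATA econ415.earnings;
    SET earnings;
RUN;
```

After running this sketch, check the Log window to confirm that 49 observations and 9 variables were read.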
ACCESSING A PERMANENT SAS DATASET
The following examples explain how to load a permanent SAS dataset that you have created and create
new temporary or permanent SAS datasets from it.
Example #2
You want to access the dataset named EARNINGS which is stored in the library named ECON415 on a
disk on drive E. You want to create a temporary SAS data set named EARN1.
In the Program Editor window type the following statements.
LIBNAME econ415 'e:';
DATA earn1;
SET econ415.earnings;
RUN;
The LIBNAME statement tells SAS the name of the library and where it is located. The DATA statement
tells SAS to create a temporary SAS dataset named EARN1. The SET statement tells SAS to access the
permanent SAS dataset named EARNINGS that is located in the library named ECON415. To verify that
you have accessed EARNINGS and created EARN1, click the Libraries icon in the Explorer window.
There is now an icon for ECON415. If you click ECON415, you will see an EARNINGS icon. If you
click the Work icon, you will see an icon for the temporary dataset EARN1. Note that when you end
your session, the temporary dataset EARN1 will be deleted. If you want to store this new dataset
permanently in the library named ECON415, then replace the DATA statement above with the following
DATA statement
DATA econ415.earn1;
If you want to store all changes made in the current session in the permanent SAS dataset named
EARNINGS, then replace the DATA statement above with the following DATA statement
DATA econ415.earnings;
In this case, you do not create a temporary SAS dataset. Rather, SAS overwrites the permanent SAS
dataset EARNINGS with any changes that you make to the data during the current session.
CREATING VARIABLES, RECODING VARIABLES, DELETING OBSERVATIONS
Assignment statements and logical expressions can be used for many purposes, such as creating new
variables from existing variables, recoding variables, and deleting observations from the current sample.
Each of these is explained below.
ASSIGNMENT STATEMENTS
Assignment statements allow you to create new variables from existing variables. Assignment statements
use the following arithmetic operators: ** (exponentiation), * (multiplication), / (division), + (addition),
- (subtraction). If parentheses are not used, exponentiation is carried out first, then multiplication and
division (from left to right), and then addition and subtraction (from left to right). The function for
the natural logarithm is LOG.
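The order of operations can be checked with a small test program. The following is a minimal sketch; the variable names a, b, and c are hypothetical. DATA _NULL_ runs the statements without creating a dataset, and PUT writes the results to the Log window.

```sas
DATA _NULL_;
    a = 2 + 3 * 4 ** 2;     /* 4**2 = 16, then 3*16 = 48, then 2 + 48 = 50 */
    b = (2 + 3) * 4 ** 2;   /* parentheses first: 5 * 16 = 80              */
    c = 100 / 5 * 2;        /* left to right: (100/5) * 2 = 40             */
    PUT a= b= c=;           /* writes a=50 b=80 c=40 to the Log window     */
RUN;
```

When in doubt about the order, use parentheses to make the intended calculation explicit.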
Example #3
You want to access the dataset EARNINGS and create a temporary dataset named EARN1 that contains
all the variables in EARNINGS plus additional variables that you want to create.
LIBNAME econ415 'e:';
DATA earn1;
SET econ415.earnings;
logwage = log(wage);
yearwage = wage*12;
daywage = wage / 30;
agesq = age**2;
agecub = age**3;
toteduc = educ + 8;
RUN;
SAS will create the variables logwage, yearwage, daywage, agesq, agecub, and toteduc, and place them in
the temporary dataset EARN1 along with all existing variables in the dataset EARNINGS.
LOGICAL EXPRESSIONS
Logical expressions use conditional IF, THEN, ELSE statements, and comparison and logical operators.
The comparison operators are:
Comparison                  Symbol    Mnemonic
Equal to                    =         eq
Greater than                >         gt
Less than                   <         lt
Greater than or equal to    >=        ge
Less than or equal to       <=        le
Not equal to                ^=        ne
In                                    in
Not in                                not in
The logical operators are:
Operator    Symbol    Mnemonic
And         &         and
Or          |         or
In the following example, a description of each logical expression and its use is given directly below the
expression for ease of reference.
Example #4
You want to access the dataset EARNINGS, create a temporary dataset named EARN1, and create new
variables, recode existing variables, and delete observations from the sample to construct EARN1.
Commands
LIBNAME econ415 'e:';
DATA earn1;
SET econ415.earnings;
This accesses the permanent SAS dataset named EARNINGS from the library named ECON415, and
creates the temporary SAS dataset named EARN1.
IF educ > 4 THEN college = 1;
ELSE college = 0;
This creates a dummy variable named college that can take two values: 1 or 0. The IF THEN statement
assigns a value of 1 to the variable college if the variable educ is greater than 4. The ELSE statement
assigns a value of 0 to the variable college for all observations that do not have a value of one.
IF age > 50 THEN newage = 2;
ELSE IF age > 25 THEN newage = 1;
ELSE newage = 0;
This creates a multinomial variable called newage that can take three values: 2, 1, or 0. The IF THEN
statement assigns a value of 2 to the variable newage if the variable age is greater than 50. The ELSE IF
THEN statement assigns a value of 1 to the variable newage if the variable age is greater than 25 and
equal to or less than 50. The ELSE statement assigns a value of 0 to the variable newage for all
observations that do not have a value of 2 or 1. Note that only one ELSE statement is allowed per IF
THEN statement.
LENGTH sex $ 6;
IF gender = 1 THEN sex = 'male';
ELSE sex = 'female';
This creates a character variable named sex that can take two values: male or female. The LENGTH
statement sets the length of sex to six characters. Without it, SAS sets the length from the first value
assigned (male, four characters) and truncates female to fema. The IF THEN statement assigns the value
male to the variable sex if the variable gender is equal to 1. The ELSE statement assigns the value female
to the variable sex for all other observations.
IF wage > 1300;
This keeps any observation for which the variable wage is greater than 1300. It deletes all observations
for which wage is 1300 or less.
IF exper = 1 THEN delete;
This deletes any observation for which the variable exper is equal to 1.
IF exper = 3 and gender = 1 then delete;
This deletes any observation for which both the variable exper is equal to 3 and the variable gender is
equal to 1. If either one of these conditions is not satisfied, then the observation is not deleted.
IF educ = 11 or age >= 57 then delete;
This deletes any observation for which either the variable educ is equal to 11 or the variable age is greater
than or equal to 57.
IF wage = . THEN delete;
SAS represents a missing numeric value with a period (.). This deletes any observation for which the
variable wage has a missing value.
IF age = . then age = 65;
This assigns the value 65 to the variable age for any observation in which age is missing.
RUN;
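The IN and NOT IN operators from the table of comparison operators are not used in Example #4. The following is a minimal sketch of how they can be written. The dataset name EARN2, the variable name degree, and the particular values in the lists are hypothetical and chosen only for illustration.

```sas
LIBNAME econ415 'e:';
DATA earn2;
    SET econ415.earnings;
    /* IN is true when educ matches any value in the list. */
    IF educ IN (4, 8) THEN degree = 1;
    ELSE degree = 0;
    /* Keep only observations whose exper value is not in the list. */
    IF exper NOT IN (1, 3);
RUN;
```

The last statement has the same effect as the two-part condition IF exper = 1 or exper = 3 THEN delete; but it is shorter when the list of values is long.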
DELETING VARIABLES FROM A SAS DATASET
Example #5
You want to create two new permanent SAS datasets from the permanent SAS dataset named
EARNINGS. You want to name these new SAS datasets EARNSUB1 and EARNSUB2. You want
EARNSUB1 to contain the variables WAGE, EDUC, EXPER, AGE. You want EARNSUB2 to contain
the variables WAGE, EDUC.
LIBNAME econ415 'e:';
DATA econ415.earnsub1;
SET econ415.earnings;
KEEP wage educ exper age;
DATA econ415.earnsub2;
SET econ415.earnsub1;
KEEP wage educ;
RUN;
An alternative program that would accomplish the same task is to replace the KEEP statements with the
following DROP statements.
DROP gender race clerical maint crafts;
DROP exper age;
The LIBNAME statement tells SAS to access and store permanent SAS datasets in the library named
ECON415, which is located on the disk in drive E. The first DATA statement tells SAS to create a new
permanent SAS dataset named EARNSUB1 and store it in the library named ECON415. The first SET
statement tells SAS to access the permanent SAS dataset named EARNINGS located in the library named
ECON415. The first KEEP statement tells SAS to include only the variables WAGE, EDUC, EXPER, and
AGE from the dataset EARNINGS in the dataset EARNSUB1. Alternatively, the first DROP statement
tells SAS to exclude the variables GENDER, RACE, CLERICAL, MAINT, and CRAFTS, which leaves
the same four variables in EARNSUB1. The second DATA statement
tells SAS to create a new permanent SAS dataset named EARNSUB2 and store it in the library named
ECON415. The second SET statement tells SAS to access the permanent SAS dataset named EARNSUB1
located in the library named ECON415. The second KEEP statement tells SAS to include only the
variables WAGE and EDUC from the dataset EARNSUB1 in the dataset EARNSUB2. Alternatively, the
second DROP statement tells SAS to exclude the variables EXPER and AGE, which leaves the same two
variables in EARNSUB2.
DISPLAYING A SAS DATASET
Example #6
You want to display the data in the permanent SAS dataset named EARNINGS.
LIBNAME econ415 'e:';
DATA earn1;
SET econ415.earnings;
PROC PRINT data=earn1;
RUN;
The temporary SAS dataset EARN1 that contains the data from the permanent SAS dataset EARNINGS
will be displayed in the Output Window.
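For a large dataset it is often enough to look at the variable list and the first few rows. The following is a minimal sketch of one way to do this; the choice of 10 observations and of the variables WAGE and EDUC is arbitrary.

```sas
LIBNAME econ415 'e:';
/* List the variable names, types, and lengths stored in the dataset. */
PROC CONTENTS DATA=econ415.earnings;
RUN;
/* Print only the first 10 observations of two variables. */
PROC PRINT DATA=econ415.earnings (OBS=10);
    VAR wage educ;
RUN;
```

Because the DATA= option names the permanent dataset directly, no DATA step is needed in this sketch.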
DESCRIBING AND ANALYZING DATA
Examples #7 through #17 below involve describing and analyzing data. The data are contained in the
Excel file CPS85, which is assumed to be located on a disk in drive E. Create a temporary SAS dataset
named CPS85A. Save this temporary SAS dataset as a permanent SAS dataset named CPS85A in the
library ECON415.
FREQUENCY DISTRIBUTIONS AND SCATTER DIAGRAMS
Example #7
You want to access the permanent SAS dataset named CPS85A which is stored in the library named
ECON415 on drive E:. You want to display an absolute frequency distribution for the variables WAGE
and ED, a relative frequency distribution for the variables WAGE and ED, and a scatter diagram for the
variables WAGE and ED.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC UNIVARIATE NOPRINT;
VAR wage ed;
HISTOGRAM wage ed;
PROC SGPLOT;
SCATTER x = ed y = wage;
RUN;
The LIBNAME, DATA and SET statements access the permanent SAS dataset named CPS85A and
create the temporary SAS dataset named CPS85B. Note that this temporary dataset will be deleted when
your session ends. The PROC UNIVARIATE statement and the option NOPRINT tell SAS to obtain the
information required to construct a histogram and suppress the tables of descriptive statistics. The VAR
statement tells SAS to obtain the information for the variables WAGE and ED. The HISTOGRAM
statement tells SAS to construct histograms for the variables WAGE and ED. The PROC SGPLOT statement tells SAS to
construct a graph that plots data points. The SCATTER statement tells SAS to construct a scatter
diagram. The x = ed tells SAS to measure the variable ED on the horizontal axis. The y = wage tells
SAS to measure the variable WAGE on the vertical axis.
DESCRIPTIVE STATISTICS
Example #8
You want to access the permanent SAS dataset named CPS85A which is stored in the library named
ECON415 on a disk in drive E. You want to calculate the mean, variance, standard deviation, and
coefficient of variation for the variables WAGE, ED, EX, FE, AGE, UNION. You also want to calculate
the covariances and correlation coefficients for these variables.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC MEANS mean var std cv max min;
VAR wage ed ex fe age union;
PROC CORR COV;
VAR wage ed ex fe age union;
RUN;
The LIBNAME, DATA and SET statements access the permanent SAS dataset named CPS85A and
create the temporary SAS dataset named CPS85B. The PROC MEANS statement and the options
MEAN, VAR, STD, CV, MAX, MIN, tell SAS to calculate the mean, variance, standard deviation,
coefficient of variation, and maximum and minimum values. The VAR statement tells SAS to calculate
these statistics for the variables WAGE, ED, EX, FE, AGE, and UNION only. If you omit the VAR
statement, then SAS will calculate descriptive statistics for all numeric variables in the dataset. The
PROC CORR COV statement tells SAS to calculate the correlation matrix and covariance matrix. The
VAR statement tells SAS to calculate the correlation coefficients and covariances for the variables
WAGE, ED, EX, FE, AGE, and UNION only. If you want SAS to provide a full range of descriptive
statistics, you can replace the PROC MEANS mean var std cv max min; statement with the following statement.
PROC UNIVARIATE;
SAS will provide a large number of different types of descriptive statistics for the variables WAGE, ED,
EX, FE, AGE, UNION.
LINEAR REGRESSION
Example #9
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED. You
also want to print the variance-covariance matrix for the parameter estimates.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed / covb;
RUN;
The PROC REG statement tells SAS to run a linear regression using the OLS estimator. The MODEL
statement tells SAS the dependent variable, explanatory variable(s), and any optional output to print. The
dependent variable is on the left-hand side of the equal sign and the explanatory variable(s) are on the
right-hand side. The forward slash (/) separates the regression equation from the options. The option
covb tells SAS to display the variance-covariance matrix of estimates in the Output window along with
the standard regression results. If you do not give SAS any options, then you do not have to include the
forward slash.
Example #10
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and
FE. You also want to print the variance-covariance matrix for the parameter estimates.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed ex fe / covb;
RUN;
This program is the same as the program for example #9 except two additional explanatory variables, EX
and FE, are included in the MODEL statement.
Example #11
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and
FE. You want to test the following hypotheses. 1) Education and experience have no joint effect on
wage; that is, the coefficient of ED and the coefficient of EX are jointly equal to zero. 2) The marginal
effects of ED and EX are equal; that is, the coefficients of ED and EX are equal. 3) The sum of the
marginal effects of ED and EX is equal to 2; that is, the sum of the coefficients of ED and EX is 2.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed ex fe;
TEST ed = 0, ex = 0;
TEST ed = ex;
TEST ed + ex = 2;
RUN;
Note that one or more TEST statements can follow a MODEL statement. Because we are testing three
different hypotheses for the same regression model, we have three TEST statements that follow the model
statement. Note that when you are testing a joint hypothesis (i.e., two or more restrictions jointly), after
the TEST statement you separate the equation that defines each hypothesis by a comma.
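For each TEST statement, SAS reports an F-statistic. As a reminder of what that statistic measures, the standard textbook form is sketched below. The symbols are notation introduced here: SSE_R and SSE_U are the sums of squared residuals from the restricted and unrestricted models, q is the number of restrictions, n is the number of observations, and k is the number of estimated coefficients including the intercept.

```latex
\[
F = \frac{(SSE_R - SSE_U)/q}{SSE_U/(n-k)}
\]
```

In Example #11, k = 4 for all three tests, q = 2 for the first TEST statement, and q = 1 for the second and third.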
Example #12
You want to use the data in the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and
FE, and impose the restriction that the coefficients of ED and EX are equal. Thus, your objective is to
estimate a restricted model that imposes a restriction on the model parameters.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed ex fe;
RESTRICT ed = ex;
RUN;
The RESTRICT statement tells SAS to impose a restriction on the parameters of the statistical model.
The restriction that you want to impose is given by the equation after the RESTRICT statement. Note
that the format of the RESTRICT statement is identical to the format of the TEST statement. SAS will
display the parameter estimates for the restricted model in the Output window. In addition, it provides an
estimate for a parameter called RESTRICT. This is the estimate for a Lagrange parameter that is
introduced during the estimation process. If the coefficient of RESTRICT is zero, then the restricted and
unrestricted estimates are not significantly different, which means that the restriction has no effect. In this
example, a t-test rejects the null hypothesis that the coefficient of RESTRICT is zero. This indicates that
imposing the restriction is not valid, and therefore the estimate of the marginal effect of education is
significantly different from the marginal effect of work experience.
Example #13
You want to use the SAS dataset named CPS85A to run a linear regression of WAGE on ED, EX and FE.
You want to check for multicollinearity among the explanatory variables. To do this you want to run a
regression of each explanatory variable on all remaining explanatory variables so you can calculate
variance inflation factors. You also want to calculate the correlation coefficients for the explanatory
variables.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed ex fe;
MODEL ed = ex fe;
MODEL ex= ed fe;
MODEL fe = ed ex;
PROC CORR;
VAR ed ex fe;
RUN;
You can use the R² statistic for the last three models to calculate variance inflation factors for ED, EX, and
FE. You can check the correlation matrix for high correlation coefficients between the explanatory
variables. Note that SAS will display certain multicollinearity diagnostics, such as eigenvalues and
condition indexes, if you use the MODEL statement
MODEL wage = ed ex fe / collin;
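The variance inflation factor for an explanatory variable is calculated from the R² statistic of the regression of that variable on the remaining explanatory variables. In the formula below, the subscript j is notation introduced here to index the explanatory variables ED, EX, and FE.

```latex
\[
VIF_j = \frac{1}{1 - R_j^2}
\]
```

For example, an R² of 0.90 in one of these regressions gives a variance inflation factor of 1/(1 - 0.90) = 10. PROC REG can also report variance inflation factors directly if the option VIF is added after the forward slash in the MODEL statement.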
Example #14
You want to use the SAS dataset CPS85A to estimate a varying slope parameter model where WAGE
depends upon ED, EX, FE, and the interaction variable EDFE, which is the product of ED and FE. This
interaction variable allows the coefficient of ED to depend upon FE.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
edfe = ed*fe;
PROC REG;
MODEL wage = ed ex fe edfe;
RUN;
Note that to estimate this model, you must first create an interaction term for ED and FE.
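To see how the interaction variable lets the coefficient of ED depend upon FE, it may help to write the model out. The coefficient symbols and the error term u are notation introduced here, not labels from the SAS output.

```latex
\begin{align*}
WAGE &= \beta_0 + \beta_1 ED + \beta_2 EX + \beta_3 FE + \beta_4 (ED \times FE) + u \\
\frac{\partial WAGE}{\partial ED} &= \beta_1 + \beta_4 FE
\end{align*}
```

The marginal effect of education is therefore β1 for men (FE = 0) and β1 + β4 for women (FE = 1).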
Example #15
You want to use the SAS dataset CPS85A to estimate a log-linear functional form, where the logarithm of
WAGE depends upon ED, EX, FE.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
logwage = log(wage);
PROC REG;
MODEL logwage = ed ex fe;
RUN;
Note that to estimate this model, you must first create a new variable named LOGWAGE, which is the
natural logarithm of the variable WAGE.
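The coefficients of a log-linear model are read as approximate proportional changes. A brief sketch of the standard interpretation follows; the coefficient symbols and the error term u are notation introduced here.

```latex
\[
\ln(WAGE) = \beta_0 + \beta_1 ED + \beta_2 EX + \beta_3 FE + u,
\qquad
\%\Delta WAGE \approx 100\,\beta_1\,\Delta ED
\]
```

For example, an estimate of 0.08 for the coefficient of ED would mean that one more year of education is associated with a wage that is about 8 percent higher, holding EX and FE constant. The value 0.08 is illustrative only, not a result from the CPS85 data.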
Example #16
You want to use the SAS dataset CPS85A to run a linear regression of WAGE on ED, EX, and FE. You
then want to estimate this model using the FGLS estimator (weighted least squares) assuming that the
variance of the error term is a linear function of ED.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC REG;
MODEL wage = ed ex fe;
OUTPUT out=cps85b residual=resid;
DATA cps85c;
SET cps85b;
residsq = resid**2;
PROC REG;
MODEL residsq = ed;
OUTPUT out=cps85c predicted=varhat;
DATA cps85d;
SET cps85c;
IF varhat <= 0 THEN varhat = residsq;
w = 1/varhat;
PROC REG;
MODEL wage = ed ex fe;
WEIGHT w;
RUN;
In this program we use three DATA statements to create three temporary SAS datasets. The OUTPUT
statement that follows the MODEL statement for the regression of RESIDSQ on ED tells SAS to save the
predicted values of RESIDSQ for this regression as the variable named VARHAT (predicted=varhat), and
include this variable in the temporary SAS dataset named CPS85C (out=cps85c). The conditional IF
THEN statement tells SAS to replace any value of the variable VARHAT that is negative or zero with the
value for the variable RESIDSQ. We must do this because an estimated variance must be positive; a zero
or negative value would give an undefined or negative weight. The assignment statement w = 1/varhat
sets the weight equal to the reciprocal of the estimated error variance. PROC REG minimizes the weighted
sum of squared residuals, so the weight must be the reciprocal of the variance, not of the standard
deviation. The WEIGHT statement that follows the last MODEL statement tells SAS to run a weighted
least squares regression using the variable W as the weight. This is the FGLS estimator.
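The steps of this FGLS procedure can be summarized in equations. This is a sketch of the standard weighted least squares setup. The symbols α, b, and the observation subscript i are notation introduced here.

```latex
\begin{align*}
\text{Assumed variance function:} \quad & \sigma_i^2 = \alpha_0 + \alpha_1 ED_i \\
\text{Estimated variance (VARHAT):} \quad & \hat{\sigma}_i^2 = \hat{\alpha}_0 + \hat{\alpha}_1 ED_i \\
\text{Weighted least squares:} \quad & \min_{b_0,\dots,b_3} \sum_i w_i \left( WAGE_i - b_0 - b_1 ED_i - b_2 EX_i - b_3 FE_i \right)^2,
\quad w_i = \frac{1}{\hat{\sigma}_i^2}
\end{align*}
```

Because the weight multiplies the squared residual, observations with a larger estimated error variance receive proportionally less weight in the fit.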
Example #17
You want to use the SAS dataset CPS85A to run an instrumental variables regression of WAGE on ED,
EX, and FE using the two-stage least squares estimator. You assume ED is the endogenous explanatory
variable. The instrumental variables are NONWH and MARR. You also want to calculate the F-statistic
for the null hypothesis that NONWH and MARR have no joint effect on ED in the first-stage regression
to check the strength (relevance) of the instrumental variables NONWH and MARR.
LIBNAME econ415 'e:';
DATA cps85b;
SET econ415.cps85a;
PROC SYSLIN 2sls;
ENDOGENOUS ed;
INSTRUMENTS ex fe nonwh marr;
MODEL wage = ed ex fe;
RUN;
DATA cps85c;
SET econ415.cps85a;
PROC REG;
MODEL ed = ex fe nonwh marr;
TEST nonwh = 0, marr = 0;
RUN;
The PROC SYSLIN statement tells SAS that you are going to estimate at least one equation in a system
of linear equations. The option 2SLS tells SAS to estimate the equation(s) using the two-stage least
squares estimator. The ENDOGENOUS statement tells SAS the endogenous variable(s). The
INSTRUMENTS statement tells SAS the variables to use in the first-stage regression. List the exogenous
explanatory variables EX and FE, which serve as their own instruments, together with the instrumental
variables NONWH and MARR. The
MODEL statement tells SAS the equation to estimate. The second set of commands, beginning with the
second DATA statement (DATA cps85c) and ending with the second RUN statement, runs
the first-stage regression of ED on EX, FE, NONWH, and MARR and calculates the F-statistic that is
used to check instrument strength or relevance.
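The two stages of the estimator can be written out as follows. This is a sketch of the standard two-stage least squares setup. The coefficient symbols π and β and the error terms v and u are notation introduced here.

```latex
\begin{align*}
\text{First stage:} \quad & ED = \pi_0 + \pi_1 EX + \pi_2 FE + \pi_3 NONWH + \pi_4 MARR + v \\
\text{Second stage:} \quad & WAGE = \beta_0 + \beta_1 \widehat{ED} + \beta_2 EX + \beta_3 FE + u
\end{align*}
```

The F-statistic from the TEST statement tests the null hypothesis that π3 = π4 = 0 in the first stage. A commonly cited rule of thumb, not stated in this guide's source material, treats a first-stage F-statistic below about 10 as a sign of weak instruments.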
Examples #18 and #19 use the data contained in the Excel file MACROCON, which is assumed to
be located on a disk in drive E. Create a temporary SAS dataset named MACROCON. Save this
temporary SAS dataset as a permanent SAS dataset named MACROCON in the library ECON415.
Example #18
You want to use the SAS dataset named MACROCON to run a linear regression of real consumption
expenditures (RCONS) on real disposable income (RDISINC) and PRIME. Real consumption
expenditures is defined as CONS divided by PRICE, with the appropriate adjustment for the decimal
point. Real disposable income is defined as DISINC divided by PRICE, with the appropriate adjustment
for the decimal point. You want to do a Lagrange multiplier test to test for second-order autocorrelation.
LIBNAME econ415 'e:';
DATA con1;
SET econ415.macrocon;
rcons = cons/(price/100);
rdisinc = disinc/(price/100);
PROC REG;
MODEL rcons = rdisinc prime;
OUTPUT out=con1 residual=resid;
DATA con2;
SET con1;
resid1 = lag1(resid);
resid2 = lag2(resid);
PROC REG;
MODEL resid = rdisinc prime resid1 resid2;
RUN;
The assignment statements for RCONS and RDISINC tell SAS to create the new variables RCONS and
RDISINC and save them in the temporary SAS dataset named CON1. The OUTPUT statement that
follows the MODEL statement for the regression of RCONS on RDISINC and PRIME tells SAS to save
the residuals from this regression as the variable named RESID, and include the variable named RESID in
the temporary SAS dataset named CON1. The second DATA statement tells SAS to create a second
temporary SAS dataset named CON2. The SET statement tells SAS to include all of the variables in the
temporary SAS dataset CON1 in the temporary SAS dataset named CON2. The assignment statement
RESID1 = LAG1(RESID) tells SAS to create a new variable named RESID1 that is equal to the variable
RESID lagged one period. The assignment RESID2 = LAG2(RESID) tells SAS to create a new variable
named RESID2 that is equal to the variable RESID lagged two periods. The variables RESID1 and
RESID2 are saved in the temporary SAS dataset CON2. To calculate the Lagrange multiplier test
statistic, take the unadjusted R² statistic from the regression of RESID on RDISINC, PRIME, RESID1,
and RESID2 (R² = 0.31) and multiply it by the sample size (n = 35). Note that you lose two observations
when running this regression because you have a variable that is lagged two periods. For this example,
the Lagrange multiplier test statistic is LM = (0.31)(35) = 10.85.
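To complete the test, the statistic is compared with a chi-square critical value. The sketch below uses the standard large-sample result for this type of test, with degrees of freedom equal to the number of lagged residuals in the auxiliary regression (two here). The 5 percent significance level is an assumption chosen for illustration.

```latex
\[
LM = nR^2 \;\overset{a}{\sim}\; \chi^2_{2} \quad \text{under } H_0 \text{: no autocorrelation},
\qquad
LM = (35)(0.31) \approx 10.85 \;>\; \chi^2_{2,\,0.05} = 5.99
\]
```

Because the test statistic exceeds the critical value, the null hypothesis of no autocorrelation up to second order is rejected at the 5 percent level.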
Example #19
You want to use the SAS dataset named MACROCON to run a linear regression of real consumption
expenditures (RCONS) on real disposable income (RDISINC) and PRIME. You want to estimate this
model using the FGLS Cochrane-Orcutt estimator to correct for first-order autocorrelation.
LIBNAME econ415 'e:';
DATA con1;
SET econ415.macrocon;
rcons = cons/(price/100);
rdisinc = disinc/(price/100);
PROC AUTOREG;
MODEL rcons = rdisinc prime / nlag=1 iter itprint converge=0.0001;
RUN;
The PROC AUTOREG statement tells SAS to run a linear regression and correct for autocorrelation.
The MODEL statement tells SAS to run a linear
regression of RCONS on RDISINC and PRIME. The / tells SAS that options follow. The option
NLAG=1 tells SAS to correct for first-order autocorrelation. The ITER option tells SAS to use the
Cochrane-Orcutt estimator, which involves doing iterations. The option ITPRINT, which belongs on the
MODEL statement, tells SAS to print each iteration that it performs so you can see how the
estimate of the autocorrelation coefficient (ρ) changes. The CONVERGE=0.0001 option tells SAS to stop
iterating when the estimates of ρ from two successive iterations differ by no more than 0.0001. If you do
not include the CONVERGE option, SAS will use its own default value for when convergence is
achieved. It is important to note that SAS prints the negative of the estimate of the autocorrelation
coefficient, ρ. Thus, if SAS prints a negative value, the estimate of ρ is positive, indicating positive
autocorrelation. If SAS prints a positive value, the estimate of ρ is negative, indicating negative
autocorrelation.
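The sign reversal comes from the way PROC AUTOREG writes the autoregressive error model. The sketch below compares the usual textbook form with the SAS form. The symbols u, ν, φ, and ε are notation introduced here.

```latex
\begin{align*}
\text{Textbook form:} \quad & u_t = \rho\, u_{t-1} + \varepsilon_t \\
\text{PROC AUTOREG form:} \quad & \nu_t = -\varphi_1 \nu_{t-1} + \varepsilon_t \\
\text{so that} \quad & \rho = -\varphi_1
\end{align*}
```

For example, a printed autoregressive parameter of -0.75 would correspond to an estimate of ρ equal to 0.75. The value -0.75 is illustrative only, not a result from the MACROCON data.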