OLS regression in the SAS system--hints and helps
1. How to use PROC REG, the regression procedure
The general, basic form for PROC REG is:
proc reg;
--invokes PROC REG
model y=x1 x2 x3;
--MODEL statement; lists variables in this format:
dependent variable = list of regressors
output out=pre_res p=yfit r=yresid
lclm=yl95 uclm=yu95 dffits=ydffits;
--OUTPUT statement. Format is:
output out=outputdataset p=name r=name, etc.
p is the predicted (fitted) values
r is the residual values
lclm and uclm are the lower and upper
95% confidence limits for the mean of y
dffits is the standardized influence of each
observation on its predicted value. Like the
other keywords, it takes the form dffits=name
(ydffits here is an arbitrary name). This is
useful in seeing if any particular
observation has an outsized effect on the fit.
2. You will need to do a PROC PRINT to get SAS to print out the fitted and residual values.
proc print data=pre_res;
--invokes PROC PRINT and calls up
the output dataset
var yfit yresid;
--lists the variables to be printed
3. To calculate the mean absolute percentage error (MAPE), which is essentially the same as the
mean deviation in Chapter 4, do the following:
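data absolute;
set pre_res;
a_yresid=abs(yresid);
--creates a_yresid, which the KEEP statement needs;
abs(yresid) is assumed here from the variable name
keep a_yresid;
proc means mean data=absolute;
output out=mapey;
--saves the summary statistics, including the mean, in the dataset mapey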
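Averaging the absolute residuals as they stand gives a result in the units of y. If a true percentage is wanted, one possible sketch divides each residual by the actual value before averaging. It assumes that pre_res still contains the dependent variable y (the OUTPUT dataset from PROC REG carries the input variables along) and that rows with y equal to zero can be dropped. The names abs_pct, a_pct_err, mapey_pct and mape are illustrative only.

```sas
data abs_pct;
   set pre_res;
   /* keep only rows where the percentage error is defined */
   if y ne 0 and not missing(yresid);
   /* absolute percentage error for this observation */
   a_pct_err = 100*abs(yresid/y);
   keep a_pct_err;
run;

proc means data=abs_pct mean;
   var a_pct_err;
   /* store the mean absolute percentage error in variable mape */
   output out=mapey_pct mean=mape;
run;
```

Because the errors are scaled by y, this version can be compared across models whose dependent variables are measured in different units.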
4. PROC CORR will print a correlation matrix so you can do your collinearity diagnostics.
proc corr data=flowrate;
--Let's say your original dataset was named flowrate
var y x1 x2 x3;
--List all the variables in the dataset
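A correlation matrix only shows pairwise relationships, so collinearity involving three or more regressors can slip past it. As a complementary check, the MODEL statement in PROC REG accepts the VIF and TOL options, which print a variance inflation factor and a tolerance for each regressor. The following is a minimal sketch using the same flowrate dataset and variable names as above; how large a VIF is "too large" is a judgment call, and a cutoff of about 10 is only a common rule of thumb.

```sas
proc reg data=flowrate;
   /* vif and tol add collinearity columns to the parameter estimates table */
   model y = x1 x2 x3 / vif tol;
run;
quit;
```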
1. Probably the best thing to do, when you have your variables selected, is to do PROC CORR before
you do any regressions. That way, you can do your collinearity checks and determine which
regressors should not be included in the same model.
2. After this, run one or more regressions with all the regressors included. Look over the output;
do not calculate the fitted and residual values yet. Double-check the results on collinear
regressors against the correlation matrix.
3. Next, run regressions of models with "collinearity-free" (ha ha; there's no such thing) sets of
regressors. You might have one or several, depending on the results of your analysis of the
correlation matrix.
4. Look over these results, checking the t-statistics to determine which regressors need to
be booted out. Cook up different sets of regressors and run the models, again looking over the
results. When choosing your regressors, consider not only the "raw" statistics but also the
underlying theory describing the relationship between the regressors and the dependent variable.
5. When you have come up with one or more models which seem to be the best, run them again, this
time calculating the fitted and residual values. Print them and calculate the mean absolute
percentage error (MAPE) of the residuals for each model. Compare the results, and, based on all
this analysis, choose your best model, summarize it, and report on it.
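
The five steps above can be strung together roughly as follows. This is a sketch under stated assumptions, not a prescribed program: the dataset flowrate and the variables y, x1, x2 and x3 come from the earlier examples, while the regressor subsets in the candidate models, the model labels m1 and m2, and the dataset names pre_res1 and absolute1 are hypothetical. Which subsets to try depends on what the correlation matrix shows.

```sas
/* Step 1: correlation matrix before any regressions */
proc corr data=flowrate;
   var y x1 x2 x3;
run;

/* Step 2: full model; no OUTPUT statement yet */
proc reg data=flowrate;
   model y = x1 x2 x3;
run;
quit;

/* Steps 3-4: candidate models with hypothetical regressor subsets */
proc reg data=flowrate;
   m1: model y = x1 x2;
   m2: model y = x1 x3;
run;
quit;

/* Step 5: rerun a finalist and save fitted and residual values */
proc reg data=flowrate;
   model y = x1 x2;
   output out=pre_res1 p=yfit r=yresid;
run;
quit;

proc print data=pre_res1;
   var yfit yresid;
run;

/* mean of the absolute residuals for this finalist */
data absolute1;
   set pre_res1;
   a_yresid = abs(yresid);
   keep a_yresid;
run;

proc means data=absolute1 mean;
   var a_yresid;
run;
```

Repeating step 5 for each finalist, with a different output dataset name each time, gives one error figure per model to compare.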