OLS regression in the SAS system--hints and helps

1. How to use PROC REG, the regression procedure

The general, basic form for PROC REG is:

    proc reg;          /* invokes PROC REG on the most recently created dataset; use proc reg data=name; to pick one */
    model y=x1 x2 x3;  /* MODEL statement: dependent variable = list of regressors */
    output out=pre_res p=yfit r=yresid lclm=yl95 uclm=yu95 dffits=ydffits;
    run;
    quit;              /* PROC REG is interactive; QUIT ends it */

The OUTPUT statement has the format output out=outputdataset keyword=name ..., where each keyword requests a statistic and the name after the equals sign becomes that variable's name in the output dataset. Every keyword, dffits included, needs a name (ydffits is just an example name):
   p gives the predicted (fitted) values.
   r gives the residuals.
   lclm and uclm give the lower and upper 95% confidence limits for the mean of y.
   dffits gives the standard influence of each observation on its predicted value. This is useful in seeing whether any particular observation has an outsized effect on your estimates.

2. You will need to run PROC PRINT to get SAS to print out the fitted and residual values.

    proc print data=pre_res;   /* invokes PROC PRINT and calls up the output dataset */
    var yfit yresid;           /* lists the variables to be printed */
    run;

3. To calculate the mean absolute percentage error (MAPE), do the following. The average of the absolute residuals (a_yresid below) is essentially the same as the mean deviation in Chapter 4 and is measured in the units of y; the MAPE proper expresses each absolute residual as a percentage of the actual value of y (ape below) before averaging. The output dataset pre_res also contains the original variables, so y is available here. Any observation with y = 0 gets a missing ape (SAS writes a division-by-zero note to the log) and is left out of the mean.

    data absolute;
    set pre_res;
    a_yresid=abs(yresid);            /* absolute residual */
    ape=100*abs(yresid/y);           /* absolute percentage error */
    keep a_yresid ape;
    run;

    proc means mean data=absolute;
    var a_yresid ape;
    output out=mapey mean=mad mape;  /* mapey holds the mean absolute deviation (mad) and the MAPE (mape) */
    run;

4. PROC CORR will print a correlation matrix so you can do your collinearity diagnostics. Let's say your original dataset was named flowrate:

    proc corr data=flowrate;
    var y x1 x2 x3;            /* list all the variables in the dataset */
    run;

Notes:
1. Probably the best thing to do, once you have selected your variables, is to run PROC CORR before you run any regressions. That way, you can do your collinearity checks and determine which regressors should not be included in the same model.
2. After this, run a regression with all the regressors included. Look over the output, but do not calculate the fitted and residual values yet. Double-check the results for collinear regressors against the correlation matrix.
3. Next, run regressions of models with "collinearity-free" (ha ha; there's no such thing) sets of regressors. You might have one or several, depending on the results of your analysis of the correlation matrix.
4. Look over these results, checking the t-statistics to determine which regressors need to be booted out. Cook up different sets of regressors and run those models, again looking over the results. When selecting regressors, consider not only the "raw" statistics but also the underlying theory describing the relationship between the regressors and the dependent variable.
5. When you have come up with one or more models that seem to be the best, run them again, this time calculating the fitted and residual values. Print them and calculate the mean absolute percentage error (MAPE) of the residuals for each model. Compare the results and, based on all this analysis, choose your best model, summarize it, and report on it.
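The collinearity checks in the notes come down to scanning the PROC CORR matrix for pairs of regressors that move together. The sketch below does that screening by hand in plain Python so the arithmetic is visible; it is an illustration, not part of SAS. The function names, the toy data, and the 0.8 cutoff are assumptions made for the example (the handout gives no cutoff). Pairwise correlations can also miss collinearity that involves three or more regressors at once, which is one reason no set of regressors is truly "collinearity-free".

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def flag_collinear(regressors, threshold=0.8):
    """List (name1, name2, r) for each pair of regressors with |r| >= threshold.
    The 0.8 default is an illustrative assumption, not a standard."""
    flagged = []
    for a, b in combinations(sorted(regressors), 2):
        r = pearson(regressors[a], regressors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical toy data: x2 tracks x1 almost exactly, x3 does not.
regressors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.0],
    "x3": [2, 5, 1, 4, 3],
}
for a, b, r in flag_collinear(regressors):
    print(f"{a} and {b}: r = {r:.3f}")  # x1 and x2: r = 0.999
```

A flagged pair is a candidate for keeping the two regressors out of the same model, which is the decision the notes ask you to make before running the "collinearity-free" regressions.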
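To see what PROC REG's fitted values, residuals, DFFITS values and slope t-statistic actually contain, it helps to compute them by hand for the simplest case, a model with a single regressor. The sketch below assumes exactly that one-regressor case and is written in plain Python; PROC REG uses matrix algebra to handle any number of regressors, and fit_simple_ols and the data are hypothetical. DFFITS here is the studentized deleted residual multiplied by sqrt(h / (1 - h)), where h is the observation's leverage.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x by OLS (one regressor only) and return the fitted
    values, residuals, slope t-statistic, and DFFITS for each observation."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    if n < p + 2:
        raise ValueError("need at least 4 observations for deleted-residual statistics")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    fitted = [b0 + b1 * xi for xi in x]              # what p= saves
    resid = [yi - fi for yi, fi in zip(y, fitted)]   # what r= saves
    s2 = sum(e ** 2 for e in resid) / (n - p)        # residual mean square

    # t-statistic for the slope: estimate divided by its standard error
    t_b1 = b1 / math.sqrt(s2 / sxx)

    dffits = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx           # leverage (hat value)
        # residual variance re-estimated with this observation deleted
        s2_i = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)
        t_i = e / math.sqrt(s2_i * (1 - h))          # studentized deleted residual
        dffits.append(t_i * math.sqrt(h / (1 - h)))  # what dffits= saves
    return fitted, resid, t_b1, dffits


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fitted, resid, t_b1, dffits = fit_simple_ols(x, y)
print(round(t_b1, 4))  # 2.1213
print([round(d, 3) for d in dffits])
```

A commonly quoted rule of thumb treats an observation as influential when |DFFITS| exceeds 2*sqrt(p/n), with p the number of estimated parameters; it is best read as a prompt to look at that observation rather than a reason to drop it.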
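The difference between the mean absolute deviation of the residuals and the MAPE, and the model comparison in note 5, can be checked with a few lines of plain Python. This is an illustrative sketch; the function names, the models dictionary and its numbers are made up. The mean absolute deviation is in the units of y, while the MAPE is a percentage of the actual values, so the two can rank models differently when y varies a lot in size.

```python
def mean_abs_deviation(actual, fitted):
    """Average of |y - yhat|, in the units of y."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)


def mape(actual, fitted):
    """Mean absolute percentage error: average of 100 * |y - yhat| / |y|."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value of y is 0")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, fitted)) / len(actual)


actual = [10, 200, 50]
fitted = [12, 190, 55]
print(mean_abs_deviation(actual, fitted))  # absolute errors 2, 10, 5: about 5.67
print(mape(actual, fitted))                # percentage errors 20, 5, 10: about 11.67

# Hypothetical fitted values from two candidate models; keep the lower MAPE.
models = {"model_a": [12, 190, 55], "model_b": [11, 205, 48]}
best = min(models, key=lambda name: mape(actual, models[name]))
print(best)  # model_b
```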