BUSI 6220
FALL 2012
HW3 SOLUTIONS
3.1. Analysis of Residuals in Excel 2007/2010
3.1.1. Open the Excel Worksheet GPAvsGMAT.xls and select Data > Data Analysis >
Regression. Then fill out the popup window as follows:
3.1.2. Click OK. The Regression results will appear, together with the two requested plots:
(1) A Residual Plot indicating approximately constant variance, and (2) a Normal
Probability Plot indicating an approximately normal distribution of the residuals.
3.1.3. The standardized residuals have also been saved. Obtain a line plot of these
standardized residuals (standardized residuals vs. order index). (For example: Select the range of
listed Standardized Residuals, then select Insert > Scatter. Alternatively, select Insert > Line >
OK > right-click on the line > Format data series > line color > no line > Marker
options > automatic > OK.) The resulting plot suggests there might be some serial
autocorrelation, but the pattern is not very clear. However, since our data are cross-sectional
rather than time series, the order of the observations carries no meaning: we can always
randomly reorder them, which removes any serial pattern. Therefore, this is not an important problem.
[Figure annotation: Characteristic autocorrelation pattern (make sure you understand why!)]
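As a rough cross-check outside Excel, the residuals of a simple linear regression can be computed directly. The sketch below uses made-up (x, y) values, not the GPAvsGMAT data, and the function names are illustrative. It scales each residual by the residual standard error sqrt(SSE / (n - 2)); Excel and the statistical packages may scale slightly differently, so the values need not match their "standardized residuals" exactly.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def scaled_residuals(x, y):
    """Residuals divided by the residual standard error sqrt(SSE / (n - 2))."""
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (len(x) - 2))
    return [e / s for e in resid]

# Hypothetical data for illustration only (not the homework data set).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
scaled = scaled_residuals(x, y)
for i, r in enumerate(scaled, start=1):
    print(i, round(r, 3))
```

Plotting these values against the order index i gives the same kind of picture as the Excel line plot described above.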
3.2. Analysis of Residuals in IBM SPSS
3.2.1. Start IBM SPSS 20. Select File > Open > Data. Open GPAvsGMAT.xls (change file
type to “Excel”).
3.2.2. Select Analyze > Regression > Linear and specify GPA as the Dependent and
GMAT as the Independent variable. To get SPSS to highlight any outliers, we click the
Statistics button in the regression window, and check the box for Casewise Diagnostics.
Note that the default cutoff for the absolute standardized residual is 3 (standard deviations),
but you may want to replace it with 2 when you have a small sample size. Unusual
observations, if any, will be displayed in a table in the output. Click Continue.
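The idea behind Casewise Diagnostics can be sketched in a few lines: list every case whose standardized residual exceeds the cutoff in absolute value. The residual values and the function name below are hypothetical, chosen only to show the effect of lowering the cutoff from 3 to 2.

```python
def flag_unusual(std_residuals, threshold=3.0):
    """Return (case number, residual) pairs with |residual| above threshold."""
    return [(i, r) for i, r in enumerate(std_residuals, start=1) if abs(r) > threshold]

# Hypothetical standardized residuals, for illustration only.
z = [0.4, -1.2, 2.3, -0.7, 3.4, 0.1, -2.6]
print(flag_unusual(z))               # cutoff of 3
print(flag_unusual(z, threshold=2))  # lower cutoff, flags more cases
```

With the cutoff at 3 only case 5 is listed; with the cutoff at 2, cases 3, 5, and 7 are listed.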
To study the Normality (or lack thereof) of the residuals, click the Plots button in the
regression window, and select both the Normal Probability Plot and the related Histogram.
Remember that if Normality is present in the residuals, we would expect the points in the
Probability Plot to fall on a straight line. From the Plots button menu we can also obtain a
plot of the standardized residuals against (standardized) predicted values (of the dependent
variable). Select ZRESID and ZPRED from the menu (as y-axis and x-axis respectively). In
this plot, we are hoping to see a random scatter. Look out for "funneling" patterns,
illustrating that the variance is not constant, and functional relationship patterns, illustrating a
deficiency in our model form.
Formatting of SPSS outputs:
In order to create more attractive SPSS charts, double-click on them to invoke the chart
editor. Then click on a chart component (select it) to invoke its properties. These include
size, fill & border for graphical components and size, text layout, text style, and fill & border
for textual components such as titles.
In order to create more attractive output tables, select Edit > Options > Pivot Tables. Look
at the Table Look drop-down menu and you will realize that what you have been seeing so
far is only one of about 17 options! Some make your tables look good in academic papers,
some others in PowerPoint presentations. Still not happy? You can always copy/paste the
SPSS table object directly to Excel and reformat it to your satisfaction.
3.3. Analysis of Residuals in Minitab
3.3.1. Start MINITAB 16. Select File > Open Worksheet. Open GPAvsGMAT.xls. Fit a
Regression model (Stat > Regression > Regression) using GPA as the response and
GMAT as the predictor variable. Click on Graphs. Select Standardized and Four in one.
Click OK. Click on Storage. Select Standardized Residuals.
3.3.2. Click OK. The Residuals Plots will appear: (1) A Residuals vs. Fits Plot (=Residual
Plot) indicating approximately constant variance, (2) A Normal Probability Plot indicating an
approximately normal distribution of the residuals, (3) A Histogram of the Residuals
confirming an approximately normal distribution of the residuals, and (4) a Residuals vs.
Order of the Data Plot indicating that the residuals may not be uncorrelated to each other.
After this first visual inspection of the regression assumptions, we will continue with some
formal tests of hypothesis.
3.3.3. Note that your selections in 3.3.1 saved the residuals under a new column called
SRES1 (to indicate standardized residuals). We will now perform a test for constant variance
in the residuals. We will first split the residuals in two groups. For this, we need to create a
new column called group containing the group indexes. Type the data by hand, or select
Editor > Enable Commands, and type the commands:
MTB > set c4
DATA> (1:2)10
DATA> end
MTB >
This command will automatically enter the numbers 1 to 2 under column C4, so that each
one of them appears 10 times. To get help on Minitab’s SET command, go to MINITAB
Session Command Help and look at Session Commands > Manipulating and Calculating
Data > Make Patterned Data > SET. Rename C4 to group. Then type the following
commands:
MTB > sort c3 c5;
SUBC> by c1.
MTB >
Note the semicolon on the first line and the period on the second! This command will store
in column C5 the standardized residuals sorted in increasing order of the X observations
(this column is labeled OrdRes in the output that follows). We
are now ready to test whether the two groups of sorted residuals have equal variances.
Select Stat > Basic Statistics > 2 Variances and then fill out the dialog box as shown:
Test for Equal Variances: OrdRes versus group
95% Bonferroni confidence intervals for standard deviations

group   N     Lower    StDev    Upper
    1  10  0.807027  1.23375  2.48435
    2  10  0.525253  0.80299  1.61694

F-Test (normal distribution)
Test statistic = 2.36, p-value = 0.217
Levene's Test (any continuous distribution)
Test statistic = 3.78, p-value = 0.068
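For two groups, the F statistic is the ratio of the two sample variances, so it can be checked by hand from the StDev values in the output above. A minimal sketch, with variable names chosen here for illustration:

```python
# Sample standard deviations copied from the Minitab output above.
sd_group1 = 1.23375
sd_group2 = 0.80299

# F statistic for two groups: ratio of the sample variances.
f_stat = (sd_group1 / sd_group2) ** 2
print(round(f_stat, 2))
```

The result agrees with the reported test statistic of 2.36. Levene's statistic cannot be checked this way, since it requires the individual residuals rather than the group summaries.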
For Levene's Test, the hypotheses are:
H0: The two groups have equal variances, vs. HA: The two groups have unequal variances.
The p-value (0.068) is greater than alpha = 0.05 but smaller than alpha = 0.10, so we fail to
reject the null hypothesis at the 5% level, though only marginally. The F-Test (p-value =
0.217) also fails to reject. Therefore, the residuals appear to have constant variance. Note that,
alternatively, you may choose to group the residuals in three groups:
MTB > set C4
DATA> (1:3)7
DATA> end
(NOTE: since this generates 21 values, delete the last value in c4)
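For readers who want to see what the SET and SORT steps produce, the following Python sketch mimics them. It is an analogy rather than Minitab code: the helper name `patterned` and the x and residual values are made up for illustration.

```python
def patterned(values, repeat):
    """Repeat each value `repeat` times in order, e.g. 1,1,...,2,2,..."""
    return [v for v in values for _ in range(repeat)]

two_groups = patterned([1, 2], 10)           # like (1:2)10 -> 20 labels
three_groups = patterned([1, 2, 3], 7)[:20]  # like (1:3)7, dropping the 21st value

# Hypothetical x values and residuals, for illustration only.
x = [610, 540, 700, 580]
res = [0.3, -1.1, 0.8, 0.2]
# Residuals reordered by increasing x, like "sort c3 c5; by c1."
ord_res = [r for _, r in sorted(zip(x, res), key=lambda pair: pair[0])]

print(two_groups.count(1), two_groups.count(2))
print(three_groups.count(1), three_groups.count(2), three_groups.count(3))
print(ord_res)
```

Dropping the last of the 21 labels leaves group sizes of 7, 7, and 6, which is why the third group has only 6 observations.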
Then Minitab’s test for equal variances will perform a Bartlett’s test and a Levene’s test:
Test for Equal Variances: OrdRes versus group
95% Bonferroni confidence intervals for standard deviations

group  N     Lower    StDev    Upper
    1  7  0.702020  1.19110  3.23189
    2  7  0.583382  0.98981  2.68571
    3  6  0.436345  0.76893  2.40224

Bartlett's Test (normal distribution)
Test statistic = 0.92, p-value = 0.631
Levene's Test (any continuous distribution)
Test statistic = 0.78, p-value = 0.476
3.3.4. We will now perform a test for randomness of the residuals. Once again, make sure
the command line is enabled by selecting Editor > Enable Commands, and type the
command:
MTB > runs OrdRes
Runs Test: OrdRes
Runs test for OrdRes
Runs above and below K = 0.00435209
The observed number of runs = 8
The expected number of runs = 11
10 observations above K, 10 below
* N is small, so the following approximation may be invalid.
P-value = 0.168
Runs Test tests the null hypothesis:
H0: The residuals increase or decrease in value randomly, vs.
HA: The residuals follow a pattern in the way they increase or decrease.
The p-value (0.168) is well above the usual significance levels, so we fail to reject H0:
the Runs Test gives no evidence that the residuals are non-random.
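The numbers in the Runs Test output can be reproduced from the counts alone using the usual normal approximation to the number of runs. The sketch below is a hand calculation under that approximation (the function name is illustrative); as the Minitab warning notes, the approximation may be poor for a sample this small.

```python
import math

def runs_test_p_value(n_above, n_below, observed_runs):
    """Two-sided p-value from the normal approximation to the runs test."""
    n = n_above + n_below
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)) / (n ** 2 * (n - 1))
    z = (observed_runs - expected) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return expected, z, p_value

# Counts taken from the output above: 10 above K, 10 below K, 8 observed runs.
expected, z, p = runs_test_p_value(10, 10, 8)
print(expected, round(z, 3), round(p, 3))
```

With 10 observations above K, 10 below, and 8 observed runs, this gives 11 expected runs and a p-value of about 0.168, matching the output.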
Another way to check for serial autocorrelation in the residuals is by performing a time series
analysis and obtaining Auto-Correlation Function (ACF) and Partial Auto-Correlation
Function (PACF) plots. Select Stat > Time Series > Autocorrelation and Partial
Autocorrelation. Select OrdRes as the Series to be analyzed. The resulting plots indicate
that there are no significant autocorrelation components in the first few lags.
[Figures: Autocorrelation Function for OrdRes (with 5% significance limits for the autocorrelations) and Partial Autocorrelation Function for OrdRes (with 5% significance limits for the partial autocorrelations). Each plot shows lags 1 to 5 on the horizontal axis and values from -1.0 to 1.0 on the vertical axis.]
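The quantity plotted in an ACF chart is the sample autocorrelation at each lag. A minimal sketch of that calculation follows, using a made-up sequence rather than the OrdRes column; the function name is illustrative. Under a white-noise assumption a rough large-sample band is plus or minus 2 divided by the square root of n, though the significance limits Minitab draws may be computed differently.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given positive lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return numer / denom

# Hypothetical sequence, for illustration only.
res = [1.0, 2.0, 3.0, 4.0, 5.0]
for k in (1, 2):
    print(k, round(autocorrelation(res, k), 3))
```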
The presence of first-order autocorrelation (that is, at the first time lag) can also be
detected using the Durbin-Watson Statistic. The hypotheses are:
H0: There is no first-order autocorrelation in the residuals, vs.
HA: There is a positive first-order autocorrelation in the residuals.
To obtain this test, select Stat > Regression > Regression, specify the response and the
predictor(s), then click on Options, and finally select Durbin-Watson Statistic.
Regression Analysis: GPA versus GMAT

The regression equation is
GPA = - 1.70 + 0.00840 GMAT

Predictor      Coef   SE Coef      T      P
Constant    -1.6996    0.7268  -2.34  0.031
GMAT       0.008399  0.001440   5.83  0.000

S = 0.435014   R-Sq = 65.4%   R-Sq(adj) = 63.5%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  6.4337  6.4337  34.00  0.000
Residual Error  18  3.4063  0.1892
Total           19  9.8400

Durbin-Watson statistic = 1.46398
The Durbin-Watson statistic (1.46) is higher than the critical value of 1.15 used here, so H0
should not be rejected: there is no evidence of positive first-order autocorrelation.
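The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 are consistent with no first-order autocorrelation, and values well below 2 point toward positive autocorrelation. A minimal sketch with made-up residuals (not the GPA vs. GMAT residuals) and an illustrative function name:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numer = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    denom = sum(e ** 2 for e in residuals)
    return numer / denom

# Hypothetical residuals, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours have opposite signs
persistent = [1.0, 1.0, -1.0, -1.0]   # neighbours tend to share a sign
print(durbin_watson(alternating))
print(durbin_watson(persistent))
```

The alternating sequence gives a value above 2 and the persistent sequence a value below 2, which is the direction the one-sided test above is looking for.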
3.3.5. We will now perform a test for normality of the residuals. Select Stat > Basic
Statistics > Normality Test, and specify SRES1 as the variable:
By default, Minitab performs an Anderson-Darling test. The hypotheses are
H0: The residuals are normally distributed, vs.
HA: The residuals are not normally distributed.
The high p-value (0.607) indicates that we have no reason to reject the normality assumption.
Other tests for normality (same H0, HA): Ryan-Joiner, Shapiro-Wilk, Kolmogorov-Smirnov.
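To give a sense of what the Anderson-Darling test measures, the sketch below computes the A-squared statistic with the mean and standard deviation estimated from the sample. It uses two made-up samples, not SRES1, and does not attempt the p-value calculation, so it illustrates the statistic only and is not a reproduction of Minitab's procedure.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_statistic(data):
    """A-squared statistic for normality, with mean and sd estimated from the data."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    # Standard normal CDF evaluated at the sorted, standardized observations.
    cdf = [std_normal.cdf((v - m) / s) for v in sorted(data)]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

# Hypothetical samples, for illustration only.
bell_shaped = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
right_skewed = [math.exp(v) for v in bell_shaped]
a_bell = anderson_darling_statistic(bell_shaped)
a_skew = anderson_darling_statistic(right_skewed)
print(round(a_bell, 3), round(a_skew, 3))
```

A larger statistic means a poorer fit to the normal distribution, so the right-skewed sample scores higher than the bell-shaped one.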
3.4. Analysis of Residuals in SAS. (INCOMPLETE)
To verify the assumptions of simple linear regression, begin by looking at a plot of the
residuals versus the predicted values and a normal probability plot of the residuals.
3.4.1. Start SAS 9.3. Select File > Import Data. Make sure the settings correspond to
importing worksheet “Sheet1” from an Excel file and open GPAvsGMAT.xls. Create a new
Member called GPAvsGMAT in library WORK. Start SAS Analyst by selecting Solutions
> Analysis > Analyst. After clicking on the Analyst window and making sure it is active,
select File > Open By SAS Name. Double-click the Work library icon to expand its
contents, then select the Gpavsgmat table and click OK. This will result in a Gpavsgmat
table opening in a VIEWTABLE window.