BUSI 6220
FALL 2006
HW3 SOLUTIONS
3.1. Analysis of Residuals in Excel.
3.1.1. Open the Excel Worksheet GPAvsGMAT.xls and select Tools > Data Analysis >
Regression. Then fill out the popup window as follows:
3.1.2. Click OK. The Regression results will appear, together with the two requested plots:
(1) A Residual Plot indicating approximately constant variance, and (2) a Normal Probability
Plot indicating an approximately normal distribution of the residuals.
[Figures: GMAT Residual Plot (Residuals vs. GMAT) and Normal Probability Plot (GPA vs. Sample Percentile)]
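The fit and the residuals behind these two plots can be sketched in a few lines of Python. This is only an illustration, not what Excel runs internally: the function name and the five data points are hypothetical (the contents of GPAvsGMAT.xls are not reproduced here), and the residuals are scaled by the residual standard error s = sqrt(SSE/(n-2)), which may differ slightly from the scaling Excel uses for its Standard Residuals.

```python
from math import sqrt

def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual standard error (assumed scaling, see the note above)
    s = sqrt(sum(e * e for e in residuals) / (n - 2))
    scaled = [e / s for e in residuals]
    return slope, intercept, residuals, scaled

# Hypothetical GMAT/GPA pairs, for illustration only
gmat = [400, 450, 500, 550, 600]
gpa = [1.8, 2.0, 2.6, 2.9, 3.4]
slope, intercept, residuals, scaled = fit_simple_regression(gmat, gpa)
print(slope, intercept)
print(scaled)
```

Plotting `residuals` against `gmat` gives the residual plot; roughly constant vertical spread across the range of GMAT is what "approximately constant variance" refers to.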
3.1.3. The Standardized residuals have also been saved. Obtain a line plot of these
standardized residuals (St. Residuals vs. order index). The resulting plot suggests there might
be some serial autocorrelation, but the pattern is not very clear. However, since our data are
cross-sectional and not time series, the order of the observations carries no meaning: we can
always randomly reorder the observations and remove any apparent autocorrelation.
Therefore, this is not an important problem.
[Figure: line plot of Standard Residuals vs. order index (1 to 20), annotated "Characteristic autocorrelation pattern"]
3.2. Analysis of Residuals in SPSS.
3.2.1. Start SPSS 12/14 for Windows. Select File > Open > Data. Open GPAvsGMAT.xls.
3.2.2. Select Analyze > Regression > Linear and specify GPA as the Dependent and
GMAT as the Independent variable. To get SPSS to highlight any outliers, we click the
Statistics button in the regression window, and check the box for Casewise Diagnostics.
Note that the default cutoff for the standardized residual is 3 standard deviations, but you
may want to replace it by 2 when you have a small sample size. Any unusual observations will
be displayed in a table in the output.
To study the Normality (or otherwise) of the residuals, click the Plots button in the
regression window, and select both the Normal Probability Plot and the related Histogram.
Remember that if Normality is present in the residuals, we would expect the points in the
Probability Plot to fall on the straight line. From the Plots button menu we can also obtain a
plot of the standardised residuals against (standardised) predicted values (of the dependent
variable). Select ZRESID and ZPRED from the menu (as y-axis and x-axis respectively).
Remember we are looking for random scatter in this plot. In particular, look out for
"funnelling" illustrating that the variance is not constant, and a functional relationship,
illustrating a deficiency in our model form.
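The casewise rule described above amounts to a simple filter on the standardized residuals. The sketch below is an illustration only: the function name and the residual values are hypothetical, and the code is not SPSS's own.

```python
def flag_outliers(std_residuals, cutoff=3.0):
    """Return (case number, value) pairs with |standardized residual| > cutoff."""
    return [
        (case, z)
        for case, z in enumerate(std_residuals, start=1)
        if abs(z) > cutoff
    ]

# Hypothetical standardized residuals, for illustration only
z_values = [0.4, -1.1, 2.3, -0.2, -3.4, 0.9]
print(flag_outliers(z_values))              # default cutoff of 3
print(flag_outliers(z_values, cutoff=2.0))  # stricter cutoff for small samples
```

Lowering the cutoff from 3 to 2 flags more cases, which is why it is suggested only for small samples where a cutoff of 3 would rarely be exceeded.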
3.3. Analysis of Residuals in Minitab.
3.3.1. Start MINITAB 14 for Windows. Select File > Open Worksheet. Open
GPAvsGMAT.xls. Fit a Regression model (Stat > Regression > Regression) using GPA
as the response and GMAT as the predictor variable. Click on Graphs. Select
Standardized and Four in one. Click on Storage. Select Standardized Residuals.
3.3.2. Click OK. The Residuals Plots will appear: (1) A Residuals vs. Fits Plot (=Residual
Plot) indicating approximately constant variance, (2) A Normal Probability Plot indicating an
approximately normal distribution of the residuals, (3) A Histogram of the Residuals
confirming an approximately normal distribution of the residuals, and (4) a Residuals vs.
Order of the Data Plot indicating that the residuals may be correlated with each other.
After this first visual inspection of the regression assumptions, we will continue with some
formal tests of hypothesis.
3.3.3. Note that your selections in 3.3.1 saved the residuals under a new column called
SRES1 (to indicate standardized residuals). We will now perform a test for constant variance
in the residuals. We will first split the residuals in two groups. For this, we need to create a
new column called group containing the group indexes. Type the data by hand, or select
Editor > Enable Commands, and type the commands:
MTB > set c4
DATA> (1:2)10
DATA> end
MTB >
This command will automatically enter the numbers 1 to 2 under column C4, so that each
one of them appears 10 times. To get help on Minitab’s SET command, go to MINITAB
Session Command Help and look at Session Commands > Manipulating and Calculating
Data > Make Patterned Data > SET. Rename C4 to group. Then type the following
commands:
MTB > sort c3 c5;
SUBC> by c1.
MTB >
Note the semicolon on the first line and the period on the second! This command will store
in column C5 the standardized residuals sorted by increasing order of X observations.
Rename C5 to OrdRes, the name that appears in the output below. We
are now ready to test whether the two groups of sorted residuals have equal variances.
Select Stat > Basic Statistics > 2 Variances and then fill out the dialog box as shown:
Test for Equal Variances: OrdRes versus group
95% Bonferroni confidence intervals for standard deviations
group   N     Lower    StDev    Upper
    1  10  0.807027  1.23375  2.48435
    2  10  0.525253  0.80299  1.61694
F-Test (normal distribution)
Test statistic = 2.36, p-value = 0.217
Levene's Test (any continuous distribution)
Test statistic = 3.78, p-value = 0.068
Levene’s Test evaluates the hypotheses:
H0: The two groups have equal variances, vs. HA: The two groups have unequal variances.
The p-value (0.068) is greater than alpha = 0.05 but smaller than alpha = 0.10, so we fail to
reject the null hypothesis at the 5% level, although only marginally. The F-Test (p-value =
0.217) does not reject it either. Therefore, the residuals appear to have constant variance.
Note that, alternatively, you may choose to group the residuals in three groups:
MTB > set C4
DATA> (1:3)7
DATA> end
Then Minitab’s test for equal variances will perform a Bartlett’s test and a Levene’s test:
Test for Equal Variances: OrdRes versus group
95% Bonferroni confidence intervals for standard deviations
group  N     Lower    StDev    Upper
    1  7  0.702020  1.19110  3.23189
    2  7  0.583382  0.98981  2.68571
    3  6  0.436345  0.76893  2.40224
Bartlett's Test (normal distribution)
Test statistic = 0.92, p-value = 0.631
Levene's Test (any continuous distribution)
Test statistic = 0.78, p-value = 0.476
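The data preparation in 3.3.3 (patterned group indexes, residuals sorted by X) and the two-group F statistic can be mirrored in Python as a rough sketch. The function names are hypothetical, and the only numbers used are the two standard deviations reported in the two-group output above; Levene's and Bartlett's statistics are not reproduced because they need the raw residuals.

```python
def patterned_groups(n_groups, repeats):
    """Group indexes in the style of Minitab's (1:k)m: each index repeated m times."""
    return [g for g in range(1, n_groups + 1) for _ in range(repeats)]

def sort_by_x(x, residuals):
    """Residuals reordered by increasing x, like `sort c3 c5; by c1.`"""
    ordered = sorted(zip(x, residuals), key=lambda pair: pair[0])
    return [e for _, e in ordered]

def variance_ratio(sd_a, sd_b):
    """F statistic for two groups, computed from their standard deviations."""
    return (sd_a / sd_b) ** 2

print(patterned_groups(2, 3))
# Standard deviations of groups 1 and 2 from the two-group output above
f_stat = variance_ratio(1.23375, 0.80299)
print(round(f_stat, 2))
```

Squaring the ratio of the reported standard deviations recovers the F-Test statistic printed by Minitab, which is a quick way to confirm that the group column and the sorted residuals were set up as intended.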
3.3.4. We will now perform a test for randomness of the residuals. Once again, make sure
the command line is enabled by selecting Editor > Enable Commands, and type the
command:
MTB > runs OrdRes
Runs Test: OrdRes
Runs test for OrdRes
Runs above and below K = 0.00435209
The observed number of runs = 8
The expected number of runs = 11
10 observations above K, 10 below
* N is small, so the following approximation may be invalid.
P-value = 0.168
The Runs Test tests the hypothesis:
H0: The residuals fall above and below K in random order, vs.
HA: The residuals follow a pattern in the way they fall above and below K.
The p-value (0.168) is well above the usual alpha levels, so we do not reject H0: the Runs
Test gives no evidence against randomness of the residuals.
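The expected number of runs and the p-value in this output follow from the usual normal approximation for the runs test, which can be checked with a short Python sketch. The function names are hypothetical, and treating values equal to K as "below" is an assumption; the only inputs taken from the output are 8 observed runs with 10 observations above and 10 below K.

```python
from math import erfc, sqrt

def count_runs(values, k):
    """Return (runs, n_above, n_below) relative to k; values <= k count as below."""
    sides = [v > k for v in values]
    runs = 1 + sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    n_above = sum(sides)
    return runs, n_above, len(sides) - n_above

def runs_test(runs, n_above, n_below):
    """Normal approximation: expected runs, z statistic, two-sided p-value."""
    n = n_above + n_below
    product = 2 * n_above * n_below
    expected = product / n + 1
    variance = product * (product - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return expected, z, p_value

# Counts taken from the Minitab output above
expected, z, p_value = runs_test(8, 10, 10)
print(expected, round(z, 3), round(p_value, 3))
```

With only 20 observations the approximation is rough, which is what Minitab's "N is small" warning refers to.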
Another way to check for serial autocorrelation in the residuals is by performing a time series
analysis and obtaining Auto-Correlation Function (ACF) and Partial Auto-Correlation
Function (PACF) plots. Select Stat > Time Series > Autocorrelation and Partial
Autocorrelation. Select OrdRes as the Series to be analyzed. The resulting plots indicate
that there are no significant autocorrelation components in the first few lags.
[Figures: Autocorrelation Function for OrdRes (with 5% significance limits for the autocorrelations) and Partial Autocorrelation Function for OrdRes (with 5% significance limits for the partial autocorrelations), lags 1 to 5]
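The quantity plotted in an autocorrelation function is the sample autocorrelation at each lag, which is easy to compute by hand. The sketch below is an illustration only: the function name and the alternating series are hypothetical, and the 2/sqrt(n) benchmark is a common rule of thumb that is not necessarily the significance limit Minitab draws.

```python
from math import sqrt

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom

# Hypothetical series that alternates in sign (strong negative lag-1 correlation)
series = [1, -1, 1, -1, 1, -1]
r1 = autocorrelation(series, 1)
benchmark = 2 / sqrt(len(series))  # rough rule-of-thumb limit, an assumption
print(r1, benchmark)
```

A lag is usually called significant when its autocorrelation falls outside the plotted limits; for the residuals in this homework, none of the first few lags do.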
The presence of the First-order autocorrelation (referring to the first time lag) can also be
detected using the Durbin-Watson Statistic. The hypotheses are:
H0: There is no first-order autocorrelation in the residuals, vs.
HA: There is a positive first-order autocorrelation in the residuals.
To obtain this test, select Stat > Regression > Regression, specify the response and the
predictor(s), then click on Options, and finally select Durbin-Watson Statistic.
Regression Analysis: GPA versus GMAT
The regression equation is
GPA = - 1.70 + 0.00840 GMAT
Predictor      Coef   SE Coef      T      P
Constant    -1.6996    0.7268  -2.34  0.031
GMAT       0.008399  0.001440   5.83  0.000

S = 0.435014   R-Sq = 65.4%   R-Sq(adj) = 63.5%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  6.4337  6.4337  34.00  0.000
Residual Error  18  3.4063  0.1892
Total           19  9.8400

Durbin-Watson statistic = 1.46398
The observed Durbin-Watson statistic (1.46) is higher than the upper critical value of 1.15, so
H0 should not be rejected: there is no evidence of positive first-order autocorrelation.
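The Durbin-Watson statistic itself is the sum of squared successive differences of the residuals divided by their sum of squares. A minimal Python sketch follows; the function name and the two toy residual series are hypothetical and are not the homework data.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numer = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denom = sum(e * e for e in residuals)
    return numer / denom

# Hypothetical residual series, for illustration only
alternating = [1, -1, 1, -1]  # sign flips every step: negative autocorrelation
persistent = [1, 1, 1, 1]     # no change between steps: positive autocorrelation
print(durbin_watson(alternating))
print(durbin_watson(persistent))
```

The statistic ranges from 0 to 4 and is roughly 2(1 - r1), where r1 is the lag-1 autocorrelation, so values near 2 suggest no first-order autocorrelation and values near 0 suggest positive autocorrelation.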
3.3.5. We will now perform a test for normality of the residuals. Select Stat > Basic
Statistics > Normality Test, and specify SRES1 as the variable:
By default, Minitab performs an Anderson-Darling test. The hypotheses are
H0: The residuals are normally distributed, vs.
HA: The residuals are not normally distributed.
The high p-value (0.607) indicates that we have no reason to reject the normality assumption.
Other tests for normality: Ryan-Joiner, Shapiro-Wilk, Kolmogorov-Smirnov.
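For reference, the Anderson-Darling statistic compares the ordered, standardized values with the normal CDF and gives extra weight to the tails. The sketch below computes the unadjusted statistic only; it makes no attempt to reproduce Minitab's p-value, and the function name and both samples are hypothetical.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def anderson_darling(values):
    """Unadjusted A-squared statistic for normality (mean and sd estimated)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    std_normal = NormalDist()
    # Normal CDF of each standardized value, in increasing order
    cdf = [std_normal.cdf((v - m) / s) for v in sorted(values)]
    total = sum(
        (2 * i - 1) * (log(cdf[i - 1]) + log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

# Hypothetical samples: evenly spaced normal quantiles and a skewed transform
normal_like = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [exp(v) for v in normal_like]
print(anderson_darling(normal_like))
print(anderson_darling(skewed))
```

Larger values of the statistic indicate a worse fit to the normal distribution, so the skewed sample scores higher than the normal-like one; a high p-value such as the 0.607 reported above corresponds to a small statistic.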
3.4. Analysis of Residuals in SAS.
To verify the assumptions of simple linear regression, begin by looking at a plot of the
residuals versus the predicted values and a normal probability plot of the residuals.
3.4.1. Start SAS 9.1.3 for Windows. Select File > Import Data. Make sure the settings
correspond to importing worksheet “Sheet1” from an Excel file and open GPAvsGMAT.xls.
Create a new Member called GPAvsGMAT in library WORK. Start SAS Analyst by
selecting Solutions > Analysis > Analyst.
3.4.2. Fit a Regression model, selecting Statistics > Regression > Simple… Then select
GPA as the Dependent and GMAT as the Explanatory variable.
1. Click on Plots.
2. In the Simple Linear Regression: Plots window, click on the Residual tab.
3. Check Plot residuals vs variables.
4. In the Residuals field, check Studentized.
5. In the Variables field, check Predicted Y.
6. In the Normal probability and quantiles plots field, check Normal probability-probability plot.
7. Select OK and again OK.
3.4.3. Close the Analysis window and open the first Plot. This is the plot of the residuals
versus the predicted values.
The second plot is the normal probability plot of the residuals.
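The Studentized residuals requested in 3.4.2 divide each residual by its own estimated standard error, which depends on the leverage of the observation. A minimal Python sketch for simple linear regression follows; it assumes the internally studentized form e / (s * sqrt(1 - h)), and the function name and the four data points are hypothetical.

```python
from math import sqrt

def studentized_residuals(x, y):
    """Internally studentized residuals and leverages for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - 2)
    # Leverage of each observation: 1/n + (x - x_bar)^2 / Sxx
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    student = [e / sqrt(mse * (1 - h)) for e, h in zip(residuals, leverage)]
    return student, leverage

# Hypothetical data, for illustration only
x = [1, 2, 3, 4]
y = [1, 3, 2, 5]
student, leverage = studentized_residuals(x, y)
print(student)
print(leverage)
```

Plotting `student` against the predicted values gives the first plot described in 3.4.3; the leverages sum to 2, the number of estimated coefficients, which is a handy sanity check.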