BUSI 6220 FALL 2012 HW3 SOLUTIONS

3.1. Analysis of Residuals in Excel 2007/2010

3.1.1. Open the Excel worksheet GPAvsGMAT.xls and select Data > Data Analysis > Regression. Then fill out the popup window as follows:

3.1.2. Click OK. The regression results will appear, together with the two requested plots: (1) a Residual Plot indicating approximately constant variance, and (2) a Normal Probability Plot indicating an approximately normal distribution of the residuals.

3.1.3. The standardized residuals have also been saved. Obtain a line plot of these standardized residuals (standardized residuals vs. order index). For example, select the range of listed standardized residuals, then select Insert > Scatter. Alternatively, select Insert > Line > OK > right-click on the line > Format Data Series > Line Color > No line > Marker Options > Automatic > OK.

The resulting plot suggests there might be some serial autocorrelation, but the pattern is not very clear. However, since our data are cross-sectional rather than time series, the order of the observations carries no meaning: we can always randomly reorder the observations, which removes any apparent autocorrelation. Therefore, this is not an important problem.

[Figure: Characteristic autocorrelation pattern (make sure you understand why!)]

3.2. Analysis of Residuals in IBM SPSS

3.2.1. Start IBM SPSS 20. Select File > Open > Data and open GPAvsGMAT.xls (change the file type to "Excel").

3.2.2. Select Analyze > Regression > Linear and specify GPA as the Dependent and GMAT as the Independent variable. To have SPSS flag any outliers, click the Statistics button in the regression window and check the box for Casewise Diagnostics. Note that by default a case is flagged when its standardized residual exceeds 3 in absolute value, but you may want to lower this cutoff to 2 when you have a small sample. Unusual observations, if any, will be displayed in a table in the output. Click Continue.
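To see what the casewise diagnostic does, here is a minimal Python sketch (standard library only) that fits the simple regression by least squares, divides each residual by S = sqrt(SSE / (n - 2)), and flags the cases beyond a cutoff. It assumes SPSS's standardized residual (ZRESID) is defined this way; Excel's and Minitab's "standardized residuals" use somewhat different denominators, so their values need not match exactly. The function names are hypothetical, and the GPAvsGMAT data are not reproduced here.

```python
import math
from typing import List, Tuple


def fit_simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def standardized_residuals(x: List[float], y: List[float]) -> List[float]:
    """Residuals divided by S = sqrt(SSE / (n - 2)) (assumed ZRESID definition)."""
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (len(x) - 2))
    return [e / s for e in resid]


def casewise_outliers(x: List[float], y: List[float], cutoff: float = 3.0) -> List[int]:
    """1-based case numbers whose |standardized residual| exceeds the cutoff."""
    z = standardized_residuals(x, y)
    return [i + 1 for i, zi in enumerate(z) if abs(zi) > cutoff]
```

On toy data (x = 1, ..., 10 and y = 2x + 1, with 5 added to case 5), the default cutoff of 3 flags nothing, while a cutoff of 2 flags case 5. This illustrates why a lower cutoff can be useful in small samples: a single unusual point also inflates S, which shrinks its own standardized residual.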
To study the normality (or lack thereof) of the residuals, click the Plots button in the regression window and select both the Normal Probability Plot and the related Histogram. Remember that if the residuals are normally distributed, we would expect the points in the Normal Probability Plot to fall approximately on a straight line.

From the Plots menu we can also obtain a plot of the standardized residuals against the standardized predicted values of the dependent variable. Select ZRESID and ZPRED from the menu (as the y-axis and x-axis, respectively). In this plot we hope to see a random scatter. Look out for "funneling" patterns, which indicate that the variance is not constant, and for systematic patterns such as curvature, which indicate a deficiency in the functional form of the model.

Formatting of SPSS outputs: To create more attractive SPSS charts, double-click on a chart to invoke the Chart Editor. Then click on a chart component to select it and display its properties. These include size and fill & border for graphical components, and size, text layout, text style, and fill & border for textual components such as titles. To create more attractive output tables, select Edit > Options > Pivot Tables. Look at the Table Look drop-down menu and you will see that what you have been seeing so far is only one of about 17 options! Some make your tables look good in academic papers, others in PowerPoint presentations. Still not happy? You can always copy and paste the SPSS table object directly into Excel and reformat it to your satisfaction.

3.3. Analysis of Residuals in Minitab

3.3.1. Start MINITAB 16. Select File > Open Worksheet and open GPAvsGMAT.xls. Fit a regression model (Stat > Regression > Regression) using GPA as the response and GMAT as the predictor variable. Click on Graphs, select Standardized and Four in one, and click OK. Click on Storage and select Standardized Residuals.

3.3.2. Click OK.
The residual plots will appear: (1) a Residuals vs. Fits plot (= Residual Plot) indicating approximately constant variance, (2) a Normal Probability Plot indicating an approximately normal distribution of the residuals, (3) a Histogram of the Residuals confirming an approximately normal distribution of the residuals, and (4) a Residuals vs. Order of the Data plot suggesting that the residuals may be correlated with each other. After this first visual inspection of the regression assumptions, we will continue with some formal hypothesis tests.

3.3.3. Note that your selections in 3.3.1 saved the standardized residuals in a new column called SRES1. We will now perform a test for constant variance of the residuals. First, we split the residuals into two groups. For this, we need to create a new column called group containing the group indexes. Type the data by hand, or select Editor > Enable Commands and type the commands:

MTB > set c4
DATA> (1:2)10
DATA> end
MTB >

This command enters the numbers 1 and 2 in column C4, each repeated 10 times (ten 1s followed by ten 2s). For help on Minitab's SET command, go to the MINITAB Session Command Help and look at Session Commands > Manipulating and Calculating Data > Make Patterned Data > SET. Rename C4 to group. Then type the following commands:

MTB > sort c3 c5;
SUBC> by c1.
MTB >

Note the semicolon on the first line and the period on the second! This command stores in column C5 the standardized residuals sorted in increasing order of the X (predictor) observations in C1. We are now ready to test whether the two groups of sorted residuals have equal variances.
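The SET, SORT and 2 Variances steps can be mimicked outside Minitab. The minimal Python sketch below (standard library only, hypothetical function names) builds the group column, sorts the residuals by X, splits them into groups, and computes the two-sample F statistic and a Levene statistic. It assumes Minitab's Levene's test is the median-centred (Brown-Forsythe) variant; p-values would additionally need the F distribution and are omitted.

```python
from statistics import mean, median, variance
from typing import Dict, List, Sequence


def patterned(values: Sequence[int], each: int) -> List[int]:
    """Like Minitab's SET with (1:2)10: repeat each value `each` times in turn."""
    return [v for v in values for _ in range(each)]


def sort_by(y: Sequence[float], key: Sequence[float]) -> List[float]:
    """Like 'sort c3 c5; by c1.': reorder y by increasing key (the X column)."""
    return [yi for _, yi in sorted(zip(key, y), key=lambda pair: pair[0])]


def split_by_group(values: Sequence[float], groups: Sequence[int]) -> List[List[float]]:
    """Collect values into one list per group label, ordered by label."""
    buckets: Dict[int, List[float]] = {}
    for v, g in zip(values, groups):
        buckets.setdefault(g, []).append(v)
    return [buckets[g] for g in sorted(buckets)]


def variance_ratio(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample F statistic: sample variance of a over sample variance of b."""
    return variance(a) / variance(b)


def levene_median(groups: List[List[float]]) -> float:
    """Levene statistic on absolute deviations from each group's median
    (Brown-Forsythe variant): a one-way ANOVA F computed on those deviations."""
    z = [[abs(v - median(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (n_total - k) / (k - 1) * between / within
```

A typical call would be `groups = split_by_group(sort_by(sres, x), patterned([1, 2], 10))`, followed by `variance_ratio(*groups)` and `levene_median(groups)`. As a sanity check against the output that follows, the F statistic equals the squared ratio of the two group standard deviations: (1.23375 / 0.80299)^2 is about 2.36.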
Select Stat > Basic Statistics > 2 Variances and fill out the dialog box as shown (the output below refers to the sorted residuals in C5 as OrdRes):

Test for Equal Variances: OrdRes versus group

95% Bonferroni confidence intervals for standard deviations

group   N     Lower     StDev     Upper
1      10  0.807027   1.23375   2.48435
2      10  0.525253   0.80299   1.61694

F-Test (normal distribution)
Test statistic = 2.36, p-value = 0.217

Levene's Test (any continuous distribution)
Test statistic = 3.78, p-value = 0.068

Levene's test has the hypotheses H0: The two groups have equal variances, vs. HA: The two groups have unequal variances. Its p-value (0.068) is only marginally greater than alpha = 0.05 but smaller than alpha = 0.10, so we fail to reject H0 at the 5% level, although we would reject it at the 10% level. The F-test (p-value = 0.217) also fails to reject H0. Therefore, the residuals appear to have approximately constant variance, although the evidence from Levene's test is borderline.

Note that, alternatively, you may choose to split the residuals into three groups:

MTB > set C4
DATA> (1:3)7
DATA> end

(NOTE: since this generates 21 values, delete the last value in C4.) Minitab's test for equal variances will then perform Bartlett's test and Levene's test:

Test for Equal Variances: OrdRes versus group

95% Bonferroni confidence intervals for standard deviations

group  N     Lower     StDev     Upper
1      7  0.702020   1.19110   3.23189
2      7  0.583382   0.98981   2.68571
3      6  0.436345   0.76893   2.40224

Bartlett's Test (normal distribution)
Test statistic = 0.92, p-value = 0.631

Levene's Test (any continuous distribution)
Test statistic = 0.78, p-value = 0.476

3.3.4. We will now perform a test for randomness of the residuals. Once again, make sure the command line is enabled by selecting Editor > Enable Commands, and type the command:

MTB > runs OrdRes

Runs Test: OrdRes

Runs test for OrdRes

Runs above and below K = 0.00435209

The observed number of runs = 8
The expected number of runs = 11
10 observations above K, 10 below
* N is small, so the following approximation may be invalid.
P-value = 0.168

The runs test tests the null hypothesis H0: The residuals fall above and below K in random order, vs. HA: The residuals follow a non-random pattern. The high p-value (0.168) suggests that the residuals are probably random.

Another way to check for serial autocorrelation in the residuals is to perform a time series analysis and obtain Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots. Select Stat > Time Series > Autocorrelation and Stat > Time Series > Partial Autocorrelation, and select OrdRes as the series to be analyzed. The resulting plots indicate that there are no significant autocorrelations in the first few lags.

[Figures: Autocorrelation Function for OrdRes (with 5% significance limits for the autocorrelations) and Partial Autocorrelation Function for OrdRes (with 5% significance limits for the partial autocorrelations); lags 1 to 5.]

The presence of first-order autocorrelation (i.e., at the first time lag) can also be detected using the Durbin-Watson statistic. The hypotheses are H0: There is no first-order autocorrelation in the residuals, vs. HA: There is positive first-order autocorrelation in the residuals. To obtain this test, select Stat > Regression > Regression, specify the response and the predictor(s), click on Options, and select Durbin-Watson statistic.
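The randomness checks in this step can be reproduced by hand. The minimal Python sketch below (standard library only, hypothetical function names) counts runs above and below K and applies the large-sample normal approximation, and computes the Durbin-Watson statistic d = sum (e_t - e_(t-1))^2 / sum e_t^2 and the lag-1 sample autocorrelation (the first bar of an ACF plot). Treating values equal to K as "not above" is an assumption; Minitab's default K appears to be the sample mean (0.00435209 above), and Minitab's ACF significance limits are not reproduced here.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def runs_test(values: Sequence[float], k: float) -> Tuple[int, float, float]:
    """Runs above/below K with the large-sample normal approximation.

    Returns (observed runs, expected runs, two-sided p-value). Values equal
    to K are counted as 'not above' here, which is an assumption.
    """
    above = [v > k for v in values]
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n1 = sum(above)
    n2 = len(above) - n1
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, p_value


def durbin_watson(e: Sequence[float]) -> float:
    """d = sum (e_t - e_(t-1))^2 / sum e_t^2; values near 2 suggest no lag-1 autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(e, e[1:]))
    return num / sum(x * x for x in e)


def lag1_autocorrelation(e: Sequence[float]) -> float:
    """Sample autocorrelation at lag 1 (the first bar of an ACF plot)."""
    m = sum(e) / len(e)
    num = sum((a - m) * (b - m) for a, b in zip(e, e[1:]))
    return num / sum((x - m) ** 2 for x in e)
```

With 10 observations above K, 10 below and 8 runs, this approximation gives 11 expected runs and a two-sided p-value of about 0.168, in line with the Minitab output above. Note that the Durbin-Watson statistic should be computed on the residuals in the order used by the regression, not on the sorted OrdRes column.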
Regression Analysis: GPA versus GMAT

The regression equation is
GPA = - 1.70 + 0.00840 GMAT

Predictor      Coef   SE Coef      T      P
Constant    -1.6996    0.7268  -2.34  0.031
GMAT       0.008399  0.001440   5.83  0.000

S = 0.435014   R-Sq = 65.4%   R-Sq(adj) = 63.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  6.4337  6.4337  34.00  0.000
Residual Error  18  3.4063  0.1892
Total           19  9.8400

Durbin-Watson statistic = 1.46398

The Durbin-Watson statistic (d = 1.464) exceeds the upper critical bound d_U for n = 20 and one predictor (d_U = 1.15 at alpha = 0.01, and 1.41 at alpha = 0.05), so H0 should not be rejected.

3.3.5. We will now perform a test for normality of the residuals. Select Stat > Basic Statistics > Normality Test and specify SRES1 as the variable:

By default, Minitab performs an Anderson-Darling test. The hypotheses are H0: The residuals are normally distributed, vs. HA: The residuals are not normally distributed. The high p-value (0.607) indicates that we have no reason to reject the normality assumption. Other tests for normality (same H0 and HA): Ryan-Joiner, Shapiro-Wilk, and Kolmogorov-Smirnov.

3.4. Analysis of Residuals in SAS (INCOMPLETE)

To verify the assumptions of simple linear regression, begin by looking at a plot of the residuals versus the predicted values and a normal probability plot of the residuals.

3.4.1. Start SAS 9.3. Select File > Import Data. Make sure the settings correspond to importing worksheet "Sheet1" from an Excel file, and open GPAvsGMAT.xls. Create a new member called GPAvsGMAT in library WORK. Start SAS Analyst by selecting Solutions > Analysis > Analyst. After clicking on the Analyst window to make sure it is active, select File > Open By SAS Name. Double-click the Work library icon to expand its contents, then select the Gpavsgmat table and click OK. The Gpavsgmat table will open in a VIEWTABLE window.