Name:                                        April 1, 2011
T.A. name/Class time:                        MW Lecturer:

Lab 10: Chapters 2 and 10

The SAT and the ACT are the two major standardized tests that colleges use to evaluate candidates for admission. Most students take just one of these tests. However, some students take both, and we have the scores of 60 students who did. What is the relationship between these two types of test scores? You can find this data on the course website www.stat.purdue.edu/~gundlach/stat301 under “Labs.” The dataset is called “SAT.sav”.

1. (2 points) SAT is the explanatory variable, and ACT is the response variable. Show a scatterplot of the data. Describe the form, direction, and strength of the relationship.

2. (1 point) What is the correlation between SAT and ACT scores? Show the Pearson’s correlation output and state the correlation below.

3. (2 points) What is the equation of the least-squares regression line? Include the least-squares regression line on the scatterplot from #1.

4. (1 point) What percentage of the variation in ACT score is explained by the least-squares regression line? Comment on whether this number is good or bad. (What do we like this value to be?)

5. (2 points) Calculate (by hand, showing your work) the predicted ACT score when the SAT score is 1050.

6. (2 points) Calculate (by hand, showing your work) the residual for an SAT score of 1050.

7. (1 point) Use SPSS to calculate a 95% prediction interval for ACT score when SAT score is 1050.

8. (1 point) Make a Normal probability plot of the residuals.
   (0.5 points) Do the points fall around a straight line? (Yes or No)
   (0.5 points) Does the distribution of the residuals look normal? (Yes or No)

9. (1 point) Make a residual plot with a y = 0 reference line.
   (0.5 points) Do the residuals fall randomly around the 0 reference line? (Yes or No)
   (0.5 points) Can you find any clear pattern in the residual plot? (Yes or No)
   (0.5 points) Are there any outliers? (Yes or No)
   (0.5 points) If “Yes”, where is the outlier?
   Please circle the outlier point in the residual plot, and identify the student (observation number) in the dataset.

10. (2 points) Remove the outlier from the data set. Re-run the regression. State the new equation of the line and the new R².

11. (2 points) Was the outlier influential? Explain your reasoning.

Along with the lab report, you need to turn in the scatterplot with the regression line, the Pearson’s correlation table, the “Model Summary” and “Coefficients” tables from the regression output for the full data set and for the data set without the outlier(s), and the Normal probability plot and residual plot for the full data set.

SPSS instructions for Lab 10

Print off: scatterplot with the regression line, model summary table from regression, coefficients table from regression, Normal probability plot, residual plot.

Download the dataset from the course website, save it on your desktop, and open it in SPSS.

(1) LSR equation, Normal probability plot, residuals, predicted values, prediction interval, test statistic and P-value for testing the slope and intercept:
    Analyze → Regression → Linear
    Move “ACT” to the Dependent box and “SAT” to the Independent(s) box.
    Click “Statistics” on the right side. Check the “Confidence Intervals” box. Click “Continue.”
    Click “Plots” on the right side. Check the “Normal probability plot” box. Click “Continue.”
    Click the “Save” button on the right side. You will see the “Linear Regression: Save” dialogue box.
    Check the “Unstandardized” box under “Residuals.”
    Check the “Unstandardized” box under “Predicted Values.”
    Check the “Individual” box under “Prediction Intervals.”
    Then click “Continue” and “OK.”
    Based on the coefficients table in the output, write down the LSR equation.

(2) Scatterplot with regression line:
    Graphs → Legacy Dialogs → Scatter → Simple → Define
    Choose X (independent/explanatory variable) and Y (dependent/response variable) according to the problem.
    Then click on “OK.”
    Adding the regression line: Double-click on the scatterplot in the output so that the “Chart Editor” window pops up. Click once on any point in the plot so that all the points turn yellow. Click on the “add fit line” icon (an icon with a scatterplot and an LSR line) at the top of the Chart Editor. The LSR line will be added. Then close the Chart Editor window.

(3) Correlation and hypothesis tests for correlation:
    Analyze → Correlate → Bivariate
    Move SAT and ACT into the box on the right by clicking on the arrow button. Click “OK.”
    Use this Pearson’s correlation table to get the correlation and to do hypothesis tests for independence or the correlation.

(4) Residual plot with reference line:
    Go back to the SPSS Data Editor. You will find a column named “RES_1”, which contains the saved residuals.
    Graphs → Legacy Dialogs → Scatter → Simple → Define
    Let “Unstandardized Residuals” be Y and “SAT” be X. Then click on “OK.”
    Adding the reference line: Double-click on the scatterplot in the output so that the “Chart Editor” window pops up. Click on “Options” → “Y Axis Reference Line.” Under the reference tab, enter “0” in the “Y Axis Position” field. Then click “Apply” and close the Chart Editor window.

[Screenshot: the “Save” box of the regression setup. Note which boxes are checked.]

[Screenshot: the data page after you have run the regression.] “PRE_1” is the predicted value column. “RES_1” is the residual column. “LICI_1” and “UICI_1” are the lower and upper limits of the prediction interval for individual scores.
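The lab does all of its computing in SPSS, but the arithmetic behind questions 2 through 7 and question 10 is short enough to check by hand or in a few lines of code. The Python sketch below is only an illustration under stated assumptions: the five SAT/ACT pairs are made up (they are not the SAT.sav data), the function names are hypothetical, and the critical value t* is assumed to be looked up in a t table with n − 2 degrees of freedom (for the 60 students in the lab that would be 58 degrees of freedom). The prediction interval uses the usual formula ŷ ± t* · s · sqrt(1 + 1/n + (x* − x̄)² / Σ(x − x̄)²), which should correspond to the “Individual” interval that SPSS saves.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y on x, returning the pieces the lab asks about."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    # Sum of squared residuals and the regression standard error s
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "slope": slope,
            "intercept": intercept, "r": r, "r_squared": r * r, "s": s}


def predict(fit, x_new):
    """Predicted response on the fitted line at x_new."""
    return fit["intercept"] + fit["slope"] * x_new


def prediction_interval(fit, x_new, t_star):
    """Interval for one new observation at x_new; t_star comes from a t table."""
    se = fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"])
    y_hat = predict(fit, x_new)
    return y_hat - t_star * se, y_hat + t_star * se


def refit_without(x, y, index):
    """Refit after dropping one observation, to judge whether it is influential."""
    keep = [i for i in range(len(x)) if i != index]
    return fit_line([x[i] for i in keep], [y[i] for i in keep])


if __name__ == "__main__":
    # Made-up scores for illustration only; NOT the SAT.sav data.
    sat = [800, 900, 1000, 1100, 1200]
    act = [17, 20, 21, 25, 27]

    fit = fit_line(sat, act)
    print("slope:", fit["slope"], "intercept:", fit["intercept"])
    print("r:", fit["r"], "R^2:", fit["r_squared"])

    y_hat = predict(fit, 1050)
    print("predicted ACT at SAT 1050:", y_hat)

    # Residual = observed - predicted; 24 is a hypothetical observed ACT score.
    print("residual:", 24 - y_hat)

    # 3.182 is the two-sided 95% t critical value for n - 2 = 3 degrees of freedom.
    print("95% prediction interval:", prediction_interval(fit, 1050, 3.182))

    # Drop the last observation and compare the refitted line with the original.
    refit = refit_without(sat, act, 4)
    print("slope without obs 5:", refit["slope"],
          "R^2 without obs 5:", refit["r_squared"])
```

Comparing the slope, intercept, and R² before and after `refit_without` mirrors the reasoning in questions 10 and 11: if the line changes noticeably when one point is removed, that point is influential.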