Name:
April 1, 2011
T.A. name/Class time:
MW Lecturer:
Lab 10: Chapters 2 and 10
The SAT and the ACT are the two major standardized tests that colleges use to evaluate
candidates for admission. Most students take just one of these tests. However, some
students take both; we have data on the scores of 60 students who did so. What is the
relationship between these two types of test scores?
You can find this data on the course website www.stat.purdue.edu/~gundlach/stat301
under “Labs.” The dataset is called “SAT.sav”.
1. (2 points) SAT is the explanatory variable, and ACT is the response variable.
Show a scatterplot of the data. Describe the form, direction, and strength of the
relationship.
2. (1 point) What is the correlation between SAT and ACT scores? Show the
Pearson’s correlation output and state the correlation below.
3. (2 points) What is the equation of the least-squares regression line? Include the
least-squares regression line on the scatterplot from #1.
4. (1 point) What percentage of the variation in ACT scores is explained by the least-squares
regression line? Comment on whether this number is good or bad. (What would
we like this value to be?)
5. (2 points) Calculate (by hand, showing your work) the predicted ACT score when
the SAT score is 1050.
6. (2 points) Calculate (by hand, showing your work) the residual for an SAT score
of 1050.
7. (1 point) Use SPSS to calculate a 95% prediction interval for the ACT score when
the SAT score is 1050.
8. (1 point) Make a Normal probability plot of the residuals.
   • (0.5 points) Do the points fall around a straight line? (Yes or No)
   • (0.5 points) Does the distribution of the residuals look normal? (Yes or No)
9. (1 point) Make a residual plot with a y = 0 reference line.
   • (0.5 points) Do the residuals fall randomly around the 0 reference line? (Yes or No)
   • (0.5 points) Can you find any clear pattern in the residual plot? (Yes or No)
   • (0.5 points) Are there any outliers? (Yes or No)
   • (0.5 points) If “Yes,” where is the outlier? Circle the outlier in the residual plot,
     and identify the student (observation number) in the data set.
10. (2 points) Remove the outlier from the data set and re-run the regression. State the
new equation of the line and the new R².
11. (2 points) Was the outlier influential? Explain your reasoning.
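For reference when working questions 3 through 7 by hand, here are the standard least-squares formulas, written with x = SAT, y = ACT, r the correlation, s_x and s_y the standard deviations of SAT and ACT, x* the SAT value of interest (1050 here), and t* the t critical value with n − 2 degrees of freedom. The notation is ours, not taken from the course text:

```latex
\hat{y} = b_0 + b_1 x, \qquad b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x}

e = y - \hat{y} \qquad \text{(residual: observed minus predicted)}

R^2 = r^2 \qquad \text{(fraction of the variation in } y \text{ explained by the line)}

\hat{y}^{*} \pm t^{*}\, s\,\sqrt{1 + \frac{1}{n} + \frac{(x^{*} - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad s = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n - 2}}
```

For question 5, substitute x* = 1050 into the first equation; for question 6, subtract that prediction from the observed ACT score of the student whose SAT score is 1050.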
Along with the lab report, you need to turn in the scatterplot with the regression line; the
Pearson’s correlation table; the “model summary” and “coefficients” tables from the
regression output for the full data set and for the data set without the outlier(s); and the
Normal probability plot and residual plot for the full data set.
SPSS instructions for Lab 10
Print off: scatterplot with the regression line, model summary table from regression, coefficients
table from regression, Normal probability plot, residual plot.
Download the data set from the course website, save it on your desktop, and open it in SPSS.
(1) Least-squares regression (LSR) equation, Normal probability plot, residuals, predicted
values, prediction interval, and the test statistic and P-value for testing the slope and intercept:
Analyze → Regression → Linear
Move “ACT” to the Dependent box and “SAT” to the Independent(s) box.
Click “Statistics” on the right side. Check the “Confidence Intervals” box → “Continue”
Click “Plots” on the right side. Check the “Normal probability plot” box → “Continue”
Click the “Save” button on the right side. The “Linear Regression: Save” dialog box will
open. Check the “Unstandardized” box under “Residuals”, the “Unstandardized” box under
“Predicted Values”, and the “Individual” box under “Prediction Intervals.”
Then click “Continue” → “OK”.
Based on the coefficients table in the output, write down the LSR equations.
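To double-check the Coefficients and Model Summary output, or your hand calculations for questions 5 and 6, the same quantities can be computed directly. This is a minimal Python sketch using only the standard library; `fit_line`, `predict`, and `residual` are hypothetical helper names, and the four SAT/ACT pairs are made-up toy values, not the SAT.sav data:

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares line y-hat = b0 + b1*x; returns (b0, b1, r)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope (the SAT row of the Coefficients table)
    b0 = y_bar - b1 * x_bar       # intercept (the "(Constant)" row)
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation; R Square is r ** 2
    return b0, b1, r


def predict(b0, b1, x_new):
    """Predicted response on the fitted line."""
    return b0 + b1 * x_new


def residual(b0, b1, x_obs, y_obs):
    """Residual = observed minus predicted."""
    return y_obs - predict(b0, b1, x_obs)


if __name__ == "__main__":
    # Made-up toy values, NOT the SAT.sav data.
    sat = [900, 1000, 1100, 1200]
    act = [19, 21, 24, 25]
    b0, b1, r = fit_line(sat, act)
    print(f"ACT-hat = {b0:.3f} + {b1:.5f} * SAT,  R^2 = {r ** 2:.3f}")
    print("predicted ACT at SAT = 1050:", round(predict(b0, b1, 1050), 2))
```

Up to rounding, the slope, intercept, and r ** 2 should agree with what SPSS reports when the function is given the real data.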
(2) Scatterplot with regression line:
Graphs → Legacy Dialogs → Scatter → Simple → Define
Choose X (independent/explanatory variable) and Y (dependent/response variable)
according to the problem. Then click on “OK.”
Adding the regression line: Double-click the scatterplot in the output to open the “Chart
Editor” window.
Click once on any point in the plot so that all the points are selected (they turn yellow).
Click the “add fit line” icon (an icon showing a scatterplot with an LSR line) at the top of
the Chart Editor. The LSR line will be added. Then close the Chart Editor window.
(3) Correlation and hypothesis tests for correlation:
Analyze → Correlate → Bivariate
Move SAT and ACT into the box on the right by clicking on the arrow button. Click
“OK.”
Use this Pearson’s correlation table to find the correlation and to test whether the
population correlation is zero (the test for independence).
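The significance value in the Pearson’s correlation table tests H0: ρ = 0 with the statistic t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom (58 for these 60 students); in simple linear regression this is the same t that the Coefficients table reports for the slope. A minimal standard-library sketch of that statistic, where `correlation_test` is a hypothetical helper name; the P-value itself still needs a t distribution (a t table or SPSS), which Python's standard library does not provide:

```python
import math


def correlation_test(x, y):
    """Pearson r and the t statistic for H0: rho = 0, with df = n - 2.

    Assumes |r| < 1; t is undefined for a perfect linear relationship.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t, n - 2


if __name__ == "__main__":
    # Made-up toy values, NOT the SAT.sav data.
    r, t, df = correlation_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(f"r = {r:.3f}, t = {t:.3f} on {df} df")
```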
(4) Residual plot with reference line: Go back to the SPSS Data Editor, where you will find a
column named “RES_1” containing the saved residuals.
Graphs → Legacy Dialogs → Scatter → Simple → Define
Let “Unstandardized Residuals” be Y and “SAT” be X. Then click “OK”.
Adding the reference line: Double-click the scatterplot in the output to open the “Chart
Editor” window.
Click “Options” → “Y Axis Reference Line”. Under the reference line tab, enter “0” as the
“Y Axis Position”, then click “Apply” and close the Chart Editor window.
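Questions 9 to 11 ask you to spot an outlier in the residual plot, delete it, and re-run the regression. The sketch below mimics that workflow on made-up numbers. It flags the observation with the largest absolute residual, which is only a rough screening rule chosen for illustration (not a formal outlier test, and no substitute for looking at the plot), refits without it, and prints both lines so you can judge influence by how much the slope, intercept, and R² change. The helper names are hypothetical:

```python
from statistics import fmean


def fit_line(x, y):
    """Least-squares intercept, slope, and R^2."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxy ** 2 / (sxx * syy)


def largest_residual(x, y):
    """0-based index of the point farthest (vertically) from the line, plus all residuals."""
    b0, b1, _ = fit_line(x, y)
    res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    worst = max(range(len(res)), key=lambda i: abs(res[i]))
    return worst, res


def refit_without(x, y, drop):
    """Refit after deleting observation `drop` (0-based index)."""
    keep = [i for i in range(len(x)) if i != drop]
    return fit_line([x[i] for i in keep], [y[i] for i in keep])


if __name__ == "__main__":
    # Made-up toy values, NOT the SAT.sav data; observation 3 is a planted outlier.
    sat = [900, 1000, 1100, 1200, 1300]
    act = [19, 21, 33, 25, 27]
    i, _ = largest_residual(sat, act)
    print("suspected outlier: observation", i + 1)  # SPSS numbers rows from 1
    for label, fit in (("all data", fit_line(sat, act)),
                       ("without it", refit_without(sat, act, i))):
        b0, b1, r2 = fit
        print(f"{label}: ACT-hat = {b0:.2f} + {b1:.4f} * SAT, R^2 = {r2:.3f}")
```

On these toy numbers the planted point sits at the mean SAT, so the slope stays at 0.02 while the intercept drops from 3 to 1 and R² rises from about 0.33 to 1; a point with an unusual SAT value would usually move the slope more.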
[Screenshot: the “Save” dialog box of the regression setup. Note which boxes are checked.]
[Screenshot: the data page after you have run the regression. “PRE_1” is the predicted value
column, “RES_1” is the residual column, and “LICI_1” and “UICI_1” are the lower and upper
limits of the prediction intervals for individual scores.]
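The four saved columns can also be reproduced by hand, which is a useful check on question 7. A minimal sketch, assuming the usual formula for an individual (not mean-response) prediction interval; `saved_columns` is a hypothetical helper name, and `t_crit` must be looked up in a t table with n − 2 degrees of freedom (about 2.002 for the 58 df of a 60-student data set), since Python's standard library has no t quantile function:

```python
import math
from statistics import fmean


def saved_columns(x, y, t_crit):
    """Return one (PRE, RES, LICI, UICI) tuple per observation.

    t_crit: t critical value with n - 2 degrees of freedom, e.g. the
    0.975 quantile for a 95% interval (look it up in a t table).
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    pre = [b0 + b1 * xi for xi in x]                   # like PRE_1
    res = [yi - pi for yi, pi in zip(y, pre)]          # like RES_1
    s = math.sqrt(sum(e * e for e in res) / (n - 2))   # regression standard error
    rows = []
    for xi, pi, ei in zip(x, pre, res):
        # Individual prediction interval: note the leading "1 +" under the root.
        half = t_crit * s * math.sqrt(1 + 1 / n + (xi - x_bar) ** 2 / sxx)
        rows.append((pi, ei, pi - half, pi + half))    # like LICI_1, UICI_1
    return rows
```

For question 7, read the two interval limits on the row whose SAT score is 1050; they should agree with SPSS up to rounding and the precision of the t value you use.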