Download Name Least Squares Regression The data below are airfares to

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Differential equation wikipedia , lookup

Schwarzschild geodesics wikipedia , lookup

Partial differential equation wikipedia , lookup

Equation of state wikipedia , lookup

Itô diffusion wikipedia , lookup

Transcript
Name _________________
Least Squares Regression
The data below are airfares to various cities from Baltimore, MD (including the
descriptive statistics).
178
138
N
12
94
Mean
166.92
278
158
258
Std. Dev.
59.5
Min
94
198
188
98
Median
168
Q1
108
179
138
Q3
195.5
98
Max
278
1. If someone asks how much they can expect to pay for airfare from Baltimore, what
prediction would you give? Explain.
2. Suggest another variable (an explanatory variable) that might be useful for predicting
the airfare to a certain destination (the response variable).
The following table reports the distance (in miles) as well as the airfare for 14
destinations. A scatterplot is given as well.
Destinat... Distance
Airfare
newfare
1
Atlanta
576
178
158
2
Boston
370
138
118
3
Chicago
612
94
96
4
Dallas/Fo... 1216
278
300
5
Detroit
409
158
162
6
Denver
1502
258
308
7
Miami
946
198
138
8
New Orle... 998
188
118
9
New York 189
98
108
10
Orlando
787
179
118
11
Pittsburgh 210
138
198
12
St. Louis
98
178
737
Scatter Plot
Airfare
280
260
240
220
200
180
160
140
120
100
80
Airfare
Airfare
0
400
800
Distance
1200
1600
3. Based on this scatterplot, does it seem that knowing the distance to a destination
would be useful for predicting the airfare? Explain.
AP Stats
Least Squares Regression
4. Place a ruler (or straightedge) over the scatterplot so that it forms a straight line that
roughly summarizes the relationship between distance and airfare. Then draw this
line on the scatterplot.
5. Roughly what airfare does your line predict for a destination that is x1 = 300 miles
away?
y1 =
6. Roughly what airfare does your line predict for a destination that is x2 = 1500 miles
away?
y2 =
The equation of a line can be represented as y = a + bx, where y denotes the response
variable being predicted, x denotes the explanatory variable being used for prediction, a
is the value of the y-intercept, and b is the value of the slope of the line.
7. Use your answers to 5 & 6 to find the slope (change in y over change in x) of your
line.
Slope =
8. Use your answers to 5 & 7 to determine the y-intercept of your line using the
following equation: a = y1 – bx1.
y-intercept =
9. Put your answers to 7 & 8 together to produce the equation of your line.
Naturally, we would like to have a better way of choosing the line to approximate a
relationship than simply drawing one that seems about right. Since there are infinitely
many lines that one could draw, however, we need some criterion to select which line is
the “best” at describing the relationship. The most commonly used criterion is least
squares, which says to choose the line that minimizes the sum of squared vertical
distances from the points to the line.
We write the equation of the least squares line (also known as the regression line) as y =
a + bx. The most convenient expression for calculating these coefficients relates them to
the means and standard deviations of the two variables and the correlation coefficient
_
_
_
_
sy
between them: b = r
and a = y − b x (where x and y represent the means of the
sx
variables, sx and sy are the standard deviations, and r the correlation between them).
AP Stats
Least Squares Regression
10. Enter the airfare data into your calculator (L1 and L2).
11. Use the CORR program to compute the mean and standard deviation of distance and
airfare, and the value of the correlation between the two. Record the results below:
Mean
Std. Dev.
Correlation
Airfare (y)
Distance (x)
12. Write the equation of the least squares line for predicting airfare from distance.
We will now use the calculator to produce the regression line. In order for your
calculator to display the correlation coefficient you will need to turn the diagnostic
display mode to ON. To do this, access the calculator’s CATALOG menu ( 2nd
and scroll down until you find “DiagnosticOn.” Press ENTER twice.
0)
13. To use your calculator to find the equation of the least squares line, use the
STAT→CALC menu and select option LinReg(a+bx).
14. To find the least squares line for predicting airfare from distance, complete the
command by entering the two appropriate lists (L1 and L2) so that you have the
following:
15. Press ENTER and write the equation below. Compare this equation with the equation
you found in #12.
AP Stats
Least Squares Regression
One of the primary uses of regression is for prediction, for one can use the regression line
to predict the value of the y-variable for a given value of the x-variable simply by
plugging that value of x into the equation of the regression line.
16. What airfare does the least squares line predict for a destination that is 300 miles
away?
17. What airfare does the least squares line predict for a destination that is 1500 miles
away?
18. Draw the least squares line on the scatterplot below by plotting the two points that
you found in 16 & 17 and connecting them with a straight line.
Scatter Plot
Airfare
Airfare
280
260
240
220
200
180
160
140
120
100
80
0
400
800
1200
Distance
1600
19. To use your calculator to create a scatterplot of airfare vs. distance and then graph the
least squares line:
¾ Select LinReg(a+bx) from the STAT→CALC menu as before.
¾ Enter L1,L2 as before, but now type another comma. Press the VARS button
and then use the right arrow to see the Y-VARS menu. Press ENTER to
select Function and then ENTER again to select Y1. You should have the
following screen:
¾ Press ENTER. If you now press the blue Y= button at the top of your
calculator, you will see the regression equation stored as the Y1 function.
¾ Press GRAPH.
20. Just from looking at the regression line that you have drawn on the scatterplot, guess
what value the regression line would predict for the airfare to a destination 900 miles
away.
AP Stats
Least Squares Regression
21. Use the equation of the regression line to predict the airfare to a destination 900 miles
away, and compare this prediction to your guess in #20.
22. What airfare would the regression line predict for a flight to San Francisco, which is
2842 miles from Baltimore? Would you consider this prediction as reliable as the one
for 900 miles? Explain.
The actual airfare to San Francisco at that time was $198. That the regression line’s
prediction is not very reasonable illustrates the danger of extrapolation, i.e., of trying to
predict y for values of x beyond those contained in the data.
23. Use the equation of the regression line to predict the airfare if the distance is 900
miles. Record the prediction in the table below, and repeat for distances of 901, 902,
and 903 miles.
Distance
Predicted airfare
900
901
902
903
24. Do you notice a pattern in these predictions? By how many dollars is each prediction
higher than the preceding one? Does this number look familiar? Explain.
25. By how much does the regression line predict airfare to rise for each additional 100
miles that a destination is farther away?
26. If you look back at the original listing of distances and airfares, you find that Atlanta
is 576 miles from Baltimore. What airfare would the regression line have predicted
for Atlanta?
27. The actual airfare to Atlanta at that time was $178. By how much does the actual
price exceed the prediction?
The residual is the difference between the actual y-value and the expected value (from
the regression line), so the residual measures the vertical distance from the observed yvalue to the regression line.
AP Stats
Least Squares Regression
28. Record Atlanta’s fitted value and residual in the table below. Then calculate Boston’s
residual and Chicago’s fitted value using the definition of residual (i.e. without using
the equation of the regression line), showing your calculations.
Airfare
Destinat... Distance
Airfare
fittedvalue
residual meanfare deviation
1
Atlanta
576
178
2
Boston
370
138
3
Chicago
612
94
4
Dallas/Fo... 1216
278
226
52
166.92
5
Detroit
409
158
131.27
26.73
166.92
-8.92
6
Denver
1502
258
259.57
-1.56
166.92
91.08
7
Miami
946
198
194.3
3.7
166.92
31.08
8
New Orle... 998
188
200.41
-12.41
166.92
21.08
9
New York 189
98
105.45
-7.45
166.92
-68.92
10
Orlando
787
179
175.64
3.36
166.92
12.08
11
Pittsburgh 210
138
107.92
30.08
166.92
-28.92
12
St. Louis
98
169.77
-71.77
166.92
-68.92
737
126.7
-61.1
166.92
11.08
166.92
-28.92
166.92
-72.92
29. Which city has the largest (in absolute value) residual? What were its distance and
airfare? By how much did the regression line err in predicting its airfare; was it an
underestimate or an overestimate? Circle this observation on the scatterplot
containing the regression line.
30. For observations with positive residual values, was their actual airfare greater or less
than the predicted airfare?
31. For observations with negative residual values, do their points on the scatterplot fall
above or below the regression line?
32. The mean of these airfares is $166.92 with a standard deviation of $59.45. In the
absence of information about distance, we could use the overall mean airfare as the
prediction for each city’s airfare. In this situation, the deviations from the mean
would be the errors in those predictions. The last column of the table above reports
these deviations. Determine the deviation from the mean airfare for Dallas and record
it in the table.
33. For how many cities does the overall mean airfare result in a closer prediction to the
actual airfare than the regression line does? In other words, how many cities have a
deviation from the mean that is smaller than their residual from the regression line?
Which cities are these?
AP Stats
Least Squares Regression
34. Do most cities have a smaller residual or a smaller deviation from the mean? Does
this suggest that predictions from the regression line are generally better than the
airfare mean? Explain.
This worksheet was adapted from “Workshop Statistics”, 2nd edition, by Alan Rossman, Beth Chance, J.
Barr Von Oehsen.
AP Stats
Least Squares Regression