Chapter 14: Inference for Regression
Use the following to answer questions 1-7:
An old saying in golf is “you drive for show and you putt for dough.” The point is that good putting is
more important than long driving for shooting low scores and hence winning money. To see if this is the
case, data on the top 69 money-winners on the PGA tour in 1993 are examined. The average number of
putts per hole for each player is used to predict their total winnings using the simple linear regression
model
1993 winnings = β0 + β1 (average number of putts per hole).
This model was fit to the data using the method of least squares. The following results were obtained
from statistical software.
R-squared = 0.081
s = 281,777

Variable      Coefficient    s.e. of Coef
Constant        7,897,179       3,023,782
Avg. Putts     -4,139,198       1,698,371

___ 1. The explanatory variable in this study is
a) 1993 winnings.
b) average number of putts per hole.
c) the slope, β1.
d) –4,139,198.
___ 2. The quantity s = 281,777 is an estimate of the standard deviation σ of the deviations in the
simple linear regression model. The degrees of freedom for s are
a) 69.
b) 68.
c) 67.
d) 281,777.
___ 3. The intercept of the least-squares regression line is
a) 7,897,179.
b) –4,139,198.
c) 3,023,782.
d) 1,698,371.
___ 4. Suppose the researchers test the hypotheses
H0: β1 = 0, Ha: β1 < 0
The value of the t statistic for this test is
a) 2.61.
b) 2.44.
c) 0.081.
d) –2.44.
___ 5. A 95% confidence interval for the slope β1 in the simple linear regression model is
(approximately)
a) 7,897,179 ± 3,023,782.
b) 7,897,179 ± 6,047,564.
c) –4,139,198 ± 1,698,371.
d) –4,139,198 ± 3,396,742.
___ 6. The correlation between 1993 winnings and average number of putts per hole is
a) 0.081.
b) –0.081.
c) 0.285.
d) –0.285.
___ 7. Below is a scatterplot of the 1993 winnings versus the average number of putts per round;
below it is a plot of the residuals versus the average number of putts per round. Which of
the following statements is supported by these plots?
a) There is no striking evidence in these plots that the assumptions for regression are
violated.
b) The abundance of outliers and influential observations in the plots means that the
assumptions for regression are clearly violated.
c) These plots contain dramatic evidence that the standard deviation of the response
about the true regression line increases as the average number of putts per round
increases.
d) These plots contain many more points than were used to fit the least-squares regression
line in the previous problems. Obviously there is a major error present.
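
The numerical answers to questions 2, 4, 5, and 6 follow from the printout alone. The sketch below reproduces that arithmetic in Python. It assumes the rounded multiplier t* ≈ 2 for a 95% interval with 67 degrees of freedom, and the variable names are illustrative rather than taken from the source.

```python
import math

# Values read from the software printout for the golf data.
n = 69                 # number of players
b1 = -4_139_198        # estimated slope (Avg. Putts coefficient)
se_b1 = 1_698_371      # standard error of the slope
r_squared = 0.081

# Degrees of freedom for s in simple linear regression: n - 2.
df = n - 2

# t statistic for H0: beta1 = 0.
t_stat = b1 / se_b1

# Approximate 95% margin of error, using t* ~ 2 (an assumed rounding
# of the table value for 67 degrees of freedom).
margin = 2 * se_b1

# The correlation has the same sign as the slope and magnitude sqrt(R^2).
r = math.copysign(math.sqrt(r_squared), b1)

print(df, round(t_stat, 2), margin, round(r, 3))
```

The sign of the correlation is taken from the slope because R-squared alone cannot tell a positive association from a negative one.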
Use the following to answer questions 8-13:
Salary data for 1992–1993 for a sample of 15 universities was obtained. We are curious about the
relation between mean salaries for assistant professors (junior faculty) and full professors (senior
faculty) at a given university. In particular, do universities pay (relatively) high salaries to both
assistant and full professors, or are full professors treated much better than assistant professors? In
other words, do senior faculty receive high salaries compared to their junior faculty counterparts?
Suppose we fit the following simple linear regression model
Full prof. salary = β0 + β1 (Asst. prof. salary).
The variables Full Prof. Salary and Asst. Prof. Salary are the mean salaries for full and assistant
professors at a given university. This model was fit to the data using the method of least squares. The
following results were obtained from statistical software. Note that salaries were in thousands of
dollars. Mean assistant professor salaries were treated as the explanatory variable and mean full
professor salaries as the response variable.
R-squared = 0.596
s = 5.503

Variable              Coef       SE of Coef
Constant              15.0658    14.36
Asst. Prof. Salary    1.40827    0.3217

___ 8. The intercept of the least-squares regression line is (approximately)
a) 15.07.
b) 14.36.
c) 1.41.
d) 0.32.
___ 9. A 90% confidence interval for the slope β1 in the simple linear regression model is
(approximately)
a) 1.41 ± 0.57.
b) 1.41 ± 0.32.
c) –1.41 ± 0.57.
d) –1.41 ± 0.32.
___ 10. Suppose the researchers test the hypotheses
H0: β1 = 0, Ha: β1 ≠ 0
The value of the t statistic for this test is
a) 0.32.
b) 1.05.
c) 1.41.
d) 4.38.
___ 11. The correlation between mean assistant and full professor salaries is
a) 0.055.
b) 0.355.
c) 0.596.
d) 0.772.
___ 12. Is there strong evidence (and if so, why) that the relationship between mean assistant and
full professor salaries is adequately described by a straight line?
a) yes, because the slope of the least-squares line is positive.
b) yes, because the P-value for testing if the slope is 0 is quite small.
c) no, because the value of the square of the correlation is relatively small.
d) it is impossible to say, because we are not given the actual value of the correlation.
___ 13. Below is a scatterplot of mean full versus assistant professor salaries (in thousands of
dollars).
Which of the following statements is supported by the plot?
a) There is no striking evidence in the plot that the assumptions for regression are
violated.
b) There appears to be an outlier and/or influential observations in the plot suggesting
that our results must be interpreted with caution.
c) The plot contains dramatic evidence that the standard deviation of the response about
the true regression line is not even approximately the same everywhere.
d) The plot contains many fewer points than were used to fit the least-squares regression
line in the previous problems. Obviously there is a major error present.
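
The computations behind questions 9, 10, and 11 can be checked the same way. This is a minimal sketch, assuming the table value t* = 1.771 for a 90% interval with 13 degrees of freedom. The variable names are hypothetical.

```python
import math

# Values read from the software printout for the salary data.
n = 15                 # number of universities
b1 = 1.40827           # estimated slope (Asst. Prof. Salary coefficient)
se_b1 = 0.3217         # standard error of the slope
r_squared = 0.596

df = n - 2             # 13 degrees of freedom

# t statistic for H0: beta1 = 0 against a two-sided alternative.
t_stat = b1 / se_b1

# 90% margin of error; 1.771 is the t-table critical value for df = 13
# (assumed here rather than computed).
t_star = 1.771
margin = t_star * se_b1

# The slope is positive, so the correlation is the positive root of R^2.
r = math.sqrt(r_squared)

print(df, round(t_stat, 2), round(margin, 2), round(r, 3))
```

A t statistic this large on 13 degrees of freedom corresponds to a small P-value, which is the reasoning question 12 relies on.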
___ 14. After a snowstorm in a large metropolitan area, meteorologists took a random sample of
several locations and measured depth of the snow along with the water content. The
results were summarized in a computer printout:
LINEAR REGRESSION ANALYSIS
The regression equation is
Water = -0.03039+0.10341*Inches
R-squared = 0.95020
DF = 4
T = 8.610
P =
Unfortunately, the printer failed just as the p-value was being
displayed. What is the p-value for the hypothesis test
H0: β = 0 vs. Ha: β ≠ 0?
a) P < 0.001
b) P = 0.001
c) P = 0.0304
d) P = 0.103
e) P = 0.950
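
The missing P-value in question 14 is normally read from a t table or a calculator. As a rough cross-check, the sketch below integrates the t density numerically with Simpson's rule, using only the Python standard library. The helper names `t_pdf` and `two_sided_p` are hypothetical, and the step count is an arbitrary choice.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, n=20000):
    """Two-sided P-value: twice the area beyond |t|.

    The area between 0 and |t| is found with composite Simpson's rule
    (n must be even), then subtracted from the half-area 0.5.
    """
    t = abs(t)
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)

# Values from the printout: T = 8.610 with DF = 4.
p_value = two_sided_p(8.610, 4)
print(round(p_value, 4))
```

With 4 degrees of freedom, 8.610 is the table critical value for an upper-tail area of 0.0005, so the two-sided result should land very close to 0.001.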
___ 15. Suppose we are given the following data:
Femur      38   56   59   64   74
Humerus    41   63   70   72   84
We wish to construct a 96% confidence interval for the true slope of the regression line
relating humerus (dependent) to femur (independent) lengths.
What is the value of SEb, the standard error of the slope?
HINT: Do NOT use the formula from the book. Instead, use your calculator in a clever way.
Even though you are not asked to conduct a test, use your calculator to perform a
LinRegTTest to acquire values for t and b. Then use the relationship you know between t,
b, and SEb to find SEb.
a) 13.1985
b) 15.8902
c) 1.9820
d) 0.0751
e) 11.7853
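
The relationship the hint in question 15 points to is t = b / SEb, so SEb = b / t. The sketch below is not the calculator procedure the question asks for. It computes the same quantities directly from the data in plain Python, under the usual least-squares formulas, with illustrative variable names.

```python
import math

femur = [38, 56, 59, 64, 74]      # explanatory variable
humerus = [41, 63, 70, 72, 84]    # response variable
n = len(femur)

x_bar = sum(femur) / n
y_bar = sum(humerus) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in femur)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(femur, humerus))

# Least-squares slope and intercept.
b = sxy / sxx
a = y_bar - b * x_bar

# Regression standard error s, on n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(femur, humerus))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: beta = 0.
se_b = s / math.sqrt(sxx)
t_stat = b / se_b

print(round(b, 4), round(s, 4), round(se_b, 4), round(t_stat, 2))
```

Several of the distractors are intermediate quantities from this calculation (the residual sum of squares and s), which is why the question stresses the relationship between t, b, and SEb.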
Answer Key
1. b
2. c
3. a
4. d
5. d
6. d
7. a
8. a
9. a
10. d
11. d
12. b
13. b
14. b
15. d