Business Statistics 41100
Instructor: Federico M. Bandi
Sample midterm2 (Solutions)
The allotted time is 1 hour and 30 minutes. The exam is divided into three parts. The first and second part
are true-false and multiple choice, respectively. Please answer the true-false and multiple choice questions
on the exam by circling the best answer. There will be no partial credit for these questions. The third part
of the exam consists of two problems. Please answer these problems in the space provided on the exam (you
may use the back of the sheets if necessary). You will get partial credit for these problems provided that
your answers are organized and legible so that your train of thought can be easily followed.
Note: You should answer all questions on the exam. The blue books will not be looked at.
Please print your name in the space provided below and sign.
Panicking is not allowed and will be penalized!
Name:
Please, sign the following pledge: “I pledge my honor that I have not violated the Honor Code during this
examination.”
Signature:
Section            Points
True/False         8
Multiple Choice    18
Question 1         11
Question 2         26
Total              63
True or False (1 point each)
[1] If we converted the X variable from dollars to 100s of dollars, then the slope estimate b1 in regression
analysis would become 100 times larger.
TF
True. Here is one way to see this. We know that
$$b_1 = r_{xy} \times \frac{s_y}{s_x}.$$
Also, we know that the standard deviation of a data set is in the same units as the data set.
Hence, if we divide the data set X by 100 the standard deviation will also be divided by 100
and, consequently, the slope will be multiplied by 100 (since everything else is unchanged).
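The scaling argument can be checked numerically. The following is a minimal sketch on made-up data; the numbers and the helper `ols_slope` are illustrative assumptions, not part of the exam. It fits the same Y against X measured in dollars and in 100s of dollars and compares the two slopes.

```python
def ols_slope(x, y):
    """Least-squares slope b1 = Cov(x, y) / Var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: X in dollars, Y in arbitrary units.
x_dollars = [1200.0, 1500.0, 1800.0, 2400.0, 3100.0]
y = [10.0, 13.0, 14.0, 19.0, 25.0]

# The same X expressed in 100s of dollars.
x_hundreds = [xi / 100 for xi in x_dollars]

slope_dollars = ols_slope(x_dollars, y)
slope_hundreds = ols_slope(x_hundreds, y)
ratio = slope_hundreds / slope_dollars

print(slope_dollars, slope_hundreds, ratio)
```

The ratio of the two slopes should come out as 100 up to floating-point error, whatever data are used.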
[2] If the R2 of a simple linear regression is zero, then the Y and the X variable must be uncorrelated.
TF
True. The R-squared of a regression of Y on X is equal to the squared correlation between
Y and X. Hence, an R-squared equal to zero implies a correlation (and a covariance) equal to
zero.
[3] If the t-statistic t is distributed as a t distribution with 2 degrees of freedom, then P (−1.96 < t <
1.96) > 0.95.
TF
False. The t distribution has more probability in the tails than the standard normal distribution,
so P(−1.96 < t < 1.96) is smaller than 0.95 (0.95 is the standard normal value).
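As a rough numerical check, the t distribution with 2 degrees of freedom happens to have a closed-form CDF, F(t) = 1/2 + t / (2 sqrt(t^2 + 2)), so the central probability can be computed directly. The sketch below uses that formula (a known result brought in here, not taken from the course notes); the function names are illustrative.

```python
import math
from statistics import NormalDist


def t2_central_prob(c):
    """P(-c < T < c) when T has a t distribution with 2 degrees of freedom.

    Follows from the closed-form CDF F(t) = 1/2 + t / (2 * sqrt(t**2 + 2)).
    """
    return c / math.sqrt(c * c + 2)


def normal_central_prob(c):
    """P(-c < Z < c) for a standard normal Z."""
    z = NormalDist()
    return z.cdf(c) - z.cdf(-c)


p_t2 = t2_central_prob(1.96)
p_normal = normal_central_prob(1.96)

print(round(p_t2, 3), round(p_normal, 3))
```

The t(2) probability comes out at roughly 0.81, well below the 0.95 obtained under the standard normal.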
[4] If the assumptions of the SLR model hold, then a histogram of the standardized residuals should look
normal with mean equal to 0 and standard deviation equal to 1.
TF
True. One of the assumptions of the SLR model is that the true residuals are normally
distributed with mean zero and variance σ². Hence, the true standardized residuals should be
normally distributed with mean zero and variance 1. The same applies to the estimated standardized
residuals (but, of course, normality would be an approximation in this case).
[5] It is possible to reject a certain null hypothesis about a certain parameter of interest (the slope in the
SLR model, say) even when the estimated value from the sample is equal to the conjectured value.
TF
False. If the estimated value from the sample is the same as the conjectured value, then
the t statistic is equal to zero, the p-value is equal to 1 and we always fail to reject.
[6] The p-value of a test is the probability of rejecting the null hypothesis when the null hypothesis is
true.
TF
False. The probability of rejecting the null hypothesis when the null hypothesis is true is
the level of the test (5%, 1%, and so on).
[7] It is possible to have a residual plot that shows evidence of both non-linearity and heteroskedasticity.
TF
True. The residuals might have a nonlinear pattern to them as well as increasing dispersion
around some underlying curve.
[8] A Durbin-Watson statistic that is equal to 2 provides evidence in favor of correlation in the residuals.
TF
False. When the DW statistic is equal to 2 we have no evidence of autocorrelation (see
Chapter 5 in the notes.)
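To make the benchmark value of 2 concrete, here is a small sketch that computes the statistic under its usual definition, DW = sum of squared successive residual differences divided by the sum of squared residuals. The exam only names the statistic, so the formula is stated here as an assumption, and the residual series are invented for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residual series (not from the exam).
trending = [1.0, 0.9, 0.8, 0.6, 0.3, -0.1, -0.4, -0.7, -0.9, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)        # positive autocorrelation
dw_alternating = durbin_watson(alternating)  # negative autocorrelation

print(dw_trending, dw_alternating)
```

Slowly drifting residuals push the statistic toward 0 and sign-flipping residuals push it toward 4; values near 2 sit between the two patterns, which is why 2 signals no evidence of autocorrelation.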
Multiple Choice (3 points each)
[1] Consider the following variance-covariance matrix (i.e., variances are on the diagonal, covariances are
off the diagonal) of monthly returns on the S&P 500 index and monthly returns on the Windsor mutual
fund.

            S&P 500       Windsor
S&P 500     0.00230401    0.00215591
Windsor     0.00215591    0.00236580

If the beta of Windsor is the slope estimate of a regression of Windsor on the S&P 500, what is the
Windsor beta?
(a) 0.9112
(b) 0.95
(c) 0.9357
(d) 1
(e) Cannot be computed based on the information given
The formula for the slope coefficient is
$$b_1 = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)}$$
(see Chapter 1 in the notes). Hence,
$$b_1 = \frac{0.00215591}{0.00230401} = 0.9357.$$
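The same arithmetic as a short sketch, using the entries of the variance-covariance matrix given in the question (the variable names are illustrative):

```python
# Entries of the variance-covariance matrix from the question.
var_sp500 = 0.00230401          # variance of the regressor (S&P 500)
cov_windsor_sp500 = 0.00215591  # covariance between Windsor and S&P 500

# Slope of the regression of Windsor on the S&P 500.
beta_windsor = cov_windsor_sp500 / var_sp500

print(round(beta_windsor, 4))
```

Note that the Windsor variance on the diagonal is not needed: only the variance of the regressor enters the slope.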
[2] Consider the following pairs of X and Y values: (2, 5), (3, 7), (4, 9), (5, 12), (10, ??). What value should
?? be for a regression of Y on X to deliver an R2 equal to 1?
(a) 20
(b) 21
(c) 22
(d) No value can work
(e) None of the above
No value can work. For the R2 to be equal to 1 the five pairs should lie on a positively-sloped
line. The first three pairs are on the line Y = 1 + 2X but the fourth one is not (1 + 2 × 5 = 11, not 12). No choice of
?? would make the pairs lie on the same positively-sloped line.
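A quick way to confirm this is to compute the R2 for each candidate value of ??. The sketch below does so with a hand-written `r_squared` helper (an illustrative name) and adds a counterfactual check with the fourth pair changed to (5, 11), which is not part of the question.

```python
def r_squared(x, y):
    """R-squared of a simple regression of y on x (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


x = [2, 3, 4, 5, 10]
known_y = [5, 7, 9, 12]

# R-squared for each candidate answer for the missing Y value.
r2_by_candidate = {c: r_squared(x, known_y + [c]) for c in (20, 21, 22)}

# Counterfactual (not in the question): with (5, 11) instead of (5, 12),
# all points would sit on Y = 1 + 2X once ?? = 21.
r2_counterfactual = r_squared(x, [5, 7, 9, 11, 21])

print(r2_by_candidate, r2_counterfactual)
```

Each candidate gives an R2 close to, but strictly below, 1; only the counterfactual data set reaches 1.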
[3] Consider the following SLR model:
Yi = 1 + 2Xi + εi,
where εi ∼ N(0, 4) i.i.d. The error term εi is independent of Xi for every i. If X = 2, then Y is:
(a) equal to 5
(b) N (5, 4)
(c) N (0, 4)
(d) N (5, 2)
(e) None of the above
The answer is (b). If X = 2, then Y = 5 + ε which implies that E(Y ) = 5 and V (Y ) = 4
since the error term has mean 0 and variance 4. Finally, note that Y is a linear function of
a normal random variable since the error term is normally distributed. Linear functions of
normal random variables are normal, thus Y is N (5, 4).
[4] Which of the following statements is FALSE?
(a) If we are simply interested in predicting Y given X, then non-linearity is a more serious violation of
the standard assumptions than heteroskedasticity
(b) If the estimated value from the sample is equal to 1.0001 and the conjectured value is equal to 1, then
we always fail to reject the assumption
(c) The larger the t-statistic, the smaller the p-value, the more we want to reject
(d) The sample means of Y and X lie on the regression line
(e) It is easier to estimate E(Y |X) than Y given X
(f) None of the above
Point (b) is false. If the standard error is very small, then the t statistic might be large
enough to reject the hypothesis. The intuition is the following. If the sample allows us to
estimate the parameter very well (with a small standard error, that is), then even small deviations from the conjecture might lead to rejections of the hypothesis (since the sample is very
informative about the true parameter). See Chapter 4, Statistical vs. Practical Significance.
[5] A class of 100 students has just taken an exam. The exam consisted of 40 true-false questions. A
diligent teaching assistant has recorded the number of correct answers (Y) and wrong answers (X) for
each student. Subsequently, the diligent (but not so bright) teaching assistant has regressed Y on X.
Which of the following statements is WRONG?
(a) The estimated intercept is 40
(b) The estimated slope is −1
(c) The estimated s value is 0
(d) The correlation between Y and X is 1
(e) The R2 is 1
(f) None of the above
The relationship between correct answers and wrong answers is Y = 40 − X. Hence, the
wrong statement is (d). The correlation between Y and X is −1.
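The claims in (a) through (e) can be verified on any data set that satisfies Y = 40 − X. The sketch below uses a handful of invented wrong-answer counts and a hand-written `fit_line` helper (both illustrative assumptions), and presumes, as the solution does, that every question is answered.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope, correlation, s) for a least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    return intercept, slope, corr, s


# Hypothetical wrong-answer counts for a few students.
wrong = [0, 3, 5, 8, 12, 20, 31]
correct = [40 - w for w in wrong]

intercept, slope, corr, s = fit_line(wrong, correct)
r2 = corr ** 2

print(intercept, slope, corr, s, r2)
```

The fit recovers an intercept of 40, a slope of −1, s equal to 0 and an R2 of 1 (up to floating-point error), while the correlation is −1 rather than 1.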
[6] A common belief in finance is that days with more trading activity also tend to be days with larger
price moves (positive or negative). Intuitively the greater the disagreement about the value of the asset
across investors the greater the trading activity and the greater the price moves. A researcher decides
to measure trading activity by the number of contracts traded (volume). The researcher regresses the
absolute value of returns (absret) on volume and obtains the following regression output.
The regression equation is
absret = −0.0367 + 0.00000115 volume

             Estimate      Std error     T-ratio    P-value
Intercept    −0.0367       0.3342        −0.11      0.913
volume       0.00000115    0.00000028    4.10       ??

Taking into account that the sample is very large, which of the following statements is WRONG?
(a) Statistically the intercept does not play much of a role
(b) The missing p-value is smaller than the reported p-value
(c) An approximate 68% confidence interval for the true slope is 0.00000115 ± 1 ∗ 0.00000028
(d) At the 1% level, we fail to reject the null hypothesis that the true slope is equal to 0
(e) None of the above
The wrong statement is (d). With a very large number of observations (as in the assumptions of the problem) the t cut-off value for a two-sided 1% test is about 2.58 (as in the standard normal
case). Since 4.10 > 2.58, we reject the null.
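The test can be reproduced from the reported estimate and standard error. The sketch below relies on the large-sample assumption stated in the question, so it uses the standard normal in place of the t distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Slope estimate and standard error from the regression output.
estimate = 0.00000115
std_error = 0.00000028

t_ratio = estimate / std_error

# Two-sided p-value under the standard normal approximation:
# P(|Z| > x) = erfc(x / sqrt(2)).
p_value = math.erfc(abs(t_ratio) / math.sqrt(2))

# Two-sided 1% critical value from the standard normal.
critical_1pct = NormalDist().inv_cdf(1 - 0.01 / 2)

reject_at_1pct = abs(t_ratio) > critical_1pct

print(round(t_ratio, 2), p_value, round(critical_1pct, 3), reject_at_1pct)
```

The missing p-value is tiny, far below the intercept's 0.913, which is consistent with statement (b) and with rejecting the null in (d).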
Long Problems
[1] (11 points) The market model that we discussed in class implies that the relationship between the return
on any stock A ($R_t^A$) and the return on the market ($R_t^M$) can be represented using the following SLR
model:
$$R_t^A = \beta_0 + \beta_1 R_t^M + \varepsilon_t, \qquad t = 1, 2, 3, \ldots$$
Assume all the standard assumptions of the SLR model are satisfied (for example, the errors are iid
and normally distributed with mean 0 and variance $\sigma^2$).
Assume you are going to be given the following quantity:
$$b = \frac{R_2^A - R_1^A}{R_2^M - R_1^M}$$
(a) (3 points) Compute the expected value of b, E(b). (Hint: treat the returns on the market as being
non-random. Only the returns on the stock are random and, as always, their randomness is induced
by the error terms through the model.)
As in Chapter 2 (Section 2.5), plug the corresponding values into b first.
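$$b = \frac{R_2^A - R_1^A}{R_2^M - R_1^M} = \frac{\beta_0 + \beta_1 R_2^M + \varepsilon_2 - \beta_0 - \beta_1 R_1^M - \varepsilon_1}{R_2^M - R_1^M} = \beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}.$$
Hence,
$$E(b) = E\left(\beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}\right) = \beta_1 + \frac{E(\varepsilon_2) - E(\varepsilon_1)}{R_2^M - R_1^M} = \beta_1,$$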
by applying the standard formula for the expected value of a linear combination of random
variables.
(b) (2 points) Interpret your result from part (a). One or two sentences will more than suffice.
b is an unbiased estimator of the beta of the stock.
(One can show that it is not as good as the least-squares estimator, but this is another
story...)
(c) (3 points) Compute the variance of b, V (b). (Hint: same as for part (a).)
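$$V(b) = V\left(\beta_1 + \frac{\varepsilon_2 - \varepsilon_1}{R_2^M - R_1^M}\right) = \frac{Var(\varepsilon_2) + Var(\varepsilon_1)}{(R_2^M - R_1^M)^2} = \frac{2\sigma^2}{(R_2^M - R_1^M)^2}$$
by applying the standard formula for the variance of a linear combination of random variables
(the errors are independent, so the covariance term is zero).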
(d) (3 points) What is the probability that b > β 1 , i.e., P (b > β 1 )?
As always, b is normally distributed since ε2 and ε1 are normally distributed. Hence,
$$b \sim N\left(\beta_1, \frac{2\sigma^2}{(R_2^M - R_1^M)^2}\right)$$
and P (b > β 1 ) = 0.5.
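The results of parts (a), (c) and (d) can be checked with a small Monte Carlo experiment. This is a sketch under assumed parameter values: the betas, sigma, the two market returns and the seed are all invented for illustration, and the market returns are held fixed as the hint suggests.

```python
import random

random.seed(0)

# Hypothetical parameter values chosen only for illustration.
beta0, beta1, sigma = 0.01, 1.2, 0.05
r_m1, r_m2 = 0.02, -0.03  # market returns, treated as non-random


def draw_b():
    """Simulate one realization of b = (R2A - R1A) / (R2M - R1M)."""
    r_a1 = beta0 + beta1 * r_m1 + random.gauss(0, sigma)
    r_a2 = beta0 + beta1 * r_m2 + random.gauss(0, sigma)
    return (r_a2 - r_a1) / (r_m2 - r_m1)


draws = [draw_b() for _ in range(100_000)]

sim_mean = sum(draws) / len(draws)
sim_var = sum((d - sim_mean) ** 2 for d in draws) / len(draws)
theory_var = 2 * sigma ** 2 / (r_m2 - r_m1) ** 2
share_above = sum(d > beta1 for d in draws) / len(draws)

print(sim_mean, sim_var, theory_var, share_above)
```

The simulated mean should sit close to beta1, the simulated variance close to 2σ²/(R2M − R1M)², and about half of the draws should exceed beta1.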
[2] (26 points) We are interested in the relation between someone’s GMAT (0-800) and SAT (0-1600)
scores. We have data on 41 GSB students. We use GMAT as the response variable (the regressand)
and SAT as the explanatory variable (the regressor) and obtain the following regression:
gmat = 403.62 + 0.214sat
Table 1
             Estimate    Std error    T-ratio    P-value
Intercept    403.62      51.63        ??         0.000
slope        0.21431     0.03921      5.47       0.000

s = 30.39 and R2 = ??

Table 2
Analysis of Variance
Source            DF    SS
Regression        1     27597
Residual Error    39    ??
Total             40    63620

(a) (2 points) Give an interpretation for the sign and magnitude of the estimated slope coefficient.
A higher SAT score is associated with a higher GMAT score. Theoretically (but only theoretically
given the way SAT and GMAT scores are given) an increase in the SAT score of one point
implies an (average) increase of about a fifth of a point (0.214) in the GMAT score.
(b) (2 points) What is the value of the missing t-ratio?
$$t = \frac{403.62}{51.63} \approx 7.82$$
(c) (3 points) Test the hypothesis that the slope is equal to 0 at the 1% level. (You should be very precise
here.)
P-value = 0.000 (to three decimals) < 0.01 ⇒ reject the hypothesis that the slope is equal to 0.
(d) (3 points) Test the hypothesis that the intercept is equal to 500 at the 5% level. (You should be as
precise as possible here.)
$$|t| = \left|\frac{403.62 - 500}{51.63}\right| \approx 1.87 < \text{about } 2 \Rightarrow \text{fail to reject}$$
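Both intercept tests follow the same recipe: subtract the hypothesized value from the estimate and divide by the standard error. A minimal sketch with the Table 1 numbers is given below; the helper `t_statistic` is an illustrative name and the cut-off of 2 is the rule of thumb used in the solution.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


# Intercept estimate and standard error from Table 1.
intercept, se_intercept = 403.62, 51.63

t_zero = t_statistic(intercept, 0.0, se_intercept)   # missing t-ratio
t_500 = t_statistic(intercept, 500.0, se_intercept)  # test of intercept = 500

# Rule-of-thumb 5% cut-off of about 2.
reject_500 = abs(t_500) > 2.0

print(t_zero, t_500, reject_500)
```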
(e) (2 points) You have a younger cousin whose SAT score is 1400. Use the following output to predict
your younger cousin’s GMAT score. Choose only one interval and explain your choice.

Table 3
Fit       St. Error Fit    95% CI              95% PI
703.66    5.89             (691.76, 715.57)    (641.05, 766.28)

You should choose the 95% PI since your younger cousin is going to take the test only once.
The 95% CI would be an interval for the expected (average) score over several trials. (See
Chapter 4, subsection 4.7, in the notes.)
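As a sanity check on Table 3, the fitted value can be recomputed from the Table 1 coefficients and the widths of the two intervals compared. This is a sketch with illustrative variable names; because the printed coefficients are rounded, the recomputed fit matches the table only to within rounding.

```python
# Coefficients from Table 1 and the cousin's SAT score.
intercept, slope = 403.62, 0.21431
sat = 1400

fit = intercept + slope * sat

# Intervals reported in Table 3.
ci = (691.76, 715.57)  # interval for the expected GMAT score at SAT = 1400
pi = (641.05, 766.28)  # interval for a single new GMAT score

ci_half_width = (ci[1] - ci[0]) / 2
pi_half_width = (pi[1] - pi[0]) / 2

print(fit, ci_half_width, pi_half_width)
```

The prediction interval is several times wider than the confidence interval because it also has to cover the noise in one individual score.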
(f) (3 points) Table 3 above gives you s_fit. What is the standard error of the predicted value (s_pred from
class)?
$$s_{pred}^2 = s_{fit}^2 + s^2 = 5.89^2 + 30.39^2 = 34.69 + 923.55 = 958.24 \Rightarrow s_{pred} \approx 30.96.$$
(g) (3 points) Use your result from part (f) to find the t cut-off value $t_{39,0.025}$.
$$766.28 = 703.66 + t_{39,0.025} \times 30.96 \Rightarrow t_{39,0.025} = \frac{766.28 - 703.66}{30.96} \approx 2.02.$$
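Parts (f) and (g) chain together: the prediction standard error comes from s_fit and s, and the cut-off is then backed out of the upper end of the 95% PI. A short sketch with the numbers from Tables 1 and 3 (variable names are illustrative):

```python
import math

s_fit = 5.89       # standard error of the fit (Table 3)
s = 30.39          # regression standard error (Table 1)
fit = 703.66       # fitted value (Table 3)
pi_upper = 766.28  # upper end of the 95% PI (Table 3)

# s_pred^2 = s_fit^2 + s^2
s_pred = math.sqrt(s_fit ** 2 + s ** 2)

# pi_upper = fit + t_cutoff * s_pred, solved for t_cutoff.
t_cutoff = (pi_upper - fit) / s_pred

print(s_pred, t_cutoff)
```

Using the lower end of the interval instead (fit minus 641.05) should give essentially the same cut-off, since the interval is symmetric around the fit.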
(h) (2 points) Use your result from part (g) to re-run the test in part (d). (You should be very precise
now.)
$$|t| = \left|\frac{403.62 - 500}{51.63}\right| \approx 1.87 < 2.02 \Rightarrow \text{fail to reject}$$
(i) (3 points) What value is the missing term in Table 2?
SSE = 63620 − 27597 = 36023.
(j) (3 points) What value is the missing R2 ?
$$R^2 = \frac{27597}{63620} \approx 43.4\%.$$
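The ANOVA table ties the last two answers together and also lets us cross-check the reported s. The sketch below assumes the usual relation s = sqrt(SSE / residual DF), which the exam does not spell out; variable names are illustrative.

```python
import math

# Entries of Table 2.
ss_regression = 27597
ss_total = 63620
df_residual = 39

ss_residual = ss_total - ss_regression   # missing SS entry
r_squared = ss_regression / ss_total     # share of variation explained
s = math.sqrt(ss_residual / df_residual) # regression standard error

print(ss_residual, r_squared, s)
```

The implied s agrees with the value of 30.39 reported under Table 1, which is a useful consistency check on the two tables.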