SAMPLE MULTIPLE CHOICE QUESTIONS FOR MIDTERM
1.) Suppose the monthly demand for tomatoes (a perishable good) in a small town is random. With
probability 1/2, demand is 50; with probability 1/2, demand is 100. You are the only producer of tomatoes
in this town. Tomatoes sell for a fixed price of $1, cost $0.50 to produce, and can only be sold in the
local market. If you produce 60 tomatoes, your expected profit is:
a) $15
b) $25
c) $45
d) $50
e) none of the above
E(PROFIT) = E(REVENUE – COST) = ½*($50-$30) + ½*($60-$30) = ½*$20 + ½*$30 = $25
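The calculation above can be checked with a short sketch. The names (`demand_dist`, `expected_profit`) are illustrative and not part of the course material; the only inputs are the numbers given in the question.

```python
# Expected profit when a fixed quantity is produced (numbers from question 1).
PRICE = 1.00       # selling price per tomato
UNIT_COST = 0.50   # production cost per tomato
demand_dist = {50: 0.5, 100: 0.5}  # demand -> probability


def expected_profit(quantity):
    """Probability-weighted profit; sales are capped at the quantity produced."""
    total = 0.0
    for demand, prob in demand_dist.items():
        units_sold = min(demand, quantity)
        total += prob * (PRICE * units_sold - UNIT_COST * quantity)
    return total


print(expected_profit(60))  # 25.0
```

The `min(demand, quantity)` term is what makes revenue differ across the two demand states: you cannot sell more than you produced.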
2.) Suppose you have the following information about the cdf of a random variable X, which takes one of
4 possible values:
Value of X    Cdf
1             0.25
2             0.40
3             0.80
4             1.00
Which of the following is/are true?
a) Pr(X = 2) = 0.4
b) E(X) = 2.5
c) Pr(X = 4) = 0.2
d) all of the above
e) none of the above
The cdf gives the cumulative probability of observing a value of X that is less than or equal to a given
value. Thus, there is an 80% chance that you will observe a value of X less than or equal to 3 and a 100%
chance that you will observe a value of X less than or equal to 4. In particular, 0.4 is Pr(X ≤ 2), not
Pr(X = 2), so a) is incorrect.
By subtracting successive cumulative probabilities from one another, you can construct a pdf. Thus,
Pr(X=1) = 0.25, Pr(X=2) = 0.15, Pr(X=3) = 0.40, and Pr(X=4) = 0.20. With the pdf in hand, you can
calculate E(X) = 0.25*1 + 0.15*2 + 0.40*3 + 0.20*4 = 2.55. So b) is incorrect and the only correct
answer is c).
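As a rough check of this reasoning, the sketch below differences the cdf to recover the pdf and then computes E(X). It assumes the cdf equals 1.00 at X = 4, as the explanation states; the variable names are illustrative.

```python
# Recover the pdf from the cdf by differencing, then compute E(X).
values = [1, 2, 3, 4]
cdf = [0.25, 0.40, 0.80, 1.00]  # cdf at X = 4 is 1.00 (assumed from the explanation)

# The first probability equals the first cdf value; the rest are successive differences.
pmf = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]
expected_value = sum(x * p for x, p in zip(values, pmf))

print([round(p, 2) for p in pmf])  # [0.25, 0.15, 0.4, 0.2]
print(round(expected_value, 2))    # 2.55
```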
3.) If the covariance between two random variables X and Y is zero then
a) X and Y are independent
b) Knowing the value of X provides no information about the value of Y
c) E(X) = E(Y) = 0
d) a and b are true
e) none of the above
Remember that independence implies zero covariance but not the other way around, so it cannot be a).
Likewise, b) is the very definition of independence, and c) is just nonsense. So the correct answer is e).
4.) If two random variables X and Y are independent,
a) their joint distribution equals the product of their marginal distributions
b) the conditional distribution of X given Y equals the marginal distribution of X
c) their covariance is zero
d) a and c
e) a, b, and c
Again, this answer follows from the definition of independence given in Lecture 3a: all three statements
hold when X and Y are independent, so the correct answer is e).
5.) Suppose you have a random sample of 10 observations from a normal distribution with mean = 10 and
variance = 2. The sample mean (x-bar) is 8 and the sample variance is 3. The sampling distribution of x-bar has
a) mean 8 and variance 3
b) mean 8 and variance 0.3
c) mean 10 and variance 0.2
d) mean 10 and variance 2
e) none of the above
The correct answer is c.) Note how the mean and variance of the sampling distribution of x-bar are given
by the population quantities and not the sample characteristics.
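One way to see this is a small simulation, sketched below under stated assumptions: the seed and the number of replications are arbitrary choices, not part of the question. Across many repeated samples of size 10, the sample means should centre on 10 with variance near 2/10 = 0.2.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_VAR, N = 10.0, 2.0, 10
REPLICATIONS = 20_000  # hypothetical number of repeated samples

sample_means = []
for _ in range(REPLICATIONS):
    # random.gauss takes the standard deviation, hence the square root.
    sample = [random.gauss(POP_MEAN, POP_VAR ** 0.5) for _ in range(N)]
    sample_means.append(fmean(sample))

print(round(fmean(sample_means), 2))      # close to 10
print(round(pvariance(sample_means), 2))  # close to 2 / 10 = 0.2
```

Nothing about any single sample (such as x-bar = 8 or a sample variance of 3) enters the simulation, which mirrors the point made in the answer.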
6.) If q is an unbiased estimator of Q, then:
a) Q is the mean of the sampling distribution of q
b) q is the mean of the sampling distribution of Q
c) Var[q] = Var[Q] / n where n = the sample size
d) q = Q
e) a and c
We define an unbiased estimator as one for which E(q)-Q = 0 or E(q)=Q. So a) is correct whereas b) is
incorrect. The third statement is incorrect for a number of reasons, one of which is that Q is a constant
and has a zero variance. Finally, d) is nonsense.
7.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the following
is/are false?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a) 2 only
b) 3 only
c) 2 and 3
d) 2, 3, and 4
e) 1, 2, 3, and 4
Since Q simply exists and is a fixed number (a constant), it has no variance. So [1] is true. Again, [2] is
nonsense whereas [3] confuses what is from the sample and what is from the population and is incorrect.
Finally, [4] misinterprets what a confidence interval captures. So d) is the correct answer.
8.) The law of large numbers says that:
a) the sample mean is a biased estimator of the population mean in small samples
b) the sampling distribution of the sample mean approaches a normal distribution as the
sample size approaches infinity
c) the behaviour of large populations is well approximated by the average
d) the sample mean is an unbiased estimator of the population mean in large samples
e) none of the above
The law of large numbers states that as n approaches infinity, q approaches Q. It definitely has some
implications for the likely bias of an estimator, but it is really a statement about the consistency
of an estimator (where consistency is defined by the condition that q approaches Q in the limit).
None of a) through d) says this (b, for instance, describes the central limit theorem), so the correct
answer is e).
9.) Suppose you draw a random sample of n observations, X1, X2, …, Xn, from a population with unknown
mean μ. Which of the following estimators of μ is/are biased?
a) the first observation you sample, X1
b) \( \sqrt{\bar{X}^2} \)
c) \( \sqrt{\bar{X}^2 - s^2 / n} \)
d) b and c
e) a, b, and c
We have seen before that a) is actually unbiased. The estimator in b) seems like it might be unbiased since by
taking the square root of a square you appear to arrive back at X-bar. A problem comes up, though, if your
original X-bar is negative, say -10. Squaring -10 and then taking the square root, you arrive at 10, so b) is
really the absolute value of X-bar. This is problematic and will lead to b) being a biased estimator because
we cannot say that its expected value is indeed μ. Likewise for c), which compounds the problems of b) by
subtracting s^2/n inside the square root, an operation which will also impart bias. So the correct answer is d).
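The bias in b) can be illustrated by exact enumeration. The sketch below assumes a hypothetical two-point population (values -1 and +1, equally likely, so μ = 0) and samples of size 2; these numbers are chosen only for illustration and do not come from the question.

```python
from itertools import product
from math import sqrt

population = [-1.0, 1.0]  # hypothetical population, each value equally likely
n = 2                     # hypothetical sample size
mu = sum(population) / len(population)  # population mean, 0.0


def mean(xs):
    return sum(xs) / len(xs)


# Every ordered sample of size n is equally likely, so averaging over all of
# them gives the exact expected value of each estimator.
samples = list(product(population, repeat=n))

expected_first_obs = mean([s[0] for s in samples])
expected_sqrt_sq = mean([sqrt(mean(s) ** 2) for s in samples])

print(mu, expected_first_obs, expected_sqrt_sq)  # 0.0 0.0 0.5
```

The first observation has expected value equal to μ, while the square root of the squared sample mean (that is, its absolute value) has expected value 0.5 in this example, so it is biased.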
10.) The significance level of a test is the probability that you:
a) reject the null when it is true
b) fail to reject the null when it is false
c) reject the null when it is false
d) fail to reject the null when it is true
e) none of the above
This is simply the definition of the significance level of a test.
11.) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following is/are true?
a) the probability of a Type I error is 0.05
b) the probability of a Type I error is 0.025
c) the t statistic for this test has a t distribution with n-1 degrees of freedom
d) a and c
e) b and c
Again, a) follows from the definition of the significance level of a test. The second option is eliminated as
it contradicts a). The third option draws from Lecture 4, Homework #1, and Tutorial #3 and is also true, so
the correct answer is d).
12.) Suppose [L(X), U(X)] is a 95% confidence interval for a population mean. Which of the following
is/are true?
a) \( \Pr[L(X) \le \bar{X} \le U(X)] = 0.90 \)
b) \( \Pr[L(X) \le \bar{X} \le U(X)] = 0.95 \)
c) \( \Pr[\bar{X} < L(X)] + \Pr[U(X) < \bar{X}] = 0.05 \)
d) a and c
e) none of the above
The key here is that X-bar is not the population mean, so that our confidence interval has nothing to say
about it. Therefore, e) is the correct answer.
13.) Which of the following is a linear regression model:
a) \( Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i \)
b) \( \log(Y_i) = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i \)
c) \( Y_i = \beta_0 + \beta_1 e^{X_i} + \varepsilon_i \)
d) all of the above
e) none of the above
The general rule is that a regression model is linear if it is linear in its parameters. If you can substitute
a generic expression for your independent variables (but not your parameters) so that the model looks like
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \), then it is a linear regression model.
So in the first case, you can substitute Z for X-squared; in the second case, you can substitute Z for the
log of X; and in the third case, you can substitute Z for the exponential of X. Therefore, they are all
acceptable linear regression models, and the correct answer is d).
14.) In the linear regression model, the stochastic error term:
a) measures the difference between the dependent variable and its predicted value
b) measures the difference between the independent variable and its predicted value
c) is unbiased
d) a and c
e) none of the above
The stochastic error term measures the difference between Y and the conditional expectation of Y. So a)
and b) are incorrect, whereas c) is nonsense (the error term is not an estimator). The correct answer is e).
15.) In the linear regression model, the least squares estimator
a) minimizes the sum of squared residuals
b) is unbiased
c) is most efficient among the class of linear estimators
d) maximizes the value of R^2
e) all of the above
We know by definition that the OLS estimator minimizes the sum of squared residuals (thus, it produces
the “least squares”). We have also noted that it has the desirable property of being both unbiased and
most efficient. Finally, we have seen in Lecture 7b that since it minimizes the sum of squared residuals (or
in other words, RSS) it automatically maximizes the value of R^2. So, the answer must be e).
16.) Suppose that in the simple linear regression model Yi = β0 + β1Xi + εi on 100 observations, you
calculate that R^2 = 0.5, the sample covariance of X and Y is 10, and the sample variance of X is 15. Then
the least squares estimator of β1 is:
a) not calculable using the information given
b) 1/3
c) 1 / 3
d) 2/3
e) none of the above
In Top Hat Monocle, we saw how the least squares estimator of beta-one in this case of a single
independent variable can be expressed as the ratio of the covariance of X and Y to the variance of X.
Thus, 10/15 = 2/3 and the correct answer is d).
17.) Suppose upon running a regression, EViews reports a value of the explained sum of squares as 1648
and an R^2 of 0.80. What is the value of the residual sum of squares in this case?
a) 0
b) 412
c) 1318.4
d) unknown as it is incalculable
e) none of the above
Since R^2 is defined as ESS/TSS = (TSS – RSS)/TSS = 1 – RSS/TSS, we can solve for RSS by substitution.
That is, 0.80 = 1648/TSS which implies that TSS = 2060 and 0.80 = 1 – RSS/2060 which implies that
RSS = 412. So, the correct answer is b).
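The substitution can be written as a small helper. The function name `residual_sum_of_squares` is made up for this sketch; it uses only the identities R^2 = ESS/TSS and TSS = ESS + RSS quoted above.

```python
def residual_sum_of_squares(ess, r_squared):
    """Recover RSS from ESS and R^2 using R^2 = ESS / TSS and TSS = ESS + RSS."""
    tss = ess / r_squared
    return tss - ess


rss = residual_sum_of_squares(ess=1648, r_squared=0.80)
print(round(rss, 2))  # 412.0
```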
18.) In the linear regression model, adjusted R^2 measures
a) the proportion of variation in Y explained by X
b) the proportion of variation in X explained by Y
c) the proportion of variation in Y explained by X, adjusted for the number of independent
variables
d) the proportion of variation in X explained by Y, adjusted for the number of independent
variables
e) none of the above
This is simply the definition of adjusted R-squared.
19.) In the linear regression model, the degrees of freedom
a) is equal to the number of observations (n) minus 1
b) affects the precision of the coefficient estimates
c) affects the value of the coefficient estimates
d) all of the above
e) none of the above
We know that the degrees of freedom in the linear regression model will actually be equal to n – k – 1
where k equals the number of independent variables (that is, the number of slope coefficients), so a) is
incorrect. We also know that the value of the coefficient estimates should not depend on the degrees of
freedom, so c) is incorrect. This leaves us with b), which is a claim made in Lecture 8.
20.) In the Capital Asset Pricing Model (CAPM),
a) β measures the sensitivity of the expected return of a portfolio to systematic risk
b) β measures the sensitivity of the expected return of a portfolio to specific risk
c) β is greater than one
d) α is less than zero
e) R^2 is meaningless
This is simply the definition of beta in the CAPM.
SAMPLE SHORT ANSWER QUESTIONS FOR MIDTERM
1.) Suppose the monthly demand (x) for a perishable good is a random variable that takes one of six
possible values. The pdf of monthly demand is f(x):
x       f(x)
100     0.10
200     0.10
300     0.20
400     0.35
500     0.20
600     0.05
The good sells for a fixed price of $15 per unit, and production costs are $10 per unit. Therefore, the firm
earns $5 profit on each unit sold and loses $10 on each unit that goes unsold. If the firm brings 400 units
to market, what is the expected profit? What is the variance of profit?
Note: expected revenue is not the same thing as expected demand times price.
It is constrained by the fact you can never sell more than 400 units, no matter what the demand.
E(PROFIT) = E(REVENUE - COST)
E(REVENUE - COST) = 0.10*($15*100 - $4000) + 0.10*($15*200 - $4000)
+ 0.20*($15*300 - $4000) + 0.35*($15*400 - $4000)
+ 0.20*($15*400 - $4000) + 0.05*($15*400 - $4000)
E(REVENUE - COST) = 0.10*(-$2500) + 0.10*(-$1000) + 0.20*($500) + 0.35*($2000)
+ 0.20*($2000) + 0.05*($2000)
E(REVENUE - COST) = -$250 - $100 + $100 + $700 + $400 + $100 = $950
Equivalently,
E(REVENUE - COST) = E(REVENUE) – E(COST) = E(REVENUE) – COST(Q=400)
E(REVENUE) – $4000 = 0.10*($15*100) + 0.10*($15*200) + 0.20*($15*300) + 0.35*($15*400)
+ 0.20*($15*400) + 0.05*($15*400) - $4000
E(REVENUE) – $4000 = $150 + $300 + $900 + $2100 + $1200 + $300 - $4000 = $950
Var(PROFIT) = 0.10*(-$2500-$950)^2 + 0.10*(-$1000-$950)^2 + 0.20*($500-$950)^2
+ 0.35*($2000-$950)^2 + 0.20*($2000-$950)^2 + 0.05*($2000-$950)^2
Var(PROFIT) = 1190250 + 380250 + 40500 + 385875 + 220500 + 55125 = 2272500 (in squared dollars)
Note: the standard deviation is a much more reasonable and easily interpreted number, $1507.48.
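The mean, variance, and standard deviation of profit can be reproduced with the sketch below. The helper name `profit` and the dictionary layout are illustrative choices; the prices, cost, and pdf are taken from the question.

```python
from math import sqrt

PRICE, UNIT_COST, QUANTITY = 15, 10, 400
demand_pmf = {100: 0.10, 200: 0.10, 300: 0.20, 400: 0.35, 500: 0.20, 600: 0.05}


def profit(demand):
    """Revenue on units actually sold minus the cost of all units brought to market."""
    return PRICE * min(demand, QUANTITY) - UNIT_COST * QUANTITY


mean_profit = sum(p * profit(d) for d, p in demand_pmf.items())
var_profit = sum(p * (profit(d) - mean_profit) ** 2 for d, p in demand_pmf.items())
sd_profit = sqrt(var_profit)

print(round(mean_profit, 2), round(var_profit, 2), round(sd_profit, 2))
# 950.0 2272500.0 1507.48
```

The variance is measured in squared dollars, which is why the standard deviation is the easier number to interpret.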
2.) Suppose the price of a stock X is a random variable. On any day, its value may increase, decrease, or
not change at all. The distribution of daily price changes is as follows:
Price Change ($)    Probability
-1.00               0.283
 0.00               0.25
 0.50               ?
 1.00               0.10
a) What is the probability of a $0.5 increase in price?
b) Draw the pdf and cdf of price changes.
c) What is the expected price change?
d) What is the variance of the price changes?
e) Suppose the stock’s price today is $10. What is the expected value of tomorrow’s price?
What is its variance?
a) Pr($0.5) = 1 - 0.283 - 0.25 - 0.10 = 0.367
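b) First, the pdf:
[Figure: pdf of price changes. Vertical axis: probability, from 0.0 to 1.0. Horizontal axis: price change
($), at -1.00, 0.00, 0.50, and 1.00.]
Now, the cdf:
[Figure: cdf of price changes. Vertical axis: cumulative probability, from 0.0 to 1.0. Horizontal axis:
price change ($), at -1.00, 0.00, 0.50, and 1.00.]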
c) E(Price change) = -1.00*0.283 + 0.00*0.25 + 0.50*0.367 + 1.00*0.10
E(Price change) = -0.283 + 0.00 + 0.1835 + 0.10 = $0.0005
d) Var(Price change) =
(-1.00-0.0005)^2*0.283 + (0.00-0.0005)^2*0.25 + (0.50-0.0005)^2*0.367 + (1.00-0.0005)^2*0.10
Var(Price change) = 0.2833 + 0.0000 + 0.0916 + 0.0999 = 0.4748 (in squared dollars)
Again, the variance dwarfs the mean.
e) For the expected value:
E(Tomorrow’s price) = E($10 + price change) = E($10) +E(price change) = $10.0005
For the variance:
Var(Tomorrow’s price) = (9.00-10.0005)^2*0.283 + (10.00-10.0005)^2*0.25 + (10.50-10.0005)^2*0.367
+ (11.00-10.0005)^2*0.10
Var(Tomorrow’s price) = 0.2833 + 0.0000 + 0.0916 + 0.0999 = 0.4748 (in squared dollars)
Note: this is exactly what we had in part d) as adding a constant changes nothing about the underlying
variance.
Now suppose the distribution above only applies when the weather is sunny. When the weather is rainy,
the distribution is:
Price Change ($)    Probability
-1.00               0.50
 0.00               0.20
 0.50               0.20
 1.00               0.10
f) What is the expected price change on a rainy day?
g) Suppose the probability of rain is 0.4, and the probability of sun is 0.6. What is the
expected price change?
f) E(Price change | Rainy day) = -1.00*0.50 + 0.00*0.20 + 0.50*0.20 + 1.00*0.10
E(Price change | Rainy day) = -0.50 + 0.00 + 0.10 + 0.10 = -$0.30
g) From the law of iterated expectations, an unconditional expectation is just a weighted average of
conditional expectations where the weights are the probabilities of outcomes on which we are
conditioning, so
E(Price change) = E(Price change | sunny day) * Pr(sunny day) +
E(Price change | rainy day) * Pr(rainy day)
E(Price change) = $0.0005 * 0.6 + (-$0.30 * 0.4) = -$0.1197
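The weighted-average step can be sketched in a few lines. The helper `expectation` and the list names are illustrative; the sunny-day probability of a $0.50 change (0.367) is the value derived in part a).

```python
price_changes = [-1.00, 0.00, 0.50, 1.00]
sunny_probs = [0.283, 0.25, 0.367, 0.10]  # 0.367 comes from part a)
rainy_probs = [0.50, 0.20, 0.20, 0.10]
PROB_SUN, PROB_RAIN = 0.6, 0.4


def expectation(values, probs):
    """Probability-weighted average of the values."""
    return sum(v * p for v, p in zip(values, probs))


e_sunny = expectation(price_changes, sunny_probs)
e_rainy = expectation(price_changes, rainy_probs)

# Law of iterated expectations: weight each conditional mean by the
# probability of the weather state it is conditioned on.
e_overall = PROB_SUN * e_sunny + PROB_RAIN * e_rainy

print(round(e_sunny, 4), round(e_rainy, 4), round(e_overall, 4))
# 0.0005 -0.3 -0.1197
```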
3.) Suppose you collect the following data that are a random sample from a N(μ, σ^2) population:
4.37 6.99 7.85 2.60 3.34 5.94 4.21 5.99 8.53 4.92
a) Compute the t-statistic for testing the hypothesis:
H0 : μ = 4
H1 : μ ≠ 4
b) What is the sampling distribution of the test statistic you computed in part a)?
c) Can you reject the null hypothesis of part a at the 5% level of significance? Explain.
a) We need to form
\[ T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1} \]
First, X-bar = (4.37 + 6.99 + 7.85 + 2.60 + 3.34 + 5.94 + 4.21 + 5.99 + 8.53 + 4.92) / 10
X-bar = 5.4740
Next, s^2 = 1/9 * [(4.37 – 5.474)^2 + (6.99 – 5.474)^2 + (7.85 – 5.474)^2 + (2.60 – 5.474)^2
+ (3.34 – 5.474)^2 + (5.94 – 5.474)^2 + (4.21 – 5.474)^2 + (5.99 – 5.474)^2
+ (8.53 – 5.474)^2 + (4.92 – 5.474)^2]
s^2 = 3.7448 which implies s = 1.9351 and
\[ T = \frac{5.4740 - 4.00}{1.9351 / \sqrt{10}} = \frac{1.4740}{1.9351 / \sqrt{10}} = 2.4088 \sim t_{n-1} \]
b) The sampling distribution is defined as the set of possible values that the statistic might take, and the
probabilities associated with each of them. It measures uncertainty over the possible value that the
statistic might take in repeated samples from the same population. In this case, the sampling distribution
is simply t_{n-1} = t_9, that is, a t distribution with 9 degrees of freedom.
c) The critical value for a t distribution with 9 degrees of freedom and 0.05 level of significance in the
presence of a two-sided alternative is equal to 2.262. Since the absolute value of our test statistic,
2.4088, exceeds 2.262, we can reject the null that μ = 4 at the 5% level.
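The steps in parts a) and c) can be sketched with the standard library. The critical value 2.262 is copied from the text rather than computed, and the variable names are illustrative. Printed values are rounded coarsely, so they agree with the hand calculation up to rounding.

```python
from math import sqrt
from statistics import mean, stdev

data = [4.37, 6.99, 7.85, 2.60, 3.34, 5.94, 4.21, 5.99, 8.53, 4.92]
mu_0 = 4.0              # value of the mean under the null hypothesis
critical_value = 2.262  # two-sided 5% critical value for t with 9 df (from the text)

n = len(data)
x_bar = mean(data)
s = stdev(data)  # sample standard deviation, uses the n - 1 denominator

t_stat = (x_bar - mu_0) / (s / sqrt(n))
reject_null = abs(t_stat) > critical_value

print(round(x_bar, 3), round(t_stat, 2), reject_null)  # 5.474 2.41 True
```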
4.) Suppose you have a random sample of 100 SFU students. In response to a short survey, each student
reported the usual number of hours per week that he/she spent working off-campus (Xi), and the usual
number of hours per week that he/she spent engaged in social activities (Yi). The university has hired you
to analyze these data. Their main interest is the total number of hours per week that SFU students engage
in non-academic activities. Define a new variable, Zi = Xi + Yi, to measure the number of hours that SFU
students engage in non-academic activities. Suppose that in the population of SFU students:
EX i    X , EYi   Y ,Var X i    X2 ,Var Yi    Y2 , CovX i , Yi    XY .
a) The university has asked you to estimate the average number of hours that SFU students
engage in non-academic activities, i.e., E[Zi] = μz. Your roommate, who has already taken BUEC
333, says “That’s easy! Just take the first person in your sample. Their value, Z1, is an unbiased
estimator of μz.” Is your roommate right? Explain.
b) What is the variance of your roommate’s estimator in part a)?
c) Give a more efficient, but still unbiased, estimator of μz. Show that it is unbiased and that it
is more efficient than your roommate’s estimator.
d) What is the variance of the sampling distribution of your estimator in part c)? Explain
what the variance of the sampling distribution measures.
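a) Yes, your roommate is right. Since E(V + W) = E(V) + E(W), it follows that
E(Zi) = E(Xi + Yi) = E(Xi) + E(Yi) = μx + μy = μz. And since E(Zi) = E(Z1), E(Z1) = μz.
b) A good place to start is to remember that
\[ \mathrm{Var}(\bar{X}) = \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{\sigma^2}{n}. \]
Using Z1 is just a special case of X-bar where n = 1, so \( \mathrm{Var}(Z_1) = \sigma^2 \), where σ^2 here
denotes the variance of Zi, that is, \( \sigma^2 = \sigma_X^2 + \sigma_Y^2 + 2\sigma_{XY} \).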
c) If Z1/1 is unbiased, then it stands to reason that (Z1 + Z2)/2 is unbiased as well since
E((Z1 + Z2)/2) = ½*(E(Z1) + E(Z2)) = ½*(μz + μz) = μz.
Furthermore,
\[ \mathrm{Var}\left( \frac{Z_1 + Z_2}{2} \right) = \frac{\sigma^2}{2} < \sigma^2. \]
d) See above. The variance of the sampling distribution measures the dispersion of the sample statistic of
interest and represents the fact that different samples drawn from the same population will necessarily
generate different values of the sample statistic as different observations take different values.
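To make the efficiency comparison concrete, the sketch below plugs in hypothetical population values (variances of 4 and 9 and a covariance of 1.5, none of which come from the question). It relies on Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) and on the variance of a mean of n independent draws being the single-draw variance divided by n.

```python
def var_of_sum(var_x, var_y, cov_xy):
    """Variance of Z = X + Y."""
    return var_x + var_y + 2 * cov_xy


def var_of_sample_mean(var_z, n):
    """Variance of the average of n independent draws of Z."""
    return var_z / n


# Hypothetical population values, chosen only for illustration.
var_z = var_of_sum(var_x=4.0, var_y=9.0, cov_xy=1.5)

print(var_z)                           # 16.0  (roommate's estimator, Z1)
print(var_of_sample_mean(var_z, 2))    # 8.0   (average of Z1 and Z2)
print(var_of_sample_mean(var_z, 100))  # 0.16  (average of all 100 students)
```

Under these assumed numbers, every extra observation averaged in lowers the variance while leaving the estimator unbiased, so the mean of the full sample is the most efficient of the three.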
5.) Suppose we have a linear regression model with one independent variable and no intercept:
Yi = βXi + εi
a) Verbally explain the steps necessary to derive the least squares estimator (hint: this should entail
four distinct steps).
The least squares estimator seeks to minimize the sum of the squared residuals from the estimating
equation.
1.) Thus, we first have to define our residual as the difference between that which is observed and that
which is predicted by the regression. (In this way, the residual is best thought of as a prediction
error, that is, something we would like to make as small as possible. Because these residuals will
likely be both positive and negative, simply considering their sum is unsatisfactory as this will
likely be equal to zero. A better way forward is to consider the sum of the squared “prediction
errors” which will definitely not be zero and which will penalize us for making big errors.)
2.) Next, we need to define a minimization problem.
3.) We must take the derivative of the sum of squared residuals with respect to beta-hat and set it
equal to zero.
4.) Finally, we must solve for beta-hat.
b) Formally derive an expression for this estimator given your answer in part a).
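\[ e_i = Y_i - \hat{\beta} X_i \]
\[ \min_{\hat{\beta}} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{\beta} X_i \right)^2 = \sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n} 2 Y_i \hat{\beta} X_i + \sum_{i=1}^{n} \left( \hat{\beta} X_i \right)^2 \]
This allows us to derive the following first order condition:
\[ \frac{\partial \sum_{i=1}^{n} e_i^2}{\partial \hat{\beta}} = -2 \sum_{i=1}^{n} X_i Y_i + 2 \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0 \]
\[ -\sum_{i=1}^{n} X_i Y_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0 \]
\[ \hat{\beta} \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i \]
\[ \hat{\beta} = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2} \]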
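The estimator for the no-intercept model, the sum of Xi*Yi divided by the sum of Xi squared, can be checked numerically. The data below are made up for illustration, and the function names are not from the course material. The sketch also confirms that nudging the estimate in either direction raises the sum of squared residuals.

```python
def ols_slope_no_intercept(xs, ys):
    """Least squares slope for Y = beta * X + error (no intercept)."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / sum_xx


def sum_squared_residuals(beta, xs, ys):
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

beta_hat = ols_slope_no_intercept(xs, ys)
print(round(beta_hat, 4))  # 1.99

# The sum of squared residuals is larger on either side of beta_hat.
ssr_at_min = sum_squared_residuals(beta_hat, xs, ys)
print(ssr_at_min < sum_squared_residuals(beta_hat + 0.01, xs, ys))  # True
print(ssr_at_min < sum_squared_residuals(beta_hat - 0.01, xs, ys))  # True
```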