BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
SECOND SEMESTER – 2007-2008
Course No: ECON C342/FIN C332/MGTS C443
Course Title: ECONOMETRICS
Date: 03/05/2008
Duration: 180 min
Weightage: 40%
COMPREHENSIVE EXAMINATION
----------------------------------------------------------------------------------------------------------------------------
NOTE:
• This paper consists of two parts: PART A (Closed Book) and PART B (Open Book).
• PART A: 20 marks and PART B: 20 marks. After completing PART A, attempt PART B.
• Suggested time for PART A: 90 min.
----------------------------------------------------------------------------------------------------------------------------
PART A (CLOSED BOOK)
20 MARKS
Instructions:
• Write your answers on the separate sheet provided. For multiple-choice questions, choose the
best alternative and write the corresponding letter (A, B, C or D) on the answer
sheet. Corrections, overwriting and illegible answers are invalid.
• Questions 1-28 carry equal marks; Questions 29, 30 and 31 carry 1.0 mark each; Question 32
carries 3 marks.
----------------------------------------------------------------------------------------------------------------------------
1. An industrial psychologist has a theory that day-shift workers are, in general, more
productive than night-shift workers. To test his theory, he takes a sample of 50 day-shift
workers and an independent sample of 40 night-shift workers and records the output
for one shift for each worker. (Critical value from table: t = 1.645.)

                      Day shift   Night shift
Sample size           50          40
Average output        58.1        51.6
Standard deviation    12.6        16.8

(To save you some calculations, the pooled standard deviation for the two groups is
14.61.) Construct a 90% confidence interval for the difference between the mean
output for day-shift workers and the mean output for night-shift workers.
A. (1.4, 11.6)
B. (0.4, 12.6)
C. (-17.5, 30.5)
D. (-22.1, 35.1)
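As a rough illustration of the kind of calculation Question 1 calls for, the sketch below builds a pooled two-sample confidence interval from the figures in the table. The variable names are illustrative and not part of the exam; the pooled standard deviation and the critical value 1.645 are the ones quoted in the question.

```python
import math

# Figures from the Question 1 table; variable names are illustrative.
n_day, n_night = 50, 40
mean_day, mean_night = 58.1, 51.6
pooled_sd = 14.61   # given in the question
t_crit = 1.645      # given in the question (90% two-sided)

# Standard error of the difference in means under a pooled variance.
se_diff = pooled_sd * math.sqrt(1 / n_day + 1 / n_night)

diff = mean_day - mean_night
lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff
print(f"90% CI for the difference in means: ({lower:.1f}, {upper:.1f})")
```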
2. To reduce the width of a confidence interval for the difference between two population
means:
A. Reduce the sample sizes taken from both populations, or reduce the confidence level
B. Reduce the sample sizes taken from both populations, or increase the confidence level
C. Increase the sample sizes taken from both populations, or reduce the confidence level
D. Increase the sample sizes taken from both populations, or increase the confidence level
(The next 3 questions are based on the following information.)
Employees in a large company are entitled to 15-minute coffee breaks. A random
sample of 10 employees was drawn from the population of all employees, and the length
of coffee break for each of the sampled employees was measured. The mean coffee
break length for the employees in the sample is 16.8 minutes, and the sample standard
deviation is 2.2 minutes. A human resources manager is interested in whether
employees tend to be taking longer or shorter coffee breaks than they are entitled to,
and she performs a hypothesis test to examine this question.
3. Which of the following would be the most appropriate null hypothesis for this test?
A. H0: $\mu = 15.0$
B. H0: $\bar{X} = 16.8$
C. H0: $\mu = 16.8$
D. H0: =2.2
4. The appropriate test statistic and approximate (two-tailed) p-value for the hypothesis
test are: (From the t table: 2.5% of the area lies above t = 2.262, 1% lies above t = 2.821, etc.)
A. Test statistic: t = 2.59; p-value is between .02 and .05
B. Test statistic: t = 2.59; p-value is less than .01
C. Test statistic: t = 7.64; p-value is between .02 and .05
D. Test statistic: t = 7.64; p-value is less than .01
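A minimal sketch of the one-sample t calculation behind Question 4, using the coffee-break figures given above. The variable names are illustrative; the two table values are the ones quoted in the question, and the p-value is only bracketed by them rather than computed exactly.

```python
import math

# Figures from the coffee-break scenario; names are illustrative.
n = 10
sample_mean = 16.8
sample_sd = 2.2
mu_0 = 15.0   # entitled break length under the null hypothesis

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

# Upper-tail table values quoted in the question (2.5% and 1%).
t_025, t_01 = 2.262, 2.821

# If |t| lies between the two table values, the two-tailed p-value
# lies between 2 * 0.01 = 0.02 and 2 * 0.025 = 0.05.
p_between_02_and_05 = t_025 < abs(t_stat) < t_01
print(round(t_stat, 2), df, p_between_02_and_05)
```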
5. What would be the most appropriate conclusion from this hypothesis test?
A. There is no evidence that the mean coffee break length in the population is greater than
15 min.
B. There is strong evidence that the mean coffee break length in the population is less than
15 min.
C. There is no evidence that the mean coffee break length in the sample is greater than 15
min.
D. There is strong evidence that the mean coffee break length in the population is greater
than 15 min.
6. Which of the following is/are true about p-values?
A. A large p-value means that there is a lot of evidence against the null hypothesis.
B. As the test statistic t gets further away from 0, the p-value gets smaller.
C. If the correlation coefficient is negative, then the p-value for the slope of the regression
line is also negative.
D. All of the above are true about p-values.
7. Which of the following best summarizes the distinction between statistical significance
and practical significance?
A. Statistical significance and practical significance both refer to the importance of an
observed difference between sample means; but practical significance only holds when
the p-value is less than 0.05.
B. Statistical significance refers to the sample, while practical significance refers to the
population.
C. Statistical significance refers to the ability to conclude that an effect observed in a sample
is likely to also hold in the population; practical significance refers to the size and
importance of the effect.
D. Statistical significance is most likely when sample sizes are small, while practical
significance is most likely when sample sizes are large.
(The next 5 questions are based on the following information.)
In this problem we consider an analysis of the number of active physicians in a city as a
function of the city's population and the region of the United States that the city is in.
The sample consists of data on 141 cities in the United States, and the variables are
defined as follows:
• pop = city population (in thousands)
• doctors = number of professionally active physicians in the city
• Region dummy variables are defined for 4 regions: East, Central, South, and West.
    east    = 1 if city in the East region, 0 otherwise
    central = 1 if city in the Central region, 0 otherwise
    south   = 1 if city in the South region, 0 otherwise
R2 = 0.9551    R2 (Adjusted) = 0.9538    SSE = 56,950,000    Residual SD = 647.1

             Coefficients   Standard Error   t Stat   P-Value
Intercept    -255           130.2            -1.96    0.052
pop          2.3            0.043            53.5     0.000
east         -36            174.7            -0.21    0.836
central      -327           164.1            -1.99    0.048
south        -83            152.8            -0.54    0.590

8. What is the predicted number of doctors in a city with a population of 500,000 in the
West region?
A. 568
B. 823
C. 895
D. 1150
9. What is the equation for the regression line predicting number of doctors from
population for the South region?
A. Doctors = -338 + 2.3 pop
B. Doctors = -255 + 83 pop
C. Doctors = -172 + 2.3 pop
D. Doctors = -255 + 2.3 pop
10. Based on this model, in which region is the slope of the regression line relating doctors
to population the steepest?
A. The West region has the steepest regression line.
B. The Central region has the steepest regression line.
C. The East region has the steepest regression line.
D. The regression line has the same slope in all four regions.
11. To test whether the whole model is at all useful, we perform a hypothesis test of whether
the population coefficients for the four independent variables (pop, east, central, and
south) are all equal to 0. What is the test statistic, approximate critical value, and
conclusion for this hypothesis test? (Use alpha=0.05)
A. Test statistic: F = 723; approximate critical value: F* = 2.45; conclusion: Reject H0
B. Test statistic: t = 53.5; approximate critical value: t* = 1.98; conclusion: Reject H0
C. Test statistic: F = 2862; approximate critical value: F* = 3.92; conclusion: Don't Reject H0
D. Test statistic: t = -0.54; approximate critical value: t* = 1.98; conclusion: Don't Reject H0
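For reference, an overall-significance F statistic can be obtained from R2 alone. The sketch below applies the standard formula F = (R2 / k) / ((1 - R2) / (n - k - 1)) to the reported output; the variable names are illustrative, and the critical value is left to the tables.

```python
# Values reported in the regression output above; names are illustrative.
r_squared = 0.9551
n = 141   # cities in the sample
k = 4     # slope coefficients: pop, east, central, south

df_numerator = k
df_denominator = n - k - 1

# F statistic for H0: all k slope coefficients equal zero.
f_stat = (r_squared / df_numerator) / ((1 - r_squared) / df_denominator)
print(round(f_stat), df_numerator, df_denominator)
```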
12. Now consider a different coding scheme for the dummy variables specifying the region:
• D1 = 1 if city in the East region, 0 otherwise
• D2 = 1 if city in the Central region, 0 otherwise
• D3 = 1 if city in the West region, 0 otherwise
What is the regression equation predicting doctors from the variables pop, D1, D2 and
D3?
A. Doctors = -338 + 2.3 pop + 47 D1 - 244 D2 + 83 D3
B. Doctors = -255 + 2.3 pop - 36 D1 - 327 D2 - 83 D3
C. Doctors = -255 + 2.3 pop + 36 D1 + 327 D2 + 83 D3
D. Doctors = 255 - 2.3 pop - 36 D1 - 327 D2 - 83 D3
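The dummy-variable questions above turn on how the intercept and region effects shift when the omitted (base) category changes. The sketch below is one way to check that arithmetic, assuming the coefficients from the reported output with West as the original base category; the helper names `predict` and `rebase` are hypothetical.

```python
# Coefficients from the reported output; West is the omitted base region.
intercept = -255.0
slope_pop = 2.3
region_effect = {"east": -36.0, "central": -327.0, "south": -83.0, "west": 0.0}


def predict(region, pop_thousands):
    """Fitted number of doctors for a city (population in thousands)."""
    return intercept + slope_pop * pop_thousands + region_effect[region]


def rebase(new_base):
    """Re-express the intercept and region effects relative to a new base."""
    shift = region_effect[new_base]
    new_intercept = intercept + shift
    new_effects = {
        region: effect - shift
        for region, effect in region_effect.items()
        if region != new_base
    }
    return new_intercept, new_effects


new_intercept, new_effects = rebase("south")
print(new_intercept, new_effects)
print(predict("west", 500))
```

Changing the base category only relabels the same fitted model, so the predictions for any city are unchanged.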
13. In the graph below, the solid line is the true population regression line and the circles
are observations in the sample. Which assumption appears to be violated in this
sample?
A. E(ui|xi) = 0.
B. Homoskedasticity: $\mathrm{Var}(u_i) = \sigma^2$, a constant.
C. No autocorrelation: $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$.
D. All of the above.
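[Figure not reproduced: sample observations (circles) plotted around the true population regression line; axes x and y]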
14. The time series ut graphed below appears to be
A. positively serially-correlated.
B. negatively serially-correlated.
C. serially uncorrelated.
D. Cannot be determined from the information given.
[Figure not reproduced: time-series plot of ut against time]
15. If two regressors xi2 and xi3 are closely but not perfectly correlated, then the least-squares estimators of their coefficients
A. will have large standard errors.
B. will be zero.
C. will be biased.
D. will be inconsistent.
16. The assumptions of the DW test for serial correlation are:
A. The regression model includes a constant
B. Serial correlation is assumed to be of order one only
C. The equation does not include a lagged dependent variable as an explanatory variable.
D. All of the above
17. Autocorrelation in your data is a problem because:
A. the assumption of the CLRM that the covariance and the correlations between different
disturbances are all zero is being violated.
B. the method of OLS assumes that the data are uncorrelated and calculates the point
estimates of regression parameters accordingly.
C. it is contagious
D. Both A and B
18. If your dataset has serial correlation, but you completely ignore the problem and use a
plain OLS command, you will:
A. get OLS estimators that are still BLUE.
B. get t-test statistics that make you reject the null hypothesis about the overall significance
of the model.
C. get t-statistics that are higher than the R2.
D. none of the above
19. By inspection of the figure below you understand that
A. it is an obvious case of heteroskedasticity because for large values of X the spread of the
residuals is smaller than that of small values of X.
B. there is evidence of positive serial correlation.
C. it is an obvious case of heteroskedasticity because for small values of X the spread of the
residuals is smaller than that of large values of X.
D. there is evidence of perfect positive serial correlation.
20. A researcher is testing a null hypothesis regarding a population mean. The critical
(two-tailed) Z value for alpha=0.05 is 1.96. The calculated Z value from the sample is
2.17. Based on this information, the researcher should _____ the null hypothesis, and
the p-value for this test is _____.
A. fail to reject; p-value = .03
B. reject; p-value = .03
C. fail to reject; p-value = .05
D. reject; p-value = .05
21. As the sample size gets larger, the standard error of the sample mean will ______ and
the probability of making a Type II error will ______.
A. increase; increase
B. decrease; decrease
C. increase; decrease
D. decrease; increase
22. According to which model is the elasticity of y with respect to x equal to 0.3?
A. y = 2.5 + 0.3 x .
B. y = 2.5 + 0.3 (1/x) .
C. y = 2.5 + 0.3 ln(x) .
D. ln(y) = 2.5 + 0.3 x .
23. In time-series data, any two variables are correlated in finite samples
A. only if one variable causes the other.
B. only if neither variable causes the other.
C. if they both have trends.
D. All of the above.
24. According to which model does a one-unit change in x cause approximately a four
percent increase in y?
A. y = 7.8 + 0.04 x .
B. y = 7.8 + 0.04 (1/x) .
C. y = 7.8 + 0.04 ln(x) .
D. ln(y) = 7.8 + 0.04 x .
25. In the model $y_i = \beta_1 + \beta_2 x_i + u_i$, assuming $E(u_i \mid x_i) = 0$, the conditional mean of y
(that is, $E(y_i \mid x_i)$) is
A. zero.
B. $\beta_1$.
C. $\beta_2$.
D. $\beta_1 + \beta_2 x_i$.
26. The equation $y_i = 2.0 + 2.5\,x_{i2} + 0.07\,x_{i2}^2$ implies that a one-unit increase in $x_{i2}$ will
cause yi to increase by about
A. 0.2 units.
B. 2.0 units.
C. 2.5 units.
D. (2.5 + 0.14xi2) units.
27. For the model $y_t = \beta_1 + \beta_2 x_t + u_t$, ordinary least squares yields consistent estimators of
$\beta_1$ and $\beta_2$ if (write yes (Y) or no (N) for each)
A. $x_t$ and $y_t$ are integrated processes but are not co-integrated.
B. $x_t$ and $y_t$ are co-integrated processes.
C. $u_t$ is a random walk.
D. $u_t$ is serially correlated, but stationary and weakly dependent.
28. Suppose the "true" model is $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, but an "irrelevant" variable $X_2$ is added
to the model (irrelevant in the sense that the true coefficient $\beta_2$ attached to the variable $X_2$ is
zero). The modified model is $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$. Would the R2 and the adjusted R2
for the modified model be larger than for the original model?
29. Consider the following “true” production function:
$\ln Y_i = \beta_0 + \beta_1 \ln L_{1i} + \beta_2 \ln L_{2i} + \beta_3 \ln K_i + u_i$, where Y = output, L1 = production labor,
L2 = nonproduction labor, K = capital.
Suppose the regression actually used in empirical investigation is
$\ln Y_i = \alpha_0 + \alpha_1 \ln L_{1i} + \alpha_2 \ln K_i + u_i$. Will the estimated $\alpha_1$ and $\alpha_2$ be unbiased estimators of $\beta_1$
and $\beta_3$?
30. Critical values of the Durbin-Watson statistic contain an ‘uncertain’ region between dU
and dL, where we can neither reject nor accept the null of no autocorrelation. What
happens to the value of dU and to the width of the region as n increases? Give intuitive
reasons why this should be so.
31. Given the autocorrelated error term $u_t = \rho u_{t-1} + v_t$, where $|\rho| < 1$ and $v_t \sim \text{iid}(0, \sigma_v^2)$,
derive the relationship between V(u) and V(v).
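One possible line of argument for the AR(1) variance relationship is sketched below. It assumes that the innovation variance is $\sigma_v^2$ (the notation in the source is partly garbled), that $v_t$ is uncorrelated with $u_{t-1}$, and that $u_t$ is stationary so that $\mathrm{Var}(u_t) = \mathrm{Var}(u_{t-1}) = \sigma_u^2$.

```latex
\begin{aligned}
\operatorname{Var}(u_t)
  &= \rho^2 \operatorname{Var}(u_{t-1}) + \operatorname{Var}(v_t)
     + 2\rho \operatorname{Cov}(u_{t-1}, v_t) \\
\sigma_u^2 &= \rho^2 \sigma_u^2 + \sigma_v^2
  \qquad \text{since } \operatorname{Cov}(u_{t-1}, v_t) = 0 \\
\sigma_u^2 &= \frac{\sigma_v^2}{1 - \rho^2}
\end{aligned}
```

The condition $|\rho| < 1$ is what keeps the denominator positive and the variance finite.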
32. True/False/Uncertain. Justify briefly.
I. When selecting a model, one should always pick the model which maximizes the R2.
II. When using dummy variables, one must still include an intercept parameter.
III. In the linear regression yi = b0 + b1 xi + ui, the OLS estimates will not be BLUE if there is an
omitted variable which is uncorrelated with xi.
IV. In the linear regression yi = b0 + b1 xi + ui, OLS is only BLUE when the errors are iid
normally distributed.
V. Knowing a coefficient and its p-value is equivalent to knowing the coefficient and its
standard error.
VI. The slope coefficient in the model yi = b0 + b1 xi + ui can be interpreted as an elasticity.
************Best of LUCK**********
Marks: 20
COMPREHENSIVE EXAMINATION - PART B (OPEN BOOK)
----------------------------------------------------------------------------------------------------------------------------
Note: Attempt all questions. State any assumptions clearly. Each question carries
equal marks (5.0).
1. Suppose we wish to estimate the relationship between the quantity demanded of
coffee and the price of coffee. Let $qcoffee_i$ denote the quantity demanded of coffee,
and $pcoffee_i$ denote the price of coffee. Further suppose that the price of tea does
not in fact influence the quantity demanded of coffee. But unaware of this fact, we
include the price of tea in our equation because our Economics professor mistakenly
told us that coffee and tea are substitutes. So we estimate the following equation by
ordinary least squares:
$qcoffee_i = \beta_1 + \beta_2\, pcoffee_i + \beta_3\, ptea_i + u_i$,
where $ptea_i$ denotes the price of tea, even though the "true" or population value of
$\beta_3$ is zero. Answer the following questions, justifying your answers as strongly as
you can. (Assume that all the classical assumptions concerning $u_i$ and the regressors
$pcoffee_i$ and $ptea_i$ hold for these data.)
a) Will the least-squares estimated value of $\beta_3$ be exactly zero?
b) Will the least-squares estimator of $\beta_2$ be biased?
c) Is any harm done by keeping $ptea_i$ in the estimated equation?
2. The following relationship for the demand for music CDs in Mumbai is
proposed:
$q_t = \alpha + \beta_1 y_t + \beta_2 p_t + u_t$
where q is the log of quantity demanded; y is the log of real income per capita; and p is
the log of real price. The data are annual and there are 21 observations. The following
quantities are computed from the data:
$\sum_{t=1}^{21} (q_t - \bar{q})^2 = 9.7274$; $\sum_{t=1}^{21} (y_t - \bar{y})^2 = 0.0797$; $\sum_{t=1}^{21} (p_t - \bar{p})^2 = 1.2310$;
$\sum_{t=1}^{21} (q_t - \bar{q})(y_t - \bar{y}) = 0.6976$; $\sum_{t=1}^{21} (q_t - \bar{q})(p_t - \bar{p}) = -2.7662$;
$\sum_{t=1}^{21} (p_t - \bar{p})(y_t - \bar{y}) = -0.2257$; $\bar{q} = 7.8284$; $\bar{y} = 6.6054$; $\bar{p} = 5.9933$.
a) Calculate the Ordinary Least Squares estimates for this demand model, i.e. $\hat{\alpha}$, $\hat{\beta}_1$ and $\hat{\beta}_2$.
b) Test the following hypotheses about the coefficients, ensuring that you clearly specify the
null and alternative hypotheses:
i) Music CDs are a luxury good
ii) Demand for music CDs is price elastic
c) Write the OLS variance of $\hat{\alpha}$.
d) Compute and interpret the R2 and adjusted R2 for the estimated regression model and test
for the overall significance of the model.
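One way to organise the arithmetic for this question is to solve the two-regressor normal equations in deviation form. The sketch below assumes the sums of squares, cross-products and means are as listed in the question; the variable names (S_qq, S_qy and so on) are illustrative shorthand, and the hypothesis tests themselves are left to the reader.

```python
# Sums of squares and cross-products in deviation form, from the question.
S_qq, S_yy, S_pp = 9.7274, 0.0797, 1.2310
S_qy, S_qp, S_py = 0.6976, -2.7662, -0.2257
q_bar, y_bar, p_bar = 7.8284, 6.6054, 5.9933
n, k = 21, 2   # observations, slope coefficients

# Solve the 2x2 normal equations for the slopes (Cramer's rule).
det = S_yy * S_pp - S_py ** 2
beta1_hat = (S_qy * S_pp - S_qp * S_py) / det   # income elasticity
beta2_hat = (S_qp * S_yy - S_qy * S_py) / det   # price elasticity

# The fitted plane passes through the point of means.
alpha_hat = q_bar - beta1_hat * y_bar - beta2_hat * p_bar

# Goodness of fit from the explained sum of squares.
explained_ss = beta1_hat * S_qy + beta2_hat * S_qp
r_squared = explained_ss / S_qq
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

print(alpha_hat, beta1_hat, beta2_hat)
print(r_squared, adj_r_squared, f_stat)
```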
3. Suppose we wish to estimate the effect of tax rates on economic growth, using
cross-sectional data for n=50 states. The following variables are to be used.
yi  = economic growth rate in state i.
xi  = tax rate in state i.
dsi = 1 if state i is in the South, and 0 otherwise.
dmi = 1 if state i is in the Midwest, and 0 otherwise.
dwi = 1 if state i is in the West, and 0 otherwise.
The following four equations were estimated, with the sums of squared residuals (SSR)
as shown.
[1] yi = 0.025 - 0.007 xi
    SSR = 360
[2] yi = 0.021 + 0.002 dsi - 0.003 dmi + 0.001 dwi - 0.0068 xi
    SSR = 270
[3] yi = 0.019 - 0.0068 xi - 0.0001 (dsi xi) + 0.0003 (dmi xi) - 0.0005 (dwi xi)
    SSR = 280
[4] yi = 0.019 - 0.0068 xi + 0.002 dsi - 0.003 dmi + 0.001 dwi
         + 0.0001 (dsi xi) + 0.0002 (dmi xi) - 0.0003 (dwi xi)
    SSR = 168
a) Although there are four official Census regions, only three dummy variables are
used. If a fourth dummy variable were created for the remaining Northeast region
and all four regional dummy variables were included in the same regression, then
what econometric problem would arise?
b) According to equation [4], what is the intercept for the Northeast? What is the
intercept for the Midwest? What is the slope for the South?
c) We wish to test the null hypothesis that all states have the same intercept and
slope, against the alternative hypothesis that they have different intercepts and
slopes by region, at 5% significance. Which equation, [1], [2], [3], or [4], is the
restricted equation and which is the unrestricted equation?
d) Give the value of the test statistic, its degrees of freedom, the critical point, and
your conclusion (whether you can reject the null hypothesis).
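The test in parts (c) and (d) is a standard F test of linear restrictions, F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k)). The sketch below shows the mechanics under the assumption that equation [1] is the restricted model and equation [4] the unrestricted one; the variable names are illustrative, and the critical value should still be read from the F tables.

```python
# Assumed pairing: [1] imposes common intercept and slope, [4] relaxes both.
ssr_restricted = 360.0     # equation [1]
ssr_unrestricted = 168.0   # equation [4]
n = 50                     # states

k_unrestricted = 8         # intercept, x, 3 dummies, 3 interaction terms
q = 6                      # restrictions: 3 dummies + 3 interactions set to zero

df_numerator = q
df_denominator = n - k_unrestricted

f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (
    ssr_unrestricted / df_denominator
)
print(f_stat, df_numerator, df_denominator)
```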
4. The following regression was run using quarterly data, amounting to 70 observations:
Bt = 0.78 □ 0.89 Pt □ 0.35 St   (□ marks an operator sign that is not legible in the source)
    (0.56)   (0.78)     (0.12)
$R^2 = 0.76$, DW = 1.57, White(5) = 27.2
where B is the demand for brokerage services, P is the price of the services and S
is the total number of brokers, and all variables are in logarithms (standard errors
in parentheses). DW is the Durbin-Watson statistic. White is White's Test.
a) Comment on the specification of the above model.
b) Does the above regression suffer from first-order autocorrelation? If so, how might
this have arisen?
c) Does the model suffer from heteroskedasticity?
d) The following model was estimated: $y_t = \alpha + \beta x_t + u_t$.
It is assumed that the variance of the error term takes the following form:
$E(u_t^2) = \sigma^2 x_t^2$. Explain the form which heteroskedasticity takes in this case, and
show how the equation can be transformed to remedy the problem of
heteroskedasticity.
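A sketch of the usual weighted least squares transformation for this case, assuming the model is $y_t = \alpha + \beta x_t + u_t$ with $E(u_t^2) = \sigma^2 x_t^2$ and $x_t \neq 0$: dividing every term by $x_t$ gives an equation whose error has constant variance.

```latex
\begin{aligned}
\frac{y_t}{x_t} &= \alpha \, \frac{1}{x_t} + \beta + \frac{u_t}{x_t}, \\
\operatorname{Var}\!\left(\frac{u_t}{x_t}\right)
  &= \frac{E(u_t^2)}{x_t^2} = \frac{\sigma^2 x_t^2}{x_t^2} = \sigma^2 .
\end{aligned}
```

In the transformed regression the roles swap: $\beta$ becomes the intercept and $\alpha$ is the coefficient on $1/x_t$.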
********