BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
SECOND SEMESTER – 2007-2008
Course No: ECON C342/FIN C332/MGTS C443
Course Title: ECONOMETRICS
Date: 03/05/2008    Duration: 180 min    Weightage: 40%
COMPREHENSIVE EXAMINATION

NOTE: This paper consists of two parts: PART A (Closed Book) and PART B (Open Book). PART A: 20 marks and PART B: 20 marks. After completing PART A, attempt PART B. Suggested time for PART A: 90 min.

PART A (CLOSED BOOK) 20 MARKS

Instructions: Write your answers on the separate sheet provided. For multiple-choice questions, choose the best alternative and write the corresponding letter (A, B, C or D) on the answer sheet. Corrections, overwriting and illegible answers are invalid. Questions 1-28 carry equal marks; questions 29, 30 and 31 carry 1.0 mark each; question 32 carries 3 marks.

1. An industrial psychologist has a theory that day-shift workers are, in general, more productive than night-shift workers. To test his theory, he takes a sample of 50 day-shift workers and an independent sample of 40 night-shift workers and records the output for one shift for each worker. (t from table = 1.645)

                      Day shift    Night shift
Sample size           50           40
Average output        58.1         51.6
Standard deviation    12.6         16.8

(To save you some calculations, the pooled standard deviation for the two groups is 14.61.)
Construct a 90% confidence interval for the difference between the mean output for day-shift workers and the mean output for night-shift workers.
A. (1.4, 11.6)
B. (0.4, 12.6)
C. (-17.5, 30.5)
D. (-22.1, 35.1)

2.
To reduce the width of a confidence interval for the difference between two population means:
A. Reduce the sample sizes taken from both populations, or reduce the confidence level
B. Reduce the sample sizes taken from both populations, or increase the confidence level
C. Increase the sample sizes taken from both populations, or reduce the confidence level
D. Increase the sample sizes taken from both populations, or increase the confidence level

(The next 3 questions are based on the following information.)
Employees in a large company are entitled to 15-minute coffee breaks. A random sample of 10 employees was drawn from the population of all employees, and the length of coffee break for each of the sampled employees was measured. The mean coffee break length for the employees in the sample is 16.8 minutes, and the sample standard deviation is 2.2 minutes. A human resources manager is interested in whether employees tend to be taking longer or shorter coffee breaks than they are entitled to, and she performs a hypothesis test to examine this question.

3. Which of the following would be the most appropriate null hypothesis for this test?
A. $H_0: \mu = 15.0$
B. $H_0: \bar{X} = 16.8$
C. $H_0: \mu = 16.8$
D. $H_0: \sigma = 2.2$

4. The appropriate test statistic and approximate (two-tailed) p-value for the hypothesis test are: (From the table, 2.5% of the area is above t = 2.262, 1% is above t = 2.821, etc.)
A. Test statistic: t = 2.59; p-value is between .02 and .05
B. Test statistic: t = 2.59; p-value is less than .01
C. Test statistic: t = 7.64; p-value is between .02 and .05
D. Test statistic: t = 7.64; p-value is less than .01

5. What would be the most appropriate conclusion from this hypothesis test?
A. There is no evidence that the mean coffee break length in the population is greater than 15 min.
B. There is strong evidence that the mean coffee break length in the population is less than 15 min.
C. There is no evidence that the mean coffee break length in the sample is greater than 15 min.
D.
There is strong evidence that the mean coffee break length in the population is greater than 15 min.

6. Which of the following is/are true about p-values?
A. A large p-value means that there is a lot of evidence against the null hypothesis.
B. As the test statistic t gets further away from 0, the p-value gets smaller.
C. If the correlation coefficient is negative, then the p-value for the slope of the regression line is also negative.
D. All of the above are true about p-values.

7. Which of the following best summarizes the distinction between statistical significance and practical significance?
A. Statistical significance and practical significance both refer to the importance of an observed difference between sample means, but practical significance only holds when the p-value is less than 0.05.
B. Statistical significance refers to the sample, while practical significance refers to the population.
C. Statistical significance refers to the ability to conclude that an effect observed in a sample is likely to also hold in the population; practical significance refers to the size and importance of the effect.
D. Statistical significance is most likely when sample sizes are small, while practical significance is most likely when sample sizes are large.

(The next 5 questions are based on the following information.)
In this problem we consider an analysis of the number of active physicians in a city as a function of the city's population and the region of the United States that the city is in. The sample consists of data on 141 cities in the United States, and the variables are defined as follows:
pop = city population (in thousands)
doctors = number of professionally active physicians in the city
Region dummy variables are defined for 4 regions: East, Central, South, and West.
east = 1 if city in the East region, 0 otherwise
central = 1 if city in the Central region, 0 otherwise
south = 1 if city in the South region, 0 otherwise

$R^2$ = 0.9551    $R^2$ (Adjusted) = 0.9538    SSE = 56,950,000    Residual SD = 647.1

             Coefficient    Standard Error    t Stat    P-Value
Intercept    -255           130.2             -1.96     0.052
pop          2.3            0.043             53.5      0.000
east         -36            174.7             -0.21     0.836
central      -327           164.1             -1.99     0.048
south        -83            152.8             -0.54     0.590

8. What is the predicted number of doctors in a city with a population of 500,000 in the West region?
A. 568
B. 823
C. 895
D. 1150

9. What is the equation for the regression line predicting number of doctors from population for the South region?
A. Doctors = -338 + 2.3 pop
B. Doctors = -255 + 83 pop
C. Doctors = -172 + 2.3 pop
D. Doctors = -255 + 2.3 pop

10. Based on this model, in which region is the slope of the regression line relating doctors to population the steepest?
A. The West region has the steepest regression line.
B. The Central region has the steepest regression line.
C. The East region has the steepest regression line.
D. The regression line has the same slope in all four regions.

11. To test whether the whole model is at all useful, we perform a hypothesis test of whether the population coefficients for the four independent variables (pop, east, central, and south) are all equal to 0. What are the test statistic, approximate critical value, and conclusion for this hypothesis test? (Use alpha = 0.05)
A. Test statistic: F = 723; approximate critical value: F* = 2.45; conclusion: Reject H0
B. Test statistic: t = 53.5; approximate critical value: t* = 1.98; conclusion: Reject H0
C. Test statistic: F = 2862; approximate critical value: F* = 3.92; conclusion: Don't Reject H0
D. Test statistic: t = -0.54; approximate critical value: t* = 1.98; conclusion: Don't Reject H0

12.
Now consider a different coding scheme for the dummy variables specifying the region:
D1 = 1 if city in the East region, 0 otherwise
D2 = 1 if city in the Central region, 0 otherwise
D3 = 1 if city in the West region, 0 otherwise
What is the regression equation predicting doctors from the variables pop, D1, D2 and D3?
A. Doctors = -338 + 2.3 pop + 47 D1 - 244 D2 + 83 D3
B. Doctors = -255 + 2.3 pop - 36 D1 - 327 D2 - 83 D3
C. Doctors = -255 + 2.3 pop + 36 D1 + 327 D2 + 83 D3
D. Doctors = 255 - 2.3 pop - 36 D1 - 327 D2 - 83 D3

13. In the graph below, the solid line is the true population regression line and the circles are observations in the sample. Which assumption appears to be violated in this sample?
A. $E(u_i \mid x_i) = 0$.
B. Homoskedasticity: $\mathrm{Var}(u_i) = \sigma^2$, a constant.
C. No autocorrelation: $\mathrm{Cov}(u_i, u_j) = 0$ for $i \neq j$.
D. All of the above.
[Figure: scatter plot of y against x; not reproduced.]

14. The time series $u_t$ graphed below appears to be
A. positively serially correlated.
B. negatively serially correlated.
C. serially uncorrelated.
D. Cannot be determined from the information given.
[Figure: plot of $u_t$ against time; not reproduced.]

15. If two regressors $x_{i2}$ and $x_{i3}$ are closely but not perfectly correlated, then the least-squares estimators of their coefficients
A. will have large standard errors.
B. will be zero.
C. will be biased.
D. will be inconsistent.

16. The assumptions of the DW test for serial correlation are:
A. The regression model includes a constant.
B. Serial correlation is assumed to be of order one only.
C. The equation does not include a lagged dependent variable as an explanatory variable.
D. All of the above

17. Autocorrelation in your data is a problem because:
A. the assumption of the CLRM that the covariances and the correlations between different disturbances are all zero is being violated.
B. the method of OLS assumes that the data are uncorrelated and calculates the point estimates of regression parameters accordingly.
C. it is contagious.
D. A and B

18.
If your dataset has serial correlation, but you completely ignore the problem and use a plain OLS command, you will:
A. get OLS estimators that are still BLUE.
B. get t-test statistics that make you reject the null hypothesis about the overall significance of the model.
C. get t-statistics that are higher than the $R^2$.
D. none of the above

19. By inspection of the figure below you understand that
A. it is an obvious case of heteroskedasticity because for large values of X the spread of the residuals is smaller than that of small values of X.
B. there is evidence of positive serial correlation.
C. it is an obvious case of heteroskedasticity because for small values of X the spread of the residuals is smaller than that of large values of X.
D. there is evidence of perfect positive serial correlation.

20. A researcher is testing a null hypothesis regarding a population mean. The critical (two-tailed) Z value for alpha = 0.05 is 1.96. The calculated Z value from the sample is 2.17. Based on this information, the researcher should _____ the null hypothesis, and the p-value for this test is _____.
A. fail to reject; p-value = .03
B. reject; p-value = .03
C. fail to reject; p-value = .05
D. reject; p-value = .05

21. As the sample size gets larger, the standard error of the sample mean will ______ and the probability of making a Type II error will ______.
A. increase; increase
B. decrease; decrease
C. increase; decrease
D. decrease; increase

22. According to which model is the elasticity of y with respect to x equal to 0.3?
A. y = 2.5 + 0.3 x
B. y = 2.5 + 0.3 (1/x)
C. y = 2.5 + 0.3 ln(x)
D. ln(y) = 2.5 + 0.3 x

23. In time-series data, any two variables are correlated in a finite sample
A. only if one variable causes the other.
B. only if neither variable causes the other.
C. if they both have trends.
D. All of the above.

24. According to which model does a one-unit change in x cause approximately a four percent increase in y?
A. y = 7.8 + 0.04 x
B.
y = 7.8 + 0.04 (1/x)
C. y = 7.8 + 0.04 ln(x)
D. ln(y) = 7.8 + 0.04 x

25. In the model $y_i = \beta_1 + \beta_2 x_i + u_i$, assuming $E(u_i \mid x_i) = 0$, the conditional mean of y (that is, $E(y_i \mid x_i)$) is
A. zero.
B. $\beta_1$.
C. $\beta_2$.
D. $\beta_1 + \beta_2 x_i$.

26. The equation $y_i = 2.0 + 2.5\,x_{i2} + 0.07\,x_{i2}^2$ implies that a one-unit increase in $x_{i2}$ will cause $y_i$ to increase by about
A. 0.2 units.
B. 2.0 units.
C. 2.5 units.
D. $(2.5 + 0.14\,x_{i2})$ units.

27. For the model $y_t = \beta_1 + \beta_2 x_t + u_t$, ordinary least squares yields consistent estimators of $\beta_1$ and $\beta_2$ if (write yes (Y) or no (N) for each)
A. $x_t$ and $y_t$ are integrated processes but are not co-integrated.
B. $x_t$ and $y_t$ are co-integrated processes.
C. $u_t$ is a random walk.
D. $u_t$ is serially correlated, but stationary and weakly dependent.

28. Suppose the "true" model is $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, but an "irrelevant" variable $X_2$ is added to the model (irrelevant in the sense that the true coefficient $\beta_2$ attached to the variable $X_2$ is zero). The modified model is $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$. Would the $R^2$ and the adjusted $R^2$ for the modified model be larger than for the original model?

29. Consider the following "true" production function:
$\ln Y_i = \alpha_0 + \alpha_1 \ln L_{1i} + \alpha_2 \ln L_{2i} + \alpha_3 \ln K_i + u_i$
where Y = output, L1 = production labor, L2 = nonproduction labor, K = capital. Suppose the regression actually used in empirical investigation is
$\ln Y_i = \beta_0 + \beta_1 \ln L_{1i} + \beta_2 \ln K_i + u_i$.
Will the estimated $\beta_1$ and $\beta_2$ be unbiased estimators of $\alpha_1$ and $\alpha_3$?

30. Critical values of the Durbin-Watson statistic contain an "uncertain" region between $d_L$ and $d_U$, where we can neither reject nor accept the null of no autocorrelation. What happens to the value of $d_U$ and to the width of the region as n increases? Give intuitive reasons why this should be so.

31. Given the autocorrelated error term $u_t = \rho u_{t-1} + v_t$, where $|\rho| < 1$ and $v_t \sim \text{iid}(0, \sigma_v^2)$, derive the relationship between $V(u)$ and $V(v)$.

32. True/False/Uncertain. Justify briefly.
I. When selecting a model, one should always pick the model which maximizes the $R^2$.
II.
When using dummy variables, one must still include an intercept parameter.
III. In the linear regression $y_i = b_0 + b_1 x_i + u_i$, the OLS estimates will not be BLUE if there is an omitted variable which is uncorrelated with $x_i$.
IV. In the linear regression $y_i = b_0 + b_1 x_i + u_i$, OLS is only BLUE when the errors are iid normally distributed.
V. Knowing a coefficient and its p-value is equivalent to knowing the coefficient and its standard error.
VI. The slope coefficient in the model $y_i = b_0 + b_1 x_i + u_i$ can be interpreted as an elasticity.

************Best of LUCK**********

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
SECOND SEMESTER – 2007-2008
Course No: ECON C342/FIN C332/MGTS C443
Course Title: ECONOMETRICS
Date: 03/05/2008    Marks: 20
COMPREHENSIVE EXAMINATION - PART B (OPEN BOOK)

Note: Attempt all questions. State any assumptions clearly. Each question carries equal marks (5.0).

1. Suppose we wish to estimate the relationship between the quantity demanded of coffee and the price of coffee. Let $\mathit{qcoffee}_i$ denote the quantity demanded of coffee, and $\mathit{pcoffee}_i$ denote the price of coffee. Further suppose that the price of tea does not in fact influence the quantity demanded of coffee. Unaware of this fact, we include the price of tea in our equation because our Economics professor mistakenly told us that coffee and tea are substitutes. So we estimate the following equation by ordinary least squares:
$\mathit{qcoffee}_i = \beta_1 + \beta_2\,\mathit{pcoffee}_i + \beta_3\,\mathit{ptea}_i + u_i$,
where $\mathit{ptea}_i$ denotes the price of tea, even though the "true" or population value of $\beta_3$ is zero. Answer the following questions, justifying your answers as strongly as you can. (Assume that all the classical assumptions concerning $u_i$ and the regressors $\mathit{pcoffee}_i$ and $\mathit{ptea}_i$ hold for these data.)
a) Will the least-squares estimated value of $\beta_3$ be exactly zero?
b) Will the least-squares estimator of $\beta_2$ be biased?
c)
Is any harm done by keeping $\mathit{ptea}_i$ in the estimated equation?

2. The following relationship for the demand for music CDs in Mumbai is proposed:
$q_t = \alpha + \beta_1 y_t + \beta_2 p_t + u_t$
where q is the log of quantity demanded; y is the log of real income per capita; and p is the log of real price. The data are annual and there are 21 observations. The following quantities are computed from the data:
$\sum_{t=1}^{21} (q_t - \bar{q})^2 = 9.7274$;  $\sum_{t=1}^{21} (y_t - \bar{y})^2 = 0.0797$;  $\sum_{t=1}^{21} (p_t - \bar{p})^2 = 1.2310$;
$\sum_{t=1}^{21} (q_t - \bar{q})(y_t - \bar{y}) = 0.6976$;  $\sum_{t=1}^{21} (q_t - \bar{q})(p_t - \bar{p}) = -2.7662$;
$\sum_{t=1}^{21} (p_t - \bar{p})(y_t - \bar{y}) = -0.2257$;  $\bar{q} = 7.8284$;  $\bar{y} = 6.6054$;  $\bar{p} = 5.9933$.
a) Calculate the Ordinary Least Squares estimates for this demand model, i.e. $\hat{\alpha}$, $\hat{\beta}_1$ and $\hat{\beta}_2$.
b) Test the following hypotheses about the coefficients, ensuring that you specify clearly the null and alternative hypotheses:
   i) Music CDs are a luxury good.
   ii) Demand for music CDs is price elastic.
c) Write the OLS variance of $\hat{\alpha}$.
d) Compute and interpret the $R^2$ and adjusted $R^2$ for the estimated regression model and test for the overall significance of the model.

3. Suppose we wish to estimate the effect of tax rates on economic growth, using cross-sectional data for n = 50 states. The following variables are to be used.
$y_i$ = economic growth rate in state i.
$x_i$ = tax rate in state i.
$ds_i$ = 1 if state i is in the South, and 0 otherwise.
$dm_i$ = 1 if state i is in the Midwest, and 0 otherwise.
$dw_i$ = 1 if state i is in the West, and 0 otherwise.
The following four equations were estimated, with the sums of squared residuals (SSR) as shown.
[1] $y_i = 0.025 - 0.007\,x_i$    SSR = 360
[2] $y_i = 0.021 + 0.002\,ds_i - 0.003\,dm_i + 0.001\,dw_i - 0.0068\,x_i$    SSR = 270
[3] $y_i = 0.019 - 0.0068\,x_i - 0.0001\,(ds_i x_i) + 0.0003\,(dm_i x_i) - 0.0005\,(dw_i x_i)$    SSR = 280
[4] $y_i = 0.019 - 0.0068\,x_i + 0.002\,ds_i - 0.003\,dm_i + 0.001\,dw_i + 0.0001\,(ds_i x_i) + 0.0002\,(dm_i x_i) - 0.0003\,(dw_i x_i)$    SSR = 168
a) Although there are four official Census regions, only three dummy variables are used.
If a fourth dummy variable were created for the remaining Northeast region and all four regional dummy variables were included in the same regression, then what econometric problem would arise?
b) According to equation [4], what is the intercept for the Northeast? What is the intercept for the Midwest? What is the slope for the South?
c) We wish to test the null hypothesis that all states have the same intercept and slope, against the alternative hypothesis that they have different intercepts and slopes by region, at 5% significance. Which equation, [1], [2], [3], or [4], is the restricted equation and which is the unrestricted equation?
d) Give the value of the test statistic, its degrees of freedom, the critical point, and your conclusion (whether you can reject the null hypothesis).

4. The following regression was run using quarterly data, amounting to 70 observations. The dependent variable is $B_t$; coefficients are listed as they appear in the paper, with standard errors in parentheses:

Term        Coefficient    (Std. error)
Constant    0.78           (0.56)
$P_t$       0.89           (0.78)
$S_t$       0.35           (0.12)

$R^2 = 0.76$,  DW = 1.57,  White(5) = 27.2

where B is the demand for brokerage services, P is the price of the services and S is the total number of brokers, and all variables are in logarithms. DW is the Durbin-Watson statistic. White is White's test.
a) Comment on the specification of the above model.
b) Does the above regression suffer from first-order autocorrelation? If so, how might this have arisen?
c) Does the model suffer from heteroskedasticity?
d) The following model was estimated: $y_t = \alpha + \beta x_t + u_t$. It is assumed that the variance of the error term takes the following form: $E(u_t^2) = \sigma^2 x_t^2$. Explain the form which heteroskedasticity takes in this case, and show how the equation can be transformed to remedy the problem of heteroskedasticity.

********
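
As an illustration for Part B, question 2(a): in a model with an intercept and two regressors, the OLS slope estimates can be computed directly from the sums of squares and cross-products in deviation form, and the intercept then follows from the sample means. The sketch below is not an official solution to the paper. The function name `ols_from_deviation_sums` and its argument names are hypothetical, and it assumes the standard deviation-form normal equations for $q_t = \alpha + \beta_1 y_t + \beta_2 p_t + u_t$.

```python
def ols_from_deviation_sums(s_qq, s_yy, s_pp, s_qy, s_qp, s_py,
                            q_bar, y_bar, p_bar):
    """OLS estimates for q = alpha + beta1*y + beta2*p + u.

    All s_* arguments are sums of squares / cross-products of
    deviations from the sample means (hypothetical argument names).
    Returns (alpha, beta1, beta2, r_squared).
    """
    # Determinant of the 2x2 matrix of regressor deviation sums.
    det = s_yy * s_pp - s_py ** 2
    if det == 0:
        raise ValueError("regressors are perfectly collinear")

    # Solve the two normal equations by Cramer's rule.
    beta1 = (s_qy * s_pp - s_qp * s_py) / det
    beta2 = (s_qp * s_yy - s_qy * s_py) / det

    # The fitted plane passes through the point of sample means.
    alpha = q_bar - beta1 * y_bar - beta2 * p_bar

    # Explained sum of squares over total sum of squares.
    r_squared = (beta1 * s_qy + beta2 * s_qp) / s_qq
    return alpha, beta1, beta2, r_squared


# Quantities as given in Part B, question 2.
alpha, beta1, beta2, r_squared = ols_from_deviation_sums(
    s_qq=9.7274, s_yy=0.0797, s_pp=1.2310,
    s_qy=0.6976, s_qp=-2.7662, s_py=-0.2257,
    q_bar=7.8284, y_bar=6.6054, p_bar=5.9933,
)
print(f"alpha = {alpha:.4f}, beta1 = {beta1:.4f}, beta2 = {beta2:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Working from deviation sums rather than raw data mirrors how the question supplies its information, so the same function can be reused for hand-checking any two-regressor problem stated in this form.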