Final—Form A
Fall 2002
Economics 173
Instructor: Petry
Name_____________
SSN______________
Before beginning the exam, please verify that you have 15 pages with 48 questions in your
exam booklet. You should also have a decision-tree and formula sheet provided by your
TA. Please include your full name, social security number and Net-ID on your bubble
sheets. Good luck!
1. Imagine we have sample sales numbers for 50 random books published this year. It happens
that the sample has a “symmetric” distribution. If we add the sales of the top 10 best sellers to
this sample what would happen to the mean, median and mode of the distribution?
a. mode > mean > median
b. mode > median > mean
c. median > mean > mode
d. mean > median > mode
e. mean > mode > median
2. If we want to increase the confidence level for estimating an interval for the variance of
grades of students in this exam, while maintaining the width of the interval, what can we
possibly do?
a. Add some new observations to my sample to make the sample size larger
b. Throw some observations away to make the sample size smaller
c. Decrease the standard deviation of the population
d. Increase the standard deviation of the population
e. Both (b) and (d) have the same effect, so we would do both
3. If we want to make a confidence interval with a band of +/- .1, 95% confidence, and
assuming we have no prior information about the proportion of students who leave campus for
winter break, how large must my sample size be? (Z0.005=2.58, Z0.025=1.96, Z0.05=1.645)
a. 95
b. 96
c. 97
d. 67
e. 68
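A minimal sketch of the sample-size arithmetic behind question 3, assuming the conservative choice p = 0.5 when nothing is known about the proportion, and the usual formula n = z^2 p(1-p) / B^2. The function name is illustrative and not part of the exam materials.

```python
import math

def sample_size_for_proportion(z, half_width, p=0.5):
    """Smallest n whose margin of error z*sqrt(p(1-p)/n) is at most half_width."""
    n_exact = (z ** 2) * p * (1 - p) / half_width ** 2
    # Round up: rounding down would give a margin slightly wider than requested.
    return math.ceil(n_exact)

# 95% confidence uses z = 1.96; the band is +/- 0.1.
n = sample_size_for_proportion(z=1.96, half_width=0.1)
print(n)
```

The unrounded value is about 96.04, which is why the result is rounded up rather than to the nearest integer.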
4. All of the following are assumptions about errors (residuals) in the classical regression
model EXCEPT:
a. the probability distribution of errors is normal
b. there is no serious multicollinearity present
c. errors are correlated with the independent variable(s)
d. the standard deviation of errors is constant for all values of the independent variable(s)
e. errors are independent of each other across time
582763826
Page 1 of 16
Use the excel output below to answer the next two questions (#5-6).
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.70374
R Square            0.49525
Adjusted R Square   0.487484
Standard Error      3.073497
Observations        67

ANOVA
              df   SS         MS         F         Significance F
Regression    1    602.4573   602.4573   63.77652  3.09E-11
Residual      65   614.0148   9.446382
Total         66   1216.472

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     -0.4969       0.415287        -1.19653  0.235837  -1.32629   0.332483
X Variable 1  0.727765      0.09113         7.98602   3.09E-11  0.545766   0.909764
5. Which of the following are true?
I. The independent variable explains more than 50% of the variability in the
dependent variable.
II. Overall, the model does a good job explaining the data.
III. The independent variable is not statistically significant at the 5% level.
a. I and II
b. I only
c. II only
d. II and III
e. III only
6. In the test for overall significance of the model, the test statistic follows which distribution
(with degrees of freedom in parentheses)?
a. t(66)
b. t(67)
c. F(2, 66)
d. F(1, 65)
e. None of the above
7. Which of the following statements is definitely correct?
a. The 90% confidence interval for the population mean from a sample of 20 observations
is narrower than the 87% confidence interval for a sample of 25 holding everything else
constant.
b. The 90% confidence interval for the population mean from a sample of 20 observations
is wider than the 87% confidence interval for a sample of 25 holding everything else
constant.
c. The 87% confidence interval for the population mean from a sample of 20 observations
is narrower than the 90% confidence interval for a sample of 25 holding everything else
constant.
d. The 87% confidence interval for the population mean from a sample of 20 observations
is wider than the 90% confidence interval for a sample of 25 holding everything else
constant.
e. None of the above
Use the following information to answer the next four questions (#8-11).
Below is a regression of number of cigarettes smoked per weekend on the independent
variables listed. DRINKS is the number of alcoholic drinks consumed per weekend.
PARTIES is the number of house parties attended per weekend. ASIAN and AFRICAN-AM
are dummy variables for ethnicity (with CAUCASIAN being the only other ethnicity
represented in our sample). COLLEGE is whether the person attends college. RUNNER is
whether the person runs at least twice a week for 30 minutes.
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.493408
R Square
Adjusted R Square   0.194642
Standard Error      50.94051
Observations        100

ANOVA
            df   SS         MS   F         Significance F
Regression  6    77657.86        4.987784  0.000181
Residual    93   241329
Total       99

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   9.641725      16.03158        0.601421  0.549023  -22.1938   41.47
DRINKS      2.254276      0.522904        4.311073  4.04E-05  1.215894   3.292
PARTIES     0.690245      0.271775        2.539772  0.012751  0.150556   1.229
ASIAN       2.569739      3.522896        0.729439  0.467566  -4.42603   9.565
COLLEGE     2.314365      14.2286         0.162656  0.871142  -25.9408   30.56
AFRICAN-AM  -3.19552      20.54125        -1.55566  0.123186  -72.746    8.835
RUNNER      -2.52578      10.75428        -0.23486  0.814832  -23.8816   18.83
8. Using an alpha of .05, which of the following variables would be significant in the
model?
a. DRINKS, PARTIES, ASIAN, COLLEGE, AFR-AM, and RUNNER
b. ASIAN, COLLEGE, AFR-AM and RUNNER
c. DRINKS and PARTIES
d. PARTIES
e. ASIAN and AFR-AM
9. Assuming a level of significance that allows for ALL the variables to be significant,
then what is the difference in average cigarettes smoked per weekend between an Asian
and an African-American, all else being equal?
a. 2.52578
b. 3.19552
c. -0.6697
d. 5.7653
e. 0.6697
10. Given an SSR of the Reduced Model of 28,384, what is the Test Statistic for the partial
F test?
a. 8.080
b. 4.747
c. 9.494
d. 3.938
e. 10.385
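The partial F statistic in question 10 can be sketched as follows. The exam does not say which variables the reduced model drops; this sketch assumes the four variables that are not significant at the 5% level (ASIAN, COLLEGE, AFRICAN-AM, RUNNER) are removed, so the numerator has 4 degrees of freedom. The function name is illustrative.

```python
def partial_f(ssr_full, ssr_reduced, sse_full, num_dropped, df_resid_full):
    """Partial F: extra explained variation per dropped variable,
    divided by the full model's mean squared error."""
    numerator = (ssr_full - ssr_reduced) / num_dropped
    mse_full = sse_full / df_resid_full
    return numerator / mse_full

f_stat = partial_f(
    ssr_full=77657.86,    # regression SS from the ANOVA table
    ssr_reduced=28384.0,  # given in question 10
    sse_full=241329.0,    # residual SS from the ANOVA table
    num_dropped=4,        # assumption: four variables removed
    df_resid_full=93,     # residual df from the ANOVA table
)
print(round(f_stat, 3))
```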
11. In order to test whether an Asian person smokes 3 more cigarettes compared to
Caucasians, all else being equal, what is the appropriate test statistic?
a. 0.72944
b. -0.72944
c. -0.12213
d. 0.3835
e. -0.3835
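For question 11, a coefficient is tested against a nonzero hypothesized value by subtracting that value before dividing by the standard error. A small sketch, using the ASIAN row of the output; the helper name is an illustrative choice.

```python
def t_stat(coef, std_err, hypothesized=0.0):
    """t statistic for H0: beta = hypothesized."""
    return (coef - hypothesized) / std_err

# H0: the ASIAN coefficient equals 3 (three more cigarettes than Caucasians).
t_asian = t_stat(coef=2.569739, std_err=3.522896, hypothesized=3.0)
print(round(t_asian, 5))
```

With the default hypothesized value of 0, the same function reproduces the t Stat column of the regression output.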
Use the information below to answer the following two questions (#12-13).
Beer demand at a small-sized cricket stadium in New Zealand is being studied. A sample of 25
daily demands was collected, the data being in cases of beer. The descriptive statistics are
presented below:
Demand
Mean                 201.6
Standard Error       3.289883
Median               204
Mode                 213
Standard Deviation   16.44942
Sample Variance      270.5833
Kurtosis             0.390956
Skewness             -0.68938
Range                70
Minimum              159
Maximum              229
Sum                  5040
Count                25
12. Calculate the test statistic for testing the alternative hypothesis that the population variance
of beer demand is greater than 250. The value of the test statistic is:
a. 27.06
b. 1.58
c. 1.65
d. 25.98
e. 37.12
13. Given the right sided critical values: 36.42 (df=24) and 37.65 (df=25), the decision from
this test should be:
a. fail to reject the null, therefore conclude that the variance is less than 250
b. fail to reject the null, therefore conclude that the variance is not greater than 250
c. reject the null, therefore conclude that the variance is less than 250
d. reject the null, therefore conclude that the variance is greater than 250
e. the test proves inconclusive
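Questions 12 and 13 can be checked with the chi-square statistic for a single variance, (n - 1)s^2 / sigma0^2, compared with the right-tail critical value for n - 1 degrees of freedom. A minimal sketch using the descriptive statistics above; variable names are illustrative.

```python
def variance_test_statistic(sample_variance, n, hypothesized_variance):
    """Chi-square statistic for a test about one population variance."""
    return (n - 1) * sample_variance / hypothesized_variance

chi2_stat = variance_test_statistic(
    sample_variance=270.5833, n=25, hypothesized_variance=250
)
critical_value = 36.42  # right-tail critical value for df = 24, as given
reject_null = chi2_stat > critical_value

print(round(chi2_stat, 2), reject_null)
```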
14. Out of 87 people polled in Israel, 21 said “yes” to the question: Do you enjoy watching golf
on TV? In Denmark, out of 113 respondents, 49 said yes to the same question. To test the
divergence of golf-watching preferences in the two countries, the most suitable test from the
list below is:
a. z-test for difference in proportions
b. chi-square test for difference in proportions
c. F-test for difference in variances
d. t-test for difference in means (equal variances)
e. paired sample t-test for mean difference
15. Data was collected on the same set of sugar-mills in Bangladesh before and after
unionization to examine the effect on productivity. Data was in the form of man-hours per ton
of sugar produced. The effect of unionization is best tested using:
a. z-test for difference in proportions
b. chi-square test for difference in proportions
c. F-test for difference in variances
d. t-test for difference in means (equal variances)
e. paired sample t-test for mean difference
16. The effect(s) of multicollinearity is (are):
a. The standard errors are enlarged
b. t-statistics are decreased
c. The F-stat is made artificially high
d. All of the above
e. (a) and (b)
17. You estimated the following regression:
SALES = 2.5 + .9 WRKYRS + 1.5 WRKYRS^2
        (.5)  (.18)        (0.375)
where the standard errors are in parentheses. From this you can conclude that (with the
two-sided critical values being +/- 2.05):
a. The t-stat for the quadratic term is 5
b. The t-stat for the quadratic term is 4
c. The t-stat for the quadratic term is 0.1406
d. The t-stat for the quadratic term is 9
e. None of the above
18. You are given the following data: F= 12.07449 and MSR=638.0741. Calculate the
standard error of the estimate of the model.
a. 0.0189
b. 52.8448
c. 7.269
d. 0.1376
e. we do not have sufficient information
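Question 18 relies on two identities from the ANOVA table: F = MSR / MSE, and the standard error of the estimate is the square root of MSE. A short sketch of that calculation; the function name is illustrative.

```python
import math

def standard_error_of_estimate(f_stat, msr):
    """Recover MSE from F = MSR / MSE, then take its square root."""
    mse = msr / f_stat
    return math.sqrt(mse)

se = standard_error_of_estimate(f_stat=12.07449, msr=638.0741)
print(round(se, 3))
```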
Use the following information to answer the next two questions (#19-20).
Cigarette consumption data was collected from 1975 to 1994 (in thousands of sticks of annual
consumption) and subjected to regression trend analysis. The regression outputs are provided
below (NOTE: the year 1975 was coded as 1):
            Coefficients  Standard Error  t Stat    P-value
Intercept   4317.958      25.02099206     172.5734  1.99E-30
CODED YEAR  -83.6436      2.088712061     -40.0455  4.77E-19

            Coefficients  Standard Error  t Stat    P-value
Intercept   4265.625      37.89803543     112.5553  7.27E-26
CODED YEAR  -69.3709      8.311626354     -8.34625  2.04E-07
CODED^2     -0.67965      0.384451587     -1.76785  0.095028

            Coefficients  Standard Error  t Stat    P-value
Intercept   4151.68       42.14282662     98.51451  1.06E-23
CODED YEAR  -11.2297      16.95671287     -0.66226  0.517232
CODED^2     -7.43527      1.85247981      -4.01368  0.001003
CODED^3     0.214464      0.058077868     3.692696  0.001973
19. After studying these outputs and at a 10% level of significance, which of the models given
above would you choose?
a. linear
b. quadratic
c. cubic
d. any of the above are acceptable
20. Irrespective of which model you actually chose above, use the cubic model to “predict”
cigarette consumption in the year 1997 (in thousands of sticks), rounded to the nearest integer:
a. 2570
b. 2311
c. 2394
d. not enough information for prediction to be performed.
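The prediction in question 20 amounts to evaluating the cubic trend at the coded year for 1997. Since 1975 is coded as 1, 1997 is coded as 23. A minimal sketch using the cubic model's coefficients; the function name is illustrative.

```python
def cubic_trend(t, b0=4151.68, b1=-11.2297, b2=-7.43527, b3=0.214464):
    """Evaluate the fitted cubic trend at coded year t."""
    return b0 + b1 * t + b2 * t ** 2 + b3 * t ** 3

coded_year = 1997 - 1975 + 1  # 1975 is coded as 1
prediction = cubic_trend(coded_year)
print(coded_year, round(prediction))
```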
Use the following information to answer the next 3 questions (#21-23).
A quarterly data set was processed, with the intention of constructing seasonal indices. As we
know, the first step in this effort is to fit a linear trend. The regression output is provided here:
           Coefficients  Standard Error  t Stat    P-value
Intercept  143.0072      8.681187        16.47324  7.35E-14
Period     7.416087      0.607557        12.20641  2.86E-11
The residual output and some additional information:
RESIDUAL OUTPUT
Year  Observation  yhat (predicted y)  Residuals  percentage trend
1994  1            150.4233333         33.57667   1.223214484
      2            157.8394203         15.16058   1.096050655
      3            165.2555072         -5.25551   0.968197688
      4            172.6715942         16.32841   1.094563358
1995  5            180.0876812         10.91232   1.060594477
      6            187.5037681         -2.50377   0.986646838
      7            194.9198551         -10.9199   0.943977718
      8            202.335942          -2.33594   0.988455131
1996  9            209.752029          -4.75203   0.977344539
      10           217.1681159         -25.1681   0.884107684
      11           224.5842029         -24.5842   0.890534585
      12           232.0002899         -3.00029   0.987067732
1997  13           239.4163768         -3.41638   0.985730396
      14           246.8324638         -27.8325   0.887241478
      15           254.2485507         -43.2486   0.829896569
      16           261.6646377         10.33536   1.039498506
1998  17           269.0807246         10.91928   1.040579924
      18           276.4968116         -15.4968   0.943953019
      19           283.9128986         -8.9129    0.968606926
      20           291.3289855         30.67101   1.105279653
1999  21           298.7450725         32.25493   1.107968065
      22           306.1611594         -5.16116   0.983142344
      23           313.5772464         -7.57725   0.975836109
      24           320.9933333         30.00667   1.093480654
We also provide the seasonal indices that were computed based on all this information:
Q1     1.063
Q2     0.961
Q3     0.927
Q4     1.049
TOTAL  4
21. The seasonally adjusted, or deseasonalized, value that you obtain from the actual data point
for the 2nd quarter of 1997, is:
a. 246.83
b. 219
c. 227.89
d. 237.21
e. -27.83
22. The trend plus seasonal component based “forecast” for the 2nd quarter of 1997 is:
a. 246.83
b. 219
c. 227.89
d. 237.21
e. -27.83
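Questions 21 and 22 can be sketched under two assumptions that the exam implies but does not state outright: observation 14 is the 2nd quarter of 1997 (observation 1 being the 1st quarter of 1994), and seasonality is multiplicative, so the actual value is divided by the seasonal index to deseasonalize and the trend value is multiplied by it to forecast.

```python
# Values for observation 14, read from the residual output.
trend_value = 246.8324638   # yhat from the linear trend
residual = -27.8325         # actual minus yhat
seasonal_index_q2 = 0.961   # Q2 seasonal index

actual = trend_value + residual                  # recover the observed value
deseasonalized = actual / seasonal_index_q2      # remove the seasonal effect
forecast = trend_value * seasonal_index_q2       # trend plus seasonal component

print(round(actual), round(deseasonalized, 2), round(forecast, 2))
```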
23. Heteroscedasticity
a. occurs when errors do not have a constant variance, and may be detected by a
fan shape in the residual plot
b. occurs when errors do not have zero mean, and is detected by a cyclical pattern
in the residuals
c. occurs when errors are not constant, and is detected by a fan shape in the
residual plot
d. occurs when errors are correlated, and is detected by a cyclical pattern in the
residuals
e. none of the above
Use the following information to answer the next seven questions (#24-30).
A multiple regression was performed to explain college GPA on the basis of high-school
GPA, SAT score and the number of hours spent per week on extracurricular activities in the
final year of high school. Both the college and high-school GPA variables are continuous, with
ranges from 0 to 12 (due to summation of several years of GPA).
The regression output is provided below, with some parts deliberately and willfully hidden:
Regression Statistics
Multiple R          0.536870707
R Square            0.288230156
Adjusted R Square   0.265987348
Standard Error      2.030233313
Observations        100

ANOVA
            df  SS           MS        F  Significance F
Regression  3   160.2370587  53.41235     3.54141E-07
Residual        395.6973413  4.121847
Total           555.9344

            Coefficients  Standard Error  t Stat    P-value   Lower 95%    Upper 95%
Intercept   0.72110455    1.869815022     0.385656  0.700605  -2.99045177  4.4326609
HS GPA      0.610872024   0.100749211     6.063293  2.62E-08
SAT         0.002708497   0.002873196     0.942677  0.348212  -0.00299476  0.0084117
Activities                0.064049816     0.722149  0.471959  -0.0808845   0.1733915
24. The residual degrees of freedom is equal to:
a. 100
b. 99
c. 98
d. 97
e. 96
25. For every additional hour of extracurricular activity, college GPA increases by
(approximately):
a. -11.27, therefore, it actually decreases
b. 0.046
c. 11.27
d. 0.089
e. 0.06
26. For testing the overall validity of the model, the value of the test statistic is:
a. 53.41
b. 12.96
c. 0.077
d. 220.16
e. 3.54141E-07
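The hidden entries asked about in questions 24 to 26 can be recovered from the visible parts of the output using standard identities: residual df = n - k - 1, F = MSR / MSE, and coefficient = t Stat x Standard Error. A brief sketch; the variable names are illustrative.

```python
# Values read from the regression output for questions 24-30.
n_obs = 100
num_predictors = 3                 # HS GPA, SAT, Activities
msr, mse = 53.41235, 4.121847      # MS column of the ANOVA table
t_activities, se_activities = 0.722149, 0.064049816

residual_df = n_obs - num_predictors - 1
f_stat = msr / mse
coef_activities = t_activities * se_activities

print(residual_df, round(f_stat, 2), round(coef_activities, 3))
```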
27. What percent of the variability in college GPA has been explained by this model, without
correcting for the number of independent variables?
a. 53.69
b. 28.82
c. 26.60
d. 2.03
e. same as the correct answer to the previous question
28. Among the following choices, which is the only probable candidate for a 95% confidence
interval for the coefficient on high-school GPA?
a. (-0.27, 0.65)
b. (-0.27, -0.65)
c. (0.41, 0.81)
d. (-0.41, 0.81)
e. (-0.41, -0.81)
29. Which of the following statements is correct?
a. for every additional point scored on the SAT, the estimated average college
GPA increases by 0.0027
b. for every additional point scored on the SAT, the average college GPA
increases by 0.0027
c. for every additional point scored on the SAT, the population average college
GPA increases by 0.0027
d. for every additional point scored on the SAT, the estimated average college
GPA decreases by 0.0027
e. for every point increase in GPA, SAT score must go up by 0.0027
30. Based on these results, if you were to form a reduced model and then do a partial F-test, the
degrees of freedom of your statistic would be:
a. 1 and 100
b. 2 and 100
c. 96 and 100
d. 2 and 96
e. 1 and 96
31. To represent the categorical variable “MUSIC,” which has levels “classic rock”, “modern
rock” and “other,” how many dummy variables need to be constructed?
a. 1
b. 2
c. 3
d. 4
e. 0
32. Given the following information, determine if first order autocorrelation exists:
n=25, k=5, alpha=0.10, d=0.90
from DW table: dL=0.95, dU=1.45
a. there is no evidence of positive autocorrelation
b. there is significant evidence of negative autocorrelation
c. we have enough evidence to conclude that positive autocorrelation exists
d. the test is inconclusive
e. there is no evidence of first order correlation between the errors
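The Durbin-Watson decision in question 32 can be sketched with the usual textbook regions: below dL suggests positive autocorrelation, between dL and dU is inconclusive, and the mirror-image regions near 4 cover negative autocorrelation. How ties at the boundaries are treated varies between texts; the choice below is an assumption, and the function name is illustrative.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the table bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"

decision = durbin_watson_decision(d=0.90, d_lower=0.95, d_upper=1.45)
print(decision)
```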
Use the following information to answer the next six questions (#33-38).
We have collected two samples, one each from two independent populations, with the intention
of comparing the population means. The following sample statistics were calculated:


x̄1 = 604.02, x̄2 = 633.23, s1 = 64.05, s2 = 103.29
33. First we need to perform an F-test to see if the two population variances are equal. Given
that the right critical value is F0.05, 42,106 = 1.50, what is the left critical value F0.95, 106, 42?
a. 3
b. 0.667
c. -1.5
d. 42
e. 106
34. The value of the F test statistic is:
a. 0.385
b. 2.601
c. 0.912
d. 1.132
e. 1.50
35. Based on the result of the F-test, the appropriate means test in this case is:
a. the chi-square test for mean difference
b. the paired sample t-test for mean difference
c. the pooled variance t-test
d. the unequal variance t-test
e. the z-test for mean difference
36. The test statistic for the test selected above is:
a. -2.09
b. -1.21
c. 3.32
d. 4.16
e. 0.34
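A rough sketch of the calculations in questions 33 to 36. The sample sizes do not appear in the text above; n1 = 43 and n2 = 107 are assumptions inferred from the degrees of freedom 42 and 106 quoted in question 33. The left critical value is taken as the reciprocal of the given right critical value, which is the relationship that question relies on.

```python
import math

x1_bar, x2_bar = 604.02, 633.23
s1, s2 = 64.05, 103.29
n1, n2 = 43, 107  # assumption: inferred from df 42 and 106, not stated in the text

# F-test for equality of variances.
f_stat = s1 ** 2 / s2 ** 2
f_right = 1.50            # right critical value, as given
f_left = 1 / f_right      # left critical value via the reciprocal rule
equal_variances_rejected = f_stat < f_left or f_stat > f_right

# Unequal-variance t statistic for the difference in means.
std_err = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t_stat = (x1_bar - x2_bar) / std_err

print(round(f_stat, 3), round(f_left, 3), equal_variances_rejected, round(t_stat, 2))
```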
37. Given that the two-tailed p-value (i.e. when H1: μ1 ≠ μ2) corresponding to the test stat
calculated above is 0.0386, your decision at the 5% level should be:
a. reject the null hypothesis, the means are different.
b. Do not reject the null hypothesis, the means are not different
c. reject the null hypothesis, the means are not different
d. Do not reject the null hypothesis, the mean of population 2 is greater.
38. If on the other hand, you do a one tailed test with H1: μ1 > μ2, and still obtain the same test
statistic, the new p-value (for this alternative hypothesis) should be:
a. 1.21
b. 0.0386
c. 0.0772
d. 0.9228
e. 0.9807
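The conversion in question 38 can be sketched as follows, assuming a symmetric test distribution such as the t distribution: half the two-tailed p-value lies in each tail, and when the statistic falls in the tail opposite to the alternative, the one-tailed p-value is one minus that half. The function name and its argument values are illustrative.

```python
def one_tailed_p_value(two_tailed_p, statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed one.

    alternative is "greater" (H1: mu1 > mu2) or "less" (H1: mu1 < mu2).
    """
    half = two_tailed_p / 2
    if alternative == "greater":
        in_direction = statistic > 0
    else:
        in_direction = statistic < 0
    return half if in_direction else 1 - half

# The test statistic is negative, but the alternative is mu1 > mu2.
p_one_tailed = one_tailed_p_value(0.0386, statistic=-2.09, alternative="greater")
print(round(p_one_tailed, 4))
```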
Use the following information to answer the next 2 questions (#39-40).
Assume you are interested in calculating the seasonal indices for a quarterly data set. You
conduct a trend analysis and then average the values obtained by quarter. This process
produces the following numbers:
Q1: 1.053
Q2: .941
Q3: .907
Q4: 1.029
39. Your next step would be to:
a. proceed with these numbers as your seasonal indices
b. throw out the numbers and conclude no “stable” seasonality exists in this data set
c. purify the data
d. normalize the data
e. none of the above
40. The seasonal index used to represent the 1st quarter would be:
a. 1.053
b. .9825
c. 1.072
d. 1.035
e. None of the above
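Questions 39 and 40 turn on the fact that quarterly seasonal indices should sum to 4, while the raw averages here sum to 3.93. A minimal sketch of the normalization, rescaling each raw value by 4 divided by the raw total; the function name is illustrative.

```python
def normalize_seasonal_indices(raw):
    """Rescale raw seasonal averages so they sum to the number of seasons."""
    scale = len(raw) / sum(raw)
    return [value * scale for value in raw]

raw_indices = [1.053, 0.941, 0.907, 1.029]  # Q1..Q4 averages from the text
indices = normalize_seasonal_indices(raw_indices)
print([round(value, 3) for value in indices])
```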
41. Which of the following is a measure of the linear relationship between two variables?
a. The standard deviation
b. The covariance
c. The coefficient of correlation
d. The variance
e. Both b & c
42. Generally speaking, if two variables are unrelated, the covariance will be:
a. a smaller positive number than if they were related
b. a smaller positive or negative number than if they were related
c. a larger negative number than if they were related
d. a larger number than if they were related
e. a positive number close to zero
43. The Central Limit Theorem is among the most remarkable theorems in all statistics.
Essentially it says that:
a. for sufficiently large samples, the sampling distribution of the sample mean is
approximately normal when the sample is drawn from a normal population
b. for sufficiently large samples, the sampling distribution of the sample mean is
approximately normal, irrespective of the population distribution
c. the sample mean is always equal to the population mean if both come from normal
distributions
d. both a & b
e. both b & c
44. A wavelike pattern which persists for more than a year describes which component of a
time series?
a. long term trend
b. cyclical effect
c. seasonal effect
d. random variation
e. none of the above
45. The disadvantages of the moving average method of smoothing a time series include which
of the following:
I. Only a small portion of the data set is represented in each averaged value
II. You can only use this method to forecast one period ahead
III. You lose a portion of your data set
IV. You cannot smooth using an even number of periods to calculate the average
a. I & II
b. I & III
c. II, III & IV
d. I, II, III & IV
e. III & IV
Use the following information to answer the next two questions (#46-47).
Use w=.8 to smooth the following data set:
Day  Value
1    18
2    22
3    23
4    19
5    28
46. The smoothed value for time period 3 is:
a. 22.6
b. 21.9
c. 22.1
d. 22.9
e. 22.3
47. The forecasted value for time period 6 would be:
a. 26.1
b. 25.8
c. 26.3
d. 28.0
e. We cannot use this method of smoothing to forecast
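Questions 46 and 47 can be worked through with the recursion S_t = w * y_t + (1 - w) * S_(t-1). This sketch assumes two conventions that the text above does not spell out: the first smoothed value equals the first observation, and the forecast for the next period is the last smoothed value.

```python
def exponential_smoothing(values, w):
    """Return the exponentially smoothed series with smoothing constant w."""
    smoothed = [values[0]]  # assumption: S_1 is set to the first observation
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

values = [18, 22, 23, 19, 28]
smoothed = exponential_smoothing(values, w=0.8)
forecast_day_6 = smoothed[-1]  # assumption: forecast equals the last smoothed value

print(round(smoothed[2], 1), round(forecast_day_6, 1))
```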
48. In the exponential smoothing method that we used in lecture, the larger the smoothing
constant:
a. the more heavily the current actual data point is weighted in the smoothed series
b. the more heavily the smoothed data point lagged one period is weighted in the
smoothed series
c. the less volatile the smoothed series would become
d. the larger the difference between the smoothed data point and the actual data point
would be for each period
e. both a & c
Answer Key:
1. d    2. a    3. c    4. c    5. c    6. d
7. b    8. c    9. d    10. b   11. c   12. d
13. b   14. a   15. e   16. e   17. b   18. c
19. c   20. a   21. c   22. d   23. a   24. e
25. b   26. b   27. b   28. c   29. a   30. d
31. b   32. c   33. b   34. a   35. d   36. a
37. a   38. e   39. d   40. c   41. e   42. b
43. b   44. b   45. b   46. a   47. c   48. a