DS 533
Fall 2003
Exam # 1
Name: ___________________
Show all your work
1.
You are the quality control manager in a plant that produces bungee cords. Your
responsibility is to oversee the production of the synthetic material in the cord. Specifically,
your responsibility is to ensure that bungee cords have the correct elastic qualities to avoid
personal injury lawsuits.
Your efforts are complicated by the fact that you use two procedures for testing bungee cord elasticity,
procedure A and procedure B. Procedure A is generally subject to error, but few are very large.
On the other hand, procedure B is very accurate but subject to large one-time errors.
Specifically, forecast errors in evaluating the dynamic cord elasticity per pound of load are
presented below for a random sample of four cords.
Forecast Errors
Procedure A    Procedure B
 .01            .008
-.01           -.009
-.02           -.008
 .02            .03
Using mean absolute error (MAE) and root mean squared error (RMSE), evaluate the relative
accuracy of each procedure. Which procedure will you use in quality control testing?
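Procedure A
$\text{MAE} = \frac{1}{n}\sum |y_i - \hat{y}_i| = \frac{.01 + .01 + .02 + .02}{4} = \frac{.06}{4} = .015$
$\text{RMSE} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n}} = \sqrt{\frac{.0001 + .0001 + .0004 + .0004}{4}} = \sqrt{\frac{.001}{4}} = .0158$
Procedure B
$\text{MAE} = \frac{.008 + .009 + .008 + .03}{4} = \frac{.055}{4} = .01375$
$\text{RMSE} = \sqrt{\frac{.000064 + .000081 + .000064 + .0009}{4}} = \sqrt{\frac{.001109}{4}} = .01665$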
Under MAE procedure B is superior; under RMSE procedure A is superior. Ultimately the choice
depends on the relative costs of large versus small forecast errors, which is what distinguishes
the two accuracy measures. I suspect that large errors are more costly to the firm than small ones,
and accordingly apply the RMSE and conclude that procedure A is superior.
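The arithmetic for problem 1 can be checked with a short script. This is only a sketch: the helper names `mae` and `rmse` are illustrative and not part of the exam; the error values are the ones given in the problem.

```python
import math

# Forecast errors from the problem statement (random sample of four cords).
errors_a = [0.01, -0.01, -0.02, 0.02]
errors_b = [0.008, -0.009, -0.008, 0.03]


def mae(errors):
    """Mean absolute error: the average of |e_t|."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error: square root of the average of e_t^2."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


for name, errs in (("A", errors_a), ("B", errors_b)):
    print(f"Procedure {name}: MAE = {mae(errs):.5f}, RMSE = {rmse(errs):.5f}")
```

Because RMSE squares each error before averaging, the single .03 error in procedure B dominates its RMSE, which is why the two measures rank the procedures differently.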
2.
A random sample of twelve automobiles showed the following figures for miles achieved
on a gallon of gas. Assume the population distribution is normal. From the data:
$n = 12$, $\sum_{i=1}^{12} X_i = 232.9$, $\sum_{i=1}^{12} X_i^2 = 4533.49$, and $\sum_{i=1}^{12} (X_i - \bar{X})^2 = 13.2867$.
(a)
Find the average miles per gallon of gas.
$\bar{x} = \frac{232.9}{12} = 19.41$
(b)
Find the sample variance.
$s^2 = \frac{13.2867}{11} = 1.21$
(c)
Find an approximate 95% confidence interval for the population mean.
From the t-table for $\alpha = .05$ and $n - 1 = 11$ degrees of freedom, we get $t_{\alpha/2,\, n-1} = t_{.025,\, 11} = 2.201$.
$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} = 19.41 \pm 2.201 \cdot \frac{1.099}{\sqrt{12}} = (18.71, 20.11)$
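The three parts of problem 2 can be reproduced from the summary statistics alone. In this sketch the critical value 2.201 is typed in from the t-table, as in the solution, rather than computed; the variable names are illustrative.

```python
import math

# Summary statistics given in the problem.
n = 12
sum_x = 232.9
sum_sq_dev = 13.2867  # sum of squared deviations from the sample mean

# Critical value t_{.025, 11}, read from a t-table (not computed here).
t_crit = 2.201

mean = sum_x / n                      # (a) sample mean
variance = sum_sq_dev / (n - 1)       # (b) sample variance
std_dev = math.sqrt(variance)
half_width = t_crit * std_dev / math.sqrt(n)
ci = (mean - half_width, mean + half_width)  # (c) 95% confidence interval

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```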
3.
Private housing starts (PHS) are considered leading indicators of future economic
activity. Using quarterly data on private housing starts over the period Feb. 1980-Feb. 2000, the
estimated correlograms are reported below.
[Figure: correlogram of the PHS series for lags 1 to 12, plotting the ACF with its upper and lower limits.]
a)
Based on $\alpha$ = 5%, do you think that the private housing starts data exhibit seasonal
variation? Explain your inferences clearly and note what hypotheses you are testing in
your answer.
H0: $\rho_k = 0$
Ha: $\rho_k \neq 0$
k = 4, 8, 12
Since $r_4$, $r_8$, and $r_{12}$ are greater than the critical value $2/\sqrt{80} = .224$,
reject H0 that $\rho_k = 0$ for k = 4, 8, 12. This implies that seasonal variation exists in this time
series data.
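The test in part a) compares sample autocorrelations at the seasonal lags with the approximate bound $2/\sqrt{n}$. The sketch below applies the $r_k$ formula from the formula sheet to a made-up quarterly series of 80 observations. The series is hypothetical and stands in for the PHS data, which are not reproduced in the exam; the function name `autocorr` is also illustrative.

```python
import math


def autocorr(y, k):
    """Sample autocorrelation r_k at lag k, as on the formula sheet."""
    n = len(y)
    y_bar = sum(y) / n
    numerator = sum((y[t] - y_bar) * (y[t - k] - y_bar) for t in range(k, n))
    denominator = sum((v - y_bar) ** 2 for v in y)
    return numerator / denominator


# Hypothetical quarterly series: a four-quarter pattern repeated for 20 years.
series = [10.0, 14.0, 12.0, 8.0] * 20
n = len(series)

# Approximate 5% critical value for testing H0: rho_k = 0.
bound = 2 / math.sqrt(n)

seasonal_lags = (4, 8, 12)
significant = {k: abs(autocorr(series, k)) > bound for k in seasonal_lags}

print(f"critical value = {bound:.3f}")
for k in seasonal_lags:
    print(f"r_{k} = {autocorr(series, k):.3f}, significant: {significant[k]}")
```

With n = 80 the bound is the same .224 used in the solution; spikes at lags 4, 8, and 12 that exceed it are the signature of quarterly seasonality.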
b)
To evaluate the possibility of a significant trend in the data, it is recommended to
deseasonalize the series first. The correlograms for the deseasonalized PHS series are
reported below.
[Figure: correlogram of the deseasonalized PHS series for lags 1 to 12, plotting the ACF with its upper and lower limits.]
Using $\alpha$ = 5%, can you say that there is strong evidence for trend in the PHS series? Explain
your inferences clearly and note what hypotheses you are testing in your answer.
H0: $\rho_k = 0$
Ha: $\rho_k \neq 0$
k = 1, 2, 3, …, 12
The $r_k$ for k = 1, 2, 3, …, 7 decline gradually and are no longer significant after $r_7$ at the 5% level of
significance. This implies that there is a trend in this time series data.
4.
In 1985, the government bond yield in the United States was 10.62 percent. A random
sample of government bond yields in nine foreign countries was:
11.04, 6.34, 10.94, 13.00, 7.34, 13.09, 4.78, 10.62, 6.87.
The mean foreign bond yield was 9.34 with variance 9.31. Assume that government bond yields
are normally distributed.
At the 5 percent level of significance, test whether the government bond yields in the rest of the
world during 1985 were lower than in the United States (state the null and alternative hypothesis,
evaluate the test statistic, draw the decision criteria or evaluate the P-value, and state your
conclusion).
H0: $\mu \geq 10.62$
Ha: $\mu < 10.62$
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.34 - 10.62}{3.05 / \sqrt{9}} = -1.26$
Decision Criteria:
$t^* = t_{\alpha,\, n-1} = t_{.05,\, 8} = 1.86$
Reject H0 if t < -1.86
Conclusion:
Do not reject H0, since t = -1.26 is not less than -1.86. At the 5 percent level there is insufficient
evidence that government bond yields in the rest of the world were lower than the
government bond yield in the U.S. during 1985.
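The test statistic in problem 4 can be recomputed directly from the nine yields. This sketch uses only the Python standard library; the critical value 1.86 is entered from the t-table, as in the solution, and the variable names are illustrative.

```python
import math
import statistics

# Foreign government bond yields from the problem statement.
yields = [11.04, 6.34, 10.94, 13.00, 7.34, 13.09, 4.78, 10.62, 6.87]
mu_0 = 10.62      # U.S. yield, the hypothesized value
t_crit = 1.86     # t_{.05, 8}, read from a t-table (not computed here)

n = len(yields)
x_bar = statistics.mean(yields)
s = statistics.stdev(yields)  # sample standard deviation (n - 1 divisor)

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
reject_h0 = t_stat < -t_crit  # lower-tail test

print(f"mean = {x_bar:.2f}, variance = {s ** 2:.2f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```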
Multiple Choice
Select the best answer
1.
Which measure of forecast accuracy is analogous to standard deviation?
A) Mean Absolute Error.
B) Mean Absolute Percentage Error.
C) Mean Squared Error.
D) Root Mean Squared Error. **
2.
Which of the following measures is a poor indicator of forecast accuracy, but useful in
determining the direction of bias in a forecasting model?
A) Mean Absolute Percentage Error.
B) Mean Percentage Error. **
C) Mean Squared Error.
D) Root Mean Squared Error.
E) None of the above.
3.
Which of the following is incorrect? Evaluation of forecast accuracy
A) is important since the production of forecasts is costly to the firm.
B) requires the use of symmetric error cost functions.
C) is important since it may reduce business losses from inaccurate forecasts.
D) is done by averaging forecast errors.
E) both b) and d) are incorrect. **
F) both a) and b) are incorrect.
4.
Measures of forecast accuracy based upon a quadratic error cost function, notably root
mean square error (RMSE), tend to treat
A) levels of large and small forecast errors equally.
B) large and small forecast errors equally on the margin.
C) large and small forecast errors unequally on the margin. **
D) every forecast error with the same penalty.
E) None of the above.
5.
You are given a time series of sales data with 10 observations. You construct forecasts
according to last period’s actual level of sales plus the most recent observed change in sales.
How many data points will be lost in the forecast process relative to the original data series?
A) One.
B) Two. **
C) Three.
D) Zero.
E) None of the above.
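The answer to question 5 follows from the forecast rule $\hat{y}_t = y_{t-1} + P(y_{t-1} - y_{t-2})$, which needs two prior observations before it can produce its first forecast. A minimal sketch, assuming P = 1 and a made-up sales series (both the numbers and the function name `modified_naive` are hypothetical):

```python
def modified_naive(y, p=1.0):
    """Forecast each period as last period's level plus p times the last change.

    The first forecast is for the third observation, because the rule
    needs both y[t-1] and y[t-2].
    """
    return [y[t - 1] + p * (y[t - 1] - y[t - 2]) for t in range(2, len(y))]


# Hypothetical sales series with 10 observations.
sales = [100, 104, 103, 108, 112, 110, 115, 119, 118, 123]

forecasts = modified_naive(sales)
lost = len(sales) - len(forecasts)

print(f"{len(sales)} observations, {len(forecasts)} forecasts, {lost} lost")
```

The plain naive rule $\hat{y}_t = y_{t-1}$ would lose only one data point, since it needs a single prior observation.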
6.
Forecasts based solely on the most recent observation(s) of the variable of interest
A) are called “naive” forecasts.
B) are the simplest of all quantitative forecasting methods.
C) leads to loss of one data point in the forecast series relative to the original series.
D) are consistent with the “random walk” hypothesis in finance, which states that the
optimal forecast of today's stock rate of return is yesterday's actual rate of return.
E) All the above. **
7.
The sampling distribution of the sample mean is
A) normally distributed with mean $\mu$ and variance $\sigma^2$.
B) normally distributed with mean $\mu$ and variance $s^2$.
C) distributed as a t distribution with variance 1.
D) normally distributed with mean 0 and variance 1.
E) None of the above. **
8.
An unbiased model
A) is one that does not consistently over-estimate or under-estimate the true value of
a parameter. **
B) is one that consistently produces estimates with the smallest RMSE.
C) is one, which contains no independent variable; it depends solely on time-series
pattern recognition.
D) is one made up by a team of forecasters.
9.
When testing the null hypothesis that the population correlation between a pair of
variables is zero
A) the normal sampling distribution is used.
B) the chi-square distribution is used.
C) the standard normal distribution is used.
D) The t distribution is used for small samples. **
10.
Which of the following is not consistent with the presence of a trend in a time series?
A) The autocorrelation function declines quickly to zero as the lag increases. **
B) The autocorrelation function of the first-differences declines quickly to zero as the
lag increases.
C) The autocorrelation function declines slowly towards zero as the lag increases.
D) The autocorrelation function of the first-differences quickly declines to zero.
11.
Autocorrelation refers to the correlation between a variable and:
A) itself.
B) another very similar variable.
C) itself when lagged one or more periods. **
D) another variable when the analysis is done on a computer.
E) None of the above.
12.
Stationarity refers to
A) the size of the RMSE of a forecasting model.
B) the size of variances of the model's estimates.
C) a method of forecast optimization.
D) lack of trend in a given time series. **
E) None of the above.
Formulas
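Mean Absolute Error
$\text{MAE} = \frac{1}{n} \sum_{t=1}^{n} |y_t - \hat{y}_t|$
Mean Squared Error
$\text{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2$
Mean Absolute Percentage Error
$\text{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{y_t}$
Mean Percentage Error
$\text{MPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{(y_t - \hat{y}_t)}{y_t}$
Root Mean Square Error
$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n}}$
Naive Forecasts
$\hat{y}_t = y_{t-1}$
$\hat{y}_t = y_{t-1} + P(y_{t-1} - y_{t-2})$
Sample Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sample Variance and Standard Deviation
$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Standard Normal Test
$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
Confidence Interval
$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
T-test
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Confidence Interval
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
Correlation Coefficient
$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{n \sum X^2 - (\sum X)^2} \sqrt{n \sum Y^2 - (\sum Y)^2}}$
Test for Zero Correlation
$t = \frac{r - 0}{\sqrt{\frac{1 - r^2}{n - 2}}}$
Autocorrelation Coefficient
$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$
Test for Zero Autocorrelation
$t = \frac{r_k - 0}{1 / \sqrt{n - k}}$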