Handout: Detrending

Time Series Modeling and Detrending
1. Using a Time Series to Observe Changes in Production
A time series is a set of data points collected on one variable over time. By collecting and analyzing
time series data on gross domestic product (GDP), we can identify trends and fluctuations in the
economy’s output level. Figure 1 presents the 68 annual GDP data points for the US for the years 1929-96.
[Figure 1: US Gross Domestic Product (1929-96), in billions of 1992 dollars, by year.]
This time series of US GDP clearly exhibits an increasing trend over time. However, fluctuations, or
deviations from the trend, are also apparent in that the series occasionally decreases. Statistical methods
exist that can be used to “decompose” the time series into two components: (1) trend and (2) deviation
from trend, i.e., fluctuation.
2. Identifying and Characterizing Production Trends
In general, a time series model for a particular economic variable is a model that predicts the level of the
variable for the current period t based upon (1) the level of the variable in previous periods, (2) the levels
of other variables in the current and previous periods, and (3) the time period t itself. Here, we will
consider some simple but important examples, illustrating how a time series model can be used to
identify and characterize a trend in production.
2.1. Trend Model 1: The Geometric Model
One simple way to characterize a trend is by an average growth rate. To correctly calculate an average
growth rate, the effect of “compounding” must be recognized.
A familiar example of compounding occurs when money deposited in a bank accrues interest. For
example, suppose $100 is deposited in a bank that pays 10 percent interest on deposits annually. After
one year, the original $100 would still be in the bank, plus interest equal to 10 percent of the original
$100. Thus, at the end of one year, the bank balance would be given by
100 + [100][.10] = 100[1+.10] = 110
During the second year, the depositor not only earns interest on the original $100 deposit, but interest
is also earned on the interest accrued during the first year. That is, interest during the second year is
earned on the $110, the bank balance at the beginning of the second year. Thus, at the end of the second
year, the bank balance would be given by
\[ 121.00 = 110[1 + .10] = 100[1 + .10][1 + .10] = 100[1 + .10]^2 \]
Using the same logic again, at the end of the third year the bank balance would be
\[ 133.10 = 100[1 + .10]^3 \]
The $133.10 balance is generated by an initial $100 deposit which grows at an average annual rate of 10
percent over 3 years. To obtain a general formula from the equation immediately above, replace the
initial $100 bank balance with the variable PV; replace the $133.10 ending balance with the variable FV;
replace the .10 decimal percentage interest rate with the variable r; and replace the 3 years of
compounding with the variable n. Making these replacements, the formula becomes
\[ FV = PV\,[1 + r]^n \tag{1} \]
Consistent with the notation used in finance, PV stands for “present value,” while FV stands for
“future value.” This formula gives the relationship between a present value amount
and a future value amount, where the present value amount is compounded at rate r for n periods. In this
framework, the rate r is the average rate at which the initial PV amount grows into the final FV amount.
Solving the equation (1) for r one obtains
\[ r = \left[\frac{FV}{PV}\right]^{1/n} - 1 \tag{2} \]
In Figure 1, the 1929 Real GDP value is 790.9 and can be thought of as the initial PV amount. The 1996
real GDP value is 6,906.8 and can be thought of as the final FV amount. There are 67 years between
1996 and 1929, meaning n is equal to 67. Substituting these values into equation (2), we obtain the
average annual growth rate for real GDP over the 1929-96 period:
\[ r = \left[\frac{6{,}906.8}{790.9}\right]^{1/67} - 1 = 0.0329 = 3.29\% \]
As opposed to being an arithmetic mean of the annual real GDP growth rates, our 3.29 percent value is a
geometric mean, a mean that takes compounding into account.
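Equations (1) and (2) can be sketched in a few lines of Python. The function names `future_value` and `average_growth_rate` are illustrative choices, not part of the handout; the inputs are the bank example and the 1929 and 1996 real GDP values quoted above.

```python
def future_value(pv, r, n):
    """Equation (1): compound pv at rate r for n periods."""
    return pv * (1.0 + r) ** n


def average_growth_rate(pv, fv, n):
    """Equation (2): the geometric-mean rate that turns pv into fv in n periods."""
    return (fv / pv) ** (1.0 / n) - 1.0


# Bank example: $100 at 10 percent for 3 years.
balance = future_value(100.0, 0.10, 3)

# Real GDP, 1929 to 1996 (billions of 1992 dollars), n = 67 years.
r_gdp = average_growth_rate(790.9, 6906.8, 67)

print(f"balance after 3 years: {balance:.2f}")
print(f"average annual growth: {r_gdp:.4f}")
```

Applying `average_growth_rate` to the bank example (100 growing to 133.10 over 3 years) recovers the 10 percent rate, which is a quick check that the two functions are inverses of one another.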
We can use this average growth rate to develop a time series model of real GDP growth, a geometric
model. Let yt denote the actual real GDP value for period t, and let ŷt denote the value of real GDP
predicted by the model for period t. Using this notation, the geometric model can be stated as
\[ \hat{y}_{t+1} = [1 + r]\,\hat{y}_t \tag{3} \]
where \( \hat{y}_t = y_t \) in the initial time period \( t = t_0 \). In our example, \( t_0 \) is 1929, so that \( \hat{y}_{1929} = y_{1929} = 790.9 \).
Given the 1929 value of real GDP and r = 0.0329, the 1930 value predicted by the model is
816.9 = [1+0.0329]790.9.
The 1930 value is then used to obtain the 1931 predicted value:
843.8 = [1+0.0329]816.9. Continuing this process, the entire time series predicted by the model can be
generated. If the calculations are carried out correctly, then the 1996 value predicted by the model will be identical to
the actual value (up to rounding); i.e., \( \hat{y}_{1996} = y_{1996} = 6{,}906.8 \).
The geometric model is a simple example of a time series model in that the current predicted value for
GDP depends only upon the predicted value for GDP in the previous period and the average annual
growth rate r.
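The recursion in equation (3) can be sketched as a short loop. This is a minimal illustration, assuming only the two endpoint values quoted in the text; `geometric_path` is a hypothetical helper name, and the growth rate is left unrounded so that the path ends exactly at the 1996 value.

```python
def geometric_path(y0, r, periods):
    """Apply equation (3) repeatedly: each value is (1 + r) times the previous one."""
    path = [y0]
    for _ in range(periods):
        path.append((1.0 + r) * path[-1])
    return path


y_1929, y_1996, n = 790.9, 6906.8, 67
r = (y_1996 / y_1929) ** (1.0 / n) - 1.0   # equation (2), unrounded

path = geometric_path(y_1929, r, n)        # path[0] is 1929, path[67] is 1996
print(f"1930 predicted: {path[1]:.1f}")
print(f"1996 predicted: {path[-1]:.1f}")
```

Using the rounded rate 0.0329 instead would leave a small gap between the last predicted value and the actual 1996 value, which is why the sketch keeps full precision.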
The trend path predicted by the geometric model is shown in Figure 2, along with the actual real GDP
data. Notice that the predicted path for real GDP begins and ends at the same place as the actual path.
The predicted path is smooth because the annual growth rate is constant, at 3.29%. Because the actual
path is not smooth, the annual growth rate fluctuates along the path. The vertical gap between the actual
path and the predicted path is the deviation from trend indicated by the geometric model.
[Figure 2: US Gross Domestic Product (1929-96), actual versus geometric trend, in billions of 1992 dollars, by year.]
A significant problem with the geometric model is that the predicted path is highly sensitive to a change
in the final data point. For example, if 1997 were an unusually strong growth year, the entire predicted
path would shift up. Conversely, a weak growth year would shift the predicted path down. This is a
problem because an unusually strong or an unusually weak performance may be transitory; i.e.,
temporary. A trend path is supposed to capture the more permanent movements in the variable. Thus, in
a better model, transitory changes would not affect the predicted trend path to the extent that they can in
the geometric model. Such a model is presented in the next section.
2.2. Trend Model 2: The Exponential Model
Like the geometric model, the exponential model generates a path with a constant growth rate. Letting r
represent this constant rate and letting ŷt denote the predicted value of the variable that is growing, the
exponential model can be presented as
\[ \hat{y}_t = A e^{rt} \tag{4} \]
Here the time variable t takes on consecutive whole number values, beginning with 0 for the initial
time period t=t0 and increasing through the last year for which data are available. For example, with data
available from 1929 to 1996, the variable t would run from 0 to 67. The number e is a special number in
mathematics, approximately equal to 2.718. When t=0 is substituted into equation (4), we find that ŷ0=A
must hold. Thus, by choosing the initial t level to be zero, we get a nice interpretation of the variable A.
The variable A is the predicted value for real GDP in the initial period 0; i.e., the predicted 1929 real GDP
level.
Using the least squares regression technique, one can obtain an estimate of the constant trend growth rate
r. Least squares is a “linear regression” method. Therefore, to use least squares, we must first “linearize”
the model. This can be accomplished by taking the natural log of both sides of equation (4). Doing
so, we obtain
\[ \ln(\hat{y}_t) = \ln(A) + rt \tag{5} \]
Notice that this “logarithmic transformation” moved the variable r from a position of being an exponent
on the number e in equation (4) to being a coefficient on the variable t in equation (5). In equation (4),
the fact that r is an exponent implies that there is a non-linear relationship between r and ŷt ; i.e., you
would not get a straight line if you plotted the relationship between ŷt and r. However, in equation (5),
the fact that r is a coefficient implies that there is a linear relationship between r and ln( ŷt ); i.e., you
would get a straight line if you plotted the relationship between ln( ŷt ) and r. Because of this linear
relationship, we can obtain a least squares estimate for r by regressing ln(yt ) on t.
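The log-linear regression can be sketched without a spreadsheet. The example below is a minimal illustration in plain Python, not the handout's Excel procedure: `fit_line` is a hypothetical helper implementing the textbook one-regressor least squares formulas, and the data are a made-up series that grows at exactly 3.5 percent, so the regression should recover the values used to build it.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical series (not real GDP): A = 700, r = 0.035, t = 0, ..., 19.
t = list(range(20))
y = [700.0 * math.exp(0.035 * ti) for ti in t]

# Regress ln(y) on t, as in equation (5).
ln_a_hat, r_hat = fit_line(t, [math.log(yi) for yi in y])
a_hat = math.exp(ln_a_hat)      # undo the log to recover A

print(f"r = {r_hat:.4f}, A = {a_hat:.1f}")
```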
Using Excel to regress ln(yt ) on t, we obtain the following estimated equation
\[ \ln(\hat{y}_t) = \underset{(217.97)}{6.56} + \underset{(47.04)}{0.0365}\,t, \qquad R^2 = 0.97 \tag{6} \]
Notice that the regression process takes equation (5) and gives us estimates of ln(A) and r. Our
estimate for ln(A) is 6.56 and our estimate for r is 0.0365. The number in parentheses underneath
each estimate is the estimate’s associated t-statistic. A t-statistic is a test statistic associated with the
hypothesis that the associated parameter is equal to zero. A crude but commonly used rule of thumb is that a t-statistic greater than 2 (in absolute value) indicates that the estimate is significantly different from zero,
while a t-statistic less than 2 indicates that the estimate is not significantly different from zero. Using this
rule of thumb, the estimates for ln(A) and r would be judged significantly different from zero. (More
formally, we reject the hypothesis that ln(A)=0 and the hypothesis that r=0.)
The meaning of the term “significance” here has to do with the extent to which the independent
variable in question helps explain the dependent variable. (Here, ln(yt) is the dependent variable and t is
the independent variable.) When a t-statistic is close to zero, the indication is that the variable in question
could be left out of the equation and the remaining variables could explain the dependent variable nearly
as well.
The R² for the regression, which is always a number between zero and one, gives the fraction of the
variance in the dependent variable that is explained by the model being tested. An R² of 1.0 would
indicate that the model precisely fits the actual data, meaning there are no deviations from the model’s
trend. Here, the R² is 0.97, or 97 percent. The high t-statistics and high R² indicate that the model does a
reasonably good job of capturing the path followed by real GDP.
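To make the t-statistics and R² concrete, here is a rough sketch of how they are computed for a regression with one explanatory variable. It uses the standard textbook formulas; the function name `regression_summary` and the data (a 3 percent trend plus a deterministic wiggle standing in for fluctuations) are assumptions for illustration, not the handout's data or Excel output.

```python
import math


def regression_summary(x, y):
    """Simple regression of y on x: coefficients, t-statistics, and R-squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ssr = sum(e ** 2 for e in residuals)            # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)        # total variation
    sigma2 = ssr / (n - 2)                          # residual variance estimate

    se_slope = math.sqrt(sigma2 / s_xx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / s_xx))
    return {
        "intercept": intercept,
        "slope": slope,
        "t_intercept": intercept / se_intercept,    # estimate / standard error
        "t_slope": slope / se_slope,
        "r_squared": 1.0 - ssr / sst,
    }


# Hypothetical log-GDP series: trend of 0.03 per period plus a wiggle.
t = list(range(20))
ln_y = [6.5 + 0.03 * ti + 0.05 * math.sin(ti) for ti in t]

summary = regression_summary(t, ln_y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Each t-statistic is simply the estimate divided by its standard error, so a large value means the estimate is many standard errors away from zero.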
To use our regression results to obtain a model for the original GDP time series yt, we must first
derive an estimate of A from our estimate for ln(A). This is accomplished by recognizing that the
exponential function and natural log function are inverse functions. Letting exp(x) denote an exponential
function of x and letting ln(x) denote a natural log function of x, the fact that these two are inverse
functions implies exp(ln(x))=x and ln(exp(x))=x. We can apply this knowledge by “taking the
exponential of the ln(A).” Doing so, we would write exp(ln(A))=A. In words, we get an estimate for A
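from our estimate of ln(A) by taking the exponential. Because our estimate for ln(A) is 6.56, our
estimate for A is 706.7. (Exponentiating the rounded value 6.56 gives about 706.3; the small difference presumably reflects rounding of the reported coefficient.)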
Using our estimates for A and r, we can now rewrite equation (4) as
\[ \hat{y}_t = 706.7\,e^{0.0365t} \tag{7} \]
We obtain our predicted path for real GDP by substituting in the values for t, ranging from 0 to 67.
For example, \( \hat{y}_1 = 706.7e^{0.0365(1)} = 733.0 \) and \( \hat{y}_{67} = 706.7e^{0.0365(67)} = 8177.1 \). Figure 3 presents the
actual real GDP data along with the predicted path generated by the exponential model. Comparing the
exponential model to the geometric model, notice that the geometric model under-predicts the real GDP
level much more often than it over-predicts. The least squares estimation method yields the parameter
values that “best fit” the data for the given model, so that under-prediction is balanced
by over-prediction (in the regression, the residuals of ln(yt) sum to zero).
Thus, the exponential model provides the best fit of the time series data, under the
assumption that the series is growing at a constant rate. (It will be shown below that a better fit can be
obtained by allowing for a variable growth rate.)
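The predicted exponential path can be generated directly from the two reported estimates. The sketch below assumes the rounded values 706.7 and 0.0365; because the growth rate is rounded, values for later years will differ somewhat from figures computed with the full-precision regression output. The names `A_HAT`, `R_HAT`, and `exponential_trend` are illustrative.

```python
import math

A_HAT = 706.7     # estimate of A reported in the text
R_HAT = 0.0365    # estimate of r reported in the text (rounded)


def exponential_trend(t):
    """Predicted real GDP t years after 1929, from A * exp(r * t)."""
    return A_HAT * math.exp(R_HAT * t)


# t runs from 0 (1929) through 67 (1996).
trend = {1929 + t: exponential_trend(t) for t in range(68)}

print(f"1929: {trend[1929]:.1f}")
print(f"1930: {trend[1930]:.1f}")
print(f"1996: {trend[1996]:.1f}")
```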
[Figure 3: US Gross Domestic Product (1929-96), actual versus exponential trend, in billions of 1992 dollars, by year.]
2.3. Trend Model 3: The Log-Difference Model
The log-difference model provides a third way to obtain a constant growth rate estimate for a time
series. It is based upon the mathematical fact that the derivative of the ln(x) with respect to x is 1/x.
Let’s use this fact to derive the log-difference model. Let yt again denote the value of real GDP during
period t, so that ln(yt) is the natural log of this value. To examine how ln(yt) changes over a continuous
time interval, take the derivative of ln(yt) with respect to t. Doing so, the logarithm rule and chain rule
imply
\[ \frac{d\ln(y_t)}{dt} = \frac{1}{y_t}\,\frac{dy_t}{dt} \tag{8} \]
The “d” in the derivative can be thought of as the word “change.” On the left side of equation (8) we are
examining the change in \( \ln(y_t) \) divided by the change in t. If we think in terms of periods (discrete
time) rather than in terms of continuous time, the change in t from one period to the next is equal to one,
or one period. That is, dt=1. We are examining a change from the current period t to the next period
t+1. During this one period, the log of the series changes from \( \ln(y_t) \) to \( \ln(y_{t+1}) \), meaning \( d[\ln(y_t)] = \ln(y_{t+1}) - \ln(y_t) \). Similarly, the change in \( y_t \) is given by \( dy_t = y_{t+1} - y_t \). Substituting \( \ln(y_{t+1}) - \ln(y_t) \) for \( d[\ln(y_t)] \),
substituting \( y_{t+1} - y_t \) for \( dy_t \), and setting dt=1 in equation (8), we obtain
\[ \ln(y_{t+1}) - \ln(y_t) \approx \frac{y_{t+1} - y_t}{y_t} \tag{9} \]
Notice that on the right side of equation (9) we have the percentage change in the variable y from period t
to period t+1. On the left side, we have the log difference. Thus, what we have shown is that one way
to calculate the growth rate from one period to the next (approximately, for small changes) is by taking the log difference.
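A quick numerical comparison shows how close the log difference is to the percentage change. This is a small sketch with made-up one-period growth rates; the function names are illustrative.

```python
import math


def pct_change(y_now, y_next):
    """Percentage change from one period to the next, as a decimal fraction."""
    return (y_next - y_now) / y_now


def log_difference(y_now, y_next):
    """Change in the natural log from one period to the next."""
    return math.log(y_next) - math.log(y_now)


# Hypothetical one-period growth rates, from small to large.
for growth in (0.01, 0.03, 0.10, 0.50):
    y_now, y_next = 100.0, 100.0 * (1.0 + growth)
    print(f"{pct_change(y_now, y_next):.4f}  {log_difference(y_now, y_next):.4f}")
```

The two measures agree closely for growth rates of a few percent, which is the range relevant for annual real GDP, and drift apart as the change gets large.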
The growth rate of the real GDP series is shown in Figure 4 of your stats-book (and in Figure 3.5
below). Notice that, unlike the real GDP series itself, the real GDP growth rate series does not follow a
definite trend. A time series variable that does not follow a trend but rather fluctuates around some mean
value (without much change in the amplitude of the fluctuation) is called a stationary variable.
Statistically, the stationarity property is nice because it indicates that the variable tends toward its mean
value, implying that the mean value provides a good estimate of where the value of the variable will tend to
be. For us, the fact that the log difference of real GDP looks stationary implies that we can obtain an
estimate of the real GDP growth rate by simply taking the arithmetic average of the log difference
numbers present in the log difference time series. Using Excel to calculate the logs, the log differences,
and the average of the log differences for the data plotted in Figure 3.1, an average annual growth rate of
r=0.0323, or 3.23 percent, was obtained.
Because the log difference series appears stationary, the 3.23 percent growth estimate is thought to be
a better estimate of the average annual growth rate for the economy’s output than the 3.29 percent and
3.65 percent estimates obtained from the geometric and exponential models. That is, for use in
forecasting the 1997 rate of growth, the estimate of 3.23 percent is the best estimate we have under the
assumption that the long run growth rate of the economy is constant. To forecast the 1997 real GDP
level, we would apply the estimated growth rate to the 1996 real GDP level. That is, the log-difference
model forecasts a 1997 real GDP level of 7129.9= 6,906.8[1+0.0323].
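The averaging step and the 1997 forecast can be sketched as follows. The short series is hypothetical and is there only to show the mechanics; the final lines use the 1929 and 1996 values quoted in the text. One property worth noting is that log differences telescope, so their average over a sample equals the change in the log between the first and last observations divided by the number of differences.

```python
import math


def mean_log_difference(series):
    """Arithmetic average of ln(y[t+1]) - ln(y[t]) over the whole series."""
    diffs = [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical short series, only to show the mechanics.
series = [100.0, 104.0, 103.0, 108.0, 112.0]
r_hat = mean_log_difference(series)

# Same number computed from the endpoints alone (the differences telescope).
endpoint_version = (math.log(series[-1]) - math.log(series[0])) / (len(series) - 1)

# Endpoints quoted in the text: 1929 and 1996 real GDP, 67 differences.
r_gdp = (math.log(6906.8) - math.log(790.9)) / 67
forecast_1997 = 6906.8 * (1.0 + round(r_gdp, 4))

print(f"{r_hat:.4f} {endpoint_version:.4f}")
print(f"r = {r_gdp:.4f}, 1997 forecast = {forecast_1997:.1f}")
```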
To obtain a predicted historical path for the economy, we use a method similar to that used for the
geometric model. The difference is we work from the present back into the past, rather than from the
past to the present. The formula used to generate the predicted path for the geometric model is given as
equation (3). To work backwards, we need to solve for \( \hat{y}_t \) in terms of \( \hat{y}_{t+1} \). Solving equation (3) for
\( \hat{y}_t \), we obtain \( \hat{y}_t = \hat{y}_{t+1} / [1 + r] \). Taking this equation and moving the time subscripts back one period gives
us a relationship between last period’s value and this period’s value:
\[ \hat{y}_{t-1} = \frac{\hat{y}_t}{[1 + r]} \tag{10} \]
When equation (10) is used to generate a predicted path for real GDP, one obtains a path very similar
to that obtained from the geometric model. For that reason, the log-difference path will not be plotted
here. However, it should be kept in mind that the path generated by the log difference model need not
always be similar to the geometric model. In general, the log difference model generates the best
estimate of a constant growth rate for a time series variable.
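The backward recursion can be sketched in the same way as the forward one. The example assumes the 1996 value and the 3.23 percent estimate quoted in the text; `backward_path` is a hypothetical helper name.

```python
def backward_path(y_last, r, periods):
    """Work backward from the latest value, dividing by (1 + r) each period."""
    path = [y_last]
    for _ in range(periods):
        path.append(path[-1] / (1.0 + r))
    path.reverse()              # oldest value first
    return path


r = 0.0323                              # log-difference growth estimate
path = backward_path(6906.8, r, 67)     # path[0] is 1929, path[-1] is 1996

print(f"1929 predicted: {path[0]:.1f}")
print(f"1995 predicted: {path[-2]:.1f}")
```

Because this path is anchored at the most recent observation rather than the first, its 1929 value generally differs from the actual 1929 value.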
2.4. Trend Model 4: The Polynomial Model
The geometric, exponential, and log difference models each generate predicted paths for a time series
variable that follow trends associated with a constant growth rate. The geometric model is useful
because it allows one to get a “ball park” estimate of the growth rate while using only three numbers: the number of periods, the beginning value of the series, and the ending value of the series. The
exponential model is useful because it provides a growth rate estimate associated with the best fit of the
historical data. (This best-fit path is useful in examining economic fluctuations, as will be shown below.)
Finally, the log difference model generates the best estimate of a constant growth rate for forecasting
purposes.
The constant growth rate assumption is useful in that it characterizes the performance of the economy
using a single number: the average annual growth rate. However, for many time series variables, it is
obvious that constant growth is not exhibited, even on average. The polynomial model, described here, is
similar to the exponential model in that it generates a path that best fits the historical data. However, it
differs from the exponential model in that the growth rate is allowed to vary.
Like the exponential model, the polynomial model involves a least squares regression. The following
estimation equation is what gives the model its name:
\[ \hat{y}_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 \tag{11} \]
Considering equation (11), we can think of ŷt as a function of t. More precisely, ŷt is said to be a
polynomial function of t because of the increasing powers of t observed in the equation. Operationally,
the regression is performed by first creating the variable t (as described above for the exponential model).
The variables t² and t³ are obtained by respectively squaring and cubing the variable t. Then, the
actual series yt is regressed on a constant, t, t², and t³. The regression yields the estimates for a0, a1, a2,
and a3, and the predicted path ŷt can be constructed using equation (11).
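The cubic regression can be sketched in plain Python by building the least squares normal equations and solving them with Gaussian elimination. This is one simple way to do it under stated assumptions, not the handout's Excel procedure: `fit_polynomial` is a hypothetical helper, and the data are generated from a made-up cubic so that the fit can be checked against known coefficients.

```python
def fit_polynomial(t_values, y_values, degree=3):
    """Least squares fit of y = a0 + a1*t + ... + a_degree * t**degree."""
    k = degree + 1
    # Normal equations: (X'X) a = X'y, where column i of X is t**i.
    xtx = [[sum(t ** (i + j) for t in t_values) for j in range(k)] for i in range(k)]
    xty = [sum((t ** i) * y for t, y in zip(t_values, y_values)) for i in range(k)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * k
    for i in reversed(range(k)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = (aug[i][k] - known) / aug[i][i]
    return coeffs


def predict(coeffs, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(a * t ** i for i, a in enumerate(coeffs))


# Hypothetical data (not real GDP) generated from a known cubic.
t_values = list(range(11))
y_values = [600.0 + 20.0 * t + 1.5 * t ** 2 - 0.02 * t ** 3 for t in t_values]

a0, a1, a2, a3 = fit_polynomial(t_values, y_values)
fitted = [predict([a0, a1, a2, a3], t) for t in t_values]
print(f"a0={a0:.3f} a1={a1:.3f} a2={a2:.3f} a3={a3:.4f}")
```

With a long sample such as t = 0 to 67, the powers of t become very large and the normal equations can be poorly conditioned, so a library routine or a rescaled time variable would be a safer choice in practice.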
Figure 4 presents the actual real GDP data along with the predicted path generated by the polynomial
model. Comparing the polynomial model to the exponential model, notice that the polynomial path in
Figure 4 fits the data better than the exponential path in Figure 3. This is because the growth rate is
allowed to vary along the polynomial path, while the growth rate is restricted to some fixed value along
the exponential path.
[Figure 4: US Gross Domestic Product (1929-96), actual versus polynomial trend.]
Figure 5 displays the growth rate path generated by the polynomial and exponential models, along with
the actual real GDP growth rate. The polynomial growth rate path is obtained by calculating the growth
rate for each year as the percentage change in the predicted value. For example, the growth rate shown
for 1930 is the percentage change in the predicted series from 1929 to 1930: (642.1-614.09)/614.0=0.046=4.6%. Alternatively, the growth rate shown for 1996 is the percentage change in
the predicted series from 1995 to 1996: (6,953.5-6,791.5)/6,791.6=0.024=2.4%. The actual growth
rates are calculated in an analogous manner using the actual real GDP data series. The exponential
growth rate remains constant at the estimated 3.65%.
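The growth rate calculation is the same for any path, predicted or actual. A minimal sketch, using approximately the polynomial-trend values quoted above (rounded to one decimal place); `growth_rates` is an illustrative name.

```python
def growth_rates(path):
    """Percentage change from each value to the next, as a decimal fraction."""
    return [(b - a) / a for a, b in zip(path, path[1:])]


# Approximate predicted values quoted in the text, rounded to one decimal.
g_1930 = growth_rates([614.0, 642.1])[0]      # 1929 to 1930
g_1996 = growth_rates([6791.5, 6953.5])[0]    # 1995 to 1996

print(f"1930: {g_1930:.3f}, 1996: {g_1996:.3f}")
```

Passing the full predicted series to `growth_rates` would return the whole growth rate path in one call, with one fewer element than the series itself.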
Notice that the growth rate generated by the polynomial model first increases, peaking during the late
1930’s, and then exhibits a decreasing trend. Significantly, this indicates that the rate of economic
growth in the US is decreasing. The exponential model offers a different interpretation of US economic
history. The exponential model indicates that US economic growth is not slowing; the weaker
performance more recently is viewed as a temporary experience that is below the trend. (This below
trend experience can be seen in Figure 3.)
[Figure 5: US Gross Domestic Product growth rate (1930-96), actual versus exponential model and polynomial model estimates.]
Future living standards greatly depend upon which model is closer to the truth, the polynomial or the
exponential. A decline in the rate of economic growth indicates lower living standards than would
otherwise be experienced. Without additional theorizing, there is no way to choose which model more
closely represents reality. Hopefully, by the time you finish this book, you will have some additional
tools that will enable you to thoughtfully comment on which of these two models you believe more
closely represents what is happening to the US economy.
3. Production Fluctuations
Examining Figures 3 and 4, note that the economy is rarely following the predicted trend precisely. The
economy is normally either above trend or below trend. The term detrending is used to describe a
process that eliminates the trend from the data so that only the deviations from trend remain. Typically,
detrending simply involves subtracting the predicted trend value from the actual data value for each given
year. In the next two sections, we examine two detrended series: the exponential and the polynomial.
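Detrending by subtraction takes only a few lines. The numbers below are hypothetical and chosen for readability; they are not taken from the handout's data, and `detrend` is an illustrative helper name.

```python
def detrend(actual, trend):
    """Deviation from trend: actual value minus predicted trend value, year by year."""
    if len(actual) != len(trend):
        raise ValueError("actual and trend must cover the same years")
    return [a - t for a, t in zip(actual, trend)]


# Hypothetical values for four consecutive years.
years = [1, 2, 3, 4]
actual = [6100.0, 6050.0, 6200.0, 6400.0]
trend = [6000.0, 6150.0, 6300.0, 6450.0]

deviations = detrend(actual, trend)
for year, dev in zip(years, deviations):
    status = "above" if dev > 0 else "below"
    print(f"year {year}: {dev:+.1f} ({status} trend)")
```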
3.1. Detrending Real GDP Using the Exponential Model
Figure 6 presents a detrended real GDP series for the US obtained from the exponential model presented above.
Examining a detrended series like that shown allows us to consider what has been called the business
cycle. The term business cycle is used because the detrended series cycles back and forth from being
above trend to being below trend. The detrended series obtained from the exponential model indicates
that the economy has experienced only one full cycle since 1929. This cycle began when the economy
was about on trend in 1930 and ended when the economy was about on trend in 1981. The trough of the
cycle, or the below trend period, started in 1930 and ended in 1941, roughly the period of the Great
Depression. The peak of the cycle, or the above trend period, started in 1941 and ended in 1981. Since
1981, the model indicates that the economy has been in another trough. The deviation below trend is
very strong, and there is no indication that the bottom of the trough has been reached. This is one reason
why we might want to question the assumption that the economy is following a constant growth rate
path.
[Figure 6: Detrended real Gross Domestic Product for the US (1929-96), residuals from the exponential model.]
3.2. Detrending Real GDP Using the Polynomial Model
Figure 7 presents a detrended real GDP series obtained from the polynomial model presented above.
Notice that by allowing the growth rate to vary, we obtain a detrended series that contains many more
business cycles than we obtain under the assumption of constant growth. As characterized in Table 1,
there are seven complete cycles shown in Figure 7.
[Figure 7: Detrended real Gross Domestic Product for the US (1929-96), residuals from the polynomial model.]
A cycle can be characterized by its amplitude and duration. The amplitude is the magnitude of the
deviation from trend, while the duration is the length of the cycle. Table 1 presents the seven complete
cycles generated by the polynomial model. The duration of each is shown. A rough characterization of
the amplitude of the trough and peak for each cycle is also shown. Cycles are regular if their duration
times are equal. Note that the seven cycles presented in Table 1 are irregular. This irregularity makes
predicting economic performance more difficult than it would be if the cycles were regular.
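One simple way to read troughs and peaks off a detrended series is to group consecutive years by the sign of the deviation. The sketch below is an illustration under that assumption, with made-up deviations; it is not necessarily how Table 1 was constructed, and `phases` is a hypothetical helper name.

```python
from itertools import groupby


def phases(years, deviations):
    """Group consecutive years into 'below' (trough) and 'above' (peak) runs."""
    labeled = [(year, "above" if dev >= 0 else "below")
               for year, dev in zip(years, deviations)]
    result = []
    for label, group in groupby(labeled, key=lambda item: item[1]):
        run = [year for year, _ in group]
        result.append((label, run[0], run[-1]))
    return result


# Hypothetical detrended values for eight consecutive years.
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008]
deviations = [-1.0, -2.0, 1.0, 3.0, 2.0, -1.0, -1.0, 2.0]

runs = phases(years, deviations)
for label, start, end in runs:
    # Duration counts the calendar years in the run; amplitude is the largest deviation.
    size = max(abs(d) for y, d in zip(years, deviations) if start <= y <= end)
    print(f"{label:5s} {start}-{end}  years: {end - start + 1}  amplitude: {size:.1f}")
```

A trough run followed by a peak run makes up one cycle, so the duration of a cycle is the combined length of the two runs.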
Table 1: Seven Business Cycles

Cycle                 Trough                 Peak
Period    Duration    Period    Amplitude    Period    Amplitude
1931-48   17 years    1931-41   Large        1941-48   Large
1948-53   5 years     1948-50   Small        1950-53   Small
1953-55   2 years     1953-55   Small        1955      Small
1955-70   15 years    1955-65   Large        1965-70   Medium
1970-74   4 years     1970-71   Small        1972-74   Medium
1974-80   6 years     1974-75   Medium       1976-80   Medium
1980-91   11 years    1980-84   Large        1985-91   Large
Each of the troughs shown in Table 1 contains a recession or near recession. Here, we will define a
recession as a time period over which output decreases. Using this definition, the US has experienced
recessions in the following periods: 1930-33, 1938, 1949, 1954, 1958, 1970, 1974-75, 1980, 1982, and
1991. There is some tendency for larger troughs to be followed by larger peaks and some tendency for
longer troughs to be followed by longer peaks. These observations indicate that the US economy tends
to bounce back to health after experiencing weakness.
To summarize, we know that the output of the US economy follows an increasing trend, but fluctuates
around this trend in an irregular manner. To explain why the economy grows and to explain why the
economy fluctuates, economists tend to examine the structure of the economy. In terms of modeling this
structure, specifying a production function is a typical first step.