Download Handout: Detrending

Time Series Modeling and Detrending

1. Using a Time Series to Observe Changes in Production

A time series is a set of data points collected on one variable over time. By collecting and analyzing time series data on gross domestic product (GDP), we can identify trends and fluctuations in the economy’s output level. Figure 1 presents the 68 annual GDP data points for the US for the years 1929-96.

[Figure 1: US Gross Domestic Product (1929-96), in billions of 1992 dollars, plotted by year.]

This time series of US GDP clearly exhibits an increasing trend over time. However, fluctuations, or deviations from the trend, are also apparent in that the series occasionally decreases. Statistical methods exist that can be used to “decompose” the time series into two components: (1) trend and (2) deviation from trend; i.e., fluctuation.

2. Identifying and Characterizing Production Trends

In general, a time series model for a particular economic variable is a model that predicts the level of the variable for the current period t based upon (1) the level of the variable in previous periods, (2) the levels of other variables in the current and previous periods, and (3) the time period t itself. Here, we will consider some simple but important examples, illustrating how a time series model can be used to identify and characterize a trend in production.

2.1. Trend Model 1: The Geometric Model

One simple way to characterize a trend is by an average growth rate. To correctly calculate an average growth rate, the effect of “compounding” must be recognized. A familiar example of compounding occurs when money deposited in a bank accrues interest. For example, suppose $100 is deposited in a bank that pays 10 percent interest on deposits annually. After one year, the original $100 would still be in the bank, plus interest equal to 10 percent of the original $100.
Thus, at the end of one year, the bank balance would be given by

\[ 100 + [100][0.10] = 100[1 + 0.10] = 110. \]

During the second year, the depositor not only earns interest on the original $100 deposit, but interest is also earned on the interest accrued during the first year. That is, interest during the second year is earned on the $110, the bank balance at the beginning of the second year. Thus, at the end of the second year, the bank balance would be given by

\[ 121.00 = 110[1 + 0.10] = 100[1 + 0.10][1 + 0.10] = 100[1 + 0.10]^{2}. \]

Using the same logic again, at the end of the third year the bank balance would be

\[ 133.10 = 100[1 + 0.10]^{3}. \]

The $133.10 balance is generated by an initial $100 deposit which grows at an average annual rate of 10 percent over 3 years. To obtain a general formula from the equation immediately above, replace the initial $100 bank balance with the variable PV; replace the $133.10 ending balance with the variable FV; replace the 0.10 decimal interest rate with the variable r; and replace the 3 years of compounding with the variable n. Making these replacements, the formula becomes

\[ FV = PV\,[1 + r]^{n}. \tag{1} \]

Consistent with the notation used in finance, PV stands for “present value,” while FV stands for “future value.” In general, this formula gives the relationship between a present value amount and a future value amount, where the present value amount is compounded at rate r for n periods. In this framework, the rate r is the average rate at which the initial PV amount grows into the final FV amount. Solving equation (1) for r, one obtains

\[ r = \left[\frac{FV}{PV}\right]^{1/n} - 1. \tag{2} \]

In Figure 1, the 1929 real GDP value is 790.9 and can be thought of as the initial PV amount. The 1996 real GDP value is 6,906.8 and can be thought of as the final FV amount. There are 67 years between 1929 and 1996, meaning n is equal to 67.
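The compounding arithmetic and equations (1) and (2) can be checked with a few lines of Python. This is only a minimal sketch using the bank-deposit example; the function names `future_value` and `average_growth_rate` are illustrative and do not come from the handout.

```python
def future_value(pv, r, n):
    """Equation (1): FV = PV * [1 + r]**n."""
    return pv * (1.0 + r) ** n


def average_growth_rate(pv, fv, n):
    """Equation (2): r = [FV / PV]**(1/n) - 1."""
    return (fv / pv) ** (1.0 / n) - 1.0


# Bank balance after 0, 1, 2 and 3 years at 10 percent annual interest.
balances = [round(future_value(100.0, 0.10, n), 2) for n in range(4)]
print(balances)  # [100.0, 110.0, 121.0, 133.1]

# Recover the 10 percent rate from the starting and ending balances.
implied_rate = average_growth_rate(100.0, 133.10, 3)
print(round(implied_rate, 4))  # 0.1
```

Running equation (2) on the ending balance recovers the interest rate that equation (1) started from, which is the sense in which r is an average compound growth rate.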
Substituting these values (PV = 790.9, FV = 6,906.8, n = 67) into equation (2), we obtain the average annual growth rate for real GDP over the 1929-96 period:

\[ r = \left[\frac{6{,}906.8}{790.9}\right]^{1/67} - 1 = 0.0329 = 3.29\%. \]

As opposed to being an arithmetic mean of the annual real GDP growth rates, our 3.29 percent value is a geometric mean, a mean that takes compounding into account.

We can use this average growth rate to develop a time series model of real GDP growth, a geometric model. Let \(y_t\) denote the actual real GDP value for period t, and let \(\hat{y}_t\) denote the value of real GDP predicted by the model for period t. Using this notation, the geometric model can be stated as

\[ \hat{y}_{t+1} = [1 + r]\,\hat{y}_t, \tag{3} \]

where \(\hat{y}_t = y_t\) in the initial time period \(t = t_0\). In our example, \(t_0\) is 1929 so that \(\hat{y}_{1929} = y_{1929} = 790.9\). Given the 1929 value of real GDP and r = 0.0329, the 1930 value predicted by the model is 816.9 = [1 + 0.0329]790.9. The 1930 value is then used to obtain the 1931 predicted value: 843.8 = [1 + 0.0329]816.9. Continuing this process, the entire time series predicted by the model can be generated. If our geometric model has been calculated correctly, then the 1996 value predicted by the model will be identical to the actual value (apart from rounding in r); i.e., \(\hat{y}_{1996} = y_{1996} = 6{,}906.8\).

The geometric model is a simple example of a time series model in that the current predicted value for GDP depends only upon the predicted value for GDP in the previous period and the average annual growth rate r. The trend path predicted by the geometric model is shown in Figure 2, along with the actual real GDP data. Notice that the predicted path for real GDP begins and ends at the same place as the actual path. The predicted path is smooth because its annual growth rate is constant, at 3.29 percent. The actual path is not smooth because the actual annual growth rate fluctuates from year to year. The vertical gap between the actual path and the predicted path is the deviation from trend indicated by the geometric model.
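The recursion in equation (3) can be sketched in Python as follows. The function name `geometric_trend` is hypothetical; the only data used are the two endpoint values and the number of periods quoted above.

```python
def geometric_trend(y_start, y_end, n_periods):
    """Return the average growth rate r and the path implied by equation (3)."""
    r = (y_end / y_start) ** (1.0 / n_periods) - 1.0  # equation (2)
    path = [y_start]                                  # y-hat equals y in the initial period
    for _ in range(n_periods):
        path.append((1.0 + r) * path[-1])             # equation (3)
    return r, path


# Endpoints from the text: 1929 real GDP = 790.9, 1996 real GDP = 6,906.8.
r, path = geometric_trend(790.9, 6906.8, 67)
print(f"average growth rate: {r:.4f}")
print(f"predicted 1930: {path[1]:.1f}, predicted 1996: {path[-1]:.1f}")
```

Because r is computed from the first and last observations, the generated path is pinned to the data at both ends, which is also why the model is so sensitive to the final data point.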
[Figure 2: US Gross Domestic Product (1929-96), actual versus geometric trend, in billions of 1992 dollars.]

A significant problem with the geometric model is that the predicted path is highly sensitive to a change in the final data point. For example, if the series were extended to 1997 and 1997 were an unusually strong growth year, the entire predicted path would shift up. Conversely, a weak growth year would shift the predicted path down. This is a problem because an unusually strong or an unusually weak performance may be transitory; i.e., temporary. A trend path is supposed to capture the more permanent movements in the variable. Thus, in a better model, transitory changes would not affect the predicted trend path to the extent that they can in the geometric model. Such a model is presented in the next section.

2.2. Trend Model 2: The Exponential Model

Like the geometric model, the exponential model generates a path with a constant growth rate. Letting r represent this constant rate and letting \(\hat{y}_t\) denote the predicted value of the variable that is growing, the exponential model can be presented as

\[ \hat{y}_t = A e^{rt}. \tag{4} \]

Here the time variable t takes on consecutive whole number values, beginning with 0 for the initial time period \(t = t_0\) and increasing through the last year for which data are available. For example, with data available from 1929 to 1996, the variable t would run from 0 to 67. The number e is a special number in mathematics, approximately equal to 2.718. When t = 0 is substituted into equation (4), we find that \(\hat{y}_0 = A\) must hold. Thus, by choosing the initial t level to be zero, we get a nice interpretation of the variable A. The variable A is the predicted value for real GDP in the initial period 0; i.e., the predicted 1929 real GDP level.
Using the least squares regression technique, one can obtain an estimate of the constant trend growth rate r. Least squares is a “linear regression” method. Therefore, to use least squares, we must first “linearize” the model. This can be accomplished by taking the natural log of both sides of equation (4). Doing so, we obtain

\[ \ln(\hat{y}_t) = \ln(A) + rt. \tag{5} \]

Notice that this “logarithmic transformation” moved the variable r from a position of being an exponent on the number e in equation (4) to being a coefficient on the variable t in equation (5). In equation (4), the fact that r is an exponent implies that there is a non-linear relationship between r and \(\hat{y}_t\); i.e., you would not get a straight line if you plotted the relationship between \(\hat{y}_t\) and r. However, in equation (5), the fact that r is a coefficient implies that there is a linear relationship between r and \(\ln(\hat{y}_t)\); i.e., you would get a straight line if you plotted the relationship between \(\ln(\hat{y}_t)\) and r. Because of this linear relationship, we can obtain a least squares estimate for r by regressing \(\ln(y_t)\) on t. Using Excel to regress \(\ln(y_t)\) on t, we obtain the following estimated equation:

\[ \ln(\hat{y}_t) = \underset{(217.97)}{6.56} + \underset{(47.04)}{0.0365}\,t, \qquad R^{2} = 0.97. \tag{6} \]

Notice that the regression process takes equation (5) and gives us estimates of ln(A) and r. Our estimate for ln(A) is 6.56 and our estimate for r is 0.0365. The number in parentheses underneath each estimate is the estimate’s associated t-statistic. A t-statistic is a test statistic associated with the hypothesis that the associated estimate is equal to zero. A crude but commonly used rule of thumb is that a t-statistic greater than 2 (in absolute value) indicates that the estimate is significantly different from zero, while a t-statistic less than 2 indicates that the estimate is not significantly different from zero. Using this rule of thumb, the estimates for ln(A) and r would be judged significantly different from zero.
(More formally, we reject the hypothesis that ln(A) = 0 and the hypothesis that r = 0.)

The meaning of the term “significance” here has to do with the extent to which the independent variable in question helps explain the dependent variable. (Here, \(\ln(y_t)\) is the dependent variable and t is the independent variable.) When a t-statistic is close to zero, the indication is that the variable in question could be left out of the equation and the remaining variables could explain the dependent variable nearly as well. The \(R^2\) for the regression, which is always a number between zero and one, gives the fraction of the variance in the dependent variable that is explained by the model being tested. An \(R^2\) of 1.0 would indicate that the model precisely fits the actual data, meaning there are no deviations from the model’s trend. Here, the \(R^2\) is 0.97, or 97 percent. The high t-statistics and high \(R^2\) indicate that the model does a reasonably good job of capturing the path followed by real GDP.

To use our regression results to obtain a model for the original GDP time series \(y_t\), we must first derive an estimate of A from our estimate for ln(A). This is accomplished by recognizing that the exponential function and natural log function are inverse functions. Letting exp(x) denote the exponential function of x and letting ln(x) denote the natural log function of x, the fact that these two are inverse functions implies exp(ln(x)) = x and ln(exp(x)) = x. We can apply this knowledge by “taking the exponential of ln(A).” Doing so, we would write exp(ln(A)) = A. In words, we get an estimate for A from our estimate of ln(A) by taking the exponential. Because our estimate for ln(A) is 6.56, our estimate for A is 706.7. (Evaluating exp(6.56) with the rounded coefficient gives about 706.3; the 706.7 figure presumably reflects the unrounded estimate.) Using our estimates for A and r, we can now re-write equation (4) as

\[ \hat{y}_t = 706.7\,e^{0.0365t}. \tag{7} \]

We obtain our predicted path for real GDP by substituting in the values for t, ranging from 0 to 67.
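The regression and back-transformation steps can be sketched in plain Python. The handout uses Excel; the code below is a substitute illustration, not the author's procedure. The helper names `ols_line` and `fit_exponential_trend` are hypothetical, and the series is synthetic (it grows at exactly 3 percent per period), since the GDP data are not reproduced in this handout.

```python
import math


def ols_line(x, y):
    """Least squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_exponential_trend(y):
    """Regress ln(y_t) on t = 0, 1, 2, ... as in equation (5), then undo the log."""
    t = list(range(len(y)))
    ln_a, r = ols_line(t, [math.log(v) for v in y])
    a = math.exp(ln_a)                           # exp(ln(A)) = A
    trend = [a * math.exp(r * ti) for ti in t]   # equation (4)
    return a, r, trend


# Synthetic series (an assumption for illustration): A = 100, r = 0.03.
series = [100.0 * math.exp(0.03 * t) for t in range(10)]
a_hat, r_hat, trend = fit_exponential_trend(series)
print(f"A = {a_hat:.2f}, r = {r_hat:.4f}")
```

With real data the fitted line will not pass through every point, and the gaps between `series` and `trend` are the deviations from trend.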
For example, \(\hat{y}_1 = 706.7e^{0.0365(1)} = 733.0\) and \(\hat{y}_{67} = 706.7e^{0.0365(67)} = 8177.1\) (the reported values appear to be based on the unrounded coefficient estimates). Figure 3 presents the actual real GDP data along with the predicted path generated by the exponential model. Comparing the exponential model to the geometric model, notice that the geometric model under-predicts the real GDP level much more often than it over-predicts. The least squares estimation method yields the parameter values that “best fit” the data for the given model, meaning that under-prediction is balanced by over-prediction (here, in terms of the logged series). Thus, the exponential model provides the best fit of the time series data, under the assumption that the series is growing at a constant rate. (It will be shown below that a better fit can be obtained by allowing for a variable growth rate.)

[Figure 3: US Gross Domestic Product (1929-96), actual versus exponential trend, in billions of 1992 dollars.]

2.3. Trend Model 3: The Log-Difference Model

The log-difference model provides a third way to obtain a constant growth rate estimate for a time series. It is based upon the mathematical fact that the derivative of ln(x) with respect to x is 1/x. Let’s use this fact to derive the log-difference model. Let \(y_t\) again denote the value of real GDP during period t, so that \(\ln(y_t)\) is the natural log of this value. To examine how \(\ln(y_t)\) changes over a continuous time interval, take the derivative of \(\ln(y_t)\) with respect to t. Doing so, the logarithm rule and chain rule imply

\[ \frac{d\ln(y_t)}{dt} = \frac{1}{y_t}\,\frac{dy_t}{dt}. \tag{8} \]

The “d” in the derivative can be thought of as the word “change.” On the left side of equation (8) we are examining the change in \(\ln(y_t)\) divided by the change in t. If we think in terms of periods (discrete time) rather than in terms of continuous time, the change in t from one period to the next is equal to one, or one period. That is, dt = 1.
We are examining a change from the current period t to the next period t+1. During this one period, the logged series changes from \(\ln(y_t)\) to \(\ln(y_{t+1})\), meaning \(d[\ln(y_t)] = \ln(y_{t+1}) - \ln(y_t)\). Similarly, the change in \(y_t\) is given by \(dy_t = y_{t+1} - y_t\). Substituting \(\ln(y_{t+1}) - \ln(y_t)\) for \(d[\ln(y_t)]\), substituting \(y_{t+1} - y_t\) for \(dy_t\), and setting dt = 1 in equation (8), we obtain

\[ \ln(y_{t+1}) - \ln(y_t) \approx \frac{y_{t+1} - y_t}{y_t}. \tag{9} \]

Notice that on the right side of equation (9) we have the percentage change in the variable y from period t to period t+1. On the left side, we have the log difference. Thus, what we have shown is that one way to calculate the growth rate from one period to the next is by taking the log difference; the approximation is close when the growth rate is small.

The growth rate of the real GDP series is shown in Figure 4 of your stats-book (and in Figure 3.5 below). Notice that, unlike the real GDP series itself, the real GDP growth rate series does not follow a definite trend. A time series variable that does not follow a trend but rather fluctuates around some mean value (without much change in the amplitude of the fluctuation) is called a stationary variable. Statistically, the stationarity property is nice because it indicates that the variable tends toward its mean value, implying that the mean value provides a good estimate of where the value of the variable will tend to be. For us, the fact that the log difference of real GDP looks stationary implies that we can obtain an estimate of the real GDP growth rate by simply taking the arithmetic average of the log difference numbers present in the log difference time series. Using Excel to calculate the logs, the log differences, and the average of the log differences for the data plotted in Figure 3.1, an average annual growth rate of r = 0.0323, or 3.23 percent, was obtained.
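The log-difference calculation is short enough to sketch directly. The helper names below are illustrative, and the `levels` series is made up for the example; it is not the GDP data. Note that consecutive log differences telescope, so their arithmetic average always equals the log of the last value minus the log of the first value, divided by the number of periods.

```python
import math


def log_differences(y):
    """ln(y[t+1]) - ln(y[t]) for each consecutive pair: left side of equation (9)."""
    return [math.log(b) - math.log(a) for a, b in zip(y, y[1:])]


def percent_changes(y):
    """(y[t+1] - y[t]) / y[t] for each consecutive pair: right side of equation (9)."""
    return [(b - a) / a for a, b in zip(y, y[1:])]


# Made-up output levels, for illustration only.
levels = [100.0, 103.0, 101.0, 106.0, 110.0]
log_diffs = log_differences(levels)
pct = percent_changes(levels)

# Average growth rate estimate: arithmetic mean of the log differences.
r_hat = sum(log_diffs) / len(log_diffs)

for d, p in zip(log_diffs, pct):
    print(f"log difference {d:+.4f}   percentage change {p:+.4f}")
print(f"average log difference: {r_hat:.4f}")
```

The two columns printed by the loop agree closely for small changes, which is the content of equation (9).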
Because the log difference series appears stationary, the 3.23 percent growth estimate is thought to be a better estimate of the average annual growth rate for the economy’s output than the 3.29 percent and 3.65 percent estimates obtained from the geometric and exponential models. That is, for use in forecasting the 1997 rate of growth, the estimate of 3.23 percent is the best estimate we have under the assumption that the long run growth rate of the economy is constant. To forecast the 1997 real GDP level, we would apply the estimated growth rate to the 1996 real GDP level. That is, the log-difference model forecasts a 1997 real GDP level of 7,129.9 = 6,906.8[1 + 0.0323].

To obtain a predicted historical path for the economy, we use a method similar to that used for the geometric model. The difference is that we work from the present back into the past, rather than from the past to the present. The formula used to generate the predicted path for the geometric model is given as equation (3). To work backwards, we need to solve for \(\hat{y}_t\) in terms of \(\hat{y}_{t+1}\). Solving equation (3) for \(\hat{y}_t\), we obtain

\[ \hat{y}_t = \frac{\hat{y}_{t+1}}{[1 + r]}. \]

Taking this equation and moving the time subscripts back one period gives us a relationship between last period’s value and this period’s value:

\[ \hat{y}_{t-1} = \frac{\hat{y}_t}{[1 + r]}. \tag{10} \]

When equation (10) is used to generate a predicted path for real GDP, one obtains a path very similar to that obtained from the geometric model. For that reason, the log-difference path will not be plotted here. However, it should be kept in mind that the path generated by the log difference model need not always be similar to the geometric model. In general, the log difference model generates the best estimate of a constant growth rate for a time series variable.

2.4. Trend Model 4: The Polynomial Model

The geometric, exponential, and log difference models each generate predicted paths for a time series variable that follow trends associated with a constant growth rate.
The geometric model is useful because it allows one to get a “ball park” estimate of the growth rate while using only three numbers: the number of periods, the beginning value of the series, and the ending value of the series. The exponential model is useful because it provides a growth rate estimate associated with the best fit of the historical data. (This best-fit path is useful in examining economic fluctuations, as will be shown below.) Finally, the log difference model generates the best estimate of a constant growth rate for forecasting purposes.

The constant growth rate assumption is useful in that it characterizes the performance of the economy using a single number, the average annual growth rate. However, for many time series variables, it is obvious that constant growth is not exhibited, even on average. The polynomial model, described here, is similar to the exponential model in that it generates a path that best fits the historical data. However, it differs from the exponential model in that the growth rate is allowed to vary. Like the exponential model, the polynomial model involves a least squares regression. The following estimation equation is what gives the model its name:

\[ \hat{y}_t = a_0 + a_1 t + a_2 t^{2} + a_3 t^{3}. \tag{11} \]

Considering equation (11), we can think of \(\hat{y}_t\) as a function of t. More precisely, \(\hat{y}_t\) is said to be a polynomial function of t because of the increasing powers of t observed in the equation. Operationally, the regression is performed by first creating the variable t (as described above for the exponential model). The variables \(t^2\) and \(t^3\) are obtained by respectively squaring and cubing the variable t. Then, the actual series \(y_t\) is regressed on a constant, t, \(t^2\), and \(t^3\). The regression yields the estimates for \(a_0\), \(a_1\), \(a_2\), and \(a_3\), and the predicted path \(\hat{y}_t\) can be constructed using equation (11). Figure 4 presents the actual real GDP data along with the predicted path generated by the polynomial model.
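A cubic trend regression of this kind can be sketched without any statistics package by solving the least squares normal equations directly. This is an illustrative substitute for the Excel regression, under stated assumptions: the function names are hypothetical, and the test series is synthetic (an exact cubic), so the fit should recover its coefficients. For long series, the powers of t become large and a dedicated regression routine is numerically safer than this approach.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_polynomial_trend(y, degree=3):
    """Regress y_t on 1, t, ..., t**degree (equation (11) when degree is 3)."""
    t = list(range(len(y)))
    powers = range(degree + 1)
    # Normal equations (X'X) a = X'y for the regressors 1, t, t^2, ...
    xtx = [[sum(ti ** (i + j) for ti in t) for j in powers] for i in powers]
    xty = [sum((ti ** i) * yi for ti, yi in zip(t, y)) for i in powers]
    coeffs = solve_linear_system(xtx, xty)
    trend = [sum(c * ti ** k for k, c in enumerate(coeffs)) for ti in t]
    return coeffs, trend


# Synthetic series (an assumption for illustration): y = 5 + 3t + 2t^2 + t^3.
cubic_series = [5 + 3 * t + 2 * t ** 2 + t ** 3 for t in range(10)]
coeffs, trend = fit_polynomial_trend(cubic_series)
print([round(c, 6) for c in coeffs])
```

The returned `coeffs` list holds the estimates of a_0 through a_3 in that order, and `trend` is the predicted path built from equation (11).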
Comparing the polynomial model to the exponential model, notice that the polynomial path in Figure 4 fits the data better than the exponential path in Figure 3. This is because the growth rate is allowed to vary along the polynomial path, while the growth rate is restricted to some fixed value along the exponential path.

[Figure 4: US Gross Domestic Product (1929-96), actual versus polynomial trend.]

Figure 5 displays the growth rate path generated by the polynomial and exponential models, along with the actual real GDP growth rate. The polynomial growth rate path is obtained by calculating the growth rate for each year as the percentage change in the predicted value. For example, the growth rate shown for 1930 is the percentage change in the predicted series from 1929 to 1930: (642.1 - 614.0)/614.0 = 0.046 = 4.6%. Similarly, the growth rate shown for 1996 is the percentage change in the predicted series from 1995 to 1996: (6,953.5 - 6,791.5)/6,791.6 = 0.024 = 2.4%. The actual growth rates are calculated in an analogous manner using the actual real GDP data series. The exponential growth rate remains constant at the estimated 3.65%.

Notice that the growth rate generated by the polynomial model first increases, peaking during the late 1930s, and then exhibits a decreasing trend. Significantly, this indicates that the rate of economic growth in the US is decreasing. The exponential model offers a different interpretation of US economic history. The exponential model indicates that US economic growth is not slowing; the weaker performance more recently is viewed as a temporary experience that is below the trend. (This below-trend experience can be seen in Figure 3.)

[Figure 5: US Gross Domestic Product growth rate (1930-96), actual versus exponential model and polynomial model estimates.]

Future living standards greatly depend upon which model is closer to the truth: the polynomial or the exponential.
A decline in the rate of economic growth indicates lower living standards than would otherwise be experienced. Without additional theorizing, there is no way to choose which model more closely represents reality. Hopefully, by the time you finish this book, you will have some additional tools that will enable you to thoughtfully comment on which of these two models you believe more closely represents what is happening to the US economy.

3. Production Fluctuations

Examining Figures 3 and 4, note that the economy rarely follows the predicted trend precisely. The economy is normally either above trend or below trend. The term detrending is used to describe a process that eliminates the trend from the data so that only the deviations from trend remain. Typically, detrending simply involves subtracting the predicted trend value from the actual data value for each given year. In the next two sections, we examine two detrended series: the exponential and the polynomial.

3.1. Detrending Real GDP Using the Exponential Model

Figure 6 presents a detrended US real GDP series obtained from the exponential model presented above. Examining a detrended series like that shown allows us to consider what has been called the business cycle. The term business cycle is used because the detrended series cycles back and forth from being above trend to being below trend. The detrended series obtained from the exponential model indicates that the economy has experienced only one full cycle since 1929. This cycle began when the economy was about on trend in 1930 and ended when the economy was about on trend in 1981. The trough of the cycle, or the below-trend period, started in 1930 and ended in 1941, roughly the period of the Great Depression. The peak of the cycle, or the above-trend period, started in 1941 and ended in 1981. Since 1981, the model indicates that the economy has been in another trough.
The deviation below trend is very strong, and there is no indication that the bottom of the trough has been reached. This is one reason why we might want to question the assumption that the economy is following a constant growth rate path.

[Figure 6: Detrended Real Gross Domestic Product for the US (1929-96), residuals from the exponential model.]

3.2. Detrending Real GDP Using the Polynomial Model

Figure 7 presents a detrended real GDP series obtained from the polynomial model presented above. Notice that by allowing the growth rate to vary, we obtain a detrended series that contains many more business cycles than we obtain under the assumption of constant growth. As characterized in Table 1, there are seven complete cycles shown in Figure 7.

[Figure 7: Detrended Real Gross Domestic Product for the US (1929-96), residuals from the polynomial model.]

A cycle can be characterized by its amplitude and duration. The amplitude is the magnitude of the deviation from trend, while the duration is the length of the cycle. Table 1 presents the seven complete cycles generated by the polynomial model. The duration of each is shown. A rough characterization of the amplitude of the trough and peak for each cycle is also shown. Cycles are regular if their duration times are equal. Note that the seven cycles presented in Table 1 are irregular. This irregularity makes predicting economic performance more difficult than it would be if the cycles were regular.

Table 1: Seven Business Cycles

Cycle                Trough                Peak
Period    Duration   Period    Amplitude   Period    Amplitude
1931-48   17 years   1931-41   Large       1941-48   Large
1948-53    5 years   1948-50   Small       1950-53   Small
1953-55    2 years   1953-55   Small       1955      Small
1955-70   15 years   1955-65   Large       1965-70   Medium
1970-74    4 years   1970-71   Small       1972-74   Medium
1974-80    6 years   1974-75   Medium      1976-80   Medium
1980-91   11 years   1980-84   Large       1985-91   Large

Each of the troughs shown in Table 1 contains a recession or near recession.
Here, we will define a recession as a time period over which output decreases. Using this definition, the US has experienced recessions in the following periods: 1930-33, 1938, 1949, 1954, 1958, 1970, 1974-75, 1980, 1982, and 1991. There is some tendency for larger troughs to be followed by larger peaks and some tendency for longer troughs to be followed by longer peaks. These observations indicate that the US economy tends to bounce back to health after experiencing weakness.

To summarize, we know that the output of the US economy follows an increasing trend, but fluctuates around this trend in an irregular manner. To explain why the economy grows and to explain why the economy fluctuates, economists tend to examine the structure of the economy. In terms of modeling this structure, specifying a production function is a typical first step.
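As a closing illustration, the detrending step from Section 3 (actual value minus predicted trend value) and the recession definition used above (a period over which output decreases) can be sketched as follows. The function names and all numbers are made up for the example; they are not the GDP data or the trend estimates discussed in this handout.

```python
def detrend(actual, trend):
    """Deviation from trend: actual value minus predicted trend value, year by year."""
    return [a - p for a, p in zip(actual, trend)]


def recession_years(years, output):
    """Years in which output is lower than in the previous year."""
    return [yr for yr, prev, cur in zip(years[1:], output, output[1:]) if cur < prev]


# Hypothetical data, for illustration only.
years = [2001, 2002, 2003, 2004, 2005, 2006]
output = [100.0, 104.0, 103.0, 107.0, 106.0, 111.0]
trend = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]

deviations = detrend(output, trend)
print(deviations)                      # [0.0, 2.0, -1.0, 1.0, -2.0, 1.0]
print(recession_years(years, output))  # [2003, 2005]
```

Positive entries in `deviations` mark above-trend years and negative entries mark below-trend years; a run from below trend to above trend and back traces out one cycle in the sense used in Table 1.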
