HW3-Solutions

Ozan Sonmez

October 19, 2016

Problem 2.1

For the Johnson & Johnson data, say y_t, shown in Figure 1.1, let x_t = log(y_t). In this problem, we are going to fit a special type of structural model, x_t = T_t + S_t + N_t, where T_t is a trend component, S_t is a seasonal component, and N_t is noise. In our case, time t is in quarters (1960.00, 1960.25, ...), so one unit of time is a year.

part a

Fit the regression model

x_t = βt + α_1 Q_1(t) + α_2 Q_2(t) + α_3 Q_3(t) + α_4 Q_4(t) + w_t,

where βt is the trend, the α_i Q_i(t) terms form the seasonal component, and w_t is the noise. Here Q_i(t) = 1 if time t corresponds to quarter i = 1, 2, 3, 4, and zero otherwise. The Q_i(t)'s are called indicator variables. We will assume for now that w_t is a Gaussian white noise sequence.

library(astsa)
trend = time(jj) - 1970    # helps to 'center' time
Q = factor(cycle(jj))      # make (Q)uarter factors
reg1 = lm(log(jj) ~ 0 + trend + Q, na.action = NULL)   # no intercept
summary(reg1)

## Call:
## lm(formula = log(jj) ~ 0 + trend + Q, na.action = NULL)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.29318 -0.09062 -0.01180  0.08460  0.27644
##
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)
## trend 0.167172   0.002259   74.00   <2e-16 ***
## Q1    1.052793   0.027359   38.48   <2e-16 ***
## Q2    1.080916   0.027365   39.50   <2e-16 ***
## Q3    1.151024   0.027383   42.03   <2e-16 ***
## Q4    0.882266   0.027412   32.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1254 on 79 degrees of freedom
## Multiple R-squared:  0.9935, Adjusted R-squared:  0.9931
## F-statistic:  2407 on 5 and 79 DF,  p-value: < 2.2e-16

part b

Because time is measured in years, the estimated average annual increase in the logged earnings per share is the trend coefficient β̂ = 0.167172 (about 16.7% per year), which can be read directly from the summary table above. The seasonal effects α̂_1, ..., α̂_4 repeat from one year to the next, so they do not contribute to the year-over-year increase.

part c

If the model is correct, the change in the average logged earnings rate from the third quarter to the fourth quarter is α̂_4 − α̂_3 = 0.882266 − 1.151024 = −0.268758, i.e., a decrease. In percentage terms, it decreases by (0.268758/1.151024) × 100 ≈ 23.35%.

part d

What happens if you include an intercept term in the model in (a)? Explain why there was a problem.

reg2 = lm(log(jj) ~ trend + Q, na.action = NULL)   # with an intercept
summary(reg2)

## Call:
## lm(formula = log(jj) ~ trend + Q, na.action = NULL)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.29318 -0.09062 -0.01180  0.08460  0.27644
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.052793   0.027359  38.480  < 2e-16 ***
## trend        0.167172   0.002259  73.999  < 2e-16 ***
## Q2           0.028123   0.038696   0.727   0.4695
## Q3           0.098231   0.038708   2.538   0.0131 *
## Q4          -0.170527   0.038729  -4.403 3.31e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1254 on 79 degrees of freedom
## Multiple R-squared:  0.9859, Adjusted R-squared:  0.9852
## F-statistic:  1379 on 4 and 79 DF,  p-value: < 2.2e-16

Having an intercept takes away the first quarter effect: R drops Q1 and absorbs it into the intercept, which then appears in all quarters. The remaining coefficients measure each quarter's effect relative to the first quarter, which does not match our goal of studying the effect of each quarter separately.
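To see why the intercept causes trouble, one can inspect the design matrix directly. The following is a quick sketch (not part of the original solution), using the factor Q defined in part (a):

Qmat = model.matrix(~ 0 + Q)   # 84 x 4 matrix of quarterly indicator columns
all(rowSums(Qmat) == 1)        # TRUE: Q1(t) + Q2(t) + Q3(t) + Q4(t) = 1 for every t

Since the constant column is an exact linear combination of the four indicators, the design matrix with an intercept is rank deficient, and lm() resolves this by dropping the Q1 level.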
part e

Graph the data, x_t, and superimpose the fitted values, say x̂_t, on the graph. Examine the residuals, x_t − x̂_t, and state your conclusions. Does it appear that the model fits the data well (do the residuals look white)?

par(mfrow = c(1,2))
plot(log(jj), main = "plot of data and fitted value")     # data
lines(fitted(reg1), col = "red")                          # fitted values
plot(log(jj) - fitted(reg1), main = "plot of residuals")  # residuals

[Figure: log(jj) with the fitted values superimposed (left) and the residuals x_t − x̂_t (right).]

The residuals do not seem to follow any pattern, so they look fairly white, and the fit seems pretty good.

Problem 2.2

For the mortality data examined in Example 2.2:

part a

Add another component to the regression in (2.21) that accounts for the particulate count four weeks prior; that is, add P_{t−4} to the regression in (2.21). State your conclusion.

n = length(tempr)
temp = tempr - mean(tempr)   # center temperature
temp2 = temp^2
trend = time(cmort)          # time
fit1 = lm(cmort ~ trend + temp + temp2 + part, na.action = NULL)
fit2 = lm(cmort[5:n] ~ trend[5:n] + temp[5:n] + temp2[5:n] + part[5:n] + part[1:(n-4)], na.action = NULL)
summary(fit2)

## Call:
## lm(formula = cmort[5:n] ~ trend[5:n] + temp[5:n] + temp2[5:n] +
##     part[5:n] + part[1:(n - 4)], na.action = NULL)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -18.228  -4.314  -0.614   3.713  27.800
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      2.808e+03  1.989e+02  14.123  < 2e-16 ***
## trend[5:n]      -1.385e+00  1.006e-01 -13.765  < 2e-16 ***
## temp[5:n]       -4.058e-01  3.528e-02 -11.503  < 2e-16 ***
## temp2[5:n]       2.155e-02  2.803e-03   7.688 8.02e-14 ***
## part[5:n]        2.029e-01  2.266e-02   8.954  < 2e-16 ***
## part[1:(n - 4)]  1.030e-01  2.485e-02   4.147 3.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.287 on 498 degrees of freedom
## Multiple R-squared:  0.608, Adjusted R-squared:  0.6041
## F-statistic: 154.5 on 5 and 498 DF,  p-value: < 2.2e-16

As the summary of the fit suggests, all the predictors are statistically significant, yielding the model

M̂_t = β̂_0 + β̂_1 t + β̂_2 (T_t − T̄) + β̂_3 (T_t − T̄)² + β̂_4 P_t + β̂_5 P_{t−4},

where the estimated parameters (β̂_0, β̂_1, β̂_2, β̂_3, β̂_4, β̂_5) are given in the table above.

part b

Using AIC and BIC, is the model in (a) an improvement over the final model in Example 2.2?

aic1 = AIC(fit1)/n - log(2*pi)
aic2 = AIC(fit2)/(n-4) - log(2*pi)
aic = data.frame(model1 = aic1, model2 = aic2)
aic

##    model1   model2
## 1 4.721732 4.692916

There is some improvement in the model fit, but it is not a dramatic improvement over the model without P_{t−4} (model1).
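The question also asks about BIC; R's built-in BIC() can be used the same way as AIC() above. A sketch mirroring the normalization in the AIC code (fit1, fit2, and n are from part (a)):

bic1 = BIC(fit1)/n - log(2*pi)
bic2 = BIC(fit2)/(n-4) - log(2*pi)
data.frame(model1 = bic1, model2 = bic2)

BIC penalizes the extra parameter more heavily than AIC, so if the value for model2 is still the smaller of the two, the lagged particulate term is supported under both criteria.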
Problem 2.3

In this problem, we explore the difference between a random walk and a trend stationary process.

part a

Note from (1.4) that a random walk with drift can be expressed as x_t = δt + Σ_{k=1}^{t} w_k, where w_k is white noise with variance σ²_w. Here we generate four random walks with drift of length n = 100, with δ = 0.01 and σ²_w = 1. Call the data x_t for t = 1, ..., 100. Fit the regression x_t = βt + w_t using least squares, and plot the data, the true mean function (i.e., µ_t = 0.01t), and the fitted line.

# Part a
set.seed(2)
n = 100
delta = 0.01
time = 1:n                    # time
par(mfrow = c(2,2))
for (k in 1:4){
  w = rnorm(n, 0, 1)          # generate the white noise
  # generate the random walk based on the equation above
  x = c()
  for (t in 1:n){
    x[t] = delta*t + sum(w[1:t])
  }
  mu = delta*time             # true mean function
  fit = lm(x ~ 0 + time)      # fit a regression without intercept
  plot(time, x, type = "l", main = "random walk")
  lines(time, fitted(fit), col = "red")   # fitted line
  lines(time, mu, col = "blue")           # true mean function
}

[Figure: four simulated random walks with drift, each with the fitted line (red) and the true mean function (blue) superimposed.]

part b

For part (b), we do the same thing for the trend stationary process y_t = 0.01t + w_t.

# Part b
n = 100
time = 1:n                    # time
par(mfrow = c(2,2))
for (k in 1:4){
  w = rnorm(n, 0, 1)          # generate the white noise
  y = 0.01*time + w
  mu = 0.01*time              # true mean function
  fit = lm(y ~ 0 + time)      # fit a regression without intercept
  plot(time, y, type = "l", main = "y_t")
  lines(time, fitted(fit), col = "red")   # fitted line
  lines(time, mu, col = "blue")           # true mean function
}

[Figure: four simulated trend stationary series y_t, each with the fitted line (red) and the true mean function (blue) superimposed.]

part c

Note that the distance between the fit and the true mean is significantly smaller in part (b), because the errors in y_t are independent, which is one of the main assumptions of linear regression, whereas in x_t the errors are correlated due to the accumulation of the white noise terms. Please also consider comparing the estimates of the linear fits to the mean functions by simply looking at the summary() of the regression fits.

Problem 2.4

Consider a process consisting of a linear trend with an additive noise term consisting of independent random variables w_t with zero means and variances σ²_w, that is, x_t = β_0 + β_1 t + w_t, where β_0, β_1 are fixed constants.

part a

Looking at the mean function,

E[x_t] = E[β_0 + β_1 t + w_t] = β_0 + β_1 t,

which depends on time t; hence the series x_t is not stationary.

part b

Note that the first order difference of x_t can be simplified to

∇x_t = x_t − x_{t−1} = β_0 + β_1 t + w_t − [β_0 + β_1 (t−1) + w_{t−1}] = β_1 + w_t − w_{t−1},

and the mean function is

E[∇x_t] = E[β_1 + w_t − w_{t−1}] = β_1 + E[w_t] − E[w_{t−1}] = β_1,

which is independent of time t. The autocovariance function is

γ_∇x(t+h, t) = Cov(∇x_{t+h}, ∇x_t)
             = Cov(β_1 + w_{t+h} − w_{t+h−1}, β_1 + w_t − w_{t−1})
             = Cov(w_{t+h} − w_{t+h−1}, w_t − w_{t−1})
             = 2σ²_w  if h = 0,
             = −σ²_w  if |h| = 1,
             = 0      if |h| > 1,

which is also free of time t; hence ∇x_t is stationary.

part c

Repeat part (b) if w_t is replaced by a general stationary process, say y_t, with mean function µ_y and autocovariance function γ_y(h). As in part (b), E[∇x_t] = β_1, and the autocovariance function is

γ_∇x(t+h, t) = Cov(β_1 + y_{t+h} − y_{t+h−1}, β_1 + y_t − y_{t−1})
             = Cov(y_{t+h} − y_{t+h−1}, y_t − y_{t−1})
             = Cov(y_{t+h}, y_t) − Cov(y_{t+h}, y_{t−1}) − Cov(y_{t+h−1}, y_t) + Cov(y_{t+h−1}, y_{t−1})
             = γ_y(h) − γ_y(h+1) − γ_y(h−1) + γ_y(h)
             = 2γ_y(h) − γ_y(h+1) − γ_y(h−1),

which is independent of time t since y_t is stationary and γ_y(·) does not depend on t, and so neither does the linear combination γ_∇x(h). Hence ∇x_t is again stationary.
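As a quick numerical sanity check of part (b) (not part of the original solution; the parameter values below are illustrative), one can simulate x_t = β_0 + β_1 t + w_t with σ²_w = 1 and verify that ∇x_t has mean near β_1 and lag-one autocorrelation near γ(1)/γ(0) = −σ²_w/(2σ²_w) = −1/2:

set.seed(1)
n = 10000
w = rnorm(n)                        # white noise with variance 1
x = 2 + 0.5*(1:n) + w               # linear trend plus noise: beta0 = 2, beta1 = 0.5
dx = diff(x)                        # dx_t = beta1 + w_t - w_{t-1}
mean(dx)                            # approximately beta1 = 0.5
acf(dx, lag.max = 3, plot = FALSE)  # lag 1 near -0.5, lags 2-3 near 0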
Problem 2.6

The glacial varve record plotted in Figure 2.6 exhibits some nonstationarity that can be improved by transforming to logarithms and some additional nonstationarity that can be corrected by differencing the logarithms.

part a

n = length(varve)
varve1 = varve[1:(n/2)]        # first half of the series
varve2 = varve[(n/2 + 1):n]    # second half of the series
data.frame(FirstHalf = var(varve1), SecondHalf = var(varve2))

##   FirstHalf SecondHalf
## 1   132.501   594.4904

As we can see, the sample variance in the second half of the data is significantly larger than in the first half; therefore the variance is not homogeneous, and we need to stabilize it by a data transformation such as the log(·) transformation. The log transformation clearly brings the data closer to normality.

par(mfrow = c(1,2))
hist(varve, main = "raw data")
hist(log(varve), main = "log-transformed data")

[Figure: histograms of the raw varve data (left, heavily right-skewed) and the log-transformed data (right, roughly symmetric).]

part b

Plot the series y_t = log(x_t). Do any time intervals, of the order 100 years, exist where one can observe behavior comparable to that observed in the global temperature records in Figure 1.3?

plot(log(varve), main = "log-transformed data")

[Figure: time plot of log(varve).]

It does not seem that there is an increasing/decreasing trend over time as we observed in Figure 1.3.

part c

The ACF of y_t is:

acf(log(varve), lag.max = 20)

[Figure: sample ACF of log(varve), lags 0 to 20.]

It seems that the dependence in the data is very strong at close lags and dies down very slowly.

part d

Compute the difference u_t = y_t − y_{t−1}, examine its time plot and sample ACF, and argue that differencing the logged varve data produces a reasonably stationary series.

u = diff(log(varve), 1)   # take the first order difference
plot(u)                   # time plot
acf(u)                    # sample ACF, as the prompt asks

[Figure: time plot of u_t = ∇ log(varve).]

Differencing produces a fairly reasonable stationary process. u_t can be interpreted as the yearly change in varve thickness; more precisely, using the approximation log(1 + p) ≈ p for small p,

u_t = ∇y_t = y_t − y_{t−1}
    = log(x_t) − log(x_{t−1})
    = log(x_t / x_{t−1})
    = log(1 + (x_t − x_{t−1})/x_{t−1})
    ≈ (x_t − x_{t−1})/x_{t−1},

which can be interpreted as the relative (percentage) change in varve thickness from one year to the next.

Problem 2.7

# MA Smoothing
wgts = c(.5, rep(1,11), .5)/12
smooth1 = filter(gtemp, sides = 2, filter = wgts)
# Kernel Smoothing
smooth2 = ksmooth(time(gtemp), gtemp, "normal", bandwidth = 10)
# Lowess smoothing
smooth3 = lowess(gtemp)

par(mfrow = c(3,1))
plot(gtemp, type = "o", ylab = "MA Smoothing")
lines(smooth1, col = "red")
plot(gtemp, type = "o", ylab = "Kernel Smoothing")
lines(smooth2, col = "red")
plot(gtemp, type = "o", ylab = "Lowess Smoothing")
lines(smooth3, col = "red")

[Figure: the gtemp series with the moving-average, kernel, and lowess smoothers superimposed (red), one panel per method.]

All methods seem to model the trend reasonably well, but one needs to be careful in picking the tuning parameters, such as the weights of the MA smoother, the bandwidth of the kernel smoother, and so on.
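To make the tuning-parameter point concrete, here is a small sketch (the bandwidth values are illustrative choices, not from the text) contrasting an undersmoothed and an oversmoothed kernel fit:

plot(gtemp, type = "o", ylab = "Kernel Smoothing")
lines(ksmooth(time(gtemp), gtemp, "normal", bandwidth = 2), col = "blue")   # small bandwidth: wiggly, follows the noise
lines(ksmooth(time(gtemp), gtemp, "normal", bandwidth = 50), col = "red")   # large bandwidth: very smooth, can miss features

The bandwidth of 10 used above sits between these extremes, which is why it tracks the long-term trend without chasing year-to-year fluctuations.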