Computer exercises #2 in R

1. First, install the following packages, which are not included in R by default:

    Package     Description
    lmtest      Testing Linear Regression Models
    car         Companion to Applied Regression
    sandwich    Robust Covariance Matrix Estimators
    moments     Moments, Cumulants, Skewness, Kurtosis and Related Tests
    forecast    Forecasting Functions for Time Series and Linear Models
    tseries     Time Series Analysis and Computational Finance

These packages provide functions for diagnostic checking and significance testing after an econometric model has been estimated by least squares (post-estimation procedures). You can install one or more packages with the commands:

    install.packages("lmtest")
    install.packages("car")
    install.packages("sandwich")
    install.packages("moments")
    install.packages("forecast")
    install.packages("tseries")

Installed packages are stored in a library and must be loaded in each session with the commands:

    library(lmtest)
    library(car)
    library(sandwich)
    library(moments)
    library(forecast)
    library(tseries)

Data files you want to use should be stored in the current working directory. The current working directory can be displayed with the command:

    getwd()

The working directory can be changed:

    setwd("C:/...")

R can read data in many formats. The most common format is a text file with the data arranged in columns and a header above each column describing the data. When importing a data frame from a text file, make sure that the decimal places of the numbers are separated by dots (not commas).

2. Download the text file economic_indicators.txt from http://www.efzg.unizg.hr (Katedra/Statistika/Članovi/Doc.dr.sc. Josip Arnerić/Econometrics). You can save the text file on your desktop (Croatian: radna površina) if the desktop is your working directory.
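Before importing, it can help to confirm that the file is visible from the working directory and that its numbers use dots as decimal separators. The sketch below is only an illustration under stated assumptions: it writes a small stand-in file (the name demo_indicators.txt and the numbers are made up, not the course data) to a temporary directory, then shows what read.table() does with comma decimals and how its dec argument handles them.

```r
# Hypothetical stand-in file with made-up numbers; NOT the course data set.
demo_file <- file.path(tempdir(), "demo_indicators.txt")
writeLines(c("gdp exchange",
             "2081,5 8,28",
             "2290,3 8,27"), demo_file)

# Check that the file can be found before trying to read it
file.exists(demo_file)

# With the default dec=".", values with comma decimals are not read as numbers
wrong <- read.table(file = demo_file, header = TRUE)
sapply(wrong, is.numeric)

# Declaring the comma as the decimal mark yields numeric columns
right <- read.table(file = demo_file, header = TRUE, dec = ",")
sapply(right, is.numeric)
right
```

If the columns of an imported data frame turn out not to be numeric, a mismatch in the decimal separator is one possible cause worth checking.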
Import the data frame into R from the text file using the command:

    mydata=read.table(file="economic_indicators.txt",header=TRUE)
    colnames(mydata)

The data frame mydata contains time series for the period 2000Q1 to 2014Q3 (59 quarters) for the following economic indicators of China: exchange rate (CNY/USD), fdi_capital (FDI, capital used, in millions of CNY), gdp_current_price (GDP in current prices, in billions of CNY), gdp_growth (in %), ind_production (volume index of industrial production) and m2 (money supply M2, in billions of CNY).

3. Observations should be dated as regular-frequency (quarterly) data using the command:

    mydata=ts(mydata,frequency=4,start=c(2000,1))
    mydata

You can plot any time series with the ts.plot command. To plot GDP, use only the third column of mydata. Each column of mydata can be extracted as a single time series by indexing with square brackets [i,j], where index i refers to the rows and index j to the columns.

    ts.plot(mydata[,3],ylab="GDP in China",col="red")

4. For the observed time period, create some new variables and add them to the existing mydata.

Variable "time", starting from zero:

    time=ts(0:58,frequency=4,start=c(2000,1))

Variable "time squared":

    time2=time*time

Seasonal dummy variables for the first, second and third quarter of each year (the seasonal dummy variable for the last quarter is omitted by default):

    dummies=seasonaldummy(ts(1:59,frequency=4,start=c(2000,1)))

A "time dummy" variable equal to 0 for the first 22 observations (2000Q1 to 2005Q2) and equal to 1 for all observations from 2005Q3 onward:

    dummy=ifelse(time>21,1,0)

The new variables should be combined with the existing time series:

    mydata=cbind(mydata,time,time2,dummies,dummy)
    colnames(mydata)=c("growth","production","gdp","fdi","exchange","m2","time","time2","q1","q2","q3","dummy")
    mydata

5.
Using the lm() command, estimate a model with a constant term (intercept), a quadratic trend and seasonal dummy variables to describe the behavior of GDP in China (current prices). Explain the meaning of the estimated coefficients of the dummy variables. Are the dummy variables statistically significant?

    model1=lm(gdp~time+time2+q1+q2+q3,data=mydata)
    summary(model1)

6. Compare the actual values (red) and the fitted values (blue) on a single graph.

    ts.plot(mydata[,3],fitted(model1),gpars=list(col=c("red","blue")),main="Actual vs fitted values (in the sample)")

7. Check whether the residuals of the estimated model are normally distributed using the Jarque-Bera test. Write down the null and the alternative hypothesis, the JB test statistic and the p-value. Which distribution does the JB statistic follow? Should we reject the null hypothesis?

H0: ...
H1: ...
$JB = n\left(\frac{S^2}{6} + \frac{(K-3)^2}{24}\right) =$ ______ ; p-value = ______

Here n is the number of observations, S is the skewness and K is the kurtosis of the residuals.

    skewness(residuals(model1))
    kurtosis(residuals(model1))
    jarque.test(residuals(model1))

8. Check whether the residuals are independently distributed up to 2 time lags (no autocorrelation of residuals up to that lag) using the Breusch-Godfrey test. Write down the null and the alternative hypothesis, the LM statistic and the p-value. Should we reject the null hypothesis?

H0: ...
H1: ...
LM = ______ ; p-value = ______

    bgtest(model1,order=2)

9. Calculate the approximate value of the first-order autocorrelation coefficient of the residuals using the Durbin-Watson statistic. What can you conclude from the value of the DW statistic?

H0: ...
H1: ...
DW = ______ ; $\hat{\rho}_1 \approx 1 - \frac{DW}{2} =$ ______ ; p-value = ______

    durbinWatsonTest(model1)

10. Check whether the residuals have constant variance (no heteroscedasticity of residuals) using the Breusch-Pagan test. Write down the null and the alternative hypothesis, the BP statistic and the p-value. Should we reject the null hypothesis?

H0: ...
H1: ...
BP = ______ ; p-value = ______

    bptest(model1,studentize=FALSE)

11.
If a heteroscedasticity problem exists, test the coefficients of the same model again using Newey-West robust standard errors, which remain consistent under heteroscedasticity and autocorrelation.

    coeftest(model1,vcov=NeweyWest)

12. Using the estimated model1, compute forecasts for the out-of-sample observations up to 2017-Q4. Combine the actual values with the forecast values and present them graphically.

    time=ts(59:71,frequency=4,start=c(2014,4))
    time2=time*time
    dummies=seasonaldummy(ts(59:71,frequency=4,start=c(2014,4)))
    new=cbind(time,time2,dummies)
    colnames(new)=c("time","time2","q1","q2","q3")
    yhat=as.matrix(fitted(model1))
    pred=as.matrix(predict(model1,new))
    forecast=rbind(yhat,pred)
    forecast=ts(forecast,frequency=4,start=c(2000,1))
    all=cbind(forecast,mydata[,3])
    colnames(all)=c("forecast","actual")
    all
    ts.plot(all,gpars=list(col=c("blue","red")),main="Fitted values in the sample + out of sample forecasts")

13. Estimate the following two econometric models:

a) $\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 \,\mathrm{time}$
b) $\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 \,\mathrm{dummy} + \hat{\beta}_2 \,\mathrm{dummy} \cdot \mathrm{time}$

where y = exchange rate (CNY/USD), time = 0, 1, 2, 3, ..., 58, and dummy = 0 before 2005-Q3 and 1 from 2005-Q3 onward. Compare the actual and fitted values on the same graph for each of the two estimated models. Which model fits the data better according to R-squared? Explain the meaning of the estimated slope coefficient after 2005-Q3.

    model2=lm(exchange~time,data=mydata)
    summary(model2)
    model3=lm(exchange~dummy+I(dummy*time),data=mydata)
    summary(model3)

Estimated equation a) $\hat{y}_t =$ ______ ; $R^2 =$ ______
Estimated equation b) $\hat{y}_t =$ ______ ; $R^2 =$ ______
Slope coefficient after 2005-Q3 equals ______

    par(mfrow=c(1,2))
    ts.plot(mydata[,5],fitted(model2),gpars=list(col=c("red","blue")),main=expression(hat(y)==8.78-0.047*time))
    ts.plot(mydata[,5],fitted(model3),gpars=list(col=c("red","blue")),main=expression(hat(y)==8.28+0.95*dummy-0.058*dummy*time))

14. Perform a Wald test of whether the constant term after 2005-Q3 is statistically significant, based on estimated equation b) from the previous exercise.
Which distribution does the Wald test statistic follow? Should we reject the null hypothesis?

H0: ...
H1: ...
F = ______ ; p-value = ______

    hm=c(1,1,0)
    rhs=c(0)
    linearHypothesis(model3,hm,rhs)
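To see what the restriction vector hm=c(1,1,0) is testing, the Wald statistic can also be computed by hand. The sketch below does this in base R on simulated data; the series, the seed and the object names sim, fit, R_mat and gap are assumptions made for illustration, not the exchange-rate data. The restriction says that the intercept plus the dummy coefficient, that is the constant term after the break, equals zero.

```r
# Simulated stand-in data (made up for illustration; NOT the course data)
set.seed(1)
time  <- 0:58
dummy <- ifelse(time > 21, 1, 0)
y     <- 8.3 + 0.9 * dummy - 0.05 * dummy * time + rnorm(59, sd = 0.05)
sim   <- data.frame(y, time, dummy)

fit <- lm(y ~ dummy + I(dummy * time), data = sim)

# H0: 1*b0 + 1*b1 + 0*b2 = 0  (the constant term after the break is zero)
R_mat <- matrix(c(1, 1, 0), nrow = 1)
rhs   <- 0
b     <- coef(fit)   # estimated coefficients
V     <- vcov(fit)   # estimated covariance matrix of the coefficients

# Wald statistic in F form: (Rb - r)' [R V R']^(-1) (Rb - r) / q
q     <- nrow(R_mat)                 # number of restrictions
gap   <- R_mat %*% b - rhs
Fstat <- as.numeric(t(gap) %*% solve(R_mat %*% V %*% t(R_mat)) %*% gap) / q
pval  <- pf(Fstat, df1 = q, df2 = df.residual(fit), lower.tail = FALSE)

c(F = Fstat, p.value = pval)
```

With a single restriction, this F statistic is the square of the t statistic for the same linear combination of coefficients. With the car package loaded, linearHypothesis(fit, c(1,1,0), 0) should report the same F value for this simulated model.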