Exercises for Chapter 1 of Vinod's "HANDS-ON INTERMEDIATE ECONOMETRICS USING R"

H. D. Vinod
Professor of Economics, Fordham University, Bronx, New York 10458

Abstract

These are exercises to accompany the above-mentioned book. The book's URL is http://www.worldscibooks.com/economics/6895.html. Not all of the following exercises are suggested by H. D. Vinod (HDV) himself. Some are suggested by his students, whose names are Clifford D. Goss (CDG), Adam R. Bragar (ARB), Jennifer W. Murray (JWM), and Steven Carlsen (SC). The initials are attached to the individual exercises to identify the persons suggesting the exercise and the answers or hints. Many R outputs are suppressed for brevity. This set focuses on Chapter 1 of the text.

0 Exercises Regarding Basics of R

0.1 Exercise (R basics, data entry, basic stats)

HDV-1) Define a vector x with elements 2, 7, 11, 3, 7, 222, 34. Find the mean, median and (the discrete) mode of these data.

#ANSWER
x=c(2, 7, 11, 3, 7, 222, 34) #c is needed by R before the numbers
#Minimum, 1st Quartile, Median, Mean, 3rd Quartile and Maximum
#are given by the summary function.
summary(x)
#the number 7 is most frequent and hence the mode is at 7
#to get R to show such a discrete mode, use the table function
table(x) #for discrete mode

HDV-2) Are the data positively or negatively skewed? (Hint: use 'basicStats' from the package 'fBasics' [9] and note the sign of the skewness coefficient.)

HDV-3) Make the sixth element of x 'missing' by replacing the current value with the notation 'NA'. How would you get the mean and median commands of R to automatically ignore the missing data? (Hint: use na.rm=T)

HDV-4) Use the scale function in R to convert Fahrenheit to Celsius. (Hint: scale(212, center=32, scale=9/5) should give 100 degrees Celsius.)

0.2 Exercise (read.table in R)

HDV-5) Create a directory called data on the C drive of your computer. Place the following data in that directory as "c:/data/xyz.txt". The data set is short; it should be saved as a simple text file using the "save as ..." option.

x y z
1 2 3
4 5 7
8 9 10
7 11 11
6 5 10

Use a suitable R command to read and analyze the data. (Hint: use 'read.table'.)

0.3 Exercise (Subsetting, logical & or | in R)

HDV-15) Explain the distinction between parentheses and brackets in R, showing how brackets are used for subsetting. Set the seed to 25 and use a random sample of the integers from 1 to 2000. Indicate which (if any) locations have numbers exceeding 178 and at the same time less than 181. How many numbers will be included if we want numbers exceeding 178 or less than 181?

set.seed(25); x=sample(1:2000); which(x>178&x<181)
n1=which(x>178&x<181) #define n1 vector of locations
x[n1] #values of x at those locations
x[x>178 & x<181] #using brackets for subsetting
#note that x is repeated inside brackets, logical & is used.
n2=which(x>178|x<181) #logical and replaced by logical or |
length(n2)

R produces the following output:

> set.seed(25); x=sample(1:2000); which(x>178&x<181)
[1]  744 1923
> n1=which(x>178&x<181) #define n1 vector of locations
> x[n1] #values of x at those locations
[1] 180 179
> x[x>178 & x<181] #using brackets for subsetting
[1] 180 179
> n2=which(x>178|x<181) #logical and replaced by logical or |
> length(n2)
[1] 2000

The output shows that only two numbers are inside the open interval (178, 181). These numbers occupy locations 744 and 1923, respectively, in the 2000×1 vector 'x'. Note that if we ask for numbers greater than 178 or less than 181, then all 2000 numbers satisfy the condition.
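A minimal additional sketch (not part of the original answer): applying 'sum' to a logical vector counts its TRUE values, so the two counts above can also be obtained without 'which' or 'length'.

#assumes x from the code above is still in memory
sum(x>178 & x<181) #counts values strictly between 178 and 181, giving 2
sum(x>178 | x<181) #counts values satisfying either condition, giving 2000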
0.4 Exercise (Apple-Pie Sales regression example)

HDV-6) Download the following file from my website into your own data directory: http://www.fordham.edu/economics/vinod/R-piesales.txt. Copy and paste the various commands. Learn what these commands do for a fully worked out regression example involving a regression of apple-pie sales.

0.5 Exercise (R data manipulations)

HDV-7) Load the dataset called 'Angell' from the package 'car' [3] and summarize the data. Regress moral (Moral Integration: composite of crime rate and welfare expenditures) on hetero (Ethnic Heterogeneity: from percentages of nonwhite and foreign-born white residents), mobility (Geographic Mobility: from percentages of residents moving into and out of the city), and region of the US.

HDV-8) Use the 'attributes' function in R to determine which variable is categorical or factor-type.

HDV-9) Use multivariate analysis of variance to study how hetero and mobility are affected by moral and region.

HDV-10) Use the aggregate function of R to find group means by region. Comment on the low mean of the 'moral' variable in the South and the low variance of 'mobility' in the Eastern region.

#Answer Hints:
library(car);data(Angell);attach(Angell);summary(Angell)
#note that all variables are ratio-type except region
attributes(region)
reg1=lm(moral~hetero+mobility+region, data=Angell); summary(reg1)
manova(cbind(hetero,mobility)~moral+region)
aggregate(cbind(hetero,mobility,moral), by=list(region), mean)

0.6 Exercise (Regression, DW, qq.plot)

HDV-11) Set the seed at 34, create a sample from the set of integers from 2 to 46, and place it in a 15 by 3 matrix called yxz. Make y, x and z the names of the first three columns. What is the p-value for the coefficient of z in a regression of y on x and z? What does it suggest? What is the p-value of a fourth-order Durbin-Watson serial correlation test? What does it suggest? Use the 'qq.plot' command to decide whether the regression errors are close to Student's t.

#Answer Code
library(car)
set.seed(34);yxz=matrix(sample(2:46), 15,3)
y=yxz[,1]; x=yxz[,2]; z=yxz[,3]
reg1=lm(y~x+z); su1=summary(reg1); su1$coeff
p4z=su1$coef[3,4]; p4z
#Since p4z > 0.05, z is an insignificant regressor
durbin.watson(reg1, max.lag=4)
#a large p-value means accept the null hypothesis
#(H0 = no serial correlation of order 4)
qq.plot(resid(reg1))
#residuals are inside the confidence bounds, hence good

Figure 1: Quantile-quantile plot of resid(reg1) against normal quantiles. Since all observed points (circles) lie within the confidence band (dashed lines), do not reject normality of residuals.

For more information about the use of quantile-quantile plots to assess normality, see http://en.wikipedia.org/wiki/Q-Q_plot

0.7 Exercise (histogram)

HDV-12) Show the use of the hist command by using the co2 data.

hist(co2)

Exercises Created by Clifford D. Goss (CDG)

0.8 Exercise (Simple Regression)

CDG-1) It is important in econometrics to understand the basic long-hand formulas of regression analysis. What is the equation of a regression line? What is the purpose of a regression line? What is the difference between simple regression and multiple regression? When we write the regression equation y = b0 + b1 x + ε, the purpose is to summarize a relationship between the two variables x and y.

ANSWER: Simple regression has only one independent variable, while multiple regression has more than one regressor (e.g., x1 and x2).
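A small hedged illustration (not part of the original answer), using simulated data with made-up variable names, contrasting the two cases in R:

#illustrative sketch only: x1, x2 and y are hypothetical variables
set.seed(1)
x1 <- runif(50); x2 <- runif(50)
y <- 1 + 2*x1 + 3*x2 + rnorm(50) #true relation plus noise
simple <- lm(y ~ x1) #simple regression: one regressor
multiple <- lm(y ~ x1 + x2) #multiple regression: two regressors
coef(simple); coef(multiple)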
0.9 Exercise (Regression inference basics)

CDG-2) What are two inferential techniques to determine how accurate the sample estimates b0 and b1 will be? How are they related?

ANSWER: Two inferential techniques are confidence interval estimation and hypothesis testing. See http://en.wikipedia.org/wiki/Confidence_interval

0.10 Exercise (Error term)

CDG-3) A simple linear model will include an error term. Why is it included and what assumptions are made regarding its nature? What is a short-hand notation for writing these assumptions about the error term?

ANSWER: The error term represents the variation in y not accounted for by the linear regression model. It is incorporated in the model because no real set of data is exactly linear and no model is error free. Assumptions made for the error term include that it is normally distributed, has a mean of zero, and that the variance of each εt is σ². The variance-covariance matrix is proportional to the identity matrix. The short-hand notation is εt ~ N(0, σ²).

0.11 Exercise (Intercept and slope)

CDG-4) What does b1 in the simple linear regression equation represent? How is it calculated? What does b0 represent? How is it calculated?

ANSWER: Let cov(x, y) = E(x − x̄)(y − ȳ) and var(x) = E(x − x̄)². b1 represents the slope of the regression line. It is calculated as the covariance of y and x divided by the variance of x; its formula then is b1 = cov(x, y)/var(x). b0 is called the intercept. It is calculated as ȳ − b1 x̄. According to these formulas, the intercept cannot be calculated without first calculating the slope.

0.12 Exercise (Sampling distribution)

CDG-5) What does the term "Sampling Distribution of a Statistic" refer to?

ANSWER: The sampling distribution of a statistic refers to the probability distribution of the statistic in the sample space defined by all possible samples. For instance, the sampling distribution of the mean is the probability distribution of all possible means computed from all possible samples taken from the given parent population.

0.13 Exercise (Correlation coefficient)

CDG-6) What is the sample correlation coefficient between x and y? Derive its formula and indicate what its range is.

ANSWER: Recall the covariance defined above. The correlation coefficient is

r = cov(x, y) / sqrt[var(x) var(y)].   (1)

Its range is: −1 ≤ r ≤ +1.

0.14 Exercise (t test)

CDG-7) What is the purpose and formula of the t-test? What do "degrees of freedom" represent?

ANSWER: The t-test helps determine acceptance/rejection of the null hypothesis that the true unknown regression coefficient β1 is zero. Roughly speaking, a t-statistic close to 2 means the regression coefficient is statistically significantly different from zero. In R, Student's t distribution is available without having to look up t tables. Use the command help(qt) to get the details. See the following URL for details: http://en.wikipedia.org/wiki/Student's_t-test

0.15 Exercise (ANOVA)

CDG-8) What is the total sum of squares? Describe its role in the analysis of variance.

ANSWER: See
http://en.wikipedia.org/wiki/Total_sum_of_squares
http://en.wikipedia.org/wiki/Analysis_of_variance

0.16 Exercise (p-value)

CDG-9) What does the p-value represent? What is its range? How do we know if we should accept or reject our null hypothesis by using the p-value?

ANSWER: Roughly speaking, a small p-value (e.g., p-val < 0.05) means reject the null. See http://en.wikipedia.org/wiki/P-value

0.17 Exercise (DW)

CDG-10) What is the role of the Durbin-Watson (DW) statistic in regression? Which package allows one to compute DW in R?
ANSWER: Roughly speaking, DW close to 2 means no problem of autocorrelated errors in a regression. Use the package 'car' to compute the DW statistic. For theory, see http://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

0.18 Exercise (GMT)

CDG-11) What is the Gauss-Markov Theorem?

ANSWER: Roughly speaking, the GM theorem means the least squares estimator is best linear unbiased (BLUE) without assuming normality of errors. It is enough to assume that the covariance matrix of regression errors is proportional to the identity matrix. For details see: http://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem

0.19 Exercise (skewness and kurtosis)

CDG-12) Discuss skewness and kurtosis as concepts.

ANSWER: See
http://en.wikipedia.org/wiki/Kurtosis
http://en.wikipedia.org/wiki/Skewness

0.20 Exercise (power of a test)

CDG-13) What is a statistical power function? Write the R command for a one-sided and a two-sided t-test using the Type I error α = 0.05, 2 degrees of freedom, and various non-centrality values. Plot the power function.

Answer: For the theory, see http://en.wikipedia.org/wiki/Statistical_power. Power is the probability of rejecting a hypothesis when it is false and should be rejected. When the true value equals the hypothesized value of zero, the probability of rejecting the null is set at the alpha level of Type I error (= 0.05). When we know the probability distribution of a test statistic under the alternative hypothesis, we can write the power function as a function of the alternative parameter value. If we are considering the t test, the probability distribution under the alternative is noncentral t. In R the function power.t.test computes these probabilities. However, when the true value is zero, R computes the power to be 0.025 instead of 0.05, which seems to be incorrect. Two-sided tests do have a lower power curve than the corresponding one-sided tests. Thus, if the direction (side) of the alternative hypothesis is known, it is advisable to use a one-sided test.

x=seq(-.2,.2,by=.01)
pw=power.t.test(delta = x, n=1000, typ="one.sample", alternative = "one.sided")
plot(x,pw$p, typ="l", main="Power function for one-sided t test",
 ylab="prob of Type II error")
pw=power.t.test(delta = x, n=1000, typ="one.sample", alternative = "two.sided")
plot(x,pw$p, typ="l", main="Power function for two-sided t test",
 ylab="Prob of Type II error", xlab="Parameter value alternative hypothesis")

Figure 2: As the non-centrality (true value of the statistic) increases, the t test for the null of zero true value becomes more and more reliable (powerful).

#ALTERNATIVE formulation
#power of one-sided t test
t.power=function(alph,df,noncen){crit=qt(1-alph,df)
 power=1-pt(crit,df,ncp=noncen);return(power)} #function ends here
x=seq(-5,5,by=.2); df=10; alph=0.05
y=t.power(alph,df,x)
plot(x,y,main="One-sided t test Power Curve", typ="l")
t2.power=function(alph,df,noncen){ al2=alph/2
 crit=qt(1-al2,df)
 power=1-pt(crit,df,ncp=noncen);return(power)} #function ends here
x=seq(-5,5,by=.2); df=10; alph=0.05
y=t2.power(alph,df,abs(x))
plot(x,y,main="Two-sided t test Power Curve", typ="l",
 ylab="Prob of Type II error", xlab="Parameter value alternative hypothesis")

0.21 Exercise (Production function)

CDG-14) What is a production function? What is the Cobb-Douglas production function? Derive the slope of the isoquant assuming two standard inputs, Labor and Capital.
What is the Elasticity of Substitution (EOS)? What is Output Elasticity? For a Cobb-Douglas production function, what are the EOS and the Output Elasticity? What measures economies of scale?

ANSWER: Check the index in Vinod's text. The answers are in Chapter 1.

HDV-16) Estimate the Nerlove-Ringstad version of the Cobb-Douglas production function for the Metal data in your textbook, available in the 'Ecdat' package. [Hint: use Zellner-Ryu (1998), Journal of Applied Econometrics, vol. 13, pages 101-127. It is instructive to use many more of the functional forms discussed there.]

rm(list=ls()) #clean up R memory
library(Ecdat) #load the package into the current memory of R
data(Metal) #pulls the Metal data into memory of R
names(Metal)
summary(Metal)
met=as.matrix(Metal)
Ly=log(met[,1]) #pull first column of met, take log, define Ly
LL=log(met[,2]) #pull second col. of met, take log, define LL
LK=log(met[,3]) #pull third col. of met, take log, define LK
reg=lm(Ly~ I(Ly^2)+LK+LL); summary(reg)

0.22 Exercise (Heteroscedasticity)

CDG-15) What is heteroscedasticity? What type of problem does heteroscedasticity cause? What is another way to refer to heteroscedasticity, and what command functions in R are available to test for heteroscedasticity?

Answer: See http://en.wikipedia.org/wiki/Heteroskedasticity for theory. The package 'car' has tests, and its function 'hccm' computes the heteroscedasticity consistent covariance matrix; use sqrt of its diagonal to get the corrected standard errors. The package called 'lmtest' [11] has a function 'bptest' for the Breusch-Pagan test and 'gqtest' for the Goldfeld-Quandt test; 'vcovHC' gives the variance-covariance matrix adjusted for heteroscedasticity.

1 Chapter 1 and Using R for a Hands-On Study of Matrix Algebra

1.1 Exercises created by Steven Carlsen

The exercises below are intended to demonstrate some of the basic, but important, properties of matrix algebra. Note that they are examples of these properties, not proofs. Please consult [7], Vinod's companion book, for a detailed study of matrix algebra using R: http://www.worldscibooks.com/mathematics/7814.html

SC) The exercises in this set generally use a set of three matrices denoted A, B and C throughout. Because of the computing power and accuracy of R, the matrices will all have dimensions of five rows by five columns (larger than what we could normally work easily by hand, but not so large as to make comparisons difficult). Note that the output from R is often not included for brevity. In many of these exercises, we could demonstrate equality (or inequality) by simply comparing the results on either side of an equation. An equivalent way of showing that two sides of an equation are equal (or not equal) is to show that one side MINUS the other side is equal to zero (or not equal to zero). We will generally do both below. For notation purposes, let's use XpY as the name of the matrix formed by adding X and Y (p will stand for "plus"), let's use XmY as the name of the matrix formed by subtracting Y from X (m will stand for "minus"), let's use XY as the name of the matrix formed by post-multiplying X by Y, and let's use Xt as the name of the transpose of matrix X. Start by creating our three matrices, using random numbers rounded off to whole integers. Note that by using different seed numbers, we create different matrices even though we selected random numbers from within the same range (1 to 50).
set.seed(1)
A=round(matrix(runif(25,1,50),5,5),0)
set.seed(2)
B=round(matrix(runif(25,1,50),5,5),0)
set.seed(3)
C=round(matrix(runif(25,1,50),5,5),0)
A;B;C #outputs omitted for brevity
ApB = A+B; ApB
BpA = B+A; BpA
ApBmBpA = ApB - BpA; ApBmBpA
#outputs omitted for brevity

Note that ApB is identical to BpA and that ApBmBpA (which is equal to (A+B)-(B+A)) confirms the equality by having all zero elements.

Verify that AB is not identical to BA and that subtracting BA from AB produces a non-zero matrix.

AB=A%*%B; BA=B%*%A; AB-BA

Verify that (A')' = A. Verify that (AB)' = B'A'.

At = t(A); At; Att = t(At); Att; A-Att

Verify that the determinant of A+B is not the sum of the two determinants.

Verify the following: If matrix D is obtained by interchanging a pair of rows (or columns) of matrix A, then det(D) = -det(A).

A1 = A[1,]
A2 = A[2,]
A3 = A[3,]
A4 = A[4,]
A5 = A[5,]
D = rbind(A2,A1,A3,A4,A5)
det(D) #[1] 18606520
det(A) #[1] -18606520
#Note that det(D) = -det(A)

Verify the following: If matrix E is obtained by multiplying a row (or column) of matrix A by a constant k, then det(E) = k det(A).

A1 = A[1,]
A2 = A[2,]
A3 = A[3,]
A4 = A[4,]
A5 = A[5,]
A1k = 2*A1
E = rbind(A1k,A2,A3,A4,A5)
det(E) #[1] -37213040
2*det(A) #[1] -37213040
#Note that det(E) = 2*det(A)

Verify the following: The determinant of a diagonal matrix is equal to the product of the diagonal elements. Also, the determinant of a matrix is equal to the product of its eigenvalues.

set.seed(40);d=sample(4:40)[1:5];d
D=diag(d) #this is how you create a diagonal matrix from a vector in R
det(D)
cumprod(d) #gives sequential multiplication of elements of d
ei=eigen(D)$val; ei
cumprod(ei)

1.2 Exercise (Create collinear data and study it)

HDV-13) Set the seed at 345, create a sample from the set of integers from 2 to 17, and place it in a 4 by 4 matrix called A. Note that the determinant is negative even if all numbers are positive. If B is obtained from A by multiplying the elements of the second row by the constant k=5, then det(B) = k det(A). Use the second row and third column for verifying the above. How is this property of determinants different from the corresponding operation on matrices? The matrix multiplication kA means multiply each element of A by k. Use k=5 to check this. Let n denote the number of rows. Show by using the above example that det(kA) = k^n det(A), whence det(-A) = (−1)^n det(A), by choosing k = -1.

set.seed(345);s1=sample(2:17);A=matrix(s1,4,4);A
det(A)
B=A #initialize
#multiply second row by 5
B[2,]=5*A[2,]; det(B); det(B)/5
(-1)^4; det(-A)

1.3 Exercise (Quadratic form)

HDV-14) What is a quadratic form? How can matrix algebra be used to represent a quadratic form?

ANSWER: See an example of a matrix-algebra quadratic form at http://en.wikipedia.org/wiki/Quadratic_form, where the quadratic form is ax^2 + bxy + cy^2 = v'Mv, where v is a 2 × 1 vector with elements (x, y) and where M is a 2 × 2 symmetric matrix with a and c along the diagonal and b/2 in the off-diagonal positions. Expressions for expectations, variances and covariances involving quadratic forms are available at http://en.wikipedia.org/wiki/Quadratic_form_(statistics), where it is shown how the residual sum of squares can be written as a quadratic form RSS = y'(I − H)'(I − H)y (identity minus hat matrix). Under normality of regression errors, RSS is distributed as a Chi-square random variable with degrees of freedom k = Trace[(I − H)'(I − H)]. If EHy ≠ µ due to bias, the Chi-square variable is noncentral with a nonzero noncentrality parameter λ = 0.5 µ'(I − H)'(I − H)µ.
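A small hedged sketch (added illustration, not from the text) evaluating a quadratic form v'Mv numerically, and checking that RSS equals the quadratic form y'(I−H)y (since I−H is symmetric and idempotent, (I−H)'(I−H) = I−H). All names below are made up for the illustration.

#quadratic form v'Mv for ax^2 + bxy + cy^2
a <- 2; b <- 3; c <- 4
M <- matrix(c(a, b/2, b/2, c), 2, 2) #symmetric 2x2 matrix
v <- c(1.5, -2) #v = (x, y)
t(v) %*% M %*% v #matrix-algebra version
a*v[1]^2 + b*v[1]*v[2] + c*v[2]^2 #long-hand version, same value

#RSS as a quadratic form, using a simulated regression
set.seed(7); X <- cbind(1, runif(20)); y <- rnorm(20)
H <- X %*% solve(t(X) %*% X) %*% t(X) #hat matrix
rss1 <- t(y) %*% (diag(20) - H) %*% y #quadratic form y'(I-H)y
rss2 <- sum(resid(lm(y ~ X - 1))^2) #usual residual sum of squares
c(rss1, rss2) #the two numbers agree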
The following exercises are contributed by Chris Finley (CJF).

1.4 Exercise (Eigenvalues-eigenvectors)

CJF-1) Using a seed of 30 and the numbers from 13 to 16, create a square matrix X. What are the eigenvalues and eigenvectors?

set.seed(30); X=matrix(sample(13:16), 2); eigen(X)

R produces the following output:

$values
[1] 29 -1
$vectors
           [,1]       [,2]
[1,] -0.7071068 -0.7525767
[2,] -0.7071068  0.6585046

Using a seed of 30 and the numbers from 1 to 25, create a square matrix. What is the 3rd diagonal number?

set.seed(30)
X=matrix(sample(1:25),5)
X[3,3] # 8

1.5 Exercise (Pie-chart, histogram)

CJF-2) Using the house prices data in the AER library, determine the number of properties that have a driveway. Using R, determine the percentage of properties that do not have a driveway. Show this using a pie chart. Create a histogram of the number of stories.

library(AER)
data(HousePrices)
attach(HousePrices)
summary(driveway)

R produces the following output:

 no yes
 77 469

Now some R code to tabulate the data.

tab=table(driveway)
prop.table(tab)

R produces the following output:

driveway
       no       yes
0.1410256 0.8589744

Now the code for plotting.

par(mfrow=c(2,1)) #two graphs in one
pie(tab, main="Houses with Driveways")
hist(stories)
par(mfrow=c(1,1)) #reset

Using the house prices data in the AER library, find the correlation coefficient and Spearman's ρ. Comment on the result.

library(AER); data(HousePrices); attach(HousePrices)
cor(bedrooms, bathrooms)

R produces the following output:
[1] 0.3737688

cor(bedrooms, bathrooms, method = "spearman")
[1] 0.3769018

Both measures are very close and indicate very little correlation. For further comments on how rank correlation is a non-parametric measure, see:
http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

Using the house prices data in the AER library, regress price on bathrooms. What are the 95% and 99% confidence intervals?

reg1=lm(price~bathrooms, data=HousePrices)
confint(reg1)

R produces the following output:
               2.5 %   97.5 %
(Intercept) 27502.25 38085.84
bathrooms   23642.71 31311.26

confint(reg1, level=0.99)
               0.5 %   99.5 %
(Intercept) 25830.49 39757.60
bathrooms   22431.41 32522.56

Using the house prices data in the AER library, separately regress price on lotsize and price on bedrooms. Find the R2 of each and comment.

rm(list=ls())
library(AER)
data(HousePrices)
reg1=lm(price~lotsize, data=HousePrices)
summary(reg1)$adj.r.squared # 0.287077
reg2=lm(price~bedrooms, data=HousePrices)
summary(reg2)$r.squared # 0.1342837

About 29% of the variance in price is explained by lotsize, while only 13% of the variance is explained by bedrooms.

Using the HousePrices data, regress price on lotsize, but include an additional squared term lotsize^2, and plot price against lotsize.

plot(price~lotsize, data=HousePrices, xlab="Lot Size", pch=16)
reg3=lm(price~lotsize, data=HousePrices)
summary(reg3)$adj.r.squared # 0.287077, same model as reg1 above
abline(reg3)
reg3b=lm(price~lotsize+I(lotsize^2), data=HousePrices)
summary(reg3b)$adj.r.squared # 0.3205318 is higher
xval=pretty(HousePrices$lotsize, 50)
hat2=predict(reg3b, newdata=list(lotsize=xval))
lines(xval, hat2, col="red", lty=2, lwd=2)

The model having the quadratic term obviously fits the data better than the model without it. However, this is always the case in terms of the sum of squared residuals. Hence, the correct comparison is between adjusted R2 values.
In this example, adding the quadratic term increases the adjusted R2 from about 0.29 to 0.32.

1.6 Exercise (Tabulation, Box-plot)

CJF-3) Using the Parade2005 data from the AER package [4], find the mean log earnings conditional on gender and create a box plot.

library(AER)
data(Parade2005)
attach(Parade2005)
tapply(log(earnings), gender, mean)
#  female     male
#11.11900 11.30194
plot(log(earnings)~gender)
#a simple plot gives a box plot here since gender is
#a categorical (factor) variable
boxplot(log(earnings)~gender) #equivalent explicit call

Figure 3: Box Plot: Earnings and Gender.

1.7 Exercise (Basic stats)

CJF-4) Summarize the BondYield data in the AER library. What is the kurtosis? What is the skewness? Does the skewness value make sense given the values of the mean and the median?

library(AER)
data(BondYield)
library(fBasics)
basicStats(BondYield)

R produces the following output:

             BondYield
nobs         60.000000
NAs           0.000000
Minimum       6.660000
Maximum       9.720000
1. Quartile   7.902500
3. Quartile   8.945000
Mean          8.290833
Median        8.300000
Sum         497.450000
SE Mean       0.104351
LCL Mean      8.082027
UCL Mean      8.499640
Variance      0.653354
Stdev         0.808303
Skewness     -0.234511
Kurtosis     -0.823370

Since the mean is very close to the median, it may seem that skewness should be very close to 0. However, the official measure is not that close to zero. The point is that the simple comparison of mean and median to indicate skewness is merely a crude approximation.

1.8 Exercise (Model comparison)

CJF-5) Using the Guns data in the AER library, estimate the following 2 models:

Violent = b0 + b1(prisoners) + b2(income) + b3(density) + ε
Violent = b0 + b1(prisoners) + b2(income) + b3(density) + b4(law) + ε

Compare the 2 models. Which model is better? How can you tell?

rm(list=ls())
library(AER)
data(Guns)
reg1=lm(violent~prisoners+income+density,data=Guns)
reg2=lm(violent~prisoners+income+density+law,data=Guns)
summary(reg1);summary(reg2)

R produces the following output:

Call:
lm(formula = violent ~ prisoners + income + density, data = Guns)

Residuals:
     Min       1Q   Median       3Q      Max
-1173.95  -145.04   -35.04   119.28   671.42

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.605e+02  3.539e+01   4.536 6.33e-06 ***
prisoners   8.525e-01  4.420e-02  19.288  < 2e-16 ***
income      8.435e-03  2.732e-03   3.088  0.00206 **
density     9.554e+01  5.510e+00  17.341  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 210.5 on 1169 degrees of freedom
Multiple R-squared: 0.6044, Adjusted R-squared: 0.6034
F-statistic: 595.3 on 3 and 1169 DF, p-value: < 2.2e-16

> summary(reg2)

Call:
lm(formula = violent ~ prisoners + income + density + law, data = Guns)

Residuals:
     Min       1Q   Median       3Q      Max
-1225.51  -137.98   -39.65   118.48   777.04

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.882e+02  3.393e+01   5.546 3.61e-08 ***
prisoners    9.069e-01  4.257e-02  21.303  < 2e-16 ***
income       8.350e-03  2.612e-03   3.197  0.00142 **
density      8.637e+01  5.339e+00  16.176  < 2e-16 ***
lawyes      -1.465e+02  1.390e+01 -10.535  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 201.3 on 1168 degrees of freedom
Multiple R-squared: 0.6387, Adjusted R-squared: 0.6375
F-statistic: 516.3 on 4 and 1168 DF, p-value: < 2.2e-16

anova(reg1, reg2)

Analysis of Variance Table
Model 1: violent ~ prisoners + income + density
Model 2: violent ~ prisoners + income + density + law
  Res.Df      RSS Df Sum of Sq      F    Pr(>F)
1   1169 51807994
2   1168 47312125  1   4495869 110.99 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The analysis of variance shows that the second model is better. It includes the factor of whether the state has the 'law' in effect for each year. The anova summary indicates that law is significant at any reasonable level.

Using the Guns data, run a regression of murder on income, plot the data, and include a regression line in the plot.

plot(murder~income, data=Guns)
reg1=lm(murder~income, data=Guns)
abline(reg1)

Use the regression above, but this time include a quadratic term. Does the quadratic term help the explanation? Which model is better?

reg2=lm(murder~income+I(income^2), data=Guns)
anova(reg1,reg2)

R produces the following output:

Analysis of Variance Table
Model 1: murder ~ income
Model 2: murder ~ income + I(income^2)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1   1171 63099
2   1170 54868  1      8231 175.52 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Including the squared term gives a better model, as indicated by the statistical significance of the F-test.

1.9 Exercise (Regression data plots)

CJF-6) Using the USMacroB data from the AER package, plot gnp and mbase (include a legend) and regress gnp on mbase and lagged gnp. What is the residual sum of squares (RSS)? Run a Durbin-Watson test to test for autocorrelation. Comment on the results.

rm(list=ls())
library(AER)
data(USMacroB)
#cannot access by name
#attach(data.frame(USMacroB))
plot(USMacroB[,c("gnp", "mbase")], lty=c(3,1), plot.type="single",
 ylab="", lwd=1.5)
legend("topleft", legend = c("gnp", "money base"), lty = c(3,1), bty="n")
library(dynlm)
reg1=dynlm(gnp~mbase + L(gnp), data=USMacroB)
summary(reg1)
deviance(reg1)
dwtest(reg1)

Figure 4: Data Time Series Plot with legend.

R produces the following output:

Time series regression with "ts" data:
Start = 1959(2), End = 1995(2)

Call:
dynlm(formula = gnp ~ mbase + L(gnp), data = USMacroB)

Residuals:
      Min        1Q    Median        3Q       Max
-120.2126  -15.9883    0.7727   16.8630   90.0175

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 43.951508  17.394776   2.527   0.0126 *
mbase        0.051049   0.025615   1.993   0.0482 *
L(gnp)       0.988237   0.007722 127.984   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.28 on 142 degrees of freedom
Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
F-statistic: 7.894e+04 on 2 and 142 DF, p-value: < 2.2e-16

deviance(reg1)
[1] 130195.2

dwtest(reg1)
Durbin-Watson test
data: reg1
DW = 1.3521, p-value = 2.067e-05
alternative hypothesis: true autocorrelation is greater than 0

There is highly significant positive autocorrelation among the regression errors.

Exercises suggested by Adam R. Bragar (ARB)

1.10 Exercise (Basic stats)

ARB-1) Load the CigarettesB dataset from the AER package. Find basic stats of all variables. Are any variables negatively skewed? If so, which variable(s)? What does a negatively skewed variable suggest?
library(AER)
data(CigarettesB); attach(CigarettesB) #both needed
library(fBasics)
basicStats(CigarettesB)

R produces the following output:

                 packs      price     income
nobs         46.000000  46.000000  46.000000
NAs           0.000000   0.000000   0.000000
Minimum       4.408590  -0.032600   4.529380
Maximum       5.379060   0.363990   5.102680
1. Quartile   4.711547   0.140543   4.679075
3. Quartile   4.984450   0.273485   4.852850
Mean          4.847844   0.205509   4.775455
Median        4.814950   0.200205   4.758505
Sum         223.000810   9.453400 219.670930
SE Mean       0.028229   0.012714   0.020975
LCL Mean      4.790988   0.179901   4.733209
UCL Mean      4.904700   0.231116   4.817701
Variance      0.036656   0.007436   0.020238
Stdev         0.191458   0.086230   0.142261
Skewness      0.185864  -0.126862   0.474103
Kurtosis     -0.124126  -0.301650  -0.378464

The price variable has negative skewness. This suggests that there are a few relatively small values of price in the left tail of its probability distribution.

1.11 Exercise (Basic regression)

ARB-2) Using the CigarettesB dataset from the AER package, regress packs on price and income. Interpret the coefficients. Are any of the variables statistically significant? How can you tell? What was the null hypothesis of the test you used for evaluation?

library(AER)
data(CigarettesB); attach(CigarettesB)
reg1= lm(packs~price+income); summary(reg1)

R produces the following output:

Call:
lm(formula = packs ~ price + income)

Residuals:
      Min        1Q    Median        3Q       Max
-0.418675 -0.106828  0.007568  0.117384  0.328677

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.2997     0.9089   4.730 2.43e-05 ***
price        -1.3383     0.3246  -4.123 0.000168 ***
income        0.1724     0.1968   0.876 0.385818
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1634 on 43 degrees of freedom
Multiple R-squared: 0.3037, Adjusted R-squared: 0.2713
F-statistic: 9.378 on 2 and 43 DF, p-value: 0.0004168

The price coefficient suggests that a 1 unit increase in price will cause a 1.3383 unit decrease in the demand for packs. A 1 unit increase in income causes a 0.1724 unit increase in the demand for cigarette packs. The price variable is statistically significant. Its t-value of -4.123 allows us to reject the null hypothesis, as evidenced by a p-value of 0.000168. H0: There is no linear relationship between price and packs.

1.12 Exercise (Returns from asset prices)

ARB-3) Using the BondYield dataset found in the AER package, create a variable r that denotes returns. Provide a summary of r and the standard deviation of r. What does the standard deviation tell us about the volatility of the bond yield?

library(AER)
data(BondYield)
r=diff(log(BondYield))
r; summary(r)
sd(r)

Now the output is:

              Jan          Feb          Mar          Apr          May
1990               0.078072770 -0.036672522  0.009559287  0.001056524
1991 -0.001105583 -0.023504160  0.011261380 -0.007869630  0.000000000
1992 -0.013325455  0.010915815  0.007211570 -0.002398083 -0.006020488
1993 -0.008810630 -0.025609594 -0.017004988 -0.015957785 -0.004029555
1994 -0.001444044  0.022858138  0.054958884  0.052095112  0.013862856
              Jun          Jul          Aug          Sep          Oct
1990 -0.022424859 -0.002162163  0.018231068  0.015814773 -0.003143009
1991  0.016788307 -0.001110494 -0.028170877 -0.016129382 -0.006993035
1992 -0.007272759 -0.018416727 -0.014981554 -0.003780723  0.008799554
1993 -0.013550343 -0.022069861 -0.045657002 -0.028129168  0.001500375
1994 -0.002506267  0.017413375 -0.004944386  0.032909734  0.027204516
              Nov          Dec
1990 -0.024430318 -0.027249642
1991 -0.008220833 -0.020250841
1992  0.013673302 -0.014925650
1993  0.038239953  0.000000000
1994  0.012753796 -0.025672355

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.045660 -0.016040 -0.003143 -0.001030  0.011090  0.078070

sd(r)
[1] 0.02265122
A small standard deviation tells us that most values are near the mean. The mean is near 0. Therefore, a small standard deviation would tell us that most returns are near 0. A larger standard deviation would suggest that the values are widely dispersed, and while the mean return is 0, the returns are volatile.

1.13 Exercise (Regression)

ARB-4) Using the CigarettesB dataset, run the regression of packs on price and income. Compute the heteroscedasticity consistent covariance matrix. Why is the presence of heteroscedasticity dangerous to the results of a regression? Compute the studentized Breusch-Pagan test, the non-constant variance score test, and the Goldfeld-Quandt F test for heteroscedasticity.

library(AER)
data(CigarettesB); attach(CigarettesB)
reg1= lm(packs~price+income); su1=summary(reg1)
su1$coef[,2] #usual OLS standard errors from second column after $coef
vcv=hccm(reg1); vcv #gives the variance covariance matrix
se2=sqrt(diag(vcv)); se2 #new standard errors

R produces the following output (check the reg1 output above):

#usual OLS standard errors
(Intercept)       price      income
  0.9089257   0.3246015   0.1967544

#heteroscedasticity consistent covariance matrix
            (Intercept)       price      income
(Intercept)   1.3486844  0.26842214 -0.29138366
price         0.2684221  0.14398409 -0.06218249
income       -0.2913837 -0.06218249  0.06316924

#hetero consistent standard errors are larger
(Intercept)       price      income
  1.1613287   0.3794524   0.2513349

se2/su1$coef[,2] #ratio of standard errors
(Intercept)       price      income
   1.277694    1.168979    1.277404

When heteroscedasticity is present, the OLS estimators remain unbiased but are inefficient. The estimates of the variance of the beta coefficients will not be correct. It appears that in this example the standard errors are underestimated. Heteroscedasticity consistent standard errors are all a bit larger, by 28%, 17% and 28%, respectively, for the three coefficients.

1.14 Exercise (Regression for counts data)

ARB-5) Load the CreditCard data set from the AER package. Do a Poisson regression of reports on age, income and expenditure. Which of the variables are significant?

data(CreditCard)
attach(CreditCard)
cc_pois= glm(reports ~ age + income + expenditure, family = poisson)
summary(cc_pois)

R produces the following output:

Call:
glm(formula = reports ~ age + income + expenditure, family = poisson)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7427  -1.0689  -0.8390  -0.3897   7.4991

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.819682   0.145272  -5.642 1.68e-08 ***
age          0.007181   0.003978   1.805  0.07105 .
income       0.077898   0.023940   3.254  0.00114 **
expenditure -0.004102   0.000374 -10.968  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for the Poisson family is taken to be 1)

Income and expenditure are significant.

1.15 Exercise (Regression 2)

ARB-6) Using the CreditCard data from the AER package, regress reports on active, age, card and owner. Print a summary. Which variables are significant at the 5 percent level? Are they all significant at the 1 percent level? Find 95 percent confidence intervals for each coefficient.

reg1=lm(reports~active+age+card+owner); summary(reg1)

R produces the following output:

Call:
lm(formula = reports ~ active + age + card + owner)

Residuals:
     Min       1Q   Median       3Q      Max
-3.10658 -0.41350 -0.03806  0.21015 11.99671
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.213853   0.126452   9.599   <2e-16 ***
active       0.055222   0.005264  10.491   <2e-16 ***
age          0.002778   0.003386   0.821   0.4121
cardyes     -1.495554   0.077137 -19.388   <2e-16 ***
owneryes    -0.172956   0.071351  -2.424   0.0155 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.153 on 1314 degrees of freedom
Multiple R-squared: 0.2681, Adjusted R-squared: 0.2659
F-statistic: 120.3 on 4 and 1314 DF, p-value: < 2.2e-16

Active, card and owner are significant at the 5 percent level. Owner is not significant at the 1 percent level, as shown by its 0.0155 p-value.

confint(reg1)

In response to the above one-line code, R produces the following output:

                   2.5 %       97.5 %
(Intercept)  0.965783363  1.461921923
active       0.044896152  0.065548539
age         -0.003864662  0.009421637
cardyes     -1.646878701 -1.344228814
owneryes    -0.312929541 -0.032982226

1.16 Exercise (Logit)

ARB-7) Using the CreditCard data from the AER package, fit a logit model regressing card on age, income and selfemp to determine the influence of these factors on the probability that a person has a credit card or not. Which variables are significant?

glm(formula = card ~ age + income + selfemp, family = binomial("logit"))

R produces the following output:

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.3137  0.5100  0.6902  0.7419  1.0553

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.923442   0.240541   3.839 0.000124 ***
age         -0.007033   0.006825  -1.031 0.302764
income       0.183021   0.048904   3.742 0.000182 ***
selfempyes  -0.567567   0.242977  -2.336 0.019497 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1404.6 on 1318 degrees of freedom
Residual deviance: 1385.0 on 1315 degrees of freedom
AIC: 1393.0

Number of Fisher Scoring iterations: 4

Income and selfemp are significant at the 5 percent level; selfemp is not significant at the 1 percent level.

1.17 Exercise (Diagonal matrix)

ARB-8) Use two methods to define an identity matrix of dimension 5.

ident5=diag(5); ident5
IM5= matrix(0, nr=5, nc= 5)
IM5[row(IM5) == col(IM5)] = 1; IM5

1.18 Exercise ('sort.list' or 'which.max' functions)

ARB-9) Download the DM data set from the Ecdat package [2]. Create a variable that is the absolute value of the forward premium and call it FP. Find the date where FP is the greatest. In the following code, 'sort.list' sorts the matrix named 'MtxFP' with reference to its fifth column, which contains the FP data.

FP= abs(((f-s)/s)*100)
MtxFP=cbind(date,s,f,s30,FP)
MtxFP[sort.list(MtxFP[,5],decreasing=T),]

Instead of 'sort.list' one can also use the following code:

rm(list=ls()) #clean up memory of R
library(Ecdat); data(DM); attach(DM)
FP= abs(((f-s)/s)*100) #f=forward ask, s=spot rate
MtxFP=cbind(date*10000,s,f,s30,FP) #s30=bid price
which.max(FP)
n=which.max(FP) #location of the max of FP
round( MtxFP[(n-4):(n+4),],3) #nearby data
#output is suppressed for brevity.

1.19 Exercise (Stock market beta)

ARB-10) Write a function to estimate the beta of Nucor (NYSE: NUE) using weekly returns and data loaded directly from Yahoo Finance. Note, you will need the 'tseries' package [5] for this exercise.

ANSWER: We will first write a general function for getting historical price data for any stock and for any stock index. We choose the S&P 500 index to define the overall market return. The so-called 'beta' compares the risk of an individual stock with that of the market as a whole.
We compute the first difference of the log of the adjusted closing price of NUE to define the return from it, and similarly for the S&P 500 index. The slope coefficient in a regression of the NUE return on the S&P 500 return may be called the 'beta' for NUE stock. It is estimated to be 1.423869, implying a higher than average risk associated with investing in NUE. If we had chosen IBM as the stock, the beta becomes less than unity at 0.9565073, suggesting less risk.

library(tseries)
stockquote= function(x) {
 c(get.hist.quote(x, quote= "Adj", start= "2003-01-01", compress= "w"))}
NUE=stockquote("NUE")
sp500= stockquote("^gspc")
dlnue=diff(log(NUE))
dlsp=diff(log(sp500))
reg1=lm(dlnue~dlsp)
summary(reg1)

R produces the following output:

Call:
lm(formula = NUE ~ ., data = r)

Residuals:
      Min        1Q    Median        3Q       Max
-0.167618 -0.026432  0.001302  0.026874  0.171358

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.004181   0.002628   1.591    0.113
sp500       1.366223   0.113219  12.067   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04582 on 302 degrees of freedom
  (157 observations deleted due to missingness)
Multiple R-squared: 0.3253, Adjusted R-squared: 0.3231
F-statistic: 145.6 on 1 and 302 DF, p-value: < 2.2e-16

Figure 5: NUE stock price against S&P 500 stock index, CUSUM test for structural change (OLS and recursive residuals).

1.20 Exercise (Structural change cusum test)

Use the data from the previous exercise and compute structural change statistics for its return behavior by using the 'strucchange' package [1] and the function 'efp' for the empirical fluctuation process. Plot the OLS cumulative sum (CUSUM) and recursive CUSUM tests. When the CUSUM curve goes outside the confidence band, we reject the null hypothesis of no structural change at that point in time. In the attached figure, the recursive test bands are shown to be expanding in width as the number of observations increases.

#previous exercise code must be in memory
library(strucchange)
par(mfrow=c(2,1))
rs=efp(NUE~sp500, type="OLS-CUSUM")
plot(rs)
rs=efp(NUE~sp500, type="Rec-CUSUM")
plot(rs)
rs=efp(NUE~sp500, type="OLS-MOSUM")
plot(rs)
rs=efp(NUE~sp500, type="Rec-MOSUM")
plot(rs)

Figure 6: NUE stock price against S&P 500 stock index, moving sum test for structural change (OLS and recursive residuals).

Note that the moving sum test behaves differently from the cumulative sum test in the earlier figure.
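The plots can be supplemented with a formal test. A brief hedged add-on (not part of the original exercise), assuming the NUE and sp500 series from the previous exercise are still in memory: applying 'sctest' to an 'efp' object reports the p-value of the corresponding fluctuation test.

#hedged add-on sketch: formal significance test for the OLS-based CUSUM process
library(strucchange)
rs.ols <- efp(NUE ~ sp500, type = "OLS-CUSUM")
sctest(rs.ols) #a small p-value indicates a structural change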
1.21 Exercise (Time series data manipulation and plots)

ARB-11) Suppose we have a dataset of monthly returns from 1991 to 2005 that is in the raw form shown below:

rawdata= -0.21,-2.28,-2.71,2.26,-1.11,1.71,2.63,-0.45,-0.11,
4.79,5.07,-2.24,6.46,3.82,4.29,-1.47,2.69,7.95,4.46,7.28,
3.43,-3.19,-3.14,-1.25,-0.50,2.25,2.77,6.72,9.17,3.73,
6.72,6.04,10.62,9.89,8.23,5.37,-0.10,1.40,1.60,3.40,3.80,
3.60,4.90,9.60,18.20,20.60,15.20,27.00,15.42,
13.31,11.22,12.77,12.43,15.83,11.44,12.32,12.10,
12.02,14.41,13.54,11.36,12.97,10.00,7.20,8.74,3.92,
8.73,2.19,3.85,1.48,2.28,2.98,4.21,3.85,6.52,8.16,
5.36,8.58,7.00,10.57,7.12,7.95,7.05,3.84,4.93,4.30,5.44,3.77,
4.71,3.18,0.00,5.25,4.27,5.14,3.53,4.54,4.70,7.40,4.80,6.20,
7.29,7.30,8.38,3.83,8.07,4.88,8.17,8.25,6.46,5.96,5.88,5.03,
4.99,5.87,6.78,7.43,3.61,4.29,2.97,2.35,2.49,1.56,2.65,2.49,
2.85,1.89,3.05,2.27,2.91,3.94,2.34,3.14,4.11,4.12,4.53,7.11,
6.17,6.25,7.03,4.13,6.15,6.73,6.99,5.86,4.19,6.38,6.68,6.58,
5.75,7.51,6.22,8.22,7.45,8.00,8.29,8.05,8.91,6.83,7.33,8.52,
8.62,9.80,10.63,7.70,8.91,7.50,5.88,9.82,8.44,10.92,11.67

Convert this data into a time series of returns called r. After analyzing this data, suppose we wanted to analyze only the data from Nov. 1994 onward. Truncate r so that it fits this specified time range. Do an ARMA estimation of r. Use the tsdiag function to print diagnostic pictures of the estimation. Print a graph of the residuals. Predict the monthly returns over the next 6 months. Plot a graph that shows the predicted values of returns in red and the actual returns in blue.

rawdata= c(-0.21,-2.28,-2.71,2.26,-1.11,1.71,2.63,-0.45,-0.11,
4.79,5.07,-2.24,6.46,3.82,4.29,-1.47,2.69,7.95,4.46,7.28,
3.43,-3.19,-3.14,-1.25,-0.50,2.25,2.77,6.72,9.17,3.73,
6.72,6.04,10.62,9.89,8.23,5.37,-0.10,1.40,1.60,3.40,3.80,
3.60,4.90,9.60,18.20,20.60,15.20,27.00,15.42,
13.31,11.22,12.77,12.43,15.83,11.44,12.32,12.10,
12.02,14.41,13.54,11.36,12.97,10.00,7.20,8.74,3.92,
8.73,2.19,3.85,1.48,2.28,2.98,4.21,3.85,6.52,8.16,
5.36,8.58,7.00,10.57,7.12,7.95,7.05,3.84,4.93,4.30,5.44,3.77,
4.71,3.18,0.00,5.25,4.27,5.14,3.53,4.54,4.70,7.40,4.80,6.20,
7.29,7.30,8.38,3.83,8.07,4.88,8.17,8.25,6.46,5.96,5.88,5.03,
4.99,5.87,6.78,7.43,3.61,4.29,2.97,2.35,2.49,1.56,2.65,2.49,
2.85,1.89,3.05,2.27,2.91,3.94,2.34,3.14,4.11,4.12,4.53,7.11,
6.17,6.25,7.03,4.13,6.15,6.73,6.99,5.86,4.19,6.38,6.68,6.58,
5.75,7.51,6.22,8.22,7.45,8.00,8.29,8.05,8.91,6.83,7.33,8.52,
8.62,9.80,10.63,7.70,8.91,7.50,5.88,9.82,8.44,10.92,11.67)
summary(rawdata)
r <- ts(rawdata, frequency=12, start=c(1991, 4))
r <- window(r, start=c(1994, 11))
ar2 <- arima(r, order = c(2, 0, 0)); print(ar2)
tsdiag(ar2)
plot.ts(ar2$residual, ylab="residuals", col="blue", lwd=2)
p <- predict(ar2, n.ahead = 6); print(p)
ts.plot(r, p$pred, p$pred-1.96*p$se, p$pred+1.96*p$se,
 gpars=list(lty=c(1, 1, 2, 2), lwd=c(2, 2, 1, 1),
 ylab="r growth (%)", col=c("blue", "red", "red", "red")))

Figure 7: Time Series of Asset Returns with Forecast Intervals.

Exercises suggested by Jennifer W. Murray (JWM). The Answer / Hint follows the questions below.

1.22 Exercise (Matrix Algebra: collinearity, eigenvalues, rank)

JWM-1) Demonstrate exact multicollinearity and the eigenvalue decomposition. To begin, create a vector called x1 of the values 1 to 5. Then create a vector called x2 which is 2 times the vector x1. Create a vector of ones, then bind the three vectors into a matrix called Z.

JWM-2) Next multiply the transpose of Z by Z itself and call it ZTZ.
Compute the determinant of ZTZ, and then try to invert the ZTZ matrix.

JWM-3) Set a seed of 35 and create a random vector out of the numbers between 3 and 22 by using the sample function. Select the first 5 numbers from this list and call it y. Regress y on x1 and x2. What is the coefficient on x2?

JWM-4) Load the package 'fEcofin' [10]. Solve for the eigenvectors and eigenvalues of ZTZ. Separately extract the eigenvectors and eigenvalues. Call the eigenvector matrix G, and call the diagonal matrix of eigenvalues Λ. Using these, create the eigenvector/eigenvalue decomposition.

JWM-5) What is the rank of the ZTZ matrix? Is it possible to invert (solve) this matrix?

#R Code for ALL Answers (output suppressed for brevity)
x1=(1:5)
x2=2*x1
ones=rep(1,5)
Z=cbind(ones,x1,x2) #we are creating a singular matrix Z
ZTZ=t(Z)%*%Z #this is the Z'Z matrix
ZTZ
det(ZTZ) #this is zero!
solve(ZTZ) #impossible because it is singular
set.seed(34); y= sample(3:22)[1:5]
#this is a random vector that we will regress on x1 and x2
reg1=lm(y~x1+x2)
reg1
#The x2 coefficient is not available.
#This is because x2 is a simple transformation of x1.
library(fEcofin)
evd=eigen(ZTZ)
evd
G=evd$vec #extract the matrix of eigenvectors G
Lamda=evd$val #this extracts only the eigenvalues
Lamda
diag(Lamda) #this creates a matrix with the eigenvalues on the diagonal
ev.decompose=G%*%diag(Lamda)%*%t(G)
#this formula should give us the eigenvalue decomposition!
ev.decompose #this should equal ZTZ
ZTZ #this is why it is called a decomposition
rk(ZTZ) #this function rk for rank is from the fEcofin package
#notice that the rank of the matrix is 2, not 3.
#This is due to the fact that the matrix is singular!
inv(ZTZ)
#because it is singular, it is impossible to invert or solve this matrix.

1.23 Additional Exercises by H.D. Vinod

HDV1> Fit a trade-off relation between job market vacancies and the unemployment rate (Beveridge Curve). Compute the elasticity of substitution using a non-homogeneous production function type method. Compare President George W. Bush versus Obama era marginal elasticities and elasticities of substitution. Plot and compare the isoquant level curves for the two eras. Hint: use the EOS discussion in the textbook, Section 1.8. A complete solution with references is at: http://ssrn.com/abstract=2149316

HDV2> What are projection matrices? Give examples of at least 3 such matrices. [Hint: I, H, M] Analytically prove the properties for each of the 3 matrices. [Hint: symmetric idempotent] Construct a numerical example of each and show that the properties hold true numerically.

HDV3> Define VIF, collinearity, and ridge regression. Provide a theoretical discussion along with R examples.

HDV4) Derive the formula for the variance of the ridge estimator. Describe why shrinkage and declining deltas are desirable. Compare ridge regression with principal components regression (PCR) as shrinkage estimators.

HDV5) Use R seed 10, create a long array of 100 uniform random numbers in the range 10 to 60, and make 4 columns (y1, x1, x2, x3) from these 100 numbers, using the first 1:25 for y1, the next 26:50 for x1, and so forth. Replace the second column x1 by the R command "x2+10*x3+rnorm(25)". This is designed to make x1 almost linearly dependent on x2 and x3, injecting "near" collinearity.

(1) compute the OLS regression coefficient vector b when we regress y1 on x1, x2 and x3, including an intercept.
(2) compute the G matrix and c, the uncorrelated components vector for this regression.
(3) Define cpc2 as the keep-2 components of c, zeroing out the last two. Similarly define cpc3 as the keep-3 components of c, zeroing out the last one.
(4) Compute bpc2 as the keep-2 principal components regression (PCR) and bpc3 as the keep-3 PCR, and make a table comparing b with these two PCRs side by side.

ANS: We provide the R code and some output below.

set.seed(10)
xx=runif(100, min=10, max=60)
yy=matrix(xx, nrow=25)
y1=yy[,1]; x1=yy[,2]; x2=yy[,3]; x3=yy[,4]
x1=x1+10*x2+rnorm(25)
reg=lm(y1~x1+x2+x3)
summary(reg)
ane=rep(1,25); X=cbind(ane,x1,x2,x3)
X #upper case X
xtx=t(X)%*%X
ei=eigen(xtx)
G=ei$ve
G #matrix of eigenvectors
Gt=t(G)
ei$va
b=coef(reg);b
c=Gt%*%b
c #vector of uncorrelated components of b
G%*%c #this should be b
cpc3=c(c[1:3],0);cpc3
bpc3=G%*%cpc3; bpc3
cpc2=c(c[1:2],0,0);cpc2
bpc2=G%*%cpc2; bpc2
cb=cbind(b, bpc2,bpc3)
colnames(cb)[2]="bpc2"
colnames(cb)[3]="bpc3"
cb

Selected output from R is given below.

G #matrix of eigenvectors
             [,1]        [,2]        [,3]        [,4]
[1,] -0.002595253  0.01223696 -0.08445826  0.99634849
[2,] -0.992347544 -0.08130949 -0.09244637 -0.00942268
[3,] -0.090503971 -0.03282745  0.99178386  0.08423877
[4,] -0.083956231  0.99607298  0.02617727 -0.01023327

c #vector of uncorrelated components of b
            [,1]
[1,] -0.08083593
[2,]  0.50432166
[3,] -3.84193890
[4,] 38.84746189

G%*%c #this should be b
            [,1]
[1,] 39.03647482
[2,]  0.02833732
[3,] -0.54715023
[4,]  0.01101989

cpc3=c(c[1:3],0);cpc3
[1] -0.08083593  0.50432166 -3.84193890  0.00000000

cb
                      b         bpc2       bpc3
(Intercept) 39.03647482  0.006381151  0.3308646
x1           0.02833732  0.039211195  0.3943845
x2          -0.54715023 -0.009239619 -3.8196126
x3           0.01101989  0.509127864  0.4085564

1.24 Further Exercises by H.D. Vinod

HDV-f1> Under what assumptions is there a relation between the coefficient of determination R² and the Pearson bivariate correlation coefficient r1,2? Describe a formula stating the relation.

HDV-f2> Under what assumptions is there a relation between the coefficient of determination R² and the regression coefficient when x1 is regressed on x2? Describe a formula stating the relation.

HDV-f3> How can one obtain the regression coefficients of the original model regressing x1 on x2 from the coefficients of the model where the variables are standardized (e.g., x1s = (x1 − mean(x1))/sd(x1))? True or False: The regression coefficient always equals the correlation coefficient in a standardized model?

HDV-f4> Under what assumptions is there a relation between the partial correlation coefficient r1,2|3 and the regression coefficient when x1 is regressed on x2 and x3? Describe a formula stating the relation.

The proposed answers to all questions in this set use R code with a common example.

ANS-f1) In general, there is no relation between the R² and the Pearson bivariate correlation coefficient r1,2, except in the (univariate) case when there is only one regressor. Consider a multivariate regression:

x1 = β1,0 + β1,2 x2 + β1,3 x3 + ε.   (2)

It can be verified, with the help of a simple numerical example, that the R² of this model is not the square of any bivariate correlation coefficient.

rm(list=ls()) #clean up R memory
options(prompt = " ", continue = " ", width = 68, useFancyQuotes = FALSE)
set.seed(89);da=sample(1:100);x1=da[1:10]
x2=da[11:20];x3=da[21:30]
reg=lm(x1~x2+x3)
Rsq=summary(reg)$r.squared;Rsq
root.rsq=sqrt(Rsq);root.rsq
c1=cor(cbind(x1,x2,x3));c1

We note that the value of √R², extracted from the output of the summary function with the dollar command, is different from all off-diagonal correlation coefficients.
Rsq=summary(reg)$r.squared;Rsq
[1] 0.1026093
root.rsq=sqrt(Rsq);root.rsq
[1] 0.3203269
c1=cor(cbind(x1,x2,x3));c1
           x1           x2           x3
x1  1.0000000 -0.261317096 -0.183168895
x2 -0.2613171  1.000000000 -0.007994397
x3 -0.1831689 -0.007994397  1.000000000

If we consider a univariate version of eq. (2), assuming that β1,3 ≡ 0, we have a special case.

reg2=lm(x1~x2)
Rsq2=summary(reg2)$r.squared;Rsq2
root.rsq2=sqrt(Rsq2);root.rsq2
c2=cor(cbind(x1,x2));c2

Note that the univariate-case output below has r12 = −0.2613171 = −√R², where the minus sign is that of the ordinary correlation coefficient.

Rsq2=summary(reg2)$r.squared;Rsq2
[1] 0.06828662
root.rsq2=sqrt(Rsq2);root.rsq2
[1] 0.2613171
c2=cor(cbind(x1,x2));c2
           x1         x2
x1  1.0000000 -0.2613171
x2 -0.2613171  1.0000000

For the univariate special case, we have just verified that r12 equals the signed square root of R²:

r12 = sign(r12) √R²,   (3)

where the sign(w) function is (+1, 0, −1) if w is (> 0, = 0, < 0), respectively. Eq. (3) represents our formula stating the relation between the signed square root of R² and r12. Now we turn to answering the second question.

ANS-f2) Consider the bivariate regression

x1 = β1,0 + β1,2 x2.   (4)

Now standardize the data for both variables. That is, replace x1 by (x1 − x̄1)/s1, where we subtract the mean and divide by the standard deviation, and do the same for x2. In the following code we define a general function stdze(x) and apply it to x1 and x2. The standardized versions of x1 and x2 are denoted x1s and x2s. Since we are measuring the variables from their respective means, the intercept should not be computed. This is accomplished in R by the command 'reg3 = lm(x1s ~ x2s - 1)' below, where the option '-1' forces the line of regression through the origin.

stdze=function(x) (x-mean(x))/sd(x)
x1s=stdze(x1);x2s=stdze(x2)
reg3=lm(x1s~x2s -1) #force through the origin
coef(reg3)
cor(x1,x2)
sign(cor(x1,x2))*sqrt(summary(reg3)$r.squared)

We find that the bivariate regression coefficient given by the code 'coef(reg3)' is the simple correlation coefficient and also the signed square root of the R².

coef(reg3)
       x2s
-0.2613171
cor(x1,x2)
[1] -0.2613171
sign(cor(x1,x2))*sqrt(summary(reg3)$r.squared)
[1] -0.2613171

Now we turn to the third question, HDV-f3, which asks, in the notation of the example: how can one obtain the regression coefficient b12 of the original model regressing x1 on x2 from the coefficient r12 of the standardized model regressing x1s on x2s? The answer is given by the formula for mapping from the correlation coefficient to the regression coefficient:

b12 = r12 * sd(x1)/sd(x2),   (5)

numerically verified in the following code.

b12=coef(lm(x1~x2))[2];b12
r12=cor(x1,x2);r12
r12*sd(x1)/sd(x2)

b12=coef(lm(x1~x2))[2];b12
        x2
-0.3255658
r12=cor(x1,x2);r12
[1] -0.2613171
r12*sd(x1)/sd(x2)
[1] -0.3255658

Note that both b12 and the correlation times the ratio of standard deviations equal -0.3255658. Now we answer: True or False: The regression coefficient always equals the correlation coefficient in a standardized model? This is true when there is a single regressor. However, it is not true when there are two or more regressors.

ANS-f4) Now we turn to answering the fourth question, involving the partial correlation coefficients. Let us consider a multivariate regression of x1 on x2 and x3, where all are standardized.

x3s=stdze(x3)
reg4=lm(x1s~x2s+x3s)
coef(reg4)
c1=cor(cbind(x1,x2,x3));r12=c1[1,2]
r13=c1[1,3];r23=c1[2,3]
r12.3=(r12-r23*r13)/sqrt((1-r23^2)*(1-r13^2))
r12.3
x1.3s=resid(lm(x1s~x3s-1))
x2.3s=resid(lm(x2s~x3s-1))
r12.3*sd(x1.3s)/sd(x2.3s)
coef(reg4)[2]

In the above code the correlation matrix is denoted 'c1' with elements 'rij'. We use the standard formula for the partial correlation coefficient between x1 and x2 after removing the effect of x3, denoted 'r12.3'. Let x1.3s denote the residual after removing the effect of x3s on x1s, and let x2.3s denote the residual after removing the effect of x3s on x2s. Recall that the way to obtain regression coefficients from (unit-free) correlation coefficients is to multiply them by the ratio of standard deviations, sd(x1.3s)/sd(x2.3s), analogous to the formula given in eq. (5) for mapping from the correlation coefficient to the regression coefficient. However, the need to include the ratio of standard deviations shows that the answer to "True or False: The regression coefficient always equals the correlation coefficient in a standardized model?" must be that it is NOT always true, though it is true when there is only one regressor. The output is

r12.3=(r12-r23*r13)/sqrt((1-r23^2)*(1-r13^2))
r12.3
[1] -0.2673124
x1.3s=resid(lm(x1s~x3s-1))
x2.3s=resid(lm(x2s~x3s-1))
r12.3*sd(x1.3s)/sd(x2.3s)
[1] -0.2627982
coef(reg4)[2]
       x2s
-0.2627982

An R package, 'generalCorr' [6], has a function to compute partial correlation coefficients if the correlation matrix has i, j = 1, ..., p, p ≥ 2 variables. If 'c1' holds the correlation matrix, the following commands yield the partial correlation coefficient between x1 and x2 after removing the effect of x3, x4, ..., xp. Although there is no limit on p, we illustrate the computations for the example with p = 3 used above, for brevity.

library(generalCorr)
parcor_ijk(c1,1,2)
parcor_ijk(c1,1,3)

The output of 'parcor_ijk' has three parts (accessible with the dollar symbol): 'ouij' reports the partial correlation between the i-th and j-th variables after removing the effect of all others, with column numbers listed in the third part called 'myk'. R also reports 'ouji', which is the partial correlation between the j-th and i-th variables after removing the effect of all others. If the correlation matrix is symmetric, as is our matrix called 'c1' in the code above, ouij and ouji must be equal to each other.
Now we answer: True or False: the regression coefficient always equals the correlation coefficient of a standardized model? This is true when there is a single regressor, but it is not true when there are two or more regressors.

ANS-f4) Now we turn to the fourth question, involving partial correlation coefficients. Let us consider a multivariate regression of x1 on x2 and x3, where all variables are standardized.

x3s=stdze(x3)
reg4=lm(x1s~x2s+x3s)
coef(reg4)
c1=cor(cbind(x1,x2,x3));r12=c1[1,2]
r13=c1[1,3];r23=c1[2,3]
r12.3=(r12-r23*r13)/sqrt((1-r23^2)*(1-r13^2))
r12.3
x1.3s=resid(lm(x1s~x3s-1))
x2.3s=resid(lm(x2s~x3s-1))
r12.3*sd(x1.3s)/sd(x2.3s)
coef(reg4)[2]

In the above code the correlation matrix is denoted 'c1' with elements 'rij'. We use the standard formula for the partial correlation coefficient between x1 and x2 after removing the effect of x3, denoted 'r12.3'. Let x1.3s denote the residual after removing the effect of x3s on x1s, and let x2.3s denote the residual after removing the effect of x3s on x2s. Recall that the way to obtain regression coefficients from (unit-free) correlation coefficients is to multiply them by the ratio of standard deviations, here sd(x1.3s)/sd(x2.3s), analogous to the mapping in eq. (5). However, the need to include this ratio of standard deviations shows that the answer to "True or False: the regression coefficient always equals the correlation coefficient of a standardized model?" must be that it is NOT always true, though it is true when there is only one regressor. The output is

r12.3=(r12-r23*r13)/sqrt((1-r23^2)*(1-r13^2))
r12.3
[1] -0.2673124
x1.3s=resid(lm(x1s~x3s-1))
x2.3s=resid(lm(x2s~x3s-1))
r12.3*sd(x1.3s)/sd(x2.3s)
[1] -0.2627982
coef(reg4)[2]
       x2s
-0.2627982
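As a cross-check, one may also verify the textbook interpretation of the partial correlation coefficient as the ordinary correlation between the two residual series after the effect of x3 is removed. The short sketch below reuses x1.3s and x2.3s from the code above; it is an optional illustration, not part of the original answer.

#the partial correlation r12.3 equals the plain correlation of the residuals
cor(x1.3s, x2.3s)                       #residuals from the standardized data
cor(resid(lm(x1~x3)), resid(lm(x2~x3))) #same result from the raw data

Both correlations should reproduce the value of r12.3 computed above, because correlation is unaffected by the standardization.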
The R package 'generalCorr' [6] has a function to compute partial correlation coefficients from a correlation matrix of i, j = 1, . . . , p variables, p ≥ 2. If 'c1' holds the correlation matrix, the following commands yield the partial correlation coefficient between x1 and x2 after removing the effect of x3, x4, ..., xp. Although there is no limit on p, for brevity we illustrate the computations for the p = 3 example used above.

library(generalCorr)
parcor_ijk(c1,1,2)
parcor_ijk(c1,1,3)

The output of 'parcor_ijk' has three parts (accessible with the dollar operator): 'ouij' reports the partial correlation between the i-th and j-th variables after removing the effect of all others, whose column numbers are listed in the third part called 'myk'. R also reports 'ouji', the partial correlation between the j-th and i-th variables after removing the effect of all others. If the correlation matrix is symmetric, as is our matrix 'c1' in the code above, ouij and ouji must equal each other.

parcor_ijk(c1,1,2)
$ouij
[1] -0.2673124
$ouji
[1] -0.2673124
$myk
[1] 3

parcor_ijk(c1,1,3)
$ouij
[1] 0.191933
$ouji
[1] 0.191933
$myk
[1] 2

If the correlation matrix is non-symmetric, as with the generalized correlation coefficients of [8], 'ouij' differs from 'ouji'. This difference provides an interesting new interpretation described in [8]. For example, if 'ouij' exceeds 'ouji', the nonparametric kernel regression model xi = f(xj) is superior to the flipped model xj = f'(xi), implying that xj is the better predictor, and hence xj 'kernel causes' xi even after controlling for any number of confounding variables xk. The function 'gmcmtx0' of the 'generalCorr' package reports the non-symmetric matrix R* of generalized correlation coefficients. The function 'parcor_ijk' accepts the R* matrix as its argument, yielding distinct ouij and ouji, as illustrated below.

options(np.messages=FALSE)
c2=gmcmtx0(cbind(x1,x2,x3));c2
parcor_ijk(c2,1,2)

Note that the c2 matrix is asymmetric. Hence the two partial correlation coefficients are distinct.

c2=gmcmtx0(cbind(x1,x2,x3));c2
           x1         x2         x3
x1  1.0000000 -0.5556230 -0.1159349
x2 -0.2308683  1.0000000  0.0000000
x3 -0.7299638 -0.8796286  1.0000000

parcor_ijk(c2,1,2)
$ouij
[1] -0.2413045
$ouji
[1] -0.6873292
$myk
[1] 3

Since the magnitude of 'ouji' exceeds that of 'ouij', x1 is likely the kernel cause of x2 after removing the effect of x3, i.e., after controlling for x3.

1.25 Further Advanced Exercises by H.D. Vinod

Can we view multiple regression as a series of regressions involving transformed variables with only one regressor at a time? For example, obtain the regression coefficients when we regress x1 on x2 and x3 from two bivariate regressions: one of 'transformed x1' on 'transformed x2,' and a separate regression of 'transformed x1' on 'transformed x3.' (Hint: Use the Frisch-Waugh theorem described in Chapter 11, pp. 452-454.)

x1 = β1,0 + β1,2 x2 + β1,3 x3 + ε.    (6)

We want OLS estimates of the slope coefficient β1,2 from the transformed bivariate regression

x1^T = β1,0^T + β1,2^T x2^T + ε^T.    (7)

Let the transformed variable x1^T be defined as the residual of the auxiliary regression of x1 on x3. Similarly, let x2^T be defined as the residual of the auxiliary regression of x2 on x3. If this transformation is used, the Frisch-Waugh theorem described in Ch. 11 (p. 452) assures us that the OLS estimates satisfy

β1,2^T = β1,2.    (8)

We can check this using the example of the previous section.

co1=coef(lm(x1~x2+x3));co1[2]
r1=resid(lm(x1~x3))
r2=resid(lm(x2~x3))
coef(lm(r1~r2))[2]

The 'resid' function of R computes the residuals conveniently. We note that the slope coefficient agrees.

co1=coef(lm(x1~x2+x3));co1[2]
        x2
-0.3274111
r1=resid(lm(x1~x3))
r2=resid(lm(x2~x3))
coef(lm(r1~r2))[2]
        r2
-0.3274111

Both slopes are seen to be exactly −0.3274111. This is no accident, but a result of the Frisch-Waugh theorem. A similarly defined transformed regression yields the other slope coefficient β1,3 from residuals of regressions on x2:

co1=coef(lm(x1~x2+x3));co1[3]
r1x=resid(lm(x1~x2))
r2x=resid(lm(x3~x2))
coef(lm(r1x~r2x))[2]

The output (omitted for brevity) shows that the second slope coefficient −0.4124538 is also the regression coefficient when we regress the new residuals r1x on r2x.
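The Frisch-Waugh theorem implies more than the equality of slopes: the residuals of the full regression coincide with those of the residual-on-residual regression. The optional sketch below reuses x1, x2, x3, r1 and r2 from the code above; the object names e.full and e.fw are introduced here only for illustration.

#Frisch-Waugh also equates the residual vectors of the two regressions
e.full=resid(lm(x1~x2+x3)) #residuals of the full multiple regression
e.fw=resid(lm(r1~r2))      #residuals of the residual-on-residual regression
max(abs(e.full-e.fw))      #should be zero up to rounding error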
References

[1] Achim Zeileis, Friedrich Leisch, Kurt Hornik, and Christian Kleiber, strucchange: An R package for testing for structural change in linear regression models, Journal of Statistical Software 7 (2002), no. 2, 1-38.

[2] Yves Croissant, Ecdat: Data sets for econometrics, 2006, R package version 0.1-5.

[3] J. Fox, car: Companion to applied regression, 2009, R package version 1.2-14.

[4] Christian Kleiber and Achim Zeileis, Applied econometrics with R, Springer-Verlag, New York, 2008, ISBN 978-0-387-77316-2.

[5] Adrian Trapletti and Kurt Hornik, tseries: Time series analysis and computational finance, 2012, R package version 0.10-28.

[6] H. D. Vinod, generalCorr: Generalized correlations and initial causal path, Fordham University, New York, 2016, R package version 1.0.0, May 2, 2016.

[7] Hrishikesh D. Vinod, Hands-on matrix algebra using R: Active and motivated learning with applications, World Scientific, Hackensack, NJ, 2011, ISBN 978-981-4313-68-1.

[8] Hrishikesh D. Vinod, Matrix algebra topics in statistics and economics using R, Handbook of Statistics: Computational Statistics with R (M. B. Rao and C. R. Rao, eds.), vol. 34, North Holland, Elsevier Science, New York, 2014, pp. 143-176.

[9] Diethelm Wuertz and Rmetrics core team, fBasics: Rmetrics - markets and basic statistics, 2010, R package version 2110.79.

[10] Diethelm Wuertz and many others (see the SOURCE file), fEcofin: Economic and financial data sets, 2008, R package version 270.75.

[11] Achim Zeileis and Torsten Hothorn, Diagnostic checking in regression relationships, R News 2 (2002), no. 3, 7-10.