EASTERN MEDITERRANEAN UNIVERSITY
Faculty of Business and Economics
Department of Business Administration
MGMT 434 Research Method in Business Studies
Practical Computing Classes, SPRING 2016-2017
Handout 5: Project by Stata SE 11.0-14.0
Coordinator: Prof. Dr. Sami Fethi ([email protected])
Instructor: Mr. Amin Sokhanvar ([email protected])

PROJECT

You are expected to investigate the relationship between the quantity demanded of chicken meat and its determinants by employing the data in Data Set 1. As chief analyst, you have been asked to prepare a report for your managing director on the quantity demanded of chicken and its determinants. The report needs to address all of the following:

(a) Introduction.
(b) Brief information on the empirical literature (i.e. demand theory).
(c) An explanation of the model and the methodology used.
(d) A correlation matrix and descriptive statistics for the variables in the equation.
(e) A discussion of the coefficients associated with the regression equation.
(f) A comparison of the regressions using the estimated results (t-statistics, F-statistics, ANOVA).
(g) Any other comments you feel are relevant to the issues being addressed.
(h) Conclusion and recommendations.

Tips for Demand Estimation

1. Using the data in Data Set 1, specify a linear functional form for the demand for chicken.
2. Based on the variables in the demand equation, estimate both the descriptive statistics and the correlation matrix.
3. Create the natural logs of the existing variables (LY, LPC, LPB, ...). Also create a constant and a time trend, naming them c and t respectively (see the command sketch after this list).
4. Run a regression to estimate the demand for chicken consumption by OLS.
5. Having run the regression in step 4, check the output. Looking at the result, we realize that serial correlation exists. Alternatively, we can simply drop the most insignificant variable from the model.
6. What happens if we estimate the equation without the price of the substitute? Check whether the estimated Durbin-Watson statistic is significant or not.
7. Find the short-run demand effects of the equation. Hint: DLY = a + b DLPC + c DLYD + ... + e_t.
8. Evaluate the regression results by examining the signs of the parameters, the p-values (or t-ratios), the F-ratio and the R2.
9. Using the estimated demand equation's results, find the income elasticity, the own-price elasticity, and the cross-price elasticities (i.e. with respect to pork and beef).
10. Comment on whether chicken is a luxury good or not.
11. Based on its own-price elasticity coefficient, discuss whether demand is price elastic or not.
12. Is the demand for chicken affected by variation in the prices of pork and beef?

SOURCE: © Gary Koop (2000), 'Analysis of Economic Data', John Wiley and Sons, Ltd, England.
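The following is a minimal Stata sketch of tips 3-4, not part of the original handout. It assumes the raw variables are named y, pc, pb, pr and yd, as in Data Set 1 below, and that the logged variables are named ly-lyd, as in the output later in this handout.

. gen ly  = ln(y)      // log of per capita chicken consumption
. gen lpc = ln(pc)     // log of chicken price
. gen lpb = ln(pb)     // log of beef price
. gen lpr = ln(pr)     // log of pork price
. gen lyd = ln(yd)     // log of per capita disposable income
. gen c = 1            // constant term (Stata also adds its own constant by default)
. gen t = _n           // time trend
. regress ly lpc lpb lpr lyd    // long-run (log-linear) demand equation by OLS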
DATA SET 1: Annual data for the demand for chicken meat

Year      Y     PC     PB     PR    YD
1960   20.22   11.1  23.25   12.2  20.4
1961   20.78    9.5  25.95   10.1  20.2
1962   21.71    9.1  25.94   10.2  21.3
1963   22.46    9.0  27.22   10.0  19.9
1964   24.10    8.9  27.82    9.2  18.0
1965   25.63    8.4  29.77    8.9  19.9
1966   27.34    9.2  32.08    9.7  22.2
1967   28.95    7.1  32.62    7.9  22.3
1968   31.14    7.8  32.88    8.2  23.4
1969   33.24    9.2  34.90    9.7  26.2
1970   35.87    8.7  36.88    9.1  27.1
1971   38.60    7.3  36.74    7.7  29.0
1972   41.40    8.7  38.49    9.0  33.5
1973   46.16   14.7  37.01   15.1  42.8
1974   50.10    9.5  36.93    9.7  35.6
1975   54.98    9.3  36.70    9.9  32.3
1976   59.72   12.5  39.84   12.9  33.7
1977   65.17   11.7  40.71   12.0  34.5
1978   72.24   12.1  43.10   12.4  48.5
1979   79.67   13.6  46.64   13.9  66.1
1980   88.22   10.7  46.91   11.0  62.4
1981   97.65   10.8  48.45   11.1  58.6
1982  104.26   10.1  49.52   10.3  56.7
1983  111.31   12.4  50.83   12.7  55.5
1984  123.19   15.7  52.83   15.9  57.3
1985  130.37   14.4  54.81   14.8  53.7
1986  136.49   12.3  56.47   12.5  52.6
1987  142.41   10.5  60.27   11.0  61.1
1988  152.97    8.6  62.28    9.2  66.6
1989  162.57   14.2  66.17   14.9  69.5
1990  171.31    8.9  69.08    9.3  74.6
1991  176.09    6.8  72.12    7.1  72.7
1992  184.94    8.4  75.38    8.6  71.3
1993  188.72    9.8  77.14   10.0  72.6
1994  195.55    7.2  78.61    7.4  66.7
1995  202.87    6.3  78.23    6.5  61.8
1996  210.91    6.4  81.42    6.6  58.7
1997  219.40    7.3  83.67    7.7  63.1
1998  231.61    7.8  83.89    8.1  59.6
1999  239.68    6.9  88.87    7.1  63.4

Introduction

Economics begins and ends with the "law" of supply and demand. The laws of supply and demand are an important starting point in the attempt to answer vital questions about the working of a market system. Demand for a good or service is defined as the quantities of that good or service that people are ready (willing and able) to buy at various prices within some given time period, other factors besides price held constant. Every market has a demand side and a supply side. The demand side can be represented by a market demand curve, which shows the amount of a commodity buyers would like to purchase at different prices. Demand curves are drawn on the assumption that buyers' tastes, incomes, the number of consumers in the market and the prices of related commodities are unchanged.

Literature Review

Consumer demand theory postulates that the quantity demanded of a commodity per time period increases with a reduction in its price, with an increase in the consumer's income, with an increase in the price of substitute commodities and a reduction in the price of complementary commodities, and with an increased taste for the commodity. Conversely, the quantity demanded of a commodity declines with the opposite changes. The inverse relationship between the price of a commodity and the quantity demanded per period is referred to as the law of demand: a decrease in the price of a good, all other things held constant (ceteris paribus), will cause an increase in the quantity demanded of the good, and an increase in its price, all other things held constant, will cause a decrease in the quantity demanded.

Economic Theory Background

The general demand function, showing the relationship between quantity demanded and the following six factors, can be expressed as:

Qd = f(P, I, PR, T, PE, N)

where Qd is the quantity demanded of the good or service, P is the price of the good or service, I is consumer income per capita, PR is the price of related goods and services, T is the taste pattern of consumers, PE is the expected price of the good in some future period, and N is the number of consumers in the market.
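A minimal sketch, not part of the original handout, of how the annual series above could be read into Stata and declared as time-series data (which the first-differenced short-run regressions later in the handout also presuppose). The file name chicken.csv is only an assumed placeholder.

. insheet using "chicken.csv", clear    // columns assumed to be: year y pc pb pr yd
. tsset year, yearly                    // declare annual time series, 1960-1999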
The theory says:

    Qd = f( P,   I,   PR,   T,   PE,   N )
            -   +/-   +/-   +    +     +

where the "+/-" under I depends on whether the good is normal or inferior (N/I), and the "+/-" under PR depends on whether the related good is a substitute or a complement (S/C).

DATA AND MODEL

It is worth stressing that the data set used in this study was obtained from http://salvatore.swlearning.com. The data cover the period between 1960 and 1999 and are annual time series.

Y = a PC^b PB^c PR^d YD^e

This is the nonlinear form of the demand equation. It can be transformed, by taking natural logarithms, into the following log-linear specification:

ln Y = ln a + b ln PC + c ln PB + d ln PR + e ln YD

Y is per capita chicken consumption in pounds, PC is the price of chicken in cents per pound, PB is the price of beef in cents per pound, PR is the price of pork in cents per pound, and YD is US per capita disposable income in hundreds of dollars.

The theory says:

    Qd = f( PC,  PB,  PR,  YD )
             -    +    +    +

MODEL ESTIMATION

FIRST STEP: Correlation matrix

. corr ly-lyd
(obs=40)

             |       ly      lpc      lpb      lpr      lyd
-------------+---------------------------------------------
          ly |   1.0000
         lpc |  -0.0806   1.0000
         lpb |   0.9834  -0.2118   1.0000
         lpr |  -0.1430   0.9947  -0.2732   1.0000
         lyd |   0.9492   0.0894   0.9109   0.0272   1.0000

or

. pwcorr ly lpc lpb lpr lyd, obs sig

             |       ly      lpc      lpb      lpr      lyd
-------------+---------------------------------------------
          ly |   1.0000
             |
             |       40
             |
         lpc |  -0.0806   1.0000
             |   0.6210
             |       40       40
             |
         lpb |   0.9834  -0.2118   1.0000
             |   0.0000   0.1895
             |       40       40       40
             |
         lpr |  -0.1430   0.9947  -0.2732   1.0000
             |   0.3787   0.0000   0.0881
             |       40       40       40       40
             |
         lyd |   0.9492   0.0894   0.9109   0.0272   1.0000
             |   0.0000   0.5832   0.0000   0.8678
             |       40       40       40       40       40

NOTE: Low correlations are expected among the explanatory variables, and high correlations between the dependent variable and the explanatory variables. As can be seen from the table above, the correlation coefficient between ly and lpc is rather low. The other coefficients do not appear to create any fragile results. In the pwcorr table, the top number is the correlation coefficient itself, the number below it is the two-tailed p-value for the correlation, and the bottom number is the sample size. This step also gives us further evidence for the regression analysis.

SECOND STEP: Descriptive statistics

. summarize y-yd
. summarize ly-lyd

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
          ly |        40    4.325927    .8271163   3.006672   5.479305
         lpc |        40    2.255696    .2417019    1.84055   2.753661
         lpb |        40    3.849958     .389262   3.146305   4.487175
         lpr |        40    2.298974    .2355904   1.871802   2.766319
         lyd |        40    3.735565    .4779164   2.890372    4.31214

This gives us a general idea of the data in terms of mean, standard deviation, etc.

THIRD STEP: OLS regression (long run)

. regress ly lpc lpb lpr lyd

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  4,    35) =  666.49
       Model |   26.334994     4  6.58374851           Prob > F      =  0.0000
    Residual |  .345739981    35  .009878285           R-squared     =  0.9870
-------------+------------------------------           Adj R-squared =  0.9856
       Total |   26.680734    39  .684121385           Root MSE      =  .09939

------------------------------------------------------------------------------
          ly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lpc |  -.0171159    .816133    -0.02   0.983    -1.673954    1.639722
         lpb |   1.787105   .1442181    12.39   0.000     1.494327    2.079884
         lpr |   .3047523   .8450215     0.36   0.721    -1.410733    2.020237
         lyd |   .3135002   .1131904     2.77   0.009     .0837114     .543289
       _cons |  -4.387463   .4215806   -10.41   0.000    -5.243317   -3.531609
------------------------------------------------------------------------------
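Tip 10 asks whether chicken is a luxury good. A common rule of thumb is that a good is a luxury when its income elasticity exceeds 1. A minimal sketch, not part of the original handout, of how this could be checked after the regression above is a Wald test on the lyd coefficient:

. test lyd = 1    // H0: income elasticity equals 1 (unitary income elasticity)

With the estimated income elasticity of about 0.31, well below 1, chicken appears to behave as a necessity rather than a luxury over this sample.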
Diagnostic Test Results

Heteroskedasticity

. estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of ly

         chi2(1)      =     0.35
         Prob > chi2  =   0.5522

Another assumption of the OLS regression model is that the residuals are homoscedastic. If the residuals have a constant variance they are said to be homoscedastic; if the variance is not constant, they are said to be heteroscedastic. The effect of heteroscedasticity is that, even though the regression coefficients are still linear and unbiased, they are no longer the best (minimum variance) estimates, and thus no longer the most efficient. As a result, in the presence of heteroscedasticity the usual hypothesis testing routine is not reliable, raising the possibility of drawing misleading conclusions. The model was tested for whether the error variance is constant or not. The hypothesis is as follows:

H0: σ1² = σ2²  (homoscedasticity)
H1: σ1² ≠ σ2²  (heteroscedasticity)

It seems that there is no heteroscedasticity problem in this case.

Multicollinearity

. estat dwatson

Durbin-Watson d-statistic( 5,    40) = 1.163403

Multicollinearity is the existence of a strong relationship among some or all explanatory variables of a regression. Multicollinearity does not affect the best unbiased estimator property of OLS, but since some coefficients have large standard errors, they tend to be insignificant, making precise estimation difficult. For this purpose, you can also use the DW statistic.

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
         lpr |    156.47    0.006391
         lpc |    153.63    0.006509
         lpb |     12.44    0.080370
         lyd |     11.55    0.086555
-------------+----------------------
    Mean VIF |     83.52

NOTE: When you check the estimated DW statistic, the general rule of thumb tells us the value should be around 2. A VIF of 1 means that there is no correlation among the variables under inspection; the general rule of thumb is that VIFs exceeding 10 are signs of serious multicollinearity requiring correction. As you can see, all of the variance inflation factors (156.47, 153.63, 12.44 and 11.55) are fairly large. The VIF for lpr, for example, tells us that the variance of its estimated coefficient is inflated by a factor of about 156 because lpr is highly correlated with at least one of the other predictors in the model (here lpc, with which it has a correlation of 0.9947). It seems that there is a multicollinearity problem in this case.

Autocorrelation (Serial Correlation)

. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          9.290               1                   0.0023
---------------------------------------------------------------------------
                        H0: no serial correlation

Autocorrelation occurs when the residuals are not independent of each other. The OLS regression model is a minimum-variance, unbiased estimator only when the residuals are independent of each other.
If autocorrelation exists in the residuals, the regression coefficients are unbiased but the standard errors will be underestimated, and tests on the regression coefficients will be unreliable. The most commonly used test for detecting autocorrelation is the one developed by Durbin and Watson, known as the Durbin-Watson (DW) statistic. However, in order to test for autocorrelation here, you can compare the chi-square statistic above with its tabular value, testing the following hypothesis:

H0: ρ = 0  (no autocorrelation)
H1: ρ ≠ 0  (existence of autocorrelation)

If the calculated value were smaller than the tabular value, the estimated regression would not have first-order serial correlation. In our case, however, it is the other way around (the estimated value of 9.29 exceeds the critical value of 3.84), so it seems that there is a serial correlation problem in this case.

Normality

. sktest resid

                    Skewness/Kurtosis tests for Normality
                                                         ------- joint ------
    Variable |   Obs   Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)    Prob>chi2
-------------+----------------------------------------------------------------
       resid |    40        0.6128         0.0003         10.96       0.0042

Normality testing shows us whether the residuals are normally distributed or not, normal distribution being one of the assumptions of OLS. To check this assumption you can use the chi-square statistic, employing the following hypothesis:

H0: the residuals are normally distributed
H1: the residuals are not normally distributed

Since the calculated value of the normality statistic (10.96, chi-sq(2)) is bigger than the tabular value (5.99, chi-sq(2)), there is a problem in terms of normality. However, some other statistical programs indicate the opposite (i.e. a calculated value of 1.048).

Functional Form

. estat ovtest

Ramsey RESET test using powers of the fitted values of ly
       Ho: model has no omitted variables
                  F(3, 32) =      9.65
                  Prob > F =      0.0001

The functional form test checks for the presence of misspecification in the estimated equation. The following hypothesis is tested:

H0: no misspecification
H1: existence of misspecification

If the calculated figure were smaller than the tabular one, no variable would be omitted; in other words, the empirical equation would be consistent with the relevant theory. In this case, however, the calculated F-statistic is significant (Prob > F = 0.0001), so there is a functional form problem, although other programs do not confirm this result obtained from Stata.
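Two practical notes that are not part of the original handout: the resid series used by sktest above must first be saved from the regression, and, given the serial correlation detected, inference is often repaired with Newey-West standard errors or a Prais-Winsten (FGLS) re-estimation. A minimal sketch, assuming the data have been tsset by year:

. predict resid, residuals            // save the OLS residuals for the normality test
. newey ly lpc lpb lpr lyd, lag(1)    // OLS coefficients with HAC (Newey-West) standard errors
. prais ly lpc lpb lpr lyd            // Prais-Winsten FGLS correction for AR(1) errors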
FOURTH STEP: Drop the most insignificant variable (i.e. LPC) from the model

. regress ly lpb lpr lyd

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =  914.03
       Model |  26.3349897     3  8.77832989           Prob > F      =  0.0000
    Residual |  .345744326    36  .009604009           R-squared     =  0.9870
-------------+------------------------------           Adj R-squared =  0.9860
       Total |   26.680734    39  .684121385           Root MSE      =    .098

------------------------------------------------------------------------------
          ly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lpb |   1.786771   .1413271    12.64   0.000     1.500146    2.073395
         lpr |   .2871495   .0963688     2.98   0.005     .0917046    .4825944
         lyd |   .3132104   .1107734     2.83   0.008     .0885516    .5378692
       _cons |  -4.383231   .3649689   -12.01   0.000    -5.123422    -3.64304
------------------------------------------------------------------------------

Drop the most insignificant variable (i.e. LPR) from the model

. regress ly lpc lpb lyd

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  3,    36) =  910.61
       Model |  26.3337092     3  8.77790307           Prob > F      =  0.0000
    Residual |  .347024795    36  .009639578           R-squared     =  0.9870
-------------+------------------------------           Adj R-squared =  0.9859
       Total |   26.680734    39  .684121385           Root MSE      =  .09818

------------------------------------------------------------------------------
          ly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lpc |   .2752426   .0932464     2.95   0.006     .0861301    .4643551
         lpb |   1.777065   .1397851    12.71   0.000     1.493567    2.060562
         lyd |   .3118099   .1117186     2.79   0.008     .0852342    .5383857
       _cons |  -4.301347   .3432137   -12.53   0.000    -4.997417   -3.605277
------------------------------------------------------------------------------

Drop the price-of-substitute variables from the model

. regress ly lpc lyd

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  2,    37) =  240.61
       Model |  24.7757981     2  12.3878991           Prob > F      =  0.0000
    Residual |  1.90493588    37  .051484754           R-squared     =  0.9286
-------------+------------------------------           Adj R-squared =  0.9247
       Total |   26.680734    39  .684121385           Root MSE      =   .2269

------------------------------------------------------------------------------
          ly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lpc |  -.5708903   .1509282    -3.78   0.001    -.8766998   -.2650808
         lyd |   1.668583   .0763305    21.86   0.000     1.513923    1.823243
       _cons |  -.6194177   .4255991    -1.46   0.154    -1.481763     .242928
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF
-------------+----------------------
         lpc |      1.01    0.992004
         lyd |      1.01    0.992004
-------------+----------------------
    Mean VIF |      1.01

. estat dwatson

Durbin-Watson d-statistic( 3,    40) = .6139399

* When you check the outputs, you will notice that the results improve in terms of t-values.
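The short-run regressions in the Fifth Step below use first differences of the logged variables (dly, dlpc, dlpb, dlpr, dlyd). A minimal sketch of how these might be generated, not part of the original handout and assuming the data have been tsset by year:

. gen dly  = D.ly     // first difference of log consumption
. gen dlpc = D.lpc    // first difference of log chicken price
. gen dlpb = D.lpb    // first difference of log beef price
. gen dlpr = D.lpr    // first difference of log pork price
. gen dlyd = D.lyd    // first difference of log disposable income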
FIFTH STEP: Short run

. regress dly dlpc dlpb dlpr dlyd

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  4,    34) =    2.56
       Model |  .005183328     4  .001295832           Prob > F      =  0.0558
    Residual |   .01717825    34  .000505243           R-squared     =  0.2318
-------------+------------------------------           Adj R-squared =  0.1414
       Total |  .022361578    38  .000588463           Root MSE      =  .02248

------------------------------------------------------------------------------
         dly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dlpc |   .1588333   .1585995     1.00   0.324    -.1634795    .4811462
        dlpb |  -.2925925   .1339937    -2.18   0.036    -.5649005   -.0202845
        dlpr |  -.1466017   .1644654    -0.89   0.379    -.4808357    .1876322
        dlyd |   .0868503   .0397139     2.19   0.036     .0061419    .1675587
       _cons |   .0708367   .0056002    12.65   0.000     .0594557    .0822176
------------------------------------------------------------------------------

Drop the most insignificant variable (i.e. DLPR) from the model

. regress dly dlpc dlpb dlyd

      Source |       SS       df       MS              Number of obs =      39
-------------+------------------------------           F(  3,    35) =    3.17
       Model |  .004781881     3   .00159396           Prob > F      =  0.0361
    Residual |  .017579698    35  .000502277           R-squared     =  0.2138
-------------+------------------------------           Adj R-squared =  0.1465
       Total |  .022361578    38  .000588463           Root MSE      =  .02241

------------------------------------------------------------------------------
         dly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dlpc |   .0183891   .0180954     1.02   0.316    -.0183466    .0551248
        dlpb |  -.2605479   .1287019    -2.02   0.051    -.5218266    .0007308
        dlyd |   .0767796   .0379611     2.02   0.051    -.0002856    .1538448
       _cons |   .0703506   .0055572    12.66   0.000     .0590689    .0816323
------------------------------------------------------------------------------

EMPIRICAL RESULTS

In the long run: having estimated the log-linear demand function, all slope coefficients are partial elasticities of per capita chicken consumption (Y) with respect to the corresponding variable: real disposable income per capita (income elasticity of about 0.31), the retail price of chicken per pound (own-price elasticity of about -0.02), the retail price of pork per pound (cross-price elasticity of about 0.30), and the retail price of beef per pound (cross-price elasticity of about 1.79). Individually, the income and beef cross-price elasticities are statistically significant. The demand for chicken with respect to its own price is price inelastic, because the absolute value of the elasticity coefficient is less than 1. The two cross-price elasticities are positive, suggesting that the other two meats compete with chicken; however, only the beef elasticity is statistically significant. Thus it seems that the demand for chicken is affected only by variation in the price of beef.

CONCLUSION

In this study, a demand model was estimated to find the relationship between the quantity of chicken demanded and its determinants for the US economy over the period 1960-1999, in order to obtain the income, own-price and cross-price elasticities of per capita chicken consumption. The empirical findings show that the demand for chicken is affected only by variation in the price of beef. They also indicate that the demand for chicken with respect to its own price is price inelastic.

NOTE: A graphical presentation can be added to the text, for example a bar chart of the means of the variables (the original figure showed the means of ly, lpc, lpb, lpr and lyd, labelled Y, PC, PB, PR and YD).
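A minimal sketch, not part of the original handout, of how such a chart of variable means could be produced in Stata:

. graph bar (mean) ly lpc lpb lpr lyd, title("Means of the logged variables")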