Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
~--- -----~-- INSTIT~TE RLA:2:4S OF S1ATlSTIC~ THE USE OF .REGRESSION TECHNIQUES WITH ECONOMIC· DATA • by R. L. Anderson Institute of Statistics, North Carolina state Gollege, Raleigh h The MethQ.d Qi Jr.J" The ~ Mathem2j~.:i.Q.al. ~ Squares Model. ! Single MYltiple Regression ~atign. In the traditional regression analysis we assume that we have a single dependent variate Y and several independent ft#Jtq variates, Xi' say J;:in number., These ve.riates are connected by the following mathematical model logarithmic values in order to meet the requirement that the variance is constant from observation to observation. ". The st~tisticianls and an estimate of (J'-'2 problem is to determine the values of the estimates bi from the sample of 11. The method of least squares has ~ certain desirable properties and produces the same estimates as the method of muximum likeliho?d for tho type of distribution assumed for (1). After determining • ~he bi' it is also desirable to have estimates of their variances and an over-all tGst of tho Gdoquacy of the regression model. tho sum of sq~~res of ,the sample residuals, r . SSE =.80 2 = SCy - ~~ b.X )2 i = 1 J. i , Tho method of least squares minimizes The quantity to be minimized 1s -2-0 where S indicates s1.1.JIlInation over the sample of We have not used a second ~ subscript to indicate the sample number, because it complicates the equations; instead it is understood that whe~ever S is used, we are summing over the sample. =0 This is called th€ leading equation for 'bj,. Therefore we have the following ,13et of ~ equations in the r unknowns, bi_ blSXf + b2SXl X2 + • • • + brSXIXr = SXIY bl S:lS.X2 + b2SX~ + • • • + br SX2Xr = SX2Y • • e If these equations are solved by the matrix-inversion method, which utilizes either Fisher's £-values [41 or Snedecor's k-va1ues [12) , we have at once the necessary tools to determine an estimate of QS (J'2 and the variance of each bi' well as a test of the adequacy of the entire model. For a short-cut matrix- inversion technique, one which we are using at North'Carolina State College, I refer you to the abbreviated Doolittle method described by p. S. Dw,yer [3J • I believe that the use of matrix-inversion m3thods in regression analysis needs a little promoting. Once a computer has mastered the.matrix-inversion techniques, she can handle regression problems in very little more time than rdth the elimination method which has been generally used in the past. In addition the matrix-inversion method furnishes the requisite data with which to compute the vo.riances of each bi, slllur~l iJRg 1t ! I' au t 1 iiS ,,'I' flUal ~biiUli=:it~:R:!Z8:::s=:I'RiJ!II22l11+1zl!'1PlYc!!'!l2~sl'ii!'eet!l11~R!MI.i~t~kwi'Mia~,-1W;Ail-l&iilil1'l-JII:-!1~s-eglPQ;:L:lI;:;.~.=t::'l=_Q_. SA [til; 8ld.;. JilQ:221OI then Profossor Pooch of our staff has written up an excellent description of the abbreviated Doolittle method ~ with an example. Copies can be obtained by writing to the Institute of Statistics, North Curolina State College. 'it After we have computed the bi, the total reduction in SY2 attributable r. to regression is SSR = 2: i =1 bi(SXiY). Hence the residual (or error) sum of squares is :L bi(SXiY) = Sy2 - SSE = Sy2 - (5) Ssa. SSE, under the assumptions made with (1), is distributed as degrees of freedom.. Hence an unbiased estimate of If we set up the null hypothesis that X. 2 a-2 , distributed as but with {(3 i = o~ , X 2 q-:2 with n - r .)'2 is s2 = SSE/en - r). then the reduction SS~ is also ~ degrees of freedom. Hence we can test this ntul hypothesis by means of e (6) F=SSR.n-r SSE r' with ~ and n - r degrees of freedom. In most statistical investigations one of the regression coefficients, say b]" estimates?, the mean of Y, which is not assumed to be 0 1I1hen testing the adequacy of the model (1). formin~ (7) This is accomplished by setting Xl all the other Xi to deviations from their means. Y=m+ . where xi i t= = (Xi 2 trans- Hence the model becomes b:xi + e it JC;.) - =1, am and m is the estimate Of,r. The least squares equations are the same as before except the first equation becomes simply (8) The other =y ..§I ·n m= ~ • equations use xi instead of Xi with ~ not in the equations. For example the b 2 equation is (9) b~Sx~ + b3Sx2x 3 + • • • + br Sx2x r = SX2Y • As everyone kno1:/s, we compute SXiXj it {~i = ~ = SItXj the expected value of Y is r SSRI = 2:: i =2 bi(SxtY), Under the null hypothesis, Jf' , not assumed to be zero. in sum of squares due to regression is (10) SX _ S1 .1 • T-he reduction INSTIT&JTE OF STATISTICS -4is distributed as f.. 2 (j"'2 with !:..=-1 degrees of freedom \7hen (~i = ~ l ~hich i = 2, 3,. • ., r. The error sum of squares is SSE (11) = Sy2 _ ny2 _ SSR', 1. 2:jJ'2, ';ihich is distributed as as before, with !l...::-}; degrees of freedom. If the ma.tri.x .inversion mnthod of determ::r.ing tbe bi is used, tho est,jm'1.ted variance of each l.>~ Is simply CliS2, where ci:i. i~; the 0.:5.o.5'onI11 elemcn+, 'en tl18 inverted matrix" estime.te of (;.,' :-'.! ~ .... Th'.':; 01' can be used to a',,-c.a;'Jh cnu'.'idence ""81' i.~.nco l::m:'~;f) to tIle to make a test of significance by means of . li f hi'" i = ---•._ . - ; s '\ eii with n - r degreos of fro8dom. Finally it should be emphasized thnt the analysis of variance is a special type of a linear regression, in which the Xi are definitely fixed numbers. I have discussed the use of the analysis of variance with economic data emphasizing I ., the assumptions needed,in'an article on the analysis of hog prices ..., 1 -;. 1.2. An Examo10 of a Dairy Cost ·Study: For reasons which will be explained in the next section, much economic data do not meet the requirements for the multiple regression analysis. However certain types of data do seem to meet these requirements, such as data acquired from a set of farms, business rla~es, etc. at a given period of time in which certain variables can be cons::ldored. to be more or loss fixod and can be used to estimate some dependent variable" One such study was a dairy cost study of 89 dairy farms in central North Carolina in 1941 l.- 6 J1 • Data were available for each farm on tho amounts and the costs of concentrates, of silage, and of roughage per cow, the cost of pasture per cow, the amount of milk sold per COY/, the nUlnber of cows, the net incone per cow, the labor in00me I p9r CO~[. ~ow, the hours of 1nbor per cow, and the total digestible nutrients ('1'D:;) p0.r It vns docidod to usc milk sold perco"j/ as the dependent variable s:".nnC) 1~~:-'1)1 income was so variable because of varying allowances for the hours of 1ahorusode A preliminary analysis using labor incone as the dependent variable produced no significant regression coefficients. -5I shall demonstrate the multiple regression method with the amount of Dilk sold per C0\7 as the dependent variate (Y) and tle amount of concentrates per cow (~, the a.nount of silage per cO\v (XJ), the amount of roughage per and pasturG cost per Ca\"l ( " as the independent variates. COil Cr;4,· The amounts have been quoted in hundreds of pounds perc0'/1 and the pasture cost in dollars per con. Tho regression model is: ~ (13) Y = m + L.. bixi + e i = 2 with Xi = Xi .... Xi. The computations are presented in Table 1, using the matrix.... inversion method. Table 1: COf.q:mtations for the Regression of the Anount of Milk Sold on the Amounts of FOGd Usod per Coo from 89 Dairy Herds. !(milk sdld) 57.3994 Rums of Squares and Cross-products x3 x4 Xs -6616.17 -93.7732 ...484.289 96710.77 3244.25 1358.95 19230.53 -1251.95 1254.57 10....4.. ·x·· b2 b3 IO:~rr;5Z~ S(-m~~0556220 2.236903 0.1413543 0.1152455 0.0356220 -0.0249484 0.7459220 -0.0951647 U s2 3679.74 3905.56 999.432 702.815 g 0.7459220 -0.095164.7 0.6032603 8.963883 Regression Coefficients ~Stangard Errors = 0.9343 : 0.1348 = 0.0818 : 0.0306 5 SSR' = ;>. biS(xiY) i =2 SSE -0.e249484 0.5636625 0.6032603 Y = 11358.72 - SSR' = SSE/84 = 81.2338 b4 = 0.1021 : 0.0677 b5 0.9276 ~ 0.2698 = = 4535.08 = 6823.64 INSTITrtlTE OF STATISTICS -6_ Some of the computc.tions may require an expla.nation. In order to illustrate how to compute. the values of the bi, consider b2 b2 = 10-4 [(2.236903)(3 679.74) + (0.14l3543){3905.56) + • • • + {O.7459220){702.8l5~ Then the estimated standard error of b 2 other bi, with a standard error of s =s fC22 = 0.1348 = .9343 and similarly for the {'cii • Of these regression coefficients, all but b4 were highly significant. in 1941 the cost of concentrates vms $1.80 per cwt., Since of silage $0.27 per cwt., and of roughage $0.8, per cwt. and milk sold for $3.20 per cwt., we would conclude that added concentrates and pasture were profitable, added silage about paid far itself, and added roughage was unprofitable. It should be stated that in a regression analysis, a.ny regression coefficient indicates the average change to be expected in the dependent variable for a unit change in the particular independent tit variable, with the other independent variables assumed to be held constant. Far example in the above analysis, b2 measures the added cwt. of milk expected from an addition of one cwt. of concentrates to the ration, while leaving the amounts of the other foeds unchanged. I wish to warn against using such results for increases far beyond the means, because we did not have a sufficient range of the independent variahles to study the diminishing, returns to scale. Apparently these COilS \'/erebeing fed too much roughage. Even the average amount of roughage for all hords, 3600 POWlds per carl, was said to be about all a cot! could handle iThen she \/as on pasture as long as most cows in this region are. rougl'pge \1ould be expected to be 'Wasted. Hence added If the pasture costs refiect the true cost of pasture, it appears that increased pasture is really the most profitable . method of producing .milk. pc. sture might be put in an even more favorable light if the effects on soil improvement also 'I,tere considered. 4. 3• cf Producti.qn F3»Xi:Mgns t91:1~C?\;C. Fg.rIJ§. 733 IO\1o. fc.rns in 1939. n study mrs made Tho regression nodel used ~ns: Usil1g date. secured in n ru.ndor.1 s~c or tho production functions [9] • Y = m + blXJ. + b2x 2 + b3x 3 + b4x4+ b5x5 + e J (14) where the xiare &viations from tho means of the Xi-- Y and Xi are given in terms of logarithrs for the following variables: Y = total product : cash s(l.les, home consumption and inventory increases ($) Xl value of real estate ($) X2 nonths of labor (family and hired) X3 = value of nnchinory and equipment ($) X/+ =- vo.lue of livestock on hand and livestock expense, including feed ($) X5 cash operating expenses ($) = = = E~ch regression coefficient in (14) is a measure of elasticity, indicating the averag-e proportionnl change in total product which would result from a change in the: corresponding independent varinte. For example, of product with respect to the value of real estate. [6) relationship for. the dairy study b:J. represents the elasticity We did not use a because we did not believe that there ,ms enough of a. ranee in the varinblm studied to justify using it. relationship is necessary for e dat~ logarithmic The logarithmic of this kind if the range in the variables is very great in order that the assumptions of normo.lity and equal variances for the residuaJs shm.lld hold. Also the logarithmic relationship more adequately measures the diminishing returns to scale. The data were analyzed separately for each of 5 type of farming areas, for each of 5 types of farms, for largo and small farns, and for all farms. The over-all regression coefficients were: hI = 0.2316, (l4n) b2 = .0282, b3 = .0844, b4 = .4767, with a multiple correlation coefficieat of 0.8882. the .01 prob~.bi1ity lovol. bS = .0317, All but b2 ~ere significant at The returnsto labor nere not significant as ilCS found in our da.iry study, prob:'J.bly because of the oxtrone variability in str:.ting the totnl lc,bor used on o:'.ch fc.rn. Since the suo of the coefficients is less tha.n unity, ne concludo that t.here \/ns a dininishing return to scalo, since a 1% incroos-e iuthe input of nIl rosources vlOuld not be follo\"led by a 1% incronso in output. Using the estiuates of tho elasticities (Ua) and the geonetric neans of the variables, estiI1c.tes of tho J:1a.rginc.1 productivities 'iere 1l1so secured, vrith confidence linits• ••4. The Use of the ~nalysi2 Qf Va.ric.nce for the Dairy Cost StudY. The analysis ofvc.rinnce can also be used to study the relationship betueen the anount of I ., -7milk sold per Cm"1 fed to each cow. and the average amount of silage, roughage, and concentrates I omitted pasture from this analysis because the problem h~ve is already quite complicated for an example. We divided the silage feeding into three classes - 0, Low, High; and the concentrates and roughage each into Low and High amounts per cow. The Low-High division was made so that about the same ntunbor of herds would be included in each category. The dividing point for silage was 6000 pounds per cow, for concentrates 2900 pounds per cow, and for roughage 3500 pounds per cow. The average milk sold per each of the 3 x 2 x 2 = 12 CO'17 and the number of herds for classes arc presented in Table 2. Table 2: Average C\Tts. of milk sold for different amounts of feed per cow.* Silage o Concontrates Low Low 53.06(6) Roughage High 46.61(6) High Low Low High Low High Ave. High·"·· 61.92(8) 53.0,(10) 60.1,(7) 55.42(10) 60.92(3) 56.84(44) 60.12(10) 64.07(') 59.59(9) 55.03(9) 62.87(8) 57.95(45.) Concentrates *The figures in parentheses are numbers of cows in each class. We shall consider hore the follovdng mathematical model: (15) Y = m + bUXU + ~2X12 + b20X20 + b21X21 + b22X 22 + b'lX'l + b32X32 + e, \7hore X11 is J. for the 'lotl level of concentrates and 0 elsewhere, 112 is J. for the high level of concentrates and 0 elsewhere, X20 is etc. 1 for no silage and 0 elsewhere, (15) assumes that the interactions are non-existent. This assumption cap be tested by cOr.!puting the total sum of squares attributable to the 1Z classes with JJ. degrees of freedor.! and subtracting the amount due to the main effects with 1 + 2+1 =~ : ¥, degrees of freedom, leaving 1 interaction degrees of freedom. The analysis of variance is somewhat complicated for the data given in Table 2 because of the disporportionate frequencies from one class to another. Snecedor [12J presents several techniques of ha.ndling this disporportionate problom for JNSTIT~TE OF STATISTICS -8-. SOurcp £! variation ~egrees ~ Concentrates (C) Silage (S) Roughage (R) Mean Sgmre freedom 120.5** 16.1 1.4 211-.4 1 2 1 2 C x S Cx R SxR Cx S x R Within Classes .5 1 2 2 22.0 19.0 19.76 77 The harmonic mean of tho frequencies uas 12/1.91508 = 6.26606. Since the within mean square was 123.8, the error mean square for this analysis was 123.8/6.26606 = 19.76. Note that only C is significant. Silage does not become significant in this analysis as it did in the regression equation (13). This method of unweighted means, in general, is quite good if the class frequencies are not too unequal. We mlght m~ke a cooplete least-squares solution·of the problem, whereby we compute the values of the bi in (15) by the methods given in Table 1. Since the total milk sold summed over any of these classifications (concentrates, silage, or roughage), is the same as the grand total, we actually have only 4 independent bls in (15). The simplest method of taking account of this difficulty is to let each bi2 = 0 and consider the following regression model~ (20) '.'Thore the b l indicates that conditions. \10 have changed the variables to meet the dependency ' Tho coefficients of the bi and m' and the right hand-side of the . equations (4) are (21) r m' ~l b20 b' b ~ 44 30 12 .JQ 9 29 13 0 44 44 tJ. 12 13 26 Tho vnriablo for which each 30 29 ,44 f tA n SIi! 5108.55 2370.01 1694.60 1679.66 ~ 17 2500.98 14 is the leading equation is underlined. 26 14 17 "" .. -9TIe next determine the estimates • ml = 62.4205; 1 b:i.J. =-7.53957; bI, which are as follows: , b20 = -3.09490; I b21 = -1.34.355; , b.31= 0•.37S992 Tho reduction due to these j variates is The reduction due to the ~ feed variates alone is SSR - nY2 =1228. Hence we can build up the following analysis of variance table to test the interactions. DegreeR of freedom 4 Source of varintiQn Mnin effects Concentrates (adj.) Silage (adj.) Roughage 1 2 1 Sum of §quares 1228 Mean Square 1159 1.39 3 Interactions 7 Within Classes 77 Total 88 **Highly significant (probability level of .01) 1159** 69.5 .3 600 9530 rms As before, we note that there are no significant over-all interaction effects. since the interaction moan square is less than the within classes (error) mean square. effects. Next we would like to break down the main effects into the three separate If any effect is to be tested, that regression coefficient (or cients) is omitted fro~ the reduction in total the regression equation and matrix [(20) and SQ~ coeffi~ (21)] and of squares due to the remaining effects is computed. The difference between this reduction and SSR = 294,456 is the sum of squares attributable to the omitted effQct, adjusted for the other etfects. consider the concentrates (adj.) sum of squares. , For example, . When we omit hil from (20) anti omit the second row and column fron {21), the values of the other regression coefficients are: The reduction in S~~ of squares due to those variables is 293,297. concentrate (adjusted) sum of squares is Hence the , . (26) e' =1,159. 294,456 - 293,297 This is the o'nly significant effect. Note that the sum of the main effects sums of squares is greater when they are considered separately than uhen togeth~r. The rncthematical statistician recognizes this is an evidence of a certain correIation betueen the effects because of the disproportionate frequencies. However the frequcmctes in this case were not variant enough to produce much of a discrepancy between the two sums of squares. J.d,. Smmngrv, I have presented an example of the use at tho multiple regression and the analysis of variance models in determining the best use of feed tor producing milk. The analysis of variance model is not quite so sensitive to differences in trds case, because we did lose some information in grouping the independent variables as ue did in the analysis of variance model. The analysis at have boen an exact representation of the experiment, a.nd when an experiment is so planned, 't'1e know that the assumption of fixed Xi is met exactly. In my article on the analysis of hog prices [1 J, I had data on the ratios between the prices received at Cincinnati and Louisville for each of two weight . classes on each of the five complete m!lrket days of the week a.veraged for each month over five years. problem was avoided. Since we had complete data, the disproportionate frequencies In this case the simple analysis of variance techniques for multiple classifications could be used. These examples have been presented to show that there arc certain types of economic data which meet the requirements of the multiple regression model (1). These requirements are: (1) Fixed independent variates from sample to sample, (ii) ; ......... , INSTIT&JTE OF STATISTICS -11.. normal nnd indepondent residuals with the sarno variance, and (iii) the indepondent effects are additive. The oconomist finds that (i) is often not met unless he can plan the experiment as suggested abovo for the dairy study or he can regard tho Xi as rathor typical of the situation which will continue to prevail. Transformations of the type exple.ined by M. S. Bartlett [2 ] can usually be used to handle nonnormality, unequ~l variances, and lack of additivity. For example multiplicity of the effects is reduced to additivity by the use of logarithms. The problem of lack of independence of the residuals and correlation between residuals and the main effects has not as yet been satisfactorily solved. serial correlation ~~y be made here. in th~ hog price article r1J• Some use of the tests for These problems have been discussed in detail An Econometric Model, Using A System of Simultaneous Equations. ~e ,.1 Defects of the Multiple Regression Method. Much economic data do not meet the requirements for using the multiple regression model (1). There are four main difficulties. (i) fv"any of the X-variates cannot be considered any more constant than the Y-variate. As P,rofessor Viarschak has stated [11) .• The economist has no independent variables at his disposal because he has to take the values of all variables as they corne, produced by a mechanism outside his control. This mechanism is eA~ressed by a system of simultaneous equations, as many of them as are variables. The experimenter can isolate one such equation, substituting his own action for all the other equations. The economist cannot. (page 143). For example, the economist may be given a series of average prices and quantitites so~ at these prices. But these series cannot be summarized in a single equation which reflebts how prices and quantities (and other economic variables) interacted to produce the final results. We might draw up a single regression equation expressing price as a function of quantity, time and any other pertinent variables.. However this prediction equation probably would tell us nothing of the price which would have been obtained at a given point of time if some of the 'other variables had taken on different values. This prediction -12equation would be merely a historical record of the price-quantity relationship through time. T. Haavelrno was one of the first econometricians to recognize that if a systom of equations is needed to describe a given economic situation. (3 i tho ostimates of the for any single equation will be biased unless the entire system is considered simultaneously [7] • In order to estimate the parameters in a system of equations, we must use the general method of maximum likelihood. The mathematical complexities in many cases are almost insurnountable. methods of estimation have been developed but DQ adequate tests of significanoe. (ii) Anothor major probleu in economic regression modelS is the of each regression equ<'1.tion. a supply equation or 0. To date, "idcntification~ For example, we may want to estimate a demand and production function and a profit maximizing equation. It is necessary, if the system of equations is to be used as a guide for economic policy, that·the analyst be able to "identify" each equation after the estimation e has been completed. (iii) When the data used in estimating the parameters in an econometric system are derived from a time series, ~e run into another def~ct model - the residuals are not independently distributed. in the usual regression Techniques of handling theso difficulties are being developod, but these techniques merely add to the already overburdoned complexities of the estimation procedures. As noted b~foreJ aberrations fron the other assumptions regarding the distribution of the residuals normality and tho sllme variance - can uSU<'llly be taken care of by the use of transformations. (iv) As noted before, it is also necessary that the effects be additive. Tho problem of setting up systems of economic equations has boen under considoraM.on by the Cowles Coooission for Research in Economics at tho University of Chicago for almost a decade. ~hrschak The research has beon directed by Professor J. TIith tho ablo assistance of such men as T. W. Anderson, W. H. Andrews, • -13M. A. Girshick, T. Haave1no, L. Hur\7icz, L. R. Klein, T. Koopnans, H, B. Mlnn, H. Rubin, G. Tintner, and A. Wald. O. Reiereol, M. G, Kendall, H. Wold and J. R. N. Stone, to nention a few, have also nade important contributions to the problems involved, A list of sone of the pertinent publications is given at the end of this a.rticle. An exhaustive and r:i. gorous troatnent of the theoretical rosults so far obtained will be published soon in g,to.tistical Inference Dvnn.r.U.9 EQ.,ononic Models, Cowles COI!IIJission, Monograph No, 10. in One of the best articles which presents the basic concepts of this new approach to the usc of regression equations in econonics is The Prgbabilit! Approach T. Ho.avolmo [g ~ ..2, ~ Economgtrics by J• An E:@gple of the Simultanoous Equation Model, As an example of the techniques used by those econometricians, I shall use the results presented by e Girshick and Haave1no in Statistical Analysis 2.t ~ Denand for fQod [5]. It is shmm that the denand equation cannot be estimated unless the supply equation is either knoviO or estinated simultaneously with the demand equation. A system " of five equ2tions was set up,l Y1(t) = 0.10 = agO Y4(t) = a40 Y5(t) = 0.50 + a52Y2(t) Yl(t) (27) + a12Yg(t) + o.l3Y3(t) + ~gXg(t) + ~9X9(t) + ~(t) a22Y2(t) + 0.24Y4(t) + b2SXg (t) + + e2(t) Y3(t) = 0.30 + + b37X7 (t) + b39X9(t) + e3(t) In (27) Yl Y2 Y3 -,---- + + D. Y 45 5(t) + + + b46X6(t) + b4gXg(t) + e4(t) + 0SgXg(t) + + o~(t) = consunption of food por capita. ":. = retail price of food products divided by the cost of living. = disposable inconepo~ ca.pita..·d!v1d~~by' the cos~ 9fliving. ~4 = production of ngriculturnl food products per capita. Y5 = prices received by farners for food products, divided by the cost of living, X6 = Y5 for the previous yoar. 1.07 = net investr.lents per capita divided by the cost of living. Is = tine (1922 = 1 through 1941 = 20). X9 = Y3 for the previous year. InC" ngto that aio are not moans but y-intercepts. For example, 0.30 - b39X9. = !) Hence TIe actually use the model corresponding to (7) and (9). - b370 INSTITIaJTE OF STATISTICS -15"'" Incidentally, one important sinplitication 't!ould have been to regard Y4(t) as a _ fixed variable so far as the rotail J:'I.a.rkot was concerned. This vlOuld have reduced tho nunbor of required equations to four. The nunbor of paraneters to be estimted in (27) a.re 19 ~egression coefficients plus tho vnriances and covario.nces of the residuo.ls. As explained before, the data. nero yearly indexes for e,.,ch of the twenty years, 1922-1941. It is impossible at ~·i. this time to estinate how nany degrees of freedom are represented in 5 series of jointly dependent variables (Y) and 4 series of predetermined variables (X), nor can any statement be mde as to the number of degrees of freedom used in estimating the pnraoeters. As I stated a.bove, no tests of significance have been developed to detornine the over-all adequacy of the model a.nd the usefulness of various terms in the individual equations. The complicating feature of a possible serial corre- lation in the series of indexes has also beon neglected in the analysis presented in this article. I do not intond to outline the computing procedures used in estimating the pnranotors. These ha.ve boen adequately prosented in the article itself. and are , too highlynatherontica,l to be condensed into a short summary. th~t I might mention equation 3 in (27) can be handled by the nethod of least squares, since it conto.ins but one dependent va.riable, the other variables being fixed. The problem of identifiability can be illustrated by a, compa.rison of the first two equations in (27). If VIC lot H be the total nunbcr of yt s in a given equation a.nd K the. nunbcr of Xts not used in this equation, then an equation is said to be completelY ;L,gont;J.fiod if K :: H - 1. It is said to be o}'er-identit:j.ed it K:> H - 1. And if K < H - 1, sone restrictions nust be inposed on the vcria.nce-covnriance matrix of the residuals. Honce for identification purposes, K ~ H - 1. emphasized that these are nathenntical linitutions on tho nodel. ~ in the first equation K = H - 1 = 2 and in the second equation K It should be We see that =3 and H- 1 Hence both equations nre identified, but the second is over-identified. The = 2, difference between conplete and over-identification is nerely a difference in conputotional procedures used in estimating the regression coefficients. '"' . a practical st~ndpoint, From identification involves the necessary features in an equation so that it is distinguishable froE all other equations in the systen. Cooparing the first two equations, we note that the demand e~uation has two income variables (ono fixed) and no production variable, while the supply equntion has 0 production variable and no income variables. The estimates of the regression coefficients found in this stuQy were as follows, (28) 0.10 = a20 = 97.677 13.319 a30 40.731 040 0.50 = = a12 a22 a52 == = 0.246 0.157 2.883 0.247 a13 = a.24 = 0.653 a45= 0.556 81.250 = -200.068 b:J.s =- 0.104 b28 = 0.399 b4S b58 bI9 = 0.051 b39= 0.367 b37 = 0.203 b46 The a i 2 regression coeffic~ents =- 0.190 = 0.656 =- 0.300 measure the influence of the retail price of food, respectively, on consumer demand, on retail supply, and on commercial demand. We note that the values of ai2 correspond to the economic theory tho.t consUtler dena.nd should decree.se as the price increases and that the other two should increase. Sioilnrly we note that the dernnnd increases with an increase in the disposable income (a13), that the supply of retail goods increases wit~ an increc.se in theagriculturnl proq.uction (a24), and that agricultural production increases ~lith.an increase in the current price level of fam products (a45). Also we note the depressing influence of the previous year's farm prices on the . . current faro production (b46), which I cannot explain. Also t1?e uneven influence of the time variable (XS) noeds some explanations. ~ Incone for the previous yea.r appears to have bed little influence on the consuoer denand in the cUrrent year (b:J.9) but na.turnl1y \7[tS closely relnted to the current yaa.rls incone (b39). And finally we note the apparent sizea.ble increa.se in incone when net investnent was -17incrensed (bJ7). It is unfortunate that it is not possible to place confidence limits on these estimntes. Ifue were merely interested in forming predictive equations for the dopondent variables(Y) in terns of some or all of the predetermined variables (X), Vie could h:we used the suns of squares and cross-products given in Table 3 and . osti.r.:c.ted, by the single equntion method, the regression coefficien1:l3 in these eq'l1.Q.tions: (29) Ii = 0.1 + b~6 X 6 + b~07 + blsXa + bl9~ (1 = 1, 2, J, 4, 5), ..... where Yi is the predicted value of Yi, neglecting the residual. These equations are ca.lled !treduced" equD.tions and can also be determined by solving the equations in (27), omitting the ei, simultaneously for the Its in terms of the X's. We note that equations 3. in (27) and in (29) are the same,because I3 was estimated only from predetermined variables. Equations (29) in general do not represent fundamenta.l economic relationships, such as demand or supply functions. Koopmans states As (10) • We consider the problem of estimation of parameters like elasticities of demand or supply,marginal propensities to consume, etc., not the problem of prediction of the value of economic va.riables like production, prices, etc. It is true that the use to which we put estimates of the parameters of relationsbetw2en economic variables is again one of prediction. But often this is a different type of prediction, in which the effects of certain presumed acts of policy like price regulation, or instituting compensatory public TIorks, or influencing savings habits, etc., are to be predicted. In such cases we. are dealing ..lith a type of pr~n in \1hich one, or more of tho relations found to govern the past are , and which is therefore not a straight forecast assuming continuation of all past relationships. (Page 451). In this article, uhich presents an excellent condensation of the simultaneous equation method, Koopnans points out that under certain restrictive circumstances it is possible to use the single equation estimates, of the structural coefficients in (27). bI, in (29) to determine all However, these conditions are seldom met in practice, so that it is necessary to use the nore complicated procedures which have bson developed by the Cowles Commission Research 6roup. Koopmans also points out that the use of time-lags often allows the statistician to use single equation estiTh~tion with a system of equations [10] • JNSTITfWTE OF STATISTICS -18- e In order to demonstrate how uell the four predeter~ned variates be used c:s predictors of each of the dependent variates (Y), Vie (X) could present in Table 4 the regression coefficients for tho reduced equations (29) with their standard errors, based on single-equation estimation. Also included is a measure of the adequacy of prediction in oach case, the squared multiple corre1a~ion coefficient 2 R I R2 (30) = S8R/8y2, where sy2 = 8y2 - n,Y:;' • Table 4: Reduced for~l Regression Coefficients and the Standard Errors for the _ Study of the Demand for Food. 1 Dependent Independent Variate Variate Constant aI X6 X7 Xs Xq R2 Yl 87.932 Y2 80.560 Y3 40.731 Y4 97.923 -0.059 (0.052) 0.241 (0.134) 0.040** (0.011) 0.041 (0.028) -0.128 (0.112) 0.649* (0.277) 0.203** (0.030) 0.062* (0.024) 0.161* (0.058) -0.041 (0.085) -0.253 (0.219) 0.154* (0,070) -0.052 (0.179) 0.367** (0.038) 0.180 (0.150) -0.287 (0.370) -0.487* (0.184) Ys 45.072 -0.078 (0.452) *Significant at .05 probability level; **at the .01 level. 0.7595** 0.5865** 0.8833** 0.5651* 0.6549** lThe regression coefficients which are given in the first line for each dependent variable rofor to the reduced. equations (29) in the text. The values in the second line (in parentheses) are the standard errors, computed on the basis of singleequation estimation. From Table 4 we can drau some inferences about the reliability of the estimates of the reduced form regression coefficients. (i) The regression of Y2 on the X's is rather peculiar in that no single regression coeffient is significantly greather than 1ts standard error, yet the over-all reduction is highly significant. (ii) We note tlli~t in all casos the dependent variables declined through tine (XS); hovlcver, the decreases for Yl (food consumption por capita) and Y5 (prices received e by farnors) \70re smaller th.'ln their standa.rd errors. The only real reduction uas in Y4 (agricultural production) probably because of the AAA progran after 1932. -19Renecbor that this ShO,IS a reduction in Ys, after adjusting for the effects of the other indopondent variates. (iii) Prices received by farmers during the previous year (X6) exerted a stiJ:1U1nting effect on the two dependent price variables (Y2 and YS), especially Y5 ,; . f ~hich should be highly related to its previous yearls data. The un~xplaina.b1e noga.tivo influence of X6 on the current agricultural production (I4) is not significant but still disturbing. (iv) Net invostment (X7) stimulated all of the dependent variables, and usually by a. large amount. (v) The influence of the previous yearts income (~) was definitely positive on consumption (II) and of course on tho current year's income (I3). The unexplained negative effects on prices were smaller than their standard errors. If we could solve directly for the structural regression coefficients (27) in terms of the reduced form ones (29), then ~e ~ght attach tentative standard errors to the structural coefficients using the standard errors given in Table 4. It should be reaffirmed that \113 cannot use equations (29) for policy formulations because they do not represent real economic relationships. For example, the Il and Y2 equations (29) are nixod-up supply and demand equations. II cannot be called a denand equa.tion, because it does not consider the influence of current pricos on the consunption; sinilarly Y2 does not consider the influence of consurJption or production on retail prices. In other words, we must consider some of the dependent variables along with the independent variables if a real econonic relationship is to be estinated. Conp8.ring the va.rious R2, we note that the varia.ble requiring only independent va.ria.bles for estina.tion (Y3) he-s by far the Inrgest R2. This rather sustn.ins a statement nade at tho close of the Girshick-Hnnvelno article: -20~ 1'1e should like to point out, however, that when one is searching for economic relations one cannot in general expect to find as high correlations as those obtained from a mechanical application of the method of multiple correlation to the va~iables in the structural equations. For the correlations obtained by the latter procedure are due not only to the acceptance of the same predeterr.lined variables in the equation of t he reduced form, but also to the intercorrelations between the residuals in these equations. The method of multiple correlations would produce high correlation at the expense of a bias in the estimates of the structural coefficients involved. (Page 110). structt~al Girshick and Hao.velmo have used the term "multiple correlation" in the same sense that I have used "multiple regression" and probably refer to the higher correlations which nould be obto.ined \7hen some of the dependent variables are included with the present independent ones in single equation estimation. Note that this inclusion would not be used for Y3, which could be estimated from Its alone. Ag;all these writers have indicated, this analysis using a system of equations cnmlot be said to be complete until we can determine some means of attaching confidenoe intervals to the estimates of the structural regression coefficients. Tintner has developed a suggested plan of determining hO\7 many equations are needed in the system ana also gives some approximate confidence intervals for the regression coefficients [i3] • • However he attacks the problem by assuming the error8 occur in the measurement of the variables and not in the equations themselves. The Coules Commission Group has negleoted errors of measurement and has considered only residual errors in the equations. ,.3, Sl1mmaa. I have attempted to present some results of an analysis of food prices using a system of equations, rather than a simple multiple regression equation. This analysis was used because the estimates of the parameters in a single structural equation will be biased if there is actua.lly a relationship bet.. men this equation and other unspecified equations. Also it is impossible to - . identify a single equation as either a demund, supply or .incone equntion unless all of these equations are included in the system. And the statistioian must be very careful uhen constructing n system of equations that he oan identify each -21equation Hhen the estinntion has been conpleted. The main dravrback in the use of this model of a system of equations is that no tests of significance ere available to detormine either the adequacy of the nodel or the reliability of tho estimatos of the individual parameters. gD..p Ho~ever, I anticipate that this in the theory uill soon be filled. REFERENCES CTIED ,- -\ Llj R. L. Anderson. Use of variance components in the analysis of hog prices in tuo ITh.1.rkets. Jour. Amer. Stat, Assn. 42:612-634 (1947). '~21 L M. S. Bc.rtlott. The use of transformations. [31 1'. r4 -j J s. ~om~. _~_. Dnycr. The solution or simultaneous equations. (1941). 3:39-52 (1947). Psychonetrika 6:101-129 R. A. Fisher. St~tistical Methods for Research Workers. 8th Edition. G. E. Stechert and Co., Novr Y6rk. Section 29 (1941). ~~·A. Girshick and T. E~avolmo. Statistical analysis of the demand for food. ~ononotrica 15:79-110 (1947). R. E. L. Groene. The do. rv farm - its or anozation and co t. Bull. No. 345 (1944 • N. C. Agr. Exp. T. Haavolno. The statistical implications of a system of simultaneous equations. Esonomotrica 11:1-12 (1943). T. Haavo1no e Tho probability approach to econometrics. Supplonent 1-118 (1944) . Econometrica 12, E. O. Hondy. Production functions fron c. random sample of farms. Ecnn. 28:989-1004 (1946). Jour. Farm T. Kocpnans. Statistical estimation of simultaneous economic relations. Jour. An. StrJ.t. Assn. 40 :448-466 (1945). J. Mo.fschak and \.7. H. Andrens. of production. Random simultaneous equations and the theory Econopetrico. 12:143-205 (1944). Snodecor. Statistical Methods. 4th ode (1946). Iowa State College Press, Ames, In. G. Tintnor. tfultiple r.ogression for systems of equations. 5-36 (1946) .. Econometrica 14: -22OTHER REFERENCES T. W. Andersonnnd H. Rubin. Estimation of tho Pnranetors of a Single Stochastic Difference Equation in a Complete System. Cowles Commission Pnper presented at meetings of Institute of ~bth. stat. (1946). G. Cooper. The Role of Econometric Models in Ecomonic Research. 30:101-116 (1948), Jour' Farm Econ. T. HnQvelno. Methods of Measuring the Marginal Propensity to Consune. ~at, 1I.ssn. 42:105-122 (1947). Jour.!.m. _ _~~_" Quantitative Research in Agricultural Economics: The Interdepenieme botYleen ll.griculture and the National Economy. Jour, Farm. Econ. 29:910-924. (1947) • &~ld. ~ DZcomposition Copenhagen 1948). H. Hotelling. ~ ~ Series Problems in Prediction. ~ *me Observations. G. E. C. Gads For1ng, Jour. Sociology 48:61-76 (1942) L. Hurvn.cz, Stochastic Models of Economic Fluctuations, Econometrica 12:114-124 (1944). _____.......__.'. Theory of the Firr.1 and Investment, Econometrica 14:109-136 (1946), ~ G, Kendall. Cont~~butions to the Study of Oscillatory Tine-Series. Cambridge Univ, Press, London. 1946, ____________, The Estimction of Parameters in Autoregressive Time-Series. Paper presonted at the IDterlli'1tional Statisti£!ll Conferences, Wash. D. C., 1947 L. R. Klein. Pitfalls in the Statistical Determination of the Investment Schedule. (1943). Ecen~tric!:. 11 :246-258 • The Use of Econometric Econometrica 15:111-151 (1947). ~ P~dels as a Guide to Economic Policy. T. Koepr.1ans. Linear Regression A~'11ysis of Economio Time Series, ~., No. 20. Haarlem (1937). Moth, Econ. '. Statistical Methods 2! Measuring Economic Relation§hips. Com~ssion for Research in Economics, Chicago (1947) ____~ Cowles H. B. tbnn and A. Wa1d. On tho Statistical Treatment of Linear Stochastic Differenco Equations. §Sonometric~ 11:17,3-220 (1943). f ~ . J. ~brschQk. Economic Intordependence and Statistical lmalysis. Studies in i~then~ticnl Econonics and Econopetrics, in Memory of Henry Schult~. 135-150, (19L~2), . , ._' StQtistica1 Inference from Non-experinontn1 Datu. Paper presented at Intermtional StatisticaJ.Q..onferences, Wash. D. C. (1947). O. Reiers'ol, Confluence Analysis by Means of Lag Moments and Other Methods of Confluence Analysis. Econometrica ._'.-...-... 9: 1-24 (1941). . INSTJTllJTE OF STATISTICS -2.3- o. 1945). _____• Residual Variables in Regression and Confluenoe Analysis. §k.~11.dinavisk ~arietidskr1tt (1945) Uppsa1a. J. R. N. Stone. Prediction from Autoregressive Schemes and Linear Stoohastio . pifference Systems. Paper presented at International Statistical Conferences, Jashington, D. C. (1947). G. Tintner. An Analysis of Economic Time Series. (1940). Jsmr, Am. smt. 4s§n. 35:9.3-100. _ _ _"_. An Applioation of .the Var.iate Differenoe Method to Multiple Regression. Eoonometrig~ 12:97-113 (1941+). ------------. Some Applications of Multivariate Analysis to Economio Data. Am. Stat, Assn, 41:472-500 (1946). H. Wold. AStudy 1n the Analysis of Stationary Time-Series. Uppsa1a (1938). ~. Almquist and Wikse11s, _______.Estination of Economic Relationships. Paper presented at International Sta~istical Conferences, Washington, D. C. (1947).