Download Anderson, R.L.Use of regression techniques for economic data."

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Instrumental variables estimation wikipedia , lookup

Regression toward the mean wikipedia , lookup

Data assimilation wikipedia , lookup

Choice modelling wikipedia , lookup

Time series wikipedia , lookup

Linear regression wikipedia , lookup

Regression analysis wikipedia , lookup

Coefficient of determination wikipedia , lookup

Transcript
~--- -----~--
INSTIT~TE
RLA:2:4S
OF S1ATlSTIC~
THE USE OF .REGRESSION TECHNIQUES WITH ECONOMIC· DATA
•
by
R. L. Anderson
Institute of Statistics, North Carolina state Gollege, Raleigh
h
The MethQ.d Qi
Jr.J"
The
~
Mathem2j~.:i.Q.al.
~
Squares
Model.
! Single MYltiple Regression
~atign.
In the traditional regression analysis we assume
that we have a single dependent variate Y and several independent ft#Jtq variates,
Xi' say J;:in number., These ve.riates are connected by the following mathematical
model
logarithmic values in order to meet the requirement that the variance is constant
from observation to observation.
".
The
st~tisticianls
and an estimate of
(J'-'2
problem is to determine the values of the estimates bi
from the sample of 11. The method of least squares has
~
certain desirable properties and produces the same estimates as the method of
muximum likeliho?d for tho type of distribution assumed for (1). After determining
•
~he
bi' it is also desirable to have estimates of their variances and an over-all
tGst of tho Gdoquacy of the regression model.
tho sum of
sq~~res
of ,the sample residuals,
r
.
SSE =.80 2 = SCy - ~~ b.X )2
i = 1 J. i ,
Tho method of least squares minimizes
The quantity to be minimized 1s
-2-0
where S indicates s1.1.JIlInation over the sample of
We have not used a second
~
subscript to indicate the sample number, because it complicates the equations;
instead it is understood that
whe~ever
S is used, we are summing over the sample.
=0
This is called th€ leading equation for 'bj,. Therefore we have the following ,13et
of
~
equations in the r unknowns, bi_
blSXf + b2SXl X2 + • • • + brSXIXr
= SXIY
bl S:lS.X2 + b2SX~ + • • • + br SX2Xr
= SX2Y
•
•
e
If these equations are solved by the matrix-inversion method, which utilizes
either Fisher's £-values
[41
or Snedecor's k-va1ues [12) , we have at once
the necessary tools to determine an estimate of
QS
(J'2 and the variance of each bi'
well as a test of the adequacy of the entire model.
For a short-cut matrix-
inversion technique, one which we are using at North'Carolina State College, I
refer you to the abbreviated Doolittle method described by p. S. Dw,yer
[3J • I
believe that the use of matrix-inversion m3thods in regression analysis needs a
little promoting.
Once a computer has mastered the.matrix-inversion techniques,
she can handle regression problems in very little more time than rdth the
elimination method which has been generally used in the past.
In addition the
matrix-inversion method furnishes the requisite data with which to compute the
vo.riances of each bi,
slllur~l iJRg
1t !
I'
au
t
1 iiS
,,'I' flUal
~biiUli=:it~:R:!Z8:::s=:I'RiJ!II22l11+1zl!'1PlYc!!'!l2~sl'ii!'eet!l11~R!MI.i~t~kwi'Mia~,-1W;Ail-l&iilil1'l-JII:-!1~s-eglPQ;:L:lI;:;.~.=t::'l=_Q_.
SA [til; 8ld.;.
JilQ:221OI
then
Profossor Pooch of our
staff has written up an excellent description of the abbreviated Doolittle method
~
with an example. Copies can be obtained by writing to the Institute of Statistics,
North Curolina State College.
'it
After we have computed the bi, the total reduction in SY2 attributable
r.
to regression is SSR
= 2:
i
=1
bi(SXiY).
Hence the residual (or error) sum of
squares is
:L bi(SXiY) = Sy2 -
SSE = Sy2 -
(5)
Ssa.
SSE, under the assumptions made with (1), is distributed as
degrees of freedom..
Hence an unbiased estimate of
If we set up the null hypothesis that
X. 2 a-2 ,
distributed as
but with
{(3 i
= o~ ,
X 2 q-:2
with n - r
.)'2 is s2 = SSE/en - r).
then the reduction
SS~ is
also
~ degrees of freedom. Hence we can test this
ntul hypothesis by means of
e
(6)
F=SSR.n-r
SSE
r'
with
~
and n - r degrees of freedom.
In most statistical investigations one of the regression coefficients, say
b]"
estimates?, the mean of Y, which is not assumed to be 0 1I1hen testing the
adequacy of the model (1).
formin~
(7)
This is accomplished by setting Xl
all the other Xi to deviations from their means.
Y=m+
. where xi
i
t=
= (Xi
2
trans-
Hence the model becomes
b:xi + e
it
JC;.)
-
=1, am
and m is the estimate
Of,r.
The least squares equations
are the same as before except the first equation becomes simply
(8)
The other
=y
..§I
·n
m=
~
•
equations use xi instead of Xi with
~
not in the equations.
For
example the b 2 equation is
(9)
b~Sx~
+ b3Sx2x 3 + • • • + br Sx2x r = SX2Y •
As everyone kno1:/s, we compute SXiXj
it {~i = ~
= SItXj
the expected value of Y is
r
SSRI
= 2::
i
=2
bi(SxtY),
Under the null hypothesis,
Jf' , not assumed to be zero.
in sum of squares due to regression is
(10)
SX
_ S1 .1 •
T-he reduction
INSTIT&JTE OF STATISTICS
-4is distributed as f.. 2 (j"'2 with !:..=-1 degrees of freedom \7hen (~i = ~
l
~hich
i
= 2,
3,. • ., r. The error sum of squares is
SSE
(11)
= Sy2
_ ny2 _ SSR',
1. 2:jJ'2,
';ihich is distributed as
as before, with
!l...::-};
degrees of freedom.
If the ma.tri.x .inversion mnthod of determ::r.ing tbe bi is used, tho est,jm'1.ted
variance of each l.>~ Is simply CliS2, where ci:i. i~; the 0.:5.o.5'onI11 elemcn+, 'en tl18
inverted matrix"
estime.te of
(;.,'
:-'.!
~
....
Th'.':;
01'
can be used to a',,-c.a;'Jh cnu'.'idence
""81' i.~.nco
l::m:'~;f)
to tIle
to make a test of significance by means of
.
li
f
hi'"
i
= ---•._ .
-
;
s '\ eii
with n - r degreos of fro8dom.
Finally it should be emphasized thnt the analysis of variance is a special
type of a linear regression, in which the Xi are definitely fixed numbers.
I have
discussed the use of the analysis of variance with economic data emphasizing
I
.,
the assumptions needed,in'an article on the analysis of hog prices ..., 1 -;.
1.2. An Examo10 of a Dairy Cost ·Study: For reasons which will be explained in
the next section, much economic data do not meet the requirements for the multiple
regression analysis.
However certain types of data do seem to meet these
requirements, such as data acquired from a set of farms, business
rla~es,
etc.
at a given period of time in which certain variables can be cons::ldored. to be more
or loss fixod and can be used to estimate some dependent variable"
One such study
was a dairy cost study of 89 dairy farms in central North Carolina in 1941 l.- 6 J1 •
Data were available for each farm on tho amounts and the costs of concentrates,
of silage, and of roughage per cow, the cost of pasture per cow, the amount of
milk sold per
COY/,
the nUlnber of cows, the net incone per cow, the labor in00me
I
p9r
CO~[.
~ow,
the hours of 1nbor per cow, and the total digestible nutrients ('1'D:;) p0.r
It vns docidod to usc milk sold perco"j/ as the dependent variable s:".nnC)
1~~:-'1)1
income was so variable because of varying allowances for the hours of 1ahorusode
A preliminary analysis using labor incone as the dependent variable produced no
significant regression coefficients.
-5I shall demonstrate the multiple regression method with the amount of Dilk
sold per
C0\7
as the dependent variate (Y) and tle amount of concentrates per cow
(~, the a.nount of silage per cO\v (XJ), the amount of roughage per
and pasturG cost per
Ca\"l ( "
as the independent variates.
COil
Cr;4,·
The amounts have been
quoted in hundreds of pounds perc0'/1 and the pasture cost in dollars per con.
Tho regression model is: ~
(13)
Y = m + L.. bixi + e
i = 2
with Xi = Xi ....
Xi.
The computations are presented in Table 1, using the matrix....
inversion method.
Table 1: COf.q:mtations for the Regression of the Anount of Milk Sold on the Amounts
of FOGd Usod per Coo from 89 Dairy Herds.
!(milk sdld)
57.3994
Rums of Squares and Cross-products
x3
x4
Xs
-6616.17 -93.7732 ...484.289
96710.77 3244.25
1358.95
19230.53 -1251.95
1254.57
10....4..
·x··
b2
b3
IO:~rr;5Z~ S(-m~~0556220
2.236903
0.1413543 0.1152455
0.0356220 -0.0249484
0.7459220 -0.0951647
U
s2
3679.74
3905.56
999.432
702.815
g
0.7459220
-0.095164.7
0.6032603
8.963883
Regression Coefficients ~Stangard Errors
= 0.9343 :
0.1348
= 0.0818 : 0.0306
5
SSR' = ;>. biS(xiY)
i =2
SSE
-0.e249484
0.5636625
0.6032603
Y
= 11358.72 -
SSR'
= SSE/84 = 81.2338
b4 = 0.1021 : 0.0677
b5 0.9276 ~ 0.2698
=
= 4535.08
= 6823.64
INSTITrtlTE OF STATISTICS
-6_
Some of the computc.tions may require an expla.nation.
In order to illustrate
how to compute. the values of the bi, consider b2
b2
= 10-4 [(2.236903)(3 679.74)
+ (0.14l3543){3905.56) + • • • +
{O.7459220){702.8l5~
Then the estimated standard error of b 2
other bi, with a standard error of s
=s
fC22
= 0.1348
= .9343
and similarly for the
{'cii •
Of these regression coefficients, all but b4 were highly significant.
in 1941 the cost of concentrates vms $1.80 per
cwt.,
Since
of silage $0.27 per cwt.,
and of roughage $0.8, per cwt. and milk sold for $3.20 per cwt., we would conclude
that added concentrates and pasture were profitable, added silage about paid far
itself, and added roughage was unprofitable.
It should be stated that in a
regression analysis, a.ny regression coefficient indicates the average change to be
expected in the dependent variable for a unit change in the particular independent
tit
variable, with the other independent variables assumed to be held constant.
Far
example in the above analysis, b2 measures the added cwt. of milk expected from
an addition of one cwt. of concentrates to the ration, while leaving the amounts
of the other foeds unchanged.
I wish to warn against using such results for
increases far beyond the means, because we did not have a sufficient range of the
independent variahles to study the diminishing, returns to scale. Apparently
these
COilS
\'/erebeing fed too much roughage.
Even the average amount of roughage
for all hords, 3600 POWlds per carl, was said to be about all a cot! could handle
iThen she \/as on pasture as long as most cows in this region are.
rougl'pge \1ould be expected to be 'Wasted.
Hence added
If the pasture costs refiect the true
cost of pasture, it appears that increased pasture is really the most profitable
. method of producing .milk.
pc. sture might be put in an even more favorable light
if the effects on soil improvement also 'I,tere considered.
4. 3•
cf
Producti.qn F3»Xi:Mgns t91:1~C?\;C. Fg.rIJ§.
733 IO\1o. fc.rns in 1939. n study mrs made
Tho regression nodel used
~ns:
Usil1g date. secured in n ru.ndor.1 s~c
or
tho production functions
[9]
•
Y = m + blXJ. + b2x 2 + b3x 3 + b4x4+ b5x5 + e J
(14)
where the xiare &viations from tho means of the Xi--
Y and Xi are given in
terms of logarithrs for the following variables:
Y = total product : cash s(l.les, home consumption and inventory increases ($)
Xl
value of real estate ($)
X2
nonths of labor (family and hired)
X3 = value of nnchinory and equipment ($)
X/+ =- vo.lue of livestock on hand and livestock expense, including feed ($)
X5
cash operating expenses ($)
=
=
=
E~ch regression coefficient in (14) is a measure of elasticity, indicating
the averag-e proportionnl change in total product which would result from a change
in the: corresponding independent varinte.
For example,
of product with respect to the value of real estate.
[6)
relationship for. the dairy study
b:J.
represents the elasticity
We did not use a
because we did not believe that there ,ms
enough of a. ranee in the varinblm studied to justify using it.
relationship is necessary for
e
dat~
logarithmic
The logarithmic
of this kind if the range in the variables is
very great in order that the assumptions of normo.lity and equal variances for the
residuaJs shm.lld hold.
Also the logarithmic relationship more adequately measures
the diminishing returns to scale.
The data were analyzed separately for each of 5 type of farming areas, for
each of 5 types of farms, for largo and small farns, and for all farms.
The over-all
regression coefficients were:
hI = 0.2316,
(l4n)
b2
= .0282,
b3
= .0844,
b4
= .4767,
with a multiple correlation coefficieat of 0.8882.
the .01
prob~.bi1ity lovol.
bS
= .0317,
All but b2
~ere
significant at
The returnsto labor nere not significant as
ilCS
found
in our da.iry study, prob:'J.bly because of the oxtrone variability in str:.ting the totnl
lc,bor used on o:'.ch fc.rn.
Since the suo of the coefficients is less tha.n unity, ne
concludo that t.here \/ns a dininishing return to scalo, since a 1% incroos-e iuthe
input of nIl rosources vlOuld not be follo\"led by a 1% incronso in output.
Using
the estiuates of tho elasticities (Ua) and the geonetric neans of the variables,
estiI1c.tes of tho J:1a.rginc.1 productivities 'iere 1l1so secured, vrith confidence linits•
••4.
The Use of the
~nalysi2
Qf Va.ric.nce for the Dairy Cost StudY.
The analysis
ofvc.rinnce can also be used to study the relationship betueen the anount of
I
.,
-7milk sold per
Cm"1
fed to each cow.
and the average amount of silage, roughage, and concentrates
I
omitted pasture from this analysis because the problem
h~ve
is already quite complicated for an example.
We divided the silage feeding into
three classes - 0, Low, High; and the concentrates and roughage each into Low
and High amounts per cow.
The Low-High division was made so that about the same
ntunbor of herds would be included in each category.
The dividing point for silage
was 6000 pounds per cow, for concentrates 2900 pounds per cow, and for roughage
3500 pounds per cow. The average milk sold per
each of the 3 x 2 x 2
= 12
CO'17
and the number of herds for
classes arc presented in Table 2.
Table 2: Average C\Tts. of milk sold for different amounts of feed per cow.*
Silage
o
Concontrates
Low
Low 53.06(6)
Roughage
High 46.61(6)
High
Low
Low
High
Low
High
Ave.
High·"··
61.92(8) 53.0,(10) 60.1,(7) 55.42(10) 60.92(3) 56.84(44)
60.12(10) 64.07(')
59.59(9) 55.03(9)
62.87(8) 57.95(45.)
Concentrates
*The figures in parentheses are numbers of cows in each class.
We shall consider hore the follovdng mathematical model:
(15) Y = m + bUXU
+ ~2X12 + b20X20 + b21X21 + b22X
22
+ b'lX'l + b32X32 + e,
\7hore X11 is J. for the 'lotl level of concentrates and 0 elsewhere, 112 is J. for the
high level of concentrates and 0 elsewhere, X20 is
etc.
1 for no silage
and
0 elsewhere,
(15) assumes that the interactions are non-existent. This assumption cap
be tested by cOr.!puting the total sum of squares attributable to the 1Z classes
with JJ. degrees of freedor.! and subtracting the amount due to the main effects with
1 + 2+1
=~
:
¥,
degrees of freedom, leaving 1 interaction degrees of freedom.
The analysis of variance is somewhat complicated for the data given in Table
2 because of the disporportionate frequencies from one class to another.
Snecedor
[12J presents several techniques of ha.ndling this disporportionate problom for
JNSTIT~TE OF STATISTICS
-8-.
SOurcp £! variation
~egrees ~
Concentrates (C)
Silage (S)
Roughage (R)
Mean Sgmre
freedom
120.5**
16.1
1.4
211-.4
1
2
1
2
C x S
Cx R
SxR
Cx S x R
Within Classes
.5
1
2
2
22.0
19.0
19.76
77
The harmonic mean of tho frequencies uas 12/1.91508 = 6.26606.
Since the within
mean square was 123.8, the error mean square for this analysis was 123.8/6.26606 =
19.76.
Note that only C is significant. Silage does not become significant in
this analysis as it did in the regression equation (13). This method of unweighted
means, in general, is quite good if the class frequencies are not too unequal.
We mlght
m~ke
a cooplete least-squares solution·of the problem, whereby
we compute the values of the bi in (15) by the methods given in Table 1.
Since
the total milk sold summed over any of these classifications (concentrates, silage,
or roughage), is the same as the grand total, we actually have only 4 independent
bls in (15).
The simplest method of taking account of this difficulty is to let
each bi2 = 0 and consider the following regression
model~
(20)
'.'Thore the b l indicates that
conditions.
\10
have changed the variables to meet the dependency
'
Tho coefficients of the bi and m' and the right hand-side of the
.
equations (4) are
(21)
r
m'
~l
b20
b'
b
~
44
30
12
.JQ
9
29
13
0
44
44
tJ.
12
13
26
Tho vnriablo for which each
30
29
,44
f
tA
n
SIi!
5108.55
2370.01
1694.60
1679.66
~
17
2500.98
14
is the leading equation is underlined.
26
14
17
""
..
-9TIe next determine the estimates
•
ml = 62.4205;
1
b:i.J. =-7.53957;
bI, which are as follows:
,
b20
= -3.09490;
I
b21
= -1.34.355;
,
b.31= 0•.37S992
Tho reduction due to these j variates is
The reduction due to the
~
feed variates alone is
SSR - nY2
=1228.
Hence we can build up the following analysis of variance table to test the
interactions.
DegreeR of freedom
4
Source of varintiQn
Mnin effects
Concentrates (adj.)
Silage (adj.)
Roughage
1
2
1
Sum of §quares
1228
Mean Square
1159
1.39
3
Interactions
7
Within Classes
77
Total
88
**Highly significant (probability level of .01)
1159**
69.5
.3
600
9530
rms
As before, we note that there are no significant over-all interaction effects.
since the interaction moan square is less than the within classes (error) mean
square.
effects.
Next we would like to break down the main effects into the three separate
If any effect is to be tested, that regression coefficient (or
cients) is omitted
fro~
the reduction in total
the regression equation and matrix [(20) and
SQ~
coeffi~
(21)]
and
of squares due to the remaining effects is computed.
The difference between this reduction and SSR
= 294,456
is the sum of squares
attributable to the omitted effQct, adjusted for the other etfects.
consider the concentrates (adj.) sum of squares.
,
For example,
. When we omit hil from (20)
anti omit the second row and column fron {21), the values of the other regression
coefficients are:
The reduction in
S~~
of squares due to those variables is 293,297.
concentrate (adjusted) sum of squares is
Hence the
,
.
(26)
e'
=1,159.
294,456 - 293,297
This is the o'nly significant effect.
Note that the sum of the main effects sums
of squares is greater when they are considered separately than uhen
togeth~r.
The rncthematical statistician recognizes this is an evidence of a certain correIation betueen the effects because of the disproportionate frequencies.
However the
frequcmctes in this case were not variant enough to produce much of a discrepancy
between the two sums of squares.
J.d,. Smmngrv, I have presented an example of the use at tho multiple regression
and the analysis of variance models in determining the best use of feed tor
producing milk. The analysis of variance model is not quite so sensitive to
differences in trds case, because we did lose some information in grouping the
independent variables as ue did in the analysis of variance model. The analysis at
have boen an exact representation of the experiment, a.nd when an experiment is so
planned, 't'1e know that the assumption of fixed Xi is met exactly.
In my article on the analysis of hog prices [1
J, I had data on the ratios
between the prices received at Cincinnati and Louisville for each of two weight
. classes on each of the five complete m!lrket days of the week a.veraged for each
month over five years.
problem was avoided.
Since we had complete data, the disproportionate frequencies
In this case the simple analysis of variance techniques for
multiple classifications could be used.
These examples have been presented to show that there arc certain types of
economic data which meet the requirements of the multiple regression model (1).
These requirements are: (1)
Fixed independent variates from sample to sample, (ii)
;
......... ,
INSTIT&JTE OF STATISTICS
-11..
normal nnd indepondent residuals with the sarno variance, and (iii) the indepondent
effects are additive.
The oconomist finds that (i) is often not met unless he can
plan the experiment as suggested abovo for the dairy study or he can regard tho Xi
as rathor typical of the situation which will continue to prevail.
Transformations
of the type exple.ined by M. S. Bartlett [2 ] can usually be used to handle nonnormality,
unequ~l
variances, and lack of additivity.
For example multiplicity
of the effects is reduced to additivity by the use of logarithms.
The problem of
lack of independence of the residuals and correlation between residuals and the
main effects has not as yet been satisfactorily solved.
serial correlation
~~y
be made here.
in th~ hog price article
r1J•
Some use of the tests for
These problems have been discussed in detail
An Econometric Model, Using A System of Simultaneous Equations.
~e
,.1
Defects of the Multiple Regression Method.
Much economic data do not meet
the requirements for using the multiple regression model (1).
There are four main
difficulties.
(i)
fv"any of the X-variates cannot be considered any more constant than the
Y-variate. As P,rofessor Viarschak has stated
[11)
.•
The economist has no independent variables at his disposal because
he has to take the values of all variables as they corne, produced by a
mechanism outside his control. This mechanism is eA~ressed by a system
of simultaneous equations, as many of them as are variables. The experimenter can isolate one such equation, substituting his own action for all
the other equations. The economist cannot. (page 143).
For example, the economist may be given a series of average prices and
quantitites
so~
at these prices.
But these series cannot be summarized in a
single equation which reflebts how prices and quantities (and other economic
variables) interacted to produce the final results.
We might draw up a single
regression equation expressing price as a function of quantity, time and any
other pertinent variables.. However this prediction equation probably would tell
us nothing of the price which would have been obtained at a given point of time
if some of the 'other variables had taken on different values. This prediction
-12equation would be merely a historical record of the price-quantity relationship
through time.
T. Haavelrno was one of the first econometricians to recognize
that if a systom of equations is needed to describe a given economic situation.
(3 i
tho ostimates of the
for any single equation will be biased unless the entire
system is considered simultaneously [7]
•
In order to estimate the parameters
in a system of equations, we must use the general method of maximum likelihood.
The mathematical complexities in many cases are almost insurnountable.
methods of estimation have been developed but
DQ
adequate tests of significanoe.
(ii) Anothor major probleu in economic regression modelS is the
of each regression equ<'1.tion.
a supply equation or
0.
To date,
"idcntification~
For example, we may want to estimate a demand and
production function and a profit maximizing equation.
It
is necessary, if the system of equations is to be used as a guide for economic
policy, that·the analyst be able to "identify" each equation after the estimation
e
has been completed.
(iii)
When the data used in estimating the parameters in an econometric system
are derived from a time series,
~e
run into another
def~ct
model - the residuals are not independently distributed.
in the usual regression
Techniques of handling
theso difficulties are being developod, but these techniques merely add to the
already overburdoned complexities of the estimation procedures.
As noted
b~foreJ
aberrations fron the other assumptions regarding the distribution of the residuals normality and tho sllme variance - can uSU<'llly be taken care of by the use of
transformations.
(iv) As noted before, it is also necessary that the effects be additive.
Tho problem of setting up systems of economic equations has boen under
considoraM.on by the Cowles Coooission for Research in Economics at tho University
of Chicago for almost a decade.
~hrschak
The research has beon directed by Professor J.
TIith tho ablo assistance of such men as T. W. Anderson, W. H. Andrews,
•
-13M. A. Girshick, T. Haave1no, L. Hur\7icz, L. R. Klein, T. Koopnans, H, B. Mlnn,
H. Rubin, G. Tintner, and A. Wald. O. Reiereol, M. G, Kendall, H. Wold and
J. R. N. Stone, to nention a few, have also nade important contributions to the
problems involved,
A list of sone of the pertinent publications is given at the
end of this a.rticle.
An exhaustive and r:i. gorous troatnent of the theoretical
rosults so far obtained will be published soon in g,to.tistical Inference
Dvnn.r.U.9 EQ.,ononic Models, Cowles COI!IIJission, Monograph No, 10.
in
One of the best
articles which presents the basic concepts of this new approach to the usc of
regression equations in econonics is The Prgbabilit! Approach
T. Ho.avolmo [g
~ ..2,
~
Economgtrics by
J•
An E:@gple of the Simultanoous Equation Model, As an example of the
techniques used by those econometricians, I shall use the results presented by
e
Girshick and Haave1no in Statistical Analysis 2.t ~ Denand for fQod
[5].
It
is shmm that the denand equation cannot be estimated unless the supply equation
is either knoviO or estinated simultaneously with the demand equation. A system "
of five equ2tions was set up,l
Y1(t)
= 0.10
= agO
Y4(t)
= a40
Y5(t)
= 0.50 + a52Y2(t)
Yl(t)
(27)
+
a12Yg(t) + o.l3Y3(t) + ~gXg(t) + ~9X9(t) + ~(t)
a22Y2(t) + 0.24Y4(t) + b2SXg (t) +
+ e2(t)
Y3(t) = 0.30 +
+ b37X7 (t) + b39X9(t) + e3(t)
In (27) Yl
Y2
Y3
-,----
+
+
D.
Y
45 5(t) +
+
+
b46X6(t) + b4gXg(t) + e4(t)
+
0SgXg(t) +
+ o~(t)
= consunption
of food por capita.
":.
= retail price of food products divided by the cost of living.
= disposable inconepo~ ca.pita..·d!v1d~~by' the cos~ 9fliving.
~4 = production of ngriculturnl food products per capita.
Y5 = prices received by farners for food products, divided by the
cost of living,
X6 = Y5 for the previous yoar.
1.07 = net investr.lents per capita divided by the cost of living.
Is = tine (1922 = 1 through 1941 = 20).
X9 = Y3 for the previous year.
InC" ngto that aio are not moans but y-intercepts. For example, 0.30
- b39X9.
= !)
Hence TIe actually use the model corresponding to (7) and (9).
- b370
INSTITIaJTE OF STATISTICS
-15"'"
Incidentally, one important sinplitication 't!ould have been to regard Y4(t) as a
_
fixed variable so far as the rotail J:'I.a.rkot was concerned.
This vlOuld have reduced
tho nunbor of required equations to four.
The nunbor of paraneters to be estimted in (27) a.re 19 ~egression coefficients
plus tho vnriances and covario.nces of the residuo.ls. As explained before, the data.
nero yearly indexes for e,.,ch of the twenty years, 1922-1941.
It is impossible at
~·i.
this time to estinate how nany degrees of freedom are represented in 5 series of
jointly dependent variables (Y) and 4 series of predetermined variables (X), nor
can any statement be mde as to the number of degrees of freedom used in estimating
the pnraoeters.
As I stated a.bove, no tests of significance have been developed
to detornine the over-all adequacy of the model a.nd the usefulness of various terms
in the individual equations.
The complicating feature of a possible serial corre-
lation in the series of indexes has also beon neglected in the analysis presented
in this article.
I do not intond to outline the computing procedures used in estimating the
pnranotors.
These ha.ve boen adequately prosented in the article itself. and are
,
too highlynatherontica,l to be condensed into a short summary.
th~t
I might mention
equation 3 in (27) can be handled by the nethod of least squares, since it
conto.ins but one dependent va.riable, the other variables being fixed.
The problem
of identifiability can be illustrated by a, compa.rison of the first two equations
in (27).
If
VIC
lot H be the total nunbcr of yt s in a given equation a.nd K the.
nunbcr of Xts not used in this equation, then an equation is said to be completelY
;L,gont;J.fiod if K :: H - 1.
It is said to be o}'er-identit:j.ed it K:> H - 1. And if
K < H - 1, sone restrictions nust be inposed on the vcria.nce-covnriance matrix
of the residuals.
Honce for identification purposes, K
~
H - 1.
emphasized that these are nathenntical linitutions on tho nodel.
~
in the first equation K = H - 1
= 2 and
in the second equation K
It should be
We see that
=3 and
H- 1
Hence both equations nre identified, but the second is over-identified. The
= 2,
difference between conplete and over-identification is nerely a difference in
conputotional procedures
used in estimating the regression coefficients.
'"'
.
a practical
st~ndpoint,
From
identification involves the necessary features in an
equation so that it is distinguishable froE all other equations in the systen.
Cooparing the first two equations, we note that the demand
e~uation
has two
income variables (ono fixed) and no production variable, while the supply equntion
has
0
production variable and no income variables.
The estimates of the regression coefficients found in this stuQy were as
follows,
(28)
0.10 =
a20 =
97.677
13.319
a30
40.731
040
0.50
=
=
a12
a22
a52
==
=
0.246
0.157
2.883
0.247
a13 =
a.24 = 0.653
a45= 0.556
81.250
= -200.068
b:J.s =- 0.104
b28 = 0.399
b4S
b58
bI9 = 0.051
b39= 0.367
b37 = 0.203
b46
The a i 2 regression
coeffic~ents
=- 0.190
= 0.656
=-
0.300
measure the influence of the retail price
of food, respectively, on consumer demand, on retail supply, and on commercial
demand. We note that the values of ai2 correspond to the economic theory tho.t
consUtler dena.nd should decree.se as the price increases and that the other two
should increase. Sioilnrly we note that the dernnnd increases with an increase
in the disposable income (a13), that the supply of retail goods increases
wit~
an
increc.se in theagriculturnl proq.uction (a24), and that agricultural production
increases
~lith.an
increase in the current price level of fam products (a45).
Also we note the depressing influence of the previous year's farm prices on the
.
.
current faro production (b46), which I cannot explain. Also t1?e uneven influence
of the time variable (XS) noeds some explanations.
~
Incone for the previous yea.r
appears to have bed little influence on the consuoer denand in the cUrrent year
(b:J.9) but na.turnl1y
\7[tS
closely relnted to the current yaa.rls incone (b39). And
finally we note the apparent sizea.ble increa.se in incone when net investnent was
-17incrensed (bJ7).
It is unfortunate that it is not possible to place confidence
limits on these estimntes.
Ifue were merely interested in forming predictive equations for the
dopondent variables(Y) in terns of some or all of the predetermined variables (X),
Vie could h:we used the suns of squares and cross-products given in Table 3 and
. osti.r.:c.ted, by the single equntion method, the regression coefficien1:l3 in these
eq'l1.Q.tions:
(29)
Ii = 0.1
+ b~6 X
6 + b~07 +
blsXa + bl9~
(1
= 1,
2, J, 4, 5),
.....
where Yi is the predicted value of Yi, neglecting the residual. These equations
are ca.lled !treduced" equD.tions and can also be determined by solving the equations
in (27), omitting the ei, simultaneously for the Its in terms of the X's.
We
note that equations 3. in (27) and in (29) are the same,because I3 was estimated
only from predetermined variables.
Equations (29) in general do not represent
fundamenta.l economic relationships, such as demand or supply functions.
Koopmans states
As
(10) •
We consider the problem of estimation of parameters like elasticities of
demand or supply,marginal propensities to consume, etc., not the problem
of prediction of the value of economic va.riables like production, prices,
etc. It is true that the use to which we put estimates of the parameters
of relationsbetw2en economic variables is again one of prediction. But
often this is a different type of prediction, in which the effects of certain
presumed acts of policy like price regulation, or instituting compensatory
public TIorks, or influencing savings habits, etc., are to be predicted. In
such cases we. are dealing ..lith a type of pr~n in \1hich one, or more of
tho relations found to govern the past are
, and which is therefore
not a straight forecast assuming continuation of all past relationships.
(Page 451).
In this article, uhich presents an excellent condensation of the simultaneous
equation method, Koopnans points out that under certain restrictive circumstances
it is possible to use the single equation estimates,
of the structural coefficients in (27).
bI,
in (29) to determine all
However, these conditions are seldom met
in practice, so that it is necessary to use the nore complicated procedures which
have bson developed by the Cowles Commission Research 6roup.
Koopmans also points
out that the use of time-lags often allows the statistician to use single equation
estiTh~tion with a system of equations
[10] •
JNSTITfWTE OF STATISTICS
-18-
e
In order to demonstrate how uell the four
predeter~ned variates
be used c:s predictors of each of the dependent variates (Y),
Vie
(X) could
present in Table 4
the regression coefficients for tho reduced equations (29) with their standard
errors, based on single-equation estimation. Also included is a measure of the
adequacy of prediction in oach case, the squared multiple
corre1a~ion
coefficient
2
R I
R2
(30)
= S8R/8y2,
where
sy2
= 8y2 -
n,Y:;' •
Table 4: Reduced for~l Regression Coefficients and the Standard Errors for the
_
Study of the Demand for Food. 1
Dependent
Independent Variate
Variate
Constant
aI
X6
X7
Xs
Xq
R2
Yl
87.932
Y2
80.560
Y3
40.731
Y4
97.923
-0.059
(0.052)
0.241
(0.134)
0.040**
(0.011)
0.041
(0.028)
-0.128
(0.112)
0.649*
(0.277)
0.203**
(0.030)
0.062*
(0.024)
0.161*
(0.058)
-0.041
(0.085)
-0.253
(0.219)
0.154*
(0,070)
-0.052
(0.179)
0.367**
(0.038)
0.180
(0.150)
-0.287
(0.370)
-0.487*
(0.184)
Ys
45.072
-0.078
(0.452)
*Significant at .05 probability level; **at the .01 level.
0.7595**
0.5865**
0.8833**
0.5651*
0.6549**
lThe regression coefficients which are given in the first line for each dependent
variable rofor to the reduced. equations (29) in the text. The values in the second
line (in parentheses) are the standard errors, computed on the basis of singleequation estimation.
From Table 4 we can drau some inferences about the reliability of the estimates
of the reduced form regression coefficients.
(i)
The regression of Y2 on the X's is rather peculiar in that no single regression
coeffient is significantly greather than 1ts standard error, yet the over-all
reduction is highly significant.
(ii)
We note tlli~t in all casos the dependent variables declined through tine
(XS);
hovlcver, the decreases for Yl (food consumption por capita) and Y5 (prices received
e
by farnors) \70re smaller th.'ln their standa.rd errors.
The only real reduction uas
in Y4 (agricultural production) probably because of the AAA progran after 1932.
-19Renecbor that this ShO,IS a reduction in
Ys,
after adjusting for the effects of
the other indopondent variates.
(iii)
Prices received by farmers during the previous year (X6) exerted a
stiJ:1U1nting effect on the two dependent price variables (Y2 and YS), especially
Y5
,; .
f
~hich
should be highly related to its previous yearls data.
The
un~xplaina.b1e
noga.tivo influence of X6 on the current agricultural production (I4) is not
significant but still disturbing.
(iv)
Net invostment (X7) stimulated all of the dependent variables, and usually
by a. large amount.
(v)
The influence of the previous yearts income (~) was definitely positive on
consumption (II) and of course on tho current year's income (I3).
The unexplained
negative effects on prices were smaller than their standard errors.
If we could solve directly for the structural regression coefficients (27)
in terms of the reduced form ones (29), then
~e ~ght
attach tentative standard
errors to the structural coefficients using the standard errors given in Table 4.
It should be reaffirmed that
\113
cannot use equations (29) for policy formulations
because they do not represent real economic relationships.
For example, the Il
and Y2 equations (29) are nixod-up supply and demand equations.
II cannot be
called a denand equa.tion, because it does not consider the influence of current
pricos on the consunption; sinilarly Y2 does not consider the influence of
consurJption or production on retail prices.
In other words, we must consider some
of the dependent variables along with the independent variables if a real econonic
relationship is to be estinated.
Conp8.ring the va.rious R2, we note that the varia.ble requiring only independent
va.ria.bles for estina.tion (Y3) he-s by far the Inrgest R2.
This rather sustn.ins a
statement nade at tho close of the Girshick-Hnnvelno article:
-20~
1'1e should like to point out, however, that when one is searching for
economic relations one cannot in general expect to find as high
correlations as those obtained from a mechanical application of the method
of multiple correlation to the va~iables in the structural equations. For
the correlations obtained by the latter procedure are due not only to the
acceptance of the same predeterr.lined variables in the equation of t he reduced
form, but also to the intercorrelations between the residuals in these
equations. The method of multiple correlations would produce high correlation
at the expense of a bias in the estimates of the structural coefficients
involved. (Page 110).
structt~al
Girshick and Hao.velmo have used the term "multiple correlation" in the same sense
that I have used "multiple regression" and probably refer to the higher
correlations which nould be obto.ined \7hen some of the dependent variables are
included with the present independent ones in single equation estimation.
Note
that this inclusion would not be used for Y3, which could be estimated from Its
alone.
Ag;all these writers have indicated, this analysis using a system of equations
cnmlot be said to be complete until we can determine some means of attaching
confidenoe intervals to the estimates of the structural regression coefficients.
Tintner has developed a suggested plan of determining hO\7 many equations are
needed in the system ana also gives some approximate confidence intervals for
the regression coefficients [i3]
•
•
However he attacks the problem by assuming
the error8 occur in the measurement of the variables and not in the equations
themselves. The Coules Commission Group has negleoted errors of measurement and
has considered only residual errors in the equations.
,.3,
Sl1mmaa.
I have attempted to present some results of an analysis of food
prices using a system of equations, rather than a simple multiple regression
equation. This analysis was used because the estimates of the parameters in a
single structural equation will be biased if there is actua.lly a relationship
bet.. men this equation and other unspecified equations.
Also it is impossible to
- . identify a single equation as either a demund, supply or .incone equntion unless
all of these equations are included in the system. And the statistioian must
be very careful uhen constructing n system of equations that he oan identify each
-21equation Hhen the estinntion has been conpleted.
The main dravrback in the use
of this model of a system of equations is that no tests of significance ere
available to detormine either the adequacy of the nodel or the reliability of
tho estimatos of the individual parameters.
gD..p
Ho~ever,
I anticipate that this
in the theory uill soon be filled.
REFERENCES CTIED
,-
-\
Llj R. L. Anderson. Use of variance components in the analysis of hog prices
in tuo ITh.1.rkets. Jour. Amer. Stat, Assn. 42:612-634 (1947).
'~21
L
M. S. Bc.rtlott. The use of transformations.
[31
1'.
r4
-j
J
s.
~om~. _~_.
Dnycr. The solution or simultaneous equations.
(1941).
3:39-52 (1947).
Psychonetrika 6:101-129
R. A. Fisher. St~tistical Methods for Research Workers. 8th Edition.
G. E. Stechert and Co., Novr Y6rk. Section 29 (1941).
~~·A.
Girshick and T. E~avolmo. Statistical analysis of the demand for food.
~ononotrica 15:79-110 (1947).
R. E. L. Groene. The do. rv farm - its or anozation and co t.
Bull. No. 345 (1944 •
N. C. Agr. Exp.
T. Haavolno. The statistical implications of a system of simultaneous equations.
Esonomotrica 11:1-12 (1943).
T. Haavo1no e Tho probability approach to econometrics.
Supplonent 1-118 (1944)
.
Econometrica 12,
E. O. Hondy. Production functions fron c. random sample of farms.
Ecnn. 28:989-1004 (1946).
Jour. Farm
T. Kocpnans. Statistical estimation of simultaneous economic relations.
Jour. An. StrJ.t. Assn. 40 :448-466 (1945).
J. Mo.fschak and \.7. H. Andrens.
of production.
Random simultaneous equations and the theory
Econopetrico. 12:143-205 (1944).
Snodecor. Statistical Methods.
4th ode (1946).
Iowa State College Press, Ames, In.
G. Tintnor. tfultiple r.ogression for systems of equations.
5-36 (1946) ..
Econometrica 14:
-22OTHER REFERENCES
T. W. Andersonnnd H. Rubin. Estimation of tho Pnranetors of a Single Stochastic
Difference Equation in a Complete System. Cowles Commission Pnper
presented at meetings of Institute of ~bth. stat. (1946).
G. Cooper. The Role of Econometric Models in Ecomonic Research.
30:101-116 (1948),
Jour' Farm Econ.
T. HnQvelno. Methods of Measuring the Marginal Propensity to Consune.
~at, 1I.ssn. 42:105-122 (1947).
Jour.!.m.
_ _~~_" Quantitative Research in Agricultural Economics: The Interdepenieme
botYleen ll.griculture and the National Economy. Jour, Farm. Econ. 29:910-924.
(1947) •
&~ld.
~
DZcomposition
Copenhagen 1948).
H. Hotelling.
~ ~
Series
Problems in Prediction.
~
*me
Observations.
G. E. C. Gads For1ng,
Jour. Sociology 48:61-76 (1942)
L. Hurvn.cz, Stochastic Models of Economic Fluctuations, Econometrica 12:114-124
(1944).
_____.......__.'. Theory of the Firr.1 and Investment, Econometrica 14:109-136 (1946),
~
G, Kendall. Cont~~butions to the Study of Oscillatory Tine-Series. Cambridge
Univ, Press, London. 1946,
____________, The Estimction of Parameters in Autoregressive Time-Series.
Paper presonted at the IDterlli'1tional Statisti£!ll Conferences, Wash. D. C., 1947
L. R. Klein.
Pitfalls in the Statistical Determination of the Investment Schedule.
(1943).
Ecen~tric!:. 11 :246-258
• The Use of Econometric
Econometrica 15:111-151 (1947).
~
P~dels
as a Guide to Economic Policy.
T. Koepr.1ans. Linear Regression A~'11ysis of Economio Time Series,
~., No. 20. Haarlem (1937).
Moth, Econ.
'. Statistical Methods 2! Measuring Economic Relation§hips.
Com~ssion for Research in Economics, Chicago (1947)
____~
Cowles
H. B. tbnn and A. Wa1d. On tho Statistical Treatment of Linear Stochastic
Differenco Equations. §Sonometric~ 11:17,3-220 (1943).
f
~
.
J. ~brschQk. Economic Intordependence and Statistical lmalysis. Studies in
i~then~ticnl Econonics and Econopetrics, in Memory of Henry Schult~.
135-150, (19L~2),
.
,
._' StQtistica1 Inference from Non-experinontn1 Datu. Paper presented
at Intermtional StatisticaJ.Q..onferences, Wash. D. C. (1947).
O. Reiers'ol, Confluence Analysis by Means of Lag Moments and Other Methods of
Confluence Analysis. Econometrica
._'.-...-... 9: 1-24 (1941).
.
INSTJTllJTE OF STATISTICS
-2.3-
o.
1945).
_____• Residual Variables in Regression and Confluenoe Analysis.
§k.~11.dinavisk ~arietidskr1tt (1945) Uppsa1a.
J. R. N. Stone. Prediction from Autoregressive Schemes and Linear Stoohastio
. pifference Systems. Paper presented at International Statistical Conferences,
Jashington, D. C. (1947).
G. Tintner. An Analysis of Economic Time Series.
(1940).
Jsmr, Am.
smt. 4s§n. 35:9.3-100.
_ _ _"_. An Applioation of .the Var.iate Differenoe Method to Multiple Regression.
Eoonometrig~ 12:97-113 (1941+).
------------. Some Applications of Multivariate Analysis to Economio Data.
Am. Stat, Assn, 41:472-500 (1946).
H. Wold. AStudy 1n the Analysis of Stationary Time-Series.
Uppsa1a (1938).
~.
Almquist and Wikse11s,
_______.Estination of Economic Relationships. Paper presented at International
Sta~istical Conferences, Washington, D. C. (1947).