AN ASSESSMENT AND SOME APPLICATIONS
OF THE EMPIRICAL BAYES APPROACH
TO RANDOM REGRESSION MODELS

by

BARIZI

Institute of Statistics
Mimeograph Series No. 85P
January 1973
TABLE OF CONTENTS

LIST OF TABLES

1. INTRODUCTION

2. REVIEW OF LITERATURE

3. BAYES AND EMPIRICAL BAYES ESTIMATION IN RANDOM REGRESSION MODELS
   3.1 Bayes Estimation In General
   3.2 Least Squares Estimation
   3.3 Bayes Estimation In Random Regression Models
   3.4 Empirical Bayes Estimation
   3.5 Remarks On Empirical Bayes Estimation

4. COMPARISON OF EMPIRICAL BAYES TO ORDINARY LEAST SQUARES ESTIMATORS
   4.1 Objectives Of The Study
   4.2 Generation Of Random Variables c, α, β and σ²
   4.3 Computations Of The OLS And EB Estimates
   4.4 Results

5. APPLICATIONS TO SOME PROBLEMS IN ECONOMETRICS
   5.1 Investment Analysis
   5.2 Elasticities of Substitution

6. SUMMARY, CONCLUSIONS AND RECOMMENDATIONS
   6.1 Summary
   6.2 Conclusions and Recommendations

7. REFERENCES

8. APPENDIX
LIST OF TABLES

3.1 Relation of C.V. and b/a
4.1 Several forms of prior densities of c, α, β and σ² studied
4.2 Ratio of the average squared error of EB to OLS estimators for cases 1, 2 and 3
4.3 Estimates for the marginal bias of σ̂² for cases 1, 2 and 3
4.4 The ratio R_σ for cases 1, 2 and 3 after applying several correction factors (C.F.) for σ̂²
4.5 Estimates for the marginal bias of .80 σ̂² for cases 1, 2 and 3
4.6 The average of EB estimates for σ² (uncorrected) and estimates of its marginal bias for case 1, where p = 2 and N = 10
4.7 The ratio of the average squared error of EB to OLS estimators for cases 4, 5, 6, 7, 8 and 9
5.1 OLS and EB estimates for regression coefficients and error variances for the ten corporations
5.2 Zellner's two-stage LS estimates for regression coefficients for the ten corporations
5.3 OLS and EB estimates for elasticities of substitution in import demand between flue-cured tobacco from the United States and tobacco from other countries
8.1 Generated values of c, α, β, σ² and their OLS and EB estimates, taken from one replication, where N = 10, T = 15, for cases 1 through 9
1. INTRODUCTION
Linear regression models have been widely used in econometric analysis of time series and/or cross section data. In many reported cases, the regression coefficient vector as well as the variance of disturbances is assumed to be fixed over successive observations. However, in some applications, the constancy of the coefficient vector may be questioned. For example, suppose a particular coefficient represents the response of a plant to nitrogen fertilizer. It is well known that this response is strongly influenced by temperature, rainfall and soil characteristics. If these can be held constant, we might expect the coefficient to be constant. If they vary but can be observed, it is desirable to incorporate their influences into the model; but if they are unobserved, it might be better to assume the coefficient to be random across locations and fixed within each location.
Similar considerations may also apply in econometric studies. As observed by Klein (1953), it is unlikely that the interindividual differences observed in a cross section sample can be explained by a simple regression with a few independent variables. In deriving the production function, the supply function for factors, and the demand function for a product, Nerlove (1965) found it appropriate to treat the elasticities of output with respect to inputs, and of factor supplies and product demand, as random variables differing from firm to firm. Kuh (1959) and Nerlove (1965) treated the intercept as random and the slopes as fixed parameters in estimating a relationship from a time series of cross sections. Zellner (1966) pointed out that there would be no aggregation bias in the least squares estimates of coefficients in a macro-equation if we assumed the coefficients of the micro-equations to be random.
By assuming the coefficient vector to be random across units, Swamy (1970) presented a consistent and asymptotically efficient estimator for the mean, and an unbiased estimator for the variance-covariance matrix, of the coefficient vector. If we can assume the coefficient vector to be random across units but fixed within units, as proposed by Swamy (1970), there is no reason why we cannot assume that the X matrix and/or the variance of residuals are also random across units but fixed within units, since we may consider these units as a random sample from a larger population. These are the regression models, i.e., models obtained by assuming that the coefficient vector, the variance of residuals, the X matrix, or some combination of them is random across units but fixed within units, that will be studied in this thesis; we may call them random regression models. The regression models described by Swamy (1970) above are called random coefficient regression models.
As pointed out by Swamy (1970), there is a close connection between random coefficient regression models and Bayesian inference, since the latter is also based on the assumption that a parameter, in this case the coefficient vector, is a random variable with a fixed, known a priori distribution. Any restriction or prior knowledge about the possible values of a parameter can be incorporated quite readily via an a priori distribution, before performing the analysis of the data. However, a common objection to Bayesian inference is the requirement that the form of the a priori distribution be assumed known, which many statisticians consider unrealistic, although Jeffreys (1961) suggested using a diffuse prior when there is no knowledge at all about its form.
Bayesian analysis has been applied to several econometric problems; for example, Tiao and Zellner (1965) on multiple regression models with autocorrelated errors, Zellner and Park (1965) on a class of distributed lag models, Zellner and Geisel (1970) on distributed lag models with applications to quarterly consumption function estimation, Chetty (1971) on Solow's distributed lag models, and Chetty (1968) on pooling time series and cross section data.
In this thesis we shall present another approach, called empirical Bayes (Robbins, 1955), to estimation of the coefficient vector and the variance of disturbances for random regression models. Four multiple linear regression models are considered and empirical Bayes estimators for each model are presented. The difference between empirical Bayes and Bayesian inference is that the former assumes no knowledge whatever of the form of the prior distribution of a parameter. A comparison of empirical Bayes to ordinary least squares estimators is also given for a special case of a simple linear regression model.
2. REVIEW OF LITERATURE
The empirical Bayes approach was introduced by Robbins (1955). It is applicable when the same decision problem presents itself repeatedly and independently with a fixed but unknown a priori distribution of the parameter. The statistical decision problem that we consider comprises:

(1) a parameter space $\Theta$ with generic element $\theta$;
(2) an observable random variable $X$ belonging to a space $\mathcal{X}$ on which a $\sigma$-finite measure $\mu$ is defined; when the parameter is $\theta$, $X$ has a density $f(x\mid\theta)$ with respect to $\mu$;
(3) a decision space $\mathcal{D}$ with generic element $\delta$; in an estimation problem we take $\mathcal{D} = \Theta$ and $\delta$ as an estimator of $\theta$;
(4) a loss function $L(\delta,\theta) \ge 0$, representing the loss we incur in taking $\delta$ as an estimate of $\theta$ when the true parameter is $\theta$;
(5) an a priori distribution $G$ of $\theta$ defined on $\Theta$, which is unknown.
The problem is to choose a function $\delta(x) \in \mathcal{D}$ such that, when we observe $x$, we shall take $\delta(x)$ as an estimate of $\theta$ and thereby incur the loss $L[\delta(x),\theta]$. The expected loss when $\theta$ is the true parameter is given by

$$R(\delta,\theta) = \int_{\mathcal{X}} L[\delta(x),\theta]\,f(x\mid\theta)\,d\mu(x) . \tag{2.1}$$
Hence, the overall expected loss when $\theta$ has an a priori distribution $G$ is

$$R(\delta,G) = \int_{\Theta} R(\delta,\theta)\,dG(\theta) , \tag{2.2}$$

which is called the Bayes risk relative to $G$. Rewriting (2.2) by using (2.1),

$$R(\delta,G) = \int_{\Theta}\int_{\mathcal{X}} L[\delta(x),\theta]\,f(x\mid\theta)\,d\mu(x)\,dG(\theta) = \int_{\mathcal{X}} \varphi_G[\delta(x),x]\,d\mu(x) \tag{2.3}$$

where

$$\varphi_G[\delta(x),x] = \int_{\Theta} L[\delta(x),\theta]\,f(x\mid\theta)\,dG(\theta) . \tag{2.4}$$

The objective is to find $\delta_G$ such that

$$\varphi_G[\delta_G(x),x] = \min_{\delta\in\mathcal{D}} \varphi_G[\delta(x),x] , \tag{2.5}$$

which means that for any $\delta\in\mathcal{D}$ we have

$$R(\delta_G,G) \le R(\delta,G) . \tag{2.6}$$

Since no assumption is made about the form of $G$, (2.2) through (2.6) cannot be derived and, therefore, $\delta_G$ cannot be obtained either. However, a sequence $\{\delta_N\}$ which is asymptotically optimal relative to $G$ was proposed by Robbins (1964). We may consider $\delta_N$ as an estimate of $\delta_G$. The procedure is as follows.
Let $(x_1,\theta_1), (x_2,\theta_2), \ldots, (x_N,\theta_N)$ be a random sample of size $N$ from a population of $(x,\theta)$'s whose joint distribution function is $F(x\mid\theta)G(\theta)$. Note that $x_1, x_2, \ldots, x_N$ are observable random variables, while $\theta_1, \theta_2, \ldots, \theta_N$ are unobservable.

Now, construct a function $\delta_N(x_{N+1}) \in \mathcal{D}$, so that we shall take $\delta_N(x_{N+1})$ as an estimate of $\theta_{N+1}$ and thereby incur the loss $L[\delta_N(x_{N+1}),\theta_{N+1}]$. For a given sequence $\{\delta_N\}$, the expected loss in estimating $\theta_{N+1}$, given $x_1, x_2, \ldots, x_N$, by (2.3), is

$$R(\delta_N,G \mid x_1,\ldots,x_N) = \int_{\mathcal{X}} \varphi_G[\delta_N(x),x]\,d\mu(x) , \tag{2.7}$$

and hence the overall expected loss is given by

$$R(\delta_N,G) = E\,R(\delta_N,G \mid x_1,\ldots,x_N)$$

where $E$ denotes the expectation with respect to the $N$ independent random variables $x_1, x_2, \ldots, x_N$, which have a common (marginal) density with respect to $\mu$ on $\mathcal{X}$, given by

$$f_G(x) = \int_{\Theta} f(x\mid\theta)\,dG(\theta) . \tag{2.8}$$

From (2.6) and (2.7), it follows that $R(\delta_N,G) \ge R(\delta_G,G) \equiv R(G)$.
If a sequence $\{\delta_N\}$ can be found such that

$$\lim_{N\to\infty} R(\delta_N,G) = R(G) ,$$

we say that $\{\delta_N\}$ is asymptotically optimal relative to $G$. The problem now is whether such a sequence exists and, further, whether a sequence $\{\delta_N\}$ which is asymptotically optimal relative to every $G \in \mathcal{G}$ can be found; $\mathcal{G}$ may be the class of all possible distributions on $\Theta$.
One way to obtain $\{\delta_N\}$ is to find a sequence $\{G_N(\theta)\}$ of distribution functions of $\theta$, as estimators of $G(\theta)$, such that $G_N(\theta) \xrightarrow{d} G(\theta)$ as $N\to\infty$, i.e.,

$$\lim_{N\to\infty} G_N(\theta) = G(\theta)$$

at every continuity point of $G$. Robbins (1964) presented a general method for constructing a particular sequence $G_N(\theta)$ for an unknown $G(\theta)$, and then proved a theorem that, under appropriate conditions on the family of $f(x\mid\theta)$, $G_N(\theta) \to G(\theta)$ as $N\to\infty$ will hold for any $G$ whatever. Denoting

$$F_G(x) = \int_{-\infty}^{\infty} F(x\mid\theta)\,dG(\theta) ,$$
these conditions are:

(1) for every $x$, $F(x\mid\theta)$ is a continuous function of $\theta$;
(2) both $\lim_{\theta\to-\infty} F(x\mid\theta) = F(x\mid-\infty)$ and $\lim_{\theta\to\infty} F(x\mid\theta) = F(x\mid\infty)$ exist for every $x$;
(3) neither $F(x\mid-\infty)$ nor $F(x\mid\infty)$ is a distribution function;
(4) if $G_1$, $G_2$ are any distribution functions of $\theta$ such that $F_{G_1}(x) = F_{G_2}(x)$ for every $x$, then $G_1 = G_2$.
Robbins (1964) also gave a method of obtaining directly a sequence $\{\delta_N\}$, which bypasses the estimation of $G$, if the parametric family $f(x\mid\theta)$ and the loss function are given.

Several other works on empirical Bayes techniques for estimation or testing hypotheses were reported by Samuel (1963), Krutchkoff (1967), Clemmer and Krutchkoff (1968), Rutherford and Krutchkoff (1969), Martz and Krutchkoff (1969) and Maritz (1969). Among these works, only that of Martz and Krutchkoff (1969) will be cited below, since it is closely related to the work presented in this thesis.
They considered a sequence of experiments using a multiple linear regression model

$$y_i = X\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N \tag{2.10}$$

where $y_i$ is a $k\times 1$ observable random vector, $X$ is a $k\times p$ matrix of nonstochastic independent variables which is common to all experiments, $\beta_i$ is a $p\times 1$ coefficient vector which is the $i$th realization of a random vector $\beta$ with an unknown and unspecified prior distribution $G(\beta)$, and $\epsilon_i$ is a $k\times 1$ random vector distributed as normal with mean $0$ and common known variance $\sigma^2 I$. By making use of observations from previous experiments, they derived an empirical Bayes estimator for the coefficient vector in the $N$th
experiment. They also pointed out, by a Monte Carlo study using a simple linear regression model, that the larger $N$, the smaller the ratio of the average squared error for the empirical Bayes estimator of $\beta_N$ to the average squared error of the maximum likelihood estimator. With the number of experiments as small as five, a substantial improvement for the empirical Bayes estimator was obtained in many cases.

The degree of improvement is also affected by the form of the prior distribution of $\beta$; an exceptionally diffuse prior will give a ratio of average squared errors close to one for every value of $N$. If $\sigma^2$ is unknown, as is the case in practice, they suggested using $s_N^2$, the least squares estimator for $\sigma^2$ computed from the present (the $N$th experiment) data. By using $s_N^2$ instead of $\sigma^2$, they found only a slight effect on the ratio of the average squared errors.
3. BAYES AND EMPIRICAL BAYES ESTIMATION IN RANDOM REGRESSION MODELS

Before we consider the multiple linear regression model, let us briefly review Bayes estimation in general.
3.1 Bayes Estimation In General

Let $X$ be an observable random variable in a Euclidean space $\mathcal{X}$ with a conditional density $f(x\mid\theta)$, where $\theta$ is the parameter.¹ $\theta$ is assumed to be a realization of an unobservable random variable whose (prior) density is $g(\theta)$ over a parameter space $\Theta$. The joint density of $X$ and $\theta$ is

$$h(x,\theta) = f(x\mid\theta)\,g(\theta) , \tag{3.1}$$

the marginal density of $X$ is

$$h(x) = \int_{\Theta} f(x\mid\theta)\,g(\theta)\,d\theta , \tag{3.2}$$

and the posterior density of $\theta$ given $X = x$ is

$$p(\theta\mid x) = \frac{h(x,\theta)}{h(x)} . \tag{3.3}$$

By using a quadratic loss function $L(\hat\theta,\theta) = (\hat\theta-\theta)^2$, where $\hat\theta$ is an estimator of $\theta$, the overall (Bayes) risk is given by

$$R(\hat\theta) = \int_{\mathcal{X}}\int_{\Theta} (\hat\theta-\theta)^2\,h(x,\theta)\,d\theta\,dx . \tag{3.4}$$

The Bayes estimator for $\theta$, obtained by minimizing (3.4) with respect to $\hat\theta$, is

$$\hat\theta_B = E(\theta\mid x) = \int_{\Theta} \theta\,p(\theta\mid x)\,d\theta . \tag{3.5}$$

Note that $\hat\theta_B$ is a function of $x$ only; therefore, its expectation is given by

$$E(\hat\theta_B) = \int_{\mathcal{X}}\int_{\Theta} \theta\,h(x,\theta)\,d\theta\,dx = \int_{\Theta} \theta\,g(\theta)\,d\theta = E(\theta) . \tag{3.6}$$

$\hat\theta_B$ is called marginally unbiased.

Further, it will be shown that a Bayes estimator based on a sufficient statistic for $\theta$ is the same as that of (3.5). Let $T$ be a sufficient statistic for $\theta$. By the Neyman factorization theorem (Kendall and Stuart, 1961), $f(x\mid\theta)$ can be factorized into

$$f(x\mid\theta) = f(t\mid\theta)\,q(x) \tag{3.7}$$

where $f(t\mid\theta)$ is the conditional density of $T$ over $\mathcal{T}$ and $q(x)$ is a function of $x$ that does not involve $\theta$. The posterior density of $\theta$ given $X = x$, as in (3.3), can be written by substituting (3.7) as

$$p(\theta\mid x) = \frac{f(t\mid\theta)\,g(\theta)}{\int_{\Theta} f(t\mid\theta)\,g(\theta)\,d\theta} = f(\theta\mid t) . \tag{3.8}$$

Substituting (3.8) into (3.5) gives us

$$\hat\theta_B = E(\theta\mid t) , \tag{3.9}$$

which is what we wanted to show. In the presentation of the multiple linear regression model that follows, Bayes estimators will be based on sufficient statistics rather than the original random variables.

¹The symbol $f$ in $f(x\mid\theta)$, $f(t\mid\theta)$ and $f(\theta\mid t)$ does not represent the same functional form; its use is merely to indicate the density function, whatever form it might take.
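The following sketch (not part of the thesis; all numerical values are illustrative) checks (3.5) and (3.6) in the conjugate normal case, where the posterior mean is available in closed form:

```python
import numpy as np

# A minimal sketch: Bayes estimation under quadratic loss in the conjugate
# normal case.  theta ~ N(mu0, tau2) plays the role of g(theta), and
# x | theta ~ N(theta, sig2) plays the role of f(x|theta).
rng = np.random.default_rng(0)
mu0, tau2, sig2 = 5.0, 4.0, 1.0

def bayes_estimate(x):
    # Posterior mean (3.5): a precision-weighted average of prior mean and x.
    w = tau2 / (tau2 + sig2)
    return w * x + (1 - w) * mu0

# Check marginal unbiasedness (3.6): E(theta_hat_B) = E(theta) = mu0.
theta = rng.normal(mu0, np.sqrt(tau2), 200_000)
x = rng.normal(theta, np.sqrt(sig2))
print(bayes_estimate(x).mean())   # close to 5.0 = E(theta)
```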
3.2 Least Squares Estimation

Consider the following multiple linear regression model

$$y = X\beta + \epsilon \tag{3.10}$$

where $y$ is $T\times 1$, $X$ is $T\times p$, nonstochastic and of full rank, $\epsilon$ is distributed with mean $0$ and variance $\sigma^2 I$, and $(\beta,\sigma^2)$ are unknown fixed parameters. It is well known that the least squares estimators for $\beta$ and $\sigma^2$ are respectively given by

$$b = (X'X)^{-1}X'y \tag{3.11}$$

and

$$s^2 = \frac{(y-Xb)'(y-Xb)}{T-p} . \tag{3.12}$$

Under the additional assumption of normality of $\epsilon$, (3.11) and (3.12) are also the maximum likelihood estimators for $\beta$ and $\sigma^2$, respectively, after correction for bias; $b$ is the best linear unbiased estimator for $\beta$, and $s^2$ is the best unbiased estimator for $\sigma^2$.

The density function of $y$ can be expressed in terms of $b$ and $S = (T-p)s^2$ as

$$f(y\mid\beta,\sigma^2) = \frac{1}{(\sigma\sqrt{2\pi})^T}\,\exp\Big\{-\frac{1}{2\sigma^2}\big[S + (b-\beta)'X'X(b-\beta)\big]\Big\} . \tag{3.13}$$

Therefore, by the Neyman factorization theorem (Kendall and Stuart, 1961), $b$ and $S$ are jointly sufficient for $(\beta,\sigma^2)$. Further, it has been shown (Graybill, 1961) that $b$ and $S$ are independently distributed as $N[\beta,\sigma^2(X'X)^{-1}]$ and $\chi^2\sigma^2$ with $(T-p)$ degrees of freedom, respectively.
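As a concrete illustration of (3.11) through (3.13), the quantities $b$, $s^2$ and $S$ can be computed as in the sketch below (assumed values, not thesis code):

```python
import numpy as np

# Sketch: the least squares quantities of (3.11)-(3.12) and the jointly
# sufficient statistics (b, S) of (3.13), for an illustrative design.
rng = np.random.default_rng(1)
T, p = 15, 2
X = np.column_stack([np.ones(T), np.linspace(1.0, 2.5, T)])  # T x p, full rank
beta, sigma = np.array([10.0, 5.0]), 7.0
y = X @ beta + rng.normal(0.0, sigma, T)

b = np.linalg.solve(X.T @ X, X.T @ y)      # (3.11)
resid = y - X @ b
s2 = resid @ resid / (T - p)               # (3.12)
S = (T - p) * s2                           # S = (T-p)s^2, chi-square sigma^2 on T-p df
print(b, s2, S)
```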
3.3 Bayes Estimation In Random Regression Models

In this section, four variants of the multiple linear regression model and their respective Bayes estimators will be presented. The four variants of the model will be denoted by Models A, B, C and D, as described below. Bayes estimation for Model A follows that of Martz and Krutchkoff (1969), while the results for Models B, C and D were developed analogously.
Model A:

$$y = X\beta + \epsilon \qquad (y:\;T\times 1,\quad X:\;T\times p,\quad \beta:\;p\times 1,\quad \epsilon:\;T\times 1)$$

where

(a) $\epsilon$ is distributed as $N(0,\sigma^2 I)$;
(b) $X$ and $\sigma^2$ are fixed and known, where $X$ is a matrix of full rank;
(c) $\beta$ is an unknown realization of a random variable $\beta$ with a known prior density $g(\beta)$.

Note that once the random variable $\beta$ takes a value, it is assumed to be constant over the $T$ successive observations. The Bayes estimator for $\beta$ will be based on a sufficient statistic $b$ for $\beta$, as in (3.9). The conditional density of $b$ is given by

$$f(b\mid X,\beta,\sigma^2) = \frac{|X'X|^{1/2}}{(\sigma\sqrt{2\pi})^p}\,\exp\Big\{-\frac{1}{2\sigma^2}(b-\beta)'X'X(b-\beta)\Big\} . \tag{3.14}$$

As an aside, differentiating (3.14) gives

$$\frac{\partial}{\partial b}f(b\mid X,\beta,\sigma^2) = -\frac{1}{\sigma^2}X'X(b-\beta)\,f(b\mid X,\beta,\sigma^2)$$

or

$$\beta\,f(b\mid X,\beta,\sigma^2) = \Big[b + \sigma^2(X'X)^{-1}\frac{\partial}{\partial b}\Big]\,f(b\mid X,\beta,\sigma^2) . \tag{3.15}$$

The posterior density of $\beta$ given $(X,b,\sigma^2)$ is

$$p(\beta\mid X,b,\sigma^2) = \frac{h(b,\beta\mid X,\sigma^2)}{h_g(b\mid X,\sigma^2)} = \frac{f(b\mid X,\beta,\sigma^2)\,g(\beta)}{h_g(b\mid X,\sigma^2)} \tag{3.16}$$

where $h(b,\beta\mid X,\sigma^2)$ is the conditional joint density of $b$ and $\beta$, while

$$h_g(b\mid X,\sigma^2) = \int f(b\mid X,\beta,\sigma^2)\,g(\beta)\,d\beta .$$

Therefore, from (3.9), the Bayes estimator for $\beta$ is given by

$$\hat\beta = E(\beta\mid X,b,\sigma^2) = \int \beta\,p(\beta\mid X,b,\sigma^2)\,d\beta . \tag{3.17}$$

By substituting (3.15) into (3.17), we obtain

$$\hat\beta = \frac{1}{h_g(b\mid X,\sigma^2)}\int\Big[b + \sigma^2(X'X)^{-1}\frac{1}{f}\frac{\partial f}{\partial b}\Big]f(b\mid X,\beta,\sigma^2)\,g(\beta)\,d\beta = b + \sigma^2(X'X)^{-1}\,\frac{h_g^{(1)}(b\mid X,\sigma^2)}{h_g(b\mid X,\sigma^2)} \tag{3.18}$$

where $h_g^{(1)}(b\mid X,\sigma^2)$ is the first partial derivative of $h_g(b\mid X,\sigma^2)$ with respect to $b$. From (3.6), we have that $\hat\beta$ is marginally unbiased, i.e., $E(\hat\beta) = E(\beta)$. Note that $h_g(b\mid X,\sigma^2)$ depends on $g(\beta)$, which we denote by putting a subscript $g$.
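A quick numerical check of the representation (3.18) is possible in the scalar case $p = 1$ with a normal prior; the values below are hypothetical, but the identity the sketch verifies is exact:

```python
import numpy as np

# Sketch checking (3.18) for p = 1 with a normal prior g(beta) = N(mu0, tau2).
mu0, tau2 = 5.0, 4.0            # prior mean and variance (illustrative)
sig2, Sxx = 2.0, 10.0           # sigma^2 and the scalar X'X
b = 6.3                         # an observed least squares value

# Marginal density of b: h_g(b) = N(mu0, tau2 + sig2/Sxx), hence
# h_g'(b)/h_g(b) = -(b - mu0)/(tau2 + sig2/Sxx).
v = tau2 + sig2 / Sxx
score = -(b - mu0) / v
beta_via_318 = b + (sig2 / Sxx) * score          # equation (3.18)

# Direct conjugate posterior mean, for comparison.
w = tau2 / v
beta_posterior = w * b + (1 - w) * mu0
print(beta_via_318, beta_posterior)              # identical
```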
Model B:

$$y = X\beta + \epsilon \qquad (y:\;T\times 1,\quad X:\;T\times p,\quad \beta:\;p\times 1,\quad \epsilon:\;T\times 1)$$

where

(a) $\epsilon$ is distributed as $N(0,\sigma^2 I)$;
(b) $X$ is fixed, known and of full rank;
(c) $(\beta,\sigma^2)$ is an unknown realization of a random variable $(\beta,\sigma^2)$ with a known joint prior density $g(\beta,\sigma^2)$.

From (3.13), we know that $b$ and $S = (T-p)s^2$ are jointly sufficient for $(\beta,\sigma^2)$ and are independently distributed as $N[\beta,(X'X)^{-1}\sigma^2]$ and $\chi^2\sigma^2$ with $(T-p)$ degrees of freedom, respectively. The Bayes estimators for $\beta$ and $\sigma^2$, therefore, will be based on these sufficient statistics. Denote by $f(b\mid X,\beta,\sigma^2)$ the conditional density of $b$, which is $N[\beta,(X'X)^{-1}\sigma^2]$, and by $f_{T-p}(S\mid\sigma^2)$ the conditional density of $S$, which is $\chi^2\sigma^2$ with $(T-p)$ degrees of freedom.² Note that the conditional density of $S$ does not depend on $\beta$.

For a given $(X,\beta,\sigma^2)$, it can be shown that $b$ and $S$ are independent; therefore, the conditional joint density of $b$ and $S$, denoted by $f_{T-p}(b,S\mid X,\beta,\sigma^2)$, can be expressed as the product of the conditional density of $b$ and that of $S$, i.e.,

$$f_{T-p}(b,S\mid X,\beta,\sigma^2) = f(b\mid X,\beta,\sigma^2)\,f_{T-p}(S\mid\sigma^2) . \tag{3.19}$$

Since $S$ is distributed as $\chi^2\sigma^2$ with $(T-p)$ degrees of freedom, we can write

$$f_{T-p}(S\mid\sigma^2) = \frac{S^{\frac{1}{2}(T-p)-1}\,e^{-S/2\sigma^2}}{2^{\frac{1}{2}(T-p)}\,\Gamma\big(\tfrac{T-p}{2}\big)\,\sigma^{T-p}} \tag{3.20}$$

or

$$\sigma^2\,f_{T-p}(S\mid\sigma^2) = \frac{S}{T-p-2}\cdot\frac{S^{\frac{1}{2}(T-p-2)-1}\,e^{-S/2\sigma^2}}{2^{\frac{1}{2}(T-p-2)}\,\Gamma\big(\tfrac{T-p-2}{2}\big)\,\sigma^{T-p-2}} = \frac{S}{T-p-2}\,f_{T-p-2}(S\mid\sigma^2) \tag{3.21}$$

where $f_{T-p-2}(S\mid\sigma^2)$ is the conditional density of $S$ when it is of the $\chi^2$ type with $(T-p-2)$ degrees of freedom.

The posterior joint density of $\beta$ and $\sigma^2$, given $(X,b,S)$, is

$$f(\beta,\sigma^2\mid X,b,S) = \frac{f_{T-p}(b,S\mid X,\beta,\sigma^2)\,g(\beta,\sigma^2)}{f_{T-p}(b,S\mid X)} \tag{3.22}$$

where the numerator of (3.22) is the conditional joint density of $b$, $S$, $\beta$ and $\sigma^2$, while

$$f_{T-p}(b,S\mid X) = \int\!\!\int f_{T-p}(b,S\mid X,\beta,\sigma^2)\,g(\beta,\sigma^2)\,d\beta\,d\sigma^2 .$$

From (3.9), the Bayes estimator for $\beta$ is given by

$$\hat\beta = E(\beta\mid X,b,S) = \int\!\!\int \beta\,f(\beta,\sigma^2\mid X,b,S)\,d\beta\,d\sigma^2 . \tag{3.23}$$

Substituting (3.15) and (3.22) into (3.23), and making use of (3.19), we have

$$\hat\beta = b + \frac{(X'X)^{-1}}{f_{T-p}(b,S\mid X)}\int\!\!\int\Big[\frac{\partial}{\partial b}f(b\mid X,\beta,\sigma^2)\Big]\big[\sigma^2 f_{T-p}(S\mid\sigma^2)\big]\,g(\beta,\sigma^2)\,d\beta\,d\sigma^2 .$$

Next, applying (3.21), we obtain

$$\hat\beta = b + \frac{S\,(X'X)^{-1}}{T-p-2}\cdot\frac{f_{T-p-2}^{(1)}(b,S\mid X)}{f_{T-p}(b,S\mid X)} \tag{3.24}$$

where

$$f_{T-p-2}^{(1)}(b,S\mid X) = \frac{\partial}{\partial b}\,f_{T-p-2}(b,S\mid X) .$$

Similarly, from (3.9), the Bayes estimator for $\sigma^2$ is given by

$$\hat\sigma^2 = E(\sigma^2\mid X,b,S) = \int\!\!\int \sigma^2\,f(\beta,\sigma^2\mid X,b,S)\,d\beta\,d\sigma^2 .$$

Substituting (3.22) and (3.19) successively, and then using (3.21), we have

$$\hat\sigma^2 = \frac{S}{T-p-2}\cdot\frac{f_{T-p-2}(b,S\mid X)}{f_{T-p}(b,S\mid X)} . \tag{3.25}$$

Note that both $\hat\beta$ and $\hat\sigma^2$ depend on $g(\beta,\sigma^2)$, since $f_{T-p}(b,S\mid X)$ and $f_{T-p-2}^{(1)}(b,S\mid X)$ depend on $g(\beta,\sigma^2)$.

²The symbol $f$ does not have to represent the same functional form; it merely indicates the density function, and the index $(T-p)$ or $(T-p-2)$ of $f$ indicates the degrees of freedom.
Model C:

$$y = X\beta + \epsilon \qquad (y:\;T\times 1,\quad X:\;T\times p,\quad \beta:\;p\times 1,\quad \epsilon:\;T\times 1)$$

where

(a) $\epsilon$ is distributed as $N(0,\sigma^2 I)$;
(b) $(X,\sigma^2)$ is a known realization of a random variable $(X,\sigma^2)$, while $\beta$ is an unknown realization of a random variable $\beta$. The three random variables $X$, $\beta$ and $\sigma^2$ have a known joint prior density $g(X,\beta,\sigma^2)$;
(c) $X$ is a matrix of full rank.

Observe that Model C is similar to Model A in the sense that $X$ and $\sigma^2$ are known; therefore the only parameter to be estimated is $\beta$. The posterior density of $\beta$ given $(X,b,\sigma^2)$ is given by

$$p(\beta\mid X,b,\sigma^2) = \frac{h(b,\beta,X,\sigma^2)}{h_g(b,X,\sigma^2)} \tag{3.26}$$

where

$$h_g(b,X,\sigma^2) = \int h(b,\beta,X,\sigma^2)\,d\beta$$

and the numerator of (3.26) is the joint density of $b$, $\beta$, $X$ and $\sigma^2$. Therefore, the Bayes estimator for $\beta$, using (3.9), is

$$\hat\beta = E(\beta\mid X,b,\sigma^2) = \int \beta\,p(\beta\mid X,b,\sigma^2)\,d\beta . \tag{3.27}$$

By substituting (3.15) into (3.27), we obtain, analogous to (3.18), the Bayes estimator for $\beta$ as follows:

$$\hat\beta = b + \sigma^2(X'X)^{-1}\,\frac{h_g^{(1)}(b,X,\sigma^2)}{h_g(b,X,\sigma^2)} \tag{3.28}$$

where $h_g^{(1)}(b,X,\sigma^2) = \dfrac{\partial}{\partial b}\,h_g(b,X,\sigma^2)$.
Model D:

$$y = X\beta + \epsilon \qquad (y:\;T\times 1,\quad X:\;T\times p,\quad \beta:\;p\times 1,\quad \epsilon:\;T\times 1)$$

where

(a) $\epsilon$ is distributed as $N(0,\sigma^2 I)$;
(b) $X$ is a known realization of a random variable $X$, while $(\beta,\sigma^2)$ is an unknown realization of a random variable $(\beta,\sigma^2)$. The three random variables $X$, $\beta$ and $\sigma^2$ have a known joint prior density $g(X,\beta,\sigma^2)$;
(c) $X$ is a matrix of full rank.

Note that Model D is similar to Model B in the sense that $X$ is known and $(\beta,\sigma^2)$ is the parameter to be estimated. The derivation of the Bayes estimators is analogous to that of Model B; namely, the Bayes estimator for $\beta$ is given by

$$\hat\beta = b + \frac{S\,(X'X)^{-1}}{T-p-2}\cdot\frac{f_{T-p-2}^{(1)}(X,b,S)}{f_{T-p}(X,b,S)} \tag{3.29}$$

where

$$f_{T-p-2}^{(1)}(X,b,S) = \frac{\partial}{\partial b}\,f_{T-p-2}(X,b,S) ,$$

and the Bayes estimator for $\sigma^2$ is given by

$$\hat\sigma^2 = \frac{S}{T-p-2}\cdot\frac{f_{T-p-2}(X,b,S)}{f_{T-p}(X,b,S)} . \tag{3.30}$$

In this case both $\hat\beta$ and $\hat\sigma^2$ depend on $g(X,\beta,\sigma^2)$, since $f_{T-p}(X,b,S)$ and $f_{T-p-2}(X,b,S)$ depend on $g(X,\beta,\sigma^2)$.

Remarks: If we are dealing with a single multiple linear regression model, Model A and Model B, i.e., the models with fixed $X$, are the ones that are commonly used in Bayes estimation. However, when we are dealing with several regression equations, where the $X$ matrix as well as $\beta$ and $\sigma^2$ may differ from one regression equation to another, we might be willing to assume Model C or D.
For Models A and C, we assume that $\sigma^2$ is known, which in most cases is not the case. If $\sigma^2$ is unknown, one might wonder about using $s^2$, the least squares estimate for $\sigma^2$, instead of the true $\sigma^2$, to obtain the Bayes estimate for $\beta$. In this case, the obtained Bayes estimate for $\beta$, say $\hat\beta_s$, will not minimize the overall risk (Bayes risk) for all possible values of $\sigma^2$; rather, it only minimizes the Bayes risk for the special case when $\sigma^2 = s^2$. On the other hand, when $\sigma^2$ is unknown as assumed in Models B and D, the Bayes estimate for $(\beta,\sigma^2)$, denoted by $(\hat\beta,\hat\sigma^2)$, will minimize the Bayes risk over all possible values of $(\beta,\sigma^2)$.
Note that for all Models A, B, C and D, the knowledge of the exact form of the prior density must be assumed in order to obtain the Bayes estimators. If there is no knowledge at all about the prior density, Jeffreys (1961) suggested using a uniform prior density, as follows. When the range of a parameter $\theta$ is from $-c$ to $c$, where $c > 0$ and may take the value $+\infty$, the prior density is taken to be $g(\theta) \propto 1$. For the case where $\theta \in (0,C)$ or $\theta \in (-C,0)$, where $C > 0$, the prior density of $\log|\theta|$ is taken to be uniform, i.e., $f(\log|\theta|) \propto 1$, or $g(\theta) \propto |\theta|^{-1}$.

Applying Jeffreys' suggestion of a uniform prior density to Models A, B, C and D, we have:

Model A: $g(\beta) \propto 1$, since $\beta \in E^p$ (the $p$-dimensional Euclidean space);
Model B: $g(\beta,\sigma) \propto \sigma^{-1}$, since $\beta \in E^p$ and $\sigma > 0$;
Models C and D: $g(X,\beta,\sigma) \propto \sigma^{-1}$, since $X \in E^{T\times p}$, $\beta \in E^p$ and $\sigma > 0$.

3.4 Empirical Bayes Estimation
Recall that the Bayes estimators for Models A, B, C and D cannot be obtained unless the prior densities, $g(\beta)$ for Model A, $g(\beta,\sigma^2)$ for Model B, and $g(X,\beta,\sigma^2)$ for Models C and D, are known. In practice this is usually not the case, and it is unrealistic to assume a specific form of a prior density.

In this section, this assumption about knowledge of the prior density will be relaxed, and a two-stage Bayes estimator, which is called the empirical Bayes estimator, will be derived. In accordance with the four models discussed in Section 3.3, the following four models, denoted by Models A*, B*, C* and D*, will be considered.
Model A*:

$$y_i = X\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N \qquad (y_i:\;T\times 1,\quad X:\;T\times p,\quad \beta_i:\;p\times 1,\quad \epsilon_i:\;T\times 1)$$

where

(a) $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are independent with a common distribution $N(0,\sigma^2 I)$;
(b) $(X,\sigma^2)$ is fixed and known, common for all $i = 1, 2, \ldots, N$; the matrix $X$ is of full rank;
(c) $\beta_1, \beta_2, \ldots, \beta_N$ are unknown independent realizations of a random variable $\beta$ with an unknown prior density $g(\beta)$.
This is the model analyzed by Martz and Krutchkoff (1969), where they derived an empirical Bayes estimator for $\beta_N$. We shall review their derivation below and obtain an empirical Bayes estimate for each $\beta_i$ ($i = 1, 2, \ldots, N$). Had $g(\beta)$ been known, the Bayes estimate for $\beta_i$ ($i = 1, 2, \ldots, N$), by equation (3.18), would be given by

$$\hat\beta_i = b_i + \sigma^2(X'X)^{-1}\,\frac{h_g^{(1)}(b_i\mid X,\sigma^2)}{h_g(b_i\mid X,\sigma^2)} . \tag{3.31}$$

Now, since $(X,\sigma^2)$ is common for all $i = 1, 2, \ldots, N$, we can consider $(b_1,\beta_1), (b_2,\beta_2), \ldots, (b_N,\beta_N)$ as independent realizations of a random variable $(b,\beta)$ whose joint density is denoted (recall (3.16) for Model A of Section 3.3) by $h(b,\beta\mid X,\sigma^2)$. Therefore, we can also consider that $b_1, b_2, \ldots, b_N$ are independent realizations of a random variable $b$ whose (marginal) density is $h_g(b\mid X,\sigma^2)$ in (3.16). For simplicity, since $(X,\sigma^2)$ is common for all $i = 1, 2, \ldots, N$, we can just denote this marginal density by $h_g(b)$.
Noting that $h_g(b_i)$ is a multivariate density of a random variable $b$ evaluated at $b = b_i$, and that $b_1, b_2, \ldots, b_N$ are observable and independent, several methods for estimating $h_g(b_i)$ are available, such as the method described by Cacoullos (1966) below.

Let $f(x)$ be a multivariate density of a random vector $x$ ($p\times 1$), evaluated at $x = x_i$. A quadratic mean consistent and asymptotically unbiased³ estimate $f_N(x)$ for $f(x)$, based on the sample of $N$ independent observations $x_1, x_2, \ldots, x_N$, is given by

$$f_N(x) = \frac{1}{N\prod_{j=1}^{p}\Delta_j}\sum_{k=1}^{N} K\Big(\frac{x_1-x_{k1}}{\Delta_1}, \ldots, \frac{x_p-x_{kp}}{\Delta_p}\Big) \tag{3.32}$$

where

(a) $\sup_y |K(y)| < \infty$, i.e., $K$ is bounded;
(b) $\int |K(y)|\,dy < \infty$, i.e., $K$ is integrable;
(c) $\lim_{|y|\to\infty} |y|^p K(y) = 0$, where $|y|$ denotes the length of the vector $y$;

³Consistency here is in the sense of mean squared error consistency, i.e., $\lim_{N\to\infty} E[f_N(x)-f(x)]^2 = 0$ at every continuity point of $f$.
(d) $\int K(y)\,dy = 1$;
(e) $\Delta_j = \Delta_j(N)$ satisfies $\lim_{N\to\infty}\Delta_j(N) = 0$ and $\lim_{N\to\infty} N[\Delta_j(N)]^p = \infty$, for $j = 1, 2, \ldots, p$.

By using $b_1, b_2, \ldots, b_N$ as the $N$ independent observations and setting

$$K\Big(\frac{b_{i1}-b_{k1}}{\delta_1}, \ldots, \frac{b_{ip}-b_{kp}}{\delta_p}\Big) = \frac{1}{(2\pi)^p}\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_j\big]}{(b_{ij}-b_{kj})/2\delta_j}\right)^2 \tag{3.33}$$

and defining

$$\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_j\big]}{(b_{ij}-b_{kj})/2\delta_j} = 1 \qquad \text{for } k = i ,$$

a consistent and asymptotically unbiased estimate for $h_g(b_i)$, therefore, is given by

$$\hat h_N(b_i) = \frac{1}{(2\pi)^p\,N\prod_{j=1}^{p}\delta_j}\left\{1 + \sum_{\substack{k=1\\k\neq i}}^{N}\,\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_j\big]}{(b_{ij}-b_{kj})/2\delta_j}\right)^2\right\} \tag{3.34}$$

where

(1) $\delta_j = \delta(N)\,\sigma\sqrt{\{(X'X)^{-1}\}_{jj}}$ ;
(2) $\{(X'X)^{-1}\}_{jj}$ is the $(j,j)$th element of $(X'X)^{-1}$;
(3) $\delta(N)$ satisfies $\lim_{N\to\infty}\delta(N) = 0$ and $\lim_{N\to\infty} N[\delta(N)]^p = \infty$; for example, take $\delta(N) = N^{-1/q}$ with $q > p$.

Note that all of the conditions (a) to (e) given in (3.32) are met by the function $K$ of (3.33). We might use functions other than (3.33), as long as they satisfy conditions (a) to (e) given in (3.32); the choice of (3.33) is made so that this work will be comparable to that of Martz and Krutchkoff (1969). At this point, a study to find a function that will give better empirical Bayes estimators, if possible the best in some sense, is still open.
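A minimal sketch of the estimate (3.34) is given below. It is an assumed implementation, not the thesis program, and the bandwidths shown are stand-ins for the $\delta_j$ of (3.34):

```python
import numpy as np

# Sketch: the Cacoullos-type kernel density estimate (3.34) at each observed
# b_i, using the (sin u / u)^2 kernel of (3.33).
def sinc_sq(u):
    # (sin u / u)^2, with the k = i convention that the value at u = 0 is 1.
    out = np.ones_like(u)
    nz = u != 0
    out[nz] = (np.sin(u[nz]) / u[nz]) ** 2
    return out

def h_N(B, delta):
    """B: (N, p) array of b_i's; delta: (p,) bandwidths delta_j of (3.34)."""
    N, p = B.shape
    # u[i, k, j] = (b_ij - b_kj) / (2 delta_j); the k = i term contributes 1.
    u = (B[:, None, :] - B[None, :, :]) / (2.0 * delta)
    kern = sinc_sq(u).prod(axis=2)            # product over j, an N x N array
    return kern.sum(axis=1) / ((2 * np.pi) ** p * N * delta.prod())

rng = np.random.default_rng(2)
B = rng.normal(size=(50, 2))                  # pretend these are the b_i
delta = 50.0 ** (-1.0 / 3.0) * np.ones(2)     # delta(N) = N^(-1/q), q = 3 > p
print(h_N(B, delta))
```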
Furthermore, observe that $h_g^{(1)}(b_i)$ is a $p\times 1$ vector and can be written as

$$h_g^{(1)}(b_i) = \Big[\frac{\partial}{\partial b_{i1}}h_g(b_i), \ldots, \frac{\partial}{\partial b_{ip}}h_g(b_i)\Big]' \tag{3.35}$$

where $b_i = [b_{i1}, b_{i2}, \ldots, b_{ip}]'$. By the definition of the first derivative of a function, (3.35) can be expressed as

$$h_g^{(1)}(b_i) = [h_{g1}(b_i), \ldots, h_{gp}(b_i)]' \tag{3.36}$$

where

$$h_{gj}(b_i) = \lim_{\varepsilon\to 0}\frac{h_g([b_{i1}, \ldots, b_{ij}+\varepsilon, \ldots, b_{ip}]') - h_g(b_i)}{\varepsilon} \tag{3.37}$$

for $j = 1, 2, \ldots, p$. By choosing $\varepsilon$ such that $\lim_{N\to\infty}\varepsilon = 0$ (for example, we can choose $\varepsilon = \delta_j$ of (3.34)) and using (3.32), $h_{gj}(b_i)$ of (3.36) can be estimated consistently by

$$\hat h_{Nj}(b_i) = \frac{\hat h_N([b_{i1}, \ldots, b_{ij}+\delta_j, \ldots, b_{ip}]') - \hat h_N(b_i)}{\delta_j} ,$$

which in turn gives a consistent estimator for $h_g^{(1)}(b_i)$ as follows:

$$\hat h_N^{(1)}(b_i) = [\hat h_{N1}(b_i), \ldots, \hat h_{Np}(b_i)]' . \tag{3.38}$$
An empirical Bayes estimator for $\beta_i$ is obtained by substituting (3.34) and (3.38) into (3.31), namely

$$\hat{\hat\beta}_i = b_i + \sigma^2(X'X)^{-1}\,\frac{\hat h_N^{(1)}(b_i)}{\hat h_N(b_i)} , \qquad i = 1, 2, \ldots, N . \tag{3.39}$$

By using an argument as given by Rutherford and Krutchkoff (1969), it can be shown that the overall risk of $\hat{\hat\beta}_i$ of (3.39) converges to the Bayes risk of $\hat\beta_i$ of (3.31) as $N\to\infty$, i.e., $\hat{\hat\beta}_i$ of (3.39) is asymptotically optimal. Rutherford and Krutchkoff (1969) gave a set of sufficient conditions for an empirical Bayes estimator to be asymptotically optimal when the squared error loss function is used. Denoting $\hat\theta$ and $\hat{\hat\theta}$ as the Bayes and empirical Bayes estimators for $\theta$, respectively, those sufficient conditions are given below.
(a) $\operatorname{plim}_{N\to\infty}\,\hat{\hat\theta} = \hat\theta$ ;
(b) for some $\gamma > 0$ and some real number $M < \infty$, we have that $E(|\theta|^{2+\gamma}) \le M < \infty$.

Indeed, $\hat{\hat\beta}_i$ of (3.39) can be shown to satisfy condition (a) as follows. By the consistency of Cacoullos' estimators $\hat h_N(b_i)$ and $\hat h_N^{(1)}(b_i)$, we have that

$$\operatorname{plim}_{N\to\infty}\,\hat h_N(b_i) = h_g(b_i) \qquad \text{and} \qquad \operatorname{plim}_{N\to\infty}\,\hat h_N^{(1)}(b_i) = h_g^{(1)}(b_i) .$$

Therefore, for $\hat{\hat\beta}_i$ of (3.39), we have

$$\operatorname{plim}_{N\to\infty}\,\hat{\hat\beta}_i = b_i + \sigma^2(X'X)^{-1}\,\frac{h_g^{(1)}(b_i)}{h_g(b_i)} = \hat\beta_i .$$

Further, by imposing a mild condition on the prior density $g(\beta)$, namely that its third absolute moments are bounded, we have that condition (b) is also satisfied.
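Putting (3.34), (3.38) and (3.39) together, a self-contained sketch of the Model A* procedure might read as follows; the prior, design and bandwidth choice are illustrative, not the thesis program:

```python
import numpy as np

# Sketch of the Model A* empirical Bayes estimator (3.39), with the kernel
# estimate (3.34) and its finite-difference derivative (3.38).
def sinc_sq(u):
    out = np.ones_like(u)
    nz = u != 0
    out[nz] = (np.sin(u[nz]) / u[nz]) ** 2
    return out

def h_at(point, B, delta):
    # Kernel estimate (3.34) of h_g, evaluated at an arbitrary point.
    N, p = B.shape
    u = (point - B) / (2.0 * delta)
    return sinc_sq(u).prod(axis=1).sum() / ((2 * np.pi) ** p * N * delta.prod())

rng = np.random.default_rng(3)
T, p, N = 15, 2, 40
X = np.column_stack([np.ones(T), np.linspace(1.0, 2.5, T)])
XtX_inv = np.linalg.inv(X.T @ X)
sigma2 = 4.0                                   # known, as Model A* assumes
betas = rng.normal([10.0, 5.0], [3.0, 1.5], size=(N, p))   # draws from g(beta)
Y = betas @ X.T + rng.normal(0.0, np.sqrt(sigma2), (N, T))
B = Y @ X @ XtX_inv                            # b_i = (X'X)^(-1) X'y_i, row-wise

q = p + 1                                      # any q > p satisfies (3.34)(3)
delta = N ** (-1.0 / q) * np.sqrt(sigma2 * np.diag(XtX_inv))

eb = np.empty_like(B)
for i in range(N):
    h0 = h_at(B[i], B, delta)
    grad = np.array([(h_at(B[i] + delta[j] * np.eye(p)[j], B, delta) - h0)
                     / delta[j] for j in range(p)])        # (3.38)
    eb[i] = B[i] + sigma2 * XtX_inv @ grad / h0            # (3.39)

print(((eb - betas) ** 2).mean(), ((B - betas) ** 2).mean())
```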
Model B*:

$$y_i = X\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N \qquad (y_i:\;T\times 1,\quad X:\;T\times p,\quad \beta_i:\;p\times 1,\quad \epsilon_i:\;T\times 1)$$

where

(a) $\epsilon_i$ is distributed as $N(0,\sigma_i^2 I)$, and $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are independent;
(b) $X$ is fixed, known and common for all $i = 1, 2, \ldots, N$;
(c) $(\beta_1,\sigma_1^2), (\beta_2,\sigma_2^2), \ldots, (\beta_N,\sigma_N^2)$ are unknown independent realizations of a random variable $(\beta,\sigma^2)$ with an unknown prior density $g(\beta,\sigma^2)$.

Note that once the random variable $(\beta,\sigma^2)$ takes a value, say $(\beta_i,\sigma_i^2)$, it is fixed for the $T$ successive observations in each $i$th regression equation. Had $g(\beta,\sigma^2)$ been known, the Bayes estimate for $\beta_i$ ($i = 1, 2, \ldots, N$), from (3.24), would be given by

$$\hat\beta_i = b_i + \frac{S_i\,(X'X)^{-1}}{T-p-2}\cdot\frac{f_{T-p-2}^{(1)}(b_i,S_i\mid X)}{f_{T-p}(b_i,S_i\mid X)} \tag{3.40}$$

and the Bayes estimate for $\sigma_i^2$, from (3.25), by

$$\hat\sigma_i^2 = \frac{S_i}{T-p-2}\cdot\frac{f_{T-p-2}(b_i,S_i\mid X)}{f_{T-p}(b_i,S_i\mid X)} . \tag{3.41}$$

Observe that since $X$ is common for all $i = 1, 2, \ldots, N$, we can consider $(b_1,S_1,\beta_1,\sigma_1^2), (b_2,S_2,\beta_2,\sigma_2^2), \ldots, (b_N,S_N,\beta_N,\sigma_N^2)$ as independent realizations of a random variable $(b,S,\beta,\sigma^2)$ whose joint density is denoted (recall equation (3.22) for Model B of Section 3.3) by $f_{T-p}(b,S\mid X,\beta,\sigma^2)\,g(\beta,\sigma^2)$. Thus, $(b_1,S_1), (b_2,S_2), \ldots, (b_N,S_N)$ can also be considered as independent realizations of a random variable $(b,S)$ whose (marginal) density is $f_{T-p}(b,S\mid X)$ from (3.22).

Thus, $f_{T-p}(b_i,S_i\mid X)$ is the value of a multivariate density with $(T-p)$ degrees of freedom of the random variable $(b,S)$ evaluated at $(b,S) = (b_i,S_i)$. Similarly, for $i = 1, 2, \ldots, N$, $f_{T-p-2}(b_i,S_i\mid X)$ is the value of a multivariate density with $(T-p-2)$ degrees of freedom evaluated at $(b_i,S_i)$. Since $X$ is common for all $i$, $f_{T-p}(b_i,S_i\mid X)$ and $f_{T-p-2}(b_i,S_i\mid X)$ can simply be denoted by $f_{T-p}(b_i,S_i)$ and $f_{T-p-2}(b_i,S_i)$, respectively.

Again, in this case, Cacoullos' method can be applied to obtain consistent and asymptotically unbiased estimates for $f_{T-p}(b_i,S_i)$, $f_{T-p-2}(b_i,S_i)$ and $f_{T-p-2}^{(1)}(b_i,S_i)$, which will be denoted by $f_{N,T-p}(b_i,S_i)$, $f_{N,T-p-2}(b_i,S_i)$ and $f_{N,T-p-2}^{(1)}(b_i,S_i)$, respectively. Therefore, substituting these estimates into (3.40), we obtain an empirical Bayes estimate for $\beta_i$, denoted by $\hat{\hat\beta}_i$,

$$\hat{\hat\beta}_i = b_i + \frac{S_i\,(X'X)^{-1}}{T-p-2}\cdot\frac{f_{N,T-p-2}^{(1)}(b_i,S_i)}{f_{N,T-p}(b_i,S_i)} . \tag{3.42}$$

Similarly, by substituting $f_{N,T-p}(b_i,S_i)$ and $f_{N,T-p-2}(b_i,S_i)$ into (3.41), we obtain an empirical Bayes estimate for $\sigma_i^2$, denoted by $\hat{\hat\sigma}_i^2$,

$$\hat{\hat\sigma}_i^2 = \frac{S_i}{T-p-2}\cdot\frac{f_{N,T-p-2}(b_i,S_i)}{f_{N,T-p}(b_i,S_i)} . \tag{3.43}$$

The derivations of $f_{N,T-p}(b_i,S_i)$ and $f_{N,T-p-2}(b_i,S_i)$ will be presented below. Since $S_i = \sum_{j=1}^{T} e_{ij}^2$ has $(T-p)$ degrees of freedom, $f_{N,T-p}(b_i,S_i)$ can be obtained right away, analogous to (3.34), as follows:
$$f_{N,T-p}(b_i,S_i) = \frac{1}{(2\pi)^{p+1}\,N\gamma\prod_{j=1}^{p}\delta_{ij}}\left\{1 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(S_i-S_k)/2\gamma\big]}{(S_i-S_k)/2\gamma}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.44}$$

where

(1) $\gamma = \delta(N)\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(S_i-\bar S)^2}$ ;
(2) $\bar S = \tfrac{1}{N}\sum_{i=1}^{N} S_i$ ;
(3) $\delta_{ij} = \delta(N)\sqrt{\dfrac{S_i}{T-p}\,\{(X'X)^{-1}\}_{jj}}$ ;
(4) $\{(X'X)^{-1}\}_{jj}$ is the $(j,j)$th element of $(X'X)^{-1}$;
(5) $\delta(N)$ satisfies $\lim_{N\to\infty}\delta(N) = 0$ and $\lim_{N\to\infty} N[\delta(N)]^{p+1} = \infty$; for example, take $\delta(N) = N^{-1/q}$, where $q > p+1$.
The derivation of $f_{N,T-p-2}(b_i,S_i)$ is rather complicated, since $S = \sum_{j=1}^{T} e_j^2$ has $(T-p)$ degrees of freedom. However, we can construct a random variable $S^*$, a random variable with $(T-p-2)$ degrees of freedom derived from $S$, for example by throwing away two out of the $(T-p)$ free $e_j$'s:

$$S^* = S - e_j^2 - e_{j'}^2 , \qquad j \neq j' ,\; j, j' = 1, 2, \ldots, T . \tag{3.45}$$

But, for $T > p+2$, (3.45) gives more than one possibility of which pair of $e_j$'s is to be thrown away out of the $(T-p)$ free $e_j$'s. For uniformity, it is suggested to take $S^*$ as the average of all possible $(S - e_j^2 - e_{j'}^2)$'s over all possible choices of $(T-p)$ free $e$'s out of $T$. Suppose the first $(T-p)$ of the $e_j$'s are free; then we have

$$S^* = \frac{1}{\binom{T-p}{2}}\sum_{\substack{j<j'\\ j,j'\le T-p}}\big(S - e_j^2 - e_{j'}^2\big) = \frac{T-p-2}{T-p}\,S + \frac{2}{T-p}\sum_{k=T-p+1}^{T} e_k^2 . \tag{3.46}$$

Since the free $e_j$'s can be any $(T-p)$ out of the $T$ $e_j$'s, and each $e_k^2$ is excluded in a fraction $p/T$ of the choices, the overall average of $S^*$ in (3.46) is given by

$$S^* = \frac{T-p-2}{T-p}\,S + \frac{2}{T-p}\cdot\frac{p}{T}\,S = \Big[\frac{T-p-2}{T-p} + \frac{2p}{T(T-p)}\Big]S = \frac{T-2}{T}\,S . \tag{3.47}$$
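The overall average in (3.47) equals the simple average of $S - e_j^2 - e_{j'}^2$ over all pairs drawn from all $T$ residuals, so the identity is easy to confirm numerically:

```python
import numpy as np
from itertools import combinations

# Quick numerical check of (3.47): averaging S - e_j^2 - e_j'^2 over all
# pairs (j, j') reproduces (T-2)S/T exactly.
rng = np.random.default_rng(4)
T = 15
e = rng.normal(size=T)
S = (e ** 2).sum()
avg = np.mean([S - e[j] ** 2 - e[k] ** 2 for j, k in combinations(range(T), 2)])
print(avg, (T - 2) * S / T)   # identical up to rounding
```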
By using the random variable $S^*$ of (3.47), which has $(T-p-2)$ degrees of freedom, instead of $S$, we can derive $f_{N,T-p-2}(b_i,S_i)$, again analogous to (3.34), as follows:

$$f_{N,T-p-2}(b_i,S_i) = \frac{1}{(2\pi)^{p+1}\,N\gamma\prod_{j=1}^{p}\delta_{ij}}\left\{\left(\frac{\sin\big[(S_i-S_i^*)/2\gamma\big]}{(S_i-S_i^*)/2\gamma}\right)^2 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(S_i-S_k^*)/2\gamma\big]}{(S_i-S_k^*)/2\gamma}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.48}$$

where

(a) $S_k^* = \dfrac{T-2}{T}\,S_k$, for $k = 1, 2, \ldots, N$;
(b) $\gamma$ and $\delta_{ij}$ are as defined in (3.44).

Finally, $f_{N,T-p-2}^{(1)}(b_i,S_i)$ is obtained, analogous to (3.38), by making use of (3.48), as follows:

$$f_{N,T-p-2}^{(1)}(b_i,S_i) = \big[f_{N,1,T-p-2}(b_i,S_i), \ldots, f_{N,p,T-p-2}(b_i,S_i)\big]' \tag{3.49}$$

where

$$f_{N,j,T-p-2}(b_i,S_i) = \frac{1}{\delta_{ij}}\Big\{f_{N,T-p-2}\big([b_{i1}, \ldots, b_{ij}+\delta_{ij}, \ldots, b_{ip}]',S_i\big) - f_{N,T-p-2}(b_i,S_i)\Big\}$$

for $j = 1, 2, \ldots, p$.
Before we discuss Models C* and D*, observe that, in general, $X = [1, X^*]$, where $X^* = [x_2, x_3, \ldots, x_p]$.
Model C*:

$$y_i = X_i\beta_i + \epsilon_i \qquad \text{or} \qquad y_i = [1, X_i^*]\,\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N$$

where

(a) $\epsilon_i$ ($i = 1, 2, \ldots, N$) is distributed as $N(0,\sigma_i^2 I)$, and $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are independent;
(b) $(X_1^*,\sigma_1^2), (X_2^*,\sigma_2^2), \ldots, (X_N^*,\sigma_N^2)$ are known independent realizations of a random variable $(X^*,\sigma^2)$, while $\beta_1, \beta_2, \ldots, \beta_N$ are unknown independent realizations of a random variable $\beta$. The three random variables $X^*$, $\beta$ and $\sigma^2$ have an unknown joint prior density $g(X^*,\beta,\sigma^2)$;
(c) $X$ is a matrix of full rank.

Note that the assumptions for Model C* are the same as those for Model C, except that the prior joint density $g(X^*,\beta,\sigma^2)$ is unknown. The parameters to be estimated in Model C* are $\beta_1, \beta_2, \ldots, \beta_N$. Had $g(X^*,\beta,\sigma^2)$ been known, the Bayes estimate for $\beta_i$ ($i = 1, 2, \ldots, N$), from (3.28), would be given by

$$\hat\beta_i = b_i + \sigma_i^2(X_i'X_i)^{-1}\,\frac{h_g^{(1)}(X_i^*,b_i,\sigma_i^2)}{h_g(X_i^*,b_i,\sigma_i^2)} \tag{3.50}$$

where, as before,

$$h_g^{(1)}(X_i^*,b_i,\sigma_i^2) = \frac{\partial}{\partial b_i}\,h_g(X_i^*,b_i,\sigma_i^2) .$$

Again, by a similar argument as for Models A* and B*, we can consider $(X_1^*,b_1,\sigma_1^2), (X_2^*,b_2,\sigma_2^2), \ldots, (X_N^*,b_N,\sigma_N^2)$ as independent realizations of a random variable $(X^*,b,\sigma^2)$ whose density is $h_g(X^*,b,\sigma^2)$. Therefore, Cacoullos' method is again applicable here to obtain a consistent and asymptotically unbiased estimate for $h_g(X_i^*,b_i,\sigma_i^2)$, the multivariate density $h_g(X^*,b,\sigma^2)$ evaluated at $(X^*,b,\sigma^2) = (X_i^*,b_i,\sigma_i^2)$. Denote this estimate by $\hat h_N(X_i^*,b_i,\sigma_i^2)$. Note that we can write $X_i^* = [x_{i2}, \ldots, x_{ij}, \ldots, x_{ip}]$, for $i = 1, 2, \ldots, N$, where $x_{ij} = [x_{ij1}, \ldots, x_{ijm}, \ldots, x_{ijT}]'$ for $j = 2, 3, \ldots, p$, which allows us to write $h_g(X_i^*,b_i,\sigma_i^2) = h_g(x_{i2}, \ldots, x_{ip}, b_i, \sigma_i^2)$. Applying Cacoullos' method, analogous to (3.44), we can obtain
$$\hat h_N(X_i^*,b_i,\sigma_i^2) = \frac{1}{(2\pi)^{(p-1)T+p+1}\,N\gamma\Big(\prod_{j=2}^{p}\prod_{m=1}^{T}\Delta_{jm}\Big)\prod_{j=1}^{p}\delta_{ij}}\left\{1 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(\sigma_i^2-\sigma_k^2)/2\gamma\big]}{(\sigma_i^2-\sigma_k^2)/2\gamma}\right)^2\prod_{j=2}^{p}\prod_{m=1}^{T}\left(\frac{\sin\big[(x_{ijm}-x_{kjm})/2\Delta_{jm}\big]}{(x_{ijm}-x_{kjm})/2\Delta_{jm}}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.51}$$

where

(1) $\gamma = \delta(N)\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(\sigma_i^2-\bar\sigma^2)^2}$ ;
(2) $\bar\sigma^2 = \tfrac{1}{N}\sum_{i=1}^{N}\sigma_i^2$ ;
(3) $\Delta_{jm} = \delta(N)\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(x_{ijm}-\bar x_{jm})^2}$ ;
(4) $\bar x_{jm} = \tfrac{1}{N}\sum_{i=1}^{N} x_{ijm}$ ;
(5) $\delta_{ij} = \delta(N)\,\sigma_i\sqrt{\{(X_i'X_i)^{-1}\}_{jj}}$ ;
(6) $\{(X_i'X_i)^{-1}\}_{jj}$ is the $(j,j)$th element of $(X_i'X_i)^{-1}$;
(7) $\delta(N)$ satisfies $\lim_{N\to\infty}\delta(N) = 0$ and $\lim_{N\to\infty} N[\delta(N)]^{(p-1)T+p+1} = \infty$; for example, take $\delta(N) = N^{-1/q}$, where $q > (p-1)T+p+1$.

Meanwhile, $\hat h_N^{(1)}(X_i^*,b_i,\sigma_i^2)$, a consistent and asymptotically unbiased estimate for $h_g^{(1)}(X_i^*,b_i,\sigma_i^2)$, is obtained, analogous to (3.49), as follows:

$$\hat h_N^{(1)}(X_i^*,b_i,\sigma_i^2) = \big[\hat h_{N1}(X_i^*,b_i,\sigma_i^2), \ldots, \hat h_{Np}(X_i^*,b_i,\sigma_i^2)\big]' \tag{3.52}$$

where

$$\hat h_{Nj}(X_i^*,b_i,\sigma_i^2) = \frac{1}{\delta_{ij}}\Big\{\hat h_N\big(X_i^*,[b_{i1}, \ldots, b_{ij}+\delta_{ij}, \ldots, b_{ip}]',\sigma_i^2\big) - \hat h_N(X_i^*,b_i,\sigma_i^2)\Big\}$$

for $j = 1, 2, \ldots, p$.
Finally, by substituting (3.51) and (3.52) into (3.50), an empirical Bayes estimate for $\beta_i$ ($i = 1, 2, \ldots, N$) is given by

$$\hat{\hat\beta}_i = b_i + \sigma_i^2(X_i'X_i)^{-1}\,\frac{\hat h_N^{(1)}(X_i^*,b_i,\sigma_i^2)}{\hat h_N(X_i^*,b_i,\sigma_i^2)} . \tag{3.53}$$
Model D*:

$$y_i = X_i\beta_i + \epsilon_i \qquad \text{or, as in Model C*,} \qquad y_i = [1, X_i^*]\,\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N$$

where

(a) $\epsilon_i$ is distributed as $N(0,\sigma_i^2 I)$ for $i = 1, 2, \ldots, N$, and $\epsilon_1, \epsilon_2, \ldots, \epsilon_N$ are independent;
(b) $X_1^*, X_2^*, \ldots, X_N^*$ are observable (known) independent realizations of a random variable $X^*$, while $(\beta_1,\sigma_1^2), \ldots, (\beta_N,\sigma_N^2)$ are unobservable (unknown) independent realizations of a random variable $(\beta,\sigma^2)$. The three random variables $X^*$, $\beta$ and $\sigma^2$ have an unknown joint prior density $g(X^*,\beta,\sigma^2)$;
(c) $X$ is a matrix of full rank.

In this model, the parameters to be estimated are $(\beta_1,\sigma_1^2), (\beta_2,\sigma_2^2), \ldots, (\beta_N,\sigma_N^2)$. Observe that the assumptions for this model are the same as those for Model D, except that the prior joint density $g(X^*,\beta,\sigma^2)$ is unknown.

When $g(X^*,\beta,\sigma^2)$ is known, the Bayes estimate for $\beta_i$ ($i = 1, 2, \ldots, N$), from (3.29), will be given by

$$\hat\beta_i = b_i + \frac{S_i\,(X_i'X_i)^{-1}}{T-p-2}\cdot\frac{f_{T-p-2}^{(1)}(X_i^*,b_i,S_i)}{f_{T-p}(X_i^*,b_i,S_i)} \tag{3.54}$$

and the Bayes estimate for $\sigma_i^2$ ($i = 1, 2, \ldots, N$), from (3.30), will be given by

$$\hat\sigma_i^2 = \frac{S_i}{T-p-2}\cdot\frac{f_{T-p-2}(X_i^*,b_i,S_i)}{f_{T-p}(X_i^*,b_i,S_i)} . \tag{3.55}$$

By the same arguments as before, we can consider $(X_1^*,b_1,S_1), (X_2^*,b_2,S_2), \ldots, (X_N^*,b_N,S_N)$ as independent realizations of a random variable $(X^*,b,S)$ whose (marginal) density is $f_{T-p}(X^*,b,S)$. Therefore, Cacoullos' method is applicable for estimating $f_{T-p}(X_i^*,b_i,S_i)$, $f_{T-p-2}(X_i^*,b_i,S_i)$ and $f_{T-p-2}^{(1)}(X_i^*,b_i,S_i)$. Denoting $f_{N,T-p}(X_i^*,b_i,S_i)$, $f_{N,T-p-2}(X_i^*,b_i,S_i)$ and $f_{N,T-p-2}^{(1)}(X_i^*,b_i,S_i)$ as their respective (Cacoullos) estimates, empirical Bayes estimators for $\beta_i$ and $\sigma_i^2$ will be given by substituting these estimates as before into (3.54) and (3.55), as follows:
$$\hat{\hat\beta}_i = b_i + \frac{S_i\,(X_i'X_i)^{-1}}{T-p-2}\cdot\frac{f_{N,T-p-2}^{(1)}(X_i^*,b_i,S_i)}{f_{N,T-p}(X_i^*,b_i,S_i)} \tag{3.56}$$

and

$$\hat{\hat\sigma}_i^2 = \frac{S_i}{T-p-2}\cdot\frac{f_{N,T-p-2}(X_i^*,b_i,S_i)}{f_{N,T-p}(X_i^*,b_i,S_i)} . \tag{3.57}$$

Omitting the algebraic derivations, since they are analogous to those described for Models B* and C*, we have the following results:
41
f
T
N, -p
1
(X~, b. ,8. )
1. -1. 1.
N
~
k=l
kfi
. [Xi jm-XkjmJ)
s1.n
i
p
T
n
n
26 jm
( j=2 m:;::l
b .. -b
. [1.J2~ kj J
s1.n
p
Vij
n
b .. -b ,
[1.J k J]
j=l
26 ..
1.J
wbere
(1) Y = 6(N)
_
(2 )
8
(3 )
6
(4 )
1
=-
\j
1
-N
N
_ 2
~ (S.-8)
. 1 1.
1.=
N
~
8.
N i=l 1.
is as defined in (3.51) ;
8.
1
6., = 6(N)
T1. ((X!X.)- } .. ;
1.J
-p
1. 1.
JJ
jm
(5 )
((X!X.
)-1},.
1. 1.
JJ
(6 )
6(N)
is tbe (j,j)th element of
sa.tisfies
-
lim
6(N)
=0
(X!X.
)-1
1. 1.
and
N... oo
lim N[6{N)] (p-1)T+p+1 '" 00, for example we
N... cn
42
1
can take
q
=N
q, where
> (p-1)T+p+1 .
$$f_{N,T-p-2}(X_i^*,b_i,S_i) = \frac{1}{(2\pi)^{(p-1)T+p+1}\,N\gamma\Big(\prod_{j=2}^{p}\prod_{m=1}^{T}\Delta_{jm}\Big)\prod_{j=1}^{p}\delta_{ij}}\left\{\left(\frac{\sin\big[(S_i-S_i^*)/2\gamma\big]}{(S_i-S_i^*)/2\gamma}\right)^2 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(S_i-S_k^*)/2\gamma\big]}{(S_i-S_k^*)/2\gamma}\right)^2\prod_{j=2}^{p}\prod_{m=1}^{T}\left(\frac{\sin\big[(x_{ijm}-x_{kjm})/2\Delta_{jm}\big]}{(x_{ijm}-x_{kjm})/2\Delta_{jm}}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.59}$$

where

(1) $S_i^* = \dfrac{T-2}{T}\,S_i$, for $i = 1, 2, \ldots, N$;
(2) $\gamma$, $\Delta_{jm}$, $\delta_{ij}$ and $\delta(N)$ are as defined in (3.58).

Finally,

$$f_{N,T-p-2}^{(1)}(X_i^*,b_i,S_i) = \big[f_{N,1,T-p-2}(X_i^*,b_i,S_i), \ldots, f_{N,p,T-p-2}(X_i^*,b_i,S_i)\big]' \tag{3.60}$$

where

$$f_{N,j,T-p-2}(X_i^*,b_i,S_i) = \frac{1}{\delta_{ij}}\Big\{f_{N,T-p-2}\big(X_i^*,[b_{i1}, \ldots, b_{ij}+\delta_{ij}, \ldots, b_{ip}]',S_i\big) - f_{N,T-p-2}(X_i^*,b_i,S_i)\Big\}$$

for $j = 1, 2, \ldots, p$.
3.5 Remarks On Empirical Bayes Estimation

As indicated by the assumptions, Model A* is applicable when we have a set of regression equations with common $(X,\sigma^2)$ and $\sigma^2$ known. As Martz and Krutchkoff (1969) pointed out, by means of a simulation study of a simple linear regression model, the empirical Bayes estimator for $\beta_N$ (denoted by $\hat{\hat\beta}_N$) is better than the ordinary least squares estimate (denoted by $b_N$), in the sense that the average squared error for $\hat{\hat\beta}_N$ is smaller than that of $b_N$. The larger $N$ (the number of experiments), the more improvement will be gained.

Note that Model B* is a generalization of Model A*, since if the random variable $\sigma^2$ in Model B* is degenerate at a known value $\sigma^2$, Model B* becomes Model A*. Similarly, Model C* is also a generalization of Model A*, in this case if the random variable $(X^*,\sigma^2)$ of Model C* is degenerate. Model D*, however, is a generalization of Model B*.
By assuming the $X$ matrix to be a random variable besides $\beta$ and $\sigma^2$, as described in Models C* and D*, empirical Bayes estimation for these models becomes more cumbersome as $T\times p$ (the dimension of the $X$ matrix) becomes larger. By putting some restrictions on this $X$ matrix, we can obtain special cases of Model C* or D*. For example, the following model, denoted by Model $D_1^*$, is a special case of Model D*.

Model $D_1^*$:

$$y_i = [1, x_i^*]\,\beta_i + \epsilon_i , \qquad i = 1, 2, \ldots, N \qquad (y_i:\;T\times 1,\quad 1:\;T\times 1,\quad x_i^*:\;T\times 1,\quad \beta_i:\;2\times 1,\quad \epsilon_i:\;T\times 1)$$

where the assumptions are the same as those for Model D*, with the additional restrictions below:
(a) $T$ is an odd number;
(b) the elements of $x_i^*$ are equally spaced, i.e., $x_{i2} - x_{i1} = x_{i3} - x_{i2} = \cdots = x_{iT} - x_{i,T-1}$, for $i = 1, 2, \ldots, N$;
(c) $x_{iT} = 2.5\,x_{i1}$, for $i = 1, 2, \ldots, N$.

The choice of restriction (c) will be discussed later at the end of this section.
Since $T$ is odd, $x_i^*$ has a middle element, which we denote by $c_i = x_{i,(T+1)/2}$. By restrictions (b) and (c), $x_{im}$ can be expressed in terms of $c_i$ as follows:

$$x_{im} = \frac{4T+6m-10}{7(T-1)}\,c_i , \qquad i = 1, 2, \ldots, N ;\; m = 1, 2, \ldots, T . \tag{3.61}$$

From the above restrictions, we see that the random variables $x^*$ and $c$ have a one-to-one correspondence. Therefore, instead of assuming $(x^*,\beta,\sigma^2)$ to be a random variable with an unknown prior density $g(x^*,\beta,\sigma^2)$, we can as well assume that $(c,\beta,\sigma^2)$ is a random variable with an unknown prior density⁴ $g(c,\beta,\sigma^2)$.

⁴The symbols $g$ in $g(x^*,\beta,\sigma^2)$ and $g(c,\beta,\sigma^2)$ do not necessarily represent the same functional form; $g$ merely indicates the prior density function.

Cacoullos' estimates of the required multivariate densities, as given in (3.58), (3.59) and (3.60), then become
$$f_{N,T-p}(c_i,b_i,S_i) = \frac{1}{(2\pi)^{p+2}\,N\gamma\lambda\prod_{j=1}^{p}\delta_{ij}}\left\{1 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(S_i-S_k)/2\gamma\big]}{(S_i-S_k)/2\gamma}\right)^2\left(\frac{\sin\big[(c_i-c_k)/2\lambda\big]}{(c_i-c_k)/2\lambda}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.62}$$

where

(1) $\lambda = \delta(N)\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(c_i-\bar c)^2}$ ;
(2) $\bar c = \tfrac{1}{N}\sum_{i=1}^{N} c_i$ ;
(3) $\gamma$ and $\delta_{ij}$ are as defined in (3.58);
(4) $\delta(N)$ satisfies $\lim_{N\to\infty}\delta(N) = 0$ and $\lim_{N\to\infty} N[\delta(N)]^{p+2} = \infty$; for example, take $\delta(N) = N^{-1/q}$, where $q > p+2$.

$$f_{N,T-p-2}(c_i,b_i,S_i) = \frac{1}{(2\pi)^{p+2}\,N\gamma\lambda\prod_{j=1}^{p}\delta_{ij}}\left\{\left(\frac{\sin\big[(S_i-S_i^*)/2\gamma\big]}{(S_i-S_i^*)/2\gamma}\right)^2 + \sum_{\substack{k=1\\k\neq i}}^{N}\left(\frac{\sin\big[(S_i-S_k^*)/2\gamma\big]}{(S_i-S_k^*)/2\gamma}\right)^2\left(\frac{\sin\big[(c_i-c_k)/2\lambda\big]}{(c_i-c_k)/2\lambda}\right)^2\prod_{j=1}^{p}\left(\frac{\sin\big[(b_{ij}-b_{kj})/2\delta_{ij}\big]}{(b_{ij}-b_{kj})/2\delta_{ij}}\right)^2\right\} \tag{3.63}$$

where $\lambda$, $\gamma$, $\delta_{ij}$, $S_i^*$ and $\delta(N)$ are as defined in (3.62) and (3.59).
$$f_{N,T-p-2}^{(1)}(c_i,b_i,S_i) = \big[f_{N,1,T-p-2}(c_i,b_i,S_i), \ldots, f_{N,p,T-p-2}(c_i,b_i,S_i)\big]' \tag{3.64}$$

where

$$f_{N,j,T-p-2}(c_i,b_i,S_i) = \frac{1}{\delta_{ij}}\Big\{f_{N,T-p-2}\big(c_i,[b_{i1}, \ldots, b_{ij}+\delta_{ij}, \ldots, b_{ip}]',S_i\big) - f_{N,T-p-2}(c_i,b_i,S_i)\Big\}$$

for $j = 1, 2, \ldots, p$.

EB estimators for $\beta_i$ and $\sigma_i^2$, from (3.56) and (3.57), are therefore given by

$$\hat{\hat\beta}_i = b_i + \frac{S_i\,(X_i'X_i)^{-1}}{T-p-2}\cdot\frac{f_{N,T-p-2}^{(1)}(c_i,b_i,S_i)}{f_{N,T-p}(c_i,b_i,S_i)} \tag{3.65}$$

and

$$\hat{\hat\sigma}_i^2 = \frac{S_i}{T-p-2}\cdot\frac{f_{N,T-p-2}(c_i,b_i,S_i)}{f_{N,T-p}(c_i,b_i,S_i)} \tag{3.66}$$

where $X_i = [1, x_i]$.
The essence of restricting the $X$ matrix in Model D* is to make possible a one-to-one correspondence between the $X$ matrix and a scalar variable $c$, so that the estimation is much simplified. Other kinds of restrictions on $X$ might as well be imposed to obtain a one-to-one correspondence between the $X$ matrix and a scalar variable. The matrix $X^*$, which is one-to-one with $X$, has dimension $T\times(p-1)$. Any restriction on $X^*$ such that we can have a one-to-one correspondence between $X^*$ and a multivariate $Y$, where the dimension of $Y$ is less than $T\times(p-1)$, will therefore simplify the estimation, in the sense that we have variables of smaller dimension in the joint prior density.
Now, some arguments for choosing restriction (c), i.e., $x_{iT} = 2.5\,x_{i1}$ ($i = 1, 2, \ldots, N$), will be given below. Consider a uniform distribution

$$f(x) = \begin{cases} \dfrac{1}{b-a} , & a < x < b \\ 0 , & \text{elsewhere.} \end{cases} \tag{3.67}$$

The mean and variance of (3.67), denoted by $\mu_x$ and $\sigma_x^2$, are given by $\mu_x = \tfrac{1}{2}(b+a)$ and $\sigma_x^2 = \tfrac{1}{12}(b-a)^2$. Therefore, the coefficient of variation will be given by

$$\text{C.V.} = \frac{\sigma_x}{\mu_x} = .577350\,\frac{b-a}{b+a} . \tag{3.68}$$

Relation (3.68) can be rewritten as

$$\frac{b}{a} = \frac{1 + 1.732052\;\text{C.V.}}{1 - 1.732052\;\text{C.V.}}$$

and is tabulated below for several values of C.V.
Table 3.1 Relation of C.V. and b/a

C.V. | .05 | .10 | .20 | .25 | .30 | .40 | .50 | .55 | .577350 | .60
b/a  | 1.2 | 1.4 | 2.1 | 2.5 | 3.2 | 5.5 | 13.9 | 41.2 | ∞ | -52.0

If we design the $X$ matrix such that the elements of $x_i^*$ are uniformly distributed with C.V. = .25, then from Table 3.1 we have that $b = 2.5a$. In practice, many independent variables used in regression analysis in economic studies have C.V. between .20 and .30. When we are dealing with an independent variable whose C.V. is not close to .25, restriction (c) can be revised accordingly by using the relation given in Table 3.1. Further designing of the $X$ matrix is by imposing restriction (b) to have equally spaced elements of $x_i^*$. Recall that a part of designing an experiment is designing the construction of the $X$ matrix.
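The relation following (3.68) is easy to reproduce; the short sketch below regenerates the b/a row of Table 3.1:

```python
# Sketch reproducing Table 3.1 from relation (3.68): b/a as a function of the
# coefficient of variation of a uniform distribution on (a, b).
import math

def b_over_a(cv):
    r = math.sqrt(3.0) * cv          # 1.732052 * C.V.
    return (1.0 + r) / (1.0 - r)

for cv in (.05, .10, .20, .25, .30, .40, .50, .55, .60):
    print(f"C.V. = {cv:.2f}  ->  b/a = {b_over_a(cv):6.1f}")
# C.V. = 1/sqrt(3) = .577350 makes the denominator vanish (b/a -> infinity).
```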
4. COMPARISON OF EMPIRICAL BAYES TO ORDINARY LEAST SQUARES ESTIMATORS

In this chapter, a simulation study will be presented to compare empirical Bayes (EB) to the ordinary least squares (OLS) estimators. The OLS estimators are chosen for comparison since, by the assumptions for Models A*, B*, C* and D* in Section 3.4, particularly the independence of $\beta_1, \beta_2, \ldots, \beta_N$, they are the best linear unbiased estimators (Theil, 1971). For simplicity of the study, we shall work only with Model $D_1^*$, as described in Section 3.5. Let us rewrite Model $D_1^*$ as follows:
i
where
,
1:i
of lIs and
and
E.
-l
and
x.
-1.
Q'i
,
~i
E.
-1.
are
15
X
are scalars.
are as given for Model
1
(4.1)
2, ... , N
vectors, 1
is a
Assumptions about
Dl in Section 3· 5.
Q'i
15
,
1
X
~i
Since here
vector
,
-l
T
= 15
x.
from (3.61), we can write
x.
lm
(the middle element of
and
i
1, 2, ••• , N
m
1, 2, " ., 15
(4.2)
for
x. ) .
-l
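For $T = 15$, (4.2) can be coded directly; the sketch below (with an illustrative value of $c_i$) also confirms restrictions (b) and (c):

```python
import numpy as np

# Sketch of (4.2)/(3.61): the design vector x_i recovered from its middle
# element c_i when T = 15, so that x_im = (6m + 50) c_i / 98.
def design_from_c(c_i, T=15):
    m = np.arange(1, T + 1)
    return (4 * T + 6 * m - 10) / (7 * (T - 1)) * c_i

x = design_from_c(10.0)
print(x[0], x[7], x[-1])   # x_i1, middle element (= c_i), and x_iT = 2.5 x_i1
print(np.diff(x))          # equally spaced, per restriction (b)
```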
In addition, for ease of programming, the random variables $c$, $\alpha$, $\beta$ and $\sigma^2$ are chosen to be independent, although the assumptions for Model D* do not necessarily require this. The generation of the random variables $c$, $\alpha$, $\beta$ and $\sigma^2$ will be described in later sections of this chapter, and several cases concerning the form of the prior density $g(c,\alpha,\beta,\sigma^2)$ will be considered. In each case, the whole set of $N$ regression equations (experiments) will be replicated $r = 100$ times, and the values of $N$ are varied over 2, 3, 4, 5, 10, 15, 20, 30 and 40. Note that in each replication we generate at random $N$ values of $(c,\alpha,\beta,\sigma^2)$, so that, in general, they will differ from replication to replication.
4.1 Objectives Of The Study

Bayes estimators for a multiple linear regression model, as seen in Section 3.3, depend on the prior density. Since the EB estimators given in Section 3.4 are actually two-stage Bayes estimators, they will indirectly depend on the true form of the prior density, although knowledge of the exact form of this prior density is by-passed in their derivation. Therefore, we may suspect that the magnitude of improvement for EB over OLS estimators, if there is any, will somewhat depend on the true form of the prior density.

This study will attempt to show that no matter what form the true prior density takes, an improvement for EB over OLS estimators will always be gained, i.e., EB estimators are at least as good as the OLS, and in some cases they are much better in the mean squared error sense. The ratio of the average squared errors, denoted by

$$R = \frac{\text{ASE(EB estimator)}}{\text{ASE(OLS estimator)}} , \tag{4.3}$$
will be used as a measure of the improvement for EB over OLS estimators; the smaller $R$, the larger the improvement gained.

Denote $(a_i, b_i, s_i^2)$ and $(\hat\alpha_i, \hat\beta_i, \hat\sigma_i^2)$ as the OLS and EB estimators, respectively, for $(\alpha_i, \beta_i, \sigma_i^2)$ of the model given in (4.1). The true values $\alpha_i, \beta_i, \sigma_i^2$ ($i = 1, 2, \ldots, N$) are known by generation of the random variables $\alpha$, $\beta$ and $\sigma^2$, while $(a_i, b_i, s_i^2)$ and $(\hat\alpha_i, \hat\beta_i, \hat\sigma_i^2)$ can be computed. The average squared errors for EB and OLS estimators required for obtaining (4.3) are computed as follows:

$$\text{ASE}(\hat\alpha) = \frac{1}{r}\sum_{k=1}^{r}\Big[\frac{1}{N}\sum_{i=1}^{N}(\hat\alpha_i - \alpha_i)^2\Big]_k \tag{4.4}$$

with similar expressions for $a$, $\hat\beta$, $b$, $\hat\sigma^2$ and $s^2$. Hence, the ratio of the average squared errors for $\alpha$ is given by

$$R_\alpha = \frac{\text{ASE}(\hat\alpha)}{\text{ASE}(a)} \tag{4.5}$$

with similar expressions for $R_\beta$ and $R_\sigma$.
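In code, (4.4) and (4.5) amount to the small helper below (a sketch; the arrays in the usage example are made up):

```python
import numpy as np

# Sketch of (4.3)-(4.5): average squared errors over r replications of N
# estimates each, and their EB/OLS ratio.
def ase(est, true):
    """est, true: (r, N) arrays -> mean over replications of per-replication MSE."""
    return ((est - true) ** 2).mean(axis=1).mean()

def ratio_R(eb_est, ols_est, true):
    return ase(eb_est, true) / ase(ols_est, true)   # R < 1 favors EB

# Toy usage with made-up arrays:
rng = np.random.default_rng(8)
true = rng.normal(10, 3, (100, 10))
ols = true + rng.normal(0, 1.0, true.shape)
eb = true + rng.normal(0, 0.8, true.shape)
print(ratio_R(eb, ols, true))    # roughly (0.8/1.0)^2 = 0.64 here
```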
It is also an objective of this study to investigate the relation between $N$ and $R$. In practice we usually deal with small $N$, say around 10, because if $N$ is very large it is impractical to report individual estimates for the regression parameters. In that case (if $N$ is large), estimates for the means and variances of the regression parameters, for example as given by Swamy (1970), would be desirable. We can expect that the larger $N$, the smaller $R$, since the larger $N$, the better the estimation of the multivariate densities required to obtain the EB estimators.
4.2 Generation Of Random Variables c, α, β and σ²

For ease of programming, the random variables $c$, $\alpha$, $\beta$ and $\sigma^2$ are taken to be independent, although the assumptions for the model do not necessarily require this. This means that the prior joint density of $c$, $\alpha$, $\beta$ and $\sigma^2$ can be expressed as the product of the individual prior densities, i.e.,

$$g(c,\alpha,\beta,\sigma^2) = g_1(c)\,g_2(\alpha)\,g_3(\beta)\,g_4(\sigma^2) .$$

Several forms of prior densities, such as normal, U-shaped and L-shaped with given mean and variance, will be used for the generation of values of $c$, $\alpha$, $\beta$ and $\sigma^2$, as described in Table 4.1.

Table 4.1 Several forms of prior densities of c, α, β and σ² studied^a

Case | g1(c)     | g2(α)     | g3(β)      | g4(σ²)^b
1    | N(10,9)   | N(10,9)   | N(5,2.25)  | N(50,225)
2    | L(10,9)   | L(10,9)   | L(5,2.25)  | L(50,225)
3    | US(10,9)  | US(10,9)  | US(5,2.25) | US(50,225)
4    | N(10,144) | N(10,9)   | N(5,2.25)  | N(50,225)
5    | N(10,9)   | N(10,144) | N(5,36)    | N(50,225)
6    | N(0,9)    | N(10,9)   | N(5,2.25)  | N(50,225)
7    | N(-10,9)  | N(10,9)   | N(5,2.25)  | N(50,225)
8    | N(10,9)   | N(0,9)    | N(0,2.25)  | N(50,225)
9    | N(10,9)   | D(10,0)   | D(5,0)     | N(50,225)

^a N = normal, L = L-shaped, US = U-shaped, D = degenerate; the numbers within the brackets indicate the mean and variance, respectively.
^b In the case where g4(σ²) is normal, since σ² > 0, the prior density is taken to be g4(σ²) = .000008 + N(50,225) for 0 < σ² < 100, and 0 otherwise.
The L-shaped and U-shaped distributed random variables are derived by linear transformations of random variables $W$ and $Z$, whose distributions are as follows:

$$f(w) = \begin{cases} (w - 3^{1/3})^2 , & 0 < w < 3^{1/3} \\ 0 , & \text{otherwise} \end{cases} \tag{4.6}$$

and

$$f(z) = \begin{cases} z^2 , & -(1.5)^{1/3} < z < (1.5)^{1/3} \\ 0 , & \text{otherwise.} \end{cases} \tag{4.7}$$

Note that $W$ and $Z$ are L-shaped and U-shaped random variables, respectively. By the following linear transformations of $W$ and $Z$, we can obtain an L-shaped random variable $Y$ with mean $\mu_Y$ and standard deviation $\sigma_Y$, and a U-shaped random variable $V$ with mean $\mu_V$ and standard deviation $\sigma_V$:

$$Y = \frac{\sigma_Y}{\sigma_W}\,(W - \mu_W) + \mu_Y \tag{4.8}$$

$$V = \frac{\sigma_V}{\sigma_Z}\,Z + \mu_V \tag{4.9}$$

where $\mu_W = 3^{1/3}/4$ and $\sigma_W = 3^{1/3}\sqrt{15}/20$ are the mean and standard deviation of $W$, and $\mu_Z = 0$ and $\sigma_Z = \big[\tfrac{3}{5}(1.5)^{2/3}\big]^{1/2}$ are those of $Z$.
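A sketch of one way to generate $W$ and $Z$ is given below. The closed-form inverse CDFs and the moments of $W$ and $Z$ follow from (4.6) and (4.7), but the implementation itself is assumed, not taken from the thesis:

```python
import numpy as np

# Sketch: generating the L-shaped and U-shaped variables of (4.6)-(4.9) by
# inverse-CDF sampling, then rescaling to a target mean and std deviation.
rng = np.random.default_rng(5)

def l_shaped(mu, sd, size):
    c = 3.0 ** (1.0 / 3.0)                       # support of W is (0, 3^(1/3))
    w = c - np.cbrt(3.0 * (1.0 - rng.uniform(size=size)))   # F^(-1) for (4.6)
    mu_w, sd_w = c / 4.0, c * np.sqrt(15.0) / 20.0
    return sd / sd_w * (w - mu_w) + mu           # (4.8)

def u_shaped(mu, sd, size):
    z = np.cbrt(3.0 * rng.uniform(size=size) - 1.5)          # F^(-1) for (4.7)
    sd_z = np.sqrt(0.6 * 1.5 ** (2.0 / 3.0))     # Z already has mean 0
    return sd / sd_z * z + mu                    # (4.9)

y = l_shaped(10.0, 3.0, 100_000)                 # e.g. the L(10,9) prior of Table 4.1
v = u_shaped(50.0, 15.0, 100_000)
print(y.mean(), y.std(), v.mean(), v.std())
```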
4.3 Computations Of The OLS And EB Estimates

By the generation of the random variables $c$, $\alpha$, $\beta$ and $\sigma^2$ for each case described in Section 4.2, we obtain $c_i, \alpha_i, \beta_i, \sigma_i^2$ ($i = 1, 2, \ldots, N$). From $c_i$, the elements $x_{im}$ ($m = 1, 2, \ldots, 15$) of $x_i$ can be derived by using (4.2), and from $\sigma_i^2$ we can also generate the error component $\epsilon_i$, since $\epsilon_i$ is distributed $N(0,\sigma_i^2 I)$.

The OLS estimate for $[\alpha_i, \beta_i]'$, denoted by $[a_i, b_i]'$, is computed by using (3.11) as follows:

$$\begin{bmatrix} a_i \\ b_i \end{bmatrix} = (X_i'X_i)^{-1}X_i'y_i . \tag{4.10}$$
Further, (4.10) can be rewritten as

$$a_i = \alpha_i + \frac{1}{D_i}\Big(\sum_{m=1}^{15}x_{im}^2\sum_{m=1}^{15}\epsilon_{im} - \sum_{m=1}^{15}x_{im}\sum_{m=1}^{15}x_{im}\epsilon_{im}\Big) \tag{4.11}$$

$$b_i = \beta_i + \frac{1}{D_i}\Big(15\sum_{m=1}^{15}x_{im}\epsilon_{im} - \sum_{m=1}^{15}x_{im}\sum_{m=1}^{15}\epsilon_{im}\Big) \tag{4.12}$$

where

$$D_i = 15\sum_{m=1}^{15}x_{im}^2 - \Big(\sum_{m=1}^{15}x_{im}\Big)^2 .$$

The OLS estimate for $\sigma_i^2$, from (3.12), is given by

$$s_i^2 = \frac{1}{T-p}\,(y_i - X_ib_i)'(y_i - X_ib_i) = \frac{1}{13}\,\epsilon_i'M_i\epsilon_i \tag{4.13}$$

where $M_i = I - X_i(X_i'X_i)^{-1}X_i'$ and $I$ is a $T\times T$ identity matrix. Since we have $T = 15$ and $X_i = [1, x_i]$, (4.13) can be expressed as

$$s_i^2 = \frac{1}{13}\left\{\sum_{m=1}^{15}\epsilon_{im}^2 - \frac{1}{D_i}\Big[\sum_{m=1}^{15}x_{im}^2\Big(\sum_{m=1}^{15}\epsilon_{im}\Big)^2 - 2\sum_{m=1}^{15}x_{im}\Big(\sum_{m=1}^{15}\epsilon_{im}\Big)\Big(\sum_{m=1}^{15}x_{im}\epsilon_{im}\Big) + 15\Big(\sum_{m=1}^{15}x_{im}\epsilon_{im}\Big)^2\Big]\right\} \tag{4.14}$$

where, as before, $D_i = 15\sum_{m}x_{im}^2 - (\sum_{m}x_{im})^2$.
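The closed forms (4.11), (4.12) and (4.14) can be checked against a direct least squares fit; in the sketch below, all parameter values are illustrative:

```python
import numpy as np

# Sketch checking the closed forms (4.11), (4.12) and (4.14) against a direct
# least squares fit.
rng = np.random.default_rng(6)
T, alpha, beta, sigma2, c = 15, 10.0, 5.0, 50.0, 10.0
m = np.arange(1, T + 1)
x = (4 * T + 6 * m - 10) / (7 * (T - 1)) * c       # (4.2)
eps = rng.normal(0.0, np.sqrt(sigma2), T)

D = T * (x ** 2).sum() - x.sum() ** 2
a_i = alpha + ((x ** 2).sum() * eps.sum() - x.sum() * (x * eps).sum()) / D   # (4.11)
b_i = beta + (T * (x * eps).sum() - x.sum() * eps.sum()) / D                 # (4.12)
s2_i = ((eps ** 2).sum() - ((x ** 2).sum() * eps.sum() ** 2
        - 2 * x.sum() * eps.sum() * (x * eps).sum()
        + T * (x * eps).sum() ** 2) / D) / (T - 2)                           # (4.14)

# Direct fit from y, for comparison -- the simulation itself never forms y.
y = alpha + beta * x + eps
X = np.column_stack([np.ones(T), x])
coef, rss = np.linalg.lstsq(X, y, rcond=None)[:2]
print(a_i, b_i, coef)            # matching intercept and slope
print(s2_i, rss[0] / (T - 2))    # matching residual variance
```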
EB estimates for $\alpha_i$, $\beta_i$ and $\sigma_i^2$ are computed by applying (3.65) and (3.66) to the model in (4.1), as follows:

$$\hat\alpha_i = a_i + \frac{S_i}{11\,D_i}\cdot\frac{\sum_{m=1}^{15}x_{im}^2\,f_{N,1,11}(c_i,a_i,b_i,S_i) - \sum_{m=1}^{15}x_{im}\,f_{N,2,11}(c_i,a_i,b_i,S_i)}{f_{N,13}(c_i,a_i,b_i,S_i)} \tag{4.15}$$

$$\hat\beta_i = b_i + \frac{S_i}{11\,D_i}\cdot\frac{15\,f_{N,2,11}(c_i,a_i,b_i,S_i) - \sum_{m=1}^{15}x_{im}\,f_{N,1,11}(c_i,a_i,b_i,S_i)}{f_{N,13}(c_i,a_i,b_i,S_i)} \tag{4.16}$$

$$\hat\sigma_i^2 = \frac{S_i}{11}\cdot\frac{f_{N,11}(c_i,a_i,b_i,S_i)}{f_{N,13}(c_i,a_i,b_i,S_i)} \tag{4.17}$$

where $S_i = 13\,s_i^2$ and $D_i = 15\sum_{m}x_{im}^2 - (\sum_{m}x_{im})^2$, and where $f_{N,13}$, $f_{N,11}$, $f_{N,1,11}$ and $f_{N,2,11}$ are obtained, as defined below, by applying (3.62), (3.63) and (3.64) to the model in (4.1); here $T = 15$ and $p = 2$, so that $T-p = 13$ and $T-p-2 = 11$.

Let

$$\lambda = \delta(N)\Big[\tfrac{1}{N}\textstyle\sum_{i=1}^{N}c_i^2 - \big(\tfrac{1}{N}\sum_{i=1}^{N}c_i\big)^2\Big]^{1/2} , \qquad \gamma = \delta(N)\Big[\tfrac{1}{N}\textstyle\sum_{i=1}^{N}S_i^2 - \big(\tfrac{1}{N}\sum_{i=1}^{N}S_i\big)^2\Big]^{1/2} ,$$

$$\delta_{i1} = \delta(N)\Big[\frac{S_i}{13}\cdot\frac{\sum_{m=1}^{15}x_{im}^2}{D_i}\Big]^{1/2} , \qquad \delta_{i2} = \delta(N)\Big[\frac{S_i}{13}\cdot\frac{15}{D_i}\Big]^{1/2} , \qquad S_i^* = \frac{13}{15}\,S_i .$$
Then, writing $K_i = (2\pi)^4 N\lambda\gamma\,\delta_{i1}\delta_{i2}$ and $\phi(u) = \big(\tfrac{\sin u}{u}\big)^2$ with $\phi(0) = 1$, we have

$$f_{N,13}(c_i,a_i,b_i,S_i) = \frac{1}{K_i}\left\{1 + \sum_{\substack{k=1\\k\neq i}}^{N}\phi\Big(\frac{c_i-c_k}{2\lambda}\Big)\,\phi\Big(\frac{a_i-a_k}{2\delta_{i1}}\Big)\,\phi\Big(\frac{b_i-b_k}{2\delta_{i2}}\Big)\,\phi\Big(\frac{S_i-S_k}{2\gamma}\Big)\right\} \tag{4.18}$$

$$f_{N,11}(c_i,a_i,b_i,S_i) = \frac{1}{K_i}\left\{\phi\Big(\frac{S_i-S_i^*}{2\gamma}\Big) + \sum_{\substack{k=1\\k\neq i}}^{N}\phi\Big(\frac{c_i-c_k}{2\lambda}\Big)\,\phi\Big(\frac{a_i-a_k}{2\delta_{i1}}\Big)\,\phi\Big(\frac{b_i-b_k}{2\delta_{i2}}\Big)\,\phi\Big(\frac{S_i-S_k^*}{2\gamma}\Big)\right\} \tag{4.19}$$

$$f_{N,1,11}(c_i,a_i,b_i,S_i) = \frac{1}{\delta_{i1}}\big\{f_{N,11}(c_i,a_i+\delta_{i1},b_i,S_i) - f_{N,11}(c_i,a_i,b_i,S_i)\big\} \tag{4.20}$$

$$f_{N,2,11}(c_i,a_i,b_i,S_i) = \frac{1}{\delta_{i2}}\big\{f_{N,11}(c_i,a_i,b_i+\delta_{i2},S_i) - f_{N,11}(c_i,a_i,b_i,S_i)\big\} . \tag{4.21}$$
In programming the simulation, $a_i$, $b_i$, $s_i^2$, $\hat\alpha_i$, $\hat\beta_i$ and $\hat\sigma_i^2$ are computed by (4.11), (4.12), (4.14), (4.15), (4.16) and (4.17), respectively, so that we do not have to generate the values of $y_i$.
4.4 Results
The ratios of the average squared errors of EB to OLS estimators, denoted by R_α, R_β and R_σ, as described in Sections 4.1 and 4.2, were first computed for cases 1, 2 and 3 (see Table 4.1). The results are given below in Table 4.2.
Table 4.2  Ratio of the average squared error of EB to OLS estimators for cases 1, 2 and 3(a)

              Case 1                  Case 2                  Case 3
  N      R_α    R_β    R_σ       R_α    R_β    R_σ       R_α    R_β    R_σ
  2     .943   .913  1.488      .939   .925  1.430      .953   .944  1.636
  3     .905   .895  1.428      .884   .872  1.313      .867   .881  1.355
  4     .867   .862  1.359      .846   .826  1.277      .861   .860  1.318
  5     .832   .803  1.318      .817   .808  1.265      .826   .819  1.306
 10     .719   .664  1.222      .728   .705  1.220      .729   .721  1.122
 15     .674   .634  1.147      .681   .660  1.183      .678   .674  1.097
 20     .628   .600  1.107      .642   .621  1.139      .625   .623  1.069
 30     .587   .540  1.043      .584   .569  1.075      .577   .575  1.026
 40     .553   .513  1.013      .557   .544  1.032      .546   .547   .976

(a) Cases 1, 2, and 3 correspond to Table 4.1.
From Table 4.2, we see that the ratios R_α and R_β are all smaller than 1 and are always decreasing as N becomes larger. This means that the EB estimators for α and β for cases 1, 2, and 3 are better than the OLS in the average squared error sense. More improvement is gained for EB over OLS estimators the larger the number of experiments (N). The improvements for EB estimators for α and β are notable; for N = 10 or larger the gain is more than 25 per cent.

Recall that cases 1, 2, and 3 (see Table 4.1) have different shapes of prior densities, but their respective means and variances are the same. It seems that the shape of the prior densities of c, α and β does not affect much the ratios R_α and R_β, given that their respective means and variances are the same.

Note from Table 4.2, however, that the ratio R_σ is always greater than 1, except in case 3 for N = 40 where R_σ = .976, which means that the EB estimator for σ² is worse than the OLS for N ≤ 40; but the ratio R_σ is also decreasing as N becomes larger. What follows is an attempt to improve the EB estimator for σ² so that the improvement will be gained for small values of N, say for N around 10 or even less.

The high ratio R_σ in Table 4.2 (for N ≤ 40) might be caused by the large variance of σ̂², or the large marginal bias of σ̂², or both. An estimate for the marginal bias of σ̂², if the prior mean E(σ²) is known, is computed as

    Bias(σ̂²) = (1/(Nr)) Σ_{k=1}^{r} [ Σ_{i=1}^{N} σ̂²_i ]_k - E(σ²)         (4.22)

where r is the number of replications.
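In code, (4.22) is a one-liner; `eb_estimates` and `prior_mean` below are hypothetical names for an (r × N) array of EB estimates and the known E(σ²).

```python
import numpy as np

def marginal_bias(eb_estimates, prior_mean):
    # (4.22): average the EB estimates over all N experiments and all r
    # replications, then subtract the known prior mean E(sigma^2).
    return np.mean(eb_estimates) - prior_mean

# eb_estimates would be an (r, N) array of sigma^2-hat values; under the
# case 1 prior, prior_mean = 50.
rng = np.random.default_rng(3)
print(marginal_bias(rng.normal(58.0, 15.0, size=(100, 10)), 50.0))
```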
Using (4.22) with r = 100, the marginal biases of σ̂² for cases 1, 2 and 3 were computed and the results are given in Table 4.3 below.

Table 4.3  Estimates for the marginal bias of σ̂² for cases 1, 2 and 3

  N    Case 1   Case 2   Case 3
  2     6.10     5.44     5.43
  3     7.44     6.87     6.66
  4     8.50     6.24     6.50
  5     7.60     7.58     6.29
 10     9.11     7.67     4.94
 15     9.12     8.25     6.77
 20     8.50     8.15     7.03
 30     8.52     7.99     6.31
 40     7.92     7.29     5.72
The results in Table 4.3 reveal that the direction of the marginal bias for N ≤ 40 is positive. Therefore, correcting σ̂²_i of (3.66) by a constant positive multiplying factor of less than one will simultaneously reduce the variance as well as the marginal bias of σ̂², for N ≤ 40.

Several positive correction factors less than one were examined in the study, namely, .95, .90, .85, .80 and .75. These numbers were selected since the estimates of the marginal bias of σ̂² in Table 4.3 are around 10% to 15% of the average (1/(Nr)) Σ_{k=1}^{r} [ Σ_{i=1}^{N} σ̂²_i ]_k.
61
~2
By applying the above correction factors for
cr
, the Ratio
R
cr
was recomputed for cases 1, 2 and 3 and the results are given in Table
4.4.
Observing the values of
R
cr
of .80 gives a good result where
~2
From now on, .80 cri
p
= 15
R is less than one for
cr
~2
(4.1), instead of cr.
~
that this correction factor of
T
4.4, a correction factor
will be used as an EB estimator for
simulation of model
where
in Table
~2
cr
as given in
N
2
~
3 .
cr
i
for the
(4.17).
Note
is recommended only for model
(4.1),
(the number of observations in the ith experiment) and
= 2 (the number of independent variables including the dummy
variable, which is the same as the number of regression coefficients
including the intercept).
Remarks: Note that the correction factor of .80 for σ̂² (Table 4.4) was chosen since it gave the smallest ratio R_σ (among all correction factors considered) for cases 1, 2 and 3. However, this corrected EB estimator for σ², i.e., .80 σ̂², seems to give a downward marginal bias as a trade-off for its small variance. Estimates of the marginal biases of .80 σ̂² for cases 1, 2 and 3 are given in Table 4.5.

Recall again that the Bayes estimator for σ²,

    σ̃² = S_i f_{T-p-2}(c_i,b_i,S_i) / [(T-p-2) f_{T-p}(c_i,b_i,S_i)] ,

is marginally unbiased. Therefore, the marginal bias of σ̂² (the EB estimator for σ², which is a two-stage Bayes estimator) must be due to the estimation of the multivariate densities f_{T-p}(c_i,b_i,S_i) and f_{T-p-2}(c_i,b_i,S_i) of (3.62) and (3.63), whose estimates are denoted by f_{N,T-p}(c_i,b_i,S_i) and f_{N,T-p-2}(c_i,b_i,S_i). These multivariate densities and their estimates, as indicated by the subscript of f, depend on the degrees of freedom of S, i.e., (T-p); therefore, we can expect that the marginal bias of σ̂² also depends on (T-p).
Table 4.4  The ratio R_σ for cases 1, 2 and 3 after applying several correction factors (C.F.) for σ̂²

  N    C.F.=1.00   C.F.=.95   C.F.=.90   C.F.=.85   C.F.=.80   C.F.=.75

Case 1
  2      1.488      1.313      1.181      1.091      1.045      1.041
  3      1.428      1.229      1.082       .987       .943       .951
  4      1.359      1.144       .981       .871       .814       .809
  5      1.318      1.106       .950       .848       .801       .810
 10      1.222      1.018       .864       .761       .709       .707
 15      1.147       .956       .812       .717       .668       .668
 20      1.107       .919       .781       .692       .653       .664
 30      1.043       .863       .734       .655       .627       .649
 40      1.013       .839       .715       .642       .620       .649

Case 2
  2      1.430      1.261      1.129      1.035       .979       .961
  3      1.312      1.134       .996       .898       .839       .821
  4      1.277      1.093       .962       .874       .837       .822
  5      1.265      1.093       .951       .850       .791       .774
 10      1.220      1.036       .896       .800       .748       .739
 15      1.183       .994       .850       .751       .697       .688
 20      1.139       .949       .806       .710       .661       .660
 30      1.075       .895       .763       .677       .639       .647
 40      1.032       .862       .739       .662       .631       .647

Case 3
  2      1.639      1.479      1.362      1.286      1.250      1.254
  3      1.355      1.165      1.025       .937       .899       .911
  4      1.318      1.127       .988       .899       .862       .876
  5      1.306      1.117       .976       .885       .842       .849
 10      1.122       .969       .864       .807       .798       .837
 15      1.097       .931       .813       .741       .718       .742
 20      1.069       .898       .775       .701       .675       .698
 30      1.026       .860       .743       .675       .656       .686
 40       .976       .821       .715       .657       .646       .685
Table 4.5  Estimates for the marginal bias of .80 σ̂² for cases 1, 2 and 3

  N    Case 1   Case 2   Case 3
  2    -5.12    -5.64    -9.66
  3    -4.05    -4.51    -4.08
  4    -3.20    -5.01    -4.80
  5    -3.92    -3.94    -4.97
 10    -2.71    -3.86    -6.05
 15    -2.71    -3.40    -4.58
 20    -3.20    -3.48    -4.38
 30    -3.18    -3.60    -4.95
 40    -3.66    -4.17    -5.42
A simulation for case 1 by varying (T-p) -- in this case by varying T only, with p = 2 fixed -- was conducted with N = 10 and r = 200. Computing the average of the (uncorrected) EB estimates for σ² as

    avg(σ̂²) = (1/(Nr)) Σ_{k=1}^{r} [ Σ_{i=1}^{N} σ̂²_i ]_k

and the estimate of the marginal bias of σ̂² as given in (4.22), we obtain the results presented in Table 4.6.
From columns (1) and (3) of Table 4.6, we see that the marginal bias of σ̂² is inversely related to the number of degrees of freedom (T-p), which suggests regressing column (3) on the inverse of column (1). Regressing the marginal bias of σ̂² on 1/(T-p) gives the following relation:

    Bias(σ̂²) ≈ (1.77/(T-p)) avg(σ̂²) .                                     (4.23)
Table 4.6  The average of EB estimates for σ² (uncorrected) and estimates of its marginal bias for case 1, where p = 2 and N = 10

  (1)       (2)          (3)             (4)
  T-p    avg(σ̂²)     Bias(σ̂²)    (1.77/(T-p)) avg(σ̂²)
   7      65.85        15.85            16.66
   9      62.01        12.01            12.19
  11      62.59        12.59            10.07
  13      58.10         8.10             7.91
  15      55.46         5.46             6.55
  17      54.01         4.01             5.63
  19      54.31         4.31             5.06
  21      55.83         5.83             4.71
  23      55.44         5.44             4.27
The values of (1.77/(T-p)) avg(σ̂²) are given in column (4) of Table 4.6. For removing the marginal bias of σ̂² by relation (4.23), the correction factor for σ̂² should be (T-p-1.77)/(T-p); i.e., the corrected EB estimator for σ² is given by [(T-p-1.77)/(T-p)] σ̂². Note that for T = 15 and p = 2 this correction factor becomes .86. The discrepancy between the new correction factor (.86) and the old one (.80) can be explained as follows. The correction factor of .86 will almost remove the marginal bias, but it gives a higher variance of the EB estimator for σ² compared to that with the correction factor of .80. In general, the correction factor (T-p-1.77)/(T-p) can be used for correcting an EB estimator for σ² such that the marginal bias is mostly removed; however, to obtain an EB estimator for σ² with a lower ratio R_σ, the correction factor must be slightly smaller. End of remarks.
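For reference, the bias-removing factor of relation (4.23) is easily tabulated (a sketch; the constant 1.77 is the regression coefficient estimated above):

```python
# Bias-removing correction factor (T - p - 1.77)/(T - p) implied by (4.23),
# tabulated for p = 2 and the T values underlying Table 4.6.
p = 2
for T in (9, 11, 13, 15, 17, 19, 21, 23, 25):
    print(T, round((T - p - 1.77) / (T - p), 3))
# T = 15, p = 2 gives 0.864, the .86 quoted above.
```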
Applying the correction factor of .80 for σ̂², the ratios of the average squared error of EB to OLS estimators, denoted by R_α, R_β and R_σ, were computed for cases 4, 5, 6, 7, 8 and 9 (see Table 4.1) as outlined in Sections 4.1 and 4.2, and the results are given in Table 4.7.
Table 4.7  The ratio of the average squared error of EB to OLS estimators for cases 4, 5, 6, 7, 8 and 9(a)

              Case 4                  Case 5                  Case 6
  N      R_α    R_β    R_σ       R_α    R_β    R_σ       R_α    R_β    R_σ
  2     .952   .707  1.040      .978   .958  1.075      .898   .683  1.026
  3     .920   .607   .950      .979   .964   .986      .826   .590   .946
  4     .888   .703   .820      .971   .964   .864      .764   .544   .804
  5     .863   .508   .807      .975   .957   .861      .715   .473   .793
 10     .730   .453   .721      .943   .953   .784      .557   .311   .705
 15     .679   .435   .680      .927   .928   .749      .499   .272   .671
 20     .621   .435   .664      .918   .930   .736      .455   .094   .658
 30     .577   .316   .637      .896   .903   .708      .403   .038   .633
 40     .537   .314   .630      .885   .894   .699      .375   .033   .626

              Case 7                  Case 8                  Case 9
  N      R_α    R_β    R_σ       R_α    R_β    R_σ       R_α    R_β    R_σ
  2     .958  1.043  1.042      .943   .913  1.045      .871   .872  1.024
  3     .921   .920   .926      .905   .895   .943      .775   .768   .922
  4     .851   .806   .800      .867   .862   .814      .705   .703   .785
  5     .807   .766   .785      .832   .803   .801      .649   .633   .773
 10     .743   .750   .696      .719   .664   .709      .527   .489   .680
 15     .691   .709   .656      .674   .634   .668      .471   .447   .644
 20     .647   .637   .643      .628   .600   .653      .428   .412   .632
 30     .604   .580   .616      .587   .540   .627      .384   .352   .607
 40     .570   .543   .614      .553   .513   .620      .359   .340   .604

(a) See Table 4.1 for the description of cases 4, 5, 6, 7, 8 and 9.
Comparing the values of R_α and R_β from Table 4.2, with R_σ taken from Table 4.4 (C.F. = .80), for cases 1, 2 and 3 to those values of R_α, R_β and R_σ in Table 4.7 for cases 4, 5, 6, 7, 8 and 9, several results can be observed.
(a) The shape of the prior densities of c, α, β and σ² does not affect much the ratios R_α, R_β and R_σ, given that their respective means and variances are the same (comparisons of cases 1, 2 and 3; that is why the shapes of the prior densities for the other cases were, then, taken to be normal).

(b) The higher the variability of c -- which represents the X matrix -- among experiments, the better the EB estimator for β, but it does not affect much the ratios R_α and R_σ, given that the mean of c and the respective prior densities of α, β and σ² are the same (comparison of case 1 to case 4).

(c) The smaller the absolute value of the mean of c -- which represents the absolute magnitude of the X matrix -- the better the EB estimators for α and β, especially for β, but it does not affect much the ratio R_σ, given that the variance of c and the respective prior densities of α, β and σ² are the same (comparisons of cases 1, 6 and 7). The reason for this is that when the magnitude of the elements of the X matrix is small, the magnitude of the elements of (X'X)⁻¹ will be large, which in turn makes the variance of an LS estimator large. That is why the ratio of the average squared error of EB to that of an OLS estimator (which reflects its variance, since OLS estimators are unbiased) becomes smaller.

(d) The ratios R_α, R_β and R_σ seem to be unaffected by a change of the prior means of α and β, given that the variances of α and β and the respective prior densities of c and σ² are the same (comparison of case 1 to case 8).

(e) The smaller the variances of α and β, the better the EB estimators for α and β, given that the respective prior densities of c and σ² are the same. The best EB estimators will be obtained when the variances of α and β are zero, i.e., when α and β are actually fixed (comparisons of cases 1, 5 and 9).
Results of the simulation for the generated values of c (which represent the X matrix) and the generated parameter values of (α, β, σ²), with their respective OLS and EB estimates, taken from one replication of cases 1 through 9 for the special case of N = 10 and T = 15, are presented in the Appendix. Since the true parameter values are known, we can see from the Appendix the direction as well as the magnitude of the improvement of each EB estimate over its corresponding OLS estimate.
5. APPLICATIONS TO SOME PROBLEMS IN ECONOMETRICS
In economic studies, the nature of experiments described by Models A*, B*, C* and D* of Section 3.4 might be relationships of micro-unit variables such as firms, farms, households, etc., observed over a period of time, or it might be relationships of regions with observations over a time series or over several sub-units in each region.
The estimate of the U.S. consumption function (a distributed lag model), for example, as obtained by Griliches et al. (1962) or by Zellner and Geisel (1970), might still be improved if we incorporate the data from several other countries.
Similarly, if the constancy
of regression coefficients over a long time period is doubtful, we
might break down this series into several sub-series and assume that
the regression coefficients are constant within a sub-series but they
are random across sub-series.
By assumptions as described in Section
3.4, empirical Bayes estimates for regression coefficients in each
sub-series will be better than ordinary least squares.
Two examples are given below to illustrate the applications of the empirical Bayes approach in economic studies.
5.1 Investment Analysis
Grunfeld (1958) constructed a multiple linear regression model to study the determinants of corporate investment. The dependent variable I and the independent variables F_-1 and C_-1 used in the model are described as follows:

    I = gross investment = additions to plant and equipment plus maintenance
        and repairs, in millions of dollars, deflated by P_1;

    F_-1 = value of the firm = price of common and preferred shares at
        December 31 (or average price of December 31 and January 31 of the
        following year) times the number of common and preferred shares
        outstanding, plus total book value of debt at December 31, in
        millions of dollars, deflated by P_2, lagged one year;

    C_-1 = the stock of plant and equipment = accumulated sum of net
        additions to plant and equipment deflated by P_1, minus depreciation
        allowance deflated by P_3, lagged one year;

where

    P_1 = implicit price deflator of producers' durable equipment (base 1947);

    P_2 = implicit price deflator of G.N.P. (base 1947);

    P_3 = depreciation expense deflator = ten-year moving average of the
        wholesale price index of metals and metal products (base 1947).

The description of the above variables and the time series data (1935-54) were also given in an article by Boot and de Wit (1960).
By assuming Model D* of Section 3.4 for these investment data, where we have N = 10, T = 20 and p = 3, EB estimates for the regression coefficients and error variances are computed by using (3.56) and (3.57). The results, as well as the OLS estimates, are presented in Table 5.1 below.
Table 5.1  OLS and EB estimates for regression coefficients and error variances for the ten corporations(a)

                            OLS estimates                              EB estimates
Corporation   Intercept  Coeff.   Coeff.      s²       Intercept  Coeff.   Coeff.      σ̂²
                         of F_-1  of C_-1                         of F_-1  of C_-1
  1           -149.78     .1193    .3714    8423.87    -157.28     .1205    .3670    8311.60
  2           - 49.20     .1749    .3896    9299.60    - 16.81     .1638    .3572    9134.70
  3           -  9.96     .0266    .1517     777.45    - 15.60     .0310    .1453     777.32
  4           -  6.19     .0779    .3157     176.32    -  6.40     .0786    .3128     176.32
  5             22.71     .1624    .0031      82.17      22.26     .1646    .0028      82.17
  6           -  8.69     .1315    .0854      65.33    -  7.25     .1172    .1291      65.33
  7           -  4.50     .0875    .1238      88.67    -  3.01     .0854    .1198      88.69
  8           -  0.51     .0529    .0924     104.31    -  3.17     .0579    .0829     104.33
  9           -  7.72     .0753    .0821      82.83    -  7.94     .0725    .0859      82.84
 10              0.16     .0046    .4347       1.18       0.03     .0071    .4292       1.18

(a) The ten corporations are: (1) General Motors, (2) U.S. Steel, (3) General Electric, (4) Chrysler, (5) Atlantic Refineries, (6) I.B.M., (7) Union Oil, (8) Westinghouse, (9) Goodyear, and (10) Diamond Match.
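A minimal sketch of the OLS column of Table 5.1 for a single corporation, assuming the Boot and de Wit (1960) series are available as arrays `I`, `F_lag`, `C_lag` (hypothetical names; the data are not reproduced here):

```python
import numpy as np

def grunfeld_ols(I, F_lag, C_lag):
    # Per-corporation OLS for I_t = b0 + b1*F_{t-1} + b2*C_{t-1} + e_t
    # (the OLS columns of Table 5.1), with s^2 on T - p = 20 - 3 d.f.
    I = np.asarray(I)
    X = np.column_stack([np.ones(len(I)), F_lag, C_lag])
    coef, *_ = np.linalg.lstsq(X, I, rcond=None)
    resid = I - X @ coef
    s2 = resid @ resid / (len(I) - 3)
    return coef, s2
```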
On the other hand, there is a different approach due to Zellner (1962) for analyzing these data of ten corporations, based on the following model:

    y_i = X_i β_i + ε_i ,    i = 1, 2, ..., N                              (5.1)

where

    (a) β_i (i = 1, 2, ..., N) is fixed;

    (b) X_i (i = 1, 2, ..., N) is non-stochastic with full rank;

    (c) E(ε_i) = 0 for i = 1, 2, ..., N, and E(ε_i ε_j') = σ_ij I for all
        i, j = 1, 2, ..., N.

Applying Zellner's two-stage least squares to the above data from the ten corporations, we obtain estimates for the regression coefficients as given in Table 5.2 below.
Table 5.2  Zellner's two-stage LS estimates for regression coefficients for the ten corporations

Corporation        Intercept   Coefficient of F_-1   Coefficient of C_-1
 1. G.M.           -133.00           .1130                 .3860
 2. U.S. Steel     - 18.60           .1700                 .3200
 3. G.E.           - 11.20           .0332                 .1240
 4. Chrysler          2.45           .0672                 .3060
 5. Atl. Ref.        26.50           .1310                 .0102
 6. I.B.M.            5.56           .1310                 .0571
 7. U. Oil            9.67           .1120                 .1280
 8. West.             4.11           .0525                 .0412
 9. G. Year           2.58           .0760                 .0641
10. D. Match          2.20          -.0181                 .3650
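A minimal sketch of the two-stage procedure under model (5.1), assuming `ys` and `Xs` are hypothetical lists holding each corporation's T×1 dependent vector and T×p regressor matrix: equation-by-equation OLS residuals estimate the σ_ij of assumption (c), and a second, generalized least squares stage uses that estimate.

```python
import numpy as np
from scipy.linalg import block_diag

def zellner_two_stage(ys, Xs):
    # Stage 1: OLS residuals, equation by equation, give an estimate of the
    # cross-equation covariance matrix Sigma = (sigma_ij).
    T = len(ys[0])
    resid = []
    for y, X in zip(ys, Xs):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid.append(y - X @ b)
    E = np.column_stack(resid)                # T x N residual matrix
    Sigma = E.T @ E / T
    # Stage 2: GLS on the stacked system, Omega = Sigma kron I_T.
    X_big = block_diag(*Xs)
    y_big = np.concatenate(ys)
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
    A = X_big.T @ Omega_inv
    return np.linalg.solve(A @ X_big, A @ y_big)
```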
Zellner (1963) also claimed that only if the absolute value of the true correlation coefficient ρ of the error components from two different experiments is in the neighborhood of zero, and/or (T-2p) is small, are the OLS estimators slightly more efficient than the two-stage least squares; more efficiency for the two-stage LS will be gained the higher |ρ| and/or (T-2p). The efficiency over the OLS estimates will be gained most when X_i'X_j = 0 for all i ≠ j.

EB estimators, therefore, are not comparable to Zellner's two-stage LS, since they are based on different assumptions about the error components: EB estimators are superior over the OLS in the case where ρ = 0 and N (the number of experiments) is large enough (≥ 5), while Zellner's two-stage LS does not depend on N and is more efficient than the OLS when ρ is not close to zero. At this point, however, a study can be conducted to investigate the robustness of EB estimators when ρ ≠ 0 and compare them to Zellner's two-stage least squares estimators.
5.2 Elasticities of Substitution

OLS estimates for the elasticities of substitution in the demands of selected importers between flue-cured tobacco from the United States and tobacco from other sources were given by Capel (1966), using annual data for 1955-1964. He constructed a linearized model of the form

    log(A/B) = k + λ log(P_a/P_b) + μ

where

    A, B = quantities of United States flue-cured and other tobacco imported
           by a country in a particular year, respectively;

    P_a, P_b = their prices;

    λ = elasticity of substitution between A and B;

    k = intercept;

    μ = random error.

By assuming Model D* of Section 3.4, where we have N = 9, T = 10 and p = 2, EB estimates for the elasticities of substitution for the nine countries are presented in Table 5.3, along with their OLS estimates.
Table 5.3  OLS and EB estimates for elasticities of substitution in import demand between flue-cured tobacco from the United States and tobacco from other countries(a)

Importing Countries    OLS estimates    EB estimates
United Kingdom            -2.47           -2.463
West Germany              -3.57           -3.581
Netherlands               -1.47           -1.588
Belgium                   -5.15           -4.257
Japan                     -0.81           -0.825
Egypt                     -2.19           -1.889
Ireland                   -0.99           -1.002
Denmark                   -1.04           -1.805
Sweden                    -0.26           -0.138

(a) The data were taken from Capel (1966), whose sources are: Egypt: United Arab Republic tobacco report, unpublished report, Foreign Agricultural Service, Cairo, Egypt, 1965; other countries: United Nations (1956-1965).
6. SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

6.1 Summary
Consider the following multiple linear regression model:

    y_i = X_i β_i + ε_i ,    i = 1, 2, ..., N

where y_i is T×1, X_i is T×p, β_i is p×1, ε_i is T×1, and

    (a) ε_1, ε_2, ..., ε_N are independent and ε_i is distributed as
        N(0, σ²_i I);

    (b) β_i, σ²_i are fixed parameters and X_i is a non-stochastic full
        rank matrix.

The ordinary least squares estimator for β_i, denoted by b_i, is the best linear unbiased estimator, and

    s²_i = (1/(T-p)) (y_i - X_i b_i)'(y_i - X_i b_i)

is the minimum variance unbiased estimator for σ²_i.

By assuming the coefficient vector β, or the variance of the residuals σ², or the X matrix, or some combination of them, to be random across units but fixed within units, four variants of multiple linear regression models were constructed. When the a priori distribution is assumed known, the Bayes estimation for each model was presented.
However, when no assumption is made about the form of the a priori distribution, a two-stage Bayes estimator -- which is called the empirical Bayes estimator -- was developed for each model.

A comparison of empirical Bayes to the ordinary least squares was performed by means of a simulation study using a multiple linear regression model; and, finally, some applications of the method in econometrics were pointed out.

It was found that for all cases studied, the empirical Bayes estimators are always better than the ordinary least squares, in the sense that their corresponding average squared errors are smaller, no matter what was the true form of the unknown joint prior density of X, β and σ².

6.2 Conclusions and Recommendations
Relationships, in this case multiple linear regressions, of micro-unit variables such as firms, farms, households, regions, etc., might be considered as a random sample from a larger population. If this is the case, the independent variables and parameters can be treated as random variables across micro-units, but they are fixed, as a realization of a random variable, over the T successive observations within a micro-unit.

Empirical Bayes estimators for β and σ², for the simple linear regression, are always better than the OLS in the sense that their average squared errors are smaller. The ratio of the average squared error of EB to OLS estimators is practically not affected by different forms of the joint prior density of X, α, β and σ², given that their means and variances are the same.

More improvement for EB estimators over the OLS will be gained the smaller the prior variance of α and/or β, i.e., the closer the random regression model is to a fixed (parameter) regression model. Similarly, the higher the variability of the X matrix across micro-units, the more improvement is gained for EB over the OLS estimators. This is the case often found in practice, when we are dealing with different sizes of micro-units such as firms, farms, regions, countries, etc., where most likely the variability of the independent variables across micro-units is high. Here, EB estimation is most strongly recommended.

EB estimators are better than the OLS when the error components across, as well as within, the micro-units are independent. The robustness of EB estimators when the error components are correlated needs to be investigated further; in this case the appropriate comparison is to Zellner's two-stage least squares instead of the ordinary least squares estimators.

Different types of functions other than the one used in (3.33), or even multivariate density estimators other than that of Cacoullos, might be investigated to obtain better EB estimators than those outlined in this thesis.

Throughout this thesis, only point estimation is discussed. EB interval estimation, as well as the testing of hypotheses for random regression models, is recommended for future research.
7. REFERENCES

1.  Boot, J.C.G. and G.M. de Wit (1960). Investment demand: an empirical contribution. International Economic Review, 1:3-30.

2.  Cacoullos, T. (1966). Estimation of a multivariate density. Ann. Inst. Statist. Math. (Tokyo, Japan), 18:174-183.

3.  Capel, R.E. (1966). An analysis of the export demand for United States flue-cured tobacco. Ph.D. Thesis, Department of Economics, N.C. State University, Raleigh, N.C. Univ. Microfilms, Ann Arbor, Mich.

4.  Chetty, V.K. (1968). Pooling of time series and cross section data. Econometrica, 36:279-290.

5.  Chetty, V.K. (1971). Estimation of Solow's distributed lag models. Econometrica, 39:99-117.

6.  Clemmer, B.A. and R.G. Krutchkoff (1968). The use of empirical Bayes estimators in a linear regression model. Biometrika, 55:525-534.

7.  Graybill, F.A. (1961). An Introduction to Statistical Linear Models. Vol. 1. McGraw-Hill Book Co., Inc., New York, N.Y.

8.  Griliches, Z., G.S. Maddala, R. Lucas and N. Wallace (1962). Notes on estimated aggregate quarterly consumption functions. Econometrica, 30:491-500.

9.  Grunfeld, Y. (1958). The determinants of corporate investment. Unpublished Ph.D. Thesis, Department of Economics, Univ. of Chicago, Chicago, Ill. Univ. Microfilms, Ann Arbor, Mich.

10. Jeffreys, H. (1961). Theory of Probability. Third Edition. Clarendon Press, Oxford, England.

11. Kendall, M.G. and A. Stuart (1961). The Advanced Theory of Statistics. Vol. 2. Hafner Publishing Co., New York, N.Y.

12. Klein, L.R. (1953). A Textbook of Econometrics. Row, Peterson and Co., Evanston, Illinois.

13. Krutchkoff, R.G. (1967). A supplementary sample non-parametric empirical Bayes approach to some statistical decision problems. Biometrika, 54:451-458.

14. Kuh, E. (1959). The validity of cross-sectionally estimated behavior equations in time series applications. Econometrica, 27:197-214.

15. Maritz, J.S. (1969). Empirical Bayes estimation for the Poisson distribution. Biometrika, 56:349-354.

16. Martz, H.F. and R.G. Krutchkoff (1969). Empirical Bayes estimators in a multiple linear regression model. Biometrika, 56:367-374.

17. Nerlove, M. (1965). Estimation and Identification of Cobb-Douglas Production Functions. Rand McNally and Company, Chicago, Ill.

18. Robbins, H. (1955). An empirical Bayes approach to statistics. Proc. Third Berkeley Symposium on Math. Stat. and Prob., 1:157, Berkeley, Calif.

19. Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Annals of Math. Stat., 35:1-20.

20. Rutherford, J.R. and R.G. Krutchkoff (1969). ε-asymptotic optimality of empirical Bayes estimators. Biometrika, 56:221-223.

21. Samuel, E. (1963). An empirical Bayes approach to testing certain parametric hypotheses. Annals of Math. Stat., 34:1370-1385.

22. Swamy, P.A.V.B. (1970). Efficient inference in a random coefficient regression model. Econometrica, 38:311-323.

23. Theil, H. (1971). Principles of Econometrics. John Wiley and Sons, Inc., New York, N.Y.

24. Tiao, G.C. and A. Zellner (1965). Bayes' theorem and the use of prior knowledge in regression analysis. Biometrika, 51:219-230.

25. Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Stat. Assoc., 57:348-368.

26. Zellner, A. (1963). Estimators for seemingly unrelated regression equations: some exact finite sample results. J. Amer. Stat. Assoc., 58:977-992.

27. Zellner, A. (1966). On the aggregation problem: A new approach to a troublesome problem. Report 6628, Center for Mathematical Studies in Business and Economics, University of Chicago, Chicago, Ill.

28. Zellner, A. and M.S. Geisel (1970). Analysis of distributed lag models with applications to consumption function estimation. Econometrica, 38:865-888.

29. Zellner, A. and C.J. Park (1965). Bayesian analysis of a class of distributed lag models. Econometric Annual of the Indian Journal, 13:432-444.
8. APPENDIX
Table 8.1  Generated values of c, α, β, σ² and their OLS and EB estimates, taken from one replication, where N = 10, T = 15, for cases 1 through 9

In each case block the columns are experiments i = 1, ..., 10; rows c, α, β, σ² give the generated values, rows a, b, s² the OLS estimates, and rows α̂, β̂, σ̂²(a) the EB estimates.

Case 1: c ~ N(10,9), α ~ N(10,9), β ~ N(5,2.25), σ² ~ N(50,225)
  c      11.60    5.84    9.45   13.31    7.29    7.55    6.85    9.67    8.40   11.37
  α      14.56   18.65    9.36   12.57    7.61    8.25   12.51    9.84    7.17    6.61
  β       5.89    3.39    3.90    7.47    7.54    5.77    4.67    2.59    3.03    6.80
  σ²     68.22   69.09   46.58   33.89   36.82   48.06   58.58   36.25   51.66   26.06
  a      13.84    2.02   11.55   15.55   -1.57    9.22   20.03    0.44   18.61   10.88
  b       3.05    5.54    7.15    7.34    6.86    4.42    2.04    4.10    5.55    1.70
  s²     97.56   60.50   41.95   44.84   45.28   72.41   63.35   47.54   53.90   29.77
  α̂      12.12    3.94   12.36   15.36    2.35    8.53   14.63    1.01   16.09   10.95
  β̂       3.19    5.22    7.06    7.35    6.33    4.50    2.80    4.05    5.83    1.10
  σ̂²     77.13   55.40   40.22   41.81   43.68   63.35   60.11   48.57   51.32   30.26

Case 2: c ~ L(10,9), α ~ L(10,9), β ~ L(5,2.25), σ² ~ L(50,225)
  c      10.41   10.83   10.52    9.12    6.98   10.93   10.36   10.73    8.38    8.42
  α      13.27    7.52            11.50    5.54    9.41   12.93   10.52    9.36
  β       4.88    7.41    4.79    7.71    6.68    5.45    4.18    3.70    9.02    7.13
  σ²     58.14   69.78   55.58   36.78   33.90   59.46   43.59   60.19   61.12   62.14
  a      12.02   25.67    9.98    5.03   23.18   12.22    3.54    0.22   11.23   -3.97
  b       4.70    6.04    4.46    7.91    5.76    5.30    4.43    4.22    9.11    7.79
  s²     25.25   56.01   25.10   33.40   32.57   79.60   66.16   20.36   46.62   22.79
  α̂      11.29   24.20    9.43    4.13   22.75   12.13    4.04    1.68   12.08   -3.41
  β̂       4.76    6.17    4.53    7.99    5.79    5.30    4.38    4.06    9.04    7.76
  σ̂²     29.94   59.66   24.88   35.22   32.34   94.24   72.40   20.17   50.08   24.05

Case 3: c ~ US(10,9), α ~ US(10,9), β ~ US(5,2.25), σ² ~ US(50,225)
  c      11.85    6.54   13.52   13.76   13.47   11.65    9.27    6.50    7.19   12.79
  α       6.73    6.47   13.66    8.15    7.53    6.79   13.76   13.32    6.96    6.82
  β       6.63    3.09    6.84    3.82    6.85    3.26    3.86    3.24    3.33    6.85
  σ²     65.25   67.75   61.68   64.12   63.67   66.52   65.96   38.18   35.72   36.25
  a       4.96   14.48   12.94   -2.70   10.55    3.32   -5.15    6.28   -0.69   18.21
  b       5.63            6.07    6.81    5.29    6.88    3.17    4.93    6.93    1.31
  s²     67.61   57.53   74.44   65.37   35.47   62.48   56.08   19.74   20.64   26.09
  α̂       4.35    9.54   11.68    0.52    6.63    0.04   -5.13    6.56    1.31   17.91
  β̂       5.70    7.99    6.15    6.04    5.56    6.52    3.16    4.94    6.84    1.33
  σ̂²     75.36   61.29   83.70   72.13   33.84   68.26   60.36   19.76   20.38   26.69

Case 4: c ~ N(10,144), α ~ N(10,9), β ~ N(5,2.25), σ² ~ N(50,225)
  c       8.33   -0.55   14.24    4.07   13.79   -8.99   31.10  -22.04   24.52    9.06
  α       9.90    7.66   13.67    8.07   11.10    9.33    9.20    7.86    7.37   12.16
  β       5.04    2.92    6.01    4.13    5.25    7.14    2.84    4.97    6.42    1.83
  σ²     50.08   71.16   45.96   36.61   64.76   65.77   50.20   52.34   76.11   37.97
  a      15.23    8.49   17.68    7.49   17.68    7.51    3.51    2.09    0.10   -1.75
  b       6.06    2.64    6.73    4.99    6.03    3.19    4.25    3.66    4.02    8.13
  s²     39.69   60.28   34.60   36.66   94.68   72.25   29.84   24.62  149.70   30.23
  α̂      14.93   12.91   16.99    6.51   18.96   10.31    3.28    2.94    2.48    4.24
  β̂       5.99    3.30    6.67    4.87    6.18    3.59    4.24    3.72    3.77    7.88
  σ̂²     39.10   57.38   34.05   35.83   87.27   68.54   28.20   23.82  129.68   28.64

Case 5: c ~ N(10,9), α ~ N(10,144), β ~ N(5,36), σ² ~ N(50,225)
  c       9.74   17.83   10.55    9.92    9.64    9.99   11.50   12.69   11.67    8.74
  α      24.16   11.14   13.29    1.40   32.43   -5.75   25.03   12.27   24.34    4.23
  β      17.05   11.68   -3.66   27.93   -6.21   23.83    8.00   18.93    7.88    9.15
  σ²     74.82   70.89   59.95   29.09   60.39   22.64   25.25   44.38   41.25   62.61
  a      -0.14    1.97   10.12   18.68   13.09    3.38    1.11   -0.37    5.75    2.79
  b      12.05   25.10    4.26   11.34   14.09   17.98   19.36   11.74    8.83   10.11
  s²     41.70   63.23   63.45   21.05   52.75   37.16   19.40   37.51   18.41   74.14
  α̂      -0.80    2.17   10.24   18.21   12.78    3.17    0.96   -0.75    5.36    3.45
  β̂      12.12   20.17    8.16    9.89   11.73   14.45   15.68   11.41    9.68    9.56
  σ̂²     44.89   21.90   68.81   10.80   12.80   68.98   22.30    1.38   56.51   32.48

Case 6: c ~ N(0,9), α ~ N(10,9), β ~ N(5,2.25), σ² ~ N(50,225)
  c      -4.03    0.85    1.84   -3.08    1.35   -1.53   -3.53   -3.92   -1.80    6.46
  α       9.53   13.32    8.74   15.06    8.98    6.34   13.87   12.49   13.30   13.25
  β       1.19    4.64    7.29    6.05    3.00    6.30    6.35    5.45    3.14    5.01
  σ²     36.24   23.20   29.72   26.33   43.83   45.05   33.76   70.80   47.35   40.46
  a      -4.48   36.30   20.49   24.93   13.28   39.75   24.28   19.48   84.88    4.78
  b       1.74   -8.61    9.94    7.13   -2.64    7.77    7.45    4.94    1.65    5.11
  s²     32.03   30.98   31.54   35.86   46.46   46.81   42.16   45.00   45.55   39.66
  α̂      -0.58    2.18   10.29   18.20   12.77    3.05    0.96   -0.82    5.36    3.38
  β̂       1.77   -3.00    7.87    6.66   -0.95    5.54    6.45    4.86    2.11    5.19
  σ̂²     29.32   29.05   20.51   34.28   56.57   59.20   44.25   52.31   53.13   41.97

Case 7: c ~ N(-10,9), α ~ N(10,9), β ~ N(5,2.25), σ² ~ N(50,225)
  c      -9.05  -13.83  -14.76  -10.85   -8.02  -14.29  -13.03   -7.25  -11.96   -9.65
  α      11.36    9.35   11.33    5.89   12.63    3.86    4.68   11.32    6.81   13.96
  β       2.58    5.65    3.10    1.90    4.73    6.34    2.82    1.63    5.54    6.39
  σ²     66.06   48.53   32.03   47.58   43.10   55.06   62.45   32.87   40.87   38.37
  a      -1.26    5.58    9.32   18.22   13.80    0.98    5.32   21.94    8.14    8.77
  b       1.76    5.43    3.02    3.30    5.04    6.01    2.81    2.76    5.71    6.10
  s²     16.00   47.16   17.07   52.80   27.98   54.48   81.42   35.20   26.77   43.97
  α̂      -1.21    5.52    9.04   16.52   12.16   -0.07    6.39   21.01    6.72    5.95
  β̂       1.77    5.43    3.00    3.15    4.85    5.93    2.89    2.65    5.60    5.82
  σ̂²     15.55   43.59   17.95   49.42   27.33   49.18   69.15   34.19   26.64   40.86

Case 8: c ~ N(10,9), α ~ N(0,9), β ~ N(0,2.25), σ² ~ N(50,225)
  c      10.95    6.17    5.24    9.16   11.98    5.71    6.96   12.75    8.04   10.35
  α       1.36   -0.65    1.33   -4.10    2.63   -6.14   -5.32    1.32   -3.19    3.96
  β      -2.42    0.65   -1.89   -3.09   -0.27    1.34   -2.18   -3.37    0.54    1.39
  σ²     66.06   48.53   32.03   47.58   43.10   55.06   62.45   32.87   40.87   38.37
  a     -11.26   -4.42   -0.68    8.22    3.80   -9.02   -4.68   11.94   -1.86   -1.22
  b      -1.74    1.16   -1.65   -4.74   -0.48    2.16   -2.18   -4.01    0.29    1.67
  s²     16.00   47.16   17.07   52.81   27.98   54.48   81.42   32.50   26.77   43.97
  α̂     -10.25   -3.69   -2.45    6.21    2.45   -6.06   -6.05   12.16   -2.58   -1.04
  β̂      -1.83    1.04   -1.32   -4.53   -0.38    1.65   -1.98   -4.04    0.37    1.64
  σ̂²     15.84   44.00   17.28   48.56   26.38   49.49   68.00   33.90   26.15   41.31

Case 9: c ~ N(10,9), α = 10, β = 5, σ² ~ N(50,225)
  c      16.25   12.14   17.47    9.58   10.02    0.81   12.77   13.61   13.15   13.35
  α      10.0    10.0    10.0    10.0    10.0    10.0    10.0    10.0    10.0    10.0
  β       5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
  σ²     68.49   36.95   20.74   45.03   56.02   55.27   36.01   51.85   42.52   46.88
  a      17.72    3.52    7.29   12.35   12.30   10.65    8.76    8.53    9.23    3.19
  b       4.48    5.32    5.26    5.12    4.65    5.28    4.84    4.92    5.06    5.48
  s²     62.84   46.13   23.10   38.83   26.40   84.10   36.47   27.82   64.77   54.01
  α̂      13.56    5.86    8.28   11.74   10.39   10.43    8.12    8.20    9.24    6.08
  β̂       4.73    5.14    5.20    5.18    4.83    5.46    4.88    4.94    5.05    5.26
  σ̂²     55.42   43.61   23.52   37.95   26.81   69.60   35.66   27.97   56.60   49.41

(a) This is a corrected EB estimate using C.F. = .80, which is slightly biased downward.