Semiparametric Estimation and Inference in Multinomial Choice Models
By
Rafic Fahs, Scott Cardell, and Ron Mittelhammer (1)
2001 AAEA Annual Conference
Abstract
The purpose of this paper is to provide semiparametric alternatives to maximum likelihood estimation and inference in the context of unordered multinomial response data, where in practice there is often insufficient information to specify the parametric form of the function linking the observables to the unknown probabilities. We specify the function linking the observables to the unknown probabilities using a very general and flexible class of functions belonging to the Pearson system of cumulative distribution functions (CDFs). In this setting we consider the observations as arising from a multinomial distribution characterized by one of the CDFs in the Pearson system. Given this situation, it is possible to utilize the concept of unbiased estimating functions (EFs), combined with the concept of empirical likelihood (EL), to define an (empirical) likelihood function for the parameter vector based on a nonparametric representation of the sample's PDF. This leads to the concept of maximum empirical likelihood (MEL) estimation and inference, which is analogous to parametric maximum likelihood methods in many respects.
(1) Rafic Fahs is a Ph.D. candidate in Agricultural Economics at Washington State University. Scott Cardell is Adjunct Professor in Agricultural Economics at Washington State University, and Ron Mittelhammer is Professor of Agricultural Economics and Statistics.
Copyright 2001 by Rafic Fahs, Scott Cardell, and Ron Mittelhammer. All rights reserved. Readers may
make verbatim copies of this document for non-commercial purposes by any means, provided that this
copyright notice appears on all such copies.
1. Introduction
Consider the unordered multinomial response model where outcomes are given in the form of an experiment consisting of N trials on J-dimensional multinomial random variables (y11, ..., y1J), ..., (yN1, ..., yNJ). The variable yij, for i = 1, 2, ..., N, exhibits a binary outcome, where yij = 1 is observed iff the jth choice among the J unordered alternatives, j = 1, 2, ..., J, is observed on the ith trial, in which case yik = 0 ∀ k ≠ j. It is assumed that the choice situation is such that the J alternatives are mutually exclusive, so that only one of the alternatives can be chosen on trial i, and that the J alternatives exhaust the choice possibilities, which then implies that

  Σ_{j=1}^{J} yij = 1,  ∀ i.
The probability that yij = 1, denoted by Pij, is then related to a set of K explanatory variables xi. through a link function

  Pij(xi.) = P(yij = 1 | xi., β) = Gj(xi., β)     (1.1)

for i = 1, 2, ..., N and j = 1, 2, ..., J, where βj is a (K × 1) vector of unknown parameters, β = vec([β1, β2, ..., βJ]) is a column-vectorized representation of the model parameters of dimension (KJ × 1), xi. = (xi1, xi2, ..., xiK) is a (1 × K) row vector of explanatory variable values, and Gj(·): R → [0, 1] may be either known or unknown, with the additional constraint that

  Σ_{j=1}^{J} Gj(xi., β) = 1,  ∀ i.     (1.2)
The explanatory variables, xi., could be allowed to change by choice alternative, but we
focus on a basic case where they do not.
Define the noise term εij as

  εij ≡ yij − E[yij | xi.] = yij − Gj(xi., β)

where E[yij | xi.] = Gj(xi., β) because of the Bernoulli marginal distribution of the yij variable. The data sampling process relating to observed choices can then be represented as

  yij = Pij(xi.) + εij = Gj(xi., β) + εij,     (1.3)

where the εij's are assumed to be independent across observations i = 1, 2, ..., N, the εij's are bounded on [−1, 1], and E[εij | xi.] = 0.
When the parametric functional form of Gj(xi., β) is known, maximum likelihood (ML) estimation is possible, and a specific functional choice has often been

  Gj(xi., β) = 1 / (1 + Σ_{k=2}^{J} e^{xi.βk})                for j = 1
             = e^{xi.βj} / (1 + Σ_{k=2}^{J} e^{xi.βk})          for j = 2, 3, ..., J     (1.4)
where β1 has been normalized, without loss of generality, to a zero vector for purposes of
parameter identification. The definition in (1.4) satisfies the required properties in (1.2)
and defines the Multinomial Logit (ML) response model applied frequently in practice.
Similarly, a Multinomial Probit (MNP) model results when a multivariate normal
distribution is used in specifying the distribution of the noise component of (1.3).
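To make the link function in (1.4) concrete, the following small sketch (a Python/NumPy illustration of our own; the function name and the example values are not taken from the paper) evaluates the J choice probabilities for a single observation, with β1 normalized to the zero vector:

import numpy as np

def mnl_probs(x_i, betas):
    # Multinomial logit link of eq. (1.4).
    # x_i   : (K,) row vector of explanatory variables
    # betas : (J-1, K) coefficients for alternatives 2..J (beta_1 is the zero vector)
    z = betas @ x_i                          # linear indices x_i.beta_j, j = 2..J
    denom = 1.0 + np.exp(z).sum()
    p_rest = np.exp(z) / denom               # P(y_ij = 1), j = 2..J
    p_first = 1.0 / denom                    # P(y_i1 = 1)
    return np.concatenate(([p_first], p_rest))

# illustrative values: K = 3 covariates, J = 4 alternatives
x_i = np.array([1.0, -0.5, 2.0])
betas = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6],
                  [0.7, 0.8, 0.9]])
p = mnl_probs(x_i, betas)
print(p, p.sum())                            # the probabilities add to one, as (1.2) requires

The same structure carries over when the exponential kernel is replaced by a more flexible function, which is the direction taken below.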
However, the probit model is computationally difficult when J is large, and in practice it is tractable only when the number of alternative choices is restricted to three or less. If the distribution underlying the likelihood specification is in fact the correct parametric family of distributions underlying the sampling process, then the estimator is generally unique, consistent, and asymptotically normal. However, the economic theories that motivate these models rarely justify any particular probability distribution for the noise term.
Attempts to introduce flexibility into the specification of G(·) have been problematic in applications. Supposing G(·) is a polynomial of order d, some trigonometric function, or some other flexible functional form, the flexibility added to the estimation problem may introduce unbounded likelihood functions on parameter space boundaries, multiple local maxima, and/or non-concavities that make numerical maximization of the log likelihood function difficult or impossible.
Semiparametric methods of estimation provide an alternative approach. For example, Ichimura (1993) demonstrates a least squares estimator of β which requires εij to be independent of xi., ruling out endogeneity and/or measurement error. Klein and Spady (1993) developed a quasi-maximum likelihood estimator for the case in which Yij is binary. These estimators are consistent and asymptotically normal under regularity conditions. But these estimators share the disadvantage of being difficult to compute because they involve nonlinear optimization problems whose objective functions are not necessarily concave or unimodal. Using an information theoretic formulation, Golan, Judge, and Perloff (1996) demonstrate a semiparametric estimator for the traditional multinomial response problem that has asymptotic properties in line with parametric counterparts. But a potential drawback to their method is that the Pij's have no direct parametric functional link to the xi.'s, which makes prediction difficult or impossible.
To cope with the preceding modeling issues, one can hedge against misspecification when the form of the link function G(·) is unknown by giving G(·) a flexible form that satisfies (1.2) and that defines a legitimate multinomial response model globally. One might model each dichotomous decision outcome as a Bernoulli process (marginally) and then model the whole vector outcome, yi., as a multinomial process. One possibility for parameterizing the probabilities in these processes is the set of CDFs belonging to the highly flexible Pearson system of distributions, which themselves satisfy (1.2). The criteria for identifying different members of the Pearson system of functions can be expressed parametrically in terms of a (2 × 1) vector ξ of unknown parameters. When G(·) is unknown, the sampling process would then be modeled as

  yij = Pij(xi.) + εij = Gj(xi., β, ξ) + εij.     (1.5)
The overall objective of this paper is to seek a semiparametric basis for
recovering β in (1.5) by utilizing the concept of unbiased estimating functions (EFs)
combined with the concept of empirical likelihood (EL) to define an empirical likelihood
function for the parameter vector. This leads to the concept of maximum empirical
likelihood estimation of the multinomial choice model. The EL shares the sampling
properties of various nonparametric methods based on resampling of the data, such as the
bootstrap. However, in contrast to the resampling methods, EL works by optimizing a
continuous concave and differentiable function having a unique global maximum, which
makes it possible to impose side constraints on the parameters that add information to the
data.
The paper is organized as follows. Section 2 reviews the Pearson system of density and cumulative distribution functions. In Section 3, we state the extremum estimation problem and investigate the asymptotic properties of the estimator. In Section 4, we examine EL estimation in the multinomial choice problem. In Section 5, we present some Monte Carlo results relating to the sampling properties of the estimator. Concluding remarks summarizing the major implications of the paper are given in Section 6.
2. Pearson’s System of Frequency-Curves
The probability distributions contained in the system of curves proposed by Karl Pearson are found as the solutions of the differential equation

  (1/y)(dy/dx) = (a − x)/(b0 + b1x + b2x²)     (2.1)

where y in this context is the probability density function (PDF) evaluated at x. Pearson's (1895) original derivation of the distributions from (2.1) is involved, as well as difficult to access, and so we provide a brief and more direct overview of the portion of the derivation that is particularly relevant to the objectives of this paper.
2.1 Identifying Probability Distributions in the Family
Multiplying each side of (2.1) by y x^n (b0 + b1x + b2x²), and then integrating with respect to x, obtains

  ∫ x^n (b0 + b1x + b2x²) (dy/dx) dx = ∫ y (a − x) x^n dx.     (2.2)

Integrating the left-hand side of (2.2) by parts, treating dy/dx as one part, and representing the right-hand integral as the sum of two integrals, yields

  x^n (b0 + b1x + b2x²) y − ∫ (n b0 x^{n−1} + (n+1) b1 x^n + (n+2) b2 x^{n+1}) y dx = ∫ a y x^n dx − ∫ y x^{n+1} dx.

If at the ends of the range of the curve the expression x^n (b0 + b1x + b2x²) y vanishes, that is, [x^n (b0 + b1x + b2x²) y]_{x=r1}^{r2} = 0, where r1 and r2 are the extremes of the range of variation for x, we have

  −n b0 μ′_{n−1} − (n+1) b1 μ′_n − (n+2) b2 μ′_{n+1} = a μ′_n − μ′_{n+1}     (2.3)

where μ′_n denotes the nth moment of x about the origin.
Examining the moment equation (2.3) for n = 0, 1, 2, 3, ..., q respectively, we get q+1 equations to solve for a, b0, b1, b2 in terms of the moments μ′_r, r = 0, 1, 2, 3, ..., q. The solution of these simultaneous equations results in the following representation of (2.1):

  (1/y)(dy/dx) = [ μ′3(μ′4 + 3μ′2²)/(10μ′2μ′4 − 18μ′2³ − 12μ′3²) − x ]
                 / [ μ′2(4μ′2μ′4 − 3μ′3²)/(10μ′2μ′4 − 18μ′2³ − 12μ′3²)
                     + μ′3(μ′4 + 3μ′2²)/(10μ′2μ′4 − 18μ′2³ − 12μ′3²) x
                     + (2μ′2μ′4 − 3μ′3² − 6μ′2³)/(10μ′2μ′4 − 18μ′2³ − 12μ′3²) x² ].
Defining ψ3² = β1 = μ′3²/μ′2³, ψ4 = β2 = μ′4/μ′2², and ω = (2ψ4 − 3ψ3² − 6)/(ψ4 + 3),
we are led to the following expressions for the parameters a, b0, b1, b2 contained in (2.1):

  a = −ψ3/(2(1 + 2ω)),   b0 = (ω + 2)/(2(1 + 2ω)),   b1 = ψ3/(2(1 + 2ω)),   b2 = ω/(2(1 + 2ω)),

which are valid when −2 < ω < 2 and ω ≠ −0.5 (please see the appendix for details of the solution).
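Since a, b0, b1, and b2 are simple functions of (ψ3, ω), they are straightforward to evaluate numerically. The following sketch (our own helper, written in Python purely for illustration) evaluates the four expressions and enforces the stated restrictions:

def pearson_coefficients(psi3, omega):
    # Coefficients a, b0, b1, b2 of the Pearson differential equation (2.1),
    # expressed in terms of psi3 and omega; valid for -2 < omega < 2, omega != -0.5.
    if not (-2.0 < omega < 2.0) or abs(omega + 0.5) < 1e-12:
        raise ValueError("require -2 < omega < 2 and omega != -0.5")
    denom = 2.0 * (1.0 + 2.0 * omega)
    a = -psi3 / denom
    b0 = (omega + 2.0) / denom
    b1 = psi3 / denom
    b2 = omega / denom
    return a, b0, b1, b2

print(pearson_coefficients(psi3=0.5, omega=-0.3))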
Turning now to the integration of (2.1) and the various forms of f(x) that arise, it is useful to note that:
1. f(x) ≥ 0 over the range of values of x when the parameters adhere to the restrictions presented above.
2. ∫_{−∞}^{+∞} f(x) dx = 1 must be satisfied.
3. It is sufficient to consider ψ3 ≥ 0, since the curve identified by ψ3 = −c is a mirror reflection of the curve for ψ3 = c with respect to the y-axis.
In general, there are three main types of Pearson curves and ten transition types. The various types are designed to handle limited or unlimited supports, as well as skewed or symmetric, bell-, U-, or J-shaped curves. The criteria identifying each type are expressed in terms of the parameters ψ3 and ω. The mathematical derivation of the Pearson system of PDFs is given in the book Frequency Curves and Correlation by W. P. Elderton. But the cumulative distributions of the Pearson system are not readily available, and they are part of the contribution of this paper. The actual forms of the Pearson curves and their CDFs, restrictions on parameter ranges, and distribution supports are displayed in Table 2.1.
Note that in Table 2.1 we make use of the following variables:

  r1 = (−ψ3 + d)/(2ω),   r2 = (−ψ3 − d)/(2ω),   d = √(ψ3² − 4ω(ω + 2)),

  m1 = (1 + ω)ψ3/(ω d) − (1 + 2ω)/ω,   m2 = −(1 + ω)ψ3/(ω d) − (1 + 2ω)/ω,

  z = xi.βj.

The remaining constants appearing in Table 2.1 (t, N, M, m, s, s1, r, v) follow from the solution of (2.1) for the corresponding type; derivations of all tabled results are given in the appendix.
Table 2.1. Frequency Curves and CDFs

MAIN TYPES

Type I.
  PDF: f(x) = (x − r1)^{m1} (r2 − x)^{m2} / [β(m1+1, m2+1)(r2 − r1)^{m1+m2+1}],   r1 < x < r2.
  CDF: F(z) = P(x ≤ z) = [1/β(m1+1, m2+1)] ∫_0^{(z−r1)/(r2−r1)} v^{m1} (1 − v)^{m2} dv
            = IncompleteBeta((z − r1)/(r2 − r1); m1+1, m2+1).
  Criterion: ψ3 ≠ 0, −1 < ω < 0 [ω ≠ −0.5, (2 + 3ω)ψ3² ≠ 4(1 + 2ω)²(2 + ω)].
  Remarks: the roots r1, r2 are real and of opposite signs; limited range; general form of the beta distribution; skew; bell-shaped when both m's are positive, U-shaped when both m's are negative, J-shaped or twisted J-shaped when the m's are of opposite signs.

Type IV.
  PDF: f(x) ∝ s^{2m−1} [(x + r)² + s²]^{−m} [(x + r − is)/(x + r + is)]^{vi/2}, with a normalizing constant involving Γ(2m − 2) and v.
  CDF: not available in closed form (see the appendix).
  Criterion: ψ3 ≠ 0, ω > 0, ψ3² < 4ω(2 + ω).
  Remarks: unlimited range; skew; bell-shaped; no common statistical distributions are of the Type IV form.

Type VI.
  PDF: f(x) = (x − r1)^{m1} (x − r2)^{m2} / [β(m1+1, −m1−m2−1)(r1 − r2)^{m1+m2+1}],   r1 < x < +∞.
  CDF: F(z) = P(x ≤ z) = 1 − [1/β(m1+1, −m1−m2−1)] ∫_0^{(r1−r2)/(z−r2)} v^{−m1−m2−2} (1 − v)^{m1} dv.
  Criterion: ψ3 ≠ 0, 0 < ω < 2/5, ψ3² > 4ω(ω + 2), (2 + 3ω)ψ3² ≠ 4(1 + 2ω)²(2 + ω).
  Remarks: the roots are real and of the same sign; unlimited range in one direction; skew; bell-shaped if m1 > 0, J-shaped if m1 < 0; this is the beta prime distribution.

TRANSITION TYPES

Normal.
  PDF: f(x) = (1/√(2π)) e^{−x²/2},   −∞ < x < +∞.
  CDF: F(z) = P(x ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−x²/2} dx.
  Criterion: ψ3 = ω = 0.
  Remarks: unlimited range; symmetrical; bell-shaped.

Type II.
  PDF: f(x) = (t² − x²)^M / [t^{2M+1} β(M+1, 0.5)],   −t < x < t.
  CDF: for z ≥ 0, F(z) = P(x ≤ z) = 1 − 0.5 IncompleteBeta(1 − z²/t²; M+1, 0.5);
       for z < 0, F(z) = P(x ≤ z) = 0.5 IncompleteBeta(1 − z²/t²; M+1, 0.5).
  Criterion: ψ3 = 0, −1 < ω < 0, ω ≠ −0.5.
  Remarks: limited range; a special case of Type I; U-shaped for −1 < ω < −0.5, bell-shaped for −0.5 < ω < 0; symmetrical.

Type III.
  PDF: f(x) = e^{−N²} N^{N²} e^{−Nx} (N + x)^{N²−1} / Γ(N²),   −N < x < +∞.
  CDF: F(z) = P(x ≤ z) = [1/Γ(N²)] ∫_0^{N(N+z)} e^{−u} u^{N²−1} du = incomplete gamma(N(N + z); N²).
  Criterion: ψ3 > 0, ω = 0.
  Remarks: unlimited range in one direction; bell-shaped for N² > 1, J-shaped for N² < 1; this is the gamma distribution.

Type V.
  PDF: f(x) = [2r(m−1)]^{2m−1} / Γ(2m−1) · (x + r)^{−2m} e^{−2r(m−1)/(x+r)},   −r < x < +∞.
  CDF: F(z) = P(x ≤ z) = 1 − [1/Γ(2m−1)] ∫_0^{2r(m−1)/(z+r)} w^{2m−2} e^{−w} dw
            = 1 − incomplete gamma(2r(m−1)/(z + r); 2m−1).
  Criterion: ψ3 > 0, 0 < ω < 2/5, ψ3² = 4ω(2 + ω).
  Remarks: unlimited range in one direction; bell-shaped; it is an inverse Gaussian form.

Type VII.
  PDF: f(x) = s1^{2m−1} (x² + s1²)^{−m} / β(m − 0.5, 0.5),   −∞ < x < +∞.
  CDF: for z ≥ 0, F(z) = P(x ≤ z) = 1 − 0.5 IncompleteBeta(s1²/(s1² + z²); m − 0.5, 0.5);
       for z < 0, F(z) = P(x ≤ z) = 0.5 IncompleteBeta(s1²/(s1² + z²); m − 0.5, 0.5).
  Criterion: ψ3 = 0, ω > 0.
  Remarks: unlimited range; symmetrical; bell-shaped; a special case of Type IV; an important distribution belonging to this family is the (central) t-distribution.

Type VIII.
  PDF: f(x) = (1 − 2m)(x − r1)^{−2m} / (r2 − r1)^{1−2m},   r1 < x < r2, with 1 − 2m > 0.
  CDF: F(z) = P(x ≤ z) = (z − r1)^{1−2m} / (r2 − r1)^{1−2m}.
  Criterion: ψ3 > 0, ω < −0.5, (2 + 3ω)ψ3² = 4(1 + 2ω)²(2 + ω).
  Remarks: limited range; J-shaped; a special case of Type I.

Type IX.
  PDF: f(x) = (1 − 2m)(r2 − x)^{−2m} / (r2 − r1)^{1−2m},   r1 < x < r2.
  CDF: F(z) = P(x ≤ z) = 1 − (r2 − z)^{1−2m} / (r2 − r1)^{1−2m}.
  Criterion: ψ3 > 0, −0.5 < ω < 0, (2 + 3ω)ψ3² = 4(1 + 2ω)²(2 + ω).
  Remarks: limited range; J-shaped.

Type X.
  PDF: f(x) = e^{−(x+1)},   −1 < x < +∞.
  CDF: F(z) = 1 − e^{−(z+1)}.
  Criterion: ψ3² = 4, ω = 0.
  Remarks: limited range in one direction; J-shaped; a special case of Type III.

Type XI.
  PDF: f(x) = (2m − 1)(x − r2)^{−2m} / (r1 − r2)^{1−2m},   r1 < x < +∞.
  CDF: F(z) = P(x ≤ z) = 1 − (z − r2)^{1−2m} / (r1 − r2)^{1−2m}.
  Criterion: ψ3 > 0, 0 < ω < 2/5, (2 + 3ω)ψ3² = 4(1 + 2ω)²(2 + ω).
  Remarks: unlimited range in one direction; J-shaped.

Type XII.
  PDF: f(x) = [(r2 − x)/(x − r1)]^{m2} / [β(m2+1, −m2+1)(r2 − r1)],   r1 < x < r2.
  CDF: see the appendix.
  Criterion: ψ3 ≥ 0, ω = 0.
  Remarks: limited range; twisted J-shaped; a special case of Type I.

Calculations of the constants and CDFs are detailed in the appendix.
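For the three main types, the selection criteria in Table 2.1 amount to a sign check on ω together with a comparison of ψ3² against 4ω(ω + 2). A minimal sketch of that selection logic follows (the helper name and the handling of boundary cases are our own simplifications):

def pearson_main_type(psi3, omega, tol=1e-10):
    # Classify the three main Pearson types from (psi3, omega) per Table 2.1.
    # Boundary cases (psi3 = 0, omega = 0, discriminant = 0, etc.) correspond to
    # the transition types and are not resolved further here.
    disc = psi3**2 - 4.0 * omega * (omega + 2.0)   # sign determines real vs. complex roots
    if abs(psi3) < tol and abs(omega) < tol:
        return "Normal (transition type)"
    if -1.0 < omega < 0.0 and abs(psi3) > tol:
        return "Type I"     # real roots of opposite sign; limited range
    if omega > 0.0 and disc < -tol:
        return "Type IV"    # complex roots; unlimited range
    if 0.0 < omega < 0.4 and disc > tol:
        return "Type VI"    # real roots of the same sign
    return "transition or boundary type (see Table 2.1)"

for psi3, omega in [(0.5, -0.3), (0.3, 0.2), (1.5, 0.1), (0.0, 0.0)]:
    print((psi3, omega), pearson_main_type(psi3, omega))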
2.2. Using Pearson Family Distributions to Model Multinomial Choice
Regarding the use of the Pearson system for specifying models of multinomial choice, first recall that the choice probabilities Pij = Gj(xi., β) ∈ [0, 1], j = 1, ..., J, must adhere to the adding-up condition (1.2), as Σ_{j=1}^{J} Pij = Σ_{j=1}^{J} Gj(xi., β) = 1. In principle, any nonnegative-valued function ϑj(xi., β) can be used to define a legitimate specification of G(·) using the specification

  Pij = Gj(xi., β) ≡ ϑj(xi., β) / Σ_{j=1}^{J} ϑj(xi., β).     (2.4)
For example, the special case of the multinomial logit model follows upon setting ϑj(xi., β) ≡ e^{xi.βj}. An alternative would be to let ϑj(xi., β) = Fj(xi., β), with Fj(·) being a member of the flexible family of curves contained in the Pearson system of CDFs. Letting ξ = (ψ3, ω)′ denote the unknown parameters of the Pearson system, the choice probability specification would then be

  Pij = Gj(xi., β, ξ) ≡ Fj(xi., β, ξ) / Σ_{j=1}^{J} Fj(xi., β, ξ).     (2.5)
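A small sketch of the normalization in (2.5), using the Pearson Type X CDF from Table 2.1 as Fj(·) (this particular member and the helper names are our own choices for illustration; any CDF in the system could be substituted):

import numpy as np

def pearson_x_cdf(z):
    # Pearson Type X CDF from Table 2.1: F(z) = 1 - exp(-(z + 1)) for z > -1, 0 otherwise
    z = np.asarray(z, dtype=float)
    return np.where(z > -1.0, 1.0 - np.exp(-(z + 1.0)), 0.0)

def choice_probs(x_i, betas, cdf=pearson_x_cdf):
    # Choice probabilities of eq. (2.5): P_ij = F(x_i.beta_j) / sum_k F(x_i.beta_k).
    # betas is (J, K); a zero row for alternative 1 imposes the identification normalization.
    f = cdf(betas @ x_i)            # (J,) nonnegative values
    return f / f.sum()              # assumes at least one F(x_i.beta_j) > 0

x_i = np.array([1.0, -0.5, 2.0])
betas = np.vstack([np.zeros(3), [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]])
print(choice_probs(x_i, betas))     # sums to one by construction, as (1.2) requires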
3. Empirical Likelihood Problem Formulation and Solution
Consider the statistical model defined by (1.3), yij = Gj(xi., β) + εij. We will later assume that the functional form of G(·) is derived from (2.5) and one of the CDFs in the Pearson system, and note that in any case E[εij | xi.] = 0. In order to recover information relating to Pij and β, we examine the general concept of empirical likelihood in the case of iid data. In this estimation context, the joint empirical probability mass function is of the form Π_{i=1}^{n} δi, where the δi, for i = 1, ..., n, represent empirical probability or sample weights that are associated, respectively, with the n random sample vector outcomes of the multinomial choice process, (yi1, ..., yiJ), i = 1, ..., n. To define the value of the empirical likelihood function for θ, where θ ≡ vec([β1, β2, ..., βJ]) is a column-vectorized representation of the model parameters with dimension ((KJ) × 1), the δi's are selected to maximize Π_{i=1}^{n} δi, subject to data-based constraints defined in terms of moment equations

  E[h(Yi., xi., θ)] = E[(h1(Yi1, xi., θ), h2(Yi2, xi., θ), ..., hJ(YiJ, xi., θ))]′ = [0].

In the context of estimating equation parlance, h(Yi., xi., θ) is a vector of unbiased estimating functions relating to the population random vector Y. For now we are considering general forms of the moment equations, but later we will consider different conceptual formulations of the moment equations which are more specific to the multinomial choice model. An empirical representation of the moment condition based on empirical likelihood sample weights is given by

  Eδ[h(Y, x, θ)] = Σ_{i=1}^{n} δi h(yi., xi., θ) = 0,

where θ is as defined above, Y is interpreted to have the empirical distribution δi = P(Y = yi.), i = 1, ..., n, and Eδ[·] denotes an expectation taken with respect to the empirical probability distribution {δ1, δ2, ..., δn}.
The empirical likelihood maximization step chooses the joint empirical probability distribution Π_{i=1}^{n} δi for Y that assigns the maximum possible probability to the sample outcomes y actually observed, subject to constraints provided by the empirical moment equations. The constraints serve to introduce θ into the estimation problem. The empirical likelihood function for θ can be defined by maximizing the empirical likelihood conditional on the value of θ, and then substituting the constrained maximum value of each δi, say δ̂i(θ), into Π_{i=1}^{n} δi, yielding a function of θ as L_EL(θ; y) = Π_{i=1}^{n} δ̂i(θ). At this point, the empirical likelihood function operates like an ordinary parametric likelihood function for estimation and inference purposes, as long as the estimating functions are unbiased and thus have zero expectations, have finite variances, and are based on independent or weakly dependent data observations. In particular, maximizing L_EL(θ; y) through choice of θ defines the maximum empirical likelihood (MEL) estimator of the parameter vector θ.
In the sections ahead we provide more details relating to the EL procedure in the iid case, which will further motivate the general concepts involved and will also serve to define additional notation. In particular, we provide details regarding how one utilizes the EL concept to perform maximum empirical likelihood (MEL) estimation of parameters for the statistical model yij = Gj(xi., β) + εij in (1.5). We also discuss how to test hypotheses and generate confidence regions and bounds based on the EL function, including the use of the generalized empirical likelihood ratio (GELR) for inference purposes. Finally, we extend the EL principle to the case where the data are independent but not identically distributed. We then specialize the formulation to provide estimates of the parameters of the multinomial choice model in section 4.
3.1. Nonparametric Likelihood Functions
Consider the inverse problem of using a random sample outcome y = (y11, ..., y1J, ..., yn1, ..., ynJ)′ = (y1., ..., yn.)′ to recover an estimate of the PDF of Y. In this nonparametric setting, a nonparametric likelihood function can be defined whose arguments are not parameters but entire probability densities or mass functions, as

  L(f; y) = Π_{i=1}^{n} f(yi.).     (3.1.1)

The nonparametric maximum likelihood (NPML) estimate of f(yi.) is defined by

  f̂(y) = arg max_f [L(f; y)] = arg max_f [Π_{i=1}^{n} f(yi.)].     (3.1.2)

The solution to (3.1.2) defines an empirical probability mass function of the multinomial type that represents discrete probability masses assigned to each of the finite number of observed sample outcomes, where δi = f(yi.) > 0 ∀ i.
The preceding maximum likelihood problem (3.1.1) and (3.1.2) can be represented as a nonparametric maximum likelihood problem of finding the optimal choice of the δi's in a multinomial-based likelihood function, as

  δ̂ = (δ̂1, δ̂2, ..., δ̂n)′ = arg max_δ [Π_{i=1}^{n} δi] = arg max_δ [Σ_{i=1}^{n} ln(δi)].     (3.1.3)

If the δi's are unrestricted in value, (3.1.3) will have no solution since the objective function would be unbounded, and so a normalization condition on the δi's is imposed. In the case at hand, the constraint Σ_{i=1}^{n} δi = 1 is a natural normalization condition on the δi's, along with nonnegativity.
3.2 Empirical Likelihood Function for θ
The likelihood (3.1.1) is devoid of the parameter vector θ and so it cannot be used to distinguish likely from unlikely values of a parameter vector θ. Linkage between the data, y = (y11, ..., y1J, ..., yn1, ..., ynJ)′ = (y1., ..., yn.)′, the population distribution F(y), and the parameter of interest, θ, is accomplished through the use of unbiased estimating functions to define estimating equation constraints on the NPML problem. Information about θ is conveyed by the estimating function in expectation or moment form, E[h(Y, x, θ)] = 0, which defines constraints on the NPML problem that generates the empirical likelihood function. Given that the expectation is unknown because F(y) is unknown, an estimated empirical probability distribution is applied to observed sample outcomes of h(Y, x, θ) to define an empirical expectation Σ_{i=1}^{n} δi h(yi., xi., θ) = 0 that approximates E[h(Y, x, θ)] = 0 and that can be used in forming an empirical moment equation. The system of m equations h_EL(Y, x, θ) ≡ Σ_{i=1}^{n} δi h(yi., xi., θ) = 0, when viewed in the context of estimating equations for θ, is generally under-determined, just-determined, or over-determined for identifying a ((KJ) × 1) vector θ, depending on whether m <, =, or > KJ, respectively. The choice of the unknown δi's is solved by maximizing the empirical likelihood objective function, and in the process, the estimating equations are reconciled to yield a solution for θ (assuming a feasible solution exists).
The log-empirical likelihood function for θ is defined as

  ln[L_EL(θ; y)] ≡ max_δ { Σ_{i=1}^{n} ln(δi)  s.t.  Σ_{i=1}^{n} δi h(yi., xi., θ) = 0  and  Σ_{i=1}^{n} δi = 1 }.     (3.2.1)

Imposing both the normalization condition on the δi's and the empirical moment constraints, the solution to the problem of finding the NPML estimate of ln(f(y)) is thus defined in terms of the choice of nonnegative δi's that maximize Σ_{i=1}^{n} ln(δi) subject to the constraints Σ_{i=1}^{n} δi = 1 and Σ_{i=1}^{n} δi h(yi., xi., θ) = 0. The Lagrange function associated with the constrained optimization problem is given by

  L(δ, η, λ) ≡ n^{−1} Σ_{i=1}^{n} ln(δi) − η (Σ_{i=1}^{n} δi − 1) − λ′ Σ_{i=1}^{n} δi h(yi., xi., θ).     (3.2.2)
Solving for the optimal δ, η, and λ in the Lagrange form of the problem (3.2.2), and then substituting the optimal values for δ into the objective function of the maximization problem in (3.2.2), a specific functional form for the EL function in terms of θ can be defined. In particular, first note that the first-order conditions with respect to the δi's are

  ∂L(δ, η, λ)/∂δi = (1/n)(1/δi) − Σ_{j=1}^{J} λj hj(yi., xi., θ) − η = 0,  ∀ i.     (3.2.3)

Also, from the equality Σ_{i=1}^{n} δi ∂L(δ, η, λ)/∂δi = 0 and Eδ[h(Y, x, θ)] = Σ_{i=1}^{n} δi h(yi., xi., θ) = 0 it follows that

  Σ_{i=1}^{n} δi ∂L(δ, η, λ)/∂δi = n (1/n) − η = 0,     (3.2.4)
and thus η = 1. The resulting unique optimal δi weights implied by (3.2.3) can then be expressed as the following function of θ and λ:

  δi(θ, λ) = { n [ Σ_{j=1}^{J} λj hj(yi., xi., θ) + 1 ] }^{−1}.     (3.2.5)
Substituting (3.2.5) into the empirical moment equations Σ_{i=1}^{n} δi h(yi., xi., θ) = 0 produces a system of equations that λ must satisfy, as follows:

  Σ_{i=1}^{n} δi h(yi., xi., θ) = n^{−1} Σ_{i=1}^{n} [ Σ_{j=1}^{J} λj hj(yi., xi., θ) + 1 ]^{−1} h(yi., xi., θ) = 0.     (3.2.6)
Under regularity conditions, Qin and Lawless (1994, pp. 304-5) show that a well-defined solution for λ in (3.2.6) exists. However, the solution λ(θ) is only an implicit function of θ, which we denote in general by

  λ(θ) = arg_λ { (1/n) Σ_{i=1}^{n} [ 1/(1 + λ′h(yi., xi., θ)) ] h(yi., xi., θ) = 0 }.     (3.2.7)
The solution λ(θ) is continuous and differentiable in θ under regularity conditions. Substituting the optimal Lagrange multiplier values λ(θ) into (3.2.5) allows the empirical probabilities to be represented in terms of θ as

  δi(θ) ≡ δi[θ, λ(θ)] = { n [ Σ_{j=1}^{J} λj(θ) hj(yi., xi., θ) + 1 ] }^{−1}.

Then, substitution of the optimal δ(θ) values into the (unscaled) objective function Σ_{i=1}^{n} ln(δi) in (3.2.1) yields the expression for the log-empirical likelihood function evaluated at θ given by

  ln[L_EL(θ; y)] = − Σ_{i=1}^{n} ln( n [1 + λ(θ)′h(yi., xi., θ)] ).     (3.2.8)
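The inner problem in (3.2.5)-(3.2.8) is easy to carry out numerically for a given θ, since λ(θ) can be obtained as the minimizer of the convex function −Σi ln(1 + λ′h(yi., xi., θ)). The following sketch (a Python illustration under our own naming; the toy moment function at the end is not part of the multinomial model) computes λ(θ), the δi(θ) weights, and the log-EL value:

import numpy as np
from scipy.optimize import minimize

def inner_el(h):
    # Inner EL step for a fixed theta.
    # h : (n, m) matrix whose i-th row is h(y_i., x_i., theta).
    # Returns (lam, delta, logEL) following (3.2.5)-(3.2.8):
    # delta_i = 1 / (n (1 + lam'h_i)) and logEL = -sum_i ln(n (1 + lam'h_i)).
    n, m = h.shape

    def neg_dual(lam):
        t = 1.0 + h @ lam
        if np.any(t <= 1e-10):              # keep the implied weights positive
            return np.inf
        return -np.sum(np.log(t))           # convex in lam; its minimizer is lam(theta)

    lam = minimize(neg_dual, np.zeros(m), method="Nelder-Mead").x
    delta = 1.0 / (n * (1.0 + h @ lam))
    logEL = -np.sum(np.log(n * (1.0 + h @ lam)))
    return lam, delta, logEL

# toy check with h_i = y_i - mu for scalar data: the EL weights tilt the sample so
# that the delta-weighted mean equals mu
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
lam, delta, logEL = inner_el((y - 0.8).reshape(-1, 1))
print(delta.sum(), delta @ y)               # weights sum to (about) one; weighted mean is (about) 0.8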
3.3 Maximum Empirical Likelihood Estimator
We can define a maximum empirical likelihood (MEL) estimator for θ by choosing the value of θ that maximizes the empirical likelihood function (3.2.1), or equivalently maximizes the logarithm of the EL function, as follows:

  θ̂ = arg max_θ [ ln(L_EL(θ; Y, x)) ].     (3.3.1)

The MEL estimator, θ̂_EL, is an extremum estimator whose solution is not generally obtainable in closed form, because the λ(θ) of the EL function (recall (3.2.7)) is not a closed-form function of θ, and thus numerical optimization techniques are most often required to obtain outcomes of the MEL estimator. We could also obtain the MEL estimate of θ as the solution θ̂_EL to the system of equations

  h_EL(y, x, θ) = Eδ[h(y, x, θ)] = Σ_{i=1}^{n} δ̂i h(yi., xi., θ) = 0,

where

  δ̂i(θ̂_EL) ≡ δi[θ̂_EL, λ(θ̂_EL)] = { n [ Σ_{j=1}^{J} λj(θ̂_EL) hj(yi., xi., θ̂_EL) + 1 ] }^{−1},     (3.3.2)

for i = 1, ..., n. Therefore, the MEL method of estimation can be viewed as a procedure for combining the set of estimating functions h(yi., xi., θ), i = 1, ..., n, into a vector estimating equation h_EL(y, x, θ) that can be solved for an estimate of θ.
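A sketch of this outer step, continuing the toy scalar-mean example and reusing the inner_el helper from the previous sketch (again our own illustration, not the multinomial application itself): the profiled log-EL of (3.2.8) is simply handed to a generic numerical optimizer, exactly as the text describes.

import numpy as np
from scipy.optimize import minimize_scalar

def mel_mean(y):
    # MEL estimation of a scalar mean with h(y_i, theta) = y_i - theta:
    # maximize ln L_EL(theta; y) of (3.2.8) over theta, cf. (3.3.1)
    def neg_log_el(theta):
        _, _, logEL = inner_el((y - theta).reshape(-1, 1))
        return -logEL
    return minimize_scalar(neg_log_el, bounds=(y.min(), y.max()), method="bounded").x

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=100)
print(mel_mean(y), y.mean())   # in this just-identified case the MEL estimate essentially reproduces the sample mean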
Qin and Lawless (1994) show that the usual consistency and asymptotic normality properties of extremum estimators hold for the MEL estimator under regularity conditions related to the twice continuous differentiability of h(y, x, θ) with respect to θ and the boundedness of h and its first and second derivatives, all in a neighborhood of the true parameter value θ0. They also assume that the row rank of E[∂h(y, x, θ)/∂θ |_{θ0}] equals the number of parameters in the vector θ (Qin and Lawless, 1994, pp. 305-6). These conditions lead to the MEL estimator being consistent and asymptotically normal, with limiting distribution

  n^{1/2}(θ̂_EL − θ0) →d N(0, Σ)     (3.3.3)

where

  Σ = [ E(∂h(Y, x, θ)/∂θ′ |_{θ0}) ]^{−1} E[ h(Y, x, θ) h(Y, x, θ)′ |_{θ0} ] [ E(∂h(Y, x, θ)/∂θ |_{θ0}) ]^{−1}.     (3.3.4)
The covariance matrix Σ of the limiting normal distribution can be consistently estimated by

  Σ̂ = [ Σ_{i=1}^{n} δ̂i ∂h(yi., xi., θ)/∂θ |_{θ̂_EL} ]^{−1} [ Σ_{i=1}^{n} δ̂i h(yi., xi., θ̂_EL) h(yi., xi., θ̂_EL)′ ] [ Σ_{i=1}^{n} δ̂i ∂h(yi., xi., θ)′/∂θ |_{θ̂_EL} ]^{−1},     (3.3.5)
where the δ̂i's are the same as defined via (3.3.2). By substituting n^{−1} for the δ̂i's in (3.3.5), an alternative consistent estimate is defined, which amounts to applying probability weights based on the empirical distribution function instead of the empirical probability weights generated by the empirical likelihood. The δ̂i probability weight estimates obtained from the EL procedure will generally be more efficient in finite samples if the estimating function information is unbiased. The normal limiting distribution of θ̂_EL allows asymptotic hypothesis tests and confidence regions to be constructed.
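As an illustration of how (3.3.5) can be computed in practice, the sketch below (our own helper; it approximates E[∂h/∂θ] by finite differences, whereas analytical derivatives would normally be used) assembles the sandwich estimate and checks it on the toy scalar-mean example:

import numpy as np

def mel_cov_estimate(h_fn, theta_hat, delta_hat, eps=1e-6):
    # Plug-in estimate of the limiting covariance in (3.3.5).
    # h_fn(theta) -> (n, m) matrix of estimating-function values at the data;
    # delta_hat   -> (n,) weights evaluated at theta_hat (EL weights or simply 1/n).
    theta_hat = np.atleast_1d(np.asarray(theta_hat, dtype=float))
    h0 = h_fn(theta_hat)                              # (n, m)
    n, m = h0.shape
    p = theta_hat.size
    D = np.zeros((m, p))                              # sum_i delta_i dh_i/dtheta'
    for k in range(p):
        tp, tm = theta_hat.copy(), theta_hat.copy()
        tp[k] += eps
        tm[k] -= eps
        D[:, k] = delta_hat @ ((h_fn(tp) - h_fn(tm)) / (2.0 * eps))
    V = (h0 * delta_hat[:, None]).T @ h0              # sum_i delta_i h_i h_i'
    Dinv = np.linalg.pinv(D)
    return Dinv @ V @ Dinv.T                          # sandwich form of (3.3.5)

rng = np.random.default_rng(2)
y = rng.normal(0.0, 2.0, size=200)
theta_hat = np.array([y.mean()])
delta_hat = np.full(y.size, 1.0 / y.size)             # the n^{-1} variant noted above
Sigma_hat = mel_cov_estimate(lambda t: (y - t[0]).reshape(-1, 1), theta_hat, delta_hat)
print(Sigma_hat, y.var())                              # close to the sample variance, as expected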
3.4 Optimal Estimating Functions
An optimal estimating function is an unbiased estimating function having the smallest covariance matrix. Godambe (1960) was the first to suggest that the vector estimating function be standardized as

  h_s(Y, x, θ) = [ Eθ(∂h(Y, x, θ)/∂θ) ]^{−1} h(Y, x, θ),     (3.4.1)

so that the multivariate optimal estimating function, or OptEF, is then the unbiased estimating function that minimizes, in the sense of symmetric positive definite matrix comparisons, the covariance matrix

  cov[h_s(Y, x, θ)] = [ Eθ(∂h(Y, x, θ)/∂θ) ]^{−1} Eθ[h(Y, x, θ) h(Y, x, θ)′] [ Eθ(∂h(Y, x, θ)/∂θ′) ]^{−1}.     (3.4.2)
In the special case in which h(y, x, θ) is actually proportional to, or a scaled version of, the score (gradient) vector of the log of a genuine likelihood function, it follows under the standard regularity conditions applied to maximum likelihood estimation that

  −Eθ[∂h(Y, x, θ)/∂θ] ∝ Eθ[−∂² ln L(Y, x, θ)/∂θ∂θ′]     (3.4.3)
and

  Eθ[h(Y, x, θ) h(Y, x, θ)′] ∝ Eθ[ (∂ ln L(Y, x, θ)/∂θ)(∂ ln L(Y, x, θ)/∂θ′) ],     (3.4.4)

where the expectations on the right-hand sides of (3.4.3) and (3.4.4) are equal. In this case (3.4.2) becomes

  cov(h_s(Y, x, θ)) = [ −Eθ(∂² ln L(Y, x, θ)/∂θ∂θ′) ]^{−1},     (3.4.5)

which is recognized as the usual ML covariance matrix and the CRLB for estimating the parameter vector θ. This provides an OptEF finite-sample justification for ML estimation in the case of estimating a parameter vector θ, and is analogous to the Gauss-Markov theorem justification for LS estimation.
The EL empirical moment constraints defined in terms of the conditional-on-θ optimum empirical probability weights are given by

  h_EL(Y, x, θ) ≡ Êδ[h(Y, x, θ)] ≡ Σ_{i=1}^{n} δ̂i(θ, Y, x) h(yi., xi., θ) = 0,

and these empirical moment constraints can be interpreted as vector estimating equations. The EL provides a method for forming a convex combination of n (m × 1) estimating functions, h(yi., xi., θ), for i = 1, ..., n. Thus, we want to investigate whether the particular combination of the n estimating functions used in the MEL approach is in some sense the best combination. Consider the class of estimation procedures that can be defined by a combination of the estimating equation information as

  hτ(Y, x, θ) ≡ Σ_{i=1}^{n} τ(x, θ) h(yi., xi., θ) = 0,     (3.4.6)
where τ(x, θ) is a ((KJ) × m) real-valued function such that the ((KJ) × 1) vector equation hτ(Y, x, θ) = 0 can be solved for the ((KJ) × 1) vector θ as θ̂τ(y).
McCullagh and Nelder (1989, p. 341) show that the optimal choice of τ, in the sense of defining a consistent estimator with minimum asymptotic covariance matrix in the class of estimators for θ defined as solutions to (3.4.6), is given by

  τ(x, θ) = E[∂h(Y, x, θ)/∂θ] [cov(h(Y, x, θ))]^{−1},     (3.4.7)
where Y denotes the random variable whose probability distribution is the common population distribution of the Yi's. In the case where the Y's are independent, but not identically distributed, we have

  τi(xi., θ) = E[∂h(yi., xi., θ)/∂θ] [cov(h(yi., xi., θ))]^{−1}     (3.4.8)

and hτ(Y, x, θ) ≡ Σ_{i=1}^{n} τi(xi., θ) h(yi., xi., θ) = 0. Using the optimal definition of τi in (3.4.8) defines an estimator for θ that has precisely the same asymptotic covariance matrix as the MEL estimator (3.3.4) because, given the unbiased nature of the estimating equations, cov(h(yi., xi., θ)) = Eθ[h(yi., xi., θ) h(yi., xi., θ)′] (McCullagh and Nelder, 1989, p. 341).
4. EL Estimation in the Multinomial Choice Problem
In this section we examine an extended illustrative example demonstrating the setup of the MEL approach to estimating the parameters of a multinomial choice model. In this application, the form of the unbiased estimating functions for θ is given by

  hτ(Y, x, θ) ≡ Σ_{i=1}^{n} τi(xi., θ) h(yi., xi., θ),     (4.1.1)
where

  h(yi., xi., θ) = [ (yi1 − G1(xi., β)) ⊙ xi.′,
                     (yi2 − G2(xi., β)) ⊙ xi.′,
                     ...,
                     (yiJ − GJ(xi., β)) ⊙ xi.′,
                     yi1² − 2yi1 G1(xi., β) + 2G1²(xi., β) − G1(xi., β),
                     yi2² − 2yi2 G2(xi., β) + 2G2²(xi., β) − G2(xi., β),
                     ...,
                     yiJ² − 2yiJ GJ(xi., β) + 2GJ²(xi., β) − GJ(xi., β) ]′     (4.1.2)
and h(yi., xi., θ) is a vector of dimension ((KJ+J) × 1); recall that ⊙ denotes the Hadamard (elementwise) product. Taking into consideration the adding-up condition in (1.2), which implies that there are redundant moment equations among the (KJ+J) equations, we reformulate (4.1.2) and represent h(yi., xi., θ) with dimension ((K(J−1)+(J−1)) × 1) as follows:
  h(yi., xi., θ) = [ (yi2 − G2(xi., β)) ⊙ xi.′,
                     (yi3 − G3(xi., β)) ⊙ xi.′,
                     ...,
                     (yiJ − GJ(xi., β)) ⊙ xi.′,
                     yi2² − 2yi2 G2(xi., β) + 2G2²(xi., β) − G2(xi., β),
                     yi3² − 2yi3 G3(xi., β) + 2G3²(xi., β) − G3(xi., β),
                     ...,
                     yiJ² − 2yiJ GJ(xi., β) + 2GJ²(xi., β) − GJ(xi., β) ]′     (4.1.3)
and Gj(xi., β) denotes the conditional expectation of Yij given xi., Gj(xi., β) = E(yij | xi.). In the context of the multinomial choice problem, Gj(xi., β) denotes the conditional-on-xi. probability of choosing alternative j for observation i. For the sake of expositional clarity, we henceforth consider the special case of the multinomial logit model upon setting

  Gj = Gj(xi., β) = 1 / (1 + Σ_{k=2}^{J} e^{xi.βk})                 for j = 1
                  = e^{xi.βj} / (1 + Σ_{k=2}^{J} e^{xi.βk})          for j = 2, 3, ..., J.     (4.1.4)
We emphasize that G j (xi. , β) could be any link function of flexible form that satisfies
(1.2) and that defines a legitimate multinomial response model globally. Later we will
consider G j (xi. , β) as being formed from CDFs in the Pearson system, which themselves
satisfy (1.2).
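As a concrete illustration of the estimating functions in (4.1.3) under the logit link (4.1.4), the following sketch (our own array layout and helper names; y_i is the 0/1 indicator vector and x_i the covariate row) builds h(yi., xi., θ) for one observation:

import numpy as np

def logit_link(x_i, betas):
    # G_j(x_i., beta) of (4.1.4); betas is (J-1, K), with beta_1 normalized to zero
    expz = np.exp(betas @ x_i)
    return np.concatenate(([1.0], expz)) / (1.0 + expz.sum())     # (J,)

def moment_vector(y_i, x_i, betas):
    # h(y_i., x_i., theta) of (4.1.3): K(J-1) elements (y_ij - G_j) x_i.' followed by
    # (J-1) second-moment elements, for alternatives j = 2,...,J only
    # (alternative 1 is redundant by the adding-up condition (1.2)).
    G = logit_link(x_i, betas)                                     # (J,)
    resid = y_i[1:] - G[1:]                                        # y_ij - G_j, j = 2..J
    first_block = (resid[:, None] * x_i[None, :]).ravel()          # length K(J-1)
    second_block = y_i[1:]**2 - 2*y_i[1:]*G[1:] + 2*G[1:]**2 - G[1:]
    return np.concatenate([first_block, second_block])

x_i = np.array([1.0, -0.5, 2.0])
betas = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
y_i = np.array([0, 1, 0, 0])                                       # alternative 2 chosen
print(moment_vector(y_i, x_i, betas))                              # length K(J-1)+(J-1) = 12

Each element of this vector has conditional expectation zero at the true β, which is what makes it a vector of unbiased estimating functions.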
The OptEF estimator is in the general class of estimating equations based on estimating functions of the form (4.1.1), characterized by the solution to

  h_Opt(y, x, θ) = Σ_{i=1}^{N} E[∂h(yi., xi., θ)/∂θ] (Φi)^{−1} h(yi., xi., θ) = 0,     (4.1.5)
where E[∂h(yi., xi., θ)/∂θ] is a matrix of dimension (K(J−1) × (K(J−1)+(J−1))), given (for the illustrative case J = 4) by

  E[∂h(yi., xi., θ)/∂θ] =

  row β2: [ −∂[G2 xi.′]/∂β2   −∂[G3 xi.′]/∂β2   −∂[G4 xi.′]/∂β2   (∂G2/∂β2)(2G2 − 1)   (∂G3/∂β2)(2G3 − 1)   (∂G4/∂β2)(2G4 − 1) ]
  row β3: [ −∂[G2 xi.′]/∂β3   −∂[G3 xi.′]/∂β3   −∂[G4 xi.′]/∂β3   (∂G2/∂β3)(2G2 − 1)   (∂G3/∂β3)(2G3 − 1)   (∂G4/∂β3)(2G4 − 1) ]
  row β4: [ −∂[G2 xi.′]/∂β4   −∂[G3 xi.′]/∂β4   −∂[G4 xi.′]/∂β4   (∂G2/∂β4)(2G2 − 1)   (∂G3/∂β4)(2G3 − 1)   (∂G4/∂β4)(2G4 − 1) ]     (4.1.6)

where

  ∂Gj/∂βj = ∂Gj(xi., β)/∂βj = xi.′ [ e^{xi.βj}/(1 + Σ_{k=2}^{J} e^{xi.βk}) − ( e^{xi.βj}/(1 + Σ_{k=2}^{J} e^{xi.βk}) )² ] = xi.′ Gj(1 − Gj)   ∀ j ≥ 2,

  ∂Gj/∂βk = ∂Gj(xi., β)/∂βk = −xi.′ Gj Gk   for k ≠ j and k > 1,     (4.1.7)

  ∂[Gj xi.′]/∂βj = xi. xi.′ Gj(1 − Gj)   ∀ j ≥ 2,   ∂[Gj xi.′]/∂βj = 0   for j = 1,

  ∂[Gj xi.′]/∂βk = −xi. xi.′ Gj Gk   for k ≠ j and k > 1,     (4.1.8)
and Φi = cov(h(yi., xi., θ)) = Eθ[h(yi., xi., θ) h(yi., xi., θ)′] is the covariance matrix of the estimating functions, having dimension ((K(J−1)+(J−1)) × (K(J−1)+(J−1))). Given the 0-1 dichotomous outcomes of the Yij's, note that E(Yij^n) = Gj(xi., β) for every positive integer n. Also, given that the Yij's must sum to 1, E(Yij^n Yik^m) = 0 for j ≠ k and all positive integers m and n, so that Cov(Yij, Yik) = −Gj(xi., β) Gk(xi., β). In order to define the OptEF, note that the covariance matrix of the estimating-function vector h(yi., xi., θ) in (4.1.3), built from [Yi2, ..., YiJ, (Yi2 − G2)², ..., (YiJ − GJ)²], is given for J = 4 by
  Φi =
  [ G2(1−G2) xi.xi.′       −2G2G3 xi.xi.′          −2G2G4 xi.xi.′          G2(1−G2)(1−2G2) xi.′        −G2G3(1−4G3) xi.′           −G2G4(1−4G4) xi.′          ]
  [ −2G2G3 xi.xi.′          G3(1−G3) xi.xi.′        −2G3G4 xi.xi.′          −G2G3(1−4G2) xi.′           G3(1−G3)(1−2G3) xi.′        −G3G4(1−4G4) xi.′          ]
  [ −2G2G4 xi.xi.′          −2G3G4 xi.xi.′          G4(1−G4) xi.xi.′        −G2G4(1−4G2) xi.′           −G3G4(1−4G3) xi.′           G4(1−G4)(1−2G4) xi.′       ]
  [ G2(1−G2)(1−2G2) xi.′    −G2G3(1−4G2) xi.′       −G2G4(1−4G2) xi.′       G2(1−G2)(2G2−1)²            G2G3[2G2(1−4G3)−(1−2G3)]    G2G4[2G4(1−4G2)−(1−2G2)]   ]
  [ −G2G3(1−4G3) xi.′       G3(1−G3)(1−2G3) xi.′    −G3G4(1−4G3) xi.′       G2G3[2G2(1−4G3)−(1−2G3)]    G3(1−G3)(2G3−1)²            G3G4[2G4(1−4G3)−(1−2G3)]   ]
  [ −G2G4(1−4G4) xi.′       −G3G4(1−4G4) xi.′       G4(1−G4)(1−2G4) xi.′    G2G4[2G4(1−4G2)−(1−2G2)]    G3G4[2G4(1−4G3)−(1−2G3)]    G4(1−G4)(2G4−1)²           ]
By substituting (4.1.3) through (4.1.8) into (4.1.5), we have constructed an optimal estimating function of the form

  h_Opt(y, x, θ) = Σ_{i=1}^{N} E[∂h(yi., xi., θ)/∂θ] (cov(h(yi., xi., θ)))^{−1} h(yi., xi., θ)

               = Σ_{i=1}^{n}
  [ −xi.xi.′G2(1−G2)    xi.xi.′G3G2          xi.xi.′G4G2          xi.′G2(1−G2)(2G2−1)    −xi.′G3G2(2G3−1)      −xi.′G4G2(2G4−1) ]
  [ xi.xi.′G2G3          −xi.xi.′G3(1−G3)     xi.xi.′G4G3          −xi.′G2G3(2G2−1)       xi.′G3(1−G3)(2G3−1)   −xi.′G4G3(2G4−1) ]
  [ xi.xi.′G2G4          xi.xi.′G3G4          −xi.xi.′G4(1−G4)     −xi.′G2G4(2G2−1)       −xi.′G3G4(2G3−1)      xi.′G4(1−G4)(2G4−1) ]

  × Φi^{−1} h(yi., xi., θ) = 0,     (4.1.9)

with h(yi., xi., θ) as given in (4.1.3); this optimal estimating function can be used in the constrained optimization problem of (3.2.2).
If the εi.'s are iid, each with an extreme value distribution, then the special case of the multinomial logit model is defined as in (4.1.4), and the log-likelihood function of the multinomial logit model can be specified as

  ln(L(β; y)) = Σ_{i=1}^{n} [ Σ_{j=2}^{J} yij (xi.βj) − ln(1 + Σ_{k=2}^{J} e^{xi.βk}) ]     (4.1.10)

where Gj(xi., β) = e^{xi.βj} / (1 + Σ_{k=2}^{J} e^{xi.βk}) and β1 ≡ [0]. Solving the first-order conditions of (4.1.10) obtains

  ∂ln(L(β; y))/∂βj = Σ_{i=1}^{n} xi.′ (yij − Gj(xi., β)) = 0,  for j = 2, ..., J.     (4.1.11)
Note that h_Opt(Y, x, θ) in (4.1.5) can be specified to represent the first-order conditions (4.1.11), and therefore

  h_Opt(Y, x, θ) = Σ_{i=1}^{n} [ yi2 − G2(xi., β),  yi3 − G3(xi., β),  ...,  yiJ − GJ(xi., β) ]′ ⊙ xi.′.     (4.1.12)
The solution for β obtained from (4.1.11) or (4.1.12) is the optimal estimating function (OptEF) estimator for β. The estimating function given by (4.1.12) is asymptotically optimal in the sense that it solves the problem of seeking the unbiased estimating function that produces the consistent estimating equation (EE) estimator of β with the smallest asymptotic covariance matrix. Furthermore, the ML estimator has the finite-sample optimality property of representing the estimating function (4.1.12) with the smallest standardized covariance matrix. We emphasize that these optimality results are predicated on the assumption that the logistic-extreme value distribution underlying the likelihood specification is in fact the correct parametric family of distributions underlying the data sampling process. It is also useful to note that (4.1.3) subsumes (4.1.12), and the asymptotic covariance matrix of the MEL estimator generally becomes smaller (by a positive definite matrix) as the number of estimating equations on which it is based increases (Qin and Lawless, 1994, Corollary 1).
4.2 Adding Flexibility to the EL Formulation
In this section we introduce flexibility into the specification of G(·) by adding parameters to index members of the class of Pearson family distributions. While we focus on the Pearson class here, we emphasize that any other class of distributions could be used. The criteria for identifying different members of the Pearson system can be expressed parametrically in terms of a (2 × 1) vector of parameters, ξ, so that Gj(xi., θ) = Gj(xi., β, ξ), where θ ≡ vec([β1, β2, ..., βJ, ψ3, ω]) is a column-vectorized representation of the model parameters, now of dimension ((KJ+2) × 1). Hence, the alternative formulation of unbiased estimating functions for θ is of the form

  h(Y, x, θ) ≡ Σ_{i=1}^{n} τi(xi., θ) h(yi., xi., θ),     (4.2.1)

where
  h(yi., xi., θ) = [ (yi2 − G2(xi., β, ξ)) ⊙ xi.′,
                     (yi3 − G3(xi., β, ξ)) ⊙ xi.′,
                     ...,
                     (yiJ − GJ(xi., β, ξ)) ⊙ xi.′,
                     yi2² − 2yi2 G2(xi., β, ξ) + 2G2²(xi., β, ξ) − G2(xi., β, ξ),
                     yi3² − 2yi3 G3(xi., β, ξ) + 2G3²(xi., β, ξ) − G3(xi., β, ξ),
                     ...,
                     yiJ² − 2yiJ GJ(xi., β, ξ) + 2GJ²(xi., β, ξ) − GJ(xi., β, ξ) ]′     (4.2.2)
and Gj(xi., β, ξ) again denotes the conditional expectation of Yij given xi.. In the context of the multinomial choice problem, Gj(xi., β, ξ) denotes the conditional-on-xi. choice probability. The OptEF estimator is in the general class of estimating equations based on estimating functions of the form (4.2.1), characterized by the solution to

  h_Opt(yi., xi., θ) = Σ_{i=1}^{N} E[∂h(yi., xi., θ)/∂θ] (cov(h(yi., xi., θ)))^{−1} h(yi., xi., θ) = 0,     (4.2.3)

where E[∂h(yi., xi., θ)/∂θ] is now a matrix of dimension ((K(J−1)+2) × (K(J−1)+(J−1))), where, for example in the case of J = 4,
  E[∂h(yi., xi., θ)/∂θ] =

  row β2: [ −∂[G2 xi.′]/∂β2   −∂[G3 xi.′]/∂β2   −∂[G4 xi.′]/∂β2   (∂G2/∂β2)(2G2 − 1)   (∂G3/∂β2)(2G3 − 1)   (∂G4/∂β2)(2G4 − 1) ]
  row β3: [ −∂[G2 xi.′]/∂β3   −∂[G3 xi.′]/∂β3   −∂[G4 xi.′]/∂β3   (∂G2/∂β3)(2G2 − 1)   (∂G3/∂β3)(2G3 − 1)   (∂G4/∂β3)(2G4 − 1) ]
  row β4: [ −∂[G2 xi.′]/∂β4   −∂[G3 xi.′]/∂β4   −∂[G4 xi.′]/∂β4   (∂G2/∂β4)(2G2 − 1)   (∂G3/∂β4)(2G3 − 1)   (∂G4/∂β4)(2G4 − 1) ]
  row ψ3: [ −∂[G2 xi.′]/∂ψ3   −∂[G3 xi.′]/∂ψ3   −∂[G4 xi.′]/∂ψ3   (∂G2/∂ψ3)(2G2 − 1)   (∂G3/∂ψ3)(2G3 − 1)   (∂G4/∂ψ3)(2G4 − 1) ]
  row ω:  [ −∂[G2 xi.′]/∂ω    −∂[G3 xi.′]/∂ω    −∂[G4 xi.′]/∂ω    (∂G2/∂ω)(2G2 − 1)    (∂G3/∂ω)(2G3 − 1)    (∂G4/∂ω)(2G4 − 1)  ]     (4.2.4)
and Φi = cov(h(yi., xi., θ)) = Eθ[h(yi., xi., θ) h(yi., xi., θ)′] is the ((K(J−1)+(J−1)) × (K(J−1)+(J−1))) covariance matrix of the estimating functions. In order to define the OptEF, note that for the illustrative case J = 4 this covariance matrix (4.2.5) takes exactly the same form as the matrix Φi displayed above in section 4, with Gj = Gj(xi., β, ξ) throughout. By substituting (4.2.4) and (4.2.5) into (4.2.3), we have constructed an optimal estimating function that can be used in the constrained optimization problem of (3.2.2).
5. Sampling Experiments
We performed a Monte Carlo experiment to estimate a multinomial logit response model with four choice alternatives, where β1 has been normalized, without loss of generality, to a zero vector for purposes of parameter identification. The x data were all generated iid from the uniform distribution having support on the interval (−5, 5). The logistic distribution was used to generate the choice probabilities underlying the data sampling process. The link function used to model the multinomial choice problem was the Pearson Type X CDF. The parameters of the latent variable equations underlying the multinomial logit model are given by

  β2 = (β21, β22, β23)′ = (0.1, 0.2, 0.3)′,   β3 = (β31, β32, β33)′ = (0.4, 0.5, 0.6)′,   β4 = (β41, β42, β43)′ = (0.7, 0.8, 0.9)′.
The results of the Monte Carlo experiment, for 200 repetitions of the sampling experiment, are displayed in Table 1. The results suggest that the EL estimation procedure produces reasonably accurate estimates of the model parameters. As the sample size increases, the mean square error decreases, indicative of the consistency of the EL estimator. The means of the estimates for the 200 Monte Carlo replications are very close to the true values of the model parameters for sample sizes ≥ 500, suggesting that for all practical purposes the EL estimators are producing near-unbiased estimates of the parameters. For smaller sample sizes, there is some indication that the parameter estimates are biased to some degree, although the degree of bias is relatively small. Overall, the estimates were reasonably accurate across all sample sizes, quite accurate for the larger sample sizes, and would appear to be useful from an empirical application perspective.
Table 1. Monte Carlo Results: Multinomial Choice Model with Four Alternatives (1)

                              Sample Size
Parameter  True   50     100    200    250    300    500    600    700
β21        0.1    0.115  0.134  0.121  0.127  0.096  0.094  0.106  0.093
β22        0.2    0.212  0.216  0.209  0.209  0.208  0.198  0.199  0.197
β23        0.3    0.307  0.306  0.319  0.311  0.298  0.296  0.301  0.301
β31        0.4    0.436  0.439  0.457  0.425  0.412  0.410  0.399  0.421
β32        0.5    0.523  0.533  0.523  0.530  0.521  0.509  0.506  0.504
β33        0.6    0.629  0.618  0.634  0.628  0.627  0.610  0.611  0.608
β41        0.7    0.739  0.720  0.751  0.747  0.698  0.708  0.707  0.719
β42        0.8    0.856  0.851  0.852  0.834  0.832  0.814  0.815  0.804
β43        0.9    0.956  0.945  0.963  0.941  0.935  0.917  0.908  0.910
MSE(β)            0.531  0.519  0.459  0.292  0.283  0.149  0.130  0.114

(1) Values below the sample size indicators are the sample means of the estimates for 200 MC repetitions of the experiment.
We also note that the computation of the estimates for this 4-dimensional choice model was relatively quick, with effectively no numerical difficulties when finding solutions. The discrepancy in some of the parameters may be due to the fact that we have used numerical gradients instead of analytical gradients in solving the EL optimization problem. Analytical gradients could serve to speed convergence further, and would also allow solutions to higher levels of tolerance, potentially further increasing the accuracy of the parameter estimates.
6. Concluding Remarks
This paper has presented a flexible semiparametric methodology for estimating
multinomial choice models. The parameter estimates from the Monte Carlo results appear
quite reasonable and demonstrate the potential usefulness of the proposed approach. The
estimates obtained by this procedure are consistent and asymptotically normal. However,
our consistent estimator will generally not be fully efficient. Nonetheless, because of the
computational difficulties associated with more efficient estimators, the empirical
tractability of the method for estimating a system of multinomial choice models for large
data sets and for relatively large dimensional choice sets is very attractive in empirical
practice. Moreover, in practice, there is often insufficient information to specify the
parametric form of the function linking the observable data to the unknown choice
probabilities, in which case a fully efficient method of estimating the model parameters
will generally remain unknown in any case. In such cases, the flexible Pearson family of
parametric distributions may be useful as a basis for a flexible specification of a link
function.
7. References

Ahn, H., Ichimura, H. and Powell, J.L. (1993). Semiparametric estimation of censored selection models with a nonparametric selection mechanism, Journal of Econometrics 58: 3-29.

Armstrong, B.G. (1985). The generalized linear model, Communications in Statistics 14(B): 529-544.

Baggerly, K.A. (1998). Empirical likelihood as a goodness of fit measure, Biometrika 85: 535-547.

Blundell, R. and Powell, J.L. (1999). Endogeneity in single index models, Working Paper, Department of Economics, UC Berkeley.

Carroll, R.J., Ruppert, D. and Stefanski, L.A. (1995). Measurement Error in Nonlinear Models, Chapman and Hall, London.

Cressie, N. and Read, T. (1984). Multinomial goodness of fit tests, Journal of the Royal Statistical Society, Series B 46: 440-464.

Elderton, W.P. (1953). Frequency Curves and Correlation, Harren Press, Washington, D.C.

Elderton, W.P. and Johnson, N.L. (1969). Systems of Frequency Curves, Cambridge University Press, Cambridge.

Godambe, V.P. (1960). An optimum property of regular maximum likelihood estimation, The Annals of Mathematical Statistics 31: 1208-1211.

Golan, A., Judge, G.G. and Perloff, J. (1996). A maximum entropy approach to recovering information from multinomial response data, Journal of the American Statistical Association 91: 841-853.

Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models, Journal of Econometrics 58: 71-120.

Klein, R. and Spady, R.H. (1993). An efficient semiparametric estimator for binary response models, Econometrica 61: 387-421.

Kullback, S. (1959). Information Theory and Statistics, John Wiley and Sons, New York.

Kullback, S. and Leibler, R.A. (1951). On information and sufficiency, The Annals of Mathematical Statistics 22: 79-86.

Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice moments, Econometric Theory 13: 32-51.

Lewbel, A. (1998). Semiparametric latent variable model estimation with endogenous or mismeasured regressors, Econometrica 66: 105-121.

Lewbel, A. (2000). Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables, Journal of Econometrics 97: 145-177.

Maddala, G.S. (1983). Limited Dependent and Qualitative Variables in Econometrics, Econometric Society Monograph No. 3, Cambridge University Press, Cambridge.

McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd edition, Chapman and Hall, London.

McFadden, D.L. (1984). Econometric analysis of qualitative response models, in: Griliches, Z. and Intriligator, M.D. (Eds.), Handbook of Econometrics, Vol. 2, Elsevier, Amsterdam, pp. 1395-1457.

Mittelhammer, R.C., Judge, G.G. and Miller, D.J. (2000). Econometric Foundations, Cambridge University Press, Cambridge.

Newey, W.K. (1985). Semiparametric estimation of limited dependent variable models with endogenous explanatory variables, Annales de l'INSEE 59-60: 219-237.

Owen, A. (1988). Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75: 237-249.

Owen, A. (1991). Empirical likelihood for linear models, The Annals of Statistics 19(4): 1725-1747.

Owen, A. (2000). Empirical Likelihood, Chapman and Hall, New York.

Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material, Philosophical Transactions of the Royal Society of London, Series A 186: 343-414.

Pearson, K. (1924). On the mean error of frequency distributions, Biometrika 16: 198-200.

Pearson, E.S. and Hartley, H.O. (1956). Biometrika Tables for Statisticians, Vol. 1, Cambridge University Press, Cambridge.

Powell, J.L., Stock, J.H. and Stoker, T.M. (1989). Semiparametric estimation of index coefficients, Econometrica 57: 1403-1430.

Qin, J. and Lawless, J. (1994). Empirical likelihood and general estimating equations, The Annals of Statistics 22(1): 300-325.

Ruud, P.A. (1986). Consistent estimation of limited dependent variable models despite misspecification of distribution, Journal of Econometrics 32: 157-187.
Appendix: Derivation of CDFs of the Pearson Family of Distributions
Pearson Type I

f(x) = (x − r1)^{m1} (r2 − x)^{m2} / [β(m1+1, m2+1)(r2 − r1)^{m1+m2+1}]

To find the normalizing constant k, set k ∫_{r1}^{r2} (x − r1)^{m1} (r2 − x)^{m2} dx = 1. Letting u = x − r1, so that dx = du, the integral becomes ∫_0^{r2−r1} u^{m1} (1 − u/(r2 − r1))^{m2} (r2 − r1)^{m2} du. Letting w = u/(r2 − r1), so that (r2 − r1) dw = du, gives k ∫_0^1 (r2 − r1)^{m1} w^{m1} (1 − w)^{m2} (r2 − r1)^{m2+1} dw = 1, which implies

  k = 1 / [ (r2 − r1)^{m1+m2+1} Beta(m1+1, m2+1) ].

For the CDF, F(z) = P(x ≤ z) = k ∫_{r1}^{z} (x − r1)^{m1} (r2 − x)^{m2} dx. With u = x − r1,

  P(x ≤ z) = k ∫_0^{z−r1} u^{m1} (r2 − r1 − u)^{m2} du = k (r2 − r1)^{m2} ∫_0^{z−r1} u^{m1} (1 − u/(r2 − r1))^{m2} du,

and with w = u/(r2 − r1), so that (r2 − r1) dw = du,

  P(x ≤ z) = k (r2 − r1)^{m1+m2+1} ∫_0^{(z−r1)/(r2−r1)} w^{m1} (1 − w)^{m2} dw
           = [1/Beta(m1+1, m2+1)] ∫_0^{(z−r1)/(r2−r1)} w^{m1} (1 − w)^{m2} dw
           = incomplete beta( (z − r1)/(r2 − r1); m1+1, m2+1 ).

Pearson Type IV

A closed-form CDF is not available.
Pearson Type VI

f(x) = (x − r1)^{m1} (x − r2)^{m2} / [β(m1+1, −m1−m2−1)(r1 − r2)^{m1+m2+1}],   r1 < x < +∞.

For the normalizing constant, set k ∫_{r1}^{∞} (x − r1)^{m1} (x − r2)^{m2} dx = 1. Let u = x − r2 and a = r1 − r2, so that

  k ∫_{a}^{∞} (u − a)^{m1} u^{m2} du = k a^{m1+m2} ∫_{a}^{∞} (u/a − 1)^{m1} (u/a)^{m2} du = 1.

Letting 1/z = u/a, so that du = −a z^{−2} dz, gives −k a^{m1+m2+1} ∫_1^0 (1/z − 1)^{m1} (1/z)^{m2} z^{−2} dz = 1, which implies

  k = 1 / [ a^{m1+m2+1} beta(m1+1, −m1−m2−1) ].

For the CDF, P(x ≤ z) = k ∫_{r1}^{z} (x − r1)^{m1} (x − r2)^{m2} dx. With u = x − r2,

  P(x ≤ z) = k ∫_{a}^{z−r2} (u − a)^{m1} u^{m2} du = k a^{m1+m2} ∫_{a}^{z−r2} (u/a − 1)^{m1} (u/a)^{m2} du.

Letting w = a/u, so that du = −a w^{−2} dw,

  P(x ≤ z) = k a^{m1+m2+1} ∫_{a/(z−r2)}^{1} (1 − w)^{m1} w^{−m1−m2−2} dw,

which is an incomplete beta function ratio.

Transition Pearson Type II

f(x) = (t² − x²)^M / [t^{2M+1} β(M+1, 0.5)],   −t < x < t.
To find the normalizing constant k, set k ∫_{−t}^{t} (t² − x²)^M dx = 1 = 2k ∫_0^{t} (t² − x²)^M dx = 2k t^{2M} ∫_0^{t} (1 − x²/t²)^M dx. Letting u = 1 − x²/t², so that x = t(1 − u)^{1/2} and dx = −(t/2)(1 − u)^{−1/2} du, gives

  k t^{2M+1} ∫_0^1 u^M (1 − u)^{−1/2} du = 1,   so that   k = 1 / [ t^{2M+1} beta(M+1, 0.5) ].

For the CDF,

  F(z) = P(x ≤ z) = k ∫_{−t}^{z} (t² − x²)^M dx = k [ ∫_{−t}^{0} (t² − x²)^M dx + ∫_{0}^{z} (t² − x²)^M dx ].

The first integral contributes 1/2. For z > 0, the substitution u = 1 − x²/t² in the second integral gives

  F(z) = 1/2 + [1/(2 beta(M+1, 0.5))] ∫_{1−z²/t²}^{1} u^M (1 − u)^{−1/2} du = 1 − 0.5 incomplete beta(1 − z²/t²; M+1, 0.5),

and for z < 0,

  F(z) = P(x ≤ z) = 0.5 incomplete beta(1 − z²/t²; M+1, 0.5).
Pearson Type III

f(x) = e^{−N²} N^{N²} e^{−Nx} (N + x)^{N²−1} / Γ(N²),   −N < x < +∞,

with normalizing constant k = e^{−N²} N^{N²} / Γ(N²). Then

  F(z) = P(x ≤ z) = k ∫_{−N}^{z} e^{−Nx} (N + x)^{N²−1} dx.

Letting u = N(N + x), so that e^{−Nx} = e^{N²} e^{−u}, (N + x)^{N²−1} = (u/N)^{N²−1}, and dx = du/N,

  F(z) = P(x ≤ z) = [1/Γ(N²)] ∫_0^{N(N+z)} e^{−u} u^{N²−1} du = incomplete gamma( N(N + z); N² ).
Pearson Type V

Given k = [2r(m − 1)]^{2m−1} / Γ(2m − 1), the CDF is

  F(z) = P(x ≤ z) = k ∫_{−r}^{z} e^{−2r(m−1)/(x+r)} (x + r)^{−2m} dx.

Letting x + r = 1/u, so that dx = −u^{−2} du,

  F(z) = P(x ≤ z) = k ∫_{1/(z+r)}^{∞} e^{−2r(m−1)u} u^{2m−2} du = [ (2r(m − 1))^{2m−1} / Γ(2m − 1) ] ∫_{1/(z+r)}^{∞} e^{−2r(m−1)u} u^{2m−2} du,

which is an incomplete gamma function ratio, equal to 1 − incomplete gamma(2r(m−1)/(z + r); 2m−1) as reported in Table 2.1.
Pearson Type VII

f(x) = s1^{2m−1} (x² + s1²)^{−m} / β(m − 0.5, 0.5),   −∞ < x < +∞.

Given k = s1^{2m−1} / beta(m − 0.5, 0.5), the CDF is

  F(z) = P(x ≤ z) = k ∫_{−∞}^{z} (s1² + x²)^{−m} dx = k [ ∫_{−∞}^{0} (s1² + x²)^{−m} dx + ∫_{0}^{z} (s1² + x²)^{−m} dx ].

Letting 1/u = 1 + x²/s1², the first integral contributes 1/2, and for z > 0 the second integral gives

  F(z) = P(x ≤ z) = 0.5 + [1/(2 beta(m − 0.5, 0.5))] ∫_{s1²/(s1²+z²)}^{1} u^{m−3/2} (1 − u)^{−1/2} du
       = 1 − 0.5 incomplete beta( s1²/(s1² + z²); m − 0.5, 0.5 ).

For z < 0,

  F(z) = P(x ≤ z) = [1/(2 beta(m − 0.5, 0.5))] ∫_{0}^{s1²/(s1²+z²)} u^{m−3/2} (1 − u)^{−1/2} du = 0.5 incomplete beta( s1²/(s1² + z²); m − 0.5, 0.5 ).
Pearson Type VIII

f(x) = (1 − 2m)(x − r1)^{−2m} / (r2 − r1)^{1−2m},   r1 < x < r2,   1 − 2m > 0,

where k = (1 − 2m)/(r2 − r1)^{1−2m}. Then

  F(z) = P(x ≤ z) = k ∫_{r1}^{z} (x − r1)^{−2m} dx = (z − r1)^{1−2m} / (r2 − r1)^{1−2m}.
Pearson Type XII

f(x) = [ (r2 − x)/(x − r1) ]^{m2} / [ β(m2+1, −m2+1)(r2 − r1) ],   r1 < x < r2,

with k = 1 / [ (r2 − r1) beta(m2+1, −m2+1) ]. Then

  F(z) = P(x ≤ z) = k ∫_{r1}^{z} (x − r1)^{−m2} (r2 − x)^{m2} dx.

Letting u = r2 − x gives F(z) = P(x ≤ z) = k ∫_{r2−z}^{r2−r1} u^{m2} (r2 − r1 − u)^{−m2} du, and letting w = u/(r2 − r1) then gives

  F(z) = P(x ≤ z) = [ 1/beta(m2+1, 1 − m2) ] ∫_{(r2−z)/(r2−r1)}^{1} w^{m2} (1 − w)^{−m2} dw,

which is an incomplete beta function ratio.
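As a quick numerical sanity check of the Type I result above, the following sketch (our own test harness, assuming SciPy is available) compares the incomplete-beta expression for the CDF with direct numerical integration of the Type I density:

from scipy.integrate import quad
from scipy.special import betainc, beta as beta_fn

def type1_pdf(x, r1, r2, m1, m2):
    # Pearson Type I density from Table 2.1, support r1 < x < r2
    c = beta_fn(m1 + 1, m2 + 1) * (r2 - r1) ** (m1 + m2 + 1)
    return (x - r1) ** m1 * (r2 - x) ** m2 / c

def type1_cdf(z, r1, r2, m1, m2):
    # closed-form CDF derived above: a regularized incomplete beta function
    return betainc(m1 + 1, m2 + 1, (z - r1) / (r2 - r1))

r1, r2, m1, m2 = -1.0, 2.0, 0.5, 1.5          # illustrative parameter values
for z in [-0.5, 0.3, 1.7]:
    numeric, _ = quad(type1_pdf, r1, z, args=(r1, r2, m1, m2))
    print(z, type1_cdf(z, r1, r2, m1, m2), numeric)   # the two columns agree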