Construction and Properties of Bayesian Nonparametric Regression Models – Isaac Newton Institute Workshop – 6-10 August 2007 – Cambridge, UK
Some results on Bayesian nonparametric priors derived from Poisson-Kingman models
Annalisa Cerquetti
Istituto di Metodi Quantitativi, Bocconi University Milano, Italy
[email protected]
ABSTRACT
We elucidate some connections between Poisson-Kingman models for
random partitions (Pitman, 2003) and priors recently arising in a nonparametric Bayesian context. In particular we show that normalized generalized
Gamma (N-GG) processes arise from exponential tilting of Poisson-Kingman
models derived from the stable law. It turns out that results on quantities of
statistical interest in Bayesian nonparametrics under those priors, such as the
analogues of the Blackwell-MacQueen prediction rules or the distribution of
the number of distinct elements observed in a sample, can be easily derived
from Pitman’s results. This construction also elucidates that N-GG priors
induce exchangeable partition probability functions in Gibbs product form
(Gnedin and Pitman, 2006).
PRELIMINARIES AND BASIC DEFINITIONS
In Lijoi, Mena and Prünster (2005) the normalized Inverse
Gaussian (N-IG) process has been introduced as an alternative
to the Dirichlet process to be used in Bayesian nonparametric
mixture modeling. By mimicking Ferguson’s (1973) famous construction of the Dirichlet process, the authors define a random
discrete probability measure P , on a Polish space (S, S), whose
finite dimensional distributions have the multivariate law of a
vector of n independent r.v.’s with inverse Gaussian distribution
divided by their sum. Here we show that the larger class of
normalized generalized Gamma (N-GG) processes (already
considered in James, 2002; see also Lijoi et al., 2007) may be
derived from Poisson-Kingman models for random partitions of
the positive integers (Pitman, 2003). In particular these processes
arise as random discrete probability measures whose ranked
atoms follow a Poisson-Kingman distribution derived from a
positive α−stable law with mixing distribution the exponentially
tilted version of the stable density.
We start with some well-known facts about random measures
and random partitions. First recall that given a strictly positive
r.v. T , with density fT and Laplace transform
$$E(e^{-\lambda T}) = \int_0^\infty e^{-\lambda t} f_T(t)\,dt = \exp\{-\psi(\lambda)\},$$
where, according to the Lévy-Khintchine formula, for λ > 0, $\psi(\lambda) = \int_0^\infty (1 - e^{-\lambda x})\,\rho(dx)$ is the Laplace exponent, $f_T(\cdot)$ is uniquely identified by its Lévy measure ρ(·), which satisfies $\int_0^\infty \rho(dx) = \infty$. If H(·) is a probability measure on a Polish space (S, S), fixed
and non-atomic, then for each T and H, one may construct (see
e.g. Kingman, 1993) a completely random measure µ on S, characterized by the Laplace functional for every positive measurable
function g on S
$$E[e^{-\mu(g)} \mid H] = \exp\left\{-\int_S \int_0^\infty \big(1 - e^{-g(s)x}\big)\,\rho(dx)\,H(ds)\right\}.$$
The Poisson-Kingman distribution with Lévy density ρ and mixing distribution γ is given by
$$PK(\rho,\gamma) := \int_0^\infty PK(\rho \mid t)\,\gamma(dt), \qquad (1)$$
where PK(ρ|t) is the regular conditional distribution of $(P_i)$ given (T = t) in Pitman's construction $(P_i) = (J_i/T)$, and γ is an arbitrary probability
distribution on (0, ∞). If γ(·) = $f_T(\cdot)$ then PK(ρ, γ) = PK(ρ).
EXPONENTIAL TILTING IN α-STABLE POISSON-KINGMAN MODELS
In Section 4.2 Pitman (2003) focuses on exponential tilting as
one of the basic operations on Lévy densities which lead to
a tractable class of mixed PK partition models. The idea of
tilting density functions is very old; in the Lévy process setting the equivalent transformation is also known as the Esscher
transform (see e.g. Sato, 1999). Here we recall the basic definition.
Given a probability density f(·) on (0, ∞), with Laplace exponent ψ(λ), the corresponding family of exponentially tilted
densities $f_\lambda(\cdot)$ is given by
$$f_\lambda(t) = \exp\{\psi(\lambda) - \lambda t\}\,f(t)$$
for every λ > 0 and has corresponding Laplace transform, for
every b > 0,
$$\exp\{-\psi_\lambda(b)\} = \exp\{-\psi(b+\lambda) + \psi(\lambda)\}. \qquad (2)$$
If additionally f(·) is infinitely divisible, then
$$\psi_\lambda(b) = \int_0^\infty (1 - e^{-bs})\,e^{-\lambda s}\,\rho(ds),$$
therefore tilting a probability density yields a corresponding
family of tilted Lévy measures, $\rho_\lambda(ds) = e^{-\lambda s}\rho(ds)$, for every λ > 0.
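As a numerical illustration of (2), the sketch below takes the special case α = 1/2, where the positive stable density has a closed form (a Lévy density), and checks by quadrature that the tilted density has Laplace transform exp{−ψ(b + λ) + ψ(λ)}. The exponent ψ(λ) = δ(2λ)^{1/2}, the parameter values and all function names are assumptions made for this example only, and the quadrature is a plain trapezoidal rule on a logarithmic grid rather than anything taken from the poster.

```python
import math

def psi(lam, delta=1.0):
    """Laplace exponent of the (1/2, delta)-stable law: delta * (2*lam)**0.5."""
    return delta * math.sqrt(2.0 * lam)

def stable_half_density(t, delta=1.0):
    """Closed-form density for alpha = 1/2 (a Levy density with scale delta**2)."""
    return delta / math.sqrt(2.0 * math.pi) * t ** -1.5 * math.exp(-delta ** 2 / (2.0 * t))

def tilted_density(t, lam, delta=1.0):
    """Exponentially tilted density f_lam(t) = exp(psi(lam) - lam*t) * f(t)."""
    return math.exp(psi(lam, delta) - lam * t) * stable_half_density(t, delta)

def laplace_transform(density, b, lo=-25.0, hi=8.0, steps=20000):
    """Trapezoidal approximation of the integral of exp(-b*t) * density(t) over
    t > 0, computed after the substitution t = exp(x)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        t = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-b * t) * density(t) * t  # factor t is dt/dx
    return total * h

lam, b, delta = 0.5, 1.5, 1.0
numeric = laplace_transform(lambda t: tilted_density(t, lam, delta), b)
closed_form = math.exp(-psi(b + lam, delta) + psi(lam, delta))
print(numeric, closed_form)
```

With these illustrative values, ψ(b + λ) = 2 and ψ(λ) = 1, so the closed form is exp(−1), and the quadrature should agree with it up to numerical error.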
The interest in exponential tilting lies essentially in the fact
that a basic PK model equivalent to a mixed PK model exists
if and only if the mixing density belongs to the family of the
corresponding tilted densities, in which case the basic model is
driven by the tilted version of the Lévy measure of the mixed
model, in short
$$PK(\rho_2) = PK(\rho_1, \gamma)$$
if and only if
$$\rho_2(s) = \rho_1(s)\exp\{-\lambda s\} \quad\text{and}\quad \gamma(t) = f_1(t)\exp\{\psi_1(\lambda) - \lambda t\}$$
(see Pitman, 2003, Sec. 4.2; Cerquetti, 2007b). Additionally the
focus on mixed PK models derived from the positive α-stable laws
arises from the particularly tractable Gibbs product form of type
α of the EPPF these models induce, which is
$$p(n_1,\dots,n_k) = V_{n,k}\prod_{j=1}^{k}(1-\alpha)_{n_j-1\uparrow}.$$
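The generalized Stirling number $S_\alpha(n,k)$ appearing in (7) is defined by
$$S_\alpha(n,k) := B_{n,k}\big((1-\alpha)_{\bullet-1\uparrow}\big) = \frac{n!}{k!}\sum_{(n_1,\dots,n_k)}\prod_{j=1}^{k}\frac{(1-\alpha)_{n_j-1\uparrow}}{n_j!},$$
where the sum extends over the space of all compositions
$(n_1,\dots,n_k)$ of n. $S_\alpha(n,k)$ is known as the generalized Stirling
number of the first kind, and has the following explicit formula,
$$S_\alpha(n,k) = \frac{1}{\alpha^k\,k!}\sum_{j=1}^{k}(-1)^j\binom{k}{j}(-j\alpha)_{n\uparrow}.$$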
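The two expressions for the generalized Stirling number can be compared directly for small n. The following is a minimal sketch in exact rational arithmetic: it evaluates the sum over compositions and the explicit alternating sum, and checks that they coincide. The function names and the choice α = 1/2 are illustrative assumptions, not part of the poster.

```python
from fractions import Fraction
from math import comb, factorial

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1), equal to 1 when n = 0."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result

def compositions(n, k):
    """Yield all k-tuples of positive integers summing to n."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def stirling_by_definition(n, k, alpha):
    """S_alpha(n, k) as (n!/k!) times the sum over compositions of n."""
    total = Fraction(0)
    for comp in compositions(n, k):
        term = Fraction(1)
        for nj in comp:
            term *= rising(1 - alpha, nj - 1) / factorial(nj)
        total += term
    return Fraction(factorial(n), factorial(k)) * total

def stirling_explicit(n, k, alpha):
    """S_alpha(n, k) from the explicit alternating-sum formula."""
    total = Fraction(0)
    for j in range(1, k + 1):
        total += (-1) ** j * comb(k, j) * rising(-j * alpha, n)
    return total / (alpha ** k * factorial(k))

alpha = Fraction(1, 2)
for n in range(1, 7):
    for k in range(1, n + 1):
        assert stirling_by_definition(n, k, alpha) == stirling_explicit(n, k, alpha)
print(stirling_explicit(3, 2, alpha))  # prints 3/2, i.e. 3 * (1 - alpha)
```

Exact fractions are used so that agreement is an identity rather than a floating-point coincidence; the brute-force enumeration is only practical for small n.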
In Section 6.1 Pitman (2003) also introduces the concept of α-diversity for a random partition following a PK distribution derived from an α-stable law. Pitman states that an exchangeable
partition Π of the positive integers N has α-diversity $D_\alpha$ if and
only if there exists a random variable $D_\alpha$, with $0 < D_\alpha < \infty$ a.s.,
such that, if $K_n$ is the number of blocks in the restriction of Π to
[n], then
$$\frac{K_n}{n^\alpha} \xrightarrow{\text{a.s.}} D_\alpha \quad \text{as } n \to \infty. \qquad (8)$$
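In Proposition 13, item (i), Pitman states that if Π is a $PK(\rho_\alpha, \gamma)$
partition of N for some α ∈ (0, 1), then $D_\alpha = T^{-\alpha}$, where T
has distribution γ. Recalling that $PK(\rho_\alpha^\lambda) = PK(\rho_\alpha, f_\alpha^\lambda)$ for
$f_\alpha^\lambda(t) = f_\alpha(t)\exp(\psi_\alpha(\lambda) - \lambda t)$, an elementary transformation yields
the asymptotic distribution of $K_n/n^\alpha$ for samples from normalized generalized Gamma priors. If $f_{\alpha,\delta}(t)$ is the density of an
(α, δ)-stable law, then
$$\frac{K_n}{n^\alpha} \xrightarrow{\text{a.s.}} D_\alpha \quad \text{as } n \to \infty,$$
where $D_\alpha$ has density
$$f_{D_\alpha}(x) = \exp\left\{\delta\zeta - \frac{1}{2}\left(\frac{\zeta}{x}\right)^{1/\alpha}\right\}\frac{f_{\alpha,\delta}\big(x^{-1/\alpha}\big)}{\alpha\,x^{1/\alpha+1}}. \qquad (9)$$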
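The "elementary transformation" behind the density of $D_\alpha$ can be spelled out as a change of variables. The following is a sketch, assuming T has the tilted density $f_{\alpha,\delta}(t)\exp\{\psi_{\alpha,\delta}(\lambda) - \lambda t\}$ with $\lambda = \zeta^{1/\alpha}/2$ and $\psi_{\alpha,\delta}(\lambda) = \delta(2\lambda)^\alpha$, so that $\psi_{\alpha,\delta}(\lambda) = \delta\zeta$:

```latex
\begin{align*}
D_\alpha = T^{-\alpha} \;&\Longrightarrow\; T = D_\alpha^{-1/\alpha}, \qquad
\left|\frac{dt}{dx}\right| = \frac{1}{\alpha}\,x^{-1/\alpha - 1},\\
f_{D_\alpha}(x) &= f_{\alpha,\delta}\big(x^{-1/\alpha}\big)\,
\exp\Big\{\delta\zeta - \tfrac{1}{2}\,\zeta^{1/\alpha}\,x^{-1/\alpha}\Big\}\,
\frac{1}{\alpha\,x^{1/\alpha + 1}} .
\end{align*}
```

The exponent uses $\zeta^{1/\alpha}x^{-1/\alpha} = (\zeta/x)^{1/\alpha}$, which gives the form of the density stated in (9).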
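By (2), tilting the (α, δ)-stable law with $\lambda = \zeta^{1/\alpha}/2$ yields the Laplace exponent
$$\psi^{\zeta}_{\alpha,\delta}(b) = -\delta\zeta + \delta(\zeta^{1/\alpha} + 2b)^{\alpha},$$
which is well known to identify the family of generalized
Gamma distributions defined for α < 1 (see e.g. Brix, 1999).
Notice that, for α ∈ (0, 1), $\psi^{\zeta}_{\alpha,\delta}(\infty) = \infty$, hence the necessary
condition for normalization, P(T > 0) = 1, is satisfied.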
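For completeness, the tilted exponent follows from (2) in one line. A sketch, assuming the stable Laplace exponent $\psi_{\alpha,\delta}(\lambda) = \delta(2\lambda)^\alpha$ and the tilting parameter $\lambda = \zeta^{1/\alpha}/2$, so that $2\lambda = \zeta^{1/\alpha}$ and $(2\lambda)^\alpha = \zeta$:

```latex
\begin{align*}
\psi^{\zeta}_{\alpha,\delta}(b)
  &= \psi_{\alpha,\delta}(b + \lambda) - \psi_{\alpha,\delta}(\lambda)
   = \delta(2b + 2\lambda)^{\alpha} - \delta(2\lambda)^{\alpha} \\
  &= \delta(\zeta^{1/\alpha} + 2b)^{\alpha} - \delta\zeta .
\end{align*}
```

For α = 1/2 this reads $\delta(\sqrt{\zeta^2 + 2b} - \zeta)$, the Laplace exponent of an inverse Gaussian law, in line with the N-IG special case.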
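It follows that an explicit expression for the EPPF induced by a
$PK(\rho_\alpha^\lambda)$ model, hence by sampling from a normalized generalized
Gamma process, can be easily derived from the general form for
basic PK models given in Pitman (2003), without resorting to the
more complex theory for mixed PK models, and is given by (see
Cerquetti, 2007)
$$p(n_1,\dots,n_k) = \frac{e^{\delta\zeta}\,\delta^k\alpha^k 2^n}{\Gamma(n)}\prod_{j=1}^{k}(1-\alpha)_{n_j-1\uparrow}\int_0^\infty \frac{\lambda^{n-1}\,e^{-\delta(\zeta^{1/\alpha}+2\lambda)^{\alpha}}}{(\zeta^{1/\alpha}+2\lambda)^{n-k\alpha}}\,d\lambda; \qquad (5)$$
notice the Gibbs product form, for
$$V_{n,k} = \frac{e^{\delta\zeta}\,\delta^k\alpha^k 2^n}{\Gamma(n)}\int_0^\infty \frac{\lambda^{n-1}\,e^{-\delta(\zeta^{1/\alpha}+2\lambda)^{\alpha}}}{(\zeta^{1/\alpha}+2\lambda)^{n-k\alpha}}\,d\lambda. \qquad (6)$$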
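A quick numerical sanity check of the Gibbs weights is possible because the law of the number of blocks, Pr(K_n = k) = V_{n,k} S_α(n, k), must sum to one over k = 1, ..., n. The sketch below assumes that (6) reads V_{n,k} = e^{δζ} δ^k α^k 2^n / Γ(n) times the integral over λ > 0 of λ^{n−1} e^{−δ(ζ^{1/α}+2λ)^α} (ζ^{1/α}+2λ)^{−(n−kα)}, and uses the explicit alternating-sum formula for S_α(n, k). The function names, the parameter values and the trapezoidal quadrature on a logarithmic grid are assumptions made for illustration only.

```python
import math

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1), equal to 1 when n = 0."""
    result = 1.0
    for i in range(n):
        result *= x + i
    return result

def gen_stirling(n, k, alpha):
    """Generalized Stirling number S_alpha(n, k), explicit alternating-sum formula."""
    total = sum((-1) ** j * math.comb(k, j) * rising(-j * alpha, n)
                for j in range(1, k + 1))
    return total / (alpha ** k * math.factorial(k))

def gibbs_weight(n, k, alpha, delta, zeta, lo=-40.0, hi=14.0, steps=40000):
    """V_{n,k} under the assumed reading of (6); the integral over lambda is
    approximated by the trapezoidal rule after substituting lambda = exp(x)."""
    c = zeta ** (1.0 / alpha)
    h = (hi - lo) / steps
    integral = 0.0
    for i in range(steps + 1):
        lam = math.exp(lo + i * h)
        base = c + 2.0 * lam
        value = lam ** (n - 1) * math.exp(-delta * base ** alpha) / base ** (n - k * alpha)
        weight = 0.5 if i in (0, steps) else 1.0
        integral += weight * value * lam  # extra factor lam is d(lambda)/dx
    integral *= h
    const = math.exp(delta * zeta) * delta ** k * alpha ** k * 2.0 ** n / math.gamma(n)
    return const * integral

def blocks_pmf(n, alpha, delta, zeta):
    """Pr(K_n = k) = V_{n,k} * S_alpha(n, k) for k = 1, ..., n."""
    return [gibbs_weight(n, k, alpha, delta, zeta) * gen_stirling(n, k, alpha)
            for k in range(1, n + 1)]

pmf = blocks_pmf(5, alpha=0.5, delta=1.0, zeta=1.0)
print(pmf, sum(pmf))
```

For n = 1 the integral can be done in closed form through the substitution u = (ζ^{1/α} + 2λ)^α, which gives V_{1,1} = 1; the numerical routine should reproduce this up to quadrature error.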
In the Laplace functional above, $\mu(g) = \int_S g(s)\,\mu(ds)$, so that $T = \mu(S) := \int_S I\{s \in S\}\,\mu(ds)$.
A (homogeneous) normalized random measure (NRM) P on (S, S)
is then obtained by normalizing µ as follows
$$P(\cdot) := \frac{\mu(\cdot)}{\mu(S)} = \frac{\mu(\cdot)}{T}.$$
As proved e.g. in James (2003), NRMs select almost surely discrete distributions, and it is well known that given a law Q on
the space $\mathcal{P}_1^{\downarrow}$ of decreasing sequences of positive numbers with
sum 1, and a law H(·) on a Polish space (S, S), a random discrete
probability measure (RDPM) P on S may always be defined as
$$P(\cdot) := \sum_{i=1}^{\infty} P_i\,\delta_{Y_i}(\cdot),$$
for $Y_i$ iid ∼ H(·) and $(P_i)$ ∼ Q. From Kingman's (1978) theory
of exchangeable random partitions, a sample $(X_1,\dots,X_n)$ from
a RDPM P induces a random partition Π of the positive integers N
by the exchangeable equivalence relation $i \approx j \Leftrightarrow X_i = X_j$, that
is to say two positive integers i and j belong to the same block of Π if
and only if $X_i = X_j$, where $X_i \mid P$ are iid ∼ P. It follows that, for
each restriction $\Pi_n = \{A_1,\dots,A_k\}$ of Π to [n] = {1, . . . , n}, and
for each n = 1, 2, . . .,
$$\Pr(\Pi_n = \{A_1,\dots,A_k\}) = p(n_1,\dots,n_k),$$
where, for j = 1, 2, . . . , k, $n_j = |A_j| \ge 1$ and $\sum_{j=1}^{k} n_j = n$, for
some non-negative symmetric function p of finite sequences of
positive integers called the exchangeable partition probability function (EPPF) determined by Π. Pitman (2003), generalizing Kingman's construction of the Dirichlet process as a Gamma process
with independent increments divided by the sum, introduces a
large class of RDPMs deriving the law Q from a discrete distribution $(P_i) = (J_i/T)$, where $J_1 \ge J_2 \ge \dots \ge 0$ are the random
lengths of the ranked points of a Poisson process with Lévy density ρ and $T = \sum_i J_i$. It is easy to see that this construction is formally equivalent to the homogeneous normalized random measure's construction given above, so that
$$P(\cdot) = \frac{\mu(\cdot)}{T} = \sum_{i=1}^{\infty} P_i\,\delta_{Y_i}(\cdot).$$
Pitman termed the law Q of $(P_i)$ on $\mathcal{P}_1^{\downarrow}$ the Poisson-Kingman distribution with Lévy density ρ, and also enlarged the basic model by
considering the larger class of Poisson-Kingman distributions with
Lévy density ρ and mixing distribution γ defined in (1).

In the Gibbs product form above, α ∈ (0, 1) and the weights $V_{n,k}$, for n > 1 and 1 ≤ k ≤ n,
are the solutions to a specific backward recursion (see Gnedin
and Pitman, 2006, Th. 12, item iii).
Notice that, although the probability density functions of positive (α, δ)-stable laws for general α ∈ (0, 1) and δ > 0 are known
only in the form of series representations, the corresponding Laplace
exponent and Lévy density are as follows:
$$\psi_{\alpha,\delta}(\lambda) = \delta(2\lambda)^{\alpha} \quad\text{and}\quad \rho_{\alpha,\delta}(s) = \frac{\delta 2^{\alpha}\alpha}{\Gamma(1-\alpha)}\,s^{-1-\alpha}. \qquad (3)$$
Now the basic $PK(\rho_{\alpha,\delta})$ model is well known to yield the laws
of the ranked atoms of the normalized α-stable process. By exponential tilting of $\rho_{\alpha,\delta}$ one obtains
$$\rho^{\lambda}_{\alpha,\delta}(s) = \frac{\delta 2^{\alpha}\alpha}{\Gamma(1-\alpha)}\,s^{-1-\alpha}e^{-\lambda s}, \qquad (4)$$
and by (2) the corresponding Laplace exponent, for $\lambda = \zeta^{1/\alpha}/2$, results in $\psi^{\zeta}_{\alpha,\delta}(b) = \psi_{\alpha,\delta}(b+\lambda) - \psi_{\alpha,\delta}(\lambda)$.
By (5) general expressions for predictive distributions easily
follow by applying well-known results for species sampling
models (see e.g. Hansen and Pitman, 2000).
Remark 1. Notice that analogous results for normalized generalized Gamma processes have been derived in Lijoi et al. (2007)
in relation to Bayesian nonparametric mixture modeling,
without resorting to Pitman's results. Specializing formulas (5),
(7) and (9) for α = 1/2, the results for N-IG priors in Lijoi et al.
(2005) are recovered.

DISTRIBUTION OF THE NUMBER OF BLOCKS
Even the distribution of the number of blocks in the random
partition induced by a sample from a N-GG prior, that typically plays the role of the prior on the number of components
in Bayesian nonparametric mixture modeling, can be easily derived. From Gnedin and Pitman (2006) an EPPF in Gibbs product
form induces the law of the number of blocks $K_n$, by summation
over all partitions $\{A_1,\dots,A_k\}$ of [n] with k blocks, and $|A_j| = n_j$,
j = 1, . . . , k. Specializing for EPPFs in Gibbs form of type α one
gets
$$\Pr(K_n = k) = V_{n,k}\,S_\alpha(n,k) \qquad (7)$$
for $V_{n,k}$ as in (6) and $S_\alpha(n,k)$ the generalized Stirling number of the first kind.
Remark 2. The construction we propose of normalized generalized Gamma processes also suggests an alternative derivation
(see Cerquetti, 2007b) of a result recently obtained by means of
an analytic technique in Lijoi et al. (2007b), namely that this is
the unique family of random discrete probability measures with
EPPF in Gibbs form admitting a construction via normalization
of completely random measures.

SELECTED REFERENCES
CERQUETTI, A. (2007) A note on Bayesian nonparametric priors
derived from exponentially tilted Poisson-Kingman models.
Statistics & Probability Letters (In press).
CERQUETTI, A. (2007b) On a Gibbs characterization of normalized generalized Gamma processes. arXiv:0707.3408.
GNEDIN, A. & PITMAN, J. (2006) Exchangeable Gibbs partitions
and Stirling triangles. Journal of Mathematical Sciences, 138, 3,
5674-5685.
JAMES, L. F. (2002) Poisson Process Partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv:math.PR/0205093.
KINGMAN, J.F.C. (1978) The representation of partition structures. J. London Math. Soc. 2, 374-380.
LIJOI, A., MENA, R. AND PRÜNSTER, I. (2005) Hierarchical
mixture modeling with normalized Inverse-Gaussian priors.
JASA, vol. 100, 1278-1291.
LIJOI, A., MENA, R. AND PRÜNSTER, I. (2007) Controlling
the reinforcement in Bayesian nonparametric mixture models.
JRSS B, (In press).
LIJOI, A., PRÜNSTER, I. AND WALKER, S.G. (2007b) Investigating nonparametric priors with Gibbs structure. Statistica
Sinica, (In press).
PITMAN, J. (2003) Poisson-Kingman partitions, IMS LN - Monograph Series, 40, 1-34.