Construction and Properties of Bayesian Nonparametric Regression Models – Isaac Newton Institute Workshop – 6-10 August 2007 – Cambridge, UK

Some results on Bayesian nonparametric priors derived from Poisson-Kingman models

Annalisa Cerquetti
Istituto di Metodi Quantitativi, Bocconi University, Milano, Italy

ABSTRACT

We elucidate some connections between Poisson-Kingman models for random partitions (Pitman, 2003) and priors recently arising in a nonparametric Bayesian context. In particular we show that normalized generalized Gamma (N-GG) processes arise from exponential tilting of Poisson-Kingman models derived from the stable law. It turns out that results on quantities of statistical interest in Bayesian nonparametrics under those priors, like the analogues of the Blackwell-MacQueen prediction rules or the distribution of the number of distinct elements observed in a sample, can be easily derived from Pitman's results. This construction also elucidates that N-GG priors induce exchangeable partition probability functions in Gibbs product form (Gnedin and Pitman, 2006).

PRELIMINARIES AND BASIC DEFINITIONS

In Lijoi, Mena and Prünster (2005) the normalized Inverse Gaussian (N-IG) process has been introduced as an alternative to the Dirichlet process to be used in Bayesian nonparametric mixture modeling. By mimicking Ferguson's (1973) famous construction of the Dirichlet process, the authors define a random discrete probability measure $P$, on a Polish space $(S, \mathcal{S})$, whose finite dimensional distributions have the multivariate law of a vector of $n$ independent r.v.'s with inverse Gaussian distribution divided by their sum. Here we show that the larger class of normalized generalized Gamma processes (N-GG) (already considered in James, 2002; see also Lijoi et al., 2007) may be derived from Poisson-Kingman models for random partitions of the positive integers (Pitman, 2003).
In particular, these processes arise as random discrete probability measures whose ranked atoms follow a Poisson-Kingman distribution derived from a positive $\alpha$-stable law, with mixing distribution given by the exponentially tilted version of the stable density.

We start with some well-known facts about random measures and random partitions. First recall that, given a strictly positive r.v. $T$ with density $f_T$ and Laplace transform

$$E(e^{-\lambda T}) = \int_0^\infty e^{-\lambda t} f_T(t)\,dt = \exp\{-\psi(\lambda)\},$$

where, according to the Lévy-Khintchine formula, for $\lambda > 0$, $\psi(\lambda) = \int_0^\infty (1 - e^{-\lambda x})\,\rho(dx)$ is the Laplace exponent, $f_T(\cdot)$ is uniquely identified by its Lévy measure $\rho(\cdot)$, which satisfies $\int_0^\infty \rho(dx) = \infty$. If $H(\cdot)$ is a probability measure on a Polish space $(S, \mathcal{S})$, fixed and non-atomic, then for each $T$ and $H$ one may construct (see e.g. Kingman, 1993) a completely random measure $\mu$ on $S$, characterized by the Laplace functional, for every positive measurable function $g$ on $S$,

$$E[e^{-\mu(g)} \mid H] = \exp\left\{-\int_S \int_0^\infty (1 - e^{-g(s)x})\,\rho(dx)\,H(ds)\right\},$$

where $\mu(g) = \int_S g(s)\,\mu(ds)$, so that $T = \mu(S) := \int_S I\{s \in S\}\,\mu(ds)$. A (homogeneous) normalized random measure (NRM) $P$ on $(S, \mathcal{S})$ is then obtained by normalizing $\mu$ as follows:

$$P(\cdot) := \frac{\mu(\cdot)}{\mu(S)} = \frac{\mu(\cdot)}{T}.$$

As proved e.g. in James (2003), NRMs select almost surely discrete distributions, and it is well known that, given a law $Q$ on the space $\mathcal{P}_1^{\downarrow}$ of decreasing sequences of positive numbers with sum 1, and a law $H(\cdot)$ on a Polish space $(S, \mathcal{S})$, a random discrete probability measure (RDPM) $P$ on $S$ may always be defined as

$$P(\cdot) := \sum_{i=1}^{\infty} P_i\,\delta_{Y_i}(\cdot),$$

for $Y_i$ iid $\sim H(\cdot)$ and $(P_i) \sim Q$.

From Kingman's (1978) theory of exchangeable random partitions, a sample $(X_1, \dots, X_n)$ from an RDPM $P$ induces a random partition $\Pi$ of the positive integers $\mathbb{N}$ by the exchangeable equivalence relation $i \approx j \Leftrightarrow X_i = X_j$; that is to say, two positive integers $i$ and $j$ belong to the same block of $\Pi$ if and only if $X_i = X_j$, where $X_i \mid P$ are iid $\sim P$. It follows that, for each restriction $\Pi_n = \{A_1, \dots, A_k\}$ of $\Pi$ to $[n] = \{1, \dots, n\}$, and for each $n = 1, 2, \dots$,

$$Pr(\Pi_n = \{A_1, \dots, A_k\}) = p(n_1, \dots, n_k),$$

where, for $j = 1, 2, \dots, k$, $n_j = |A_j| \geq 1$ and $\sum_{j=1}^{k} n_j = n$, for some non-negative symmetric function $p$ of finite sequences of positive integers called the exchangeable partition probability function (EPPF) determined by $\Pi$.

Pitman (2003), generalizing Kingman's construction of the Dirichlet process as a Gamma process with independent increments divided by the sum, introduces a large class of RDPMs deriving the law $Q$ by a discrete distribution $(P_i) = (J_i/T)$, where $J_1 \geq J_2 \geq \dots \geq 0$ are the random lengths of the ranked points of a Poisson process with Lévy density $\rho$ and $T = \sum_i J_i$. It is easy to see that this construction is formally equivalent to the homogeneous normalized random measure construction given above, so that

$$P(\cdot) = \frac{\mu(\cdot)}{T} = \sum_{i=1}^{\infty} P_i\,\delta_{Y_i}(\cdot).$$

Pitman termed the law $Q$ of $(P_i)$ on $\mathcal{P}_1^{\downarrow}$ the Poisson-Kingman distribution with Lévy density $\rho$, and also enlarged the basic model by considering the larger class of Poisson-Kingman distributions with Lévy density $\rho$ and mixing distribution $\gamma$ given by

$$PK(\rho, \gamma) := \int_0^\infty PK(\rho \mid t)\,\gamma(dt), \tag{1}$$

where $PK(\rho \mid t)$ is the regular conditional distribution of $(P_i)$ given $(T = t)$ constructed above, and $\gamma$ is an arbitrary probability distribution on $(0, \infty)$. If $\gamma(\cdot) = f_T(\cdot)$ then $PK(\rho, \gamma) = PK(\rho)$.

EXPONENTIAL TILTING IN α-STABLE POISSON-KINGMAN MODELS

In Section 4.2 Pitman (2003) focuses on exponential tilting as one of the basic operations on Lévy densities which lead to a tractable class of mixed PK partition models. The idea of tilting density functions is very old; in the Lévy process setting the equivalent transformation is also known as the Esscher transform (see e.g. Sato, 1999). Here we recall the basic definition. Given a probability density $f(\cdot)$ on $(0, \infty)$, with Laplace exponent $\psi(\lambda)$, the corresponding family of exponentially tilted densities $f_\lambda(\cdot)$ is given by

$$f_\lambda(t) = \exp\{\psi(\lambda) - \lambda t\}\,f(t)$$

for every $\lambda > 0$, and has corresponding Laplace transform, for every $b > 0$,

$$\exp\{-\psi_\lambda(b)\} = \exp\{-\psi(b + \lambda) + \psi(\lambda)\}. \tag{2}$$

If additionally $f(\cdot)$ is infinitely divisible, then

$$\psi_\lambda(b) = \int_0^\infty (1 - e^{-bs})\,e^{-\lambda s}\,\rho(ds),$$

therefore tilting a probability density yields a corresponding family of tilted Lévy measures, $\rho_\lambda(ds) = e^{-\lambda s}\rho(ds)$, for every $\lambda > 0$.

The interest in exponential tilting lies essentially in the fact that a basic PK model equivalent to a mixed PK model exists if and only if the mixing density belongs to the family of the corresponding tilted densities, in which case the basic model is driven by the tilted version of the Lévy measure of the mixed model. In short, $PK(\rho_2) = PK(\rho_1, \gamma)$ if and only if $\rho_2(s) = \rho_1(s)\exp\{-\lambda s\}$ and $\gamma(t) = f_1(t)\exp\{\psi_1(\lambda) - \lambda t\}$ (see Pitman, 2003, Sec. 4.2; Cerquetti, 2007b).

Additionally, the focus on mixed PK models derived from the positive $\alpha$-stable laws arises from the particularly tractable Gibbs product form of type $\alpha$ of the EPPF these models induce, which is

$$p(n_1, \dots, n_k) = V_{n,k} \prod_{j=1}^{k} (1 - \alpha)_{n_j - 1\uparrow}, \tag{3}$$

for $\alpha \in (0, 1)$, where $(x)_{m\uparrow} = x(x+1)\cdots(x+m-1)$ denotes the rising factorial and the weights $V_{n,k}$, for $n > 1$ and $1 \leq k \leq n$, are the solutions to a specific backward recursion (see Gnedin and Pitman, 2006, Th. 12, item iii).

Notice that, although the probability density functions of positive $(\alpha, \delta)$-stable laws for general $\alpha \in (0, 1)$ and $\delta > 0$ are known only in the form of series representations, the corresponding Laplace exponent and Lévy density are as follows:

$$\psi_{\alpha,\delta}(\lambda) = \delta(2\lambda)^{\alpha}, \qquad \rho_{\alpha,\delta}(s) = \frac{\delta\,2^{\alpha}\,\alpha}{\Gamma(1 - \alpha)}\,s^{-1-\alpha}.$$

Now the basic $PK(\rho_{\alpha,\delta})$ model is well known to yield the laws of the ranked atoms of the normalized $\alpha$-stable process. By exponential tilting of $\rho_{\alpha,\delta}$ one obtains

$$\rho^{\lambda}_{\alpha,\delta}(s) = \frac{\delta\,2^{\alpha}\,\alpha}{\Gamma(1 - \alpha)}\,s^{-1-\alpha}\,e^{-\lambda s}, \tag{4}$$

and by (2) the corresponding Laplace exponent, for $\lambda = \zeta^{1/\alpha}/2$, results

$$\psi^{\zeta}_{\alpha,\delta}(b) = -\delta\zeta + \delta(\zeta^{1/\alpha} + 2b)^{\alpha},$$

which is well known to identify the family of generalized Gamma distributions defined for $\alpha < 1$ (see e.g. Brix, 1999). Notice that, for $\alpha \in (0, 1)$, $\psi^{\zeta}_{\alpha,\delta}(\infty) = \infty$, hence the necessary condition for normalization, $P(T > 0) = 1$, is satisfied. It follows that an explicit expression for the EPPF induced by a $PK(\rho^{\lambda}_{\alpha})$ model, hence by sampling from a normalized generalized Gamma process, can be easily derived from the general form for basic PK models given in Pitman (2003), without resorting to the more complex theory for mixed PK models, and is given by (see Cerquetti, 2007)

$$p(n_1, \dots, n_k) = \frac{e^{\delta\zeta}\,\delta^{k}\,\alpha^{k}\,2^{n}}{\Gamma(n)} \int_0^\infty \frac{\lambda^{n-1}\,e^{-\delta(\zeta^{1/\alpha} + 2\lambda)^{\alpha}}}{(\zeta^{1/\alpha} + 2\lambda)^{n - k\alpha}}\,d\lambda\ \prod_{j=1}^{k} (1 - \alpha)_{n_j - 1\uparrow}; \tag{5}$$

notice the Gibbs product form, for

$$V_{n,k} = \frac{e^{\delta\zeta}\,\delta^{k}\,\alpha^{k}\,2^{n}}{\Gamma(n)} \int_0^\infty \frac{\lambda^{n-1}\,e^{-\delta(\zeta^{1/\alpha} + 2\lambda)^{\alpha}}}{(\zeta^{1/\alpha} + 2\lambda)^{n - k\alpha}}\,d\lambda. \tag{6}$$

By (5), general expressions for predictive distributions easily follow by applying well-known results for species sampling models (see e.g. Hansen and Pitman, 2000).

DISTRIBUTION OF THE NUMBER OF BLOCKS

The distribution of the number of blocks in the random partition induced by a sample from an N-GG prior, which typically plays the role of the prior on the number of components in Bayesian nonparametric mixture modeling, can also be easily derived. From Gnedin and Pitman (2006), an EPPF in Gibbs product form induces the law of the number of blocks $K_n$ by summation over all partitions $\{A_1, \dots, A_k\}$ of $[n]$ with $k$ blocks and $|A_j| = n_j$, $j = 1, \dots, k$. Specializing for EPPFs in Gibbs form of type $\alpha$ one gets

$$Pr(K_n = k) = V_{n,k}\,S_\alpha(n, k), \tag{7}$$

for $V_{n,k}$ as in (6) and

$$S_\alpha(n, k) := B_{n,k}\big((1 - \alpha)_{\bullet - 1\uparrow}\big) = \frac{n!}{k!} \sum_{(n_1, \dots, n_k)} \prod_{j=1}^{k} \frac{1}{n_j!}\,(1 - \alpha)_{n_j - 1\uparrow},$$

where the sum extends over the space of all compositions $(n_1, \dots, n_k)$ of $n$. $S_\alpha(n, k)$ is known as the generalized Stirling number of the first kind, and has the following explicit formula:

$$S_\alpha(n, k) = \frac{1}{\alpha^{k}\,k!} \sum_{j=1}^{k} (-1)^{j} \binom{k}{j} (-j\alpha)_{n\uparrow}.$$

In Section 6.1 Pitman (2003) also introduces the concept of $\alpha$-diversity for a random partition following a PK distribution derived from an $\alpha$-stable law. Pitman states that an exchangeable partition $\Pi$ of the positive integers $\mathbb{N}$ has $\alpha$-diversity $D_\alpha$ if and only if there exists a random variable $D_\alpha$, with $0 < D_\alpha < \infty$ a.s., such that, if $K_n$ is the number of blocks in the restriction of $\Pi$ to $[n]$, then

$$\frac{K_n}{n^{\alpha}} \xrightarrow{\ a.s.\ } D_\alpha \quad \text{as } n \to \infty. \tag{8}$$

In Proposition 13, item (i), Pitman states that if $\Pi$ is a $PK(\rho_\alpha, \gamma)$ partition of $\mathbb{N}$ for some $\alpha \in (0, 1)$, then $D_\alpha = T^{-\alpha}$, where $T$ has distribution $\gamma$. Recalling that $PK(\rho^{\lambda}_{\alpha}) = PK(\rho_\alpha, f^{\lambda}_{\alpha})$ for $f^{\lambda}_{\alpha}(t) = f_\alpha(t)\exp(\psi_\alpha(\lambda) - \lambda t)$, an elementary transformation yields the asymptotic distribution of $K_n/n^{\alpha}$ for samples from normalized generalized Gamma priors. For $f_{\alpha,\delta}(t)$ the density of an $(\alpha, \delta)$-stable law,

$$\frac{K_n}{n^{\alpha}} \xrightarrow{\ a.s.\ } D_\alpha \quad \text{as } n \to \infty,$$

where $D_\alpha$ has density

$$f_{D_\alpha}(x) = \exp\left\{\delta\zeta - \frac{1}{2}\left(\frac{\zeta}{x}\right)^{1/\alpha}\right\} \frac{f_{\alpha,\delta}(x^{-1/\alpha})}{\alpha\,x^{1/\alpha + 1}}. \tag{9}$$

Remark 1. Notice that analogous results for normalized generalized Gamma processes have been derived in Lijoi et al. (2007) in relation to Bayesian nonparametric mixture modeling, without resorting to Pitman's results. Specializing formulas (5), (7) and (9) for $\alpha = 1/2$, results for N-IG priors as in Lijoi et al. (2005) are recovered.

Remark 2. The construction we propose of normalized generalized Gamma processes also suggests an alternative derivation (see Cerquetti, 2007b) of a result recently obtained by means of an analytic technique in Lijoi et al. (2007b), namely that this is the unique family of random discrete probability measures with EPPF in Gibbs form admitting a construction via normalization of completely random measures.

SELECTED REFERENCES

CERQUETTI, A. (2007) A note on Bayesian nonparametric priors derived from exponentially tilted Poisson-Kingman models. Statistics & Probability Letters (in press).
CERQUETTI, A. (2007b) On a Gibbs characterization of normalized generalized Gamma processes. arXiv:0707.3408.
GNEDIN, S. & PITMAN, J. (2006) Exchangeable Gibbs partitions and Stirling triangles. Journal of Mathematical Sciences, 138, 3, 5674-5685.
JAMES, L. F. (2002) Poisson Process Partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv:math.PR/0205093.
KINGMAN, J. F. C. (1978) The representation of partition structures. J. London Math. Soc. 2, 374-380.
LIJOI, A., MENA, R. AND PRÜNSTER, I. (2005) Hierarchical mixture modeling with normalized Inverse-Gaussian priors. JASA, vol. 100, 1278-1291.
LIJOI, A., MENA, R. AND PRÜNSTER, I. (2007) Controlling the reinforcement in Bayesian nonparametric mixture models. JRSS B (in press).
LIJOI, A., PRÜNSTER, I. AND WALKER, S. G. (2007b) Investigating nonparametric priors with Gibbs structure. Statistica Sinica (in press).
PITMAN, J. (2003) Poisson-Kingman partitions. IMS LN - Monograph Series, 40, 1-34.
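
As a numerical companion to formulas (6) and (7), the following is a minimal sketch of how the prior on the number of blocks $K_n$ under an N-GG prior might be evaluated. It is not taken from the poster: the function names (`rising`, `gen_stirling`, `gibbs_weight`, `block_count_pmf`), the composite Simpson quadrature, the truncation point `upper` and the step count are illustrative assumptions. To keep the integrand well behaved, the sketch also assumes the change of variable $v = (\zeta^{1/\alpha} + 2\lambda)^{\alpha} - \zeta$ in (6), which gives $V_{n,k} = \frac{\delta^{k}\alpha^{k-1}}{\Gamma(n)} \int_0^\infty \big(1 - (\zeta/(\zeta+v))^{1/\alpha}\big)^{n-1} (\zeta+v)^{k-1} e^{-\delta v}\,dv$ for $\zeta > 0$.

```python
import math


def rising(x, m):
    """Rising factorial (x)_{m↑} = x (x+1) ... (x+m-1), equal to 1 for m = 0."""
    out = 1.0
    for i in range(m):
        out *= x + i
    return out


def gen_stirling(n, k, alpha):
    """Generalized Stirling number S_alpha(n, k) via the explicit alternating sum."""
    total = sum(
        (-1) ** j * math.comb(k, j) * rising(-j * alpha, n)
        for j in range(1, k + 1)
    )
    return total / (alpha ** k * math.factorial(k))


def gibbs_weight(n, k, alpha, delta, zeta, upper=60.0, steps=20000):
    """Gibbs weight V_{n,k} of (6), approximated by composite Simpson quadrature.

    Assumption (not in the source): the integral is evaluated after the
    substitution v = (zeta**(1/alpha) + 2*lam)**alpha - zeta and truncated
    at v = upper / delta, where exp(-delta * v) is negligible.
    `steps` must be even; zeta must be strictly positive.
    """
    def g(v):
        u = zeta + v
        return ((1.0 - (zeta / u) ** (1.0 / alpha)) ** (n - 1)
                * u ** (k - 1) * math.exp(-delta * v))

    b = upper / delta
    h = b / steps
    s = g(0.0) + g(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(i * h)
    integral = s * h / 3.0
    return delta ** k * alpha ** (k - 1) / math.gamma(n) * integral


def block_count_pmf(n, alpha, delta, zeta):
    """Pr(K_n = k) = V_{n,k} * S_alpha(n, k) for k = 1, ..., n, as in (7)."""
    return [
        gibbs_weight(n, k, alpha, delta, zeta) * gen_stirling(n, k, alpha)
        for k in range(1, n + 1)
    ]


if __name__ == "__main__":
    # alpha = 1/2 corresponds to the N-IG case mentioned in Remark 1.
    pmf = block_count_pmf(6, alpha=0.5, delta=1.0, zeta=1.0)
    for k, p in enumerate(pmf, start=1):
        print(k, p)
    print("total mass:", sum(pmf))
```

A quick sanity check on such a sketch is that the probabilities over $k = 1, \dots, n$ should sum to one up to quadrature error, and that $V_{1,1}$ should equal one; a production implementation would more likely rely on an adaptive integrator than on a fixed Simpson grid.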