Exponential families
Peter D. Hoff
September 26, 2013
Much of this content comes from Lehmann and Casella [1998] section 1.5.
Contents
1 The canonical exponential family
2 Basic results
1 The canonical exponential family
Construction of an exponential family of densities
Exponential families are classes of probability measures constructed from
1. a dominating measure µ, and
2. a statistic t(X).
Let
• (X, A) be a measurable space,
• µ be a measure on A,
• t : X → R^s be a statistic.
For η ∈ R^s, define the measure

    νη(A) = ∫_A e^{η^T t(x)} µ(dx)   ∀A ∈ A,

and let

    A(η) = log νη(X) = log ∫ e^{η^T t(x)} µ(dx).
If A(η) < ∞, we can define a probability measure Pη on (X, A) via its density w.r.t. µ:

    p(x|η) = e^{η^T t(x) − A(η)} ,   x ∈ X,

    Pη(A) = ∫_A p(x|η) µ(dx).
Note that
• Pη(X) = 1 by construction, and so (X, A, Pη) is a probability space.
• Pη is absolutely continuous w.r.t. µ, with Radon–Nikodym density p(x|η).
We can construct such a density for each η ∈ R^s for which ∫ e^{η^T t(x)} µ(dx) is finite.
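As a concrete numerical sketch of this construction (not from the notes: the finite sample space X = {0, ..., 5} with counting measure µ and statistic t(x) = x are assumed purely for illustration):

```python
import math

# Hypothetical finite example: X = {0,...,5}, mu = counting measure, t(x) = x.
X = range(6)

def t(x):
    return float(x)

def A(eta):
    # A(eta) = log of the integral of e^{eta * t(x)} d mu(x); a finite sum here,
    # so A(eta) < infinity for every eta, i.e. H = R.
    return math.log(sum(math.exp(eta * t(x)) for x in X))

def p(x, eta):
    # density w.r.t. counting measure: p(x|eta) = e^{eta*t(x) - A(eta)}
    return math.exp(eta * t(x) - A(eta))

# P_eta(X) = 1 by construction:
eta = 0.7
print(sum(p(x, eta) for x in X))  # 1.0 up to float rounding
```

Because X is finite the sum defining A(η) always converges; for continuous µ, convergence must be checked before p(x|η) is well defined.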
Definition 1 (canonical exponential family). Let
• (X, A, µ) be a measure space,
• t : X → Rs be an s-dimensional statistic that does not satisfy any linear constraints,
• A(η) = log ∫ e^{η^T t(x)} µ(dx).
A collection of densities given by
    {p(x|η) = exp(η^T t(x) − A(η)) : η ∈ H̃} ,  where  H̃ ⊂ H = {η : A(η) < ∞},
is called an s-dimensional exponential family.
Notes:
• The set H = {η : A(η) < ∞} is called the natural parameter space.
• Each density p(x|η) defines a measure Pη ≪ µ via Pη(A) = ∫_A p(x|η) µ(dx).
We say that the measures {Pη : η ∈ H̃} “have a common dominating measure” µ.
Minimal, full and curved exponential families
“Doesn’t satisfy a linear constraint” means
    ∄ a ∈ R^s : a ≠ 0, a^T t(x) = c ∀x ∈ X.
Some authors do not include this “no linear constraints” requirement for the statistic t.
If t does satisfy a linear constraint, the natural parameter space includes distinct points
that correspond to the same density and probability distribution. As a result, the parameter
will be non-identifiable (in the natural parameter space):
Definition 2. A model P = {p(x|η) : η ∈ H} for (X, A) is non-identifiable if there exist
η1, η2 ∈ H : η1 ≠ η2 but P(A|η1) = P(A|η2) ∀A ∈ A.
Exercise: Show that if t satisfies a linear constraint and H is the parameter space, then the
exponential family model is non-identifiable.
Most authors refer to an EFM where t does not satisfy a linear constraint as a minimal
parametrization. Since a non-minimal representation can always be made minimal, and the
recommendation is always to do so, it seems simplest just to require it in the definition.
Definition 3 (full rank). If the parameter space for an exponential family contains an
s-dimensional open set, then it is called full rank.
An exponential family that is not full rank is generally called a curved exponential family,
as typically the parameter space is a curve in Rs of dimension less than s.
Examples
Often an exponential family model is parameterized as
    P = {p(x|θ) = h(x) exp{η(θ)^T t(x) − B(θ)} : θ ∈ Θ}.
This is done
• if the parameter θ is more interpretable than η, or
• so that the dominating measure can be something simple.
Example (normal model):
The univariate normal model on (R, B(R)) can be represented with the class of densities
{p(x|µ, σ^2) : µ ∈ R, σ^2 ∈ R^+} w.r.t. Lebesgue measure, where

    p(x|µ, σ^2) = (2πσ^2)^{−1/2} exp(−(x − µ)^2 /[2σ^2])
                = (2π)^{−1/2} exp(−x^2/(2σ^2) + xµ/σ^2 − µ^2/(2σ^2) − (1/2) log σ^2).
This is the same model as p(x|η) = (2π)^{−1/2} exp(η^T t(x) − A(η)), where

    t(x) = (x, x^2)^T ,   η(µ, σ^2) = (µ/σ^2, −1/(2σ^2))^T ,   A(η) = (µ^2/σ^2 + log σ^2)/2.
To reparameterize back, note that µ = −η1/(2η2) and σ^2 = −1/(2η2).
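A quick numerical check of this reparameterization and of the equality of the two density expressions (the values of µ, σ^2 and x below are arbitrary, chosen only for illustration):

```python
import math

def to_natural(mu, sigma2):
    # (mu, sigma2) -> eta = (mu/sigma2, -1/(2*sigma2))
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def from_natural(eta1, eta2):
    # inverse map: mu = -eta1/(2*eta2), sigma2 = -1/(2*eta2)
    return (-eta1 / (2.0 * eta2), -1.0 / (2.0 * eta2))

def p_meanvar(x, mu, sigma2):
    # N(mu, sigma2) density in its usual form
    return (2.0 * math.pi * sigma2) ** -0.5 * math.exp(-(x - mu) ** 2 / (2.0 * sigma2))

def p_natural(x, eta1, eta2):
    # exponential-family form with t(x) = (x, x^2), A(eta) = (mu^2/sigma2 + log sigma2)/2
    mu, sigma2 = from_natural(eta1, eta2)
    A = (mu ** 2 / sigma2 + math.log(sigma2)) / 2.0
    return (2.0 * math.pi) ** -0.5 * math.exp(eta1 * x + eta2 * x ** 2 - A)

mu, sigma2 = 1.3, 0.8
eta1, eta2 = to_natural(mu, sigma2)
print(from_natural(eta1, eta2))                                              # recovers (1.3, 0.8) up to rounding
print(abs(p_meanvar(0.4, mu, sigma2) - p_natural(0.4, eta1, eta2)) < 1e-12)  # True
```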
What is the natural parameter space?
Does it correspond to (µ, σ^2) ∈ R × R^+?
Recall,

    H = {(η1, η2) : ∫_{−∞}^{∞} e^{η1 x + η2 x^2} dx < ∞}.

Convince yourself that H = R × R^−, which gives (µ, σ^2) ∈ R × R^+.
The exponential family model defined by t(x) = (x, x^2) and H̃ = H is the normal model.
The normal model with (µ, σ^2) ∈ R × R^+ is a two-dimensional full rank exponential family.
Example (a curved normal model):
Consider the normal model having the following mean-variance relationship:

    X ∼ normal(θ, θ^2) ,  θ ∈ R, θ ≠ 0.

Let P = {p(x|µ, σ^2) : µ ∈ R, σ^2 = µ^2}, where p(x|µ, σ^2) are the normal densities given above.
The densities in this model can be written
    p(x|θ) = (2πθ^2)^{−1/2} exp(−(x − θ)^2 /[2θ^2])
           ∝_x exp(−(x^2 − 2θx + θ^2)/[2θ^2])
           = exp(x/θ − x^2/[2θ^2] − 1/2)
           ∝_x exp(x/θ − x^2/[2θ^2])
           ≡ exp(η1 t1(x) + η2 t2(x)).
Since t(x) = (x, x^2) doesn't satisfy a linear constraint, this is a two-dimensional
exponential family.
The natural parameter space corresponding to t(x) is H = R × R^−.
Our reduced parameter space is η(θ) = (1/θ, −1/[2θ^2]).
This is a one-dimensional curve in two-dimensional space.
Draw a picture.
This family is a two-dimensional exponential family (in minimal form).
It is not a full rank exponential family.
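The picture can also be checked numerically: eliminating θ from η(θ) = (1/θ, −1/[2θ^2]) shows the reduced parameter set traces the parabola η2 = −η1^2/2 inside H = R × R^−. A short sketch (the θ values are arbitrary nonzero illustration points):

```python
for theta in (-2.0, -0.5, 0.3, 1.0, 4.0):            # arbitrary nonzero theta values
    eta1, eta2 = 1.0 / theta, -1.0 / (2.0 * theta ** 2)
    assert eta2 < 0                                   # the curve stays inside R x R^-
    assert abs(eta2 - (-(eta1 ** 2) / 2.0)) < 1e-9    # eta2 = -eta1^2 / 2
print("eta(theta) traces the parabola eta2 = -eta1^2/2")
```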
Example (multinomial model):
Let X ∼ multinomial(n, θ), for which
    Θ = {θ ∈ R^p : Σ_j θj = 1}  and  X = {x ∈ {0, 1, . . . , n}^p : Σ_j xj = n}.
The density of Pθ w.r.t. counting measure µ on X is
    p(x|θ) = (n choose x) θ1^{x1} × ··· × θp^{xp},

where (n choose x) = n!/(x1! ··· xp!) is the multinomial coefficient.
We can rewrite this in canonical exponential form as
    p(x|η) = exp(x1 η1 + ··· + xp ηp),
where ηj = log θj and the dominating measure is
    µ̃(x) = (n choose x) × µ(x),
i.e. the multinomial coefficient “has been absorbed into the dominating measure”.
The parameter space for this model is H̃ = {η ∈ R^p : Σ_j e^{ηj} = 1}, which is a
(p − 1)-dimensional curve in R^p.
Is the multinomial model a p-dimensional curved exponential family?
Note that 1^T t(x) = n ∀x ∈ X, so this “family”
• doesn’t satisfy our definition, or if you prefer
• is not in minimal form.
Consider the usual parameterization again, but now express the model in terms of t(x) =
(x1 , . . . , xp−1 ):
    p(x|θ) = (n choose x) θ1^{x1} ··· θ_{p−1}^{x_{p−1}} θp^{n − Σ_{j=1}^{p−1} xj}
           = (n choose x) θp^n ∏_{j=1}^{p−1} (θj/θp)^{xj}
           = (n choose x) exp(η1 x1 + ··· + η_{p−1} x_{p−1} − A(η)),
where ηj = log(θj/θp) and A(η) can be computed as follows:

    θj = θp e^{ηj}
    1 − θp = θp Σ_{j=1}^{p−1} e^{ηj}
    θp = 1 / (1 + Σ_{j=1}^{p−1} e^{ηj})
    A(η) = −n log θp = n log(1 + Σ_{j=1}^{p−1} e^{ηj}).
Thus the multinomial model is a (p − 1)-dimensional exponential family generated by the
statistic t(x) = (x1 , . . . , xp−1 ).
Does Θ correspond to H?
    H = {η ∈ R^{p−1} : Σ_{x∈X} exp{η1 x1 + ··· + η_{p−1} x_{p−1}} < ∞} = R^{p−1}.
This contains a (p − 1)-dimensional rectangle, and so the multinomial model is a full rank
(p − 1)-dimensional exponential family.
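The formula A(η) = n log(1 + Σ e^{ηj}) can be verified by brute force in a small hypothetical case (p = 3, n = 4, with an arbitrary η), summing exp(η^T t(x)) times the multinomial coefficient over the whole sample space:

```python
import math
from itertools import product

n = 4                 # hypothetical small case: p = 3 categories, n = 4 trials
eta = (0.2, -1.1)     # arbitrary eta in R^(p-1)

def multinom_coef(x):
    # multinomial coefficient n! / (x1! x2! x3!)
    c = math.factorial(n)
    for xj in x:
        c //= math.factorial(xj)
    return c

# lhs: sum over the sample space of (n choose x) * exp(eta1*x1 + eta2*x2)
lhs = sum(multinom_coef((x1, x2, n - x1 - x2)) * math.exp(eta[0] * x1 + eta[1] * x2)
          for x1, x2 in product(range(n + 1), repeat=2)
          if x1 + x2 <= n)

# rhs: exp(A(eta)) with A(eta) = n * log(1 + sum_j e^{eta_j})
A = n * math.log(1.0 + sum(math.exp(e) for e in eta))
print(abs(lhs - math.exp(A)) < 1e-9)   # True
```

The check is just the multinomial theorem: the sum collapses to (1 + e^{η1} + e^{η2})^n.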
2 Basic results
Convexity of H:
The largest EFM based on a statistic t(x) is the one based on the natural parameter space:
{p(x|η) : η ∈ H̃} ⊂ {p(x|η) : η ∈ H} since H̃ ⊂ H.
The natural parameter space is usually (but not always) open,
making this “fullest family” also full rank.
It is always the case that H is convex, and that A(η) is convex on H.
Theorem 1. The natural parameter space H for densities of the form p(x|η) = exp(η^T t(x) −
A(η)) is convex, and A(η) is convex on H.
Proof. Recall Hölder’s inequality: for a ∈ [0, 1], b = 1 − a,

    ∫ fg ≤ ( ∫ f^{1/a} )^a ( ∫ g^{1/b} )^b .

Now let η1, η2 ∈ H and apply the inequality:

    e^{A(aη1 + bη2)} = ∫ exp((aη1 + bη2)^T t(x)) µ(dx) = ∫ e^{a η1^T t} e^{b η2^T t} µ(dx)
                     ≤ ( ∫ e^{η1^T t} µ(dx) )^a ( ∫ e^{η2^T t} µ(dx) )^b = e^{a A(η1) + b A(η2)} < ∞,

and so aη1 + bη2 ∈ H, and A(η) is convex.
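A numerical illustration of Theorem 1 in the simplest case (an assumed example, not from the notes): t(x) = x on X = {0, 1} with counting measure gives A(η) = log(1 + e^η), and the convexity inequality can be checked along a segment:

```python
import math

def A(eta):
    # log-partition of the Bernoulli family: A(eta) = log(1 + e^eta)
    return math.log(1.0 + math.exp(eta))

eta1, eta2 = -3.0, 2.0   # arbitrary points in H = R
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    b = 1.0 - a
    # Theorem 1: A(a*eta1 + b*eta2) <= a*A(eta1) + b*A(eta2)
    assert A(a * eta1 + b * eta2) <= a * A(eta1) + b * A(eta2) + 1e-12
print("convexity inequality holds at all checked points")
```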
Continuity, integration and differentiation
The following theorem is useful in a variety of contexts:
Theorem 2 (LC 5.8). For any integrable function f, the expected value function E[f|η],

    E[f|η] = ∫ f(x) exp(η^T t(x) − A(η)) µ(dx),

is, at any η in the interior of H,
1. continuous as a function of η,
2. differentiable w.r.t. η to all orders, and
3. such that its derivatives can be obtained by differentiating under the integral sign.
The first item is used in two key results in estimation and testing:
• In estimation, the theorem implies that risk functions for exponential family models are
continuous. This will help us characterize all admissible estimators for such models.
• In testing, the theorem implies that the power function for any test is continuous. This
will help us characterize unbiased testing procedures.
An important application of the theorem is the calculation of moments of t.
By definition, e^{A(η)} = ∫ e^{ηt} µ(dx). Taking derivatives w.r.t. η gives

    (d/dη) e^{A(η)} = (d/dη) ∫ e^{ηt} µ(dx)
    A′(η) e^{A(η)} = ∫ t e^{ηt} µ(dx)
    A′(η) = ∫ t e^{ηt − A(η)} µ(dx) = E[t(X)|η].
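The identity A′(η) = E[t(X)|η] can be checked numerically on the normal example from earlier in the notes, where t(x) = (x, x^2) and so the gradient of A(η) should equal (µ, µ^2 + σ^2). A finite-difference sketch (the check point (µ, σ^2) and step size h are arbitrary choices):

```python
import math

def A(eta1, eta2):
    # A(eta) for the normal family, via mu = -eta1/(2 eta2), sigma2 = -1/(2 eta2)
    sigma2 = -1.0 / (2.0 * eta2)
    mu = -eta1 / (2.0 * eta2)
    return (mu ** 2 / sigma2 + math.log(sigma2)) / 2.0

mu, sigma2 = 0.7, 1.5                     # arbitrary check point
eta1, eta2 = mu / sigma2, -1.0 / (2.0 * sigma2)
h = 1e-6                                  # finite-difference step (arbitrary small value)
dA1 = (A(eta1 + h, eta2) - A(eta1 - h, eta2)) / (2.0 * h)
dA2 = (A(eta1, eta2 + h) - A(eta1, eta2 - h)) / (2.0 * h)
print(abs(dA1 - mu) < 1e-5)                   # True: dA/deta1 = E[X] = mu
print(abs(dA2 - (mu ** 2 + sigma2)) < 1e-5)   # True: dA/deta2 = E[X^2] = mu^2 + sigma^2
```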
More generally,

Theorem 3 (Barndorff-Nielsen (1978), thm 8.1). Let P = {p(x|η) = exp(η^T t − A(η)) : η ∈ H}
be an exponential family and η ∈ int H. Then

    ∂^{k1 + ··· + ks} / (∂η1^{k1} ··· ∂ηs^{ks}) e^{A(η)} = ∫ t1^{k1}(x) × ··· × ts^{ks}(x) e^{η^T t(x)} µ(dx)   ∀ k1, . . . , ks ≥ 0.
This result helps us with the moment generating function.
Moment generating function:

    M_t(u1, . . . , us) = E[e^{u^T t} | η]
                        = ∫ e^{(η+u)^T t − A(η)} µ(dx)
                        = e^{A(η+u) − A(η)} ∫ e^{(η+u)^T t − A(η+u)} µ(dx)
                        = e^{A(η+u) − A(η)}.
This works as long as η is in the interior of H and u is small enough so that η + u ∈ H.
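The identity M_t(u) = e^{A(η+u) − A(η)} is easy to check in the Bernoulli case (an assumed example: X = {0, 1}, t(x) = x, counting measure), where A(η) = log(1 + e^η) and E[e^{uX}] can be summed directly:

```python
import math

def A(eta):
    # Bernoulli log-partition: A(eta) = log(1 + e^eta)
    return math.log(1.0 + math.exp(eta))

eta, u = 0.4, 0.9                            # arbitrary; eta + u is still in H = R
p = math.exp(eta) / (1.0 + math.exp(eta))    # success probability under eta
direct = (1.0 - p) + p * math.exp(u)         # E[e^{uX}] summed over X = {0, 1}
via_A = math.exp(A(eta + u) - A(eta))        # the exponential-family formula
print(abs(direct - via_A) < 1e-12)           # True
```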
From this, we can use the above theorem to show

    ∂^{k1 + ··· + ks} / (∂u1^{k1} ··· ∂us^{ks}) M_t(u) |_{u=0} = E[t1^{k1} × ··· × ts^{ks} | η].
References
O. Barndorff-Nielsen. Information and exponential families in statistical theory. Wiley Series
in Probability and Mathematical Statistics. John Wiley & Sons, Chichester, 1978.
E. L. Lehmann and George Casella. Theory of point estimation. Springer Texts in Statistics.
Springer-Verlag, New York, second edition, 1998. ISBN 0-387-98502-6.