Example 5 Let $U = (U_1, \dots, U_n)$ be a random vector with a distribution function $F$ on $\mathbb{R}^n$. Then for $a \in \mathbb{R}$, $b \in \mathbb{R}_+$,
\[
U + a = (U_1 + a, \dots, U_n + a), \quad bU = (bU_1, \dots, bU_n), \quad a + bU = (a + bU_1, \dots, a + bU_n),
\]
generate location, scale and location/scale families, respectively. (This follows by checking that the corresponding distributions of $U + a$, $bU$ and $a + bU$ satisfy the definitions of location, scale and location/scale families.) ✷

Example 6 Let $U = (U_1, \dots, U_p)$ be a random vector, with the $U_i$ independent $N(0,1)$. Let $a' = (a_1, \dots, a_p) \in \mathbb{R}^p$ and an invertible $p \times p$ matrix $B$ be arbitrary, and define
\[
\begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix}
=
\begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix}
+ B
\begin{pmatrix} U_1 \\ \vdots \\ U_p \end{pmatrix},
\]
or, using matrix formulation, $X = a + BU$. Then $X$ is a Gaussian vector with $E(X) = a$ and
\[
\mathrm{Cov}(X) = E\big((X - a)(X - a)'\big) = E(BUU'B') = BB' =: \Sigma.
\]
Thus the distribution of $X$ lies in the family of non-singular $p$-variate normal distributions: if
\[
f_U(u) = \frac{1}{(2\pi)^{p/2}} e^{-u'u/2},
\]
then
\[
f_X(x) = \frac{1}{(2\pi)^{p/2} |\det B|} e^{-(x - a)'\Sigma^{-1}(x - a)/2},
\]
by the formula for variable transformation in integrands. ✷

Example 7 (Linear model in Normal distribution) Assume $U = (U_1, \dots, U_n)$ is a stochastic vector and let $X_i = a_i + bU_i$, where $b > 0$ and $a = (a_1, \dots, a_n)$ lies in an $s$-dimensional subspace $\Omega \subset \mathbb{R}^n$ (meaning that every vector $a$ can be written as
\[
a_i = \sum_{j=1}^{s} d_{ij} \beta_j,
\]
with $(\beta_1, \dots, \beta_s) \in \mathbb{R}^s$ and $D = (d_{ij})$ an $n \times s$ matrix of rank $s$). Then if $(U_1, \dots, U_n)$ are i.i.d. $N(0,1)$,
\[
f_X(x) = \frac{1}{(\sqrt{2\pi}\, b)^n} e^{-\sum_{i=1}^{n} (x_i - a_i)^2 / 2b^2}. \quad ✷
\]

Example 8 (Nonparametric family with support on $\mathbb{R}$) Assume $U_1, \dots, U_n$ are i.i.d. r.v.'s with distribution $F = N(0,1)$. Let
\[
G = \{ g : \mathbb{R} \to \mathbb{R} \text{ continuous and strictly increasing},\ \lim_{u \to \pm\infty} g(u) = \pm\infty \}.
\]
Then under the binary operation $\cdot$ equal to composition, $g_1 \cdot g_2(x) = g_1(g_2(x))$, $G$ is a group (thus (i) $\cdot$ is a map $G \times G \to G$, (ii) there is a unit $e$ in $G$, and (iii) for every $g$ there is an inverse $g^{-1}$ in $G$ such that $e = g \cdot g^{-1}$).
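The three group properties (i)-(iii) can be checked numerically. The following is a small sketch, not from the notes, using two hypothetical choices of continuous, strictly increasing bijections of $\mathbb{R}$, namely $u \mapsto u^3$ and $u \mapsto u + \sin(u)/2$:

```python
import math

# Two continuous, strictly increasing bijections of R with g(u) -> +/-inf
# as u -> +/-inf (hypothetical choices, used only as illustrations):
g1 = lambda u: u ** 3                  # strictly increasing, onto R
g2 = lambda u: u + math.sin(u) / 2     # derivative 1 + cos(u)/2 >= 1/2 > 0

xs = [i / 10 for i in range(-50, 51)]  # grid of test points

# (i) closure: the composition g1 . g2 is again strictly increasing
comp = lambda u: g1(g2(u))
assert all(comp(a) < comp(b) for a, b in zip(xs, xs[1:]))

# (ii) unit element: e(u) = u satisfies e . g = g . e = g
e = lambda u: u
assert all(g1(e(u)) == e(g1(u)) for u in xs)

# (iii) inverse: g1^{-1}(u) = cbrt(u) satisfies g1 . g1^{-1} = e
g1_inv = lambda u: math.copysign(abs(u) ** (1 / 3), u)
assert all(math.isclose(g1(g1_inv(u)), u, abs_tol=1e-9) for u in xs)
```

Only closure, the unit and inverses are checked pointwise on a grid here; associativity of composition holds automatically for functions.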
The group family of distributions $P = \{F(g^{-1}) : g \in G\}$ is the set of all continuous distributions with support on $\mathbb{R}$ (the corresponding r.v.'s are given by $X_i = g(U_i)$). ✷

Example 9 (Symmetric distributions) Let $F = N(0,1)$, and define
\[
G = \{ g : \mathbb{R} \to \mathbb{R} \text{ continuous and strictly increasing},\ g(-u) = -g(u),\ \lim_{u \to \pm\infty} g(u) = \pm\infty \}.
\]
Then $G$ is a group under composition and $P = \{F(g^{-1}) : g \in G\}$ is the class of distributions with support on $\mathbb{R}$ that are symmetric around the origin. ✷

3.3 Exponential families

Let $P = \{P_\theta : \theta \in \Omega\}$ be a family of distributions. Assume that $P_\theta$ has a density $p_\theta = dP_\theta/d\mu$, with respect to the ($\sigma$-finite) measure $\mu$, that is of the form
\[
p_\theta(x) = \exp\Big( \sum_{i=1}^{s} \eta_i(\theta) T_i(x) - B(\theta) \Big) h(x).
\]
Then the set $P$ is called an $s$-dimensional exponential family. Here the $\eta_i$ and $B$ are real-valued functions defined on the parameter space $\Omega$, and the $T_i$ are statistics. An alternative form is the so-called canonical form
\[
p_\eta(x) = \exp\Big( \sum_{i=1}^{s} \eta_i T_i(x) - A(\eta) \Big) h(x),
\]
with $\eta = (\eta_1, \dots, \eta_s)$ the so-called canonical parameters. Note that in our case $\mu$ is either Lebesgue measure, in which case $p_\theta$ (and $p_\eta$) is the density function of a continuous r.v., or counting measure, in which case $p_\theta$ is the probability mass function of a discrete r.v. The function $h$ makes it possible to avoid measures $\mu$ more elaborate than Lebesgue measure and counting measure.

We note first that since the $p_\eta$ are supposed to be densities corresponding to a probability measure, they should integrate to one, i.e.
\[
\int p_\eta(x)\, d\mu(x) = e^{-A(\eta)} \int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) = 1,
\]
which is possible if and only if
\[
\int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) < \infty.
\]
If this holds, the normalizing function $A(\eta)$ can be chosen as
\[
A(\eta) = \log \left( \int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) \right).
\]

Definition 2 The set of $\eta = (\eta_1, \dots, \eta_s)$ for which
\[
\int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) < \infty
\]
holds is called the natural parameter space. ✷

Example 10 Assume $X$ is distributed as $N(\xi, \sigma^2)$, so that $\theta = (\xi, \sigma^2)$. Then the density (w.r.t. Lebesgue measure on $\mathbb{R}$) is
\[
p_\theta(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( \frac{\xi}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\xi^2}{2\sigma^2} \Big).
\]
Therefore the natural parameters are $\big(\frac{\xi}{\sigma^2}, -\frac{1}{2\sigma^2}\big)$, and the natural parameter space is $\mathbb{R} \times (-\infty, 0)$ (the second coordinate $-\frac{1}{2\sigma^2}$ is strictly negative for every $\sigma^2 > 0$, and for $\eta_2 = 0$ the integral above diverges). ✷

If the statistics $T_1, \dots, T_s$ are linearly dependent, the number of terms in the expression can be reduced (until they no longer are linearly dependent). If this reduction is not performed, the parameters (or equivalently the probability measures $P_\theta$) will not be identifiable.

Definition 3 Let $P = \{P_\theta : \theta \in \Omega\}$ with $\Omega$ some parameter space. If there are values $\theta_1 \neq \theta_2$ such that $P_{\theta_1} = P_{\theta_2}$, the family of distributions $P$ is said to be unidentifiable for $\theta$. ✷

If $P$ is unidentifiable for the parameter $\theta$, one cannot, from knowledge of the particular $P_\theta \in P$ that is the true (unknown) distribution, draw conclusions about which $\theta \in \Omega$ is the true (unknown) parameter. When one wants to draw inference from an observation $x$ of $X \sim P_\theta$, one can at best say which distribution is the true distribution, and this knowledge does not lead to the true parameter $\theta$. This situation is therefore undesirable.

Example 11 (Multinomial distribution) Assume we make $n$ independent trials with $s + 1$ different possible outcomes $O_0, \dots, O_s$, and let $p_i = P(\text{outcome of type } O_i)$, $i = 0, 1, \dots, s$. Let $X_i$ be the number of outcomes of type $O_i$,
\[
X_i = \sum_{j=1}^{n} 1\{\text{outcome } O_i \text{ in trial } j\},
\]
for $i = 0, 1, \dots, s$. Then, with $\theta = (p_0, \dots, p_s)$ and $x = (x_0, \dots, x_s)$,
\[
p_\theta(x) = P_\theta(X_0 = x_0, \dots, X_s = x_s) = \frac{n!}{x_0! \cdots x_s!}\, p_0^{x_0} \cdots p_s^{x_s} = \exp(x_0 \log p_0 + \dots + x_s \log p_s)\, h(x),
\]
with $h(x) = n!/(x_0! \cdots x_s!)$. But, since $x_0 + \dots + x_s = n$, this can be written as
\begin{align*}
p_\theta(x) &= \exp\big( x_1(\log p_1 - \log p_0) + \dots + x_s(\log p_s - \log p_0) + (x_0 + \dots + x_s) \log p_0 \big)\, h(x) \\
&= \exp\big( n \log p_0 + x_1 \log(p_1/p_0) + \dots + x_s \log(p_s/p_0) \big)\, h(x).
\end{align*}
This is an $s$-dimensional exponential family, with natural parameters $\eta_i = \log(p_i/p_0)$, $i = 1, \dots, s$, $A(\eta) = -n \log p_0$, and natural parameter space $\mathbb{R}^s$. ✷

Clearly, linear dependence between the natural parameters $\eta_i$ also gives rise to nonidentifiability problems.

Definition 4 If neither the $T_i$ nor the $\eta_i$ are linearly dependent, the canonical representation is said to be minimal. ✷
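As a numerical sanity check of Example 10 (a sketch, not part of the notes; the parameter values are arbitrary), the canonical form $\exp(\eta_1 x + \eta_2 x^2 - A(\eta))$ with $T_1(x) = x$, $T_2(x) = x^2$, $h(x) = 1$ can be compared against the usual $N(\xi, \sigma^2)$ density:

```python
import math

def normal_pdf(x, xi, sigma2):
    """Usual N(xi, sigma^2) density."""
    return math.exp(-(x - xi) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def normal_pdf_canonical(x, eta1, eta2):
    """Same density in canonical exponential family form
    exp(eta1*T1(x) + eta2*T2(x) - A(eta)) * h(x), with
    T1(x) = x, T2(x) = x^2, h(x) = 1; requires eta2 < 0,
    matching the natural parameter space R x (-inf, 0)."""
    # Recover (xi, sigma^2) from eta1 = xi/sigma^2, eta2 = -1/(2 sigma^2)
    sigma2 = -1 / (2 * eta2)
    xi = eta1 * sigma2
    # A(eta) = xi^2/(2 sigma^2) + log sqrt(2 pi sigma^2)
    A = xi ** 2 / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp(eta1 * x + eta2 * x ** 2 - A)

xi, sigma2 = 1.5, 2.0                       # arbitrary illustrative values
eta1, eta2 = xi / sigma2, -1 / (2 * sigma2)  # natural parameters
for x in (-2.0, 0.0, 0.7, 3.0):
    assert math.isclose(normal_pdf(x, xi, sigma2),
                        normal_pdf_canonical(x, eta1, eta2))
```

The two parametrizations agree pointwise, and the formula for $A(\eta)$ is exactly $B(\theta) = \xi^2/(2\sigma^2) + \log(\sqrt{2\pi}\,\sigma)$ expressed through $\eta$.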
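Similarly, the rewriting in Example 11 can be verified numerically. The following sketch (not part of the notes; probabilities and counts are arbitrary illustrative values) computes the multinomial pmf both directly and through the natural parameters $\eta_i = \log(p_i/p_0)$ with $A(\eta) = -n \log p_0$:

```python
import math

def multinomial_pmf(x, p):
    """Direct multinomial pmf: n!/(x0!...xs!) * p0^x0 * ... * ps^xs."""
    n = sum(x)
    h = math.factorial(n)
    for xi in x:
        h //= math.factorial(xi)  # h(x) = n!/(x0!...xs!), always an integer
    return h * math.prod(pi ** xi for pi, xi in zip(p, x))

def multinomial_pmf_canonical(x, p):
    """Same pmf in exponential family form:
    exp(sum_i eta_i * x_i - A(eta)) * h(x),
    with eta_i = log(p_i/p_0) for i = 1..s and A(eta) = -n log p_0."""
    n = sum(x)
    h = math.factorial(n)
    for xi in x:
        h //= math.factorial(xi)
    eta = [math.log(pi / p[0]) for pi in p[1:]]  # natural parameters
    A = -n * math.log(p[0])
    return math.exp(sum(e * xi for e, xi in zip(eta, x[1:])) - A) * h

p = (0.2, 0.5, 0.3)   # s + 1 = 3 outcome probabilities (arbitrary)
x = (1, 3, 2)         # counts with n = 6
assert math.isclose(multinomial_pmf(x, p), multinomial_pmf_canonical(x, p))
```

Note that the canonical form uses only $x_1, \dots, x_s$ in the exponent; $x_0$ enters through $x_0 = n - x_1 - \dots - x_s$, which is exactly why the family is $s$-dimensional rather than $(s+1)$-dimensional.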