Example 5 Let U = (U1 , . . . , Un ) be a random vector with a distribution function F
on Rn . Then for a ∈ R, b ∈ R+ ,
U + a = (U1 + a, . . . , Un + a),
bU = (bU1 , . . . , bUn ),
a + bU = (a + bU1 , . . . , a + bUn ),
generate location, scale and location/scale families, respectively. (This follows by
checking that the corresponding distributions for U + a, bU and a + bU satisfy the
definitions for location, scale and location/scale families).
✷
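The location/scale construction above can be checked numerically: if X = a + bU, then P(X ≤ t) = F((t − a)/b), where F is the distribution function of U. A minimal sketch, assuming U is standard normal (the values of a, b, and t are arbitrary illustrative choices):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# Illustrative choices (not from the text): location a, scale b.
a, b = 2.0, 3.0
u = rng.standard_normal(100_000)
x = a + b * u                 # X = a + bU, a member of the location/scale family

# The CDF of X at t should equal F((t - a)/b), with F the standard normal CDF.
t = 4.0
emp = np.mean(x <= t)
phi = 0.5 * (1 + erf(((t - a) / b) / sqrt(2)))
print(abs(emp - phi) < 0.01)
```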
Example 6 Let U = (U1 , . . . , Up ) be a random vector, with Ui independent N (0, 1).
Let a′ = (a1 , . . . , ap ) ∈ Rp and let B be an arbitrary invertible p × p matrix, and define






\[
\begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix}
= \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix}
+ B \begin{pmatrix} U_1 \\ \vdots \\ U_p \end{pmatrix},
\]
or, using matrix formulation,
\[
X = a + BU.
\]
Then X is a Gaussian vector with
\[
E(X) = a, \qquad
\mathrm{Cov}(X) = E\big((X - a)(X - a)'\big) = E(BUU'B') = BB' =: \Sigma.
\]
Thus the distribution of X lies in the family of non-singular p-variate Normal distributions: if
\[
f_U(u) = \frac{1}{(2\pi)^{p/2}}\, e^{-u'u/2},
\]
then
\[
f_X(x) = \frac{1}{(2\pi)^{p/2}\,|B|}\, e^{-(x-a)'\Sigma^{-1}(x-a)/2},
\]
by the formula for transformation of variables in densities.
✷
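The identities E(X) = a and Cov(X) = BB′ can be verified by simulation. A sketch, with a and B chosen arbitrarily for illustration (B lower triangular with nonzero diagonal, hence invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 200_000

# Hypothetical a and invertible B (assumptions for illustration only).
a = np.array([1.0, -2.0, 0.5])
B = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.5, 0.0],
              [0.3, -0.4, 1.0]])   # lower triangular, nonzero diagonal => invertible

U = rng.standard_normal((n, p))
X = a + U @ B.T                    # rows are samples of X = a + BU

Sigma = B @ B.T
print(np.allclose(X.mean(axis=0), a, atol=0.02))   # E(X) = a
print(np.allclose(np.cov(X.T), Sigma, atol=0.05))  # Cov(X) = BB'
```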
Example 7 (Linear model in Normal distribution) Assume U = (U1 , . . . , Un ) is a
stochastic vector and let
\[
X_i = a_i + bU_i,
\]
where b > 0 and a = (a_1, \dots, a_n) lies in an s-dimensional subspace Ω ⊂ R^n (meaning that every such a can be written as
\[
a_i = \sum_{j=1}^{s} d_{ij}\beta_j,
\]
with (β_1, \dots, β_s) ∈ R^s and D = (d_{ij}) an n × s matrix of rank s). Then if (U_1, \dots, U_n) are i.i.d. N(0, 1),
\[
f_X(x) = \frac{1}{(\sqrt{2\pi}\,b)^n}\, e^{-\sum_{i=1}^{n}(x_i - a_i)^2/2b^2}.
\]
✷
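The structure of the linear model can be illustrated numerically: the mean vector a = Dβ lies in the column space of D and can be recovered from simulated data. A sketch with a hypothetical design matrix D and coefficients β (all numeric values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 6, 2

# Hypothetical rank-s design matrix D and coefficient vector beta.
D = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])
b = 2.0

a = D @ beta                      # mean vector, lies in col(D)
U = rng.standard_normal((100_000, n))
X = a + b * U                     # i.i.d. N(a_i, b^2) coordinates

# The empirical mean recovers a, and projecting a onto col(D) leaves it fixed.
m = X.mean(axis=0)
P = D @ np.linalg.solve(D.T @ D, D.T)   # orthogonal projection onto col(D)
print(np.allclose(m, a, atol=0.05))
print(np.allclose(P @ a, a))
```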
Example 8 (Nonparametric family with support on R) Assume U1 , . . . , Un are i.i.d.
r.v.’s with distribution F = N (0, 1). Let
\[
G = \{\, g : \mathbb{R} \to \mathbb{R} \text{ continuous and strictly increasing},\ \lim_{u \to \pm\infty} g(u) = \pm\infty \,\}.
\]
Then under the binary operation · equal to composition,
g1 · g2 (x) = g1 (g2 (x))
G is a group (thus (i) · is a map G × G → G, (ii) there is a unit e in G, and (iii)
for every g there is an inverse g^{-1} in G such that e = g · g^{-1}). The group family of
distributions
\[
P = \{F(g^{-1}) : g \in G\},
\]
is the set of all continuous distributions with support on R (the corresponding
r.v.’s are given by Xi = g(Ui )).
✷
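The claim that X = g(U) has distribution function F(g⁻¹(·)) can be checked for a particular g. A sketch using the illustrative choice g(u) = u³ + u, which is continuous, strictly increasing, and tends to ±∞:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

# Illustrative g in G: continuous, strictly increasing, g(u) -> +-inf.
def g(u):
    return u**3 + u

u = rng.standard_normal(200_000)
x = g(u)                          # X = g(U)

# X has CDF F(g^{-1}(x)); equivalently P(X <= g(t)) = Phi(t) for any t.
t = 0.8
emp = np.mean(x <= g(t))
phi = 0.5 * (1 + erf(t / sqrt(2)))
print(abs(emp - phi) < 0.01)
```

Since this particular g is also odd, the resulting X is symmetric around the origin, as in Example 9.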
Example 9 (Symmetric distributions) Let F = N (0, 1), and define
\[
G = \{\, g : \mathbb{R} \to \mathbb{R} \text{ continuous and strictly increasing},\ g(-u) = -g(u),\ \lim_{u \to \pm\infty} g(u) = \pm\infty \,\}.
\]
Then G is a group under composition and
\[
P = \{F(g^{-1}) : g \in G\}
\]
is the class of distributions with support on R that are symmetric around the origin. ✷
3.3
Exponential families
Let P = {Pθ : θ ∈ Ω} be a family of distributions. Assume that Pθ has a density
pθ = dPθ /dµ, with respect to the (σ-finite) measure µ, that is of the form
\[
p_\theta(x) = \exp\Big( \sum_{i=1}^{s} \eta_i(\theta) T_i(x) - B(\theta) \Big) h(x).
\]
Then the set P is called an s-dimensional exponential family. Here {ηi }, B are
real-valued functions defined on the parameter space Ω, and {Ti } are statistics.
An alternative form is the so-called canonical form
\[
p_\eta(x) = \exp\Big( \sum_{i=1}^{s} \eta_i T_i(x) - A(\eta) \Big) h(x),
\]
with η = (η_1, \dots, η_s) the so-called canonical parameters.
Note that in our case µ is either Lebesgue measure, in which case pθ (and pη) is
the density function of a continuous r.v., or counting measure, in which case pθ
is the probability mass function of a discrete r.v. The function h makes it possible
to avoid measures µ more elaborate than Lebesgue measure and counting measure.
We note first that since the pη are supposed to be densities corresponding to a
probability measure, they should integrate to one, i.e.
\[
\int p_\eta(x)\, d\mu(x) = \int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x)\; e^{-A(\eta)} = 1,
\]
which is possible if and only if
\[
\int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) < \infty.
\]
If this holds, the normalizing function A(η) can be chosen as
\[
A(\eta) = \log\Big( \int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) \Big).
\]
Definition 2 The set Ω = {η = (η_1, \dots, η_s)} for which
\[
\int \exp\Big( \sum_i \eta_i T_i(x) \Big) h(x)\, d\mu(x) < \infty
\]
holds is called the natural parameter space.
✷
Example 10 Assume X is distributed as N (ξ, σ 2 ), so that θ = (ξ, σ 2 ). Then the
density (w.r.t. Lebesgue measure on R) is
\[
p_\theta(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big( \frac{\xi}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2 - \frac{\xi^2}{2\sigma^2} \Big).
\]
Therefore, the natural parameters are (ξ/σ², −1/(2σ²)), and the natural parameter space is
R × (−∞, 0).
✷
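This factorization can be confirmed numerically for particular values of (ξ, σ²) (the values below are arbitrary illustrative choices):

```python
from math import exp, sqrt, pi, log

# Illustrative parameter values (assumptions, not from the text).
xi, sigma2 = 1.5, 2.0
eta1, eta2 = xi / sigma2, -1.0 / (2 * sigma2)          # natural parameters
B = xi**2 / (2 * sigma2) + 0.5 * log(2 * pi * sigma2)  # B(theta), absorbing the constant

def normal_density(x):
    return exp(-(x - xi)**2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)

def expfam_density(x):
    return exp(eta1 * x + eta2 * x**2 - B)             # h(x) = 1 here

ok = all(abs(normal_density(x) - expfam_density(x)) < 1e-12
         for x in (-2.0, 0.0, 1.5, 4.0))
print(ok)
```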
If the statistics T1 , . . . , Ts are linearly dependent, the number of terms in the
expression can be reduced (until they are no longer linearly dependent). If this
reduction is not performed, the parameters (or equivalently the probability measures
Pθ ) will not be identifiable.
Definition 3 Let P = {Pθ : θ ∈ Ω} with Ω some parameter space. If there are values
θ1 ≠ θ2 such that Pθ1 = Pθ2 , the family of distributions P is said to be unidentifiable
for θ.
✷
If P is nonidentifiable for the parameter θ, then knowledge of the particular Pθ ∈ P
that is the true (unknown) distribution does not determine which θ ∈ Ω is the true
(unknown) parameter. When one wants to draw inference from an observation x of
X ∼ Pθ , one can at best say which distribution is the true distribution, and this
knowledge does not lead to the true parameter θ. This situation is therefore undesirable.
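A toy illustration (assumed here, not from the text): parametrize N(µ₁ + µ₂, 1) by θ = (µ₁, µ₂); distinct parameter values with the same sum give identical distributions, so the family is unidentifiable for θ:

```python
from math import exp, sqrt, pi

def density(x, theta):
    # X ~ N(mu1 + mu2, 1): only the sum mu1 + mu2 enters the density.
    mu1, mu2 = theta
    return exp(-(x - (mu1 + mu2))**2 / 2) / sqrt(2 * pi)

theta1, theta2 = (1.0, 2.0), (0.5, 2.5)   # theta1 != theta2, same sum
same = all(abs(density(x, theta1) - density(x, theta2)) < 1e-15
           for x in (-1.0, 0.0, 0.3, 2.0))
print(same)
```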
Example 11 (Multinomial distribution) Assume we make n independent trials with
s + 1 different possible outcomes, O0 , . . . , Os and let
pi = P (outcome of type Oi ), i = 0, 1, . . . , s.
Let Xi be the number of outcomes of type Oi ,
\[
X_i = \sum_{t=1}^{n} 1\{\text{outcome } O_i \text{ in trial } t\},
\]
for i = 0, 1, . . . , s. Then, with θ = (p0 , . . . , ps ) and x = (x0 , . . . , xs ),
\[
p_\theta(x) = P_\theta(X_0 = x_0, \dots, X_s = x_s)
= \frac{n!}{x_0! \cdots x_s!}\, p_0^{x_0} \cdots p_s^{x_s}
= \exp(x_0 \log p_0 + \dots + x_s \log p_s)\, h(x),
\]
with h(x) = n!/(x_0! \cdots x_s!). But, since x_0 + \dots + x_s = n, this can be written as
\[
p_\theta(x) = \exp\big((x_0 + x_1 + \dots + x_s)\log p_0 + x_1(\log p_1 - \log p_0) + \dots + x_s(\log p_s - \log p_0)\big)\, h(x)
= \exp\big(n \log p_0 + x_1 \log(p_1/p_0) + \dots + x_s \log(p_s/p_0)\big)\, h(x).
\]
This is an s-dimensional exponential family, with natural parameters ηi = log(pi /p0 ),
i = 1, . . . , s, A(η) = −n log(p0 ), and natural parameter space Rs .
✷
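The exponential-family form of the multinomial pmf can be checked numerically; the n, p, and x below are arbitrary illustrative values:

```python
import math

# Illustrative values: s = 2, so three outcome types O0, O1, O2.
n = 10
p = [0.2, 0.5, 0.3]        # (p0, p1, p2)
x = [3, 4, 3]              # counts, x0 + x1 + x2 = n

h = math.factorial(n) // (math.factorial(x[0]) * math.factorial(x[1]) * math.factorial(x[2]))
pmf = h * p[0]**x[0] * p[1]**x[1] * p[2]**x[2]

eta = [math.log(p[i] / p[0]) for i in (1, 2)]    # eta_i = log(p_i / p_0)
A = -n * math.log(p[0])                          # A(eta) = -n log p0
expfam = h * math.exp(eta[0] * x[1] + eta[1] * x[2] - A)

print(abs(pmf - expfam) < 1e-12)
```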
Clearly, linear dependence between the natural parameters ηi also gives rise to nonidentifiability problems.
Definition 4 If neither the Ti nor the ηi are linearly dependent, the canonical representation is said to be minimal.
✷