Appendix D
Probability Distributions
This appendix archives a number of useful results from texts by Papoulis [44], Lee [33]
and Cover [12]. Table 16.1 in Cover (page 486) gives entropies of many distributions
not listed here.
D.1 Transforming PDFs
Because probabilities are defined as areas under PDFs, when we transform a variable

y = f(x)   (D.1)

we transform the PDF by preserving the areas

p(y)|dy| = p(x)|dx|   (D.2)

where the absolute value is taken because the changes in x or y (dx and dy) may be negative and areas must be positive. Hence

p(y) = p(x) / |dy/dx|   (D.3)

where the derivative is evaluated at x = f^{-1}(y). This means that the function f(x) must be one-to-one and invertible.
If the function is many-to-one then its inverse will have multiple solutions x_1, x_2, ..., x_n and the PDF is transformed at each of these points (Papoulis' Fundamental Theorem [44], page 93)

p(y) = p(x_1)/|dy/dx_1| + p(x_2)/|dy/dx_2| + ... + p(x_n)/|dy/dx_n|   (D.4)
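As a numerical illustration of equation D.4, the following sketch (Python, assuming numpy and scipy are available; the helper name p_y is ours) pushes a standard Normal variable through the two-to-one map y = x^2 and compares a sample histogram with the density predicted by the fundamental theorem. The result is in fact the χ² density with one degree of freedom (section D.5).

# Numerical check of the fundamental theorem (D.4) for the two-to-one map y = x^2.
# Sketch only; assumes numpy and scipy are available.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = x ** 2

def p_y(y):
    # The roots of y = x^2 are x = +sqrt(y) and x = -sqrt(y); |dy/dx| = 2*sqrt(y) at both.
    root = np.sqrt(y)
    return (norm.pdf(root) + norm.pdf(-root)) / (2.0 * root)

counts, edges = np.histogram(y, bins=50, range=(0.05, 4.0))
hist = counts / (len(y) * (edges[1] - edges[0]))      # empirical density over all samples
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - p_y(centres))))            # small: binning and sampling error only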
D.1.1 Mean and Variance
For more on the mean and variance of functions of random variables see Weisberg
[64].
Expectation is a linear operator. That is

E[a_1 x_1 + a_2 x_2] = a_1 E[x_1] + a_2 E[x_2]   (D.5)

Therefore, given the function

y = ax   (D.6)

we can calculate the mean and variance of y as functions of the mean and variance of x:

E[y] = aE[x]
Var(y) = a^2 Var(x)   (D.7)
If y is a function of many uncorrelated variables

y = Σ_i a_i x_i   (D.8)

we can use the results

E[y] = Σ_i a_i E[x_i]   (D.9)

Var[y] = Σ_i a_i^2 Var[x_i]   (D.10)

But if the variables are correlated then

Var[y] = Σ_i a_i^2 Var[x_i] + 2 Σ_i Σ_{j>i} a_i a_j Var(x_i, x_j)   (D.11)

where Var(x_i, x_j) denotes the covariance of the random variables x_i and x_j.
Standard Error
As an example, the mean

m = (1/N) Σ_i x_i   (D.12)

of uncorrelated variables x_i has a variance

σ_m^2 = Var(m) = Σ_i (1/N^2) Var(x_i) = σ_x^2 / N   (D.13)

where we have used the substitution a_i = 1/N in equation D.10. Hence

σ_m = σ_x / √N   (D.14)
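A quick Monte Carlo check of equation D.14; this is only a sketch, assuming numpy, with arbitrary example values for σ_x, N and the number of trials.

# Check that the standard error of the mean is sigma_x / sqrt(N), equation D.14.
import numpy as np

rng = np.random.default_rng(1)
sigma_x, N, trials = 2.0, 25, 50_000
means = rng.normal(0.0, sigma_x, size=(trials, N)).mean(axis=1)
print(means.std(), sigma_x / np.sqrt(N))   # both are close to 0.4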
Figure D.1: The Gaussian Probability Density Function with μ = 3 and σ = 2.
D.2 Uniform Distribution
The uniform PDF is given by
U(x; a, b) = 1/(b - a)   (D.15)

for a ≤ x ≤ b and zero otherwise. The mean is 0.5(a + b) and the variance is (b - a)^2/12.
The entropy of a uniform distribution is

H(x) = log(b - a)   (D.16)
D.3 Gaussian Distribution
The Normal or Gaussian probability density function, for the case of a single variable,
is
N(x; μ, σ^2) = (2πσ^2)^{-1/2} exp( -(x - μ)^2 / (2σ^2) )   (D.17)

where μ and σ^2 are the mean and variance.
D.3.1 Entropy
The entropy of a Gaussian variable is
H(x) = (1/2) log σ^2 + (1/2) log 2π + 1/2   (D.18)
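Equation D.18 can be checked against a library implementation; the sketch below assumes scipy and uses arbitrary example values.

# Check the Gaussian entropy formula (D.18) against scipy's built-in entropy.
import numpy as np
from scipy.stats import norm

sigma2 = 4.0
H = 0.5 * np.log(sigma2) + 0.5 * np.log(2 * np.pi) + 0.5
print(H, norm(loc=3.0, scale=np.sqrt(sigma2)).entropy())   # the two values agree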
Figure D.2: The Gamma Density for b = 1.6 and c = 3.125.
For a given variance, the Gaussian distribution has the highest entropy. For a proof
of this see Bishop ([3], page 240).
D.3.2 Relative Entropy
For Normal densities q(x) = N(x; μ_q, σ_q^2) and p(x) = N(x; μ_p, σ_p^2) the KL-divergence is

D[q||p] = (1/2) log(σ_p^2/σ_q^2) + (σ_q^2 + μ_q^2 + μ_p^2 - 2μ_q μ_p) / (2σ_p^2) - 1/2   (D.19)
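Equation D.19 translates directly into code; the sketch below assumes numpy, and kl_gauss is an illustrative name. It also checks that the divergence vanishes when q and p coincide and is positive otherwise.

# KL-divergence between univariate Gaussians, following equation D.19.
import numpy as np

def kl_gauss(mu_q, var_q, mu_p, var_p):
    return (0.5 * np.log(var_p / var_q)
            + (var_q + mu_q**2 + mu_p**2 - 2.0 * mu_q * mu_p) / (2.0 * var_p) - 0.5)

print(kl_gauss(3.0, 2.0, 3.0, 2.0))   # 0.0 when q and p coincide
print(kl_gauss(3.0, 2.0, 0.0, 1.0))   # positive otherwise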
D.4 The Gamma distribution
The Gamma density is defined as

Γ(x; b, c) = (1 / (Γ(c) b^c)) x^{c-1} exp(-x/b)   (D.20)

where Γ() is the gamma function [49]. The mean of a Gamma density is given by bc and the variance by b^2 c. Logs of gamma densities can be written as

log Γ(x; b, c) = -x/b + (c - 1) log x + K   (D.21)

where K is a quantity which does not depend on x; the log of a gamma density comprises a term in x and a term in log x. The Gamma distribution is only defined for positive variables.
D.4.1 Entropy
Using the result for Gamma densities

∫ p(x) log x dx = Ψ(c) + log b   (D.22)

where Ψ() is the digamma function [49], the entropy can be derived as

H(x) = log Γ(c) + c log b - (c - 1)(Ψ(c) + log b) + c   (D.23)
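Equation D.23 can be checked against a library implementation. The sketch below assumes scipy and uses the parameter values of Figure D.2; note that scipy's gamma distribution takes the shape c as its first argument and b as the scale.

# Check the Gamma entropy formula (D.23) against scipy's built-in entropy.
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import gamma

b, c = 1.6, 3.125
H = gammaln(c) + c * np.log(b) - (c - 1.0) * (digamma(c) + np.log(b)) + c
print(H, gamma(a=c, scale=b).entropy())   # the two values agree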
D.4.2 Relative Entropy
For Gamma densities q(x) = Γ(x; b_q, c_q) and p(x) = Γ(x; b_p, c_p) the KL-divergence is

D[q||p] = (c_q - 1)Ψ(c_q) - log b_q - c_q - log Γ(c_q) + log Γ(c_p) + c_p log b_p - (c_p - 1)(Ψ(c_q) + log b_q) + b_q c_q / b_p   (D.24)
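The same special functions give a direct implementation of equation D.24; a sketch assuming scipy, with kl_gamma an illustrative name.

# KL-divergence between Gamma densities, following equation D.24.
import numpy as np
from scipy.special import gammaln, digamma

def kl_gamma(b_q, c_q, b_p, c_p):
    return ((c_q - 1.0) * digamma(c_q) - np.log(b_q) - c_q - gammaln(c_q)
            + gammaln(c_p) + c_p * np.log(b_p)
            - (c_p - 1.0) * (digamma(c_q) + np.log(b_q))
            + b_q * c_q / b_p)

print(kl_gamma(1.6, 3.125, 1.6, 3.125))   # 0.0 when q and p coincide
print(kl_gamma(1.6, 3.125, 2.0, 1.0))     # positive otherwise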
D.5 The χ²-distribution
If z_1, z_2, ..., z_N are independent normally distributed random variables with zero mean and unit variance then

x = Σ_{i=1}^N z_i^2   (D.25)

has a χ²-distribution with N degrees of freedom ([33], page 276). This distribution is a special case of the Gamma distribution with b = 2 and c = N/2. This gives

χ²(x; N) = (1 / (Γ(N/2) 2^{N/2})) x^{N/2 - 1} exp(-x/2)   (D.26)

The mean and variance are N and 2N. The entropy and relative entropy can be found by substituting the values b = 2 and c = N/2 into equations D.23 and D.24. The χ² distribution is only defined for positive variables.
If x is a χ² variable with N degrees of freedom and

y = √x   (D.27)

then y has a χ-density with N degrees of freedom. For N = 3 we have a Maxwell density and for N = 2 a Rayleigh density ([44], page 96).
Figure D.3: The χ² Density for N = 5 degrees of freedom.
D.6 The t-distribution
If z_1, z_2, ..., z_N are independent Normally distributed random variables with mean μ and variance σ^2, and m is the sample mean and s is the sample standard deviation, then

x = (m - μ) / (s/√N)   (D.28)

has a t-distribution with N - 1 degrees of freedom. It is written

t(x; D) = (1 / (B(D/2, 1/2) √D)) (1 + x^2/D)^{-(D+1)/2}   (D.29)

where D is the number of 'degrees of freedom' and

B(a, b) = Γ(a)Γ(b) / Γ(a + b)   (D.30)

is the beta function. For D = 1 the t-distribution reduces to the standard Cauchy distribution ([33], page 281).
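The D = 1 case can be verified numerically. The sketch below assumes scipy and compares equation D.29, coded directly, with the standard Cauchy density.

# For D = 1 the t-distribution (D.29) reduces to the standard Cauchy density.
import numpy as np
from scipy.special import beta
from scipy.stats import cauchy

def t_pdf(x, D):
    return (1.0 + x**2 / D) ** (-(D + 1) / 2.0) / (beta(D / 2.0, 0.5) * np.sqrt(D))

x = np.linspace(-5.0, 5.0, 11)
print(np.max(np.abs(t_pdf(x, 1) - cauchy.pdf(x))))   # effectively zero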
D.7 Generalised Exponential Densities
The 'exponential power' or 'generalised exponential' probability density is defined as

p(a) = G(a; R, β) = (R β^{1/R} / (2 Γ(1/R))) exp(-β|a|^R)   (D.31)
Figure D.4: The t-distribution with (a) N = 3 and (b) N = 49 degrees of freedom.

Figure D.5: The generalised exponential distribution with (a) R = 1, w = 5 and (b) R = 6, w = 5. The parameter R fixes the weight of the tails and w fixes the width of the distribution. For (a) we have a Laplacian which has positive kurtosis (k = 3); heavy tails. For (b) we have a light-tailed distribution with negative kurtosis (k = -1).
where Γ() is the gamma function [49], the mean of the distribution is zero, the width of the distribution is determined by 1/β and the weight of its tails is set by R. This gives rise to a Gaussian distribution for R = 2, a Laplacian for R = 1 and a uniform distribution in the limit R → ∞. The density is equivalently parameterised by a variable w, which defines the width of the distribution, where w = β^{-1/R}, giving

p(a) = (R / (2w Γ(1/R))) exp(-|a/w|^R)   (D.32)
The variance is

V = w^2 Γ(3/R) / Γ(1/R)   (D.33)

which for R = 2 gives V = 0.5 w^2. The kurtosis is given by [7]

K = Γ(5/R) Γ(1/R) / Γ(3/R)^2 - 3   (D.34)

where we have subtracted 3 so that a Gaussian has zero kurtosis. Samples may be generated from the density using a rejection method [59].
For non-zero mean we simply replace a with a - μ, where μ is the mean.
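Reference [59] is not reproduced here, but one workable rejection scheme for R >= 1 uses a Laplace proposal of the same width w, since the target (D.32) then has lighter tails than the proposal. The following is a sketch under those assumptions (numpy and scipy available; sample_gen_exp is an illustrative name), and it checks the sample variance against equation D.33.

# Rejection sampling from the generalised exponential density (D.32), valid for R >= 1.
# Sketch only; uses a Laplace(0, w) proposal, which fails for R < 1 (heavier-tailed target).
import numpy as np
from scipy.special import gamma as gamma_fn

def sample_gen_exp(R, w, n, rng):
    def log_p(a):   # log of the target density (D.32)
        return np.log(R / (2.0 * w * gamma_fn(1.0 / R))) - np.abs(a / w) ** R
    def log_g(a):   # log of the Laplace(0, w) proposal density
        return -np.log(2.0 * w) - np.abs(a) / w
    # The ratio p/g is maximised where d/dt [t/w - (t/w)^R] = 0, i.e. t = w*(1/R)^(1/(R-1)).
    t_star = w * (1.0 / R) ** (1.0 / (R - 1.0)) if R > 1 else 0.0
    log_M = log_p(t_star) - log_g(t_star)          # log of the envelope constant
    out = []
    while len(out) < n:
        a = rng.laplace(0.0, w, size=n)
        u = rng.uniform(size=n)
        keep = u < np.exp(log_p(a) - log_g(a) - log_M)
        out.extend(a[keep])
    return np.array(out[:n])

rng = np.random.default_rng(2)
samples = sample_gen_exp(R=6.0, w=5.0, n=20_000, rng=rng)
print(samples.var(), 5.0**2 * gamma_fn(3.0 / 6.0) / gamma_fn(1.0 / 6.0))   # both near 8.0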
D.8 PDFs for Time Series
Given a signal a = f(t) which is sampled uniformly over a time period T, its PDF, p(a), can be calculated as follows. Because the signal is uniformly sampled we have p(t) = 1/T. The function f(t) acts to transform this density from one over t to one over a. Hence, using the method for transforming PDFs, we get

p(a) = p(t) / |da/dt|   (D.35)

where || denotes the absolute value and the derivative is evaluated at t = f^{-1}(a).
D.8.1 Sampling
When we convert an analogue signal into a digital one the sampling process can have a crucial effect on the resulting density. If, for example, we attempt to sample uniformly but the sampling frequency is a multiple of the signal frequency we are, in effect, sampling non-uniformly. For true uniform sampling it is necessary that the ratio of the sampling and signal frequencies be irrational.
D.8.2 Sine Wave
For a sine wave, a = sin(t), we get

p(a) = p(t) / |cos(t)|   (D.36)

where cos(t) is evaluated at t = sin^{-1}(a). The inverse sine is only defined for -π/2 ≤ t ≤ π/2 and p(t) is uniform within this. Hence, p(t) = 1/π. Therefore

p(a) = 1 / (π √(1 - a^2))   (D.37)

This density is multimodal, having peaks at +1 and -1. For a more general sine wave

a = R sin(wt)   (D.38)

we get p(t) = w/π and

p(a) = 1 / (π R √(1 - (a/R)^2))   (D.39)

which has peaks at ±R.
Figure D.6: The PDF of a = R sin(wt) for R = 3.
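Equation D.39 can be confirmed by histogramming a uniformly sampled sine wave; the sketch below assumes numpy, and the sampling step is chosen to be incommensurate with the period, as required in section D.8.1.

# Histogram a uniformly sampled sine wave and compare with equation D.39.
import numpy as np

R, w = 3.0, 2.0
t = np.arange(0.0, 20_000.0, 0.1 * np.sqrt(2.0))   # step is irrational relative to the period
a = R * np.sin(w * t)
counts, edges = np.histogram(a, bins=40, range=(-2.9, 2.9))
hist = counts / (len(a) * (edges[1] - edges[0]))    # empirical density over all samples
centres = 0.5 * (edges[:-1] + edges[1:])
predicted = 1.0 / (np.pi * R * np.sqrt(1.0 - (centres / R) ** 2))
print(np.max(np.abs(hist - predicted)))             # small: binning error only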
Appendix E
Multivariate Probability Distributions
E.1 Transforming PDFs
Just as univariate Probability Density Functions (PDFs) are transformed so as to preserve area, so multivariate probability distributions are transformed so as to preserve volume. If
y = f (x)
(E.1)
then this can be achieved from
p(y) = p(x) / abs(|J|)   (E.2)

where abs() denotes the absolute value and || the determinant, and

J = [ ∂y_1/∂x_1  ∂y_1/∂x_2  ...  ∂y_1/∂x_d
      ∂y_2/∂x_1  ∂y_2/∂x_2  ...  ∂y_2/∂x_d
      ...
      ∂y_d/∂x_1  ∂y_d/∂x_2  ...  ∂y_d/∂x_d ]   (E.3)

is the Jacobian matrix for d-dimensional vectors x and y. The partial derivatives are evaluated at x = f^{-1}(y). As the determinant of J measures the volume of the transformation, using it as a normalising term therefore preserves the volume under the PDF as desired. See Papoulis [44] for more details.
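A small check of equation E.2 for a linear map y = Fx of two independent standard Normal variables, for which the Jacobian is F itself; a sketch assuming scipy, with an arbitrary example matrix F and test point y.

# Check p(y) = p(x)/abs(|J|) (equation E.2) for the linear map y = F x.
import numpy as np
from scipy.stats import multivariate_normal

F = np.array([[1.0, 1.0], [1.0, -1.0]])                   # Jacobian of y = F x
p_x = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
p_y = multivariate_normal(mean=[0.0, 0.0], cov=F @ F.T)   # exact density of y
y = np.array([0.7, -0.3])
x = np.linalg.solve(F, y)                                  # x = f^{-1}(y)
print(p_y.pdf(y), p_x.pdf(x) / abs(np.linalg.det(F)))      # the two values agree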
E.1.1 Mean and Covariance
For a vector of random variables (Gaussian or otherwise), x, with mean μ_x and covariance Σ_x, a linear transformation

y = Fx + C   (E.4)
Figure E.1: (a) 3D-plot and (b) contour plot of a Multivariate Gaussian PDF with μ = [1, 1]^T, σ_11 = σ_22 = 1 and σ_12 = σ_21 = 0.6, i.e. a positive correlation of r = 0.6.
gives rise to a random vector y with mean

μ_y = F μ_x + C   (E.5)

and covariance

Σ_y = F Σ_x F^T   (E.6)

If we generate another random vector, this time from a different linear transformation of x,

z = Gx + D   (E.7)

then the covariance between the random vectors y and z is given by

Σ_{y,z} = F Σ_x G^T   (E.8)

The (i, j)th entry in this matrix is the covariance between y_i and z_j.
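Equations E.6 and E.8 are easily checked by simulation; a sketch assuming numpy, with arbitrary example values for F, G and the offsets.

# Monte Carlo check of Sigma_y = F Sigma_x F^T (E.6) and Sigma_yz = F Sigma_x G^T (E.8).
import numpy as np

rng = np.random.default_rng(3)
Sigma_x = np.array([[1.0, 0.6], [0.6, 1.0]])
F = np.array([[2.0, 0.5], [0.0, 1.0]])
G = np.array([[1.0, -1.0], [0.3, 0.2]])
x = rng.multivariate_normal([1.0, 1.0], Sigma_x, size=200_000)
y = x @ F.T + np.array([0.5, -0.5])
z = x @ G.T + np.array([1.0, 2.0])
print(np.cov(y.T))                          # close to F Sigma_x F^T
print(F @ Sigma_x @ F.T)
print(np.cov(y.T, z.T)[:2, 2:])             # cross-covariance, close to F Sigma_x G^T
print(F @ Sigma_x @ G.T)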
E.2 The Multivariate Gaussian
The multivariate normal PDF for d variables is
N(x; μ, Σ) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2)(x - μ)^T Σ^{-1} (x - μ) )   (E.9)

where the mean μ is a d-dimensional vector, Σ is a d × d covariance matrix, and |Σ| denotes the determinant of Σ.
E.2.1 Entropy
The entropy is
H(x) = (1/2) log |Σ| + (d/2) log 2π + d/2   (E.10)
E.2.2 Relative Entropy
For Normal densities q(x) = N(x; μ_q, Σ_q) and p(x) = N(x; μ_p, Σ_p) the KL-divergence is

D[q||p] = 0.5 log(|Σ_p|/|Σ_q|) + 0.5 Trace(Σ_p^{-1} Σ_q) + 0.5 (μ_q - μ_p)^T Σ_p^{-1} (μ_q - μ_p) - d/2   (E.11)

where |Σ_p| denotes the determinant of the matrix Σ_p.
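Equation E.11 in code; a sketch assuming numpy, with kl_mvn an illustrative name.

# KL-divergence between multivariate Gaussians, following equation E.11.
import numpy as np

def kl_mvn(mu_q, S_q, mu_p, S_p):
    d = len(mu_q)
    S_p_inv = np.linalg.inv(S_p)
    diff = mu_q - mu_p
    return 0.5 * (np.log(np.linalg.det(S_p) / np.linalg.det(S_q))
                  + np.trace(S_p_inv @ S_q)
                  + diff @ S_p_inv @ diff - d)

S = np.array([[1.0, 0.6], [0.6, 1.0]])
print(kl_mvn(np.array([1.0, 1.0]), S, np.array([1.0, 1.0]), S))            # 0.0
print(kl_mvn(np.array([1.0, 1.0]), S, np.array([0.0, 0.0]), np.eye(2)))    # positive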
E.3 The Multinomial Distribution
If a random variable x can take one of m discrete values x_1, x_2, ..., x_m and

p(x = x_s) = π_s   (E.12)

then x is said to have a multinomial distribution.
E.4 The Dirichlet Distribution
If π = [π_1, π_2, ..., π_m] are the parameters of a multinomial distribution then

q(π) = Γ(α_tot) Π_{s=1}^m [ π_s^{α_s - 1} / Γ(α_s) ]   (E.13)

defines a Dirichlet distribution over these parameters, where

α_tot = Σ_s α_s   (E.14)

The mean value of π_s is α_s/α_tot.
E.4.1 Relative Entropy
For Dirichlet densities q(π) = D(π; α_q) and p(π) = D(π; α_p), where the number of states is m and α_q = [α_q(1), α_q(2), ..., α_q(m)] and α_p = [α_p(1), α_p(2), ..., α_p(m)], the KL-divergence is

D[q||p] = log Γ(α_qtot) - log Γ(α_ptot)
        + Σ_{s=1}^m [ (α_q(s) - 1)(Ψ(α_q(s)) - Ψ(α_qtot)) - log Γ(α_q(s)) ]
        - Σ_{s=1}^m [ (α_p(s) - 1)(Ψ(α_q(s)) - Ψ(α_qtot)) - log Γ(α_p(s)) ]   (E.15)

where

α_qtot = Σ_{s=1}^m α_q(s)
α_ptot = Σ_{s=1}^m α_p(s)   (E.16)

and Ψ() is the digamma function.
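Equation E.15 in code; a sketch assuming scipy, with kl_dirichlet an illustrative name and arbitrary example parameter vectors.

# KL-divergence between Dirichlet densities, following equation E.15.
import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet(alpha_q, alpha_p):
    a_q, a_p = np.asarray(alpha_q, float), np.asarray(alpha_p, float)
    q_tot, p_tot = a_q.sum(), a_p.sum()
    return (gammaln(q_tot) - gammaln(p_tot)
            + np.sum((a_q - 1.0) * (digamma(a_q) - digamma(q_tot)) - gammaln(a_q))
            - np.sum((a_p - 1.0) * (digamma(a_q) - digamma(q_tot)) - gammaln(a_p)))

print(kl_dirichlet([2.0, 3.0, 4.0], [2.0, 3.0, 4.0]))   # 0.0 when q and p coincide
print(kl_dirichlet([2.0, 3.0, 4.0], [1.0, 1.0, 1.0]))   # positive otherwise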