Review of Gaussian random variables
If x is a Gaussian random variable (with zero mean), then its probability
distribution function is given by
    P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}.                                   (1)

Note that

    \int_{-\infty}^{\infty} P(x)\, dx = 1.                                                       (2)
(If x were Gaussian with non-zero mean, you could remove the mean by redefining x ← x − ⟨x⟩, and then put it back again later.) The moments are easy to compute, and they all depend only on the standard deviation σ. All the odd moments vanish,

    \langle x^{2m+1} \rangle = 0,                                                                (3)

and the even moments are easily computed by (for example) taking the derivatives of (1) with respect to 1/2σ². One finds that ⟨x²⟩ = σ², ⟨x⁴⟩ = 3σ⁴, etc.
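As a quick numerical check, these moment values are easy to confirm by Monte Carlo sampling; the short Python/numpy sketch below (with an arbitrary choice of σ and sample size) verifies that the odd moments vanish and that ⟨x²⟩ = σ², ⟨x⁴⟩ = 3σ⁴.

```python
# Monte Carlo check of the Gaussian moments quoted above; sigma is an arbitrary choice.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
x = rng.normal(0.0, sigma, size=5_000_000)

print(np.mean(x**3))                 # odd moment, should be close to 0
print(np.mean(x**2), sigma**2)       # <x^2> = sigma^2
print(np.mean(x**4), 3 * sigma**4)   # <x^4> = 3 sigma^4
```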
Why be Gaussian? First of all, many things are. For example, the single-point statistics of velocity in turbulent flow are found to be approximately Gaussian. This is often considered to be a manifestation of the Central Limit Theorem, which we prove below. Second, Gaussian distributions are easy to handle mathematically. Many statistical theories of turbulence can be viewed as expansions about Gaussianity.
For any (not necessarily Gaussian) probability distribution P(x), the characteristic function is defined as

    \phi(\omega) = \int_{-\infty}^{\infty} P(x)\, e^{i\omega x}\, dx.                            (4)
Thus φ(ω) is the Fourier transform of P(x). It follows that

    \phi'(\omega)  = \int_{-\infty}^{\infty} ix\, P(x)\, e^{i\omega x}\, dx,    \qquad \phi'(0)  = i\langle x \rangle,
    \phi''(\omega) = \int_{-\infty}^{\infty} (ix)^2\, P(x)\, e^{i\omega x}\, dx, \qquad \phi''(0) = -\langle x^2 \rangle,    (5)

etc. Thus, knowledge of the probability distribution is equivalent to knowledge of all its moments. For the Gaussian distribution (1) we obtain

    \phi(\omega) = e^{-\sigma^2 \omega^2 / 2}.                                                   (6)
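As a quick check of (4) and (6), the sketch below evaluates the Fourier integral numerically on a grid and compares it with e^{-σ²ω²/2}; the value of σ, the grid, and the sample frequencies are arbitrary choices.

```python
# Numerical evaluation of the characteristic function (4) for a zero-mean Gaussian,
# compared with the closed form (6); sigma and the grid are arbitrary choices.
import numpy as np

sigma = 0.7
x = np.linspace(-10 * sigma, 10 * sigma, 20001)
dx = x[1] - x[0]
P = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

for omega in (0.0, 0.5, 1.0, 2.0):
    phi = np.sum(P * np.exp(1j * omega * x)) * dx   # Riemann sum for equation (4)
    print(omega, phi.real, np.exp(-sigma**2 * omega**2 / 2))
```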
The characteristic function is often used to do proofs. As an example, we will prove the Central Limit Theorem. Let x_1, x_2, ..., x_n be n independent but not necessarily Gaussian random variables, all having zero mean and the same distribution function P_1(x). Consider the random variable defined as their average:

    x = \frac{1}{n}\,(x_1 + x_2 + \cdots + x_n).                                                 (7)

The Central Limit Theorem says that, as n → ∞, x becomes Gaussian even though the x_i are not.
Let P(x) be the unknown distribution of x, and let φ(ω) be its characteristic function. Then

    \phi(\omega) = \int P(x)\, e^{i\omega x}\, dx
                 = \int \!\cdots\! \int P(x_1, x_2, \ldots, x_n)\, \exp\!\left[\frac{i\omega}{n}(x_1 + x_2 + \cdots + x_n)\right] dx_1\, dx_2 \cdots dx_n
                 = \int \!\cdots\! \int P_1(x_1) P_1(x_2) \cdots P_1(x_n)\, \exp\!\left[\frac{i\omega}{n}(x_1 + x_2 + \cdots + x_n)\right] dx_1\, dx_2 \cdots dx_n
                 = \int P_1(x_1)\, e^{i\omega x_1/n}\, dx_1 \int P_1(x_2)\, e^{i\omega x_2/n}\, dx_2 \cdots \int P_1(x_n)\, e^{i\omega x_n/n}\, dx_n
                 = \left[\phi_1\!\left(\frac{\omega}{n}\right)\right]^n,                         (8)
where φ_1(ω) is the Fourier transform of P_1(x). Since

    \phi_1(\omega) = 1 + 0 - \frac{\sigma_1^2}{2}\,\omega^2 + \cdots,                            (9)

where σ_1² is the variance of any of the x_i, we have

    \phi_1\!\left(\frac{\omega}{n}\right) = 1 - \frac{\sigma_1^2\,\omega^2}{2 n^2} + \cdots.     (10)
Thus, by (8), as n → ∞,

    \phi(\omega) \to \left(1 - \frac{\sigma_1^2\,\omega^2}{2 n^2}\right)^{\!n}.                  (11)

Since, as n → ∞,

    \left(1 + \frac{s}{n}\right)^{\!n} \to e^{s},                                                (12)

we find that

    \phi(\omega) \to e^{-\sigma_1^2 \omega^2 / 2n},                                              (13)

and therefore (cf. (6)) x is Gaussian with standard deviation σ = σ_1/√n.
QED
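The theorem is also easy to illustrate numerically. In the sketch below the x_i are drawn from a uniform distribution (clearly non-Gaussian), and for moderately large n the average x already has variance close to σ_1²/n and the Gaussian flatness ⟨x⁴⟩/⟨x²⟩² ≈ 3; the values of n, the number of trials, and the uniform range are arbitrary choices.

```python
# Illustration of the Central Limit Theorem: average n independent uniform
# (non-Gaussian) variables and check the Gaussian moment relations.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 200_000
xi = rng.uniform(-1.0, 1.0, size=(trials, n))   # zero mean, variance sigma_1^2 = 1/3
x = xi.mean(axis=1)                             # equation (7)

sigma1_sq = 1.0 / 3.0
print(x.var(), sigma1_sq / n)                   # variance should approach sigma_1^2 / n
print(np.mean(x**4) / x.var()**2)               # flatness should approach 3 (Gaussian value)
```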
Now let x_1, x_2, ..., x_n be n random variables with probability distribution function

    P(x_1, x_2, \ldots, x_n) = C \exp\!\left(-\frac{1}{2}\sum_{ij} A_{ij} x_i x_j\right).        (14)
The x_1, x_2, ..., x_n are said to be jointly Gaussian. With no loss in generality we can assume that A_ij is symmetric. Then the matrix A has real eigenvalues and orthogonal eigenvectors (which can be made orthonormal). That is,

    A e^{(i)} = \lambda^{(i)} e^{(i)},                                                           (15)

where

    \left(e^{(i)}\right)^{T} e^{(j)} = \delta_{ij},                                              (16)
and the T means transpose. It is possible to transform from the original variables x_1, x_2, ..., x_n to new variables y_1, y_2, ..., y_n in which the quadratic form

    Q = \sum_{ij} A_{ij} x_i x_j                                                                 (17)

is diagonal. The transformation is given by

    x_i = \sum_j y_j\, e_i^{(j)} = \sum_j U_{ij}\, y_j.                                          (18)
Thus y_j is the amplitude of the eigenvector e^(j) in an expansion of the column vector (x_1, x_2, ..., x_n)^T in terms of the eigenvectors. We write this in matrix notation as

    x = U y,                                                                                     (19)

with the understanding that x and y are n-dimensional column vectors. The matrix U is defined by

    U = \left( e^{(1)}\; e^{(2)}\; \cdots\; e^{(n)} \right).                                     (20)
Thus the column vectors of U are the eigenvectors of A. By the orthonormality of the column vectors, we have

    U^{T} U = I.                                                                                 (21)

That is, the transpose of U is equal to its inverse. Real matrices with this property are called orthogonal (unitary, in the complex case).
Now we compute

    Q = x^{T} A\, x
      = (U y)^{T} A\, (U y)
      = y^{T} U^{T} A U\, y
      = y^{T} D\, y
      = \sum_i \lambda^{(i)} y_i^2,                                                              (22)

where

    U^{T} A U = D \equiv \mathrm{diag}(\lambda^{(1)}, \lambda^{(2)}, \ldots, \lambda^{(n)}).     (23)
That is, D is the diagonal matrix with the eigenvalues as its diagonal components.
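For a concrete (arbitrarily chosen) symmetric positive-definite matrix A, the diagonalization in (15)-(23) can be carried out with numpy: eigh returns the eigenvalues λ^(i) and the matrix U whose columns are the orthonormal eigenvectors.

```python
# Diagonalization of an arbitrary symmetric positive-definite matrix A with numpy.
import numpy as np

A = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.2],
              [0.1, 0.2, 1.0]])

lam, U = np.linalg.eigh(A)   # eigenvalues lambda^(i) and eigenvector matrix U
D = np.diag(lam)

print(np.allclose(U.T @ U, np.eye(3)))   # equation (21): U^T U = I
print(np.allclose(U.T @ A @ U, D))       # equation (23): U^T A U = D
```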
The probability distribution of y_1, y_2, ..., y_n takes the form

    P(y_1, y_2, \ldots, y_n) = C \exp\!\left(-\frac{1}{2}\sum_i \lambda^{(i)} y_i^2\right).      (24)

Since this factors into functions of each y_i, we easily compute the normalization constant

    C = \frac{\left(\lambda^{(1)} \lambda^{(2)} \cdots \lambda^{(n)}\right)^{1/2}}{(2\pi)^{n/2}}    (25)
and the covariances

    \langle y_i y_j \rangle = \frac{1}{\lambda^{(i)}}\, \delta_{ij}.                             (26)

In matrix notation,

    \langle y\, y^{T} \rangle = D^{-1}.                                                          (27)
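A small numerical check of (25) and (26) is sketched below for n = 2 with arbitrarily chosen eigenvalues: the constant C normalizes (24) to unit total probability, and independent samples of the y_i have the diagonal covariance D^{-1}.

```python
# Check of the normalization constant (25) and the covariance (26) for n = 2,
# with arbitrarily chosen eigenvalues lam.
import numpy as np

lam = np.array([0.8, 2.5])
C = np.sqrt(lam.prod()) / (2 * np.pi) ** (len(lam) / 2)   # equation (25)

# Brute-force integral of (24) over a wide grid; the result should be close to 1.
y = np.linspace(-10.0, 10.0, 801)
dy = y[1] - y[0]
Y1, Y2 = np.meshgrid(y, y)
P = C * np.exp(-0.5 * (lam[0] * Y1**2 + lam[1] * Y2**2))
print(P.sum() * dy * dy)

# Monte Carlo check of <y_i y_j> = delta_ij / lambda^(i), equation (26).
rng = np.random.default_rng(2)
ys = rng.normal(0.0, 1.0 / np.sqrt(lam), size=(500_000, 2))
print(np.cov(ys, rowvar=False))   # approximately diag(1/lam) = D^{-1}
print(1.0 / lam)
```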
What are the moments of our original variables x_i? By the linearity of the transformation from y to x, we have that

    \langle x_i \rangle = \langle x_i x_j x_k \rangle = 0.                                       (28)
In fact, all the odd moments vanish. For the second moments we have, using (27),

    \langle x\, x^{T} \rangle = \langle U y\, (U y)^{T} \rangle
                              = \langle U\, y y^{T} U^{T} \rangle
                              = U D^{-1} U^{T}.                                                  (29)

But by (23) and (21), this is

    \langle x\, x^{T} \rangle = U \left(U^{T} A U\right)^{-1} U^{T} = A^{-1}.                    (30)

That is,

    \langle x_i x_j \rangle = \left(A^{-1}\right)_{ij}.                                          (31)
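This result can also be checked by sampling: draw the independent y_i with variances 1/λ^(i) as in (26), form x = Uy as in (19), and compare the sample covariance of x with A^{-1}. The matrix A below is the same arbitrary symmetric positive-definite example used above.

```python
# Sampling check of equation (31): the covariance of x = U y is A^{-1}.
import numpy as np

A = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
lam, U = np.linalg.eigh(A)

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0 / np.sqrt(lam), size=(500_000, 3))   # <y_i y_j> = delta_ij / lambda^(i)
x = y @ U.T                                                  # each row is x = U y for one sample

print(np.cov(x, rowvar=False))    # approximately A^{-1}
print(np.linalg.inv(A))
```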
Finally we consider ⟨x_i x_j x_k x_m⟩. We have

    \langle x_i x_j x_k x_m \rangle = \sum_{rspq} \langle U_{ir} y_r\, U_{js} y_s\, U_{kp} y_p\, U_{mq} y_q \rangle
                                    = \sum_{rspq} U_{ir} U_{js} U_{kp} U_{mq}\, \langle y_r y_s y_p y_q \rangle,         (32)

where the summation convention is in effect. But we know that the y_i are independent random variables. Thus

    \langle y_r y_s y_p y_q \rangle = \langle y_r y_s \rangle \delta_{rs}\, \langle y_p y_q \rangle \delta_{pq}
                                    + \langle y_r y_p \rangle \delta_{rp}\, \langle y_s y_q \rangle \delta_{sq}
                                    + \langle y_r y_q \rangle \delta_{rq}\, \langle y_s y_p \rangle \delta_{sp}          (33)

(which is true as a special case when r = s = p = q). Substituting this back into (32), we conclude that

    \langle x_i x_j x_k x_m \rangle = \langle x_i x_j \rangle \langle x_k x_m \rangle
                                    + \langle x_i x_k \rangle \langle x_j x_m \rangle
                                    + \langle x_i x_m \rangle \langle x_j x_k \rangle.           (34)
A similar factorization rule applies to all the higher even moments, but we
shall find particular use for (34).
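The factorization (34) is easy to verify by Monte Carlo for any particular choice of indices; the sketch below uses jointly Gaussian samples with covariance A^{-1}, where A is again an arbitrary symmetric positive-definite example.

```python
# Monte Carlo check of the factorization rule (34) for one choice of indices.
import numpy as np

A = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
cov = np.linalg.inv(A)                      # <x_i x_j> = (A^{-1})_ij, equation (31)

rng = np.random.default_rng(4)
x = rng.multivariate_normal(np.zeros(3), cov, size=1_000_000)

i, j, k, m = 0, 1, 0, 2
lhs = np.mean(x[:, i] * x[:, j] * x[:, k] * x[:, m])
rhs = cov[i, j] * cov[k, m] + cov[i, k] * cov[j, m] + cov[i, m] * cov[j, k]
print(lhs, rhs)   # should agree to Monte Carlo accuracy
```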