Uncorrelatedness and Independence
Uncorrelatedness: Two random vectors x and y are uncorrelated if

C_xy = E[(x − m_x)(y − m_y)^T] = 0

or, equivalently,

R_xy = E[x y^T] = E[x] E[y^T] = m_x m_y^T
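As a quick numerical illustration (a minimal NumPy sketch; the two vectors, their dimensions and the sample size are made-up choices, not part of the notes), one can estimate C_xy from samples and check that it is close to zero for two independently generated vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                      # number of samples (illustrative choice)

# two independently generated 2-dimensional random vectors
x = rng.normal(size=(N, 2))
y = rng.uniform(-1.0, 1.0, size=(N, 2))

# sample estimate of C_xy = E[(x - m_x)(y - m_y)^T]
xc = x - x.mean(axis=0)
yc = y - y.mean(axis=0)
C_xy = xc.T @ yc / N

print(C_xy)   # entries close to 0 -> x and y are (empirically) uncorrelated
```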
White random vector: This is defined to be a r.v. with zero mean and unit covariance (correlation) matrix:

m_x = 0,   R_x = C_x = I
Example: What will be the mean and covariance of a white r.v. under an orthogonal transform?
Let T denote an orthogonal matrix (i.e. T T^T = T^T T = I). Such a matrix defines an orthogonal transform (i.e. a rotation of the coordinate system, which preserves distances in the space). Thus, define y = Tx.
Hence,

m_y = E[Tx] = T m_x = 0

and

C_y = E[(Tx)(Tx)^T] = T C_x T^T = T T^T = I
Therefore the orthogonal transformation preserves whiteness.
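This can also be checked on samples; the sketch below (an illustrative setup with NumPy, using a random orthogonal matrix obtained from a QR decomposition) rotates white data and verifies that the sample mean stays near zero and the sample covariance stays near the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 200_000

x = rng.standard_normal((N, n))                    # white data: zero mean, unit covariance
T, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix

y = x @ T.T                                        # y = T x applied to each sample

print(y.mean(axis=0))                              # approx. the zero vector
print(np.cov(y, rowvar=False))                     # approx. the identity matrix
```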
Example: Calculate the correlation matrix R_x for x = As + n, where s is a random signal with correlation matrix R_s and the noise vector n has zero mean and is uncorrelated with the signal.
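A short worked sketch of this calculation, under the additional assumption (not stated above) that the noise correlation matrix is denoted R_n:

R_x = E[x x^T] = E[(As + n)(As + n)^T]
    = A E[s s^T] A^T + A E[s n^T] + E[n s^T] A^T + E[n n^T]
    = A R_s A^T + R_n

since the cross terms vanish: n has zero mean and is uncorrelated with s, so E[s n^T] = E[s] E[n^T] = 0.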
Independence: Two random variables x, y are statistically independent if

p_{x,y}(x, y) = p_x(x) p_y(y)

i.e. if the joint pdf of (x, y) factors into the product of their marginal probability distributions p_x and p_y.
From the definition of statistical independence it follows that

E[g(x)h(y)] = E[g(x)] E[h(y)]

where g, h are any absolutely integrable functions. Similarly, for random vectors the definition of statistical independence reads

p_{x,y}(x, y) = p_x(x) p_y(y)

and the property reads

E[g(x)h(y)] = E[g(x)] E[h(y)]
Properties:
• the statistical independence of two r.v.'s implies their uncorrelatedness
• Independence is a stronger property than uncorrelatedness (a numerical illustration of an uncorrelated but dependent pair follows below).
• Only for gaussian variables do uncorrelatedness and independence coincide.
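A standard way to see the gap between the two notions numerically (a minimal sketch; the particular choice x ~ N(0, 1), y = x^2 is just one convenient example, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500_000

x = rng.standard_normal(N)
y = x**2                                  # y is a deterministic function of x

# uncorrelated: cov(x, y) = E[x^3] - E[x]E[x^2] = 0 for a symmetric zero-mean x
print(np.cov(x, y)[0, 1])                 # close to 0

# ...but clearly not independent, e.g. E[x^2 * y] != E[x^2] * E[y]
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))   # approx. 3 vs. 1
```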
Example: Consider the discrete random vector
from our example
 Y \ X |   0      1      2    | p_Y
-------+----------------------+------
   0   |  1/18   1/9    1/6   | 6/18
   1   |  1/9    1/18   1/6   | 6/18
   2   |  1/6    1/9    1/18  | 6/18
-------+----------------------+------
  p_X  |  6/18   5/18   7/18  |

(rows: values of Y; columns: values of X; the last column and last row give the marginals p_Y and p_X)
Are X, Y independent? To check, let's construct a table whose entries are the products of the corresponding marginal probabilities of X and Y.
 Y \ X |      0             1             2
-------+--------------------------------------------
   0   |  6/18 * 6/18   6/18 * 5/18   6/18 * 7/18
   1   |  6/18 * 6/18   6/18 * 5/18   6/18 * 7/18
   2   |  6/18 * 6/18   6/18 * 5/18   6/18 * 7/18
These products do not match the joint probabilities (e.g. 6/18 * 6/18 = 1/9 ≠ 1/18), hence X, Y are not independent. Are they uncorrelated?
E(XY) = Σ_X Σ_Y X * Y * p(X, Y)
      = 0 * 0 * 1/18 + 0 * 1 * 1/9 + 0 * 2 * 1/6
      + 1 * 0 * 1/9 + 1 * 1 * 1/18 + 1 * 2 * 1/9
      + 2 * 0 * 1/6 + 2 * 1 * 1/6 + 2 * 2 * 1/18
      = 15/18
However, E(X) = 19/18 and E(Y) = 1, so E(X)E(Y) = 19/18 ≠ E(XY); hence X, Y are correlated.
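The same arithmetic can be checked directly from the joint table (a minimal NumPy sketch; rows index Y and columns index X, as in the table above; the last line also evaluates the conditional pmf P(Y | X = 1), which is used again later in the notes):

```python
import numpy as np

# joint pmf p(Y=i, X=j): rows are Y = 0,1,2, columns are X = 0,1,2
P = np.array([[1, 2, 3],
              [2, 1, 3],
              [3, 2, 1]]) / 18.0

y_vals = np.array([0, 1, 2])
x_vals = np.array([0, 1, 2])

p_x = P.sum(axis=0)                       # column sums: [6/18, 5/18, 7/18]
p_y = P.sum(axis=1)                       # row sums:    [6/18, 6/18, 6/18]

E_xy = np.sum(np.outer(y_vals, x_vals) * P)    # sum_{i,j} y_i * x_j * p(y_i, x_j)
E_x, E_y = x_vals @ p_x, y_vals @ p_y

print(E_xy, E_x * E_y)                    # 15/18 = 0.833... vs 19/18 = 1.055...
print(E_xy - E_x * E_y)                   # non-zero covariance -> correlated

print(P[:, 1] / p_x[1])                   # P(Y | X = 1) = [2/5, 1/5, 2/5]
```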
Central limit theorem (CLT)
Classical probability is concerned with random variables and sequences of independent, identically distributed (iid) r.v.'s. A very important case is the sequence of partial sums of iid r.v.'s:

x_k = Σ_{i=1}^{k} z_i
Consider the normalised variables

y_k = (x_k − m_{x_k}) / σ_{x_k}

where m_{x_k} and σ_{x_k} are the mean and standard deviation of x_k.
The central limit theorem asserts that the distribution of y_k converges to a normal distribution as k → ∞. An analogous formulation of the CLT holds in the case of random vectors.
• the CLT justifies the use of gaussian variables for modelling random phenomena
• in practice, sums of a relatively small number of r.v.'s already show gaussianity, even if the individual components are not identically distributed (see the numerical sketch below).
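A quick way to see this numerically (a minimal sketch; the uniform summands and k = 10 are arbitrary illustrative choices) is to form normalised partial sums and compare a few sample moments with those of a standard normal:

```python
import numpy as np

rng = np.random.default_rng(3)
k, N = 10, 200_000                         # number of summands, number of repetitions

z = rng.uniform(-1.0, 1.0, size=(N, k))    # iid, clearly non-gaussian summands
x_k = z.sum(axis=1)                        # partial sums

y_k = (x_k - x_k.mean()) / x_k.std()       # normalised sums

# for a standard normal: skewness 0, fourth moment E[y^4] = 3
print(np.mean(y_k**3))                     # close to 0
print(np.mean(y_k**4))                     # close to 3
```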
Conditional probability
Conditional density: Consider random vectors x, y with marginal pdf's p_x(x) and p_y(y), respectively, and a joint pdf p_{x,y}(x, y). The conditional density of x given y is defined as

p_{x|y}(x|y) = p_{x,y}(x, y) / p_y(y)

Similarly, the conditional density of y given x is defined as

p_{y|x}(y|x) = p_{x,y}(x, y) / p_x(x)
The conditional probability distributions allow us to address questions like: 'what is the probability density of a r.v. x given that a random vector y has a fixed value y0?'. For statistically independent r.v.'s the conditional densities equal the respective marginal densities.
Example: Consider the bivariate discrete random vector
 Y \ X |   0      1      2    | p_Y
-------+----------------------+------
   0   |  1/18   1/9    1/6   | 6/18
   1   |  1/9    1/18   1/6   | 6/18
   2   |  1/6    1/9    1/18  | 6/18
-------+----------------------+------
  p_X  |  6/18   5/18   7/18  |
The conditional probability function of Y given
X = 1 is
 Y | X=1    |      0              1              2
------------+-----------------------------------------------
 p(Y | X=1) |  (1/9)/(5/18)   (1/18)/(5/18)   (1/9)/(5/18)
            |     = 2/5           = 1/5           = 2/5
Bayes Rule: From the definitions of the conditional densities we can obtain the following alternative formulas for calculating the joint pdf

p_{x,y}(x, y) = p_{y|x}(y|x) p_x(x) = p_{x|y}(x|y) p_y(y)

From the above follows the so-called Bayes rule for calculating the conditional density of y given x:
p_{y|x}(y|x) = p_{x|y}(x|y) p_y(y) / p_x(x)

where the denominator can be calculated by integration:

p_x(x) = ∫_{−∞}^{∞} p_{x|y}(x|η) p_y(η) dη
Bayes rule allows us to compute the posterior density p_{y|x}(y|x) from the observed vector x, given a known or assumed prior distribution p_y(y).
Conditional expectations
E[g(x, y)|y] = ∫_{−∞}^{∞} g(ξ, y) p_{x|y}(ξ|y) dξ
The conditional expectation is a random variable - it depends on the r.v. y. The following
relationship holds
E[g(x, y)] = E[E[g(x, y)|y]]
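This identity is easy to check by Monte Carlo on a toy model (a sketch; the choices y ~ N(0, 1), x | y ~ N(y, 1) and g(x, y) = x*y are arbitrary; then E[xy | y] = y^2, so both sides equal E[y^2] = 1):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000_000

y = rng.standard_normal(N)
x = y + rng.standard_normal(N)            # x | y ~ N(y, 1)

lhs = np.mean(x * y)                      # E[g(x, y)] with g(x, y) = x * y
rhs = np.mean(y**2)                       # E[E[xy | y]] = E[y * E[x|y]] = E[y^2]

print(lhs, rhs)                           # both approximately 1
```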
The family of multivariate gaussian densities
p_x(x) = 1 / ((2π)^{n/2} (det C_x)^{1/2}) × exp( −(1/2) (x − m_x)^T C_x^{−1} (x − m_x) )
where n is the dimension of x, m_x is the mean and C_x is the covariance matrix of x, which is assumed to be strictly positive definite.
Properties:
• m_x and C_x uniquely define the Gaussian pdf.
• closed under linear transforms - if x is a gaussian random vector then y = Ax is also gaussian, with m_y = A m_x and C_y = A C_x A^T (see the sketch below)
• marginals and conditionals are gaussian
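The second property is easy to verify on samples (a minimal NumPy sketch; the particular m_x, C_x and A below are made-up numbers chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 200_000

m_x = np.array([1.0, -2.0])
C_x = np.array([[2.0, 0.6],
                [0.6, 1.0]])
A = np.array([[1.0, 1.0],
              [0.5, -1.0]])

x = rng.multivariate_normal(m_x, C_x, size=N)
y = x @ A.T                                   # y = A x for each sample

print(y.mean(axis=0), A @ m_x)                # sample mean vs. A m_x
print(np.cov(y, rowvar=False))                # approx. A C_x A^T
print(A @ C_x @ A.T)
```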
Uncorrelatedness and geometric structure: If the covariance matrix C_x of the multidimensional gaussian density is not diagonal, then the components of x are not independent. C_x is a symmetric and positive definite matrix, hence it can be represented as

C_x = E D E^T = Σ_{i=1}^{n} λ_i e_i e_i^T

where E is an orthogonal matrix containing the eigenvectors of C_x as its columns and D = diag(λ_1, λ_2, . . . , λ_n) is a diagonal matrix containing the corresponding eigenvalues of C_x.
The transform

u = E^T (x − m_x)

rotates the data so that the components of u are uncorrelated and hence (being gaussian) independent.
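A sketch of this decorrelating rotation on sample data (NumPy; the correlated test data below are generated just for illustration, and the eigendecomposition is taken from the sample covariance):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000

# generate correlated gaussian data with a non-diagonal covariance
C_x = np.array([[3.0, 1.2],
                [1.2, 1.0]])
m_x = np.array([2.0, -1.0])
x = rng.multivariate_normal(m_x, C_x, size=N)

C_hat = np.cov(x, rowvar=False)               # sample covariance
lam, E = np.linalg.eigh(C_hat)                # C_hat = E diag(lam) E^T

u = (x - x.mean(axis=0)) @ E                  # u = E^T (x - m_x) for each sample

print(np.cov(u, rowvar=False))                # approx. diag(lam): off-diagonals ~ 0
```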
The cross-section of the gaussian pdf at a constant value of the density is a hyper-ellipsoid

(x − m_x)^T C_x^{−1} (x − m_x) = c

centered at the mean, with axes parallel to the eigenvectors of C_x and the eigenvalues being the corresponding variances.
Higher-order Statistics
Consider a scalar r.v. x with a probability density function p_x(x). The j-th moment of x is

α_j = E[x^j] = ∫_{−∞}^{∞} ξ^j p_x(ξ) dξ

and the j-th central moment of x is

μ_j = E[(x − α_1)^j] = ∫_{−∞}^{∞} (ξ − m_x)^j p_x(ξ) dξ
Skewness and Kurtosis: The third central moment, called skewness, provides a measure of the asymmetry of the pdf. The fourth-order statistic, called kurtosis, indicates the non-gaussianity of a r.v. For a zero-mean r.v. it is defined as

kurt(x) = E[x^4] − 3 (E[x^2])^2

Distributions with negative kurtosis are called subgaussian (usually flatter than the Gaussian, or multimodal). Distributions with positive kurtosis are called supergaussian (usually more sharply peaked than the Gaussian, with longer tails).
Properties of kurtosis (checked numerically in the sketch below):
• for two statistically independent r.v.'s x, y: kurt(x + y) = kurt(x) + kurt(y)
• for any scalar a: kurt(ax) = a^4 kurt(x)
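Both the sign convention and the additivity property can be checked on samples (a minimal NumPy sketch; the Laplacian, uniform and Gaussian samples below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 1_000_000

def kurt(x):
    """kurt(x) = E[x^4] - 3 (E[x^2])^2 for (approximately) zero-mean samples."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2)**2

print(kurt(rng.laplace(size=N)))              # > 0: supergaussian (heavy tails)
print(kurt(rng.uniform(-1, 1, size=N)))       # < 0: subgaussian (flat)
print(kurt(rng.standard_normal(N)))           # approx. 0 for a gaussian

x, y = rng.laplace(size=N), rng.uniform(-1, 1, size=N)
print(kurt(x + y), kurt(x) + kurt(y))         # approximately equal (independent x, y)
```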
Example: The Laplacian density has the pdf

p_x(x) = (λ/2) exp(−λ |x|)
Example: The exponential family of pdf's (with zero mean) contains the Gaussian, Laplacian and uniform pdf's as special cases:

p_x(x) = C exp( −|x|^ν / (ν E[|x|^ν]) )

i.e. for ν = 2 the above pdf is equivalent to the Gaussian pdf

p_x(x) = C exp( −|x|^2 / (2 E[|x|^2]) ) = C exp( −x^2 / (2 σ_x^2) )

ν = 1 gives the Laplacian pdf and ν → ∞ yields the uniform pdf.