Multivariate Analysis (Slides 2)
• In today’s class we will cover some important preliminary material that is
needed before we look at many of the multivariate analysis methods.
• This material will include topics that you are likely to have seen in courses
in probability and linear algebra.
Notation: observational unit
• A vector of observed values of each of m variables for observational unit i
can be denoted as a column vector:


      x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{im} \end{pmatrix}

i.e., x_{ij} represents the value of variable j for observational unit i.
• Observations will generally be denoted by lower case letters.
• For univariate data m = 1.
Notation: multivariate data
• Multivariate data sets are generally denoted by a data matrix X with n
rows and m columns (n = total number of observational units, m = number
of variables).

      X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}

• The ith row of X is therefore x_i^T.
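• As an aside (not part of the original slides), this layout maps directly onto a
NumPy array; the toy values and variable names below are purely illustrative:

    import numpy as np

    # Toy data matrix X: n = 4 observational units (rows), m = 3 variables (columns).
    X = np.array([[1.2, 3.4, 0.7],
                  [2.1, 2.9, 1.5],
                  [0.8, 3.1, 0.9],
                  [1.7, 2.5, 1.1]])

    n, m = X.shape     # n = 4, m = 3
    x_2 = X[1, :]      # the 2nd row of X, i.e. the observation vector x_2
    x_23 = X[1, 2]     # x_{23}: the value of variable 3 for observational unit 2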
Random variables
• A random variable (r.v.) is a mapping which assigns a real number to each
outcome of a variable.
• Say our variable of interest is ‘gender’. The outcomes of this variable are
therefore ‘male’ and ‘female’. The random variable X assigns a number to
these outcomes, i.e.,

 1
if female
X=
 0
if male.
• A random variable whose value is unknown will generally be denoted by an
upper case letter.
Expectation of a Random Variable
• If X is a discrete random variable with probability mass function P(X),
then the expected value of X (also known as its mean) is defined as
      µ = E[X] = \sum_x x P(X = x).
• If X is a continuous random variable with probability density function f(x)
defined on the space R, then
      µ = E[X] = \int_{-\infty}^{\infty} x f(x) \, dx.
• The expectation of a function u(X) is given by:
a) µ = E[u(X)] = \sum_x u(x) P(X = x) for a discrete r.v.
b) µ = E[u(X)] = \int_{-\infty}^{\infty} u(x) f(x) \, dx for a continuous r.v.
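• A small numerical illustration (an assumed example, not from the slides): the
discrete formula can be evaluated directly, here for a fair six-sided die.

    # E[X] = sum over x of x * P(X = x), for a fair six-sided die (illustrative).
    outcomes = [1, 2, 3, 4, 5, 6]
    pmf = {x: 1 / 6 for x in outcomes}

    mu = sum(x * pmf[x] for x in outcomes)           # E[X] = 3.5
    mu_u = sum(x ** 2 * pmf[x] for x in outcomes)    # E[u(X)] with u(x) = x^2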
Variance, Covariance and Correlation
• Let’s consider random variables X_1, X_2, ..., X_m.
• The variance of X_i is defined to be
      Var[X_i] = E[(X_i − E[X_i])^2] = E[X_i^2] − (E[X_i])^2.
• The covariance of X_i and X_j is defined to be
      Cov[X_i, X_j] = E[(X_i − E[X_i])(X_j − E[X_j])].
• The correlation of X_i and X_j is defined to be
      Cor[X_i, X_j] = Cov[X_i, X_j] / \sqrt{Var[X_i] Var[X_j]}.
• Correlation is hence a ‘normalized’ form of covariance; the two are equal
when the random variables have unit variance.
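• As a sketch (not part of the slides), these quantities can be estimated from
simulated data; the construction of x2 below is an arbitrary choice made only
to induce dependence on x1:

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=10_000)
    x2 = 0.5 * x1 + rng.normal(size=10_000)     # dependent on x1 by construction

    cov_12 = np.cov(x1, x2)[0, 1]               # sample covariance of X1 and X2
    cor_12 = cov_12 / np.sqrt(np.var(x1, ddof=1) * np.var(x2, ddof=1))
    # cor_12 matches np.corrcoef(x1, x2)[0, 1]: correlation is normalized covariance.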
Covariance Matrix
• Frequently, it is convenient to record the variances and covariances of a set of
random variables X = (X_1, X_2, ..., X_m)^T using a matrix


      Σ = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mm} \end{pmatrix},

where s_{ij} = Cov[X_i, X_j] and s_{ii} = Var[X_i].
• We call this matrix a covariance matrix.
• Σ = Cov[X] = Var[X] (the former notation, Cov[X], is the one usually used).
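• With data arranged as in the data matrix X of the earlier slide, a sample
version of Σ can be computed in one call; this is an illustrative sketch with
simulated data, not part of the slides:

    import numpy as np

    X = np.random.default_rng(1).normal(size=(100, 3))   # toy data: n = 100, m = 3
    Sigma_hat = np.cov(X, rowvar=False)   # m x m sample covariance matrix
    # Entry (i, j) estimates s_ij = Cov[X_i, X_j]; the diagonal holds the variances.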
Independence
• Two random variables X_1 and X_2 are said to be independent if and only if
      P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) P(X_2 = x_2)
for all values x_1 and x_2.
• For two independent random variables X_1 and X_2:
      E[X_1 X_2] = E[X_1] E[X_2]
• Proof: Exercise (refer to the definition).
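• An empirical illustration (an assumed example, not from the slides) of the
result above:

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.normal(loc=1.0, size=100_000)
    x2 = rng.normal(loc=2.0, size=100_000)   # generated independently of x1

    lhs = np.mean(x1 * x2)              # estimate of E[X1 X2]
    rhs = np.mean(x1) * np.mean(x2)     # estimate of E[X1] E[X2]
    # lhs ≈ rhs up to Monte Carlo error, as independence predicts.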
Linear Combinations
• Suppose that a and b are constants and the random variable X has
expected value µ and variance σ^2, then
a) E[aX + b] = aE[X] + b = aµ + b
b) Var[aX + b] = a^2 Var[X] = a^2 σ^2.
• Let X_1 and X_2 denote two independent random variables with respective
means µ_1 and µ_2 and variances σ_1^2 and σ_2^2. If a_1 and a_2 are constants, then
c) E[a_1 X_1 + a_2 X_2] = a_1 E[X_1] + a_2 E[X_2] = a_1 µ_1 + a_2 µ_2.
d) Var[a_1 X_1 + a_2 X_2] = a_1^2 Var[X_1] + a_2^2 Var[X_2] = a_1^2 σ_1^2 + a_2^2 σ_2^2.
• If X_1 and X_2 are not independent then c) above still holds but d) is
replaced by
e) Var[a_1 X_1 + a_2 X_2] = a_1^2 Var[X_1] + a_2^2 Var[X_2] + 2 a_1 a_2 Cov[X_1, X_2].
• Proofs: Exercise (return to the definitions).
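• Rule e) can be checked by simulation; the sketch below uses arbitrary constants
and an arbitrary dependence between X1 and X2 (an illustration, not part of the
slides):

    import numpy as np

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=100_000)
    x2 = 0.6 * x1 + rng.normal(size=100_000)   # X1 and X2 are not independent
    a1, a2 = 3.0, -2.0

    lhs = np.var(a1 * x1 + a2 * x2, ddof=1)
    rhs = (a1 ** 2 * np.var(x1, ddof=1) + a2 ** 2 * np.var(x2, ddof=1)
           + 2 * a1 * a2 * np.cov(x1, x2)[0, 1])
    # lhs and rhs agree up to sampling error.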
Linear Combinations for Covariance
Suppose that a, b, c and d are constants and X, Y, W and Z are random
variables with non-zero variance, then
a) Cov[aX + b, cY + d] = ac Cov[X, Y]
b) Cov[aX + bY, cW + dZ] = ac Cov[X, W] + ad Cov[X, Z] + bc Cov[Y, W] + bd Cov[Y, Z].
• Proofs: Exercise (as with the rest, manipulation of algebra from the
definitions).
Matrix representation: Expected Value
• Similar ideas follow in the more general a_1 X_1 + · · · + a_m X_m case.
• Remember E[a_1 X_1 + a_2 X_2 + · · · + a_m X_m] = a_1 µ_1 + a_2 µ_2 + · · · + a_m µ_m.
• Let a = (a_1, a_2, ..., a_m)^T be a vector of constants and
X = (X_1, X_2, ..., X_m)^T be a vector of random variables.
• Then we can write

      a_1 X_1 + a_2 X_2 + · · · + a_m X_m = (a_1, a_2, ..., a_m) \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = a^T X.

• Hence we can write E[a^T X] = a^T µ, where µ = (µ_1, µ_2, ..., µ_m)^T.
Matrix representation: Variance
• Similarly Var[a_1 X_1 + a_2 X_2 + · · · + a_m X_m] = Var[a^T X], and

      Var[a^T X] = a_1^2 Var[X_1] + a_2^2 Var[X_2] + · · · + a_m^2 Var[X_m]
                     + 2 a_1 a_2 Cov[X_1, X_2] + · · · + 2 a_1 a_m Cov[X_1, X_m]
                     + · · · + 2 a_{m-1} a_m Cov[X_{m-1}, X_m]
                 = \sum_{i=1}^m a_i^2 Var[X_i] + \sum_{i=1}^m \sum_{j \ne i} a_i a_j Cov[X_i, X_j]
                 = \sum_{i=1}^m a_i^2 s_{ii} + \sum_{i=1}^m \sum_{j \ne i} a_i a_j s_{ij}
                 = a^T Σ a.
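• The identity Var[a^T X] = a^T Σ a (together with E[a^T X] = a^T µ from the
previous slide) can be checked numerically; the µ, Σ and a below are arbitrary
illustrative choices, not values from the slides:

    import numpy as np

    rng = np.random.default_rng(4)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[4.0, 1.0],
                      [1.0, 2.0]])
    X = rng.multivariate_normal(mean=mu, cov=Sigma, size=200_000)
    a = np.array([2.0, -1.0])

    mean_check = (np.mean(X @ a), a @ mu)                 # E[a^T X] vs a^T mu
    var_check = (np.var(X @ a, ddof=1), a @ Sigma @ a)    # Var[a^T X] vs a^T Sigma a
    # Each pair agrees up to sampling error.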
Matrix representation: Covariance
• Suppose that U = a^T X and V = b^T X. Then

      Cov[U, V] = \sum_{i=1}^m a_i b_i s_{ii} + \sum_{i=1}^m \sum_{j \ne i} a_i b_j s_{ij}.

• In matrix notation,
      Cov[U, V] = a^T Σ b = b^T Σ a.
• Proof: Exercise
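• A matching sketch for the covariance identity, again with arbitrary
illustrative vectors a and b (not from the slides):

    import numpy as np

    rng = np.random.default_rng(5)
    Sigma = np.array([[4.0, 1.0],
                      [1.0, 2.0]])
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)
    a = np.array([2.0, -1.0])
    b = np.array([0.5, 1.5])

    cov_empirical = np.cov(X @ a, X @ b)[0, 1]   # sample Cov[U, V], U = a^T X, V = b^T X
    cov_formula = a @ Sigma @ b                  # equals b^T Sigma a since Sigma is symmetric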
Example
• Let E[X_1] = 2, Var[X_1] = 4, E[X_2] = 0, Var[X_2] = 1 and Cor[X_1, X_2] = 1/3.
• Exercise:
– What is the expected value and variance of X_1 + X_2?
– What is the expected value and variance of X_1 − X_2?
Eigenvalues and Eigenvectors
• Suppose that we have an m × m matrix A.
• Definition: λ is an eigenvalue of A if there exists a non-zero vector v such
that
Av = λv.
• The vector v is said to be an eigenvector of A corresponding to the
eigenvalue λ.
• We can find eigenvalues by solving the equation,
det(A − λI) = 0.
Finding Eigenvalues
• If v is an eigenvector of A with eigenvalue λ, then Av − λIv = 0.
• Hence (A − λI)v = 0.
• If the inverse (A − λI)^{-1} exists, then only the trivial solution v = 0 is
obtained.
• Since an eigenvector must be non-zero, we need a non-trivial solution; this
requires that (A − λI) has no inverse, and hence
      det(A − λI) = 0.
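• The determinant condition can also be used computationally: for a 2 × 2 matrix,
det(A − λI) = λ^2 − trace(A) λ + det(A), and the sketch below (with an arbitrarily
chosen A, an illustration rather than part of the slides) finds its roots.

    import numpy as np

    A = np.array([[3.0, 1.0],
                  [0.0, 2.0]])                        # arbitrary 2 x 2 example

    char_poly = [1.0, -np.trace(A), np.linalg.det(A)]  # coefficients of lambda^2, lambda, 1
    eigenvalues = np.roots(char_poly)                  # roots of det(A - lambda I) = 0
    # This agrees with np.linalg.eigvals(A), which handles the general m x m case.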
Picture
• The eigenvectors are only scaled by the matrix A.
[Figure: two panels showing vectors a and b before (x) and after (Ax) multiplication by A.]
Picture II
• Other vectors are rotated and scaled by the matrix A.
[Figure: as before, vectors a and b shown before (x) and after (Ax) multiplication by A.]
Unit Eigenvectors
• Property: If v is an eigenvector corresponding to eigenvalue λ, then
u = αv will also be an eigenvector corresponding to λ, for any non-zero scalar α.
• Definition: The eigenvector v is a unit eigenvector if \sum_{i=1}^m v_i^2 = v^T v = 1.
• We can turn any eigenvector v into a unit eigenvector by multiplying it by
the value α = 1 / \sqrt{v^T v}.
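• In code this normalization is a single step; the vector below is an arbitrary
illustration, not taken from the slides:

    import numpy as np

    v = np.array([1.0, 2.0, 2.0])      # any eigenvector (illustrative values)
    u = v / np.sqrt(v @ v)             # multiply by alpha = 1 / sqrt(v^T v)
    # u has the same direction as v and satisfies u^T u = 1.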
Orthogonal and Orthonormal Vectors
• Definition: Two vectors u and v are orthogonal if u^T v = 0 = v^T u.
• Definition: Two vectors u and v are orthonormal if they are orthogonal
and u^T u = 1 and v^T v = 1.
Eigenvalues of Covariance Matrices
• Fact: The eigenvalues of a covariance matrix Σ are non-negative.
• If λ is an eigenvalue of Σ, then
Σv = λv,
where v is an eigenvector corresponding to λ.
• Hence,
      v^T Σ v = v^T (λv) = λ v^T v
      ⇒ λ = (v^T Σ v) / (v^T v).
• The numerator and denominator are both non-negative (why?), so λ must
be non-negative.
Eigenvectors of Covariance Matrices
• Fact: An m × m covariance matrix Σ has m orthonormal eigenvectors.
• Proof: Omitted, but it uses the Spectral Decomposition (or a similar theorem)
for a symmetric matrix.
• This result will be used when developing principal components analysis.
Example
• Suppose that we have the matrix

      A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}.
• Question: What are the eigenvalues and corresponding eigenvectors of A?
Example
• Answer: The eigenvalues are 5.83 and 0.17 (2 d.p.) and the corresponding
eigenvectors are

      \begin{pmatrix} 0.38 \\ 0.92 \end{pmatrix}  and  \begin{pmatrix} 0.92 \\ -0.38 \end{pmatrix}.
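• As a quick check (a sketch, not part of the slides), the same values can be
obtained numerically with NumPy; eigh is used here because A is symmetric:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 5.0]])

    vals, vecs = np.linalg.eigh(A)     # unit eigenvectors are the columns of vecs
    # vals ≈ [0.17, 5.83] (ascending order);
    # vecs[:, 1] ≈ ±(0.38, 0.92)^T and vecs[:, 0] ≈ ±(0.92, -0.38)^T.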
Example: Frog Cranial Measurements
• Example: The cranial length and cranial breadth of 35 female frogs are
believed to have expected value (23, 24)^T and covariance matrix

      \begin{pmatrix} 17.7 & 20.3 \\ 20.3 & 24.4 \end{pmatrix}.
• Question: What are the eigenvalues and eigenvectors of the covariance
matrix?
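• One way to answer the question numerically (a sketch; the output is left for
you to run and compare with your hand calculation):

    import numpy as np

    # Covariance matrix of (cranial length, cranial breadth) from the slide.
    Sigma = np.array([[17.7, 20.3],
                      [20.3, 24.4]])

    vals, vecs = np.linalg.eigh(Sigma)   # non-negative eigenvalues, orthonormal eigenvectors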