Multivariate Analysis (Slides 2)

• In today's class we will look at some important preliminary material that we need to cover before we look at many multivariate analysis methods.
• This material will include topics that you are likely to have seen in courses in probability and linear algebra.

Notation: observational unit

• A vector of observed values of each of m variables for observational unit i can be denoted as a column vector
  x_i = (x_i1, x_i2, ..., x_im)^T,
  i.e., x_ij represents the value of variable j for observational unit i.
• Observations will generally be denoted by lower case letters.
• For univariate data, m = 1.

Notation: multivariate data

• Multivariate data sets are generally denoted by a data matrix X with n rows and m columns (n = total number of observational units, m = number of variables):
  X = [ x_11  x_12  ...  x_1m
        x_21  x_22  ...  x_2m
         ...   ...  ...   ...
        x_n1  x_n2  ...  x_nm ]
• The ith row of X is therefore x_i^T.

Random variables

• A random variable (r.v.) is a mapping which assigns a real number to each outcome of a variable.
• Say our variable of interest is 'gender'. The outcomes of this variable are 'male' and 'female'. The random variable X assigns a number to these outcomes, i.e., X = 1 if female and X = 0 if male.
• A random variable whose value is unknown will generally be denoted by an upper case letter.

Expectation of a Random Variable

• If X is a discrete random variable with probability mass function P(X), then the expected value of X (also known as its mean) is defined as μ = E[X] = Σ_x x P(X = x).
• If X is a continuous random variable with probability density function f(x) defined on ℝ, then μ = E[X] = ∫_{-∞}^{∞} x f(x) dx.
• The expectation of a function u(X) is given by:
  a) μ = E[u(X)] = Σ_x u(x) P(X = x) for a discrete r.v.
  b) μ = E[u(X)] = ∫_{-∞}^{∞} u(x) f(x) dx for a continuous r.v.

Variance, Covariance and Correlation

• Let's consider random variables X_1, X_2, ..., X_m.
• The variance of X_i is defined to be Var[X_i] = E[(X_i − E[X_i])²] = E[X_i²] − (E[X_i])².
• The covariance of X_i and X_j is defined to be Cov[X_i, X_j] = E[(X_i − E[X_i])(X_j − E[X_j])].
• The correlation of X_i and X_j is defined to be Cor[X_i, X_j] = Cov[X_i, X_j] / √(Var[X_i] Var[X_j]).
• Correlation is hence a 'normalized' form of covariance; the two are equal when the random variables have unit variance.

Covariance Matrix

• Frequently, it is convenient to record the variances and covariances of a set of random variables X = (X_1, X_2, ..., X_m)^T using a matrix
  Σ = [ s_11  s_12  ...  s_1m
        s_21  s_22  ...  s_2m
         ...   ...  ...   ...
        s_m1  s_m2  ...  s_mm ],
  where s_ij = Cov[X_i, X_j] and s_ii = Var[X_i].
• We call this matrix a covariance matrix.
• Σ = Cov[X] = Var[X] (usually referred to by the former notation).

Independence

• Two random variables X_1 and X_2 are said to be independent if and only if
  P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) P(X_2 = x_2).
• For two independent random variables X_1 and X_2: E[X_1 X_2] = E[X_1] E[X_2].
• Proof: Exercise (refer to the definition).

Linear Combinations

• Suppose that a and b are constants and the random variable X has expected value μ and variance σ²; then
  a) E[aX + b] = aE[X] + b = aμ + b
  b) Var[aX + b] = a²Var[X] = a²σ².
• Let X_1 and X_2 denote two independent random variables with respective means μ_1 and μ_2 and variances σ_1² and σ_2². If a_1 and a_2 are constants, then
  c) E[a_1 X_1 + a_2 X_2] = a_1 E[X_1] + a_2 E[X_2] = a_1 μ_1 + a_2 μ_2
  d) Var[a_1 X_1 + a_2 X_2] = a_1² Var[X_1] + a_2² Var[X_2] = a_1² σ_1² + a_2² σ_2².
• If X_1 and X_2 are not independent, then c) above still holds but d) is replaced by
  e) Var[a_1 X_1 + a_2 X_2] = a_1² Var[X_1] + a_2² Var[X_2] + 2 a_1 a_2 Cov[X_1, X_2].
• Proofs: Exercise (return to the definitions); a numerical check of rules a)–d) is sketched below.
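The linear-combination rules above can be checked by simulation. The following is a minimal numpy sketch, not part of the original slides; the normal distributions, means, variances, constants and sample size are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the linear-combination rules a)-d).
rng = np.random.default_rng(0)
n = 1_000_000

mu1, sigma1 = 2.0, 3.0     # X1 ~ Normal(mu1, sigma1^2)
mu2, sigma2 = -1.0, 0.5    # X2 ~ Normal(mu2, sigma2^2), independent of X1
a1, a2, b = 1.5, -2.0, 4.0

x1 = rng.normal(mu1, sigma1, n)
x2 = rng.normal(mu2, sigma2, n)

# a), b): E[aX + b] = a*mu + b   and   Var[aX + b] = a^2 * sigma^2
y = a1 * x1 + b
print(y.mean(), a1 * mu1 + b)           # sample mean  vs  a*mu + b
print(y.var(), a1**2 * sigma1**2)       # sample variance  vs  a^2 * sigma^2

# c), d): independent X1 and X2
z = a1 * x1 + a2 * x2
print(z.mean(), a1 * mu1 + a2 * mu2)
print(z.var(), a1**2 * sigma1**2 + a2**2 * sigma2**2)
```

With a million draws the sample quantities agree with the formulas to two or three decimal places.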
Linear Combinations for Covariance

• Suppose that a, b, c and d are constants and X, Y, W and Z are random variables with non-zero variance; then
  a) Cov[aX + b, cY + d] = ac Cov[X, Y]
  b) Cov[aX + bY, cW + dZ] = ac Cov[X, W] + ad Cov[X, Z] + bc Cov[Y, W] + bd Cov[Y, Z].
• Proofs: Exercise (as with the rest, manipulation of algebra from the definitions).

Matrix representation: Expected Value

• Similar ideas follow in the more general case a_1 X_1 + ... + a_m X_m.
• Remember E[a_1 X_1 + a_2 X_2 + ... + a_m X_m] = a_1 μ_1 + a_2 μ_2 + ... + a_m μ_m.
• Let a = (a_1, a_2, ..., a_m)^T be a vector of constants and X = (X_1, X_2, ..., X_m)^T be a vector of random variables.
• Then we can write
  a_1 X_1 + a_2 X_2 + ... + a_m X_m = (a_1, a_2, ..., a_m)(X_1, X_2, ..., X_m)^T = a^T X.
• Hence we can write E[a^T X] = a^T μ, where μ = (μ_1, μ_2, ..., μ_m)^T.

Matrix representation: Variance

• Similarly, Var[a_1 X_1 + a_2 X_2 + ... + a_m X_m] = Var[a^T X], and
  Var[a^T X] = a_1² Var[X_1] + a_2² Var[X_2] + ... + a_m² Var[X_m]
               + a_1 a_2 Cov[X_1, X_2] + ... + a_1 a_m Cov[X_1, X_m] + ... + a_{m-1} a_m Cov[X_{m-1}, X_m]
             = Σ_i a_i² Var[X_i] + Σ_i Σ_{j≠i} a_i a_j Cov[X_i, X_j]
             = Σ_i a_i² s_ii + Σ_i Σ_{j≠i} a_i a_j s_ij
             = a^T Σ a.

Matrix representation: Covariance

• Suppose that U = a^T X and V = b^T X. Then
  Cov[U, V] = Σ_i a_i b_i s_ii + Σ_i Σ_{j≠i} a_i b_j s_ij.
• In matrix notation, Cov[U, V] = a^T Σ b = b^T Σ a.
• Proof: Exercise (a numerical check of these matrix identities is sketched below, after the eigenvalue slides).

Example

• Let E[X_1] = 2, Var[X_1] = 4, E[X_2] = 0, Var[X_2] = 1 and Cor[X_1, X_2] = 1/3.
• Exercise:
  – What is the expected value and variance of X_1 + X_2?
  – What is the expected value and variance of X_1 − X_2?

Eigenvalues and Eigenvectors

• Suppose that we have an m × m matrix A.
• Definition: λ is an eigenvalue of A if there exists a non-zero vector v such that Av = λv.
• The vector v is said to be an eigenvector of A corresponding to the eigenvalue λ.
• We can find eigenvalues by solving the equation det(A − λI) = 0.

Finding Eigenvalues

• If v is an eigenvector of A with eigenvalue λ, then Av − λIv = 0.
• Hence (A − λI)v = 0.
• If the inverse (A − λI)^{-1} exists, then only the trivial solution v = 0 is obtained.
• For a non-trivial solution to exist, (A − λI) must have no inverse, and hence det(A − λI) = 0.

Picture

• The eigenvectors are only scaled by the matrix A.
• [Figure: a vector x and its image Ax plotted side by side; Ax lies along the same direction as x, only stretched.]

Picture II

• Other vectors are rotated and scaled by the matrix A.
• [Figure: a non-eigenvector x and its image Ax; Ax points in a different direction from x as well as having a different length.]

Unit Eigenvectors

• Property: If v is an eigenvector corresponding to eigenvalue λ, then u = αv (α ≠ 0) will also be an eigenvector corresponding to λ.
• Definition: The eigenvector v is a unit eigenvector if Σ_i v_i² = v^T v = 1.
• We can turn any eigenvector v into a unit eigenvector by multiplying it by the value α = 1/√(v^T v).

Orthogonal and Orthonormal Vectors

• Definition: Two vectors u and v are orthogonal if u^T v = 0 = v^T u.
• Definition: Two vectors u and v are orthonormal if they are orthogonal and u^T u = 1 and v^T v = 1.

Eigenvalues of Covariance Matrices

• Fact: The eigenvalues of a covariance matrix Σ are non-negative.
• If λ is an eigenvalue of Σ, then Σv = λv, where v is an eigenvector corresponding to λ.
• Hence v^T Σ v = v^T λ v = λ v^T v, so λ = (v^T Σ v) / (v^T v).
• The numerator and denominator are both non-negative (why?), so λ must be non-negative.
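The identities Var[a^T X] = a^T Σ a and Cov[a^T X, b^T X] = a^T Σ b, and the non-negativity of the eigenvalues of a covariance matrix, can all be checked numerically. The following is a minimal numpy sketch, not part of the original slides; the matrix Σ, the vectors a and b, and the sample size are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of Var[a^T X] = a^T Sigma a, Cov[a^T X, b^T X] = a^T Sigma b,
# and of the fact that a covariance matrix has non-negative eigenvalues.
rng = np.random.default_rng(1)

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.5, 1.0, -1.0])

# Draw X ~ N(0, Sigma) and compare sample quantities with the matrix formulas.
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=500_000)
u, v = X @ a, X @ b

print(u.var(), a @ Sigma @ a)              # Var[a^T X]        vs  a^T Sigma a
print(np.cov(u, v)[0, 1], a @ Sigma @ b)   # Cov[a^T X, b^T X]  vs  a^T Sigma b

# Eigenvalues of a covariance matrix are non-negative.
print(np.linalg.eigvalsh(Sigma))           # all entries >= 0
```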
Eigenvectors of Covariance Matrices

• Fact: An m × m covariance matrix Σ has m orthonormal eigenvectors.
• Proof omitted, but it uses the spectral decomposition (or a similar theorem) of a symmetric matrix.
• This result will be used when developing principal components analysis.

Example

• Suppose that we have the matrix
  A = [ 1  2
        2  5 ].
• Question: What are the eigenvalues and corresponding eigenvectors of A?

Example

• Answer: The eigenvalues are 5.83 and 0.17 (2 d.p.) and the corresponding eigenvectors are (0.38, 0.92)^T and (0.92, −0.38)^T.

Example: Frog Cranial Measurements

• Example: The cranial length and cranial breadth of 35 female frogs are believed to have expected value (23, 24)^T and covariance matrix
  [ 17.7  20.3
    20.3  24.4 ].
• Question: What are the eigenvalues and eigenvectors of the covariance matrix?
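Both 2 × 2 examples can be checked with a symmetric eigensolver. A minimal numpy sketch, not part of the original slides:

```python
import numpy as np

# Eigen-decomposition of the 2x2 example matrix A from the slides.
# np.linalg.eigh (rather than np.linalg.eig) is used because A is symmetric,
# which guarantees real eigenvalues and orthonormal eigenvectors.
A = np.array([[1.0, 2.0],
              [2.0, 5.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigh returns eigenvalues in ascending order
print(eigenvalues)     # approx [0.17, 5.83]
print(eigenvectors)    # columns are the unit eigenvectors (signs may be flipped)

# The same call answers the frog question: replace A with the covariance matrix
# [[17.7, 20.3], [20.3, 24.4]].
```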