Multivariate Analysis (Slides 2)

• In today's class we will look at some important preliminary material that we need to cover before we look at many multivariate analysis methods.
• This material will include topics that you are likely to have seen in courses in probability and linear algebra.

Notation: observational unit

• A vector of observed values of each of $m$ variables for observational unit $i$ can be denoted as a column vector:
$$x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{im} \end{pmatrix},$$
i.e., $x_{ij}$ represents the value of variable $j$ for observational unit $i$.
• Observations will generally be denoted by lower case letters.
• For univariate data $m = 1$.

Notation: multivariate data

• Multivariate data sets are generally denoted by a data matrix $X$ with $n$ rows and $m$ columns ($n$ = total number of observational units, $m$ = number of variables):
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
• The $i$th row of $X$ is therefore $x_i^T$.

Random variables

• A random variable (r.v.) is a mapping which assigns a real number to each outcome of a variable.
• Say our variable of interest is 'gender'. The outcomes of this variable are therefore 'male' and 'female'. The random variable $X$ assigns a number to these outcomes, i.e.,
$$X = \begin{cases} 1 & \text{if female} \\ 0 & \text{if male.} \end{cases}$$
• A random variable whose value is unknown will generally be denoted by an upper case letter.

Expectation of a Random Variable

• If $X$ is a discrete random variable with probability mass function $P(X)$, then the expected value of $X$ (also known as its mean) is defined as $\mu = E[X] = \sum_x x P(X = x)$.
• If $X$ is a continuous random variable with probability density function $f(x)$ defined on the space $\mathbb{R}$, then $\mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$.
• The expectation of a function $u(X)$ is given by:
a) $E[u(X)] = \sum_x u(x) P(X = x)$ for a discrete r.v.
b) $E[u(X)] = \int_{-\infty}^{\infty} u(x) f(x)\,dx$ for a continuous r.v.

Variance, Covariance and Correlation

• Let's consider random variables $X_1, X_2, \ldots, X_m$.
• The variance of $X_i$ is defined to be $\mathrm{Var}[X_i] = E[(X_i - E[X_i])^2] = E[X_i^2] - (E[X_i])^2$.
• The covariance of $X_i$ and $X_j$ is defined to be $\mathrm{Cov}[X_i, X_j] = E[(X_i - E[X_i])(X_j - E[X_j])]$.
• The correlation of $X_i$ and $X_j$ is defined to be
$$\mathrm{Cor}[X_i, X_j] = \frac{\mathrm{Cov}[X_i, X_j]}{\sqrt{\mathrm{Var}[X_i]\,\mathrm{Var}[X_j]}}.$$
• Correlation is hence a 'normalized' form of covariance; the two are equal when both random variables have unit variance.

Covariance Matrix

• Frequently, it is convenient to record the variance and covariance of a set of random variables $X = (X_1, X_2, \ldots, X_m)$ using a matrix
$$\Sigma = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ s_{m1} & s_{m2} & \cdots & s_{mm} \end{pmatrix},$$
where $s_{ij} = \mathrm{Cov}[X_i, X_j]$ and $s_{ii} = \mathrm{Var}[X_i]$.
• We call this matrix a covariance matrix.
• $\Sigma = \mathrm{Cov}[X] = \mathrm{Var}[X]$ (the former notation, $\mathrm{Cov}[X]$, is the one usually used).

Independence

• Two random variables $X_1$ and $X_2$ are said to be independent if and only if, for all values $x_1$ and $x_2$,
$$P(X_1 = x_1, X_2 = x_2) = P(X_1 = x_1) P(X_2 = x_2).$$
• For two independent random variables $X_1$ and $X_2$: $E[X_1 X_2] = E[X_1] E[X_2]$.
• Proof: Exercise (refer to the definition).

Linear Combinations

• Suppose that $a$ and $b$ are constants and the random variable $X$ has expected value $\mu$ and variance $\sigma^2$, then
a) $E[aX + b] = aE[X] + b = a\mu + b$
b) $\mathrm{Var}[aX + b] = a^2 \mathrm{Var}[X] = a^2 \sigma^2$.
• Let $X_1$ and $X_2$ denote two independent random variables with respective means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. If $a_1$ and $a_2$ are constants, then
c) $E[a_1 X_1 + a_2 X_2] = a_1 E[X_1] + a_2 E[X_2] = a_1 \mu_1 + a_2 \mu_2$.
d) $\mathrm{Var}[a_1 X_1 + a_2 X_2] = a_1^2 \mathrm{Var}[X_1] + a_2^2 \mathrm{Var}[X_2] = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2$.
• If $X_1$ and $X_2$ are not independent then c) above still holds but d) is replaced by
e) $\mathrm{Var}[a_1 X_1 + a_2 X_2] = a_1^2 \mathrm{Var}[X_1] + a_2^2 \mathrm{Var}[X_2] + 2 a_1 a_2 \mathrm{Cov}[X_1, X_2]$.
• Proofs: Exercise (return to the definitions).
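
Rules c) and e) can be checked numerically on a small example. The sketch below is only an illustration, not part of the original slides: the joint probability mass function `pmf`, the constants `a1` and `a2`, and the helper names `expect`, `var` and `cov` are all made up for the purpose. It uses exact fractions so that both sides can be compared with `==`.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of two dependent 0/1 random variables (X1, X2).
# Keys are outcomes (x1, x2); values are probabilities that sum to 1.
pmf = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(2, 10), (1, 1): F(4, 10)}


def expect(g):
    """E[g(X1, X2)]: probability-weighted sum over all outcomes."""
    return sum(p * g(x1, x2) for (x1, x2), p in pmf.items())


def var(g):
    """Var[g] = E[g^2] - (E[g])^2."""
    return expect(lambda x1, x2: g(x1, x2) ** 2) - expect(g) ** 2


def cov(g, h):
    """Cov[g, h] = E[g h] - E[g] E[h], equivalent to the definition."""
    return expect(lambda x1, x2: g(x1, x2) * h(x1, x2)) - expect(g) * expect(h)


a1, a2 = 2, -3  # arbitrary constants chosen for the illustration


def first(x1, x2):
    return x1


def second(x1, x2):
    return x2


def combo(x1, x2):
    return a1 * x1 + a2 * x2


# Rule c): expectation is linear whether or not X1 and X2 are independent.
mean_lhs = expect(combo)
mean_rhs = a1 * expect(first) + a2 * expect(second)

# Rule e): the variance picks up a covariance term when X1, X2 are dependent.
var_lhs = var(combo)
var_rhs = a1**2 * var(first) + a2**2 * var(second) + 2 * a1 * a2 * cov(first, second)

print(mean_lhs == mean_rhs, var_lhs == var_rhs)
```

Because the covariance of this particular `pmf` is not zero, dropping the last term of `var_rhs` (that is, using rule d) instead of e)) would make the variance comparison fail.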

Linear Combinations for Covariance

• Suppose that $a$, $b$, $c$ and $d$ are constants and $X$, $Y$, $W$ and $Z$ are random variables with non-zero variance, then
a) $\mathrm{Cov}[aX + b, cY + d] = ac\,\mathrm{Cov}[X, Y]$
b) $\mathrm{Cov}[aX + bY, cW + dZ] = ac\,\mathrm{Cov}[X, W] + ad\,\mathrm{Cov}[X, Z] + bc\,\mathrm{Cov}[Y, W] + bd\,\mathrm{Cov}[Y, Z]$.
• Proofs: Exercise (as with the rest, manipulation of algebra from the definitions).

Matrix representation: Expected Value

• Similar ideas follow in the more general $a_1 X_1 + \cdots + a_m X_m$ case.
• Remember $E[a_1 X_1 + a_2 X_2 + \cdots + a_m X_m] = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_m \mu_m$.
• Let $a = (a_1, a_2, \ldots, a_m)^T$ be a vector of constants and $X = (X_1, X_2, \ldots, X_m)^T$ be a vector of random variables.
• Then we can write
$$a_1 X_1 + a_2 X_2 + \cdots + a_m X_m = (a_1, a_2, \cdots, a_m) \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} = a^T X.$$
• Hence we can write $E[a^T X] = a^T \mu$, where $\mu = (\mu_1, \mu_2, \ldots, \mu_m)^T$.

Matrix representation: Variance

• Similarly $\mathrm{Var}[a_1 X_1 + a_2 X_2 + \cdots + a_m X_m] = \mathrm{Var}[a^T X]$, and
$$\begin{aligned}
\mathrm{Var}[a^T X] &= a_1^2 \mathrm{Var}[X_1] + a_2^2 \mathrm{Var}[X_2] + \cdots + a_m^2 \mathrm{Var}[X_m] \\
&\quad + a_1 a_2 \mathrm{Cov}[X_1, X_2] + \cdots + a_1 a_m \mathrm{Cov}[X_1, X_m] + \cdots + a_m a_{m-1} \mathrm{Cov}[X_m, X_{m-1}] \\
&= \sum_{i=1}^{m} a_i^2 \mathrm{Var}[X_i] + \sum_{i=1}^{m} \sum_{j \neq i} a_i a_j \mathrm{Cov}[X_i, X_j] \\
&= \sum_{i=1}^{m} a_i^2 s_{ii} + \sum_{i=1}^{m} \sum_{j \neq i} a_i a_j s_{ij} \\
&= a^T \Sigma a.
\end{aligned}$$
• The cross terms run over all ordered pairs $i \neq j$, so each covariance appears twice; for $m = 2$ this gives the $2 a_1 a_2 \mathrm{Cov}[X_1, X_2]$ term seen earlier.

Matrix representation: Covariance

• Suppose that $U = a^T X$ and $V = b^T X$. Then
$$\mathrm{Cov}[U, V] = \sum_{i=1}^{m} a_i b_i s_{ii} + \sum_{i=1}^{m} \sum_{j \neq i} a_i b_j s_{ij}.$$
• In matrix notation, $\mathrm{Cov}[U, V] = a^T \Sigma b = b^T \Sigma a$.
• Proof: Exercise

Example

• Let $E[X_1] = 2$, $\mathrm{Var}[X_1] = 4$, $E[X_2] = 0$, $\mathrm{Var}[X_2] = 1$ and $\mathrm{Cor}[X_1, X_2] = 1/3$.
• Exercise:
– What are the expected value and variance of $X_1 + X_2$?
– What are the expected value and variance of $X_1 - X_2$?

Eigenvalues and Eigenvectors

• Suppose that we have an $m \times m$ matrix $A$.
• Definition: $\lambda$ is an eigenvalue of $A$ if there exists a non-zero vector $v$ such that $Av = \lambda v$.
• The vector $v$ is said to be an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.
• We can find eigenvalues by solving the equation $\det(A - \lambda I) = 0$.
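
For a $2 \times 2$ matrix the equation $\det(A - \lambda I) = 0$ expands to the quadratic $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A) = 0$, so the eigenvalues can be found with the quadratic formula. The following is a minimal sketch of that idea, assuming real eigenvalues; the function names `eigenvalues_2x2` and `matvec` and the test matrix are illustrative choices, not taken from the slides.

```python
import math


def eigenvalues_2x2(A):
    """Real eigenvalues of a 2x2 matrix, largest first.

    Solves det(A - lambda*I) = lambda^2 - trace*lambda + det = 0
    with the quadratic formula.
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace**2 - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex; this sketch handles real ones only")
    root = math.sqrt(disc)
    return (trace + root) / 2, (trace - root) / 2


def matvec(A, v):
    """Matrix-vector product Av for a 2x2 matrix and a length-2 vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


# Hypothetical matrix chosen so that the eigenvalues come out as whole numbers.
A = [[2.0, 1.0], [1.0, 2.0]]
lam_big, lam_small = eigenvalues_2x2(A)

# Check the definition Av = lambda*v for one eigenvector of each eigenvalue.
v_big, v_small = [1.0, 1.0], [1.0, -1.0]
print(lam_big, matvec(A, v_big))      # Av equals lam_big * v_big
print(lam_small, matvec(A, v_small))  # Av equals lam_small * v_small
```

For larger matrices the characteristic polynomial has degree $m$ and is rarely solved by hand; numerical libraries use iterative methods instead.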

Finding Eigenvalues

• If $v$ is an eigenvector of $A$ with eigenvalue $\lambda$, then $Av - \lambda I v = 0$.
• Hence $(A - \lambda I)v = 0$.
• If there exists an inverse $(A - \lambda I)^{-1}$ then only the trivial solution $v = 0$ is obtained.
• An eigenvector must be non-zero, so for a non-trivial solution to exist there can be no inverse, and hence $\det(A - \lambda I) = 0$.

Picture

• The eigenvectors are only scaled by the matrix $A$.
[Figure: two panels titled "x" and "Ax", showing the vectors a and b before and after multiplication by A; axes x[1, ], x[2, ] and y[1, ], y[2, ].]

Picture II

• Other vectors are rotated and scaled by the matrix $A$.
[Figure: two panels titled "x" and "Ax", showing the vectors a and b before and after multiplication by A; axes x[1, ], x[2, ] and y[1, ], y[2, ].]

Unit Eigenvectors

• Property: If $v$ is an eigenvector corresponding to eigenvalue $\lambda$, then $u = \alpha v$ will also be an eigenvector corresponding to $\lambda$, for any non-zero scalar $\alpha$.
• Definition: The eigenvector $v$ is a unit eigenvector if $\sum_{i=1}^{m} v_i^2 = v^T v = 1$.
• We can turn any eigenvector $v$ into a unit eigenvector by multiplying it by the value $\alpha = \frac{1}{\sqrt{v^T v}}$.

Orthogonal and Orthonormal Vectors

• Definition: Two vectors $u$ and $v$ are orthogonal if $u^T v = 0 = v^T u$.
• Definition: Two vectors $u$ and $v$ are orthonormal if they are orthogonal and $u^T u = 1$ and $v^T v = 1$.

Eigenvalues of Covariance Matrices

• Fact: The eigenvalues of a covariance matrix $\Sigma$ are non-negative.
• If $\lambda$ is an eigenvalue of $\Sigma$, then $\Sigma v = \lambda v$, where $v$ is an eigenvector corresponding to $\lambda$.
• Hence,
$$v^T \Sigma v = v^T \lambda v = \lambda v^T v \;\Rightarrow\; \lambda = \frac{v^T \Sigma v}{v^T v}.$$
• The numerator and denominator are both non-negative (why?), so $\lambda$ must be non-negative.

Eigenvectors of Covariance Matrices

• Fact: An $m \times m$ covariance matrix $\Sigma$ has $m$ orthonormal eigenvectors.
• Proof omitted, but it uses the Spectral Decomposition (or a similar theorem) of a symmetric matrix.
• This result will be used when developing principal components analysis.

Example

• Suppose that we have the matrix
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix}.$$
• Question: What are the eigenvalues and corresponding eigenvectors of $A$?

Example

• Answer: The eigenvalues are 5.83 and 0.17 (to 2 d.p.) and the corresponding eigenvectors are
$$\begin{pmatrix} 0.38 \\ 0.92 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0.92 \\ -0.38 \end{pmatrix}.$$

Example: Frog Cranial Measurements

• Example: The cranial length and cranial breadth of 35 female frogs are believed to have expected value $(23, 24)^T$ and covariance matrix
$$\begin{pmatrix} 17.7 & 20.3 \\ 20.3 & 24.4 \end{pmatrix}.$$
• Question: What are the eigenvalues and eigenvectors of the covariance matrix?
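
Both examples involve symmetric $2 \times 2$ matrices, for which the eigenvalues and unit eigenvectors can be computed by hand or with a few lines of code. The sketch below is one possible way to do this, not the method used in the course: the function name `symmetric_2x2_eigen` is made up, and it assumes the off-diagonal entry is non-zero so that $(b, \lambda - a)^T$ is a valid eigenvector. It reproduces the answer for $A$ given above and can be run on the frog covariance matrix to check a hand calculation.

```python
import math


def symmetric_2x2_eigen(S):
    """Return [(eigenvalue, unit eigenvector), ...] for S = [[a, b], [b, d]], largest first.

    Assumes b != 0, so that v = (b, lam - a) is a non-zero solution
    of (S - lam*I)v = 0.
    """
    (a, b), (b_check, d) = S
    if b != b_check or b == 0:
        raise ValueError("expected a symmetric matrix with a non-zero off-diagonal entry")
    trace, det = a + d, a * d - b * b
    # The discriminant equals (a - d)^2 + 4*b^2, so it is never negative here.
    root = math.sqrt(trace**2 - 4 * det)
    pairs = []
    for lam in ((trace + root) / 2, (trace - root) / 2):
        v = (b, lam - a)  # satisfies the first row of (S - lam*I)v = 0
        alpha = 1 / math.sqrt(v[0] ** 2 + v[1] ** 2)  # alpha = 1 / sqrt(v^T v)
        pairs.append((lam, (alpha * v[0], alpha * v[1])))
    return pairs


# The matrix A from the example slide.
A = [[1.0, 2.0], [2.0, 5.0]]
pairs_A = symmetric_2x2_eigen(A)
for lam, u in pairs_A:
    print(round(lam, 2), tuple(round(x, 2) for x in u))

# The frog covariance matrix; run this to check a hand calculation.
frog = [[17.7, 20.3], [20.3, 24.4]]
pairs_frog = symmetric_2x2_eigen(frog)
for lam, u in pairs_frog:
    print(lam, u)
```

The two unit eigenvectors returned for each matrix are orthogonal to each other, in line with the fact about covariance matrices stated earlier, and both frog eigenvalues come out positive.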