Multivariate Normal Distribution
Lecture 4
July 21, 2011
Advanced Multivariate Statistical Methods
ICPSR Summer Session #2
Lecture #4 - 7/21/2011, Slide 1 of 41

Outline: Overview, MVN, MVN Properties, MVN Parameters, MVN Likelihood Functions, MVN in Common Methods, Assessing Normality, Wrapping Up

Last Time

■ Matrices and vectors
  ◆ Eigenvalues
  ◆ Eigenvectors
  ◆ Determinants
■ Basic descriptive statistics using matrices:
  ◆ Mean vectors
  ◆ Covariance matrices
  ◆ Correlation matrices

Today's Lecture

■ Putting our new knowledge to use with a useful statistical distribution: the multivariate normal distribution
■ This roughly maps onto Chapter 4 of Johnson and Wichern

Multivariate Normal Distribution

■ The generalization of the univariate normal distribution to multiple variables is called the multivariate normal distribution (MVN)
■ Many multivariate techniques rely on this distribution in some manner
■ Although real data may never come from a true MVN, the MVN provides a robust approximation and has many nice mathematical properties
■ Furthermore, because of the central limit theorem, many multivariate statistics converge to the MVN distribution as the sample size increases

Univariate Normal Distribution

■ The univariate normal density function is:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-[(x-\mu)/\sigma]^2/2}$$

■ The mean is $\mu$
■ The variance is $\sigma^2$
■ The standard deviation is $\sigma$
■ Standard notation for normal distributions is $N(\mu, \sigma^2)$, which will be extended for the MVN distribution
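The density formula above can be evaluated directly. The following is a minimal sketch in plain Python, not part of the original lecture; the function names `normal_pdf` and `prob_within` and the parameter values are illustrative assumptions.

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) at x, following the formula term by term."""
    z = (x - mu) / math.sqrt(sigma2)          # distance from the mean in SD units
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi * sigma2)

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal X, via the error function."""
    return math.erf(k / math.sqrt(2.0))

# Illustrative parameter choices: a larger variance lowers the peak,
# while a different mean only shifts the curve.
peak_standard = normal_pdf(0.0, 0.0, 1.0)
peak_wide = normal_pdf(0.0, 0.0, 2.0)
peak_shifted = normal_pdf(1.75, 1.75, 1.0)

print(round(peak_standard, 4), round(peak_wide, 4), round(peak_shifted, 4))
print(round(prob_within(1), 3), round(prob_within(2), 3))
```

Working in units of $(x-\mu)/\sigma$ is the design choice that matters here: the same squared standardized distance is what the multivariate density generalizes.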
Univariate Normal Distribution: N(0, 1)

[Figure: density f(x) plotted against x for N(0, 1), with x from −6 to 6]

Univariate Normal Distribution: N(0, 2)

[Figure: density f(x) plotted against x for N(0, 2), with x from −6 to 6]

Univariate Normal Distribution: N(1.75, 1)

[Figure: density f(x) plotted against x for N(1.75, 1), with x from −6 to 6]

UVN - Notes

■ The area under the univariate normal curve over an interval around the mean is determined by the interval's width in standard deviation units
■ In particular:

$$P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$$

$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.954$$

■ Also note the term in the exponent:

$$\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu)$$

■ This is the square of the distance from $x$ to $\mu$ in standard deviation units, and will be generalized for the MVN

MVN

■ The multivariate normal density function is:

$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \, e^{-(x-\mu)'\Sigma^{-1}(x-\mu)/2}$$

■ The mean vector is $\mu$
■ The covariance matrix is $\Sigma$
■ Standard notation for multivariate normal distributions is $N_p(\mu, \Sigma)$
■ Visualizing the MVN is
difficult for more than two dimensions, so I will demonstrate some plots with two variables: the bivariate normal distribution

Bivariate Normal Plot #1

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density, both variables from −4 to 4]

Bivariate Normal Plot #1a

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

[Figure: contour plot of the same density, both axes from −4 to 4]

Bivariate Normal Plot #2

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: surface plot of the bivariate normal density, both variables from −4 to 4]

Bivariate Normal Plot #2

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

[Figure: contour plot of the same density, both axes from −4 to 4]

MVN Contours

■ The lines of the contour plots denote places of equal probability density for the MVN distribution
  ◆ The lines represent points of both variables that lead to the same height on the z-axis (the height of the surface)
■ These contours can be constructed from the eigenvalues and eigenvectors of the covariance matrix
  ◆ The directions of the ellipse axes are given by the eigenvectors
  ◆ The lengths of the ellipse axes are proportional to the constant $c$ times the square roots of the eigenvalues
■ Specifically:

$$(x-\mu)'\Sigma^{-1}(x-\mu) = c^2$$

defines ellipsoids centered at $\mu$ with axes $\pm c\sqrt{\lambda_i}\, e_i$

MVN Contours, Continued

■ Contours are useful because they provide confidence regions for data points from the MVN distribution
■ The multivariate analog of a confidence interval is given by an ellipsoid, where $c^2$ is taken from the chi-squared distribution with $p$ degrees of freedom
■ Specifically:

$$(x-\mu)'\Sigma^{-1}(x-\mu) \le \chi^2_p(\alpha)$$

provides the confidence region containing $1-\alpha$ of the probability mass of the MVN distribution

MVN Contour Example

■ Imagine we had a bivariate normal distribution with:

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}$$

■ The covariance matrix has eigenvalues and eigenvectors:

$$\lambda = \begin{bmatrix} 1.5 \\ 0.5 \end{bmatrix}, \quad E = \begin{bmatrix} 0.707 & -0.707 \\ 0.707 & 0.707 \end{bmatrix}$$

■ We want to find a contour where 95% of the probability will fall, corresponding to $\chi^2_2(0.05) = 5.99$

MVN Contour Example

■ This contour will be centered at $\mu$
■ Axis 1:

$$\mu \pm \sqrt{5.99 \times 1.5} \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} 2.12 \\ 2.12 \end{bmatrix}, \begin{bmatrix} -2.12 \\ -2.12 \end{bmatrix}$$

■ Axis 2:

$$\mu \pm \sqrt{5.99 \times 0.5} \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix} = \begin{bmatrix} -1.22 \\ 1.22 \end{bmatrix}, \begin{bmatrix} 1.22 \\ -1.22 \end{bmatrix}$$

MVN Properties

■ The MVN distribution has some convenient properties
■ If $X$ has a multivariate normal distribution, then:
  1. Linear combinations of $X$ are normally distributed
  2. All subsets of the components of $X$ have a MVN distribution
  3. Zero covariance implies that the corresponding components are independently distributed
  4. The conditional distributions of the components are MVN

Linear Combinations

■ If $X \sim N_p(\mu, \Sigma)$, then any set of $q$ linear combinations of the variables, defined by a matrix $A_{(q \times p)}$, is also normally distributed: $AX \sim N_q(A\mu, A\Sigma A')$
■ For example, let $p = 3$ and $Y$ be the difference between $X_1$ and $X_2$. The combination matrix would be

$$A = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix}$$

For $X$:

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{bmatrix}$$

For $Y = AX$:

$$\mu_Y = \begin{bmatrix} \mu_1 - \mu_2 \end{bmatrix}, \quad \Sigma_Y = \begin{bmatrix} \sigma_{11} + \sigma_{22} - 2\sigma_{12} \end{bmatrix}$$

MVN Properties

■ The MVN distribution is characterized by two parameters:
  ◆ The mean vector $\mu$
  ◆ The covariance matrix $\Sigma$
■ The maximum likelihood estimates for these parameters are given by:
  ◆ The mean vector:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n} X'\mathbf{1}$$

  ◆ The covariance matrix:

$$S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n}(X - \mathbf{1}\bar{x}')'(X - \mathbf{1}\bar{x}')$$

Distribution of $\bar{x}$ and $S$

Recall that back in univariate statistics you discussed the Central Limit Theorem (CLT)

It stated that, if the set of n observations $x_1, x_2, \ldots, x_n$ were normal or not...
■ The distribution of $\bar{x}$ would be approximately normal (for large n) with mean equal to $\mu$ and variance $\sigma^2/n$
■ We were also told that, for normal observations, $(n-1)s^2/\sigma^2$ had a chi-square distribution with $n-1$ degrees of freedom
■ Note: We ended up using these pieces of information for hypothesis testing such as the t-test and ANOVA.

Distribution of $\bar{x}$ and $S$

We also have a Multivariate Central Limit Theorem (CLT)

It states that, if the set of n observations $x_1, x_2, \ldots, x_n$ are multivariate normal or not...

■ The distribution of $\bar{x}$ would be approximately normal (for large n) with mean equal to $\mu$ and variance/covariance matrix $\Sigma/n$
■ We are also told that, for multivariate normal observations, $(n-1)S$ will have a Wishart distribution, $W_p(n-1, \Sigma)$, with $n-1$ degrees of freedom
  ◆ This is the multivariate analogue to a chi-square distribution
■ Note: We will end up using some of this information for multivariate hypothesis testing

Distribution of $\bar{x}$ and $S$

■ Therefore, let $x_1, x_2, \ldots$
, $x_n$ be independent observations from a population with mean $\mu$ and covariance $\Sigma$
■ The following are true:
  ◆ $\sqrt{n}\,(\bar{X} - \mu)$ is approximately $N_p(0, \Sigma)$
  ◆ $n\,(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu)$ is approximately $\chi^2_p$

Sufficient Statistics

■ The sample estimates $\bar{X}$ and $S$ are sufficient statistics
■ This means that all of the information contained in the data can be summarized by these two statistics alone
■ This is only true if the data follow a multivariate normal distribution. If they do not, other terms are needed (e.g., skewness array, kurtosis array)
■ Some statistical methods only use one or both of these matrices in their analysis procedures and not the actual data

Density and Likelihood Functions

■ The MVN distribution is often the core statistical distribution for a uni- or multivariate statistical technique
■ Maximum likelihood estimates are preferable in statistics due to a set of desirable asymptotic properties, including:
  ◆ Consistency: the estimator converges in probability to the value being estimated
  ◆ Asymptotic normality: the estimator has an asymptotically normal distribution with a functionally known variance
  ◆ Efficiency: no asymptotically unbiased estimator has lower asymptotic mean squared error than the MLE
■ The form of the MVN ML function frequently appears in statistics, so we will briefly discuss MLE using normal distributions

An Introduction to Maximum Likelihood

■ Maximum likelihood estimation seeks to find parameters of a
statistical model (mapping onto the mean vector and/or covariance matrix) such that the statistical likelihood function is maximized
■ The method assumes data follow a statistical distribution, in our case the MVN
■ More frequently, the log-likelihood function is used instead of the likelihood function
  ◆ The "logged" and "un-logged" versions of the function have a maximum at the same point
  ◆ The "logged" version is easier mathematically

Maximum Likelihood for Univariate Normal

■ We will start with the univariate normal case and then generalize
■ Imagine we have a sample of data, $X$, which we will assume is normally distributed with an unknown mean but a known variance (say the variance is 1)
■ We will build the maximum likelihood function for the mean
■ Our function rests on two assumptions:
  1. All data follow a normal distribution
  2. All observations are independent
■ Put into statistical terms: $X$ is independent and identically distributed (iid) as $N_1(\mu, 1)$

Building the Likelihood Function

■ Each observation, then, follows a normal distribution with the same mean (unknown) and variance (1)
■ The likelihood function begins with the density, the function that provides the normal curve (with 1 in place of $\sigma^2$):

$$f(X_i \mid \mu) = \frac{1}{\sqrt{2\pi(1)}} \exp\left(-\frac{(X_i - \mu)^2}{2(1)}\right)$$

■ The density provides the "likelihood" of observing an observation $X_i$ for a given value of $\mu$ (and a known value of $\sigma^2 = 1$)
■ The "likelihood" is the height of the normal curve

The One-Observation Likelihood Function

The graph shows $f(X_i \mid \mu = 1)$ for a range of $X$

[Figure: f(X|mu) plotted against X from −4 to 4, with vertical lines at X = 0 and X = 1]

The vertical lines indicate:

■ $f(X_i = 0 \mid \mu = 1) = .242$
■ $f(X_i = 1 \mid \mu = 1) = .399$

The Overall Likelihood Function

■ Because we have a sample of N observations, our likelihood function is taken across all observations, not just one
■ The "joint" likelihood function uses the assumption that observations are independent to be expressed as a product of likelihood functions across all observations:

$$L(x \mid \mu) = f(X_1 \mid \mu) \times f(X_2 \mid \mu) \times \cdots \times f(X_N \mid \mu)$$
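The one-observation heights and the product form of the joint likelihood can be sketched in a few lines. This is a minimal illustration in plain Python, not the lecture's own code: the function names and the five-value sample are hypothetical, and the grid search stands in for a proper optimizer.

```python
import math

def density(x, mu):
    """Height of the N(mu, 1) curve at x: the one-observation likelihood."""
    return math.exp(-(x - mu) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

def joint_likelihood(sample, mu):
    """Product of the one-observation likelihoods, valid under independence."""
    value = 1.0
    for x in sample:
        value *= density(x, mu)
    return value

# Heights of the curve with mu = 1 at X = 0 and X = 1.
height_at_0 = density(0.0, 1.0)
height_at_1 = density(1.0, 1.0)

# Hypothetical sample (not from the lecture) and a grid of candidate means.
sample = [0.2, 1.1, 1.9, 0.8, 1.5]
grid = [i / 100.0 for i in range(-100, 301)]
best_mu = max(grid, key=lambda mu: joint_likelihood(sample, mu))
sample_mean = sum(sample) / len(sample)

print(round(height_at_0, 3), round(height_at_1, 3))
print(round(best_mu, 2), round(sample_mean, 2))
```

For this sample the grid value with the highest joint likelihood coincides with the sample mean, which is what the closed-form result for a normal mean with known variance predicts.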
$$L(x \mid \mu) = \prod_{i=1}^{N} f(X_i \mid \mu) = \left(\frac{1}{2\pi(1)}\right)^{N/2} \exp\left(-\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{2(1)}\right)$$

■ The value of $\mu$ that maximizes $L(x \mid \mu)$ is the MLE (in this case, it's the sample mean)
■ In more complicated models, the MLE does not have a closed form and therefore must be found using numeric methods

The Overall Log-Likelihood Function

■ For an unknown mean $\mu$ and variance $\sigma^2$, the likelihood function is:

$$L(x \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{2\sigma^2}\right)$$

■ More commonly, the log-likelihood function is used:

$$\log L(x \mid \mu, \sigma^2) = -\frac{N}{2}\log\left(2\pi\sigma^2\right) - \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{2\sigma^2}$$

The Multivariate Normal Likelihood Function

■ For a set of N independent observations on p variables, $X_{(N \times p)}$, the multivariate normal likelihood function is formed by using a similar approach
■ For an unknown mean vector $\mu$ and covariance $\Sigma$, the joint likelihood is:

$$L(X \mid \mu, \Sigma) = \prod_{i=1}^{N} \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left(-(x_i - \mu)'\Sigma^{-1}(x_i - \mu)/2\right)$$

$$= \frac{1}{(2\pi)^{Np/2}\,|\Sigma|^{N/2}} \exp\left(-\sum_{i=1}^{N}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)/2\right)$$

The Multivariate Normal Likelihood Function

■ Occasionally, a more intricate form of the MVN likelihood function shows up
■ Although mathematically identical to the function on the last page, this version typically appears without explanation:

$$L(X \mid \mu, \Sigma) = (2\pi)^{-Np/2}\,|\Sigma|^{-N/2} \exp\left\{-\mathrm{tr}\left[\Sigma^{-1}\left(\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})' + N(\bar{x} - \mu)(\bar{x} - \mu)'\right)\right]/2\right\}$$

The MVN Log-Likelihood Function

■ As with the univariate case, the MVN likelihood function is typically converted into a log-likelihood function for simplicity
■ The MVN log-likelihood function is given by:

$$l(X \mid \mu, \Sigma) = -\frac{Np}{2}\log(2\pi) - \frac{N}{2}\log(|\Sigma|) - \frac{1}{2}\sum_{i=1}^{N}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)$$

But...Why?

■ The MVN distribution, likelihood, and log-likelihood functions show up frequently in statistical methods
■ Commonly used methods rely on versions of the distribution, methods such as:
  ◆ Linear models (ANOVA, Regression)
  ◆ Mixed models (i.e., hierarchical linear models, random effects models, multilevel models)
  ◆ Path models/simultaneous equation models
  ◆ Structural equation models (and confirmatory factor models)
  ◆ Many versions of finite mixture models
■ Understanding the form of the MVN distribution will help to understand the commonalities between each of these models

MVN in Mixed Models

■ From SAS' manual for proc mixed:

MVN in Structural Equation Models

■ From SAS' manual for proc calis:
■ This package uses only the covariance matrix, so the form of the likelihood function is phrased using only the Wishart distribution:
$$w(S \mid \Sigma) = \frac{|S|^{(n-p-2)/2} \exp\left(-\mathrm{tr}\left(S\Sigma^{-1}\right)/2\right)}{2^{p(n-1)/2}\, \pi^{p(p-1)/4}\, |\Sigma|^{(n-1)/2} \prod_{i=1}^{p} \Gamma\left(\frac{1}{2}(n-i)\right)}$$

Assessing Normality

■ Recall from earlier that IF the data have a multivariate normal distribution then all of the previously discussed properties will hold
■ There are a host of methods that have been developed to assess multivariate normality (just look in Johnson & Wichern)
■ Given the relative robustness of the MVN distribution, I will skip this topic, acknowledging that extreme deviations from normality will result in poorly performing statistics
■ More often than not, assessing MV normality is fraught with difficulty due to sample-estimated parameters of the distribution

Transformations to Near Normality

■ Historically, people have gone on an expedition to find a transformation to near-normality when learning their data may not be MVN
■ Modern statistical methods, however, make that a very bad idea
■ More often than not, transformations end up changing the nature of the statistics you are interested in forming
■ Furthermore, not all data need to be MVN (think conditional distributions)

Final Thoughts

■ The multivariate normal distribution is an analog to the univariate normal distribution
■ The MVN distribution will play a large role in the upcoming weeks
■ We can finally put the background material to rest, and begin learning some statistical methods
■ Tomorrow: lab with SAS, the "fun" of proc iml
■ Up next week: Inferences about Mean Vectors and Multivariate ANOVA
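The arithmetic in the MVN Contour Example slides can be checked with a short script. This is a minimal sketch in plain Python rather than the SAS used in the course: the helper name `eigen_sym_2x2` is hypothetical, and it relies on the closed-form eigen decomposition of a symmetric 2 × 2 matrix, assuming a nonzero off-diagonal element.

```python
import math

def eigen_sym_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, d]].

    Closed form for the 2 x 2 case; assumes b != 0 so that (b, lam - a)
    is a nonzero eigenvector for each eigenvalue lam.
    """
    center = (a + d) / 2.0
    spread = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    pairs = []
    for lam in (center + spread, center - spread):   # largest eigenvalue first
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

mu = (0.0, 0.0)
chi2_cutoff = 5.99   # upper 5% point of chi-square with 2 df, as quoted in the example

axes = []
for lam, (ex, ey) in eigen_sym_2x2(1.0, 0.5, 1.0):
    half_length = math.sqrt(chi2_cutoff * lam)        # c * sqrt(lambda_i)
    end_plus = (mu[0] + half_length * ex, mu[1] + half_length * ey)
    end_minus = (mu[0] - half_length * ex, mu[1] - half_length * ey)
    axes.append((lam, end_plus, end_minus))

for lam, end_plus, end_minus in axes:
    print(lam, [round(v, 2) for v in end_plus], [round(v, 2) for v in end_minus])
```

An eigenvector is only defined up to sign, so the second axis may come out as (0.707, −0.707) instead of (−0.707, 0.707); the pair of axis endpoints is the same either way.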