Introduction to Normal Distribution
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 17-Jan-2017
Copyright 2017 by Nathaniel E. Helwig

Outline of Notes
1) Univariate Normal: distribution form, standard normal, probability calculations, affine transformations, parameter estimation
2) Bivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions
3) Multivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions, parameter estimation
4) Sampling Distributions: univariate case, multivariate case

Univariate Normal

Normal Density Function (Univariate)
Given a variable x ∈ R, the normal probability density function (pdf) is

  f(x) = (1 / (σ√(2π))) exp{ −(x − µ)² / (2σ²) }    (1)

where
  µ ∈ R is the mean,
  σ > 0 is the standard deviation (σ² is the variance),
  e ≈ 2.71828 is the base of the natural logarithm.
Write X ∼ N(µ, σ²) to denote that X follows a normal distribution.

Standard Normal Distribution
If X ∼ N(0, 1), then X follows a standard normal distribution:

  f(x) = (1/√(2π)) e^{−x²/2}    (2)

[Figure: standard normal pdf f(x) plotted for x from −4 to 4.]
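The density in Equation (1) can be evaluated directly; here is a minimal sketch in Python (an assumption of these notes, which otherwise use R), using only the standard library:

```python
# Sketch (not from the slides): evaluate the normal pdf of Eq. (1).
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at x = mu with height 1/(sigma * sqrt(2 pi)).
assert abs(normal_pdf(0.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12
print(round(normal_pdf(0.0), 4))  # standard normal height at x = 0
```

The peak height 1/√(2π) ≈ 0.3989 matches the y-axis scale of the standard normal figure above.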
Probabilities and Distribution Functions
Probabilities relate to the area under the pdf:

  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx = F(b) − F(a)    (3)

where

  F(x) = ∫_{−∞}^{x} f(u) du    (4)

is the cumulative distribution function (cdf).
Note: F(x) = P(X ≤ x), which implies 0 ≤ F(x) ≤ 1.

Normal Probabilities
Helpful figure of normal probabilities: the area under the normal pdf is about 34.1% in each of the intervals (µ − 1σ, µ) and (µ, µ + 1σ), 13.6% between 1σ and 2σ on each side, 2.1% between 2σ and 3σ on each side, and 0.1% beyond 3σ on each side.
From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg

Normal Distribution Functions (Univariate)
Helpful figures of normal pdfs φ_{µ,σ²}(x) and cdfs Φ_{µ,σ²}(x) for (µ, σ²) equal to (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5):
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg
Note that the cdf has an elongated "S" shape, referred to as an ogive.

Affine Transformations of Normal (Univariate)
Suppose that X ∼ N(µ, σ²) and a, b ∈ R with a ≠ 0.
If we define Y = aX + b, then Y ∼ N(aµ + b, a²σ²).
Suppose that X ∼ N(1, 2). Determine the distributions of:
  Y = X + 3
  Y = 2X + 3
  Y = 3X + 2
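The tail areas in the standard-deviation diagram above follow from Equations (3) and (4). A small Python cross-check (an assumption of these notes, which use R elsewhere) writes the normal cdf with the error function, Φ(x) = (1 + erf(x/√2))/2:

```python
# Sketch (not from the slides): reproduce the 68-95-99.7 areas from the cdf.
import math

def Phi(x, mu=0.0, sigma=1.0):
    """Normal cdf F(x) = P(X <= x), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) = F(b) - F(a), as in Eq. (3)."""
    return Phi(b, mu, sigma) - Phi(a, mu, sigma)

# Areas from the figure: ~68.3% within 1 sigma, ~95.4% within 2, ~99.7% within 3.
assert abs(prob_between(-1, 1) - 0.6827) < 1e-3
assert abs(prob_between(-2, 2) - 0.9545) < 1e-3
assert abs(prob_between(-3, 3) - 0.9973) < 1e-3
```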
Affine Transformations of Normal (Univariate): Solutions
For X ∼ N(1, 2):
  Y = X + 3  ⟹  Y ∼ N(1(1) + 3, 1²(2)) ≡ N(4, 2)
  Y = 2X + 3 ⟹  Y ∼ N(2(1) + 3, 2²(2)) ≡ N(5, 8)
  Y = 3X + 2 ⟹  Y ∼ N(3(1) + 2, 3²(2)) ≡ N(5, 18)

Likelihood Function
Suppose that x = (x₁, …, xₙ) is an iid sample of data from a normal distribution with mean µ and variance σ², i.e., xᵢ ∼ iid N(µ, σ²).
The likelihood function for the parameters (given the data) has the form

  L(µ, σ²|x) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (1 / (σ√(2π))) exp{ −(xᵢ − µ)² / (2σ²) }

and the log-likelihood function is given by

  LL(µ, σ²|x) = Σᵢ₌₁ⁿ log f(xᵢ) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ)²

Maximum Likelihood Estimate of the Mean
The MLE of the mean is the value of µ that minimizes

  Σᵢ₌₁ⁿ (xᵢ − µ)² = Σᵢ₌₁ⁿ xᵢ² − 2n x̄ µ + n µ²

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean.
Taking the derivative with respect to µ, we find that

  ∂/∂µ Σᵢ₌₁ⁿ (xᵢ − µ)² = −2n x̄ + 2n µ  ⟷  µ̂ = x̄

i.e., the sample mean x̄ is the MLE of the population mean µ.

Maximum Likelihood Estimate of the Variance
The MLE of the variance is the value of σ² that minimizes

  (n/2) log(σ²) + (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ̂)² = (n/2) log(σ²) + (Σᵢ₌₁ⁿ xᵢ²)/(2σ²) − (n x̄²)/(2σ²)

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean.
Taking the derivative with respect to σ², we find that

  ∂/∂σ² [ (n/2) log(σ²) + (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ̂)² ] = n/(2σ²) − (1/(2σ⁴)) Σᵢ₌₁ⁿ (xᵢ − µ̂)²

which implies that the sample variance σ̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² is the MLE of the population variance σ².

Bivariate Normal

Normal Density Function (Bivariate)
Given two variables x, y ∈ R, the bivariate normal pdf is

  f(x, y) = (1 / (2π σx σy √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x − µx)²/σx² + (y − µy)²/σy² − 2ρ(x − µx)(y − µy)/(σx σy) ] }    (5)

where
  µx ∈ R and µy ∈ R are the marginal means,
  σx > 0 and σy > 0 are the marginal standard deviations,
  0 ≤ |ρ| < 1 is the correlation coefficient.
X and Y are marginally normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²).

Example: µx = µy = 0, σx² = 1, σy² = 2, ρ = 0.6/√2.
[Figure: surface and contour plots of f(x, y) over x, y ∈ (−4, 4).]
http://en.wikipedia.org/wiki/File:MultivariateNormal.png

Example: Different Means
[Figure: surface plots of f(x, y) for (µx, µy) = (0, 0), (1, 2), and (−1, −1).]
Note: for all three plots σx² = 1, σy² = 2, and ρ = 0.6/√2.
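Equation (5) can be checked numerically. A Python sketch (an assumption of these notes; the deck's own code is R) evaluates the density for the example parameters above and confirms two simple facts: the peak height at the mean, and the factorization into marginals when ρ = 0:

```python
# Sketch (not from the slides): evaluate the bivariate normal pdf of Eq. (5).
import math

def bvn_pdf(x, y, mux, muy, sx, sy, rho):
    """Bivariate normal density, Eq. (5)."""
    zx, zy = (x - mux) / sx, (y - muy) / sy
    q = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

sx, sy, rho = 1.0, math.sqrt(2.0), 0.6 / math.sqrt(2.0)

# At the mean the exponent vanishes, so the height is 1/(2 pi sx sy sqrt(1 - rho^2)).
peak = bvn_pdf(0, 0, 0, 0, sx, sy, rho)
assert abs(peak - 1 / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))) < 1e-12

# With rho = 0 the joint density factors into the two marginal normal densities.
f_joint = bvn_pdf(1.0, -0.5, 0, 0, sx, sy, 0.0)
fx = math.exp(-0.5 * (1.0 / sx) ** 2) / (sx * math.sqrt(2 * math.pi))
fy = math.exp(-0.5 * (-0.5 / sy) ** 2) / (sy * math.sqrt(2 * math.pi))
assert abs(f_joint - fx * fy) < 1e-12
```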
Example: Different Correlations
[Figure: surface plots of f(x, y) for ρ = 0, ρ = −0.6/√2, and ρ = 1.2/√2.]
Note: for all three plots µx = µy = 0, σx² = 1, and σy² = 2.

Example: Different Variances
[Figure: surface plots of f(x, y) for three values of σy, including σy = 1 and σy = 2.]
Note: for all three plots µx = µy = 0, σx² = 1, and ρ = 0.6/(σx σy).

Probabilities and Multiple Integration
Probabilities still relate to the area under the pdf:

  P([ax ≤ X ≤ bx] and [ay ≤ Y ≤ by]) = ∫_{ax}^{bx} ∫_{ay}^{by} f(x, y) dy dx    (6)

where ∫∫ f(x, y) dy dx denotes the multiple integral of the pdf f(x, y).
Defining z = (x, y), we can still define the cdf:

  F(z) = P(X ≤ x and Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du    (7)

Normal Distribution Functions (Bivariate)
Helpful figures of the bivariate normal pdf and cdf:
[Figure: surface plots of f(x, y) and F(x, y) over x, y ∈ (−4, 4).]
Note: µx = µy = 0, σx² = 1, σy² = 2, and ρ = 0.6/√2.
Note that the cdf still has an ogive shape (now in two dimensions).
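The double integral in Equation (6) can be approximated numerically. As a sketch (not from the slides, and using Python rather than the deck's R), a midpoint Riemann sum with ρ = 0 should match the product of two univariate probabilities, since X and Y are then independent:

```python
# Sketch (not from the slides): check Eq. (6) by numerical double integration.
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def bvn_pdf0(x, y, sx, sy):
    # Bivariate normal pdf with zero means and rho = 0 (independent case).
    return math.exp(-0.5 * ((x / sx) ** 2 + (y / sy) ** 2)) / (2 * math.pi * sx * sy)

sx, sy = 1.0, math.sqrt(2.0)
ax, bx, ay, by = -1.0, 1.0, 0.0, 2.0
n = 200
hx, hy = (bx - ax) / n, (by - ay) / n

# Midpoint Riemann sum over the rectangle [ax, bx] x [ay, by].
total = 0.0
for i in range(n):
    for j in range(n):
        total += bvn_pdf0(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy, sx, sy) * hx * hy

# With rho = 0 the probability factors: P(ax<=X<=bx) * P(ay<=Y<=by).
exact = (Phi(bx / sx) - Phi(ax / sx)) * (Phi(by / sy) - Phi(ay / sy))
assert abs(total - exact) < 1e-4
```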
Affine Transformations of Normal (Bivariate)
Given z = (x, y)′, suppose that z ∼ N(µ, Σ) where
  µ = (µx, µy)′ is the 2 × 1 mean vector,
  Σ = [ σx²     ρσxσy ]
      [ ρσxσy   σy²   ]  is the 2 × 2 covariance matrix.
Let A = [ a₁₁ a₁₂ ]  and  b = ( b₁ )  with A ≠ 0₂ₓ₂.
        [ a₂₁ a₂₂ ]           ( b₂ )
If we define w = Az + b, then w ∼ N(Aµ + b, AΣA′).

Conditional Normal (Bivariate)
The conditional distribution of a variable Y given X = x is

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)    (8)

where
  f_{XY}(x, y) is the joint pdf of X and Y,
  f_X(x) is the marginal pdf of X.
In the bivariate normal case, we have that

  Y|X ∼ N(µ∗, σ∗²)    (9)

where µ∗ = µy + ρ(σy/σx)(x − µx) and σ∗² = σy²(1 − ρ²).

Derivation of Conditional Normal
To prove Equation (9), simply write out the definition and simplify:

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)

Dividing the bivariate pdf (5) by the marginal pdf of X cancels the factor exp{−(x − µx)²/(2σx²)} / (σx√(2π)), leaving

  f_{Y|X}(y|X = x)
  = exp{ −(1/(2(1 − ρ²))) [ (x − µx)²/σx² + (y − µy)²/σy² − 2ρ(x − µx)(y − µy)/(σxσy) ] + (x − µx)²/(2σx²) } / (√(2π) σy √(1 − ρ²))
  = exp{ −(1/(2σy²(1 − ρ²))) [ ρ²(σy²/σx²)(x − µx)² + (y − µy)² − 2ρ(σy/σx)(x − µx)(y − µy) ] } / (√(2π) σy √(1 − ρ²))
  = exp{ −(1/(2σy²(1 − ρ²))) [ y − µy − ρ(σy/σx)(x − µx) ]² } / (√(2π) σy √(1 − ρ²))

which is the N(µ∗, σ∗²) density, completing the proof.
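The identity proved above can also be confirmed numerically: at any point, f_{XY}(x, y)/f_X(x) must equal the N(µ∗, σ∗²) density. A Python sketch (not from the slides; the parameter values below are hypothetical, chosen only for illustration):

```python
# Sketch (not from the slides): numeric check of Eq. (9).
import math

def norm_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def bvn_pdf(x, y, mux, muy, sx, sy, rho):
    zx, zy = (x - mux) / sx, (y - muy) / sy
    q = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

# Hypothetical parameter values (not from the slides).
mux, muy, sx, sy, rho = 1.0, -2.0, 2.0, 3.0, 0.4
x, y = 2.5, -1.0

cond = bvn_pdf(x, y, mux, muy, sx, sy, rho) / norm_pdf(x, mux, sx)
mu_star = muy + rho * (sy / sx) * (x - mux)        # conditional mean
var_star = sy ** 2 * (1 - rho ** 2)                # conditional variance
assert abs(cond - norm_pdf(y, mu_star, math.sqrt(var_star))) < 1e-12
```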
Statistical Independence for Bivariate Normal
Two variables X and Y are statistically independent if

  f_{XY}(x, y) = f_X(x) f_Y(y)    (10)

where f_{XY}(x, y) is the joint pdf, and f_X(x) and f_Y(y) are the marginal pdfs.
Note that if X and Y are independent, then

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y)    (11)

so conditioning on X = x does not change the distribution of Y.
If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9).

Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2), where the scores follow a bivariate normal distribution with parameters:
  µx = 70 and µy = 60 are the marginal means,
  σx = 10 and σy = 15 are the marginal standard deviations,
  ρ = 0.6 is the correlation coefficient.
Suppose we select a student at random. What is the probability that...
(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X − 4Y > 150)?

Example #1: Part (a)
Answer for 1(a): Note that Y ∼ N(60, 15²), so the probability that the student scores over 75 on Exam 2 is

  P(Y > 75) = P(Z > (75 − 60)/15) = P(Z > 1) = 1 − P(Z < 1) = 1 − Φ(1) = 1 − 0.8413447 = 0.1586553

where Φ(x) = ∫_{−∞}^{x} f(z) dz, with f(x) = (1/√(2π)) e^{−x²/2} denoting the standard normal pdf (see the R code for use of pnorm to calculate this quantity).
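The slides compute this with R's pnorm; an equivalent cross-check in Python (an assumption of these notes, stdlib only) uses Φ(x) = (1 + erf(x/√2))/2:

```python
# Sketch (not from the slides): reproduce Example #1, part (a).
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p = 1 - Phi((75 - 60) / 15)   # P(Y > 75) for Y ~ N(60, 15^2)
assert abs(p - 0.1586553) < 1e-6
```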
Example #1: Part (b)
Answer for 1(b): Note that (Y|X = 80) ∼ N(µ∗, σ∗²) where
  µ∗ = µy + ρ(σy/σx)(x − µx) = 60 + (0.6)(15/10)(80 − 70) = 69
  σ∗² = σy²(1 − ρ²) = 15²(1 − 0.6²) = 144
If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

  P(Y > 75 | X = 80) = P(Z > (75 − 69)/12) = P(Z > 0.5) = 1 − Φ(0.5) = 1 − 0.6914625 = 0.3085375

Example #1: Part (c)
Answer for 1(c): Note that (X + Y) ∼ N(µ∗, σ∗²) where
  µ∗ = µx + µy = 70 + 60 = 130
  σ∗² = σx² + σy² + 2ρσxσy = 10² + 15² + 2(0.6)(10)(15) = 505
The probability that the sum of the Exam 1 and Exam 2 scores is above 150 is

  P(X + Y > 150) = P(Z > (150 − 130)/√505) = P(Z > 0.8899883) = 1 − Φ(0.8899883) = 1 − 0.8132639 = 0.1867361

Example #1: Part (d)
Answer for 1(d): Note that (X − Y) ∼ N(µ∗, σ∗²) where
  µ∗ = µx − µy = 70 − 60 = 10
  σ∗² = σx² + σy² − 2ρσxσy = 10² + 15² − 2(0.6)(10)(15) = 145
The probability that the student did better on Exam 1 than Exam 2 is

  P(X > Y) = P(X − Y > 0) = P(Z > (0 − 10)/√145) = P(Z > −0.8304548) = 1 − Φ(−0.8304548) = 1 − 0.2031408 = 0.7968592
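Parts (b), (c), and (d) can be cross-checked the same way, building each answer from the stated parameters rather than the intermediate z-values (Python stdlib, an assumption of these notes):

```python
# Sketch (not from the slides): verify Example #1, parts (b)-(d).
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mux, muy, sx, sy, rho = 70, 60, 10, 15, 0.6

# (b) Y | X = 80 ~ N(69, 144)
mu_b = muy + rho * (sy / sx) * (80 - mux)
sd_b = sy * math.sqrt(1 - rho ** 2)
assert abs((1 - Phi((75 - mu_b) / sd_b)) - 0.3085375) < 1e-6

# (c) X + Y ~ N(130, 505)
sd_c = math.sqrt(sx ** 2 + sy ** 2 + 2 * rho * sx * sy)
assert abs((1 - Phi((150 - 130) / sd_c)) - 0.1867361) < 1e-6

# (d) X - Y ~ N(10, 145)
sd_d = math.sqrt(sx ** 2 + sy ** 2 - 2 * rho * sx * sy)
assert abs((1 - Phi((0 - 10) / sd_d)) - 0.7968592) < 1e-6
```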
Example #1: Part (e)
Answer for 1(e): Note that (5X − 4Y) ∼ N(µ∗, σ∗²) where
  µ∗ = 5µx − 4µy = 5(70) − 4(60) = 110
  σ∗² = 5²σx² + (−4)²σy² + 2(5)(−4)ρσxσy = 25(10²) + 16(15²) − 2(20)(0.6)(10)(15) = 2500
Thus, the needed probability can be obtained using

  P(5X − 4Y > 150) = P(Z > (150 − 110)/√2500) = P(Z > 0.8) = 1 − Φ(0.8) = 1 − 0.7881446 = 0.2118554

Example #1: R Code

  # Example 1a
  > pnorm(1, lower=F)
  [1] 0.1586553
  > pnorm(75, mean=60, sd=15, lower=F)
  [1] 0.1586553

  # Example 1b
  > pnorm(0.5, lower=F)
  [1] 0.3085375
  > pnorm(75, mean=69, sd=12, lower=F)
  [1] 0.3085375

  # Example 1c
  > pnorm(20/sqrt(505), lower=F)
  [1] 0.1867361
  > pnorm(150, mean=130, sd=sqrt(505), lower=F)
  [1] 0.1867361

  # Example 1d
  > pnorm(-10/sqrt(145), lower=F)
  [1] 0.7968592
  > pnorm(0, mean=10, sd=sqrt(145), lower=F)
  [1] 0.7968592

  # Example 1e
  > pnorm(0.8, lower=F)
  [1] 0.2118554
  > pnorm(150, mean=110, sd=50, lower=F)
  [1] 0.2118554

Multivariate Normal

Normal Density Function (Multivariate)
Given x = (x₁, …, x_p)′ with x_j ∈ R for all j, the multivariate normal pdf is

  f(x) = (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ −(1/2)(x − µ)′ Σ⁻¹ (x − µ) }    (12)

where
  µ = (µ₁, …, µ_p)′ is the p × 1 mean vector,
  Σ = {σ_ij} with rows (σ₁₁, σ₁₂, …, σ₁p), …, (σ_p1, σ_p2, …, σ_pp) is the p × p covariance matrix.
Write x ∼ N(µ, Σ) or x ∼ N_p(µ, Σ) to denote that x is multivariate normal.
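For a diagonal Σ, the density in Equation (12) factors into a product of univariate normal densities, since |Σ| is the product of the variances and the quadratic form reduces to a sum. A Python sketch (not from the slides; the numbers are hypothetical) checks this for p = 3 without any linear algebra library:

```python
# Sketch (not from the slides): Eq. (12) with diagonal Sigma factors into marginals.
import math

def norm_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def mvn_pdf_diag(x, mu, var):
    """Eq. (12) when Sigma = diag(var): |Sigma| = prod(var) and the quadratic
    form is sum((x_j - mu_j)^2 / var_j)."""
    p = len(x)
    det = math.prod(var)
    quad = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mu, var))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))

# Hypothetical values for illustration.
x = [0.3, -1.2, 2.0]
mu = [0.0, 1.0, 2.0]
var = [1.0, 4.0, 0.25]
lhs = mvn_pdf_diag(x, mu, var)
rhs = math.prod(norm_pdf(xj, mj, math.sqrt(vj)) for xj, mj, vj in zip(x, mu, var))
assert abs(lhs - rhs) < 1e-12
```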
Some Multivariate Normal Properties
The mean and covariance parameters have the following restrictions:
  µ_j ∈ R for all j,
  σ_jj > 0 for all j,
  σ_ij = ρ_ij √(σ_ii σ_jj), where ρ_ij is the correlation between X_i and X_j,
  σ_ij² ≤ σ_ii σ_jj for any i, j ∈ {1, …, p} (Cauchy-Schwarz).
Σ is assumed to be positive definite so that Σ⁻¹ exists.
Marginals are normal: X_j ∼ N(µ_j, σ_jj) for all j ∈ {1, …, p}.

Multivariate Normal Probabilities
Probabilities still relate to the area under the pdf:

  P(a_j ≤ X_j ≤ b_j ∀j) = ∫_{a₁}^{b₁} ⋯ ∫_{a_p}^{b_p} f(x) dx_p ⋯ dx₁    (13)

where ∫⋯∫ f(x) dx_p ⋯ dx₁ denotes the multiple integral of f(x).
We can still define the cdf of x = (x₁, …, x_p)′:

  F(x) = P(X_j ≤ x_j ∀j) = ∫_{−∞}^{x₁} ⋯ ∫_{−∞}^{x_p} f(u) du_p ⋯ du₁    (14)

Affine Transformations of Normal (Multivariate)
Suppose that x = (x₁, …, x_p)′ and that x ∼ N(µ, Σ) where
  µ = {µ_j}_{p×1} is the mean vector,
  Σ = {σ_ij}_{p×p} is the covariance matrix.
Let A = {a_ij}_{n×p} and b = {b_i}_{n×1} with A ≠ 0_{n×p}.
If we define w = Ax + b, then w ∼ N(Aµ + b, AΣA′).
Note: linear combinations of normal variables are normally distributed.

Multivariate Conditional Distributions
Given variables x = (x₁, …, x_p)′ and y = (y₁, …, y_q)′, we have

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)    (15)

where
  f_{Y|X}(y|X = x) is the conditional distribution of y given x,
  f_{XY}(x, y) is the joint pdf of x and y,
  f_X(x) is the marginal pdf of x.
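The affine-transformation rule above (E[w] = Aµ + b, Cov(w) = AΣA′) can be worked out by hand for a small case. A Python sketch with plain nested loops (not from the slides; µ, Σ, A, and b below are hypothetical values chosen for illustration):

```python
# Sketch (not from the slides): compute A mu + b and A Sigma A' for w = Ax + b.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical parameters (p = 3, n = 2).
mu = [5.0, 3.0, 7.0]
Sigma = [[4.0, -1.0, 0.0], [-1.0, 4.0, 2.0], [0.0, 2.0, 9.0]]
A = [[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]]   # w1 = x1 + x2 + 1, w2 = x2 - x3
b = [1.0, 0.0]

mean_w = [m + bi for m, bi in zip(matvec(A, mu), b)]
cov_w = matmul(matmul(A, Sigma), transpose(A))

assert mean_w == [9.0, -4.0]
# Var(x1 + x2) = 4 + 4 + 2(-1) = 6; Var(x2 - x3) = 4 + 9 - 2(2) = 9;
# Cov(x1 + x2, x2 - x3) = -1 - 0 + 4 - 2 = 1; and A Sigma A' is symmetric.
assert cov_w == [[6.0, 1.0], [1.0, 9.0]]
```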
Conditional Normal (Multivariate)
Suppose that z ∼ N(µ, Σ) where
  z = (x′, y′)′ = (x₁, …, x_p, y₁, …, y_q)′,
  µ = (µx′, µy′)′ = (µ₁x, …, µ_px, µ₁y, …, µ_qy)′; µx is the mean vector of x, and µy is the mean vector of y,
  Σ = [ Σxx   Σxy ]
      [ Σxy′  Σyy ]  with (Σxx)_{p×p}, (Σyy)_{q×q}, and (Σxy)_{p×q};
  Σxx is the covariance matrix of x, Σyy is the covariance matrix of y, and Σxy is the covariance matrix of x and y.
In the multivariate normal case, we have that

  y|x ∼ N(µ∗, Σ∗)    (16)

where µ∗ = µy + Σxy′ Σxx⁻¹ (x − µx) and Σ∗ = Σyy − Σxy′ Σxx⁻¹ Σxy.

Statistical Independence for Multivariate Normal
Using Equation (16), we have that

  y|x ∼ N(µ∗, Σ∗) ≡ N(µy, Σyy)    (17)

if and only if Σxy = 0_{p×q} (a matrix of zeros).
Note that Σxy = 0_{p×q} implies that the p elements of x are uncorrelated with the q elements of y.
For multivariate normal variables: uncorrelated implies independent.
For non-normal variables: uncorrelated does not imply independent.

Example #2
Each Delicious Candy Company store makes 3 sizes of candy bars: regular (X₁), fun size (X₂), and big size (X₃). Assume the weights (in ounces) of the candy bars (X₁, X₂, X₃) follow a multivariate normal distribution with parameters:

  µ = (5, 3, 7)′  and  Σ = [  4  −1   0 ]
                           [ −1   4   2 ]
                           [  0   2   9 ]

Suppose we select a store at random. What is the probability that...
(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X₁ − 3X₂ + 5X₃ < 63)?
Example #2: Part (a)
Answer for 2(a): Note that X₁ ∼ N(5, 4). So the probability that the regular bar weighs more than 8 oz is

  P(X₁ > 8) = P(Z > (8 − 5)/2) = P(Z > 1.5) = 1 − Φ(1.5) = 1 − 0.9331928 = 0.0668072

Example #2: Part (b)
Answer for 2(b): (X₁ | X₂ = 1, X₃ = 10) is normally distributed; see Equation (16). The conditional mean of (X₁ | X₂ = 1, X₃ = 10) is given by

  µ∗ = µ_{X₁} + Σ₁₂′ Σ₂₂⁻¹ (x̃ − µ̃)
     = 5 + (−1, 0) [ 4 2 ]⁻¹ ( 1 − 3  )
                   [ 2 9 ]   ( 10 − 7 )
     = 5 + (1/32) (−1, 0) [  9 −2 ] ( −2 )
                          [ −2  4 ] (  3 )
     = 5 + 24/32 = 5.75

Example #2: Part (b) continued
The conditional variance of (X₁ | X₂ = 1, X₃ = 10) is given by

  σ∗² = σ²_{X₁} − Σ₁₂′ Σ₂₂⁻¹ Σ₁₂
      = 4 − (−1, 0) [ 4 2 ]⁻¹ ( −1 )
                    [ 2 9 ]   (  0 )
      = 4 − (1/32) (−1, 0) [  9 −2 ] ( −1 )
                           [ −2  4 ] (  0 )
      = 4 − 9/32 = 3.71875

So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar weighs more than 8 oz is

  P(X₁ > 8 | X₂ = 1, X₃ = 10) = P(Z > (8 − 5.75)/√3.71875) = P(Z > 1.166767) = 1 − Φ(1.166767) = 1 − 0.8783477 = 0.1216523

Example #2: Part (c)
Answer for 2(c): (4X₁ − 3X₂ + 5X₃) is normally distributed. The expectation of (4X₁ − 3X₂ + 5X₃) is given by

  µ∗ = 4µ_{X₁} − 3µ_{X₂} + 5µ_{X₃} = 4(5) − 3(3) + 5(7) = 46
Example #2: Part (c) continued
The variance of (4X₁ − 3X₂ + 5X₃) is given by

  σ∗² = (4, −3, 5) Σ (4, −3, 5)′
      = (4, −3, 5) [  4  −1   0 ] (  4 )
                   [ −1   4   2 ] ( −3 )
                   [  0   2   9 ] (  5 )
      = (4, −3, 5) (19, −6, 39)′
      = 289

So, the needed probability can be obtained as

  P(4X₁ − 3X₂ + 5X₃ < 63) = P(Z < (63 − 46)/√289) = P(Z < 1) = Φ(1) = 0.8413447

Example #2: R Code

  # Example 2a
  > pnorm(1.5, lower=F)
  [1] 0.0668072
  > pnorm(8, mean=5, sd=2, lower=F)
  [1] 0.0668072

  # Example 2b
  > pnorm(2.25/sqrt(119/32), lower=F)
  [1] 0.1216523
  > pnorm(8, mean=5.75, sd=sqrt(119/32), lower=F)
  [1] 0.1216523

  # Example 2c
  > pnorm(1)
  [1] 0.8413447
  > pnorm(63, mean=46, sd=17)
  [1] 0.8413447

Likelihood Function
Suppose that xᵢ = (x_{i1}, …, x_{ip})′ is an iid sample from a multivariate normal distribution with mean vector µ and covariance matrix Σ, i.e., xᵢ ∼ iid N(µ, Σ).
The likelihood function for the parameters (given the data) has the form

  L(µ, Σ|X) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ −(1/2)(xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) }

and the log-likelihood function is given by

  LL(µ, Σ|X) = −(np/2) log(2π) − (n/2) log(|Σ|) − (1/2) Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ)
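The Example #2 answers above can be cross-checked end to end, inverting the 2 × 2 block Σ₂₂ by hand (Python rather than the deck's R, an assumption of these notes):

```python
# Sketch (not from the slides): verify Example #2, parts (a)-(c).
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# (a) X1 ~ N(5, 4)
assert abs((1 - Phi((8 - 5) / 2)) - 0.0668072) < 1e-6

# (b) Condition on (X2, X3) = (1, 10): Sigma_22 = [[4, 2], [2, 9]],
# so Sigma_22^{-1} = (1/32) [[9, -2], [-2, 4]], and Sigma_12 = (-1, 0)'.
inv22 = [[9 / 32, -2 / 32], [-2 / 32, 4 / 32]]
d = [1 - 3, 10 - 7]
s12 = [-1.0, 0.0]
tmp = [sum(inv22[i][j] * d[j] for j in range(2)) for i in range(2)]
mu_star = 5 + sum(s12[i] * tmp[i] for i in range(2))
tmp2 = [sum(inv22[i][j] * s12[j] for j in range(2)) for i in range(2)]
var_star = 4 - sum(s12[i] * tmp2[i] for i in range(2))
assert abs(mu_star - 5.75) < 1e-12 and abs(var_star - 3.71875) < 1e-12
assert abs((1 - Phi((8 - mu_star) / math.sqrt(var_star))) - 0.1216523) < 1e-6

# (c) a = (4, -3, 5): mean 46 and variance a' Sigma a = 289.
Sigma = [[4, -1, 0], [-1, 4, 2], [0, 2, 9]]
a = [4, -3, 5]
var_c = sum(a[i] * Sigma[i][j] * a[j] for i in range(3) for j in range(3))
assert var_c == 289
assert abs(Phi((63 - 46) / math.sqrt(var_c)) - 0.8413447) < 1e-6
```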
Maximum Likelihood Estimate of the Mean Vector
The MLE of the mean vector is the value of µ that minimizes

  Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) = Σᵢ₌₁ⁿ xᵢ′ Σ⁻¹ xᵢ − 2n x̄′ Σ⁻¹ µ + n µ′ Σ⁻¹ µ

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean vector.
Taking the derivative with respect to µ, we find that

  ∂/∂µ Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) = −2nΣ⁻¹x̄ + 2nΣ⁻¹µ  ⟷  µ̂ = x̄

i.e., the sample mean vector x̄ is the MLE of the population mean vector µ.

Maximum Likelihood Estimate of the Covariance Matrix
The MLE of the covariance matrix is the value of Σ that minimizes

  −n log(|Σ⁻¹|) + Σᵢ₌₁ⁿ tr{ Σ⁻¹ (xᵢ − µ̂)(xᵢ − µ̂)′ }

where µ̂ = x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean vector.
Taking the derivative with respect to Σ⁻¹, we find that

  ∂/∂Σ⁻¹ [ −n log(|Σ⁻¹|) + Σᵢ₌₁ⁿ tr{ Σ⁻¹ (xᵢ − µ̂)(xᵢ − µ̂)′ } ] = −nΣ + Σᵢ₌₁ⁿ (xᵢ − µ̂)(xᵢ − µ̂)′

i.e., the sample covariance matrix Σ̂ = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)′ is the MLE of the population covariance matrix Σ.

Sampling Distributions

Univariate Sampling Distributions: x̄ and s²
In the univariate normal case, we have that
  x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ ∼ N(µ, σ²/n)
  (n − 1)s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² ∼ σ² χ²_{n−1}
where χ²_k denotes a chi-square variable with k degrees of freedom:
  σ² χ²_k = Σᵢ₌₁ᵏ zᵢ², where zᵢ ∼ iid N(0, σ²).
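The sampling distribution of x̄ can be illustrated by simulation. A Monte Carlo sketch (not from the slides; sample sizes and tolerances are arbitrary choices) checks that sample means of n iid N(µ, σ²) draws have mean near µ and variance near σ²/n:

```python
# Sketch (not from the slides): simulate the sampling distribution of xbar.
import math
import random

random.seed(1)
mu, sigma, n, reps = 10.0, 2.0, 5, 4000

xbars = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

m = sum(xbars) / reps
v = sum((x - m) ** 2 for x in xbars) / (reps - 1)

# Theory: E[xbar] = mu = 10 and Var(xbar) = sigma^2/n = 0.8.
assert abs(m - mu) < 0.1
assert abs(v - sigma ** 2 / n) < 0.15
```

The tolerances are loose statistical bounds, several standard errors wide for these simulation sizes.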
Multivariate Sampling Distributions: x̄ and S
In the multivariate normal case, we have that
  x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ ∼ N(µ, Σ/n)
  (n − 1)S = Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)′ ∼ W_{n−1}(Σ)
where W_k(Σ) denotes a Wishart variable with k degrees of freedom:
  W_k(Σ) = Σᵢ₌₁ᵏ zᵢzᵢ′, where zᵢ ∼ iid N(0_p, Σ).