Introduction to Normal Distribution
Nathaniel E. Helwig
Assistant Professor of Psychology and Statistics
University of Minnesota (Twin Cities)
Updated 17-Jan-2017
Copyright 2017 by Nathaniel E. Helwig

Outline of Notes
1) Univariate Normal: distribution form, standard normal, probability calculations, affine transformations, parameter estimation
2) Bivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions
3) Multivariate Normal: distribution form, probability calculations, affine transformations, conditional distributions, parameter estimation
4) Sampling Distributions: univariate case, multivariate case

Univariate Normal

Normal Density Function (Univariate)
Given a variable x ∈ R, the normal probability density function (pdf) is

  f(x) = (1 / (σ√(2π))) exp{ −(x − µ)² / (2σ²) }    (1)

where
  µ ∈ R is the mean,
  σ > 0 is the standard deviation (σ² is the variance),
  e ≈ 2.71828 is the base of the natural logarithm.
Write X ∼ N(µ, σ²) to denote that X follows a normal distribution.

Standard Normal Distribution
If X ∼ N(0, 1), then X follows a standard normal distribution:

  f(x) = (1/√(2π)) e^{−x²/2}    (2)

[Figure: standard normal pdf f(x) plotted for x from −4 to 4.]
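The density in Equation (1) can be evaluated directly; here is a minimal sketch in Python (an assumption of these notes, which otherwise use R), using only the standard library:

```python
# Sketch (not from the slides): evaluate the normal pdf of Eq. (1).
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at x = mu with height 1/(sigma * sqrt(2 pi)).
assert abs(normal_pdf(0.0) - 1 / math.sqrt(2 * math.pi)) < 1e-12
print(round(normal_pdf(0.0), 4))  # standard normal height at x = 0
```

The peak height 1/√(2π) ≈ 0.3989 matches the y-axis scale of the standard normal figure above.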
Probabilities and Distribution Functions
Probabilities relate to the area under the pdf:

  P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx = F(b) − F(a)    (3)

where

  F(x) = ∫_{−∞}^{x} f(u) du    (4)

is the cumulative distribution function (cdf).
Note: F(x) = P(X ≤ x), which implies 0 ≤ F(x) ≤ 1.

Normal Probabilities
Helpful figure of normal probabilities: the area under the normal pdf is about 34.1% in each of the intervals (µ − 1σ, µ) and (µ, µ + 1σ), 13.6% between 1σ and 2σ on each side, 2.1% between 2σ and 3σ on each side, and 0.1% beyond 3σ on each side.
From http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg

Normal Distribution Functions (Univariate)
Helpful figures of normal pdfs φ_{µ,σ²}(x) and cdfs Φ_{µ,σ²}(x) for (µ, σ²) equal to (0, 0.2), (0, 1.0), (0, 5.0), and (−2, 0.5):
http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg
http://en.wikipedia.org/wiki/File:Normal_Distribution_CDF.svg
Note that the cdf has an elongated "S" shape, referred to as an ogive.

Affine Transformations of Normal (Univariate)
Suppose that X ∼ N(µ, σ²) and a, b ∈ R with a ≠ 0.
If we define Y = aX + b, then Y ∼ N(aµ + b, a²σ²).
Suppose that X ∼ N(1, 2). Determine the distributions of:
  Y = X + 3
  Y = 2X + 3
  Y = 3X + 2
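The tail areas in the standard-deviation diagram above follow from Equations (3) and (4). A small Python cross-check (an assumption of these notes, which use R elsewhere) writes the normal cdf with the error function, Φ(x) = (1 + erf(x/√2))/2:

```python
# Sketch (not from the slides): reproduce the 68-95-99.7 areas from the cdf.
import math

def Phi(x, mu=0.0, sigma=1.0):
    """Normal cdf F(x) = P(X <= x), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) = F(b) - F(a), as in Eq. (3)."""
    return Phi(b, mu, sigma) - Phi(a, mu, sigma)

# Areas from the figure: ~68.3% within 1 sigma, ~95.4% within 2, ~99.7% within 3.
assert abs(prob_between(-1, 1) - 0.6827) < 1e-3
assert abs(prob_between(-2, 2) - 0.9545) < 1e-3
assert abs(prob_between(-3, 3) - 0.9973) < 1e-3
```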
Affine Transformations of Normal (Univariate): Solutions
For X ∼ N(1, 2):
  Y = X + 3  ⟹  Y ∼ N(1(1) + 3, 1²(2)) ≡ N(4, 2)
  Y = 2X + 3 ⟹  Y ∼ N(2(1) + 3, 2²(2)) ≡ N(5, 8)
  Y = 3X + 2 ⟹  Y ∼ N(3(1) + 2, 3²(2)) ≡ N(5, 18)

Likelihood Function
Suppose that x = (x₁, …, xₙ) is an iid sample of data from a normal distribution with mean µ and variance σ², i.e., xᵢ ∼ iid N(µ, σ²).
The likelihood function for the parameters (given the data) has the form

  L(µ, σ²|x) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (1 / (σ√(2π))) exp{ −(xᵢ − µ)² / (2σ²) }

and the log-likelihood function is given by

  LL(µ, σ²|x) = Σᵢ₌₁ⁿ log f(xᵢ) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ)²

Maximum Likelihood Estimate of the Mean
The MLE of the mean is the value of µ that minimizes

  Σᵢ₌₁ⁿ (xᵢ − µ)² = Σᵢ₌₁ⁿ xᵢ² − 2n x̄ µ + n µ²

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean.
Taking the derivative with respect to µ, we find that

  ∂/∂µ Σᵢ₌₁ⁿ (xᵢ − µ)² = −2n x̄ + 2n µ  ⟷  µ̂ = x̄

i.e., the sample mean x̄ is the MLE of the population mean µ.

Maximum Likelihood Estimate of the Variance
The MLE of the variance is the value of σ² that minimizes

  (n/2) log(σ²) + (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ̂)² = (n/2) log(σ²) + (Σᵢ₌₁ⁿ xᵢ²)/(2σ²) − (n x̄²)/(2σ²)

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean.
Taking the derivative with respect to σ², we find that

  ∂/∂σ² [ (n/2) log(σ²) + (1/(2σ²)) Σᵢ₌₁ⁿ (xᵢ − µ̂)² ] = n/(2σ²) − (1/(2σ⁴)) Σᵢ₌₁ⁿ (xᵢ − µ̂)²

which implies that the sample variance σ̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² is the MLE of the population variance σ².

Bivariate Normal

Normal Density Function (Bivariate)
Given two variables x, y ∈ R, the bivariate normal pdf is

  f(x, y) = (1 / (2π σx σy √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ (x − µx)²/σx² + (y − µy)²/σy² − 2ρ(x − µx)(y − µy)/(σx σy) ] }    (5)

where
  µx ∈ R and µy ∈ R are the marginal means,
  σx > 0 and σy > 0 are the marginal standard deviations,
  0 ≤ |ρ| < 1 is the correlation coefficient.
X and Y are marginally normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²).

Example: µx = µy = 0, σx² = 1, σy² = 2, ρ = 0.6/√2.
[Figure: surface and contour plots of f(x, y) over x, y ∈ (−4, 4).]
http://en.wikipedia.org/wiki/File:MultivariateNormal.png

Example: Different Means
[Figure: surface plots of f(x, y) for (µx, µy) = (0, 0), (1, 2), and (−1, −1).]
Note: for all three plots σx² = 1, σy² = 2, and ρ = 0.6/√2.
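Equation (5) can be checked numerically. A Python sketch (an assumption of these notes; the deck's own code is R) evaluates the density for the example parameters above and confirms two simple facts: the peak height at the mean, and the factorization into marginals when ρ = 0:

```python
# Sketch (not from the slides): evaluate the bivariate normal pdf of Eq. (5).
import math

def bvn_pdf(x, y, mux, muy, sx, sy, rho):
    """Bivariate normal density, Eq. (5)."""
    zx, zy = (x - mux) / sx, (y - muy) / sy
    q = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

sx, sy, rho = 1.0, math.sqrt(2.0), 0.6 / math.sqrt(2.0)

# At the mean the exponent vanishes, so the height is 1/(2 pi sx sy sqrt(1 - rho^2)).
peak = bvn_pdf(0, 0, 0, 0, sx, sy, rho)
assert abs(peak - 1 / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))) < 1e-12

# With rho = 0 the joint density factors into the two marginal normal densities.
f_joint = bvn_pdf(1.0, -0.5, 0, 0, sx, sy, 0.0)
fx = math.exp(-0.5 * (1.0 / sx) ** 2) / (sx * math.sqrt(2 * math.pi))
fy = math.exp(-0.5 * (-0.5 / sy) ** 2) / (sy * math.sqrt(2 * math.pi))
assert abs(f_joint - fx * fy) < 1e-12
```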
Example: Different Correlations
[Figure: surface plots of f(x, y) for ρ = 0, ρ = −0.6/√2, and ρ = 1.2/√2.]
Note: for all three plots µx = µy = 0, σx² = 1, and σy² = 2.

Example: Different Variances
[Figure: surface plots of f(x, y) for three values of σy, including σy = 1 and σy = 2.]
Note: for all three plots µx = µy = 0, σx² = 1, and ρ = 0.6/(σx σy).

Probabilities and Multiple Integration
Probabilities still relate to the area under the pdf:

  P([ax ≤ X ≤ bx] and [ay ≤ Y ≤ by]) = ∫_{ax}^{bx} ∫_{ay}^{by} f(x, y) dy dx    (6)

where ∫∫ f(x, y) dy dx denotes the multiple integral of the pdf f(x, y).
Defining z = (x, y), we can still define the cdf:

  F(z) = P(X ≤ x and Y ≤ y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du    (7)

Normal Distribution Functions (Bivariate)
Helpful figures of the bivariate normal pdf and cdf:
[Figure: surface plots of f(x, y) and F(x, y) over x, y ∈ (−4, 4).]
Note: µx = µy = 0, σx² = 1, σy² = 2, and ρ = 0.6/√2.
Note that the cdf still has an ogive shape (now in two dimensions).
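The double integral in Equation (6) can be approximated numerically. As a sketch (not from the slides, and using Python rather than the deck's R), a midpoint Riemann sum with ρ = 0 should match the product of two univariate probabilities, since X and Y are then independent:

```python
# Sketch (not from the slides): check Eq. (6) by numerical double integration.
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def bvn_pdf0(x, y, sx, sy):
    # Bivariate normal pdf with zero means and rho = 0 (independent case).
    return math.exp(-0.5 * ((x / sx) ** 2 + (y / sy) ** 2)) / (2 * math.pi * sx * sy)

sx, sy = 1.0, math.sqrt(2.0)
ax, bx, ay, by = -1.0, 1.0, 0.0, 2.0
n = 200
hx, hy = (bx - ax) / n, (by - ay) / n

# Midpoint Riemann sum over the rectangle [ax, bx] x [ay, by].
total = 0.0
for i in range(n):
    for j in range(n):
        total += bvn_pdf0(ax + (i + 0.5) * hx, ay + (j + 0.5) * hy, sx, sy) * hx * hy

# With rho = 0 the probability factors: P(ax<=X<=bx) * P(ay<=Y<=by).
exact = (Phi(bx / sx) - Phi(ax / sx)) * (Phi(by / sy) - Phi(ay / sy))
assert abs(total - exact) < 1e-4
```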
Affine Transformations of Normal (Bivariate)
Given z = (x, y)′, suppose that z ∼ N(µ, Σ) where
  µ = (µx, µy)′ is the 2 × 1 mean vector,
  Σ = [ σx²     ρσxσy ]
      [ ρσxσy   σy²   ]  is the 2 × 2 covariance matrix.
Let A = [ a₁₁ a₁₂ ]  and  b = ( b₁ )  with A ≠ 0₂ₓ₂.
        [ a₂₁ a₂₂ ]           ( b₂ )
If we define w = Az + b, then w ∼ N(Aµ + b, AΣA′).

Conditional Normal (Bivariate)
The conditional distribution of a variable Y given X = x is

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)    (8)

where
  f_{XY}(x, y) is the joint pdf of X and Y,
  f_X(x) is the marginal pdf of X.
In the bivariate normal case, we have that

  Y|X ∼ N(µ∗, σ∗²)    (9)

where µ∗ = µy + ρ(σy/σx)(x − µx) and σ∗² = σy²(1 − ρ²).

Derivation of Conditional Normal
To prove Equation (9), simply write out the definition and simplify:

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)

Dividing the bivariate pdf (5) by the marginal pdf of X cancels the factor exp{−(x − µx)²/(2σx²)} / (σx√(2π)), leaving

  f_{Y|X}(y|X = x)
  = exp{ −(1/(2(1 − ρ²))) [ (x − µx)²/σx² + (y − µy)²/σy² − 2ρ(x − µx)(y − µy)/(σxσy) ] + (x − µx)²/(2σx²) } / (√(2π) σy √(1 − ρ²))
  = exp{ −(1/(2σy²(1 − ρ²))) [ ρ²(σy²/σx²)(x − µx)² + (y − µy)² − 2ρ(σy/σx)(x − µx)(y − µy) ] } / (√(2π) σy √(1 − ρ²))
  = exp{ −(1/(2σy²(1 − ρ²))) [ y − µy − ρ(σy/σx)(x − µx) ]² } / (√(2π) σy √(1 − ρ²))

which is the N(µ∗, σ∗²) density, completing the proof.
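The identity proved above can also be confirmed numerically: at any point, f_{XY}(x, y)/f_X(x) must equal the N(µ∗, σ∗²) density. A Python sketch (not from the slides; the parameter values below are hypothetical, chosen only for illustration):

```python
# Sketch (not from the slides): numeric check of Eq. (9).
import math

def norm_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def bvn_pdf(x, y, mux, muy, sx, sy, rho):
    zx, zy = (x - mux) / sx, (y - muy) / sy
    q = (zx ** 2 + zy ** 2 - 2 * rho * zx * zy) / (2 * (1 - rho ** 2))
    return math.exp(-q) / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))

# Hypothetical parameter values (not from the slides).
mux, muy, sx, sy, rho = 1.0, -2.0, 2.0, 3.0, 0.4
x, y = 2.5, -1.0

cond = bvn_pdf(x, y, mux, muy, sx, sy, rho) / norm_pdf(x, mux, sx)
mu_star = muy + rho * (sy / sx) * (x - mux)        # conditional mean
var_star = sy ** 2 * (1 - rho ** 2)                # conditional variance
assert abs(cond - norm_pdf(y, mu_star, math.sqrt(var_star))) < 1e-12
```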
Statistical Independence for Bivariate Normal
Two variables X and Y are statistically independent if

  f_{XY}(x, y) = f_X(x) f_Y(y)    (10)

where f_{XY}(x, y) is the joint pdf, and f_X(x) and f_Y(y) are the marginal pdfs.
Note that if X and Y are independent, then

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x) = f_X(x) f_Y(y) / f_X(x) = f_Y(y)    (11)

so conditioning on X = x does not change the distribution of Y.
If X and Y are bivariate normal, what is the necessary and sufficient condition for X and Y to be independent? Hint: see Equation (9).

Example #1
A statistics class takes two exams X (Exam 1) and Y (Exam 2), where the scores follow a bivariate normal distribution with parameters:
  µx = 70 and µy = 60 are the marginal means,
  σx = 10 and σy = 15 are the marginal standard deviations,
  ρ = 0.6 is the correlation coefficient.
Suppose we select a student at random. What is the probability that...
(a) the student scores over 75 on Exam 2?
(b) the student scores over 75 on Exam 2, given that the student scored X = 80 on Exam 1?
(c) the sum of his/her Exam 1 and Exam 2 scores is over 150?
(d) the student did better on Exam 1 than Exam 2?
(e) P(5X − 4Y > 150)?

Example #1: Part (a)
Answer for 1(a): Note that Y ∼ N(60, 15²), so the probability that the student scores over 75 on Exam 2 is

  P(Y > 75) = P(Z > (75 − 60)/15) = P(Z > 1) = 1 − P(Z < 1) = 1 − Φ(1) = 1 − 0.8413447 = 0.1586553

where Φ(x) = ∫_{−∞}^{x} f(z) dz, with f(x) = (1/√(2π)) e^{−x²/2} denoting the standard normal pdf (see the R code for use of pnorm to calculate this quantity).
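The slides compute this with R's pnorm; an equivalent cross-check in Python (an assumption of these notes, stdlib only) uses Φ(x) = (1 + erf(x/√2))/2:

```python
# Sketch (not from the slides): reproduce Example #1, part (a).
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

p = 1 - Phi((75 - 60) / 15)   # P(Y > 75) for Y ~ N(60, 15^2)
assert abs(p - 0.1586553) < 1e-6
```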
Example #1: Part (b)
Answer for 1(b): Note that (Y|X = 80) ∼ N(µ∗, σ∗²) where
  µ∗ = µy + ρ(σy/σx)(x − µx) = 60 + (0.6)(15/10)(80 − 70) = 69
  σ∗² = σy²(1 − ρ²) = 15²(1 − 0.6²) = 144
If a student scored X = 80 on Exam 1, the probability that the student scores over 75 on Exam 2 is

  P(Y > 75 | X = 80) = P(Z > (75 − 69)/12) = P(Z > 0.5) = 1 − Φ(0.5) = 1 − 0.6914625 = 0.3085375

Example #1: Part (c)
Answer for 1(c): Note that (X + Y) ∼ N(µ∗, σ∗²) where
  µ∗ = µx + µy = 70 + 60 = 130
  σ∗² = σx² + σy² + 2ρσxσy = 10² + 15² + 2(0.6)(10)(15) = 505
The probability that the sum of the Exam 1 and Exam 2 scores is above 150 is

  P(X + Y > 150) = P(Z > (150 − 130)/√505) = P(Z > 0.8899883) = 1 − Φ(0.8899883) = 1 − 0.8132639 = 0.1867361

Example #1: Part (d)
Answer for 1(d): Note that (X − Y) ∼ N(µ∗, σ∗²) where
  µ∗ = µx − µy = 70 − 60 = 10
  σ∗² = σx² + σy² − 2ρσxσy = 10² + 15² − 2(0.6)(10)(15) = 145
The probability that the student did better on Exam 1 than Exam 2 is

  P(X > Y) = P(X − Y > 0) = P(Z > (0 − 10)/√145) = P(Z > −0.8304548) = 1 − Φ(−0.8304548) = 1 − 0.2031408 = 0.7968592
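Parts (b), (c), and (d) can be cross-checked the same way, building each answer from the stated parameters rather than the intermediate z-values (Python stdlib, an assumption of these notes):

```python
# Sketch (not from the slides): verify Example #1, parts (b)-(d).
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mux, muy, sx, sy, rho = 70, 60, 10, 15, 0.6

# (b) Y | X = 80 ~ N(69, 144)
mu_b = muy + rho * (sy / sx) * (80 - mux)
sd_b = sy * math.sqrt(1 - rho ** 2)
assert abs((1 - Phi((75 - mu_b) / sd_b)) - 0.3085375) < 1e-6

# (c) X + Y ~ N(130, 505)
sd_c = math.sqrt(sx ** 2 + sy ** 2 + 2 * rho * sx * sy)
assert abs((1 - Phi((150 - 130) / sd_c)) - 0.1867361) < 1e-6

# (d) X - Y ~ N(10, 145)
sd_d = math.sqrt(sx ** 2 + sy ** 2 - 2 * rho * sx * sy)
assert abs((1 - Phi((0 - 10) / sd_d)) - 0.7968592) < 1e-6
```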
Example #1: Part (e)
Answer for 1(e): Note that (5X − 4Y) ∼ N(µ∗, σ∗²) where
  µ∗ = 5µx − 4µy = 5(70) − 4(60) = 110
  σ∗² = 5²σx² + (−4)²σy² + 2(5)(−4)ρσxσy = 25(10²) + 16(15²) − 2(20)(0.6)(10)(15) = 2500
Thus, the needed probability can be obtained using

  P(5X − 4Y > 150) = P(Z > (150 − 110)/√2500) = P(Z > 0.8) = 1 − Φ(0.8) = 1 − 0.7881446 = 0.2118554

Example #1: R Code

  # Example 1a
  > pnorm(1, lower=F)
  [1] 0.1586553
  > pnorm(75, mean=60, sd=15, lower=F)
  [1] 0.1586553

  # Example 1b
  > pnorm(0.5, lower=F)
  [1] 0.3085375
  > pnorm(75, mean=69, sd=12, lower=F)
  [1] 0.3085375

  # Example 1c
  > pnorm(20/sqrt(505), lower=F)
  [1] 0.1867361
  > pnorm(150, mean=130, sd=sqrt(505), lower=F)
  [1] 0.1867361

  # Example 1d
  > pnorm(-10/sqrt(145), lower=F)
  [1] 0.7968592
  > pnorm(0, mean=10, sd=sqrt(145), lower=F)
  [1] 0.7968592

  # Example 1e
  > pnorm(0.8, lower=F)
  [1] 0.2118554
  > pnorm(150, mean=110, sd=50, lower=F)
  [1] 0.2118554

Multivariate Normal

Normal Density Function (Multivariate)
Given x = (x₁, …, x_p)′ with x_j ∈ R for all j, the multivariate normal pdf is

  f(x) = (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ −(1/2)(x − µ)′ Σ⁻¹ (x − µ) }    (12)

where
  µ = (µ₁, …, µ_p)′ is the p × 1 mean vector,
  Σ = {σ_ij} with rows (σ₁₁, σ₁₂, …, σ₁p), …, (σ_p1, σ_p2, …, σ_pp) is the p × p covariance matrix.
Write x ∼ N(µ, Σ) or x ∼ N_p(µ, Σ) to denote that x is multivariate normal.
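For a diagonal Σ, the density in Equation (12) factors into a product of univariate normal densities, since |Σ| is the product of the variances and the quadratic form reduces to a sum. A Python sketch (not from the slides; the numbers are hypothetical) checks this for p = 3 without any linear algebra library:

```python
# Sketch (not from the slides): Eq. (12) with diagonal Sigma factors into marginals.
import math

def norm_pdf(t, mu, sigma):
    return math.exp(-((t - mu) / sigma) ** 2 / 2) / (sigma * math.sqrt(2 * math.pi))

def mvn_pdf_diag(x, mu, var):
    """Eq. (12) when Sigma = diag(var): |Sigma| = prod(var) and the quadratic
    form is sum((x_j - mu_j)^2 / var_j)."""
    p = len(x)
    det = math.prod(var)
    quad = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mu, var))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))

# Hypothetical values for illustration.
x = [0.3, -1.2, 2.0]
mu = [0.0, 1.0, 2.0]
var = [1.0, 4.0, 0.25]
lhs = mvn_pdf_diag(x, mu, var)
rhs = math.prod(norm_pdf(xj, mj, math.sqrt(vj)) for xj, mj, vj in zip(x, mu, var))
assert abs(lhs - rhs) < 1e-12
```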
Some Multivariate Normal Properties
The mean and covariance parameters have the following restrictions:
  µ_j ∈ R for all j,
  σ_jj > 0 for all j,
  σ_ij = ρ_ij √(σ_ii σ_jj), where ρ_ij is the correlation between X_i and X_j,
  σ_ij² ≤ σ_ii σ_jj for any i, j ∈ {1, …, p} (Cauchy-Schwarz).
Σ is assumed to be positive definite so that Σ⁻¹ exists.
Marginals are normal: X_j ∼ N(µ_j, σ_jj) for all j ∈ {1, …, p}.

Multivariate Normal Probabilities
Probabilities still relate to the area under the pdf:

  P(a_j ≤ X_j ≤ b_j ∀j) = ∫_{a₁}^{b₁} ⋯ ∫_{a_p}^{b_p} f(x) dx_p ⋯ dx₁    (13)

where ∫⋯∫ f(x) dx_p ⋯ dx₁ denotes the multiple integral of f(x).
We can still define the cdf of x = (x₁, …, x_p)′:

  F(x) = P(X_j ≤ x_j ∀j) = ∫_{−∞}^{x₁} ⋯ ∫_{−∞}^{x_p} f(u) du_p ⋯ du₁    (14)

Affine Transformations of Normal (Multivariate)
Suppose that x = (x₁, …, x_p)′ and that x ∼ N(µ, Σ) where
  µ = {µ_j}_{p×1} is the mean vector,
  Σ = {σ_ij}_{p×p} is the covariance matrix.
Let A = {a_ij}_{n×p} and b = {b_i}_{n×1} with A ≠ 0_{n×p}.
If we define w = Ax + b, then w ∼ N(Aµ + b, AΣA′).
Note: linear combinations of normal variables are normally distributed.

Multivariate Conditional Distributions
Given variables x = (x₁, …, x_p)′ and y = (y₁, …, y_q)′, we have

  f_{Y|X}(y|X = x) = f_{XY}(x, y) / f_X(x)    (15)

where
  f_{Y|X}(y|X = x) is the conditional distribution of y given x,
  f_{XY}(x, y) is the joint pdf of x and y,
  f_X(x) is the marginal pdf of x.
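The affine-transformation rule above (E[w] = Aµ + b, Cov(w) = AΣA′) can be worked out by hand for a small case. A Python sketch with plain nested loops (not from the slides; µ, Σ, A, and b below are hypothetical values chosen for illustration):

```python
# Sketch (not from the slides): compute A mu + b and A Sigma A' for w = Ax + b.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical parameters (p = 3, n = 2).
mu = [5.0, 3.0, 7.0]
Sigma = [[4.0, -1.0, 0.0], [-1.0, 4.0, 2.0], [0.0, 2.0, 9.0]]
A = [[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]]   # w1 = x1 + x2 + 1, w2 = x2 - x3
b = [1.0, 0.0]

mean_w = [m + bi for m, bi in zip(matvec(A, mu), b)]
cov_w = matmul(matmul(A, Sigma), transpose(A))

assert mean_w == [9.0, -4.0]
# Var(x1 + x2) = 4 + 4 + 2(-1) = 6; Var(x2 - x3) = 4 + 9 - 2(2) = 9;
# Cov(x1 + x2, x2 - x3) = -1 - 0 + 4 - 2 = 1; and A Sigma A' is symmetric.
assert cov_w == [[6.0, 1.0], [1.0, 9.0]]
```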
Conditional Normal (Multivariate)
Suppose that z ∼ N(µ, Σ) where
  z = (x′, y′)′ = (x₁, …, x_p, y₁, …, y_q)′,
  µ = (µx′, µy′)′ = (µ₁x, …, µ_px, µ₁y, …, µ_qy)′; µx is the mean vector of x, and µy is the mean vector of y,
  Σ = [ Σxx   Σxy ]
      [ Σxy′  Σyy ]  with (Σxx)_{p×p}, (Σyy)_{q×q}, and (Σxy)_{p×q};
  Σxx is the covariance matrix of x, Σyy is the covariance matrix of y, and Σxy is the covariance matrix of x and y.
In the multivariate normal case, we have that

  y|x ∼ N(µ∗, Σ∗)    (16)

where µ∗ = µy + Σxy′ Σxx⁻¹ (x − µx) and Σ∗ = Σyy − Σxy′ Σxx⁻¹ Σxy.

Statistical Independence for Multivariate Normal
Using Equation (16), we have that

  y|x ∼ N(µ∗, Σ∗) ≡ N(µy, Σyy)    (17)

if and only if Σxy = 0_{p×q} (a matrix of zeros).
Note that Σxy = 0_{p×q} implies that the p elements of x are uncorrelated with the q elements of y.
For multivariate normal variables: uncorrelated implies independent.
For non-normal variables: uncorrelated does not imply independent.

Example #2
Each Delicious Candy Company store makes 3 sizes of candy bars: regular (X₁), fun size (X₂), and big size (X₃). Assume the weights (in ounces) of the candy bars (X₁, X₂, X₃) follow a multivariate normal distribution with parameters:

  µ = (5, 3, 7)′  and  Σ = [  4  −1   0 ]
                           [ −1   4   2 ]
                           [  0   2   9 ]

Suppose we select a store at random. What is the probability that...
(a) the weight of a regular candy bar is greater than 8 oz?
(b) the weight of a regular candy bar is greater than 8 oz, given that the fun size bar weighs 1 oz and the big size bar weighs 10 oz?
(c) P(4X₁ − 3X₂ + 5X₃ < 63)?
Example #2: Part (a)
Answer for 2(a): Note that X₁ ∼ N(5, 4). So the probability that the regular bar weighs more than 8 oz is

  P(X₁ > 8) = P(Z > (8 − 5)/2) = P(Z > 1.5) = 1 − Φ(1.5) = 1 − 0.9331928 = 0.0668072

Example #2: Part (b)
Answer for 2(b): (X₁ | X₂ = 1, X₃ = 10) is normally distributed; see Equation (16). The conditional mean of (X₁ | X₂ = 1, X₃ = 10) is given by

  µ∗ = µ_{X₁} + Σ₁₂′ Σ₂₂⁻¹ (x̃ − µ̃)
     = 5 + (−1, 0) [ 4 2 ]⁻¹ ( 1 − 3  )
                   [ 2 9 ]   ( 10 − 7 )
     = 5 + (1/32) (−1, 0) [  9 −2 ] ( −2 )
                          [ −2  4 ] (  3 )
     = 5 + 24/32 = 5.75

Example #2: Part (b) continued
The conditional variance of (X₁ | X₂ = 1, X₃ = 10) is given by

  σ∗² = σ²_{X₁} − Σ₁₂′ Σ₂₂⁻¹ Σ₁₂
      = 4 − (−1, 0) [ 4 2 ]⁻¹ ( −1 )
                    [ 2 9 ]   (  0 )
      = 4 − (1/32) (−1, 0) [  9 −2 ] ( −1 )
                           [ −2  4 ] (  0 )
      = 4 − 9/32 = 3.71875

So, if the fun size bar weighs 1 oz and the big size bar weighs 10 oz, the probability that the regular bar weighs more than 8 oz is

  P(X₁ > 8 | X₂ = 1, X₃ = 10) = P(Z > (8 − 5.75)/√3.71875) = P(Z > 1.166767) = 1 − Φ(1.166767) = 1 − 0.8783477 = 0.1216523

Example #2: Part (c)
Answer for 2(c): (4X₁ − 3X₂ + 5X₃) is normally distributed. The expectation of (4X₁ − 3X₂ + 5X₃) is given by

  µ∗ = 4µ_{X₁} − 3µ_{X₂} + 5µ_{X₃} = 4(5) − 3(3) + 5(7) = 46
Example #2: Part (c) continued
The variance of (4X₁ − 3X₂ + 5X₃) is given by

  σ∗² = (4, −3, 5) Σ (4, −3, 5)′
      = (4, −3, 5) [  4  −1   0 ] (  4 )
                   [ −1   4   2 ] ( −3 )
                   [  0   2   9 ] (  5 )
      = (4, −3, 5) (19, −6, 39)′
      = 289

So, the needed probability can be obtained as

  P(4X₁ − 3X₂ + 5X₃ < 63) = P(Z < (63 − 46)/√289) = P(Z < 1) = Φ(1) = 0.8413447

Example #2: R Code

  # Example 2a
  > pnorm(1.5, lower=F)
  [1] 0.0668072
  > pnorm(8, mean=5, sd=2, lower=F)
  [1] 0.0668072

  # Example 2b
  > pnorm(2.25/sqrt(119/32), lower=F)
  [1] 0.1216523
  > pnorm(8, mean=5.75, sd=sqrt(119/32), lower=F)
  [1] 0.1216523

  # Example 2c
  > pnorm(1)
  [1] 0.8413447
  > pnorm(63, mean=46, sd=17)
  [1] 0.8413447

Likelihood Function
Suppose that xᵢ = (x_{i1}, …, x_{ip})′ is an iid sample from a multivariate normal distribution with mean vector µ and covariance matrix Σ, i.e., xᵢ ∼ iid N(µ, Σ).
The likelihood function for the parameters (given the data) has the form

  L(µ, Σ|X) = ∏ᵢ₌₁ⁿ f(xᵢ) = ∏ᵢ₌₁ⁿ (1 / ((2π)^{p/2} |Σ|^{1/2})) exp{ −(1/2)(xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) }

and the log-likelihood function is given by

  LL(µ, Σ|X) = −(np/2) log(2π) − (n/2) log(|Σ|) − (1/2) Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ)
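The Example #2 answers above can be cross-checked end to end, inverting the 2 × 2 block Σ₂₂ by hand (Python rather than the deck's R, an assumption of these notes):

```python
# Sketch (not from the slides): verify Example #2, parts (a)-(c).
import math

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# (a) X1 ~ N(5, 4)
assert abs((1 - Phi((8 - 5) / 2)) - 0.0668072) < 1e-6

# (b) Condition on (X2, X3) = (1, 10): Sigma_22 = [[4, 2], [2, 9]],
# so Sigma_22^{-1} = (1/32) [[9, -2], [-2, 4]], and Sigma_12 = (-1, 0)'.
inv22 = [[9 / 32, -2 / 32], [-2 / 32, 4 / 32]]
d = [1 - 3, 10 - 7]
s12 = [-1.0, 0.0]
tmp = [sum(inv22[i][j] * d[j] for j in range(2)) for i in range(2)]
mu_star = 5 + sum(s12[i] * tmp[i] for i in range(2))
tmp2 = [sum(inv22[i][j] * s12[j] for j in range(2)) for i in range(2)]
var_star = 4 - sum(s12[i] * tmp2[i] for i in range(2))
assert abs(mu_star - 5.75) < 1e-12 and abs(var_star - 3.71875) < 1e-12
assert abs((1 - Phi((8 - mu_star) / math.sqrt(var_star))) - 0.1216523) < 1e-6

# (c) a = (4, -3, 5): mean 46 and variance a' Sigma a = 289.
Sigma = [[4, -1, 0], [-1, 4, 2], [0, 2, 9]]
a = [4, -3, 5]
var_c = sum(a[i] * Sigma[i][j] * a[j] for i in range(3) for j in range(3))
assert var_c == 289
assert abs(Phi((63 - 46) / math.sqrt(var_c)) - 0.8413447) < 1e-6
```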
Maximum Likelihood Estimate of the Mean Vector
The MLE of the mean vector is the value of µ that minimizes

  Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) = Σᵢ₌₁ⁿ xᵢ′ Σ⁻¹ xᵢ − 2n x̄′ Σ⁻¹ µ + n µ′ Σ⁻¹ µ

where x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean vector.
Taking the derivative with respect to µ, we find that

  ∂/∂µ Σᵢ₌₁ⁿ (xᵢ − µ)′ Σ⁻¹ (xᵢ − µ) = −2nΣ⁻¹x̄ + 2nΣ⁻¹µ  ⟷  µ̂ = x̄

i.e., the sample mean vector x̄ is the MLE of the population mean vector µ.

Maximum Likelihood Estimate of the Covariance Matrix
The MLE of the covariance matrix is the value of Σ that minimizes

  −n log(|Σ⁻¹|) + Σᵢ₌₁ⁿ tr{ Σ⁻¹ (xᵢ − µ̂)(xᵢ − µ̂)′ }

where µ̂ = x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ is the sample mean vector.
Taking the derivative with respect to Σ⁻¹, we find that

  ∂/∂Σ⁻¹ [ −n log(|Σ⁻¹|) + Σᵢ₌₁ⁿ tr{ Σ⁻¹ (xᵢ − µ̂)(xᵢ − µ̂)′ } ] = −nΣ + Σᵢ₌₁ⁿ (xᵢ − µ̂)(xᵢ − µ̂)′

i.e., the sample covariance matrix Σ̂ = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)′ is the MLE of the population covariance matrix Σ.

Sampling Distributions

Univariate Sampling Distributions: x̄ and s²
In the univariate normal case, we have that
  x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ ∼ N(µ, σ²/n)
  (n − 1)s² = Σᵢ₌₁ⁿ (xᵢ − x̄)² ∼ σ² χ²_{n−1}
where χ²_k denotes a chi-square variable with k degrees of freedom:
  σ² χ²_k = Σᵢ₌₁ᵏ zᵢ², where zᵢ ∼ iid N(0, σ²).
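The sampling distribution of x̄ can be illustrated by simulation. A Monte Carlo sketch (not from the slides; sample sizes and tolerances are arbitrary choices) checks that sample means of n iid N(µ, σ²) draws have mean near µ and variance near σ²/n:

```python
# Sketch (not from the slides): simulate the sampling distribution of xbar.
import math
import random

random.seed(1)
mu, sigma, n, reps = 10.0, 2.0, 5, 4000

xbars = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbars.append(sum(sample) / n)

m = sum(xbars) / reps
v = sum((x - m) ** 2 for x in xbars) / (reps - 1)

# Theory: E[xbar] = mu = 10 and Var(xbar) = sigma^2/n = 0.8.
assert abs(m - mu) < 0.1
assert abs(v - sigma ** 2 / n) < 0.15
```

The tolerances are loose statistical bounds, several standard errors wide for these simulation sizes.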
Multivariate Sampling Distributions: x̄ and S
In the multivariate normal case, we have that
  x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ ∼ N(µ, Σ/n)
  (n − 1)S = Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)′ ∼ W_{n−1}(Σ)
where W_k(Σ) denotes a Wishart variable with k degrees of freedom:
  W_k(Σ) = Σᵢ₌₁ᵏ zᵢzᵢ′, where zᵢ ∼ iid N(0_p, Σ).