Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Program Faculty of Life Sciences • The normal distribution (continued) • Density function The Normal distribution • Comparing data with the normal distribution Ib Skovgaard & Claus Ekstrøm • The standard normal distribution • Scaling and standardization E-mail: [email protected] • The normal distribution in R • Sums of normals are again normal • Normal probability calculations • Probabilities of centered intervals • Summary of main points Slide 2 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution The normal (or Gaussian) distribution Why the normal distribution? The normal distribution is used to model residual variation The normal distribution 95 100 Linear regression • has nice mathematical properties ● • arises as a sum of “random errors” from several sources of Digestibility % 75 80 85 90 ● ● ● variation. This is the result called the Central Limit Theorem (CLT): sum or mean of many terms resemble a normal distribution. ● ● 70 ● ● Con Alp Enr Fen Ive Spi 0 5 10 15 20 25 Stearic acid % One sample: 96 • often fit (biological) data ● 65 Organic material 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 One-way ANOVA 119 119 Blood pressure 108 126 128 Slide 3 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution 30 35 Also called the Gaussian distribution after Carl Friedrich Gauss — German mathematician and physicist, 1777–1855. 110 105 94 Slide 4 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Weights of crabs Density and probabilities 0.15 • Weights of 162 crabs of a Density for the normal distribution with mean µ and standard deviation σ : 1 1 f (y ) = √ exp − 2 (y − µ)2 2σ 2πσ 2 certain age: y1 , . . . , y162 . Density 0.10 • R: ȳ = 12.76, s = 2.25 • Histogram normalized to a 0.00 density of the normal distribution 8 f (y ) = √ 2π · 2.252 exp − 10 12 14 16 Weight 1 (y − 12.76)2 2 · 2.252 18 20 Density • Curve for f , where f is the 1 b 0.05 have total area 1 The probability (= population frequency) of the interval between a and b is equal to the area under the density curve within that interval: Z b P(a < Y < b) = f (y ) dy a The curve fits nicely to the histogram. Slide 5 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Slide 6 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Are the data normally distributed? Scaling and standardization Weight of 162 crabs: y1 , . . . , y162 . Sample mean is ȳ = 12.76 and standard deviation is s = 2.25. Histogram and density QQ-plot 0.15 20 ● 0.05 Density 0.10 Sample Quantiles 10 12 14 16 18 ● ● ●● ● ●● ●● 0.00 8 ● ●● ●● • Scaling preserves normality: If Y is normal then a + b · Y is normal with the “natural” mean a + bµ and standard deviation |b| σ . ● ●●●● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●●● ●● ●●● ●● ●● • Standardization: In particular, the standardized version, Z= Y −µ σ has mean zero and standard deviation one. ● ● 8 10 12 14 16 Weight 18 20 −2 −1 0 1 Theoretical Quantiles 2 • Histogram compared with the density for N(ȳ , s 2 ) • QQ-plot: quantile-quantile plot. Compare with a straight line with intercept ȳ and slope σ . R: qqnorm(wgt); qqline(wgt) or abline(..) Slide 7 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Slide 8 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Standard normal distribution 1.0 Distribution function 0.8 0.90 1 1 φ (z) = √ exp − z 2 2 2π Cdf (Φ) 0.4 0.6 0.05 0.0 0.0 0.05 0.2 0.1 Density of N(0, 1): Density 0.4 Probabilities from N(µ, σ 2 ) may be converted to probabilities from N(0, 1). For example, Y −µ a−µ a−µ a−µ P(Y ≤ a) = P ≤ =P Z ≤ =Φ σ σ σ σ Density (φ) 0.2 0.3 The standard normal distribution −4 −2 0 z 2 4 −4 −2 0 z 2 4 Distribution function — “area to the left of z”: Φ(z) = P(Z ≤ z) = Z The 95%-quantile is 1.6449: Φ(1.6449) = P(Z ≤ 1.6449) = 0.95 P(−1.6449 ≤ Z ≤ 1.6449) = z −∞ φ (x) dx Slide 10 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Slide 9 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Standard normal distribution Normal distribution function in R The distribution function for N(µ, σ ) is Density Distribution function Φµ,σ (x) = P(Y ≤ x), 1.0 0.4 which in R is the function pnorm(..): 0.05 Examples (crab weights): 0.0 0.0 0.05 > pnorm(x, mean= mu, sd=sigma) 0.2 0.1 Cdf (Φ) 0.4 0.6 Density (φ) 0.2 0.3 0.8 0.90 −4 −2 0 z 2 4 The 97.5%-quantile is 1.960: Φ(1.960) = P(Z ≤ 1.960) = 0.975 P(−1.96 ≤ Z ≤ 1.96) = Slide 11 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution −4 −2 0 z 2 4 > pnorm(14, mean= 12.76, sd=2.25) [1] 0.7092212 # Prob. of a value less or equal to 14 > pnorm(14, mean= 12.76, sd=2.25, lower.tail=FALSE) [1] 0.2917888 # Prob. of a value greater than 14 Slide 12 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Normal distribution quantiles in R R: The standard normal distribution Quantiles in N(µ, σ ) are values of the inverse distribution function. In R it is written In pnorm(..) and qnorm(..) in R, mu=0 and sigma=1 are default values, corresponding to the standard normal distribution. > qnorm(q, mean= mu, sd=sigma) Example (crab weights): # 0.75 = Prob. of a value less or equal to ? > qnorm(0.75, mean= 12.76, sd=2.25) [1] 14.27760 # Answer: 75% of the population has values below 14.28 Slide 13 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Sum of normally distributed variables • Distribution function: pnorm, fx. > pnorm(1.6449) [1] 0.9500048 • Quantiles: qnorm, fx. > qnorm(0.975) [1] 1.959964 Compare with the table in Appendix C2, p. 400 in the book. Slide 14 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution N-probabilities Crab data: assume that crab weight ∼ N(12.76, 2.252 ) What is the probability that Sum of normals is again normal: • a randomly selected crab weighs at most 14 gram? If Y1 and Y2 are independent and both normal then their sum is again normal with • en a randomly selected crab weighs at least 10 gram? • mean of sum = sum of means, • variance of sum = sum of variances, • BUT NOT standard deviation of sum = sum of standard deviations, This holds also for the sum of more than two variables. Slide 15 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution • a randomly selected crab weighs between 10 and 14 gram? • the sum of the weights of two randomly selected crabs is at least 26 gram? > pnorm(1.227) [1] 0.8900887 > pnorm(-1.227) [1] 0.1099113 > pnorm(0.151) [1] 0.5600121 Slide 16 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution > pnorm(0.5511) [1] 0.7092174 > pnorm(-0.5511) [1] 0.2907826 Probabilities of centered intervals Chapter 4 summary: main points For any normal distribution: Density 99.7% 95% 68% σ • Comparing data with a normal distribution • Compare histogram with the density of N(ȳ , s 2 ). • Compare the QQ-plot with a straight line P(µ − σ < Y < µ + σ ) = P (−1 < Z < 1) = 0.68 • Linear transformation of a normal is again normal • Standardization and the standard normal distribution, N(0, 1) • Standardization: Z = Y σ−µ ∼ N(0, 1) • • N-probabilities calculated from N(0, 1) • R: Normal probability functions • Sums of normal variables are again normal. • Intervals µ ± σ , µ ± 2 · σ or µ ± 3 · σ • 68% of the population is within µ ± σ • 95% of the population is within µ ± 2 · σ • 99.7% of the population is within µ ± 3 · σ Slide 17 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution Slide 18 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution