Download The Normal distribution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Program
Faculty of Life Sciences
• The normal distribution (continued)
• Density function
The Normal distribution
• Comparing data with the normal distribution
Ib Skovgaard & Claus Ekstrøm
• The standard normal distribution
• Scaling and standardization
E-mail: [email protected]
• The normal distribution in R
• Sums of normals are again normal
• Normal probability calculations
• Probabilities of centered intervals
• Summary of main points
Slide 2 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
The normal (or Gaussian) distribution
Why the normal distribution?
The normal distribution is used to model residual variation
The normal distribution
95 100
Linear regression
• has nice mathematical properties
●
• arises as a sum of “random errors” from several sources of
Digestibility %
75 80 85 90
●
●
●
variation. This is the result called the Central Limit Theorem
(CLT): sum or mean of many terms resemble a normal
distribution.
●
●
70
●
●
Con
Alp
Enr
Fen
Ive
Spi
0
5
10 15 20 25
Stearic acid %
One sample:
96
• often fit (biological) data
●
65
Organic material
2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1
One-way ANOVA
119
119
Blood pressure
108 126 128
Slide 3 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
30
35
Also called the Gaussian distribution after
Carl Friedrich Gauss — German mathematician and physicist,
1777–1855.
110
105
94
Slide 4 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Weights of crabs
Density and probabilities
0.15
• Weights of 162 crabs of a
Density for the normal distribution with mean µ and standard
deviation σ :
1
1
f (y ) = √
exp − 2 (y − µ)2
2σ
2πσ 2
certain age: y1 , . . . , y162 .
Density
0.10
• R: ȳ = 12.76, s = 2.25
• Histogram normalized to
a
0.00
density of the normal
distribution
8
f (y ) = √
2π · 2.252
exp −
10
12
14 16
Weight
1
(y − 12.76)2
2 · 2.252
18
20
Density
• Curve for f , where f is the
1
b
0.05
have total area 1
The probability (= population
frequency) of the interval
between a and b is equal to the
area under the density curve
within that interval:
Z
b
P(a < Y < b) =
f (y ) dy
a
The curve fits nicely to the histogram.
Slide 5 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Slide 6 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Are the data normally distributed?
Scaling and standardization
Weight of 162 crabs: y1 , . . . , y162 . Sample mean is ȳ = 12.76 and
standard deviation is s = 2.25.
Histogram and density
QQ-plot
0.15
20
●
0.05
Density
0.10
Sample Quantiles
10 12 14 16 18
●
●
●●
●
●●
●●
0.00
8
● ●●
●●
• Scaling preserves normality: If Y is normal then a + b · Y is
normal with the “natural” mean a + bµ and standard
deviation |b| σ .
●
●●●●
●●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●●
●●
●●●
●●
●●
• Standardization: In particular, the standardized version,
Z=
Y −µ
σ
has mean zero and standard deviation one.
●
●
8
10
12
14 16
Weight
18
20
−2
−1
0
1
Theoretical Quantiles
2
• Histogram compared with the density for N(ȳ , s 2 )
• QQ-plot: quantile-quantile plot. Compare with a straight line
with intercept ȳ and slope σ .
R: qqnorm(wgt); qqline(wgt) or abline(..)
Slide 7 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Slide 8 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Standard normal distribution
1.0
Distribution function
0.8
0.90
1
1
φ (z) = √ exp − z 2
2
2π
Cdf (Φ)
0.4
0.6
0.05
0.0
0.0
0.05
0.2
0.1
Density of N(0, 1):
Density
0.4
Probabilities from N(µ, σ 2 ) may be converted to probabilities from
N(0, 1). For example,
Y −µ
a−µ
a−µ
a−µ
P(Y ≤ a) = P
≤
=P Z ≤
=Φ
σ
σ
σ
σ
Density (φ)
0.2
0.3
The standard normal distribution
−4
−2
0
z
2
4
−4
−2
0
z
2
4
Distribution function — “area to the left of z”:
Φ(z) = P(Z ≤ z) =
Z
The 95%-quantile is 1.6449:
Φ(1.6449) = P(Z ≤ 1.6449) = 0.95
P(−1.6449 ≤ Z ≤ 1.6449) =
z
−∞
φ (x) dx
Slide 10 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Slide 9 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Standard normal distribution
Normal distribution function in R
The distribution function for N(µ, σ ) is
Density
Distribution function
Φµ,σ (x) = P(Y ≤ x),
1.0
0.4
which in R is the function pnorm(..):
0.05
Examples (crab weights):
0.0
0.0
0.05
> pnorm(x, mean= mu, sd=sigma)
0.2
0.1
Cdf (Φ)
0.4
0.6
Density (φ)
0.2
0.3
0.8
0.90
−4
−2
0
z
2
4
The 97.5%-quantile is 1.960:
Φ(1.960) = P(Z ≤ 1.960) = 0.975
P(−1.96 ≤ Z ≤ 1.96) =
Slide 11 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
−4
−2
0
z
2
4
> pnorm(14, mean= 12.76, sd=2.25)
[1] 0.7092212 # Prob. of a value less or equal to 14
> pnorm(14, mean= 12.76, sd=2.25, lower.tail=FALSE)
[1] 0.2917888 # Prob. of a value greater than 14
Slide 12 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Normal distribution quantiles in R
R: The standard normal distribution
Quantiles in N(µ, σ ) are values of the inverse distribution function.
In R it is written
In pnorm(..) and qnorm(..) in R, mu=0 and sigma=1 are
default values, corresponding to the standard normal distribution.
> qnorm(q, mean= mu, sd=sigma)
Example (crab weights):
# 0.75 = Prob. of a value less or equal to ?
> qnorm(0.75, mean= 12.76, sd=2.25)
[1] 14.27760
# Answer: 75% of the population has values below 14.28
Slide 13 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Sum of normally distributed variables
• Distribution function: pnorm, fx.
> pnorm(1.6449)
[1] 0.9500048
• Quantiles: qnorm, fx.
> qnorm(0.975)
[1] 1.959964
Compare with the table in Appendix C2, p. 400 in the book.
Slide 14 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
N-probabilities
Crab data: assume that crab weight ∼ N(12.76, 2.252 )
What is the probability that
Sum of normals is again normal:
• a randomly selected crab weighs at most 14 gram?
If Y1 and Y2 are independent and both normal then their sum is
again normal with
• en a randomly selected crab weighs at least 10 gram?
• mean of sum = sum of means,
• variance of sum = sum of variances,
• BUT NOT standard deviation of sum = sum of standard
deviations,
This holds also for the sum of more than two variables.
Slide 15 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
• a randomly selected crab weighs between 10 and 14 gram?
• the sum of the weights of two randomly selected crabs is at
least 26 gram?
> pnorm(1.227)
[1] 0.8900887
> pnorm(-1.227)
[1] 0.1099113
> pnorm(0.151)
[1] 0.5600121
Slide 16 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
> pnorm(0.5511)
[1] 0.7092174
> pnorm(-0.5511)
[1] 0.2907826
Probabilities of centered intervals
Chapter 4 summary: main points
For any normal distribution:
Density
99.7%
95%
68%
σ
• Comparing data with a normal distribution
• Compare histogram with the density of N(ȳ , s 2 ).
• Compare the QQ-plot with a straight line
P(µ − σ < Y < µ + σ )
= P (−1 < Z < 1)
= 0.68
• Linear transformation of a normal is again normal
• Standardization and the standard normal distribution, N(0, 1)
• Standardization: Z = Y σ−µ ∼ N(0, 1)
•
• N-probabilities calculated from N(0, 1)
• R: Normal probability functions
• Sums of normal variables are again normal.
• Intervals µ ± σ , µ ± 2 · σ or µ ± 3 · σ
• 68% of the population is within µ ± σ
• 95% of the population is within µ ± 2 · σ
• 99.7% of the population is within µ ± 3 · σ
Slide 17 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Slide 18 — Statistics for Life Science (Week 3-1 2010) — The Normal distribution
Related documents