The Normal Distribution∗
Alan T. Arnholt
Department of Mathematical Sciences
Appalachian State University
[email protected]
Spring 2006 R Notes
∗ Copyright © 2006 Alan T. Arnholt
Outline
Continuous Random Variables
  Overview of Continuous Random Variables
Normal Distribution
  Overview of Normal Distribution
The R Script
Continuous Random Variable
Recall that a discrete random variable can assume only a countable number of values. When the set of possible values of a random variable X is an entire interval of numbers, we say that X is a continuous random variable. For example, if we randomly select a 12-ounce can of beer and measure its actual fluid contents X, then X is a continuous random variable because any value of X between 0 and the capacity of the can is possible.
Properties of Continuous Random Variables
Continuous Probability Density Functions’ Properties
The function f(x) is a pdf for the continuous random variable X, defined over the set of real numbers ℝ, if
1. $f(x) \ge 0$ for $-\infty < x < \infty$.
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$. (The total area under the probability density curve is 1.00, which corresponds to 100%.)
3. $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$. (The area under the density curve between a and b.)
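As a quick numerical check (a sketch, not part of the original notes), R's integrate() can confirm properties 2 and 3 for the standard normal density dnorm(). The endpoints a = −1 and b = 1.5 are arbitrary choices made here. Property 1 is only spot-checked on a finite grid, which is not a proof.

```r
# Numerically check the pdf properties for the standard normal density.
# integrate() performs adaptive quadrature; its $value element holds the estimate.

x_grid <- seq(-10, 10, by = 0.01)

# Property 1: f(x) >= 0 (spot-checked on a finite grid only)
all_nonnegative <- all(dnorm(x_grid) >= 0)

# Property 2: the total area under the curve is 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Property 3: P(a <= X <= b) is the area under f(x) between a and b
a <- -1    # illustrative endpoints (assumed)
b <- 1.5
area_ab <- integrate(dnorm, lower = a, upper = b)$value

print(all_nonnegative)
print(total_area)
print(area_ab)
```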
Graphical Illustration of Continuous Distribution
[Figure: three panels shading, under a density curve f(x), the areas P(a ≤ X ≤ b) = $\int_a^b f(x)\,dx$, P(X ≤ b) = $\int_{-\infty}^b f(x)\,dx$, and P(X ≤ a) = $\int_{-\infty}^a f(x)\,dx$.]
Figure: Graphical illustration of P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a).
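The identity in the caption holds for any continuous distribution, not only the normal. The sketch below checks it for an exponential distribution with rate 0.5. Both the distribution and the endpoints are chosen here purely for illustration. It compares the integral of the density dexp() with the difference of two cdf values from pexp().

```r
# P(a <= X <= b) = P(X <= b) - P(X <= a) for an exponential distribution
# with rate 0.5 (an illustrative choice, not from the notes).
rate <- 0.5
a <- 1
b <- 3

# Left-hand side: area under the density between a and b
# (extra arguments such as rate are passed through integrate() to dexp)
lhs <- integrate(dexp, lower = a, upper = b, rate = rate)$value

# Right-hand side: difference of the areas to the left of b and of a
rhs <- pexp(b, rate = rate) - pexp(a, rate = rate)

print(c(lhs = lhs, rhs = rhs))
```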
Normal Distribution
• The normal, or Gaussian, distribution is arguably the most important distribution in statistical applications, because the distributions of many numerical populations can be approximated by a normal distribution.
• Examples of quantities that approximately follow a normal distribution include physical characteristics such as the height and weight of members of a particular species. Further, certain statistics, such as the sample mean, follow an approximate normal distribution when certain conditions are satisfied (for example, when the sample size is sufficiently large).
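The claim about the sample mean can be illustrated with a small simulation. This is a hypothetical sketch: the exponential population, the sample size n = 50, and the 10,000 replications are all assumptions chosen for illustration. The code compares the simulated proportion of sample means that fall within one standard error of the population mean against the proportion predicted by the normal approximation.

```r
# Hypothetical illustration: means of samples from a skewed (exponential)
# population are approximately normal when the sample size is large.
set.seed(1)                      # make the simulation reproducible
n <- 50                          # size of each sample (assumed)
reps <- 10000                    # number of simulated samples (assumed)

# An exponential with rate 1 has mean 1 and standard deviation 1,
# so the sample mean has mean 1 and standard deviation 1/sqrt(n).
xbar <- replicate(reps, mean(rexp(n, rate = 1)))
se <- 1 / sqrt(n)

# Proportion of sample means within one standard error of 1, versus the
# normal-approximation value pnorm(1) - pnorm(-1), about 0.683
simulated <- mean(abs(xbar - 1) <= se)
normal_approx <- pnorm(1) - pnorm(-1)
print(c(simulated = simulated, normal_approx = normal_approx))
```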
Normal PDF
Normal Distribution X ∼ N(µ, σ)
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
where −∞ < µ < ∞ and 0 < σ < ∞.
E[X] = µ
Var[X] = σ²
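To connect the formula with R, the sketch below codes f(x) directly and compares it with dnorm(). It then recovers E[X] = µ and Var[X] = σ² numerically with integrate(). The helper normal_pdf() and the values µ = 100 and σ = 10 are illustrative assumptions. The integrals are taken over µ ± 10σ rather than the whole real line. The area outside that range is negligible, and a finite range avoids relying on integrate()'s infinite-range transformation for a peak located far from 0.

```r
# Hypothetical helper: the normal pdf written out from the formula above
normal_pdf <- function(x, mu, sigma) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / sqrt(2 * pi * sigma^2)
}

mu <- 100      # illustrative parameters (assumed)
sigma <- 10
x <- c(80, 95, 100, 115)

# The hand-coded pdf and dnorm() should agree to floating-point precision
max_diff <- max(abs(normal_pdf(x, mu, sigma) - dnorm(x, mean = mu, sd = sigma)))

# E[X] and Var[X] from the pdf, integrating over mu +/- 10 sigma
lo <- mu - 10 * sigma
hi <- mu + 10 * sigma
ex <- integrate(function(t) t * dnorm(t, mu, sigma), lo, hi)$value
vx <- integrate(function(t) (t - mu)^2 * dnorm(t, mu, sigma), lo, hi)$value

print(max_diff)
print(c(mean = ex, variance = vx))
```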
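Three Different Normal Distributions
[Figure: three normal density curves, each marked at its mean µ, with the spread σ indicated.]
Figure: Three normal distributions, each with a larger σ value than the one before, reading from left to right.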
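The effect of σ shown in the figure can be quantified. The N(µ, σ) density reaches its maximum at x = µ, where f(µ) = 1/(σ√(2π)). A larger σ therefore produces a lower, wider curve. The σ values below are illustrative and are not read off the figure.

```r
# Peak height of the normal density, f(mu) = 1 / (sigma * sqrt(2 * pi)),
# for three illustrative standard deviations (assumed values)
sigmas <- c(1, 2, 3)
peaks <- dnorm(0, mean = 0, sd = sigmas)      # dnorm is vectorized over sd
peaks_formula <- 1 / (sigmas * sqrt(2 * pi))  # same values from the formula
print(round(peaks, 4))   # 0.3989 0.1995 0.1330
```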
Standard Normal Distribution
A normal random variable with µ = 0 and σ = 1, often denoted Z, is called a standard normal random variable. The cdf of X ∼ N(µ, σ), given in (2), is computed by first standardizing X using the change of variable formula in (1).
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \qquad (1)$$
$$F(x) = P(X \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-z^2/2}\, dz \qquad (2)$$
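Equations (1) and (2) can be checked numerically. The sketch below uses the illustrative values µ = 100, σ = 10, and x = 112, which are assumptions and do not come from the notes. It computes F(x) in three ways: with pnorm() on the original scale, with pnorm() on the standardized value z, and by integrating the standard normal density in (2) with integrate().

```r
# F(x) for X ~ N(mu, sigma), computed three ways (illustrative values)
mu <- 100
sigma <- 10
x <- 112

z <- (x - mu) / sigma                          # change of variable, equation (1)
direct <- pnorm(x, mean = mu, sd = sigma)      # cdf of X directly
standardized <- pnorm(z)                       # P(Z <= z), default mean 0, sd 1
by_integral <- integrate(function(u) exp(-u^2 / 2) / sqrt(2 * pi),
                         lower = -Inf, upper = z)$value   # equation (2)

print(c(direct = direct, standardized = standardized, by_integral = by_integral))
```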
Graphical representation for computing P(a ≤ X ≤ b)
[Figure: for X ∼ N(µ, σ), the areas under f(x) for P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a), shown alongside the equivalent areas under the standard normal density f(z): P((a−µ)/σ ≤ Z ≤ (b−µ)/σ) = P(Z ≤ (b−µ)/σ) − P(Z ≤ (a−µ)/σ).]
Example
Scores on a particular standardized test follow a normal
distribution with a mean of 100 and standard deviation of 10.
(a) What is the probability that a randomly selected individual will
score between 90 and 115?
(b) What score does one need to be in the top 10%?
(c) Find the constant c such that P(105 ≤ X ≤ c) = 0.10.
Solution
To find P(90 ≤ X ≤ 115), we first draw a picture representing the desired area, such as the one in Figure 4 on page 23. Finding the area between 90 and 115 is equivalent to finding the area to the left of 115 and then subtracting the area to the left of 90. In other words,
P(90 ≤ X ≤ 115) = P(X ≤ 115) − P(X ≤ 90).
To find P(X ≤ 115) and P(X ≤ 90), we standardize using (1). That is,
$$P(X \le 115) = P\left(Z \le \frac{115 - 100}{10}\right) = P(Z \le 1.5),$$
and
$$P(X \le 90) = P\left(Z \le \frac{90 - 100}{10}\right) = P(Z \le -1.0).$$
Using pnorm()
The R function pnorm() finds P(X ≤ x) given X ∼ N(µ, σ).
1. The default arguments for pnorm() are pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE).
2. For more information, read the help file for pnorm() by typing ?pnorm at the R prompt.
3. Note that the default values for pnorm() are µ = 0 and σ = 1.
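The lower.tail argument listed above switches pnorm() from returning the area to the left of x to returning the area to the right. The sketch below reuses the test-score setting X ∼ N(100, 10) from the example and computes P(X > 115) both ways.

```r
# Upper-tail probability P(X > 115) for X ~ N(100, 10)
upper_direct <- pnorm(115, mean = 100, sd = 10, lower.tail = FALSE)  # area to the right
upper_complement <- 1 - pnorm(115, mean = 100, sd = 10)              # 1 minus area to the left
print(c(upper_direct, upper_complement))
```

The choice between the two forms matters in the far tails. pnorm(10) rounds to exactly 1 in double precision, so 1 - pnorm(10) evaluates to 0. By contrast, pnorm(10, lower.tail = FALSE) returns a small positive value of about 7.6e-24.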
Solution Continued
Using the R function pnorm(), we find the areas to the left of 1.5 and −1.0 to be 0.9332 and 0.1587, respectively. Consequently,
P(90 ≤ X ≤ 115) = P(−1.0 ≤ Z ≤ 1.5)
= P(Z ≤ 1.5) − P(Z ≤ −1.0)
= 0.9332 − 0.1587 = 0.7745.
> pnorm(1.5,mean=0,sd=1)
[1] 0.9331928
> pnorm(-1,mean=0,sd=1)
[1] 0.1586553
> pnorm(1.5)-pnorm(-1)
[1] 0.7745375
> pnorm(115,100,10) - pnorm(90,100,10)
[1] 0.7745375
Graphical representation for finding P(90 ≤ X ≤ 115) given X ∼ N(100, 10)
[Figure: the X ∼ Normal(µ = 100, σ = 10) density, with the area between 90 and 115 shaded. The area between 90 and 115 is 0.7745.]
Figure: Graphical representation for finding P(90 ≤ X ≤ 115) given X ∼ N(100, 10).
[Figure: for X ∼ N(100, 10), the areas under f(x) for P(90 ≤ X ≤ 115) = P(X ≤ 115) − P(X ≤ 90), shown alongside the equivalent standard normal areas under f(z): P((90−100)/10 ≤ Z ≤ (115−100)/10) = P(−1 ≤ Z ≤ 1.5) = P(Z ≤ 1.5) − P(Z ≤ −1).]
Solution Part (b)
Finding the value c such that 90% of the area is to its left is equivalent to finding the value c such that 10% of the area is to its right. That is, finding the value c that satisfies P(X ≤ c) = 0.90 is equivalent to finding the value c such that P(X ≥ c) = 0.10. We therefore solve
$$P(X \le c) = P\left(Z = \frac{X - 100}{10} \le \frac{c - 100}{10}\right) = 0.90$$
for c.
Using the R function qnorm(), we find the Z value (1.2816) such that 90% of the area in the distribution is to the left of that value. Consequently, to be in the top 10%, one needs to be more than 1.2816 standard deviations above the mean. Set
$$\frac{c - 100}{10} = 1.2816$$
and solve for c ⇒ c = 112.816.
To be in the top 10%, one needs to score 112.816 or higher.
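As a cross-check of part (b), the sketch below takes a different route from the notes. qnorm() also accepts lower.tail = FALSE, so the top-10% cutoff can be requested directly as the value with 10% of the area to its right. pnorm() then confirms that the area above the cutoff is 0.10.

```r
# Top-10% cutoff for X ~ N(100, 10), requested directly from the upper tail
cutoff <- qnorm(0.10, mean = 100, sd = 10, lower.tail = FALSE)

# Same cutoff via the 90th percentile, as in the solution
cutoff_90 <- qnorm(0.90, mean = 100, sd = 10)

# Check: the area to the right of the cutoff should be 0.10
tail_area <- pnorm(cutoff, mean = 100, sd = 10, lower.tail = FALSE)
print(c(cutoff = cutoff, cutoff_90 = cutoff_90, tail_area = tail_area))
```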
Using qnorm()
The function qnorm() finds quantiles of a normal distribution. It has the same default values as pnorm().
> qnorm(.90)
[1] 1.281552
> qnorm(.90,100,10)
[1] 112.8155
• pnorm() finds P(X ≤ x), the area to the left of x (a number between 0 and 1).
• qnorm() finds the value c such that P(X ≤ c) equals a given area (a number between 0 and 1).
• The first argument of pnorm() is x, while the first argument of qnorm() is the area to the left of c. pnorm() returns the cdf evaluated at x, while qnorm() returns the inverse cdf evaluated at the given area.
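The inverse relationship described in the last bullet can be seen by composing the two functions. Feeding the areas returned by pnorm() back into qnorm(), with the same mean and sd, recovers the original values. The x values below are arbitrary illustrations.

```r
# pnorm() and qnorm() undo each other for a fixed mean and sd
x <- c(85, 100, 112.8155)                 # illustrative values (assumed)
p <- pnorm(x, mean = 100, sd = 10)        # areas to the left of each x
x_back <- qnorm(p, mean = 100, sd = 10)   # recover x from those areas
print(cbind(x, p, x_back))
```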
Solution Part (c)
P(105 ≤ X ≤ c) = 0.10 is the same as
$$P(X \le c) = 0.10 + P(X \le 105) = 0.10 + P\left(Z \le \frac{105 - 100}{10}\right).$$
Since
$$P\left(Z \le \frac{105 - 100}{10}\right) = P(Z \le 0.5) = 0.6915,$$
it follows that P(X ≤ c) = 0.7915. The value c is found by solving
$$P(X \le c) = P\left(Z = \frac{X - 100}{10} \le \frac{c - 100}{10}\right) = 0.7915,$$
which gives
$$\frac{c - 100}{10} = 0.8116 \Rightarrow c = 108.116.$$
Note that the standard normal distribution has 79.15% of its area to the left of Z = 0.8116.
Solution Part (c) with R
The solution to P(105 ≤ X ≤ c) = 0.10 when X ∼ N (100, 10) is
the same as
P(X ≤ c) = 0.10 + P(X ≤ 105)
which can be computed as
> qnorm(.10 + pnorm(105,100,10), 100, 10)
[1] 108.1151
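The answer can be verified by plugging it back in: the area between 105 and c under N(100, 10) should be 0.10. In this sketch the result is stored as c_val rather than c, to avoid confusion with R's built-in c() function.

```r
# Verify part (c): the area between 105 and c under N(100, 10) equals 0.10
c_val <- qnorm(0.10 + pnorm(105, 100, 10), 100, 10)
area <- pnorm(c_val, 100, 10) - pnorm(105, 100, 10)
print(c(c_val = c_val, area = area))
```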
Link to the R Script
• Go to my web page Script for Normal Distribution
• Homework: problems 3.53 - 3.63
• See me if you need help!