ST 371 (VII): Families of Continuous Distributions
1 Normal Distribution
The family of normal random variables plays a central role in probability
and statistics. This distribution is also called the Gaussian distribution
after Carl Friedrich Gauss, who proposed it as a model for measurement
errors. The central limit theorem, which will be discussed in Chapter 5,
justifies the use of the normal distribution in many applications. Roughly,
the central limit theorem says that if a random variable is the sum of a
large number of independent random variables, it is approximately normally
distributed. The normal distribution has been used as a model for such
diverse phenomena as a person’s height, the distribution of IQ scores and
the velocity of a gas molecule.
The bell-shaped curve: In day to day life much use is made of statistics, in
many cases without the person doing so even realising it. If you were to go
into a shop and you noticed that everybody waiting to be served was over 7
feet tall, you would more than likely be surprised. You probably would have
expected most people to be around the “average” height, maybe spotting
just one or two people in the shop that would be taller than 6 and a half feet.
In making this judgement you are actually employing a well used statistical
distribution known as the Normal Distribution. A histogram of the heights of women in the US aged 18-24 illustrates this situation (Figure 1).
We say that X is a normal random variable, or simply that X is normally distributed, with parameters µ and σ if the pdf of X is given by
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty. \tag{1.1}$$
Techniques from multivariate calculus can be used to show that
$$\int_{-\infty}^{\infty} f(x; \mu, \sigma^2)\,dx = 1,$$
therefore f(x; µ, σ²) fulfills the condition necessary for specifying a pdf.
Figure 1: Heights of Women in US aged 18-24 (histograms of height versus frequency, with 10 and 100 intervals).
Mean, Variance and SD of a Normal Distribution: If X is normally distributed with parameters µ and σ, then it can be shown that E(X) = µ and Var(X) = σ². Figure 2 presents graphs of f(x; µ, σ²) for several different (µ, σ²) pairs. We can see that each graph is symmetric about µ and bell-shaped. The center of the bell is the mean of the distribution. The value of σ determines the spread of the distribution.
Figure 2: Density functions of the normal distribution for several (µ, σ²) pairs.
Example 1 The test scores of an examination can be approximated by
a normal density curve. In other words, a graph of the frequency of grade
scores should have approximately the bell-shaped form of the normal density.
The instructor often uses the test scores to estimate the normal parameters
µ and σ 2 and then assigns the letter grade A to those whose test score is
greater than µ + σ, B to those whose score is between µ and µ + σ, C to
those whose score is between µ − σ and µ, D to those whose score is between
µ − 2σ and µ − σ, and F to those getting a score below µ − 2σ. This is
sometimes referred to as grading “on the curve”. Determine the percentage of the class receiving each of the grades A-F.
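Anticipating the standard normal cdf Φ and the R function pnorm introduced in Section 1.3 below, a minimal sketch of the computation: each cutoff µ + kσ standardizes to z = k, so the percentages do not depend on the particular µ and σ (values in the comments are rounded).
p_A <- 1 - pnorm(1)            # score > mu + sigma, about 15.9%
p_B <- pnorm(1) - pnorm(0)     # mu < score <= mu + sigma, about 34.1%
p_C <- pnorm(0) - pnorm(-1)    # mu - sigma < score <= mu, about 34.1%
p_D <- pnorm(-1) - pnorm(-2)   # mu - 2*sigma < score <= mu - sigma, about 13.6%
p_F <- pnorm(-2)               # score <= mu - 2*sigma, about 2.3%
round(c(A = p_A, B = p_B, C = p_C, D = p_D, F = p_F), 3)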
1.1 Standard Normal Distribution
An important fact about normal random variables is that if X is normally distributed with parameters µ and σ², then aX + b is normally distributed with parameters aµ + b and a²σ². Therefore if X is normally distributed with parameters µ and σ², then
$$Z = \frac{X - \mu}{\sigma} \tag{1.2}$$
is normally distributed with parameters 0 and 1. Such a random variable
is said to be a standard normal random variable and will be denoted by Z.
The pdf of a standard normal variable is
$$\phi(z) = f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$
The cdf of Z is $P(Z \le z) = \int_{-\infty}^{z} \phi(y)\,dy$, which will be denoted by Φ(z). Φ(z) = P(Z ≤ z) gives the area under the graph of the standard normal pdf to the left of z. It is easy to see that φ(z) = φ(−z). Since the z-curve is symmetric about 0, we have
$$\Phi(-z) = 1 - \Phi(z). \tag{1.3}$$
Example 2 (z-curve) Find normal curve areas:
• P (Z ≤ 1.96)
• P (Z > 1.96)
• P (Z ≤ −1.96)
• P (−0.38 ≤ Z ≤ 1.25).
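These areas can be read from a standard normal table or checked with R's pnorm (introduced in Section 1.3); the values in the comments are rounded.
pnorm(1.96)                  # P(Z <= 1.96), about 0.975
1 - pnorm(1.96)              # P(Z > 1.96), about 0.025
pnorm(-1.96)                 # P(Z <= -1.96), about 0.025
pnorm(1.25) - pnorm(-0.38)   # P(-0.38 <= Z <= 1.25), about 0.542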
Example 3 (z-percentile) The 100pth percentile of the standard normal
distribution is the value on the horizontal axis such that the area under the
curve to its left is p. Find (a) the 95th z-percentile, (b) the 5th z-percentile.
zα notation: zα is the value on the x-axis for which α of the area under the z-curve lies to the right of zα. In other words, zα is the 100(1 − α)th percentile of the standard normal distribution. The zα's are usually referred to as z critical values. Important z critical values include: z0.10 = 1.28, z0.05 = 1.645, z0.025 = 1.96, z0.01 = 2.33 and z0.005 = 2.58.
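As a quick check, z-percentiles and critical values can be obtained with R's qnorm, the inverse of Φ (a sketch; values rounded).
qnorm(0.95)    # 95th percentile of Z, about 1.645 (so z_0.05 = 1.645)
qnorm(0.05)    # 5th percentile of Z, about -1.645
qnorm(0.975)   # z_0.025, about 1.96
qnorm(0.995)   # z_0.005, about 2.58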
1.2 Non-Standard Normal Distribution
When X ∼ N(µ, σ²), probabilities involving X can be computed by “standardizing”. Let Z = (X − µ)/σ; then Z ∼ N(0, 1). Thus
$$P(a \le X \le b) = P\!\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = \Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right).$$
Similarly we have
$$P(X \le a) = \Phi\!\left(\frac{a-\mu}{\sigma}\right) \quad \text{and} \quad P(X \ge b) = 1 - \Phi\!\left(\frac{b-\mu}{\sigma}\right).$$
The key idea is that by standardizing, any probability involving X can be
expressed as a probability involving Z.
Example 4 If X is a normal random variable with parameters µ = 3 and
σ 2 = 9, find (a) P (2 < X < 5); (b) P (X > 0); (c) P (|X − 3| > 6).
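A sketch of Example 4 in R (σ = 3 since σ² = 9; as noted in Section 1.3 below, pnorm takes the standard deviation; values rounded).
pnorm(5, 3, 3) - pnorm(2, 3, 3)          # (a) P(2 < X < 5), about 0.378
1 - pnorm(0, 3, 3)                       # (b) P(X > 0), about 0.841
pnorm(-3, 3, 3) + (1 - pnorm(9, 3, 3))   # (c) P(|X - 3| > 6), about 0.046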
Example 5 The army is developing a new missile and is concerned about its
precision. By observing points of impact, launchers can adjust the missile’s
initial trajectory, thereby controlling the mean of its impact distribution. If
the standard deviation of the impact distribution is too large, though, the
missile will be ineffective. Suppose the Pentagon requires that at least 95%
of the missiles must fall within 1/8 mile of the target when the missiles are
aimed properly. Assume the impact distribution is normal. What is the
maximum allowable standard deviation for the impact distribution?
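One way to sketch the solution: the requirement P(|X − target| ≤ 1/8) ≥ 0.95 for a normal X forces 1/8 ≥ z0.025 σ, so σ ≤ (1/8)/1.96. In R:
(1/8) / qnorm(0.975)   # maximum allowable sigma, about 0.064 mile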
1.3 R commands for Normal Distribution
The command dnorm(x, mu, s) evaluates the pdf of a normal random variable with mean mu and standard deviation s at x (note that R parameterizes by the standard deviation, not the variance), e.g.,
> dnorm(2, 2, 1.5)
[1] 0.2659615
is the value of the pdf f(2; µ = 2, σ = 1.5).
The command pnorm(x, mu, s) calculates the probability that a normal rv with mean mu and standard deviation s is less than or equal to x, e.g.,
> pnorm(2, 2, 1.5)
[1] 0.5
is the value of the cdf P(X ≤ 2), where X ∼ N(2, 1.5²).
1.4 Normal Approximation to Binomial
An important result in probability theory, known as the DeMoivre-Laplace
limit theorem, states that when n is large, a binomial random variable with
parameters n and p will have approximately the same distribution as a
normal random variable with the same mean and variance as the binomial.
DeMoivre-Laplace Limit Theorem: If Sn denotes the number of successes that occur when n independent trials, each resulting in a success with probability p, are performed, then for any a < b,
$$P\!\left(a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b\right) \to \Phi(b) - \Phi(a) \tag{1.4}$$
as n → ∞.
Normal Approx. vs. Poisson Approx. to Binomial: The normal distribution provides an approximation to the binomial distribution when n is large [the normal approximation to the binomial will, in general, be quite good for values of n satisfying np(1 − p) ≥ 10]. The Poisson distribution provides an approximation to the binomial distribution when n is large and p is small so that np is moderate (n > 50 and np < 5).
Figure 3: The pmf of a Binomial(n, p) random variable (shown for p = 0.7 and n = 10, 20, 30, 40) becomes more and more “normal” as n becomes larger and larger.
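These rules of thumb can be explored numerically. The sketch below compares both approximations to a Binomial(100, 0.1) pmf; the particular n and p are illustrative choices, not taken from the notes.
n <- 100; p <- 0.1; k <- 0:n
exact   <- dbinom(k, n, p)
normal  <- dnorm(k, mean = n * p, sd = sqrt(n * p * (1 - p)))
poisson <- dpois(k, lambda = n * p)
max(abs(exact - normal))    # worst-case error of the normal approximation
max(abs(exact - poisson))   # worst-case error of the Poisson approximation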
Example 6 Airlines A and B offer identical service on two flights leaving
at the same time (meaning that the probability of a passenger choosing
either is 1/2). Suppose that both airlines are competing for the same pool
of 400 potential passengers. Airline A sells tickets to everyone who requests
one, and the capacity of its plane is 230. Approximate the probability that
airline A overbooks.
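A sketch of the calculation: the number choosing airline A is Sn ∼ Binomial(400, 1/2), with mean 200 and standard deviation 10, and overbooking means Sn > 230.
1 - pnorm((230.5 - 200) / 10)   # normal approximation with continuity correction, about 0.001
1 - pbinom(230, 400, 0.5)       # exact binomial tail probability, also about 0.001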
2 The Exponential and Gamma Distributions
2.1 Exponential distribution
Exponential Random Variables: A random variable with CDF F(x) = 1 − e^{−λx}, x ≥ 0, is called an exponential random variable with parameter λ. The probability density function of an exponential random variable is, for x ≥ 0,
$$f(x) = \frac{d}{dx} F(x) = \frac{d}{dx}\left(1 - e^{-\lambda x}\right) = \lambda e^{-\lambda x}, \tag{2.5}$$
and 0 for x < 0. The following plot shows the pdfs for some exponential distributions.
Figure 4: Exponential distribution densities (λ = 0.5, 1, 2, 3).
Example 7 The time until the first event in a Poisson process with rate
λ is an exponential random variable with parameter λ. Suppose customers
arrive at a bank according to a Poisson process with arrival intensity 2 per
minute. What is the probability that starting at 1 p.m., the first customer
arrives within two minutes?
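A sketch of Example 7: with λ = 2 per minute, P(X ≤ 2) = 1 − e^{−4}. In R, pexp is the exponential cdf:
pexp(2, rate = 2)   # P(X <= 2), about 0.982
1 - exp(-4)         # the same probability computed directly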
Mean and variance of exponential random variable: we will show that
$$E(X) = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$
For n > 0, we have
$$E(X^n) = \int_0^\infty x^n \lambda e^{-\lambda x}\,dx.$$
Integrating by parts (dv = λe^{−λx} dx, u = x^n) yields
$$E(X^n) = \left.-x^n e^{-\lambda x}\right|_0^\infty + \int_0^\infty e^{-\lambda x}\, n x^{n-1}\,dx = 0 + \frac{n}{\lambda}\int_0^\infty \lambda e^{-\lambda x}\, x^{n-1}\,dx = \frac{n}{\lambda}\, E[X^{n-1}].$$
Thus,
$$E(X) = \frac{1}{\lambda}\, E(X^0) = \frac{1}{\lambda}, \qquad E(X^2) = \frac{2}{\lambda}\, E(X) = \frac{2}{\lambda^2}, \qquad \mathrm{Var}(X) = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}.$$
Memorylessness of exponential random variable: Let X be an exponential random variable with parameter λ. We have, for all s, t ≥ 0,
$$P(X > s + t \mid X > t) = \frac{P(\{X > s + t\} \cap \{X > t\})}{P(X > t)} = \frac{P(X > s + t)}{P(X > t)} = \frac{1 - (1 - e^{-\lambda(s+t)})}{1 - (1 - e^{-\lambda t})} = e^{-\lambda s}.$$
In other words,
$$P(X \le s + t \mid X > t) = 1 - e^{-\lambda s} = P(X \le s). \tag{2.6}$$
If we think of X as the lifetime of some electrical device, equation (2.6)
states that the probability that the device survives for at least s + t hours given
that it has survived t hours is the same as the initial probability that it
survives for at least s hours. In other words, if the device is alive at age
t, the distribution of the remaining amount of time that it survives is the
same as the original lifetime distribution (that is, it is as if the instrument
does not remember that it has already been in use for a time t). This is
called the memorylessness property of the exponential distribution.
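A small numerical illustration of (2.6), with arbitrary illustrative values λ = 2, s = 1 and t = 3:
lambda <- 2; s <- 1; t <- 3
(1 - pexp(s + t, lambda)) / (1 - pexp(t, lambda))   # P(X > s + t | X > t)
1 - pexp(s, lambda)                                 # P(X > s): the same value, exp(-2)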
Question: Is the exponential distribution a good model for the distribution of human lifetimes?
Example 8 Consider a post office that is staffed by two clerks. Suppose
that when Mr. Smith enters the post office, he discovers that Ms. Jones is
being served by one of the clerks and Mr. Brown by the other. Suppose also
that Mr. Smith is told that his service will begin as soon as either Jones or
Brown leaves. If the amount of time that a clerk spends with a customer is
exponentially distributed with parameter λ, what is the probability that, of
the three customers, Mr. Smith is the last to leave the post office?
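A Monte Carlo sketch of Example 8 (the rate is set to 1 here, but by memorylessness the answer, 1/2, does not depend on it):
set.seed(1)
n <- 1e5
jones <- rexp(n, 1); brown <- rexp(n, 1); smith <- rexp(n, 1)
# When the first of Jones/Brown leaves, the other has |jones - brown| time remaining,
# and Smith then needs `smith` time; Smith is last iff his time exceeds that remainder.
mean(smith > abs(jones - brown))   # close to 0.5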
2.2 The Gamma Distribution
Suppose events occur according to a Poisson process with arrival intensity
λ. Suppose we start observing the process at some time (which we denote
by time 0). The time until the first event occurs has an exponential (λ)
distribution. Let X denote the time until the first α events occur. Let W
denote the number of occurrences of the event in the interval [0, x]. Then
W is a Poisson random variable with parameter λx. The pdf of X can be
obtained using W as
$$f_X(x) = \frac{\lambda^\alpha}{(\alpha - 1)!}\, x^{\alpha - 1} e^{-\lambda x}.$$
The gamma family can be generalized to cases in which α is positive but
not necessarily an integer. To do this, we replace (α − 1)! with a continuous
function of (nonnegative) α, Γ(α), the latter reducing to (α − 1)! when α is
a positive integer. For any real number α > 0, the gamma function (of α)
is given by
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx.$$
The most important properties of the gamma function are the following:
1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1).
2. For any positive integer n, Γ(n) = (n − 1)!.
3. Γ(1/2) = √π.
Gamma Distribution: Let X be a random variable with pdf
$$f(x) = \begin{cases} \dfrac{\lambda e^{-\lambda x} (\lambda x)^{\alpha - 1}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
Then X is said to have a gamma distribution with parameters α and λ. The mean and variance of the gamma distribution are
$$E(X) = \frac{\alpha}{\lambda}, \qquad \mathrm{Var}(X) = \frac{\alpha}{\lambda^2}.$$
The gamma family of distributions is a flexible family of probability distributions for modeling nonnegative-valued random variables. The following plot shows the pdfs for some gamma distributions.
Figure 5: Gamma distribution densities (α = 0.5, 1, 5, 10, with λ = 1).
Example 9 Suppose customers arrive at a bank according to a Poisson
process with arrival intensity 2 per minute. What is the probability that
starting at 1 p.m., the first two customers have arrived within three minutes?
What is the expected value of the amount of time it takes for two customers
to arrive?
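A sketch of Example 9: the waiting time until the second arrival is Gamma(α = 2, λ = 2), so
pgamma(3, shape = 2, rate = 2)   # P(X <= 3) = 1 - 7*exp(-6), about 0.983
2 / 2                            # E(X) = alpha / lambda = 1 minute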
2.3 The Chi-Squared Distribution
The chi-squared distribution is important because it is the basis for statistical procedures for testing hypotheses and constructing confidence intervals.
Relation to normal distribution (not required): Let X1, · · · , Xn be a random sample from N(µ, σ²). Then the rv
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} \tag{2.7}$$
has a χ² distribution with (n − 1) df.
Let ν be a positive integer. Then a rv X is said to have a chi-squared distribution with parameter ν if the pdf of X is the gamma density with α = ν/2 and β = 2, where β = 1/λ denotes the scale parameter. The pdf of a chi-squared rv is thus
$$f(x, \nu) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}.$$
Figure 6: Chi-square densities (ν = 4, 8, 12, 20). The χ² curve is positively skewed and becomes flatter and more symmetric as ν increases.
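Chi-squared quantities are available directly in R (a sketch; ν = 4 is an arbitrary illustrative choice):
dchisq(3, df = 4)     # pdf value f(3, 4)
pchisq(3, df = 4)     # P(X <= 3)
qchisq(0.95, df = 4)  # 95th percentile, about 9.49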
3 Distribution of Functions of R.V. (optional)
It is often the case that we know the probability distribution of a random
variable and are interested in determining the distribution of some function
of it as in Example 1 above. Suppose that we know the distribution of X and
we want to find the distribution of g(X). Our basic approach for continuous
random variables is the following:
1. Find the CDF of g(X). To do so, write the event g(X) ≤ y in terms
of X being in some set.
2. Differentiate the CDF of g(X) to find the pdf.
Example 10 Suppose that a random variable X has pdf
$$f_X(x) = \begin{cases} 6x(1-x) & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Let Y be a random variable that equals 2X + 1. What is the pdf of Y?
Note first that the random variable Y maps the range of X-values (0, 1) onto the interval (1, 3). Let 1 < y < 3. Then
$$F_Y(y) = P(Y \le y) = P(2X + 1 \le y) = P\!\left(X \le \frac{y-1}{2}\right) = F_X\!\left(\frac{y-1}{2}\right).$$
We have
$$F_X(t) = \begin{cases} 0 & t \le 0 \\ \int_0^t 6x(1-x)\,dx = 3t^2 - 2t^3 & 0 < t < 1 \\ 1 & t \ge 1 \end{cases}$$
Therefore,
$$F_Y(y) = F_X\!\left(\frac{y-1}{2}\right) = \begin{cases} 0 & y \le 1 \\ 3\left(\frac{y-1}{2}\right)^2 - 2\left(\frac{y-1}{2}\right)^3 = -\frac{y^3}{4} + \frac{3y^2}{2} - \frac{9y}{4} + 1 & 1 < y < 3 \\ 1 & y \ge 3 \end{cases}$$
Differentiating FY(y) gives fY(y):
$$f_Y(y) = \begin{cases} 0 & y \le 1 \\ -\frac{3y^2}{4} + 3y - \frac{9}{4} & 1 < y < 3 \\ 0 & y \ge 3 \end{cases}$$
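A quick numerical check (a sketch): fX(x) = 6x(1 − x) on (0, 1) is the Beta(2, 2) density, so we can simulate X, transform to Y = 2X + 1, and compare a histogram of Y with the derived fY.
set.seed(1)
xs <- rbeta(1e5, 2, 2)   # draws with density 6x(1 - x) on (0, 1)
ys <- 2 * xs + 1
hist(ys, breaks = 50, freq = FALSE)
curve(-3 * x^2 / 4 + 3 * x - 9 / 4, from = 1, to = 3, add = TRUE)   # derived fY(y)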
4 Other Useful Distributions (optional)
4.1 The Weibull Distribution
The Weibull distribution is widely used in engineering as a model for the lifetime of objects. A random variable X is said to have a Weibull distribution
with parameters (α, β) and ν if its pdf is
$$f(x) = \begin{cases} 0 & x \le \nu \\ \dfrac{\beta}{\alpha}\left(\dfrac{x-\nu}{\alpha}\right)^{\beta - 1} \exp\!\left\{-\left(\dfrac{x-\nu}{\alpha}\right)^{\beta}\right\} & x > \nu \end{cases}$$
4.2 The Lognormal Distribution
A nonnegative rv X is said to have a lognormal distribution if the rv Y = log(X) has a normal distribution. The resulting pdf of a lognormal rv, when log(X) is normally distributed with parameters µ and σ, is
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-[\log(x) - \mu]^2/(2\sigma^2)}, \qquad x > 0.$$
4.3 The Beta Distribution
The beta distribution is a distribution for a continuous random variable
that lies between 0 and 1, and is widely used as a model for proportions.
A random variable X is said to have a beta distribution with parameters a
and b if its pdf is
$$f(x) = \begin{cases} \dfrac{1}{B(a,b)}\, x^{a-1}(1-x)^{b-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
where $B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$.
4.4 The Cauchy Distribution
A random variable X is said to have a Cauchy distribution with parameter
θ if its pdf is given by
$$f(x) = \frac{1}{\pi}\, \frac{1}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$