13. Continuous random variables
Imagine a stick of unit length that is oriented horizontally from left to right. Consider
choosing a point x uniformly at random between 0 and 1 and breaking the stick x units
from the left end. In this example, the sample space can be regarded as Ω = [0, 1). For
finitely many non-overlapping subintervals J1 , . . . , Jn , with E := ⋃_{i=1}^n Ji , we define

P[E] := P[⋃_{i=1}^n Ji] = Σ_{i=1}^n length(Ji).
The intuition here is that there are too many possible values of X, so each individual
value has an excessively small (in fact, zero) probability of occurring. Think back to our
interpretation of the event space (Ω, E) at the beginning of the semester. The elements E ∈ E
are the events that we can obtain information about. In the above example, even if we could
generate a uniform number x to infinite precision, we could never know exactly what the
number is; therefore, even an observation such as x = 0.1542847 only tells us that x is in
the interval [0.15428465, 0.15428475).
By this definition, we have P[[0, 1)] = P[Ω] = 1 − 0 = 1, P[[0, 1/2)] = 1/2 and P[[x, x]] =
P[{x}] = x − x = 0 for every x ∈ [0, 1). This example emphasizes the importance of countable
additivity of P. In particular, we can write Ω = ⋃_{ω∈[0,1)} {ω}, which is an uncountable union, and

1 = P[Ω] = P[⋃_{ω∈Ω} {ω}], while Σ_{ω∈Ω} P[{ω}] = 0.
Definition 13.1 (Continuous random variables). Let (Ω, E, P) be a probability model. A function
X : Ω → R has a continuous distribution if there exists a function f : R → [0, ∞) such that
P{X ∈ B} = ∫_B f(x) dx
for all sets B ⊆ R for which {X ∈ B} is an event. Specifically, f satisfies
P{X ∈ R} = P{−∞ < X < ∞} = ∫_{−∞}^{∞} f(x) dx = 1.
We call f the density of X.
If f is continuous at x, then
lim_{∆↓0} P{X ∈ [x, x + ∆]}/length[x, x + ∆] = lim_{∆↓0} (1/∆) ∫_x^{x+∆} f(y) dy = f(x).
Therefore, we interpret f (x)dx as the probability that X is in an infinitesimally small window
[x, x + dx] around x.
The cumulative distribution function FX : R → [0, 1] of X is defined by
FX(x) := P{X ≤ x} = P{X ∈ (−∞, x]} = ∫_{−∞}^x f(y) dy
and, for x < z, P{x < X ≤ z} = FX (z) − FX (x).
Conversely, if F is continuously differentiable, or everywhere continuous and differentiable at all but finitely many points, then
F(z) − F(x) = ∫_x^z F′(y) dy,
so X has density f = F′ by the fundamental theorem of calculus.
13.1. Uniform random variables. Define

f(x) := 1 for 0 ≤ x ≤ 1, and f(x) := 0 otherwise.
Then f is the density of a Uniform(0, 1) random variable. If U ∼ Unif(0, 1), then
FU(u) := P{U ≤ u} = ∫_0^u dx = u,  0 ≤ u ≤ 1,

EU = ∫_0^1 u du = 1/2,

EU² = ∫_0^1 u² du = 1/3, and

Var(U) = EU² − [EU]² = 1/3 − (1/2)² = 1/12.
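These moments are easy to check by simulation; here is a minimal Python sketch (the sample size of 10^6 is an arbitrary choice).

```python
import random

# Monte Carlo check of EU = 1/2, EU^2 = 1/3 and Var(U) = 1/12 for U ~ Uniform(0, 1).
n = 10**6
samples = [random.random() for _ in range(n)]

mean = sum(samples) / n
second_moment = sum(u * u for u in samples) / n
print(mean, second_moment, second_moment - mean**2)  # ~0.5, ~0.333, ~0.0833
```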
For a < b, W = (b − a)U + a is uniformly distributed on (a, b). The density of W is
f(w) = 1/(b − a) for a ≤ w ≤ b, and f(w) = 0 otherwise.
Also,
EW = (b − a)EU + a = (a + b)/2 and

Var(W) = (b − a)²/12.
Recall the definition of the discrete Uniform distribution: P{X = k} = 1/n for k = 1, . . . , n.
In this case, EX = (n + 1)/2 and Var(X) = (n² − 1)/12. Let Nn have the discrete uniform
distribution on [n] := {1, . . . , n}, for each n = 1, 2, . . .. Then we say (Nn /n, n ≥ 1) converges
in distribution to the Uniform distribution on [0, 1], in the following sense. If kn → ∞ in
such a way that kn /n → u ∈ [0, 1], then
P{Nn ≤ kn } = kn /n = P{Nn /n ≤ kn /n}
so P{Nn /n ≤ kn /n} = kn /n → u = P{U ≤ u}, for all u ∈ (0, 1). Note also that
E[Nn /n] = (1/n) ENn = (n + 1)/(2n) → 1/2 = EU and

Var(Nn /n) = (1/n²) Var(Nn) = (n² − 1)/(12n²) → 1/12 = Var(U).
Therefore, one way to generate an approximately uniform random variable U is to sample x
uniformly from {1, . . . , n} for some large value of n, and then put U = x/n. This
procedure gives a uniform random variable up to precision 1/n. See Section 13.3 for a much
better way to do this.
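A minimal sketch of this procedure (the choice n = 10^6 is arbitrary):

```python
import random

# Approximate a Uniform(0, 1) draw by sampling x uniformly from {1, ..., n}
# and setting U = x/n; the result is uniform only up to precision 1/n.
def approximate_uniform(n):
    x = random.randint(1, n)  # discrete uniform on {1, ..., n}
    return x / n

print(approximate_uniform(10**6))
```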
13.2. Expectation of continuous random variables. Let X be a random variable with
density f . Then, for g : R → R,
Eg(X) := ∫_{−∞}^{∞} g(x) f(x) dx = ∫_{−∞}^{∞} g⁺(x) f(x) dx − ∫_{−∞}^{∞} g⁻(x) f(x) dx,
where g± (x) := max(±g(x), 0). The same rules apply for computing expectations of continuous random variables, with the modification that we now compute the integral instead of a
sum. In particular, for random variables X, Y ∈ R and a, b ∈ R, we have
• E(aX + bY + c) = aEX + bEY + c,
• Var(X) = EX² − [EX]²,
• Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y),
• Cov(X, Y) = EXY − EX · EY.
Now, suppose X ≥ 0 is a continuous random variable with density f . Then
EX = ∫_0^∞ x f(x) dx
   = ∫_0^∞ [ ∫_0^x dy ] f(x) dx
   = ∫_0^∞ ∫_0^∞ I{0 ≤ y ≤ x} f(x) dy dx
   = ∫_0^∞ [ ∫_y^∞ f(x) dx ] dy
   = ∫_0^∞ P{X ≥ y} dy.
Also,
EX² = ∫_0^∞ P{X² ≥ y} dy
    = ∫_0^∞ P{X ≥ √y} dy
    = ∫_0^∞ 2x P{X ≥ x} dx.
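As a sanity check of these tail formulas, the sketch below evaluates both integrals numerically for X ~ Uniform(0, 1), where P{X ≥ y} = 1 − y on [0, 1]; the grid size is arbitrary.

```python
# Riemann-sum check of EX = integral of P{X >= y} dy and
# EX^2 = integral of 2x P{X >= x} dx, for X ~ Uniform(0, 1).
h = 1e-4
grid = [k * h for k in range(int(1 / h))]

first_moment = sum((1 - y) * h for y in grid)           # should be close to 1/2
second_moment = sum(2 * x * (1 - x) * h for x in grid)  # should be close to 1/3
print(first_moment, second_moment)
```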
Example 13.2 (Order statistics). Let U1 , U2 , U3 , U4 be i.i.d. Uniform(0, 1) random variables and
write U(1) < U(2) < U(3) < U(4) to denote the order statistics. Then
P{U(1) ≤ u} = P{min Ui ≤ u}
            = 1 − P{Ui > u for all i = 1, . . . , 4}
            = 1 − (1 − u)⁴
            =: F(1)(u).
The density of U(1) is
f(1)(u) = F′(1)(u) = 4(1 − u)³ for 0 < u < 1, and f(1)(u) = 0 otherwise.
We can compute the expectation of U(1) in a couple of different ways. By definition,
EU(1) = ∫_0^1 4u(1 − u)³ du = ∫_0^1 4(1 − y)y³ dy = ∫_0^1 (4y³ − 4y⁴) dy = 1 − 4/5 = 1/5.
Alternatively,
EU(1) = ∫_0^1 P{U(1) ≥ u} du = ∫_0^1 (1 − u)⁴ du = 1/5.
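A quick Monte Carlo check of EU(1) = 1/5 (the number of replications is arbitrary):

```python
import random

# Estimate E[U_(1)] = E[min(U1, ..., U4)] for i.i.d. Uniform(0, 1) variables.
reps = 10**5
total = sum(min(random.random() for _ in range(4)) for _ in range(reps))
print(total / reps)  # approximately 1/5
```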
13.3. Constructing Uniform (0,1) random variables. Let I1 , I2 , . . . be independent Bernoulli(1/2)
random variables (based on repeated flips of a fair coin). Given I1 , I2 , . . . , we define
Y := Σ_{j=1}^{∞} Ij 2^{−j}.
Since {I j } j≥1 are random, so is Y. Y is a random variable with dyadic (binary) expansion
0.I1 I2 I3 . . .. The set
D := {0.ε1 ε2 . . . εn : εj ∈ {0, 1}, n ≥ 1}

of dyadic rationals is dense in [0, 1], i.e., for every u ∈ [0, 1] and every δ > 0 there is some
d ∈ D with d ∈ (u − δ, u + δ).
Suppose we terminate our expansion above at j = 5. Then we obtain an expansion
Y5 := 0.i1 i2 i3 i4 i5 , where i1 , . . . , i5 are realizations of independent Bernoulli trials. What is the
distribution of Y5 ?
P{Y5 = u} = 2^{−5} if u = 0.ε1 . . . ε5 with εj ∈ {0, 1}, and P{Y5 = u} = 0 otherwise.
In general, if we stop after n ≥ 1 trials and define Yn := 0.i1 . . . in , then

P{Yn = u} = 2^{−n} if u = 0.ε1 . . . εn with εj ∈ {0, 1}, and P{Yn = u} = 0 otherwise.

In other words, for n ≥ 1, Yn is uniformly distributed on

Dn := {0.ε1 . . . εn : εj ∈ {0, 1}, j = 1, . . . , n}.
Now, take any u ∈ [0, 1]. Since D is dense in [0, 1], there exists a sequence d1 , d2 , . . . , with
dn ∈ Dn for each n ≥ 1 such that dn → u. In fact, we can choose dn so that |u − dn | ≤ 2^{−n}
for each n ≥ 1. Now, generate a sequence I1 , I2 , . . . of i.i.d. Bernoulli(1/2) random variables
and, for each n ≥ 1, define Yn := 0.I1 . . . In as above. For the dn we have chosen, we have

dn − 2^{−n} ≤ P{Yn ≤ u} ≤ dn + 2^{−n}

for all n ≥ 1.
By the sandwich lemma, we have lim_{n→∞} P{Yn ≤ u} = lim_{n→∞} dn = u, for all u ∈ (0, 1). This
is the notion of convergence in distribution. This method tells us that we can approximate a
uniform random variable to precision 2^{−n} by flipping a coin only n times. This approach
is more efficient than that described above, where we must generate a number between 1
and n to obtain precision only up to 1/n.
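A minimal sketch of this construction (truncating at n = 32 coin flips is an arbitrary choice):

```python
import random

# Build an approximately Uniform(0, 1) variable from n fair coin flips, using the
# truncated dyadic expansion Yn = 0.I1 I2 ... In = sum_j Ij * 2^(-j).
def uniform_from_coin_flips(n=32):
    bits = [random.randint(0, 1) for _ in range(n)]  # i.i.d. Bernoulli(1/2)
    return sum(bit * 2.0**-(j + 1) for j, bit in enumerate(bits))

print(uniform_from_coin_flips())  # uniform to precision 2^(-32)
```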
13.4. The Normal (Gaussian) distribution. A random variable Z has the standard normal
distribution if it is continuous with density
fZ(z) = (1/√(2π)) e^{−z²/2},  −∞ < z < ∞.
The Normal distribution is the most important distribution in statistics, as we will see later
in the course. The expectation and variance can be computed as follows:
EZ = (1/√(2π)) ∫_{−∞}^{∞} z e^{−z²/2} dz = 0 and

Var(Z) = EZ² = (1/√(2π)) ∫_{−∞}^{∞} z × z e^{−z²/2} dz
       = [−z e^{−z²/2}/√(2π)]_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−z²/2}/√(2π) dz = 0 + 1 = 1.
The density of Z is commonly denoted φ and the distribution function is denoted Φ.
For α ∈ (0, 1), zα is the number such that Φ(zα ) = α, e.g., z0.5 = 0, z0.84 ≈ 1, z0.025 ≈ −2, and
so on. These are called the α-quantiles of the standard normal distribution and are used
repeatedly in statistical inference.
A random variable X is said to obey the Normal distribution with parameter (µ, σ²), for
−∞ < µ < ∞ and σ² > 0, written X ∼ N(µ, σ²), if

Z := (X − µ)/σ

is a standard Normal random variable.
We denote the distribution of X by Φµ,σ . The above transformation of X is called a change of
location by µ and a change of scale by σ. Consequently, the family {Φµ,σ : −∞ < µ < ∞, σ² > 0}
of Normal distributions is called a location-scale family of distributions, which will be covered
in another course on statistical inference.
Since X = µ + σZ, where Z ∼ N(0, 1), we have
EX = µ + σEZ = µ and
Var(X) = Var(µ + σZ) = σ² Var(Z) = σ².
Furthermore, X has a density
fX(x) dx = P{x ≤ X ≤ x + dx}
         = P{(x − µ)/σ ≤ (X − µ)/σ ≤ (x − µ)/σ + dx/σ}
         = P{(x − µ)/σ ≤ Z ≤ (x − µ)/σ + dx/σ}
         = fZ((x − µ)/σ) dx/σ.
Therefore, we denote the density of X by φµ,σ , which satisfies
(22)    φµ,σ(x) = (1/(σ√(2π))) exp{−(x − µ)²/(2σ²)},  −∞ < x < ∞.
Theorem 13.3 (De Moivre–Laplace Theorem). Let X be a Binomial random variable with parameter (n, p). Then X is approximately Normally distributed with parameter (np, npq).
Remark 13.4. This approximation requires npq ≥ 10 (or so) in order to be accurate.
Example 13.5 (Normal approximation to Binomial). Spin a pointer 10 times with 4 possible
outcomes, A,B,C,D. Let NA denote the number of times the pointer points toward A. Then NA
follows the Binomial distribution with parameter (10, 1/4). We have
P{NA ≥ 4} = Σ_{k=4}^{10} C(10, k) (1/4)^k (3/4)^{10−k} ≈ 0.224.
By De Moivre–Laplace, NA is approximately Normally distributed with mean µ = np = 2.5 and
standard deviation σ = √(npq) ≈ 1.37. The Normal approximation (with continuity correction)
gives

P{NA ≥ 4} ≈ P{(NA − 2.5)/1.37 ≥ (3.5 − 2.5)/1.37} = 1 − Φ(0.73) = 0.233.
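The two numbers above can be reproduced as follows (a sketch; Φ is evaluated here via the error function):

```python
from math import comb, erf, sqrt

# Exact P{NA >= 4} for NA ~ Binomial(10, 1/4) versus the Normal approximation
# with continuity correction, mu = np = 2.5 and sigma = sqrt(npq).
exact = sum(comb(10, k) * 0.25**k * 0.75**(10 - k) for k in range(4, 11))

mu, sigma = 2.5, sqrt(10 * 0.25 * 0.75)
Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard Normal CDF
approx = 1 - Phi((3.5 - mu) / sigma)

print(exact, approx)  # approximately 0.224 and 0.233
```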
Example 13.6 (Marbles). A large collection of marbles has diameters that are approximately
Normally distributed with mean 1 cm. One third of the marbles have diameters greater than 1.1 cm.
(a) What is the standard deviation? From the normal table, z0.667 = 0.43; hence, (X − µ)/σ ∼ N(0, 1) and

(1.1 − 1)/σ = 0.43 =⇒ σ = 0.1 cm/0.43 = 0.23 cm.
(b) What fraction of marbles have diameters within 0.2 cm of the mean?
P{0.8 ≤ D ≤ 1.2} = P{−0.2/0.23 ≤ Z ≤ (1.2 − 1)/0.23}
                 = P{−0.86 ≤ Z ≤ 0.86}
                 = Φ(0.86) − Φ(−0.86)
                 = Φ(0.86) − [1 − Φ(0.86)]
                 = 2Φ(0.86) − 1
                 = 0.61.
(c) What diameter is exceeded by 75% of marbles? The 0.25 quantile of Z is z0.25 = −0.675.
Hence, the diameter exceeded by 75%, denoted d75 , is
d75 = µ − 0.675σ = 1 − 0.675 × 0.23 = 0.84 cm.
13.5. The Exponential distribution. Consider modeling arrivals of a bus as follows. Fix
some λ > 0 and assume that within each time interval, the number of bus arrivals in that
interval follows a Poisson distribution with parameter λ × (length of the interval). For
example, if λ = 2/hour, then the number of arrivals between 12:00PM and 2:00PM is
Poisson with parameter 2 × 2 = 4. Further, we assume that the number of arrivals in
non-overlapping intervals are independent Poisson random variables with the appropriate
means. Therefore, the number of arrivals between 12:00 and 2:00 is Poisson(4) and is
independent of the number of arrivals between 2:30 and 3:00, which is Poisson(1). The
above model is called a time-homogeneous Poisson process with intensity parameter λ > 0.
Writing N(s,t) to denote the number of arrivals in the interval (s, t), the Poisson process is
determined by the marginal distributions
N(s,t) ∼ Poisson(λ(t − s)).
Let T := inf{t > 0 : N[0,t] = 1} denote the time of the first arrival. What is the distribution
of T?
• First method: Note that the event {T > t} is the same as the event that there are no
arrivals in the interval [0, t] for our Poisson process. Therefore,
P{T > t} = P{N[0,t] = 0} = e^{−λt},
by the assumption that counts follow the Poisson distribution; therefore, FT(t) =
P{T ≤ t} = 1 − P{T > t} = 1 − e^{−λt} and fT(t) = F′T(t) = λe^{−λt}.
• Second method: For t ≥ 0 and dt > 0,
fT(t) dt = P{t ≤ T ≤ t + dt}
         = P{no arrival in [0, t), one arrival in [t, t + dt]}
         = P{N[0,t) = 0} × P{N[t,t+dt) = 1}
         = e^{−λt} × e^{−λ dt} (λ dt)¹/1!
         = λ e^{−λt} × e^{−λ dt} dt.
For the density of T, we have
fT(t) = lim_{dt↓0} P{t ≤ T ≤ t + dt}/dt = lim_{dt↓0} (λ e^{−λt} e^{−λ dt} dt)/dt = λ e^{−λt}.
You can check that fT above integrates to one and so is the proper probability density. The
random variable T is said to have the Exponential distribution with intensity parameter λ > 0,
denoted T ∼ Exp(λ).
Now, put X = λT, where T ∼ Exp(λ). Then X is called a standard Exponential random
variable:

FX(x) = P{X ≤ x} = P{T ≤ x/λ} = 1 − e^{−x},  x > 0,

and

fX(x) = e^{−x}.
To compute the moments of the Exponential distribution, we introduce the Gamma
function, which for real numbers r > 0 is defined by
Γ(r) := ∫_0^∞ x^{r−1} e^{−x} dx.
The Gamma function takes some special values. In particular, Γ(1) = 1 and, for any r > 0,
Γ(r + 1) = ∫_0^∞ x^r e^{−x} dx
         = [x^r (−e^{−x})]_0^∞ − ∫_0^∞ r x^{r−1} (−e^{−x}) dx
         = rΓ(r).
Therefore, if r is a positive integer, then Γ(r + 1) = r(r − 1) · · · 2 · 1 · Γ(1) = r!.
Now, let X be a standard Exponential random variable. Using the Gamma function, we
have
EX = ∫_0^∞ x e^{−x} dx = Γ(2) = 1,

EX² = ∫_0^∞ x² e^{−x} dx = Γ(3) = 2, and

EXⁿ = ∫_0^∞ xⁿ e^{−x} dx = Γ(n + 1) = n!.
We also have Var(X) = EX² − [EX]² = 2 − 1 = 1. Now, let T ∼ Exp(λ). Then T has the same
distribution as X/λ and so

ET = 1/λ;   Var(T) = 1/λ²;   SD(T) = 1/λ.
Example 13.7 (Electrical systems). Consider two components C1 , C2 of an electrical system. The
system functions as long as both components are functioning. Suppose lifetimes T1 , T2 of C1 and C2
are modeled by independent Exponential random variables with parameters λ1 , λ2 and let T be the
lifetime of the system. What is the distribution of T := min(T1 , T2 )?
• Method 1: CDF approach
Let t ≥ 0, then
P{T > t} = P{T1 > t, T2 > t}
         = P{T1 > t}P{T2 > t}
         = e^{−λ1 t} e^{−λ2 t}
         = e^{−(λ1 +λ2 )t}
         = e^{−λt},
where λ = λ1 + λ2 .
• Method 2: Infinitesimals approach
Let t ≥ 0, then
f(t) dt = P{t − dt < T ≤ t}
        = P{t − dt < T1 ≤ t, T2 > t} + P{t − dt < T2 ≤ t, T1 > t}
        = P{t − dt < T1 ≤ t}P{T2 > t} + P{t − dt < T2 ≤ t}P{T1 > t}
        = λ1 e^{−λ1 t} dt e^{−λ2 t} + λ2 e^{−λ2 t} dt e^{−λ1 t}
        = λ e^{−λt} dt.
We conclude that T = min(T1 , T2 ) ∼ Exp(λ1 + λ2 ) and ET = 1/(λ1 + λ2 ).
What is the probability that component 1 fails before component 2?
P{T2 > T1 } = ∫_0^∞ P{T2 > T1 | T1 = t} fT1 (t) dt
            = ∫_0^∞ P{T2 > t | T1 = t} λ1 e^{−λ1 t} dt
            = ∫_0^∞ e^{−λ2 t} λ1 e^{−λ1 t} dt
            = λ1 /(λ1 + λ2 ).
For s ≥ 0, what is P{T > s and T2 > T1 }? (Exercise.)
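A simulation sketch of this example (the rates λ1 = 1, λ2 = 2 and the sample size are arbitrary):

```python
import random

# Check that min(T1, T2) ~ Exp(lam1 + lam2), so E[min] = 1/(lam1 + lam2), and that
# P{T1 < T2} = lam1/(lam1 + lam2), for independent Exponential lifetimes.
lam1, lam2, reps = 1.0, 2.0, 10**5
total_min, t1_first = 0.0, 0

for _ in range(reps):
    t1, t2 = random.expovariate(lam1), random.expovariate(lam2)
    total_min += min(t1, t2)
    t1_first += t1 < t2

print(total_min / reps, t1_first / reps)  # both approximately 1/3 here
```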
The following is a (very good) exercise.
Proposition 13.8. Let T1 , T2 , . . . , Tn be independent Exponential random variables with intensities
λ1 , . . . , λn , respectively. Then the minimum of T1 , . . . , Tn follows the Exponential distribution with
parameter λ• := λ1 + · · · + λn .
When studying discrete distributions, we mentioned that the Geometric distribution is
the unique discrete distribution with the memoryless property. In fact, the Exponential
distribution is the unique continuous distribution with the memoryless property: Let
T ∼ Exp(λ) and s, t ≥ 0; then

P{T ≥ t + s | T ≥ t} = P{T ≥ t + s, T ≥ t}/P{T ≥ t}
                     = e^{−λ(t+s)}/e^{−λt}
                     = e^{−λs}.
Given T ≥ t, the residual lifetime T − t has the Exponential distribution with parameter λ.
The Exponential distribution is the only memoryless non-negative continuous distribution.
The memoryless property is useful for quick probability calculations. For example, suppose
the lifetime of a light bulb is Exponentially distributed with parameter 1/200. Then, given
the bulb is still working after 800 hours, how much longer do you expect it to work? By the
memoryless property, the residual lifetime is still Exponential with parameter 1/200, and so
we expect it to work for another 200 hours.
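A sketch of the light-bulb calculation by simulation (the sample size is arbitrary):

```python
import random

# Lifetimes ~ Exp(1/200): among bulbs still working after 800 hours, the mean
# residual lifetime should again be about 200 hours (memoryless property).
rate, reps = 1 / 200, 10**6
lifetimes = (random.expovariate(rate) for _ in range(reps))
residuals = [t - 800 for t in lifetimes if t > 800]
print(sum(residuals) / len(residuals))  # approximately 200
```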
13.6. The Gamma distribution. Consider a Poisson process with intensity λ. Let Wr be
the waiting time of the rth arrival. Then
P{Wr > t} = P{N[0,t] ≤ r − 1} = Σ_{j=0}^{r−1} e^{−λt} (λt)^j/j!
and
fr(t) dt = P{t ≤ Wr ≤ t + dt}
         = P{N[0,t) = r − 1, N[t,t+dt) = 1}
         = (e^{−λt} (λt)^{r−1}/(r − 1)!) × λ dt e^{−λ dt}
         = (λ^r/Γ(r)) t^{r−1} e^{−λt} dt e^{−λ dt}.
A random variable W with density
f(t) = (λ^r/Γ(r)) t^{r−1} e^{−λt} I_{(0,∞)}(t)
is said to have the Gamma distribution with shape parameter r and intensity λ, written
W ∼ Gamma(r, λ).
If W ∼ Gamma(r, λ), then X = λW ∼ Gamma(r, 1). Now, suppose X ∼ Gamma(r, 1), then
EX = (1/Γ(r)) ∫_0^∞ x × x^{r−1} e^{−x} dx = Γ(r + 1)/Γ(r) = rΓ(r)/Γ(r) = r,

EX² = Γ(r + 2)/Γ(r) = r(r + 1), and

Var(X) = r² + r − r² = r.

Therefore, if W ∼ Gamma(r, λ), then EW = r/λ and Var(W) = r/λ².
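The waiting time Wr of the rth arrival is the sum of r independent Exp(λ) inter-arrival times (a standard property of the Poisson process, not proved in these notes), which gives an easy simulation check of EW = r/λ and Var(W) = r/λ²; the values r = 3 and λ = 2 below are arbitrary.

```python
import random

# Simulate W ~ Gamma(r, lam) as a sum of r independent Exp(lam) inter-arrival
# times and check EW = r/lam and Var(W) = r/lam^2.
r, lam, reps = 3, 2.0, 10**5
samples = [sum(random.expovariate(lam) for _ in range(r)) for _ in range(reps)]

mean = sum(samples) / reps
var = sum((w - mean)**2 for w in samples) / reps
print(mean, var)  # approximately r/lam = 1.5 and r/lam^2 = 0.75
```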
13.7. Hazard functions. Let T be a nonnegative continuous random variable with density
f and distribution function F. We interpret T as the lifetime of something. Define
λ(t) := P{t < T ≤ t + dt | T > t}/dt

to be the ratio of (the probability of dying in the next dt time units, given that it has lived at
least t ≥ 0) to dt. We call λ(t) the death rate, failure rate, or hazard rate at time t. The survival
probability is given by

S(t) := P{T > t} = 1 − F(t),  t > 0.
From the hazard rate, we obtain the distribution of T:
λ(t) = P{t ≤ T ≤ t + dt | T > t}/dt
     = P{t ≤ T ≤ t + dt}/(dt P{T > t})
     = f(t)/(1 − F(t))
     = −(d/dt) log(1 − F(t)).
As an example, for T following the Exponential distribution with rate λ, we have
λ(t) = λe^{−λt}/e^{−λt} = λ,
which is called the constant force of mortality in actuarial modeling.
In general, let λ(t) be a hazard function for some random variable T. Then
∫_0^t λ(s) ds = [− log(1 − F(s))]_0^t = − log(1 − F(t)) + log(1 − F(0)).

Therefore,

S(t) = 1 − F(t) = exp{ −∫_0^t λ(s) ds }

and

f(t) = −(d/dt) S(t) = −exp{ −∫_0^t λ(s) ds } (−λ(t)) = λ(t) S(t).
This shows that the Exponential distribution is the only memoryless distribution since, if T
is memoryless, then λ(t) = f (0) for all t ≥ 0. Thus, T has constant hazard rate function and
must follow the Exponential distribution with parameter λ = f (0).
Example 13.9 (The Rayleigh distribution). Suppose T has hazard function λ(t) = t. Then
S(t) = exp{ −∫_0^t λ(s) ds } = exp{ −∫_0^t s ds } = e^{−t²/2}
and f(t) = λ(t)S(t) = t e^{−t²/2}. The mean of T is

ET = ∫_0^∞ t² e^{−t²/2} dt = (√(2π)/2) ∫_{−∞}^{∞} (1/√(2π)) t² e^{−t²/2} dt = √(π/2).
Also,
ET = ∫_0^∞ S(t) dt = ∫_0^∞ e^{−t²/2} dt = (√2/2) ∫_0^∞ u^{−1/2} e^{−u} du = (1/√2) Γ(1/2).
Therefore,
Γ(1/2) = √π,

Γ(3/2) = (1/2) Γ(1/2) = √π/2, and

Γ(5/2) = (3/2) Γ(3/2) = 3√π/4.
We have
ET² = ∫_0^∞ 2t S(t) dt = 2 ∫_0^∞ t e^{−t²/2} dt = 2
and Var(T) = 2 − π/2.
We say that T has the (standard) Rayleigh distribution.
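A simulation sketch of these moments. It generates T by inverting the survival function S(t) = e^{−t²/2} (so T = √(−2 log U) for U ~ Uniform(0, 1)); this inversion step is a standard device, not something derived in the notes.

```python
import random
from math import log, sqrt, pi

# Simulate the standard Rayleigh distribution by inverting S(t) = exp(-t^2/2):
# if U ~ Uniform(0, 1), then T = sqrt(-2*log(U)) has survival function S.
reps = 10**5
samples = [sqrt(-2 * log(1 - random.random())) for _ in range(reps)]  # 1 - U avoids log(0)

mean = sum(samples) / reps
second_moment = sum(t * t for t in samples) / reps
print(mean, sqrt(pi / 2))                    # both approximately 1.25
print(second_moment - mean**2, 2 - pi / 2)   # both approximately 0.43
```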
13.8. Some extreme value theory. Suppose we want to analyze extreme (possibly catastrophic) events, such as the occurrence of a “storm of the century” or an economic collapse.
Then we consider the distribution of the maximum of a sequence of random variables.
For example, we might be interested in the largest daily drop of the Dow Jones Industrial
Average, the maximum insurance claim submitted over a certain period, maximum weekly
rainfall, etc.
Let X1 , X2 , . . . be i.i.d. standard exponential random variables and, for n ≥ 1, let Mn :=
max(X1 , . . . , Xn ). Then, for x ≥ 0,
P{Mn ≤ x} = P{X1 ≤ x, . . . , Xn ≤ x} = (1 − e^{−x})^n,
which has density
fMn (x) = n e^{−x} (1 − e^{−x})^{n−1} = C(n, 1) e^{−x} × (1 − e^{−x})^{n−1}.
Now, put Yn := Mn − log n, let y ∈ R and xn := y + log n. Then, for all large n, xn ≥ 0 and
P{Yn ≤ y} = P{Mn ≤ xn }
          = (1 − e^{−xn })^n
          = (1 − e^{−y−log n})^n
          = (1 − e^{−y}/n)^n
          = P{Bn = 0},
where Bn has the Binomial distribution with parameter (n, e^{−y}/n). Consequently, as n → ∞,
pn = e^{−y}/n → 0, npn = ne^{−y}/n = e^{−y}, and Bn → B∞ ∼ Poisson(e^{−y}), by the law of small
numbers. We therefore have

P{Yn ≤ y} = (1 − e^{−y}/n)^n → e^{−e^{−y}} = P{B∞ = 0}.
A random variable Y with distribution function F(y) = e^{−e^{−y}} has the (standard) Double
Exponential Extreme Value distribution, denoted Y ∼ EE(1).
Now, suppose T ∼ Exp(1). Then, for all y ∈ R,

P{− log T ≤ y} = P{log T ≥ −y} = P{T ≥ e^{−y}} = e^{−e^{−y}},

and so − log T ∼ EE(1).
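A simulation sketch of this limit: compare P{Mn − log n ≤ y} with e^{−e^{−y}} for a moderately large n (the choices n = 1000, y = 0.5, and the number of replications are arbitrary).

```python
import random
from math import log, exp

# For Mn = max of n i.i.d. standard Exponential variables, check that
# P{Mn - log(n) <= y} is close to exp(-exp(-y)), the EE(1) distribution function.
n, y, reps = 1000, 0.5, 10**4
hits = sum(
    max(random.expovariate(1.0) for _ in range(n)) - log(n) <= y
    for _ in range(reps)
)
print(hits / reps, exp(-exp(-y)))  # both approximately 0.545
```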
Some exercises on the Extreme value distribution:
• If Y ∼ EE(1), what is its density?
• Fix k and let Mn:k be the kth largest of X1 , . . . , Xn . For y ∈ R, what is
lim_{n→∞} P{Mn:k − log n ≤ y}?