Continuous Distributions - Department of Statistics, Yale
Chapter 6
Continuous Distributions
In principle it is easy to calculate probabilities such as P{Bin(30, p) ≥ 17} for various
values of p: one has only to sum the series

    (30 choose 17) p^17 (1 − p)^13 + (30 choose 18) p^18 (1 − p)^12 + · · · + (30 choose 30) p^30 (1 − p)^0
With a computer (compare with the Matlab m-file BinProbs.m) such a task would not be as
arduous as it used to be back in the days of hand calculation. In this Chapter I will discuss
another approach, based on an exact representation of the sum as a beta integral, as a sneaky
way of introducing the concept of a continuous distribution. (Don't worry if you have
never heard of the beta integral; I'll explain it all.)
For many purposes it would suffice to have a good approximation to the sum. The best
known method—the normal approximation, due to de Moivre (1733)—will be described in
Chapter 7. The beta integral will be the starting point for one derivation of the normal approximation.
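As a quick illustration of the direct summation (a Python sketch, offered as a stand-in for the course's Matlab BinProbs.m, which is not reproduced here):

```python
from math import comb

def binomial_tail(n, k, p):
    """P{Bin(n, p) >= k}, computed by summing the series term by term."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# The probability discussed in the text, for a few values of p:
for p in (0.3, 0.5, 0.7):
    print(f"P{{Bin(30, {p}) >= 17}} = {binomial_tail(30, 17, p):.6f}")
```

The summation is exact up to floating-point rounding; the rest of the chapter develops an exact integral representation of the same quantity.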
Binomial to beta
The connection between the Binomial distribution and the beta integral becomes evident
when we consider a special method for simulating coin tosses. Start from a random variable U that is uniformly distributed on the interval [0, 1]. That is,
    P{a ≤ U ≤ b} = b − a    for all 0 ≤ a ≤ b ≤ 1.
This distribution is denoted by Uniform[0, 1]. It is a different sort of distribution from the
geometric or Binomial. Instead of having only a discrete range of possible values, U ranges
over a continuous interval. It is said to have a continuous distribution. Instead of
giving probabilities for U taking on discrete values, we must specify probabilities for U to
lie in various subintervals of its range. Indeed, if you put a equal to b you will find that
P{U = b} = 0 for each b in the interval [0,1].
To distinguish more clearly between continuous distributions and the sort of distributions we have been working with up to now, a random variable like X_n that takes values in a
discrete range will be said to have a discrete distribution.
Of course, to actually simulate a Uniform[0, 1] distribution on a computer one would
work with a discrete approximation. For example, if numbers were specified to only 7 decimal places, one would be approximating Uniform[0, 1] by a discrete distribution placing
probabilities of about 10^−7 on a fine grid of about 10^7 equally spaced points in the interval.
You might think of the Uniform[0, 1] as a convenient idealization of the discrete approximation.
For a fixed n (such as n = 30), generate independently n random variables U_1, . . . , U_n,
each distributed uniformly on [0, 1]. Fix a p in [0, 1]. Then the independent events

    {U_1 ≤ p}, {U_2 ≤ p}, . . . , {U_n ≤ p}
Statistics 241: 12 October 1997    © David Pollard
are like n independent flips of a coin that lands heads with probability p. The number, X_n,
of such events that occur has a Bin(n, p) distribution.
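This simulation scheme is easy to try in code. A minimal Python sketch (illustrative only; the course materials use Matlab, and the seed is an arbitrary choice):

```python
import random

def coin_tosses_from_uniforms(n, p, rng):
    """Simulate X_n ~ Bin(n, p) by counting which of n independent
    Uniform[0,1] variables U_i satisfy U_i <= p."""
    return sum(1 for _ in range(n) if rng.random() <= p)

rng = random.Random(0)  # fixed seed, so the run is reproducible
draws = [coin_tosses_from_uniforms(30, 0.5, rng) for _ in range(10_000)]
print("average of X_30 over 10000 runs:", sum(draws) / len(draws))  # near n*p = 15
```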
The trick for reexpressing Binomial probabilities as integrals involves new random variables defined from the U_i. Write T_1, . . . , T_n for the values of U_1, . . . , U_n rearranged into
increasing order:
[Figure: six points U_1, . . . , U_6 marked on the interval [0, 1], with p marked; the ordered values are relabeled, from left to right, T_1, T_2, T_3, T_4, . . . .]
The {T_i} are called the order statistics of the {U_i}, and are often written as U_(1),
U_(2), . . . , U_(n) or U_{n:1}, U_{n:2}, . . . , U_{n:n}. (Exercise: What is P{U_(1) = U_1}?)
The random variables T_i, which have continuous distributions, are related to X_n, which
has a discrete distribution, by the equivalence:

    X_n ≥ k if and only if T_k ≤ p.
That is, there are k or more of the U_i's in [0, p] if and only if the kth smallest of them is
in [0, p]. Thus
    P_p{X_n ≥ k} = P_p{T_k ≤ p}.

I have added a subscript p to the probability to remind you that the definition of X_n involves p; the random variable X_n has a Bin(n, p) distribution.
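The equivalence holds sample by sample, not merely in probability, so it can be checked directly by simulation. An illustrative Python check (not part of the original notes; n, k, p, and the seed are arbitrary choices):

```python
import random

n, k, p = 30, 17, 0.6
rng = random.Random(1)

for _ in range(1000):
    u = [rng.random() for _ in range(n)]
    x_n = sum(1 for ui in u if ui <= p)  # X_n ~ Bin(n, p)
    t = sorted(u)                        # order statistics T_1 <= ... <= T_n
    # X_n >= k if and only if T_k <= p   (T_k is t[k-1] with zero-based indexing)
    assert (x_n >= k) == (t[k - 1] <= p)
print("X_n >= k matched T_k <= p in every one of 1000 samples")
```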
Now all we have to do is find the distribution of T_k, or, more specifically, find the probability that it lies in the interval [0, p].
The distribution of the order statistics from the uniform distribution
To specify the distribution of the random variable T_k we need to find the probability that it
lies in a subinterval [a, b], for all choices of a, b with 0 ≤ a < b ≤ 1.
Start with a simpler case where the interval is very short. For 0 < t < t + δ < 1 and δ
very small, find P{t ≤ T_k ≤ t + δ}.
Decompose first according to the number of points {U_i} in [t, t + δ]. If there is only
one point in [t, t + δ] then we must have exactly k − 1 points in [0, t) to get t ≤ T_k ≤ t + δ.
If there are two or more points in [t, t + δ] then it becomes more complicated to describe
all the ways that we would get t ≤ T_k ≤ t + δ. Luckily for us, the contributions from all
those complicated expressions will turn out to be small enough to ignore if δ is small. Let
us calculate.
<6.1>

    P{t ≤ T_k ≤ t + δ} = P{exactly 1 point in [t, t + δ], exactly k − 1 points in [0, t)}
                       + P{t ≤ T_k ≤ t + δ, two or more points in [t, t + δ]}
Let me first dispose of the second contribution on the right-hand side of <6.1>. The
indicator function of the event

    F_2 = {t ≤ T_k ≤ t + δ, two or more points in [t, t + δ]}

is less than the sum of indicator functions

    Σ_{1 ≤ i < j ≤ n} 1{U_i, U_j both in [t, t + δ]}
You should check this assertion by verifying that the sum of indicators is nonnegative and
that it takes a value ≥ 1 if the event F_2 occurs. Take expectations, remembering that the
probability of an event is equal to the expectation of its indicator function, to deduce that
    PF_2 ≤ Σ_{1 ≤ i < j ≤ n} P{U_i, U_j both in [t, t + δ]}

By symmetry, all (n choose 2) terms in the sum are equal to

    P{U_1, U_2 both in [t, t + δ]} = P{t ≤ U_1 ≤ t + δ} P{t ≤ U_2 ≤ t + δ}    by independence
                                  = δ^2.

Thus PF_2 ≤ (n choose 2) δ^2, which tends to zero much faster than δ as δ → 0. (The value of n stays
fixed throughout the calculation.)
Next consider the first contribution on the right-hand side of <6.1>. The event

    F_1 = {exactly 1 point in [t, t + δ], exactly k − 1 points in [0, t)}

can be broken into disjoint pieces like

    {U_1, . . . , U_{k−1} in [0, t), U_k in [t, t + δ], U_{k+1}, . . . , U_n in (t + δ, 1]}.

Again by virtue of the independence between the {U_i}, the piece has probability

    P{U_1 < t} P{U_2 < t} . . . P{U_{k−1} < t} P{U_k in [t, t + δ]} P{U_{k+1} > t + δ} . . . P{U_n > t + δ}.

Invoke the defining property of the uniform distribution to factorize the probability as

    t^{k−1} δ (1 − t − δ)^{n−k} = t^{k−1} (1 − t)^{n−k} δ + terms of order δ^2 or smaller.
How many such pieces are there? There are (n choose k−1) ways to choose the k − 1 of the U_i's to
land in [0, t), and for each of these ways there are n − k + 1 ways to choose the observation
to land in [t, t + δ]. The remaining observations must go in (t + δ, 1]. We must add up

    (n choose k−1) × (n − k + 1) = n! / ((k − 1)! (n − k)!)

pieces with the same probability to calculate PF_1.
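The counting identity is easy to confirm numerically for small cases (an illustrative Python check, not in the original notes):

```python
from math import comb, factorial

# Check: (n choose k-1) * (n - k + 1) = n! / ((k-1)! (n-k)!)  for all 1 <= k <= n
for n in range(1, 13):
    for k in range(1, n + 1):
        lhs = comb(n, k - 1) * (n - k + 1)
        rhs = factorial(n) // (factorial(k - 1) * factorial(n - k))
        assert lhs == rhs
print("counting identity verified for all 1 <= k <= n <= 12")
```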
Consolidating all the small contributions from PF_1 and PF_2 we get

<6.2>

    P{t ≤ T_k ≤ t + δ} = (n! / ((k − 1)! (n − k)!)) t^{k−1} (1 − t)^{n−k} δ + terms of order δ^2 or smaller.
The function

<6.3>

    f(t) = (n! / ((k − 1)! (n − k)!)) t^{k−1} (1 − t)^{n−k}    for 0 < t < 1

is called the density function for the distribution of T_k.
Calculate the probability that T_k lies in a longer interval [a, b] by breaking the interval
into many short pieces. For m a large integer, let I_1, . . . , I_m be the disjoint subintervals with
lengths δ = (b − a)/m and left endpoints t_{i−1} = a + (i − 1)δ.
[Figure: the density f(·), with the interval [a, b] divided at points t_0 = a, . . . , t_{i−1} = a + (i − 1)δ, . . . , t_m = b.]
From <6.2>,

    P{T_k ∈ I_i} = f(t_i)δ + terms of order δ^2 or smaller

Sum over the subintervals:

    P{T_k ∈ [a, b]} = δ Σ_{i=1}^{m} f(t_i) + remainder of order δ or smaller.
Notice how m contributions of order δ^2 (or smaller) can amount to a remainder of order at
worst δ (or smaller), because m increases like 1/δ. (Can you make this argument rigorous?)
The sum δ Σ_{i=1}^{m} f(t_i) is an approximation to the integral of f over [a, b]. As δ tends
to zero, the sum converges to that integral. The remainder terms tend to zero with δ. The
left-hand side just sits there. In the limit we get

    P{T_k ∈ [a, b]} = ∫_a^b f(t) dt,
where f denotes the density function from <6.3>.
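Both sides of this limiting identity can be approximated numerically. The following Python sketch (illustrative only; the choices of n, k, a, b, grid size, and seed are arbitrary) compares a Riemann sum for the integral with a Monte Carlo estimate of P{T_k ∈ [a, b]}:

```python
import random
from math import factorial

n, k = 30, 17
a, b = 0.4, 0.7

def f(t):
    """The density <6.3> of T_k for Uniform[0,1] order statistics."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * t**(k - 1) * (1 - t)**(n - k)

# Riemann sum: delta * sum of f(t_i) over a fine grid of [a, b]
m = 10_000
delta = (b - a) / m
riemann = delta * sum(f(a + i * delta) for i in range(m))

# Monte Carlo: how often does the k-th smallest of n uniforms land in [a, b]?
rng = random.Random(3)
trials = 50_000
hits = sum(1 for _ in range(trials)
           if a <= sorted(rng.random() for _ in range(n))[k - 1] <= b)
print(riemann, hits / trials)  # the two estimates should be close
```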
<6.4>

Definition. A random variable Y is said to have a continuous distribution with
density function g(·) if

    P{a ≤ Y ≤ b} = ∫_a^b g(t) dt

for all intervals [a, b]. In particular (at least at points t where g(·) is continuous)

    P{t ≤ Y ≤ t + δ} = g(t)δ + terms of smaller order than δ.    □
Notice that g must be non-negative, for otherwise some tiny interval would receive a
negative probability. Also
    1 = P{−∞ < Y < ∞} = ∫_{−∞}^{∞} g(t) dt.
In particular, for the density of the T_k distribution,

    1 = ∫_{−∞}^{0} 0 dt + (n! / ((k − 1)! (n − k)!)) ∫_0^1 t^{k−1} (1 − t)^{n−k} dt + ∫_1^{∞} 0 dt.
That is,

<6.5>

    ∫_0^1 t^{k−1} (1 − t)^{n−k} dt = (k − 1)! (n − k)! / n!,

a fact that you might try to prove by direct calculation.
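A full proof takes some work (for example, by repeated integration by parts), but the identity is easy to check numerically. An illustrative Python check for one choice of n and k (the midpoint rule and grid size are arbitrary choices):

```python
from math import factorial

n, k = 30, 17

# Midpoint-rule approximation to the integral of t^(k-1) (1-t)^(n-k) over [0, 1]
m = 50_000
integral = sum(((i + 0.5) / m)**(k - 1) * (1 - (i + 0.5) / m)**(n - k)
               for i in range(m)) / m

exact = factorial(k - 1) * factorial(n - k) / factorial(n)
print(integral, exact)  # the two values agree closely
```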
Remark: I prefer to think of densities as being defined on the whole real line,
with values outside the range of the random variable being handled by setting the
density function equal to zero appropriately. That way my integrals always run
over the whole line with the zero density killing off unwanted contributions. This
convention will be useful when we consider densities that vanish outside a range
depending on a parameter of the distribution; it will also help us avoid some amusing calculus blunders.
The T_k distribution is a member of a family whose name is derived from the beta
function, defined by

    B(α, β) = ∫_0^1 t^{α−1} (1 − t)^{β−1} dt    for α > 0, β > 0.
Equality <6.5> gives the value for B(k, n − k + 1). If we divide through by B(α, β) we get
a candidate for a density function: non-negative and integrating to 1.
<6.6>
Definition. For α > 0 and β > 0 the Beta(α, β) distribution is defined by the density
function

    t^{α−1} (1 − t)^{β−1} / B(α, β)    for 0 < t < 1.

The density is zero outside (0, 1). □

For example, T_k has a Beta(k, n − k + 1) distribution.
[Figure: Beta densities t^{α−1} (1 − t)^{β−1} / B(α, β) for 0 < t < 1, drawn in a grid for α = 1, . . . , 5 and β = 1, . . . , 5, with vertical range (0, 5).]
See the Matlab m-file drawbeta.m for the calculations used to draw all these density functions.
A Matlab digression
The function beta in Matlab calculates the beta function, defined for z > 0 and w > 0 by

    beta(z, w) = ∫_0^1 t^{z−1} (1 − t)^{w−1} dt.

The function betainc in Matlab calculates the incomplete beta function, defined by

    betainc(x, z, w) = (1 / beta(z, w)) ∫_0^x t^{z−1} (1 − t)^{w−1} dt    for 0 ≤ x ≤ 1.
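In this notation, the chapter's opening identity reads P{Bin(n, p) ≥ k} = betainc(p, k, n − k + 1), since T_k has the Beta(k, n − k + 1) distribution. A Python sketch checks this for one case, using a crude midpoint-rule integration in place of Matlab's betainc (valid here because z and w are integers, so the beta function reduces to factorials):

```python
from math import comb, factorial

def betainc_num(x, z, w, m=100_000):
    """Midpoint-rule approximation to betainc(x, z, w), for integer z, w."""
    h = x / m
    integral = h * sum(((i + 0.5) * h)**(z - 1) * (1 - (i + 0.5) * h)**(w - 1)
                       for i in range(m))
    beta_zw = factorial(z - 1) * factorial(w - 1) / factorial(z + w - 1)
    return integral / beta_zw

n, k, p = 30, 17, 0.6
tail = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))
print(tail, betainc_num(p, k, n - k + 1))  # the two values agree closely
```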
Expectation of a random variable with a continuous distribution
Consider a random variable X whose distribution has density function f(·). Let W = g(X)
be a new random variable defined as a function of X. How can we calculate EW?
Construct discrete approximations to g by cutting the line into small intervals of length
δ. Define

    g′_{i,δ} = max{g(x) : iδ ≤ x < (i + 1)δ}
    g″_{i,δ} = min{g(x) : iδ ≤ x < (i + 1)δ}

Near points where g is continuous, g′_{i,δ} ≈ g(iδ) ≈ g″_{i,δ}. Define upper (W′_δ) and lower (W″_δ)
approximations to g(X) by putting W′_δ = g′_{i,δ} and W″_δ = g″_{i,δ} if iδ ≤ X < (i + 1)δ, for
i = 0, ±1, ±2, . . ..
Notice that W″_δ ≤ g(X) ≤ W′_δ always. Rule E3 for expectations (see Chapter 2) therefore gives

    EW″_δ ≤ Eg(X) ≤ EW′_δ
Both W′_δ and W″_δ have discrete distributions, for which we can find expected values by
the conditioning rule E4:

<6.7>

    EW′_δ = Σ_i P{iδ ≤ X < (i + 1)δ} E(W′_δ | iδ ≤ X < (i + 1)δ)
          ≈ Σ_i δ f(iδ) g(iδ)

You should recognize the last sum as an approximation to the integral ∫_{−∞}^{∞} g(x) f(x) dx.
The expectation EW″_δ approximates the same integral.
As δ → 0, both approximating sums converge to the integral. The fixed value EW gets
sandwiched between converging upper and lower bounds. In the limit we must have
    Eg(X) = ∫_{−∞}^{∞} g(x) f(x) dx
when X has a continuous distribution with density f (·).
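For example, taking X = T_k with the density <6.3> and g(x) = x gives ET_k = ∫ t f(t) dt, which works out to k/(n + 1). An illustrative numeric check in Python (not in the original notes; the midpoint rule and grid size are arbitrary choices):

```python
from math import factorial

n, k = 30, 17

def f(t):
    """The density <6.3> of T_k."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * t**(k - 1) * (1 - t)**(n - k)

# E T_k = integral of t * f(t) over [0, 1], by the midpoint rule
m = 50_000
expectation = sum(((i + 0.5) / m) * f((i + 0.5) / m) for i in range(m)) / m
print(expectation)  # close to k / (n + 1) = 17/31
```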
Compare with the formula for a random variable X∗ taking only a discrete set of values
x_1, x_2, . . .:

    Eg(X∗) = Σ_i g(x_i) P{X∗ = x_i}
In the passage from discrete to continuous distributions, discrete probabilities get replaced by
densities and sums get replaced by integrals.
You should be very careful not to confuse the formulae for expectations in the discrete
and continuous cases. Think again if you find yourself integrating probabilities or summing expressions involving probability densities.