Cumulative Distribution Functions and Continuous Random Variables

The Cumulative Distribution Function
Definition The cumulative distribution function of a random variable X is
the function FX : R → R defined by
FX (r) = P(X ≤ r)
for all r ∈ R.
Proposition 13.1 (Properties of the cumulative distribution function). Let
X be a random variable. Then:
(a) 0 ≤ FX (x) ≤ 1 for all x ∈ R.
(b) FX (x) ≤ FX (y) whenever x ≤ y, i.e. FX is a non-decreasing function.
(c) P(a < X ≤ b) = FX (b) − FX (a) for all a, b ∈ R with a ≤ b.
(d) FX (x) → 0 as x → −∞.
(e) FX (x) → 1 as x → ∞.
The proof of this proposition follows easily from the definition of FX .
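For instance, parts (c) and (b) can be sketched as follows. This is only an outline of one possible argument, using nothing beyond the definition of FX and the additivity of P over disjoint events.

```latex
% Sketch of (c) and (b), for a \le b.
% The event {X \le b} is the disjoint union of {X \le a} and {a < X \le b}.
\begin{align*}
F_X(b) = P(X \le b) &= P(X \le a) + P(a < X \le b) = F_X(a) + P(a < X \le b), \\
\text{so}\quad P(a < X \le b) &= F_X(b) - F_X(a) \ge 0 .
\end{align*}
```

The equality is (c), and the final inequality gives (b).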
Remark Suppose X is a discrete random variable. If we know one of the
probability mass function and cumulative distribution function of X then we
can determine the other. For example, if the range of X is {0, 1, 2, . . .}, then,
for all r ∈ R,
\[ F_X(r) = \sum_{i=0}^{\lfloor r \rfloor} P(X = i) \]
(the sum is empty, and so equal to 0, when r < 0)
and, for all k ∈ {0, 1, 2, ...},
P(X = k) = P(X ≤ k) − P(X ≤ k − 1) = FX (k) − FX (k − 1).
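As a quick illustration of this remark, here is a minimal Python sketch that converts a probability mass function on {0, 1, 2, ...} into the values of the cumulative distribution function at the integers, and back again. The pmf values and the function names `cdf_from_pmf` and `pmf_from_cdf` are made up for the example, and the range is assumed to be finite so that the lists are finite.

```python
from itertools import accumulate

def cdf_from_pmf(pmf):
    """Return [F(0), F(1), ...] where F(k) = P(X = 0) + ... + P(X = k)."""
    return list(accumulate(pmf))

def pmf_from_cdf(cdf):
    """Return [P(X = 0), P(X = 1), ...] using P(X = k) = F(k) - F(k - 1), with F(-1) = 0."""
    return [cdf[k] - (cdf[k - 1] if k > 0 else 0.0) for k in range(len(cdf))]

# Hypothetical pmf of a random variable with range {0, 1, 2, 3}.
pmf = [0.1, 0.2, 0.3, 0.4]
cdf = cdf_from_pmf(pmf)
recovered = pmf_from_cdf(cdf)  # agrees with pmf up to floating-point rounding
```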
Continuous Random Variables
Introduction to Continuous Random Variables
If the set of values taken by a random variable does not satisfy the definition
of a discrete random variable (for example the values taken form an interval in
R) then we have to use different techniques. We can no longer work with the
probability mass function. However, the cumulative distribution function is
still useful. One important family of random variables which are not discrete
is described by the following definition.
Definition A random variable X is continuous if its cumulative distribution
function FX is a continuous function.
If X is a continuous random variable then we must have P(X = x) = 0 for all
x ∈ R. This implies that the probability mass function gives no information
on the distribution of X. It also implies that P(X < x) = P(X ≤ x).
Definition Let X be a continuous random variable. Then a median of X is
a number m such that FX (m) = 1/2. The lower and upper quartiles of X are
the numbers ℓ, u such that FX (ℓ) = 1/4 and FX (u) = 3/4. More generally,
the number ak is a kth percentile of X if FX (ak ) = k/100.
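When FX (m) = 1/2 cannot be solved by hand, a percentile can be approximated numerically. The sketch below uses bisection, which works because FX is continuous and non-decreasing. The cumulative distribution function `example_cdf` (equal to x^2 on [0, 1]) and the helper name `percentile` are assumptions chosen purely for illustration.

```python
def example_cdf(x):
    """Hypothetical CDF: 0 for x < 0, x^2 for 0 <= x <= 1, and 1 for x > 1."""
    if x < 0.0:
        return 0.0
    if x <= 1.0:
        return x * x
    return 1.0

def percentile(cdf, k, lo, hi, tol=1e-10):
    """Approximate a with cdf(a) = k/100 by bisection, assuming cdf(lo) <= k/100 <= cdf(hi)."""
    target = k / 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < target:
            lo = mid   # the percentile lies to the right of mid
        else:
            hi = mid   # the percentile lies at or to the left of mid
    return (lo + hi) / 2.0

median = percentile(example_cdf, 50, 0.0, 1.0)          # exact value is sqrt(1/2)
lower_quartile = percentile(example_cdf, 25, 0.0, 1.0)  # exact value is 1/2
upper_quartile = percentile(example_cdf, 75, 0.0, 1.0)  # exact value is sqrt(3/4)
```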
Definition The probability density function of a continuous random variable
X is the function fX we obtain by differentiating the cumulative distribution
function FX . So
\[ f_X(x) = \frac{d}{dx} F_X(x). \]
I’ve been a little informal here as fX is not defined at points where FX
is not differentiable. We can either leave it undefined at these points or give
it any reasonable values. It is a fact (from calculus) that the cumulative
distribution function of a continuous random variable is differentiable except
possibly at a few “corners”, so whatever we do will make no difference to
integrals involving fX . Everything that follows will be unaffected by the
value of fX at these “bad” points.
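To see what differentiating FX looks like in practice, here is a small sketch that approximates fX by a central difference. The cumulative distribution function `example_cdf` (equal to x^2 on [0, 1]) and the helper `numerical_density` are hypothetical. This FX has a "corner" at x = 1, where the left-hand slope is 2 and the right-hand slope is 0, so the approximation there is not meaningful.

```python
def example_cdf(x):
    """Hypothetical CDF: 0 for x < 0, x^2 for 0 <= x <= 1, and 1 for x > 1."""
    if x < 0.0:
        return 0.0
    if x <= 1.0:
        return x * x
    return 1.0

def numerical_density(cdf, x, h=1e-6):
    """Central-difference approximation to the derivative of cdf at x."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

inside = numerical_density(example_cdf, 0.5)     # close to the true density 2 * 0.5
outside = numerical_density(example_cdf, 2.0)    # the density is 0 outside [0, 1]
at_corner = numerical_density(example_cdf, 1.0)  # lies between the one-sided slopes 0 and 2
```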
Proposition 14.1 (Properties of the probability density function). Let X
be a continuous random variable. Then:
(a) fX (x) ≥ 0 for all x ∈ R.
(b) $P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx$ for all a, b ∈ R with a ≤ b.
(c) $F_X(b) = \int_{-\infty}^{b} f_X(x)\,dx$ for all b ∈ R.
(d) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
This follows from the definition of fX and Proposition 13.1. (We use the
Fundamental Theorem of Calculus to deduce (b).)
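Parts (b) and (d) can be checked numerically on a simple example. The sketch below assumes a hypothetical density fX (x) = 2x on [0, 1], with cumulative distribution function x^2 there, and uses a midpoint-rule helper `integrate` written just for this illustration.

```python
def cdf(x):
    """Hypothetical CDF: 0 for x < 0, x^2 for 0 <= x <= 1, and 1 for x > 1."""
    if x < 0.0:
        return 0.0
    if x <= 1.0:
        return x * x
    return 1.0

def pdf(x):
    """Derivative of the CDF above: 2x on [0, 1] and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, b = 0.2, 0.7
prob_from_cdf = cdf(b) - cdf(a)       # F(b) - F(a)
prob_from_pdf = integrate(pdf, a, b)  # integral of the density from a to b
total_mass = integrate(pdf, 0.0, 1.0) # the density is 0 outside [0, 1]
```

The two probabilities agree up to rounding, and the total mass is 1, as the proposition predicts.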
The probability density function plays a role in the theory of continuous random variables similar to that of the probability mass function in the theory of
discrete random variables. In particular we can use it to define the expectation and variance of a continuous random variable.
Definition Suppose X is a continuous random variable with probability
density function fX . Then
\[ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx \]
and
\[ \operatorname{Var}(X) = \int_{-\infty}^{\infty} [x - E(X)]^2 f_X(x)\,dx. \]
The variance can also be written as follows (compare Proposition 11.1):
\[ \operatorname{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - E(X)^2. \tag{1} \]
The properties of E and Var that we proved in the discrete case (Propositions
11.3 and 11.4) also hold for continuous random variables. We also have
the result that, if X is a continuous random variable and g : R → R is a
continuous function, then g(X) is also a continuous random variable and
\[ E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx. \]
In particular we may rewrite equation (1) above as
\[ \operatorname{Var}(X) = E(X^2) - E(X)^2. \]
Note that in all these definitions the integrals go from −∞ to ∞. However, in practice the probability density function is often 0 outside a smaller
range and so we can integrate over this smaller range only (see examples in
notes and on problem sheets).
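As an illustration of these formulas, the following sketch approximates E(X), E(g(X)) and Var(X) by numerical integration over the smaller range on which the density is non-zero. The density fX (x) = 2x on [0, 1] and the helper name `expectation` are assumptions made for the example, and the midpoint rule is just one convenient way to approximate the integrals.

```python
def pdf(x):
    """Hypothetical density: f(x) = 2x on [0, 1] and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def expectation(g, density, a, b, n=10_000):
    """Approximate E(g(X)), the integral of g(x) * density(x) over [a, b], by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += g(x) * density(x)
    return total * h

# The density vanishes outside [0, 1], so we integrate over [0, 1] only.
mean = expectation(lambda x: x, pdf, 0.0, 1.0)  # exact value is 2/3
var_by_definition = expectation(lambda x: (x - mean) ** 2, pdf, 0.0, 1.0)
var_by_shortcut = expectation(lambda x: x ** 2, pdf, 0.0, 1.0) - mean ** 2  # exact value is 1/18
```

Both ways of computing the variance give the same answer up to the error of the numerical integration.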
Some Special Continuous Probability Distributions
As in the discrete case, the probability distributions of some continuous random variables occur so frequently that we give them special names. We look
at two such distributions.
The Uniform Distribution
Suppose that a real number X is chosen from the interval [a, b], in such a
way that the probability the number is in any given sub-interval of [a, b] is
proportional to the length of the sub-interval. We say that X has the uniform
distribution on [a, b] and write X ∼ Uniform[a, b] or X ∼ U [a, b]. Informally,
X is equally likely to be anywhere in the interval. It is not difficult to see
that the cumulative distribution function and probability density function of
X are given by
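\[ F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ \dfrac{x-a}{b-a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b, \end{cases} \qquad f_X(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases} \]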
To find the expectation and variance just substitute this fX into the
definitions and integrate. We obtain E(X) = (a + b)/2 and Var(X) = (b − a)^2/12.
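For completeness, here is one way the integration can go for X ∼ U [a, b]. It uses only the density 1/(b − a) on [a, b] and the formula Var(X) = E(X^2) − E(X)^2.

```latex
\begin{align*}
E(X)   &= \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}, \\
E(X^2) &= \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}, \\
\operatorname{Var}(X) &= E(X^2) - E(X)^2
        = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4}
        = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}.
\end{align*}
```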
The Exponential Distribution
The second special distribution we look at is related to the Poisson distribution. Suppose that, on average, λ incidents occur in a unit time interval.
Then for any fixed x ∈ R with x ≥ 0, the number of incidents occurring
in a given time interval of length x will be a discrete random variable Y which has
the Poisson(λx) distribution. Instead of counting the number of incidents
in a fixed interval, we look at the time T at which the first incident occurs
(so T is a continuous random variable). We say that T has the exponential
distribution and write T ∼ Exponential(λ) or T ∼ Exp(λ). We used the connection with the Poisson distribution in lectures to show that the cumulative
distribution function of T is given by:
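\[ \begin{aligned}
F_T(x) &= P(T \le x) \\
       &= 1 - P(T > x) \\
       &= 1 - P(\text{there are no incidents in the interval } [0, x]) \\
       &= 1 - P(Y = 0) \\
       &= 1 - \frac{e^{-\lambda x} (\lambda x)^0}{0!} \\
       &= 1 - e^{-\lambda x},
\end{aligned} \]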
if x ≥ 0 and FT (x) = 0 if x < 0. Note that FT is a non-decreasing continuous function which tends to 1 as x → ∞. Differentiating gives the probability density function
\[ f_T(t) = \begin{cases} 0 & \text{if } t < 0, \\ \lambda e^{-\lambda t} & \text{if } t > 0. \end{cases} \]
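The link with the Poisson distribution can be checked numerically. The sketch below compares 1 − P(Y = 0), where Y ∼ Poisson(λx), with the closed form 1 − e^(−λx), and compares a central-difference derivative of FT with the density λe^(−λt). The rate `lam = 3.0` and the function names are arbitrary choices for this illustration.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def exp_cdf(x, lam):
    """CDF of T ~ Exp(lam): 1 - exp(-lam * x) for x >= 0 and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0.0 else 0.0

def exp_pdf(t, lam):
    """Density of T ~ Exp(lam): lam * exp(-lam * t) for t > 0 and 0 for t < 0."""
    return lam * math.exp(-lam * t) if t > 0.0 else 0.0

lam = 3.0  # hypothetical rate: on average 3 incidents per unit time
x = 0.5

# P(T <= x) = 1 - P(no incidents in [0, x]) = 1 - P(Y = 0) with Y ~ Poisson(lam * x).
via_poisson = 1.0 - poisson_pmf(0, lam * x)
via_formula = exp_cdf(x, lam)

# Differentiating the CDF numerically should recover the density.
h = 1e-6
slope = (exp_cdf(x + h, lam) - exp_cdf(x - h, lam)) / (2.0 * h)
density = exp_pdf(x, lam)
```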
The expectation and variance of the exponential distribution can be found
by integrating (hint: use integration by parts). We obtain:
\[ E(T) = \frac{1}{\lambda}, \qquad \operatorname{Var}(T) = \frac{1}{\lambda^2}. \]
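Following the hint, one way to carry out the integration by parts is sketched below. It leads to E(T) = 1/λ and Var(T) = 1/λ^2. The boundary terms vanish because t e^(−λt) and t^2 e^(−λt) both tend to 0 as t → ∞.

```latex
\begin{align*}
E(T)   &= \int_0^\infty t\,\lambda e^{-\lambda t}\,dt
        = \Bigl[-t e^{-\lambda t}\Bigr]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt
        = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}, \\
E(T^2) &= \int_0^\infty t^2\,\lambda e^{-\lambda t}\,dt
        = \Bigl[-t^2 e^{-\lambda t}\Bigr]_0^\infty + 2\int_0^\infty t\,e^{-\lambda t}\,dt
        = \frac{2}{\lambda}\,E(T) = \frac{2}{\lambda^2}, \\
\operatorname{Var}(T) &= E(T^2) - E(T)^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\end{align*}
```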
Another important continuous probability distribution is the normal distribution; you will meet this in your statistics module next semester.