13 The Cumulative Distribution Function

Definition The cumulative distribution function of a random variable $X$ is the function $F_X : \mathbb{R} \to \mathbb{R}$ defined by $F_X(r) = P(X \le r)$ for all $r \in \mathbb{R}$.

Proposition 13.1 (Properties of the cumulative distribution function). Let $X$ be a random variable. Then:
(a) $0 \le F_X(x) \le 1$ for all $x \in \mathbb{R}$.
(b) $F_X(x) \le F_X(y)$ whenever $x \le y$, i.e. $F_X$ is a non-decreasing function.
(c) $P(a < X \le b) = F_X(b) - F_X(a)$ for all $a, b \in \mathbb{R}$ with $a \le b$.
(d) $\lim_{x \to -\infty} F_X(x) = 0$.
(e) $\lim_{x \to \infty} F_X(x) = 1$.

The proof of this proposition follows easily from the definition of $F_X$.

Remark Suppose $X$ is a discrete random variable. If we know either the probability mass function or the cumulative distribution function of $X$, then we can determine the other. For example, if the range of $X$ is $\{0, 1, 2, \ldots\}$, then, for all $r \in \mathbb{R}$,
$$F_X(r) = \sum_{0 \le i \le r} P(X = i)$$
and, for all $k \in \{0, 1, 2, \ldots\}$,
$$P(X = k) = P(X \le k) - P(X \le k - 1) = F_X(k) - F_X(k - 1).$$

14 Continuous Random Variables

14.1 Introduction to Continuous Random Variables

If the set of values taken by a random variable does not satisfy the definition of a discrete random variable (for example, if the values taken form an interval in $\mathbb{R}$), then we have to use different techniques. We can no longer work with the probability mass function. However, the cumulative distribution function is still useful. One important family of random variables which are not discrete is described by the following definition.

Definition A random variable $X$ is continuous if its cumulative distribution function $F_X$ is a continuous function.

If $X$ is a continuous random variable, then we must have $P(X = x) = 0$ for all $x \in \mathbb{R}$, because $P(X = x) = P(X \le x) - P(X < x) = F_X(x) - \lim_{y \to x^-} F_X(y)$, and this difference is $0$ when $F_X$ is continuous at $x$. This implies that the probability mass function gives no information on the distribution of $X$. It also implies that $P(X < x) = P(X \le x)$.

Definition Let $X$ be a continuous random variable. Then a median of $X$ is a number $m$ such that $F_X(m) = 1/2$. The lower and upper quartiles of $X$ are the numbers $\ell, u$ such that $F_X(\ell) = 1/4$ and $F_X(u) = 3/4$.
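As a small illustration of the Remark in Section 13, the following Python sketch converts between a probability mass function and a cumulative distribution function. It assumes, for simplicity, that the range of $X$ is a finite set $\{0, 1, \ldots, n\}$ stored as a list; the function names `cdf_from_pmf` and `pmf_from_cdf` are hypothetical. The example random variable is the number of heads in three tosses of a fair coin, so $P(X = i) = \binom{3}{i}/8$.

```python
import math


def cdf_from_pmf(pmf, r):
    """F_X(r) = sum of P(X = i) over integers 0 <= i <= r.

    Hypothetical helper: pmf[i] = P(X = i) for i = 0, 1, ..., len(pmf) - 1,
    i.e. the range of X is assumed to be the finite set {0, ..., n}.
    """
    if r < 0:
        return 0.0
    top = min(math.floor(r), len(pmf) - 1)
    return sum(pmf[: top + 1])


def pmf_from_cdf(cdf, k):
    """P(X = k) = F_X(k) - F_X(k - 1) for k in {0, 1, 2, ...} (hypothetical helper)."""
    return cdf(k) - cdf(k - 1)


# Example: number of heads in three fair coin tosses, P(X = i) = C(3, i) / 8.
pmf = [math.comb(3, i) / 8 for i in range(4)]


def F(r):
    return cdf_from_pmf(pmf, r)


print(F(1.5))                                  # P(X = 0) + P(X = 1) = 0.5
print([pmf_from_cdf(F, k) for k in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```

Note that $F_X(r)$ is defined for every real $r$, not just integers: it is constant between consecutive integers, which is why the sketch takes `math.floor(r)`.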
More generally, the number $a_k$ is a $k$th percentile of $X$ if $F_X(a_k) = k/100$.

Definition The probability density function of a continuous random variable $X$ is the function $f_X$ we obtain by differentiating the cumulative distribution function $F_X$. So
$$f_X(x) = \frac{d}{dx} F_X(x).$$

I’ve been a little informal here, as $f_X$ is not defined at points where $F_X$ is not differentiable. We can either leave it undefined at these points or give it any reasonable values. We will assume that the cumulative distribution function of a continuous random variable is differentiable except possibly at a few “corners” (this does not follow from continuity alone, but it holds for the distributions we study), so whatever we do at these points will make no difference to integrals involving $f_X$. Everything that follows will be unaffected by the value of $f_X$ at these “bad” points.

Proposition 14.1 (Properties of the probability density function). Let $X$ be a continuous random variable. Then:
(a) $f_X(x) \ge 0$ for all $x \in \mathbb{R}$.
(b) $P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx$ for all $a, b \in \mathbb{R}$ with $a \le b$.
(c) $F_X(b) = \int_{-\infty}^{b} f_X(x)\,dx$ for all $b \in \mathbb{R}$.
(d) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

This follows from the definition of $f_X$ and Proposition 13.1. (We use the Fundamental Theorem of Calculus to deduce (b).)

The probability density function plays a similar role in the theory of continuous random variables to that of the probability mass function in the theory of discrete random variables. In particular, we can use it to define the expectation and variance of a continuous random variable.

Definition Suppose $X$ is a continuous random variable with probability density function $f_X$. Then
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
and
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} [x - E(X)]^2 f_X(x)\,dx.$$

The variance can also be written as follows (compare Proposition 11.1):
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - E(X)^2. \qquad (1)$$

The properties of $E$ and $\mathrm{Var}$ that we proved in the discrete case (Propositions 11.3 and 11.4) also hold for continuous random variables.
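The following Python sketch evaluates the integrals in Proposition 14.1(d), the definition of $E(X)$ and equation (1) numerically, and finds percentiles by bisection on $F_X$. It uses composite Simpson's rule in place of exact integration, and an example density chosen for illustration: $f_X(x) = 2x$ for $0 \le x \le 1$ and $0$ otherwise, so $F_X(x) = x^2$ on $[0, 1]$. The density, the helper names `simpson` and `percentile`, and the tolerances are assumptions. For this density the exact values are $E(X) = 2/3$, $\mathrm{Var}(X) = 1/2 - 4/9 = 1/18$, median $1/\sqrt{2}$ and lower quartile $1/2$.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (hypothetical helper)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def percentile(F, k, lo, hi, tol=1e-12):
    """Find a point a_k with F(a_k) = k/100 by bisection (hypothetical helper).

    Assumes F is continuous and non-decreasing with F(lo) <= k/100 <= F(hi).
    """
    target = k / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example density, assumed for illustration: f(x) = 2x on [0, 1], 0 elsewhere.
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0


def F(x):
    return 0.0 if x < 0 else (x * x if x <= 1 else 1.0)


# f vanishes outside [0, 1], so we integrate over [0, 1] only.
total = simpson(f, 0, 1)                        # property (d): 1
mean = simpson(lambda x: x * f(x), 0, 1)        # E(X) = 2/3
second = simpson(lambda x: x * x * f(x), 0, 1)  # integral of x^2 f(x) dx = 1/2
var = second - mean ** 2                        # equation (1): 1/18
median = percentile(F, 50, 0, 1)                # F(m) = 1/2, so m = 1/sqrt(2)
lower_quartile = percentile(F, 25, 0, 1)        # F(l) = 1/4, so l = 1/2
```

Bisection works here because $F_X$ is continuous and non-decreasing. If $F_X$ equals $k/100$ on a whole interval, every point of that interval is a $k$th percentile and the search returns only one of them.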
We also have the result that, if $X$ is a continuous random variable and $g : \mathbb{R} \to \mathbb{R}$ is a continuous function, then $g(X)$ is also a random variable (though not necessarily a continuous one: a constant function $g$, for instance, gives a discrete random variable) and
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx.$$
In particular, we may rewrite equation (1) above as $\mathrm{Var}(X) = E(X^2) - E(X)^2$.

Note that in all these definitions the integrals go from $-\infty$ to $\infty$. However, in practice the probability density function is often $0$ outside a smaller range, and so we can integrate over this smaller range only (see examples in notes and on problem sheets).

14.2 Some Special Continuous Probability Distributions

As in the discrete case, the probability distributions of some continuous random variables occur so frequently that we give them special names. We look at two such distributions.

The Uniform Distribution

Suppose that a real number $X$ is chosen from the interval $[a, b]$ in such a way that the probability that the number is in any given sub-interval of $[a, b]$ is proportional to the length of the sub-interval. We say that $X$ has the uniform distribution on $[a, b]$ and write $X \sim \mathrm{Uniform}[a, b]$ or $X \sim U[a, b]$. Informally, $X$ is equally likely to be anywhere in the interval. It is not difficult to see that the cumulative distribution function and probability density function of $X$ are given by
$$F_X(x) = \begin{cases} 0 & \text{if } x < a, \\ \dfrac{x - a}{b - a} & \text{if } a \le x \le b, \\ 1 & \text{if } x > b, \end{cases}$$
and
$$f_X(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases}$$

To find the expectation and variance, just substitute this $f_X$ into the definitions and integrate. We obtain $E(X) = (a + b)/2$ and $\mathrm{Var}(X) = (b - a)^2/12$.

The Exponential Distribution

The second special distribution we look at is related to the Poisson distribution. Suppose that, on average, $\lambda$ incidents occur in a unit time interval. Then for any fixed $x \in \mathbb{R}$ with $x \ge 0$, the number of incidents occurring in a given time interval of length $x$ will be a discrete random variable $Y$ which has the $\mathrm{Poisson}(\lambda x)$ distribution.
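A minimal Python sketch for $X \sim \mathrm{Uniform}[a, b]$: it codes the cumulative distribution function and probability density function of the uniform distribution, computes $E(g(X)) = \int g(x) f_X(x)\,dx$ numerically over $[a, b]$ only (since $f_X$ is $0$ outside it), and uses this with $g(x) = x$ and $g(x) = x^2$ to check $E(X) = (a + b)/2$ and $\mathrm{Var}(X) = E(X^2) - E(X)^2 = (b - a)^2/12$. The values $a = 2$, $b = 5$, the helper names, and the use of Simpson's rule instead of exact integration are assumptions for this example.

```python
def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


def uniform_cdf(x, a, b):
    """F_X(x) for X ~ Uniform[a, b], with a < b."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def uniform_pdf(x, a, b):
    """f_X(x) for X ~ Uniform[a, b]: 1/(b - a) on [a, b], 0 otherwise."""
    return 1 / (b - a) if a <= x <= b else 0.0


def expect(g, a, b):
    """E(g(X)) = integral of g(x) f_X(x) dx (hypothetical helper).

    f_X is 0 outside [a, b], so integrating over [a, b] is enough.
    """
    return simpson(lambda x: g(x) * uniform_pdf(x, a, b), a, b)


# Example parameters, chosen only for illustration.
a, b = 2.0, 5.0
mean = expect(lambda x: x, a, b)               # should be (a + b)/2 = 3.5
var = expect(lambda x: x * x, a, b) - mean**2  # should be (b - a)^2/12 = 0.75
prob = uniform_cdf(4.0, a, b) - uniform_cdf(3.0, a, b)  # P(3 < X <= 4) = 1/3
```

The last line uses Proposition 13.1(c): a sub-interval of length $1$ inside $[2, 5]$ has probability $1/(b - a) = 1/3$, as the informal description of the uniform distribution suggests.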
Instead of counting the number of incidents in a fixed interval, we look at the time $T$ at which the first incident occurs (so $T$ is a continuous random variable). We say that $T$ has the exponential distribution and write $T \sim \mathrm{Exponential}(\lambda)$ or $T \sim \mathrm{Exp}(\lambda)$. We used the connection with the Poisson distribution in lectures to show that the cumulative distribution function of $T$ is given by
$$\begin{aligned} F_T(x) &= P(T \le x) \\ &= 1 - P(T > x) \\ &= 1 - P(\text{there are no incidents in the interval } [0, x]) \\ &= 1 - P(Y = 0) \\ &= 1 - e^{-\lambda x} \frac{(\lambda x)^0}{0!} \\ &= 1 - e^{-\lambda x} \end{aligned}$$
if $x \ge 0$, and $F_T(x) = 0$ if $x < 0$. Note that $F_T$ is a non-decreasing continuous function which tends to $1$ as $x \to \infty$. Differentiating gives the probability density function
$$f_T(t) = \begin{cases} 0 & \text{if } t < 0, \\ \lambda e^{-\lambda t} & \text{if } t > 0. \end{cases}$$

The expectation and variance of the exponential distribution can be found by integrating (hint: use integration by parts). We obtain
$$E(T) = \frac{1}{\lambda}, \qquad \mathrm{Var}(T) = \frac{1}{\lambda^2}.$$

Another important continuous probability distribution is the normal distribution; you will meet this in your statistics module next semester.
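The following Python sketch checks the exponential distribution results numerically. It confirms that $1 - e^{-\lambda x}$ agrees with $1 - P(Y = 0)$ for $Y \sim \mathrm{Poisson}(\lambda x)$, and approximates $E(T)$ and $\mathrm{Var}(T)$ by Simpson's rule instead of integration by parts. The rate $\lambda = 2$, the helper names, and the truncation of the infinite integrals at $t = 50/\lambda$ (beyond which the neglected probability is $e^{-50}$) are assumptions of this sketch; the exact values are $1/\lambda = 0.5$ and $1/\lambda^2 = 0.25$.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def exp_cdf(x, lam):
    """F_T(x) = 1 - e^{-lam x} for x >= 0, and 0 for x < 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_pdf(t, lam):
    """f_T(t) = lam e^{-lam t} for t > 0, 0 for t < 0.

    At t = 0 we choose the value lam; this does not affect any integral.
    """
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def simpson(g, lo, hi, n=10_000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (n even)."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3


lam = 2.0  # example rate, chosen for illustration

# Poisson connection: F_T(x) = 1 - P(Y = 0) with Y ~ Poisson(lam * x).
for x in (0.1, 0.5, 1.0, 3.0):
    assert math.isclose(exp_cdf(x, lam), 1 - poisson_pmf(0, lam * x))

# Truncate the integrals at hi = 50/lam; the tail beyond hi is negligible.
hi = 50 / lam
mean = simpson(lambda t: t * exp_pdf(t, lam), 0, hi)                # 1/lam = 0.5
var = simpson(lambda t: t * t * exp_pdf(t, lam), 0, hi) - mean**2  # 1/lam^2 = 0.25
```

The truncation mirrors the remark earlier in this section about integrating over a smaller range; here $f_T$ is not exactly $0$ beyond $t = 50/\lambda$, but it is small enough that the error is far below the accuracy of the numerical rule.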