7 Random variables

A random variable is a real-valued function defined on some sample space. That is, it associates a numerical value to each elementary outcome in the sample space.

Example 1. Consider tossing a coin $n$ times. If $X$ is the number of "heads" obtained, then $X$ is a random variable.

Example 2. Consider a stock price which moves each day either up one unit or down one unit, and suppose its initial value is 10 dollars. Let $T$ be the first time the value of the stock hits either 0 or 20 dollars. Then $T$ is a random variable.

Example 3. The lifetime $T$ of a lightbulb is a random variable.

In the last example, if we can measure time with infinite precision, then the possible values of $T$ are the non-negative real numbers $[0, \infty)$. This is an uncountable set: there is no way to enumerate $[0, \infty)$ in a sequence. We will have to treat random variables of this type separately from random variables which take values in a countable set. In practice, time can only be measured to finite precision, so the possible values of $T$ will in fact be countable. Still, it is mathematically more convenient to make the idealization that all values in $[0, \infty)$ are possible, and we will do so.

7.1 Distribution functions

To a random variable $X$ we can associate the distribution function $F_X(\cdot)$, sometimes called the cumulative distribution function, defined by

$$F_X(t) = P(X \le t). \tag{1}$$

Notice that $F_X(\cdot)$ is defined on all real numbers. The distribution function determines the probability that $X$ falls in an interval:

$$P(a < X \le b) = P(X \le b) - P(X \le a) = F_X(b) - F_X(a).$$

Example 4. Suppose a coin with probability $p$ of landing "heads" is tossed until the first time a "heads" appears. Let $T$ be the number of tosses required. For a real number $t$, let $[t]$ denote the integer part of $t$. For $t \ge 0$ we have

$$P(T > t) = P(T > [t]) = P(\text{first } [t] \text{ tosses are "tails"}) = (1-p)^{[t]}.$$

Consequently,

$$F_T(t) = P(T \le t) = \begin{cases} 1 - (1-p)^{[t]} & \text{if } t \ge 0, \\ 0 & \text{if } t < 0. \end{cases}$$
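To make Example 4 concrete, here is a minimal Python sketch. It evaluates $F_T$ from the formula above and applies $P(a < X \le b) = F_X(b) - F_X(a)$ to the interval $(2, 5]$. The helper names `geometric_cdf` and `prob_interval` and the choice $p = 1/2$ are assumptions for illustration. `fractions.Fraction` keeps the arithmetic exact, so results can be compared with `==`.

```python
import math
from fractions import Fraction
from functools import partial


def geometric_cdf(t, p):
    """F_T(t) = P(T <= t), T = number of tosses until the first heads.

    Hypothetical helper for Example 4; p is the probability of heads.
    """
    if t < 0:
        return Fraction(0)
    # For t >= 0, math.floor(t) is the integer part [t].
    return 1 - (1 - p) ** math.floor(t)


def prob_interval(F, a, b):
    """P(a < X <= b) = F(b) - F(a) for a distribution function F."""
    return F(b) - F(a)


p = Fraction(1, 2)
F = partial(geometric_cdf, p=p)

print(prob_interval(F, 2, 5))   # 7/32, i.e. (1 - p)**2 - (1 - p)**5
print(F(2.999) == F(2))         # True: F_T is flat on [2, 3)
print(F(3) - F(2.999))          # 1/8 = P(T = 3) = (1 - p)**2 * p
```

$F_T$ only changes at the integers, so $F_T(2.999)$ already equals the value of $F_T$ just to the left of 3. The last line therefore isolates the jump of $F_T$ at 3.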
We can use the distribution function $F_T$ to compute

$$P(2 < T \le 5) = \left(1 - (1-p)^5\right) - \left(1 - (1-p)^2\right) = (1-p)^2 - (1-p)^5.$$

We may also want the probability that $X$ falls in a closed interval. To this end, we need the following:

Proposition 1. Let $X$ be a random variable with distribution function $F$. Then

$$P(X < t) = \lim_{s \uparrow t} F(s).$$

The value $\lim_{s \uparrow t} F(s)$ is called the left limit of $F$ at $t$, and is sometimes denoted by $F(t-)$. Part of the conclusion of Proposition 1 is that a distribution function has left limits everywhere.

To prove this, we first need to show that a probability $P(\cdot)$ obeys a certain kind of continuity.

Lemma 2.
(i) Let $A_1 \subset A_2 \subset A_3 \subset \cdots$ be a non-decreasing sequence of events. Then
$$\lim_{n\to\infty} P(A_n) = P\Big(\bigcup_{k=1}^{\infty} A_k\Big).$$
(ii) Let $A_1 \supset A_2 \supset A_3 \supset \cdots$ be a non-increasing sequence of events. Then
$$\lim_{n\to\infty} P(A_n) = P\Big(\bigcap_{k=1}^{\infty} A_k\Big).$$

Proof. We prove (i). The proof of (ii) is obtained by looking at complements and using (i), and is left to the reader as an exercise. Set $A_0 = \emptyset$ and define $B_k = A_k \setminus A_{k-1}$. The events $\{B_k\}$ are disjoint, $A_n = \bigcup_{k=1}^{n} B_k$, and $\bigcup_{k=1}^{\infty} A_k = \bigcup_{k=1}^{\infty} B_k$. Thus

$$P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = P\Big(\bigcup_{k=1}^{\infty} B_k\Big) = \sum_{k=1}^{\infty} P(B_k) = \lim_{n\to\infty} \sum_{k=1}^{n} P(B_k) = \lim_{n\to\infty} P\Big(\bigcup_{k=1}^{n} B_k\Big) = \lim_{n\to\infty} P(A_n).$$

Proof of Proposition 1. Notice that $\{X < t\} = \bigcup_{k=1}^{\infty} \{X \le t - \tfrac{1}{k}\}$, a union of non-decreasing events. Then applying Lemma 2 gives

$$P(X < t) = \lim_{n\to\infty} P\Big(X \le t - \frac{1}{n}\Big) = \lim_{n\to\infty} F\Big(t - \frac{1}{n}\Big) = \lim_{s \uparrow t} F(s),$$

where the last equality holds because $F$ is non-decreasing.

Thus, we can also use the distribution function of $X$ to calculate other probabilities involving $X$:

$$P(a \le X \le b) = P(X \le b) - P(X < a) = F(b) - F(a-),$$
$$P(X = a) = P(X \le a) - P(X < a) = F(a) - F(a-). \tag{2}$$

A random variable $X$ is proper if $P(-\infty < X < \infty) = 1$. Almost all random variables we will encounter will be proper, but it is worth noting that there do exist random variables which are not proper.

Example 5. Suppose a particle moves on the integers $\{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$ as follows: at each move, it moves up one integer with probability $2/3$, and moves down one integer with probability $1/3$.
The particle starts at 0. Let $T$ be the first time that the particle is at $-1$. The event that the particle never hits $-1$ is $\{T = \infty\}$. We will see later that $P(T < \infty) < 1$, so that $T$ is not a proper random variable.

Writing $\{X < \infty\} = \bigcup_{k=1}^{\infty} \{X \le k\}$, if $X$ is a proper random variable then applying Lemma 2 again shows

$$1 = P(X < \infty) = \lim_{n\to\infty} P(X \le n) = \lim_{n\to\infty} F_X(n).$$

Also, since $\{X = -\infty\} = \bigcap_{n=1}^{\infty} \{X \le -n\}$, part (ii) of Lemma 2 gives

$$0 = P(X = -\infty) = \lim_{n\to\infty} P(X \le -n) = \lim_{n\to\infty} F_X(-n).$$

Because $F_X$ is non-decreasing, these limits along the integers give $\lim_{t\to\infty} F_X(t) = 1$ and $\lim_{t\to-\infty} F_X(t) = 0$. Finally, since $\{X \le t\} = \bigcap_{k=1}^{\infty} \{X \le t + \tfrac{1}{k}\}$, part (ii) of Lemma 2 shows that $\lim_{s \downarrow t} F(s) = F(t)$, and so $F$ is right-continuous everywhere.

We summarize the properties of the distribution function of a random variable $X$ as follows:

Proposition 3. Let $X$ be a proper random variable with distribution function $F$. Then
(i) $F$ is right-continuous: $\lim_{s \downarrow t} F(s) = F(t)$,
(ii) the left limits of $F$ exist everywhere,
(iii) $\lim_{t\to\infty} F(t) = 1$,
(iv) $\lim_{t\to-\infty} F(t) = 0$.

The first two properties imply that the worst possible behavior of a distribution function is that it jumps.

8 Discrete random variables

We call a random variable which can take on only countably many values a discrete random variable.

8.1 Probability mass functions

Let $X$ be a discrete random variable which takes values in the set $A = \{a_0, a_1, a_2, \ldots\}$. Associated to $X$ is the function $p_X(\cdot)$, defined by

$$p_X(a) = P(X = a). \tag{3}$$

The function $p_X(\cdot)$ is defined for all real numbers, although it can be strictly positive only for $a$ in the set $A$. A function $p(\cdot)$ satisfying
(i) $p(a) \ge 0$ for all $a$,
(ii) $\sum_a p(a) = 1$
is called a probability mass function, or pmf for short. It is easily checked that $p_X(\cdot)$ satisfies these conditions, and we call it the pmf of $X$. We write $X \sim p(\cdot)$ to indicate that $X$ has pmf $p(\cdot)$.

Notice that from (2) we have

$$p_X(a) = P(X = a) = F_X(a) - F_X(a-), \tag{4}$$

so the pmf of $X$ can be determined if the distribution function of $X$ is known.

Example 6.
Suppose $n$ independent experiments are performed, each of which can result in either "success" or "failure", and suppose that the probability of success on each experiment is $p$. Such a sequence of experiments is called Bernoulli trials. Let $X$ be the number of successes in these $n$ experiments. The event $\{X = k\}$ contains all outcomes with exactly $k$ successes and $n-k$ failures. There are $\binom{n}{k}$ such outcomes, each having probability $p^k (1-p)^{n-k}$ (by independence). Thus

$$p_X(k) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n. \tag{5}$$

A random variable $X$ having the pmf in (5) is called a Binomial$(n, p)$ random variable, and we write $X \sim \text{Binomial}(n, p)$.

8.2 The distribution of a discrete random variable

An event determined by $X$ is an event of the form $\{X \in A\}$, where $A$ is a subset of the real numbers. We can find the probability of any event determined by $X$ using only the pmf of $X$:

$$P(X \in A) = \sum_{a \in A} p_X(a). \tag{6}$$

Applying (6) to the set $A = (-\infty, t]$ gives

$$F_X(t) = P(X \le t) = \sum_{a \le t} p_X(a). \tag{7}$$

Thus the distribution function of $X$ can be computed from the pmf of $X$. To summarize, we record the following:

Proposition 4. Let $X$ be a discrete random variable. Each of the following can be computed using any of the others:
(i) the probabilities of all events determined by $X$, that is, the collection of probabilities $\{P(X \in A) : A \subset \mathbb{R}\}$,
(ii) the pmf $p_X(\cdot)$ of $X$,
(iii) the distribution function $F_X(\cdot)$ of $X$.

Proof. This is the combined content of equations (4), (6), and (7).

The collection of probabilities $\{P(X \in A) : A \subset \mathbb{R}\}$ is called simply the distribution of $X$. It contains all the probabilistic information about the random variable $X$. Proposition 4 says that, for a discrete random variable, specifying either the pmf or the distribution function is enough to specify the distribution. Thus, if one is asked to determine the distribution of $X$, it is sufficient to provide either the pmf or the distribution function.
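As an illustration of Proposition 4 for the Binomial$(n, p)$ pmf in (5), here is a minimal Python sketch. It assumes an integer-valued $X$ and uses exact rational arithmetic. It computes $P(X \in A)$ by (6) and $F_X$ by (7), then recovers the pmf from $F_X$ by (4). For an integer-valued $X$ and an integer $k$, the left limit $F_X(k-)$ equals $F_X(k-1)$. The helper names are hypothetical. An event $A$ is represented as a finite iterable of candidate values, which suffices here because the Binomial pmf vanishes outside $\{0, 1, \ldots, n\}$.

```python
import math
from fractions import Fraction


def binomial_pmf(k, n, p):
    """p_X(k) from equation (5); zero outside {0, 1, ..., n}."""
    if k not in range(n + 1):
        return Fraction(0)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_event(A, n, p):
    """P(X in A) = sum of p_X(a) over a in A, equation (6)."""
    return sum((binomial_pmf(a, n, p) for a in A), Fraction(0))


def binomial_cdf(t, n, p):
    """F_X(t) = sum of p_X(a) over a <= t, equation (7)."""
    return prob_event(range(0, math.floor(t) + 1), n, p)


def pmf_from_cdf(k, n, p):
    """p_X(k) = F_X(k) - F_X(k-), equation (4); here F_X(k-) = F_X(k - 1)."""
    return binomial_cdf(k, n, p) - binomial_cdf(k - 1, n, p)


n, p = 4, Fraction(1, 3)
print(prob_event(range(n + 1), n, p))   # 1: the pmf sums to one
print(prob_event({1, 3}, n, p))         # 40/81 = P(X is odd)
print(all(pmf_from_cdf(k, n, p) == binomial_pmf(k, n, p)
          for k in range(-2, n + 3)))   # True
```

The last check walks a few values outside $\{0, \ldots, n\}$ as well, where both sides should be zero.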
9 Continuous random variables

A probability density function (abbreviated pdf, or sometimes simply density) is a real-valued function $f$ defined on the real numbers satisfying
(i) $f(t) \ge 0$ for all real numbers $t$,
(ii) $\int_{-\infty}^{\infty} f(t)\,dt = 1$.

A continuous random variable is a random variable $X$ for which there exists a pdf $f_X$ so that

$$P(a < X \le b) = \int_a^b f_X(t)\,dt \quad \text{for all } a < b. \tag{8}$$

In fact, if (8) holds, then the identity

$$P(X \in A) = \int_A f_X(t)\,dt \tag{9}$$

is valid for any subset $A$ of the real numbers such that $\int_A f_X(t)\,dt$ is defined. Applying (9) to the set $(-\infty, t]$ shows that

$$F_X(t) = P(X \le t) = \int_{-\infty}^{t} f_X(s)\,ds, \tag{10}$$

and so the distribution function of $X$ can be determined from the density function of $X$. Note that a consequence of (10) is that $F_X$ is a continuous function for a continuous random variable, and in particular

$$P(X = a) = F_X(a) - F_X(a-) = 0.$$

Applying the Fundamental Theorem of Calculus to (10) shows that

$$\frac{d}{dt} F_X(t) = f_X(t) \tag{11}$$

at all points $t$ where $f_X$ is continuous. Thus if $f_X$ is piecewise continuous, as will be the case in this course, then it can be determined from the distribution function via (11). The following summarizes the situation for continuous random variables with piecewise continuous densities:

Proposition 5. Let $X$ be a continuous random variable with piecewise continuous density. Each of the following can be computed using any of the others:
(i) the probabilities of all events determined by $X$, that is, the collection of probabilities $\{P(X \in A) : A \subset \mathbb{R} \text{ such that } \int_A f_X(t)\,dt \text{ is defined}\}$,
(ii) the pdf $f_X(\cdot)$ of $X$,
(iii) the distribution function $F_X(\cdot)$ of $X$.

Proof. This is what equations (9), (10), and (11) say.

9.1 Interpretation of density function

What is the interpretation of the density function? Suppose that $X$ has a density $f$ which is continuous at the point $a$, and let $F$ be its distribution function. Since $P(X = a) = 0$, we have

$$\frac{P(a \le X \le a + \Delta)}{\Delta} = \frac{F(a + \Delta) - F(a)}{\Delta}.$$

The right-hand side tends to $F'(a) = f(a)$ as $\Delta \to 0$.
Thus we can write

$$\frac{P(a \le X \le a + \Delta)}{\Delta} = f(a) + \varepsilon_0(\Delta),$$

where $\varepsilon_0(\Delta) \to 0$ as $\Delta \to 0$. Multiplying both sides by $\Delta$, we have

$$P(a \le X \le a + \Delta) = f(a)\Delta + \underbrace{\Delta\,\varepsilon_0(\Delta)}_{\varepsilon(\Delta)}.$$

Since $\varepsilon(\Delta) = \Delta\,\varepsilon_0(\Delta)$, we have $\varepsilon(\Delta)/\Delta = \varepsilon_0(\Delta) \to 0$ as $\Delta \to 0$. Thus we can write

$$P(a \le X \le a + \Delta) \approx f(a)\Delta, \tag{12}$$

where the error in the approximation is $\varepsilon(\Delta)$ and satisfies $\varepsilon(\Delta)/\Delta \to 0$ as $\Delta \to 0$. Equation (12) is useful in interpreting the meaning of a probability density function: the probability of $X$ falling in a very small interval near $a$ is approximately $f(a)\Delta$, where $\Delta$ is the length of the interval.

10 Expected Value

Let $X$ be a discrete random variable with the following pmf:

$$p_X(a) = \begin{cases} 2/5 & \text{if } a = -1, \\ 2/5 & \text{if } a = 0, \\ 1/5 & \text{if } a = 1, \\ 0 & \text{if } a \notin \{-1, 0, 1\}. \end{cases}$$

How should the "average" value of $X$ be defined? A first attempt might be to say that the average value should be 0, since 0 is in the center of the three possible values $\{-1, 0, 1\}$. But this does not take into account that $X$ does not assume these values with equal probability. The average should account not just for the values taken on by $X$, but also for the probabilities associated with each of these values. This leads to the definition of the expectation of $X$: a weighted average of the values of $X$, with weights determined by the pmf or pdf. Precisely, we define

$$E(X) = \begin{cases} \sum_a a\, p_X(a) & \text{if } X \text{ is discrete}, \\ \int_{-\infty}^{\infty} t\, f_X(t)\,dt & \text{if } X \text{ is continuous}. \end{cases} \tag{13}$$

$E(X)$ is only defined when the sum or integral in (13) converges absolutely, that is, we need

$$\begin{cases} \sum_a |a|\, p_X(a) < \infty & \text{if } X \text{ is discrete}, \\ \int_{-\infty}^{\infty} |t|\, f_X(t)\,dt < \infty & \text{if } X \text{ is continuous}. \end{cases}$$

In the example above,

$$E(X) = (-1)\cdot\frac{2}{5} + 0\cdot\frac{2}{5} + 1\cdot\frac{1}{5} = -\frac{1}{5}.$$

We use the terms expected value, mean, and first moment all to refer to the expectation of $X$.

Example 7. Let $X$ be a Binomial$(n, p)$ random variable. This means that $X$ has a pmf given by

$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$$

for $k = 0, 1, \ldots, n$. The pmf is 0 for any other values. Then

$$\begin{aligned}
E(X) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} k\, \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} \\
&= \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\, p^k (1-p)^{n-k} \\
&= np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!\,(n-1-(k-1))!}\, p^{k-1} (1-p)^{n-1-(k-1)} \\
&= np \sum_{k=0}^{n-1} \underbrace{\underbrace{\frac{(n-1)!}{k!\,(n-1-k)!}}_{\binom{n-1}{k}}\, p^k (1-p)^{n-1-k}}_{\text{pmf of Binomial}(n-1,\,p)\text{ r.v.}} \\
&= np.
\end{aligned}$$

In the second line the $k = 0$ term is dropped because it is zero. In the last line the sum equals 1 because it adds up the pmf of a Binomial$(n-1, p)$ random variable.

Example 8. We say that $X$ is an Exponential random variable with parameter $\lambda$ if it has a pdf

$$f(t) = \begin{cases} \dfrac{1}{\lambda} e^{-t/\lambda} & \text{if } t \ge 0, \\ 0 & \text{if } t < 0. \end{cases}$$

$X$ has the property that

$$P(X > t + s \mid X > t) = P(X > s).$$

(The reader should check this!) The expected value is the integral

$$E(X) = \int_0^{\infty} t\, \frac{1}{\lambda} e^{-t/\lambda}\,dt.$$

We can evaluate this by integration by parts. Set

$$u = t, \quad du = dt, \qquad dv = \frac{1}{\lambda} e^{-t/\lambda}\,dt, \quad v = -e^{-t/\lambda},$$

so that

$$\int_0^{\infty} t\, \frac{1}{\lambda} e^{-t/\lambda}\,dt = \Big[-t e^{-t/\lambda}\Big]_0^{\infty} + \int_0^{\infty} e^{-t/\lambda}\,dt = \Big[-\lambda e^{-t/\lambda}\Big]_0^{\infty} = \lambda.$$

10.1 Functions of random variables

If $g : \mathbb{R} \to \mathbb{R}$ is a function and $X$ is a random variable, then $Y = g(X)$ is a new random variable. To calculate $E(Y)$ according to the definition (13), we need the pmf of $Y$ if $Y$ is discrete, or the pdf of $Y$ if $Y$ is continuous. Fortunately, the following proposition tells us how to compute $E(Y)$ without finding its pmf or density.

Proposition 6. Let $X$ be a random variable, and $g$ a real-valued function. Then

$$E(g(X)) = \begin{cases} \sum_a g(a)\, p_X(a) & \text{if } X \text{ is discrete with pmf } p_X, \\ \int_{-\infty}^{\infty} g(t)\, f_X(t)\,dt & \text{if } X \text{ is continuous with pdf } f_X. \end{cases}$$

Proof. We prove the case where $X$ is discrete:

$$\begin{aligned}
E(g(X)) &= \sum_b b\, P(g(X) = b) \\
&= \sum_b b \sum_{a :\, g(a) = b} P(X = a) \\
&= \sum_b \sum_{a :\, g(a) = b} b\, p_X(a) \\
&= \sum_b \sum_{a :\, g(a) = b} g(a)\, p_X(a) \\
&= \sum_a g(a)\, p_X(a).
\end{aligned}$$

An immediate corollary is the following:

Corollary 7. Let $X$ be a random variable, and let $\alpha$ and $\beta$ be constants. Then

$$E(\alpha X + \beta) = \alpha E(X) + \beta.$$

Proof. We write the continuous case; the discrete case is similar. Applying Proposition 6 to $g(x) = \alpha x + \beta$ gives

$$E(\alpha X + \beta) = \int (\alpha t + \beta)\, f_X(t)\,dt = \alpha \int t\, f_X(t)\,dt + \beta \int f_X(t)\,dt = \alpha E(X) + \beta.$$

11 Variance

Expectation measures the center of mass of a density or pmf. Variance is a measure of how spread out the density or pmf of $X$ is. The random variable $Y = (X - E(X))^2$ gives the squared distance of $X$ to its mean value.
This measures how far $X$ is from its center of mass. Taking the expectation of $Y$ gives the variance of $X$:

$$V(X) = E\big[(X - E(X))^2\big] = \begin{cases} \sum_a (a - E(X))^2\, p_X(a) & \text{if } X \text{ is discrete with pmf } p_X, \\ \int_{-\infty}^{\infty} (t - E(X))^2 f_X(t)\,dt & \text{if } X \text{ is continuous with pdf } f_X. \end{cases}$$

The following is a useful way to compute variance.

Proposition 8. For a random variable $X$,

$$V(X) = E(X^2) - [E(X)]^2.$$

Proof. The proof is similar in the continuous and discrete cases; we show here the discrete case:

$$\begin{aligned}
V(X) &= \sum_a (a - E(X))^2\, p_X(a) \\
&= \sum_a \big(a^2 - 2aE(X) + [E(X)]^2\big)\, p_X(a) \\
&= \sum_a a^2 p_X(a) - 2E(X) \sum_a a\, p_X(a) + [E(X)]^2 \sum_a p_X(a) \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \\
&= E(X^2) - [E(X)]^2.
\end{aligned}$$
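A minimal Python sketch can check these formulas on the pmf from the start of Section 10 (values $-1, 0, 1$ with probabilities $2/5, 2/5, 1/5$). It assumes a finite pmf is stored as a dictionary mapping values to exact probabilities. It applies Proposition 6 to compute $E(g(X))$, compares the definition of $V(X)$ with Proposition 8, and checks Corollary 7 with $\alpha = 3$, $\beta = 2$. The function names are illustrative and do not come from the text.

```python
from fractions import Fraction

# pmf from the start of Section 10: P(X=-1)=2/5, P(X=0)=2/5, P(X=1)=1/5
pmf = {-1: Fraction(2, 5), 0: Fraction(2, 5), 1: Fraction(1, 5)}


def expectation(pmf, g=lambda a: a):
    """E(g(X)) = sum over a of g(a) * p_X(a), as in Proposition 6."""
    return sum((g(a) * prob for a, prob in pmf.items()), Fraction(0))


def variance_by_definition(pmf):
    """V(X) = E[(X - E(X))^2]."""
    mu = expectation(pmf)
    return expectation(pmf, lambda a: (a - mu) ** 2)


def variance_by_shortcut(pmf):
    """V(X) = E(X^2) - [E(X)]^2, Proposition 8."""
    return expectation(pmf, lambda a: a**2) - expectation(pmf) ** 2


print(expectation(pmf))                        # -1/5
print(variance_by_definition(pmf))             # 14/25
print(variance_by_shortcut(pmf))               # 14/25
# Corollary 7 with alpha = 3, beta = 2: E(3X + 2) = 3 * (-1/5) + 2 = 7/5
print(expectation(pmf, lambda a: 3 * a + 2))   # 7/5
```

Exact fractions are used so that the two variance formulas can be compared with `==`. With floating-point probabilities they would agree only up to rounding error.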