3. Discrete Random Variables

Contents
Idea of a random variable; probability mass function.
Types of discrete random variables: Bernoulli, indicator, binomial, geometric, hypergeometric, Poisson.
Distribution function and its properties.
Expectation, variance, and other moments.
Conditional distributions.
Convergence of distributions, and moment generating function.

References: Ross (Chapter 4); Ben Arous notes (Chapters III, IV).
Exercises: 60–64, 66–78 of the Recueil d'exercices.

Petit Vocabulaire Probabiliste

Mathematics     English                               Français
                random experiment                     une expérience aléatoire
Ω               sample space                          l'ensemble fondamental
ω               outcome, elementary event             une épreuve, un événement élémentaire
A, B, ...       event                                 un événement
F               event space                           l'espace des événements
F               sigma-algebra                         une tribu
P               probability distribution/function     une loi de probabilité
(Ω, F, P)       probability space                     un espace de probabilité
                inclusion-exclusion formulae          formules d'inclusion-exclusion
P(A | B)        probability of A given B              la probabilité de A sachant B
                independence                          l'indépendance
                (mutually) independent events         les événements (mutuellement) indépendants
                pairwise independent events           les événements indépendants deux à deux
                conditionally independent events      les événements conditionnellement indépendants
X, Y, ...       random variable                       une variable aléatoire
I               indicator random variable             une variable indicatrice
fX              probability mass function             fonction de masse
FX              probability distribution function     fonction de répartition

Random Variables

In applications we usually want to consider numerical random quantities.

Example 3.1: A family has three children. Let X represent the number of boys. Possible values for X are {0, 1, 2, 3}. Find the probabilities that X takes these values. •

Definition: Let (Ω, F, P) be a probability space. A random variable X : Ω → R is a mapping from the sample space Ω to the real numbers R. If the range of X,
    D = {x ∈ R : ∃ω ∈ Ω such that X(ω) = x},
is countable, then X is called a discrete random variable. •

This induces probabilities on subsets S of the real line, given by
    P(X ∈ S) = P({ω ∈ Ω : X(ω) ∈ S}).
In particular, we set Ax = {ω ∈ Ω : X(ω) = x}.

Example 3.2 (Coin): A coin is tossed repeatedly and independently. Let X be the random variable representing the number of tosses needed until the first head shows. Compute P(X = 3), P(X = 15), P(X ≤ 3.5), P(X > 1.7), and P(1.7 ≤ X ≤ 3.5). •

Example 3.3 (Dartboard): A natural set Ω when I play at darts is the wall on which the dartboard hangs. The dart lands at a point ω ∈ Ω ⊂ R². My score X(ω) takes values in D = {0, 1, . . . , 60}. •

Bernoulli Random Variable

Definition: A random variable that takes only the values 0 and 1 is called an indicator random variable, or a Bernoulli random variable, or sometimes a Bernoulli trial. We use I(C) to denote the indicator of the event C.

Example 3.4 (Coins): Suppose that n identical coins are tossed independently, let Hi be the event that the ith coin shows a head, and let Ii = I(Hi) be the indicator of this event. Then
    P(Ii = 1) = P(Hi) = p,    P(Ii = 0) = P(Hi^c) = 1 − p,
where p is the probability that a coin shows a head. Write down the sample space and the sets Ax when n = 3. What is the random variable X = I1 + · · · + In? •
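The following short Python sketch is an added illustration of Example 3.4, not part of the original notes: it simulates n independent Bernoulli indicators with success probability p and forms their sum, which counts the number of heads (the function name and parameter values are mine).

import random

def bernoulli_indicators(n, p):
    # Simulate I1, ..., In for n independent coin tosses with P(head) = p.
    return [1 if random.random() < p else 0 for _ in range(n)]

random.seed(1)
n, p = 3, 0.5
indicators = bernoulli_indicators(n, p)   # one realisation of (I1, I2, I3)
x = sum(indicators)                       # X = I1 + ... + In, the number of heads
print(indicators, x)

Repeating this many times, the empirical frequencies of X = 0, 1, 2, 3 approach the probabilities asked for in Example 3.1; X is the binomial random variable defined below.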
Probability Mass Function

We have already seen that a random variable X induces probabilities on subsets of R. In particular, when X is discrete we have Ax = {ω ∈ Ω : X(ω) = x}, and can define:

Definition: The probability mass function (pmf) (fonction de masse) of X is the function
    fX(x) = P(X = x) = P(Ax),    x ∈ R.
The probability mass function has two key properties:
(i) fX(x) ≥ 0, and is positive only for x ∈ D, where D is the range of X, also called the support of fX;
(ii) the total probability ∑_{xi ∈ D} fX(xi) = 1.
When there is no risk of confusion we write fX ≡ f.

Binomial Random Variable

Example 3.4 (ctd): Compute the probability mass functions and supports of Ii and of X. •

Definition: A binomial random variable X has pmf
    fX(x) = (n choose x) p^x (1 − p)^(n−x),    x = 0, 1, . . . , n,    n ∈ N, 0 ≤ p ≤ 1.
We write X ∼ B(n, p), and call n the denominator and p the success probability. •

Note: we use ∼ as shorthand for 'has the distribution'. The binomial model is used for the number of "successes" occurring in a fixed number of independent trials, each trial having the same success probability.

[Figure: binomial probability mass functions for B(10, 0.5), B(10, 0.3), B(20, 0.1), and B(40, 0.9).]

Example 3.5: Certain physical traits are determined by a pair of genes, of which there are two types: dominant d and recessive r. A person with genotype dd has pure dominance, one with dr is hybrid, and one with rr is recessive. The genotypes dd and dr have the same phenotype, so cannot be distinguished physically. A child receives one gene from each parent. If the parents are both hybrid and have 4 children, what is the probability that exactly three of them show the dominant trait? What is the probability that at most three of them show the dominant trait? •

Theorem (Stability of the binomial): Let X ∼ B(n, p) and Y ∼ B(m, p) be independent binomial random variables. Then X + Y ∼ B(n + m, p). •

Waiting Times

Definition: A geometric random variable X has pmf
    fX(x) = p (1 − p)^(x−1),    x = 1, 2, . . . ,    0 ≤ p ≤ 1.
We write X ∼ Geom(p), and call p the success probability. •

This is used to model the waiting time until the first success in a series of independent trials, each with the same success probability.

Example 3.6: To start a board game, players take it in turns to throw a die. The first to obtain a six starts. What is the probability that the 3rd player starts? What is the probability of waiting for at least 6 throws before starting? •

Theorem (Lack of memory): If X ∼ Geom(p), then P(X > n + m | X > m) = P(X > n). •

[Figure: probability mass functions of Geom(0.5), Geom(0.1), NegBin(4, 0.5), and NegBin(6, 0.3).]
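A small Python check of the lack-of-memory property, added here as an illustration and not part of the original notes (the parameter values are arbitrary choices of mine): since P(X > k) = (1 − p)^k for X ∼ Geom(p), the conditional probability P(X > n + m | X > m) equals P(X > n).

p = 1 / 6

def tail(k, p=p):
    # P(X > k) for X ~ Geom(p): the first k trials are all failures.
    return (1 - p) ** k

n, m = 4, 7
lhs = tail(n + m) / tail(m)   # P(X > n + m | X > m)
rhs = tail(n)                 # P(X > n)
print(lhs, rhs)               # both equal (5/6)**4, about 0.482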
Definition: A negative binomial random variable X with parameters n and p has pmf
    fX(x) = (x − 1 choose n − 1) p^n (1 − p)^(x−n),    x = n, n + 1, n + 2, . . . ,    0 ≤ p ≤ 1.
We write X ∼ NegBin(n, p). When n = 1, X ∼ Geom(p). •

This is used to model the waiting time until the nth success in a series of independent trials, each with the same success probability.

Example 3.7: Two players toss a fair coin successively. What is the probability that 2 heads appear before 5 tails? •

Theorem (Stability of the negative binomial): Let X1, . . . , Xn be independent geometric random variables with success probability p. Then X1 + · · · + Xn ∼ NegBin(n, p). •

Banach's Match Problem

Example 3.8: A pipe-smoking mathematician carries a box of matches in each of the pockets of his jacket, one on the right and one on the left. Initially both boxes contain m matches. Each time he lights his pipe, he chooses a box of matches at random, and throws the spent match away. After a while he finds that the box he has chosen is empty. What is then the distribution of the number of matches in the other box? •

Hypergeometric Distribution

Example 3.9 (Capture-recapture): In order to estimate the unknown number of fish N in a lake, we first catch r ≤ N fish, mark them, and put them back. After waiting long enough for the fish population to be well mixed, we take a further sample of size s, of which 0 ≤ m ≤ s are marked. Let M be the random variable representing the number of marked fish in this sample. Show that
    P(M = m) = (r choose m) (N − r choose s − m) / (N choose s),    m ∈ {max(0, s + r − N), . . . , min(r, s)}.
This is the pmf of the hypergeometric distribution. Show that the value of N that maximises P(M = m) is ⌊rs/m⌋. Compute the best estimate of N when s = 50, r = 40, and m = 4. •

Example 3.10: An electrician buys components in packets of 10. He examines three components chosen at random from a packet, and accepts the packet only if all three chosen are faultless. If 30% of packets contain 4 bad components and the other 70% contain just one bad component, what proportion of packets does he reject? •

Distribution Function

Definition: Let X be a random variable. Its cumulative distribution function (CDF) (fonction de répartition) is
    FX(x) = P(X ≤ x),    x ∈ R.
If X is discrete, this can be written as
    FX(x) = ∑_{xi ∈ D : xi ≤ x} P(X = xi),
which is a step function with jumps at the support of fX(x), that is, at {x ∈ R : fX(x) > 0}. When there is no risk of confusion, we write F ≡ FX.

Example 3.11: Give the support, pmf, and distribution function of a Bernoulli random variable. •

Example 3.12 (Die): Give the support, pmf, and distribution function of the value obtained when a fair die is thrown. •

Definition: A discrete uniform random variable X has probability mass function
    fX(x) = 1/(b − a + 1),    x = a, a + 1, . . . , b,    a < b, a, b ∈ Z. •

Definition: A Poisson random variable X has probability mass function
    fX(x) = λ^x e^(−λ) / x!,    x = 0, 1, . . . ,    λ > 0.
We write X ∼ Pois(λ). •

[Figure: Poisson probability mass functions for Pois(0.5), Pois(1), Pois(4), and Pois(10).]
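As an added numerical check, not in the original notes, the sketch below evaluates the Poisson pmf just defined and confirms that it sums to 1 and has mean λ (the truncation point 200 and the value of λ are my choices; the omitted tail is negligible).

from math import exp, factorial

def pois_pmf(x, lam):
    # fX(x) = lam**x * exp(-lam) / x!  for X ~ Pois(lam)
    return lam ** x * exp(-lam) / factorial(x)

lam = 4.0
total = sum(pois_pmf(x, lam) for x in range(200))      # ~= 1
mean = sum(x * pois_pmf(x, lam) for x in range(200))   # ~= lam
print(total, mean)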
Properties of a Distribution Function

Theorem: Let (Ω, F, P) be a probability space and X : Ω → R a random variable. Its cumulative distribution function FX satisfies:
(a) lim_{x→−∞} FX(x) = 0;
(b) lim_{x→∞} FX(x) = 1;
(c) FX is non-decreasing, that is, FX(x) ≤ FX(y) whenever x ≤ y;
(d) FX is continuous to the right, that is, lim_{t↓0} FX(x + t) = FX(x) for all x ∈ R;
(e) P(X > x) = 1 − FX(x);
(f) if x < y, then P(x < X ≤ y) = FX(y) − FX(x).

Note: The pmf is obtained from the CDF by
    f(x) = F(x) − lim_{y↑x} F(y),
the limit being taken through y < x. In many cases X takes only integer values, and then f(x) = F(x) − F(x − 1) for integer x.

Example 3.13 (Urn): An urn contains tickets labelled 1, . . . , n, from which r are drawn at random. Let X be the largest number drawn if the tickets are replaced in the urn after each drawing, and let Y be the largest number drawn if the tickets are not replaced. Find fX(x), FX(x), fY(x), and FY(x). Show that FY(k) < FX(k) for k = 1, . . . , n − 1. •

Example 3.14 (Poisson): Find FX(x) when X ∼ Pois(λ). •

[Figure: Poisson cumulative distribution functions for Pois(0.5), Pois(1), Pois(4), and Pois(10).]

Transformations of Discrete Random Variables

Real-valued functions of random variables are themselves random variables, so they too have probability mass functions.

Theorem: If X and Y are random variables such that Y = g(X), then Y has probability mass function
    fY(y) = ∑_{x : g(x) = y} fX(x). •

Example 3.15: Find the pmf of Y = I(X > 0) when X ∼ Pois(λ). •

Example 3.16: Let Y be the remainder when the total from a throw of two independent dice is divided by 4. Find the pmf of Y. •

Mathematical Honesty

From now on we mostly ignore the underlying probability space (Ω, F, P) when dealing with a random variable X, and think instead in terms of X, FX(x), and fX(x). It can be proved that this is mathematically legitimate.

3.2 Expectation

Definition: Let X be a discrete random variable for which ∑_{x ∈ D} |x| fX(x) < ∞, where D is the support of fX. The expectation (l'espérance) or mean of X is defined to be
    E(X) = ∑_{x ∈ D} x P(X = x) = ∑_{x ∈ D} x fX(x).

Note: E(X) is sometimes called the average value (la moyenne) of X. We confine the use of 'average' to empirical quantities.

Example 3.17: Find the expected score on the throw of a fair die. •

Example 3.18: Find the means of the random variables with pmfs
    fX(x) = 4 / {x(x + 1)(x + 2)},    fY(x) = 1 / {x(x + 1)},    x = 1, 2, . . . . •

Example 3.19: Find the mean of a Bernoulli variable with success probability p. •

Example 3.20: Find the mean of X ∼ B(n, p). •

Theorem: Let X be a random variable with mass function f, and let g be a real-valued function on R. Then
    E{g(X)} = ∑_x g(x) f(x),
whenever ∑_x |g(x)| f(x) < ∞. •

Example 3.21: Let X ∼ Pois(λ). Find the expectations of X, X(X − 1), X(X − 1) · · · (X − r + 1), and cos(θX). •

Note: Expectation is analogous to the idea from mechanics of the centre of mass of an object whose mass is distributed according to fX.
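To illustrate the theorem E{g(X)} = ∑ g(x) f(x) numerically, here is a short added sketch, not part of the original notes (the truncation point and the value of λ are my choices). It evaluates the first two factorial moments of X ∼ Pois(λ), which are known to equal λ and λ².

from math import exp, factorial

def pois_pmf(x, lam):
    return lam ** x * exp(-lam) / factorial(x)

def expect(g, lam, upper=200):
    # E{g(X)} = sum of g(x) * f(x) over the support, truncated at a large bound.
    return sum(g(x) * pois_pmf(x, lam) for x in range(upper))

lam = 3.0
print(expect(lambda x: x, lam))            # ~= lam
print(expect(lambda x: x * (x - 1), lam))  # ~= lam**2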
Properties of Expectation

Theorem: Let X be a random variable with finite mean E(X), and let a, b be constants. Then
(a) E(·) is a linear operator, that is, E(aX + b) = aE(X) + b;
(b) if P(X = b) = 1, then E(X) = b;
(c) if P(a < X ≤ b) = 1, then a < E(X) ≤ b;
(d) if g(X) and h(X) have finite means, then E{g(X) + h(X)} = E{g(X)} + E{h(X)};
(e) finally, {E(X)}² ≤ {E(|X|)}² ≤ E(X²). •

Note: The linearity of expectation is extremely useful in practice.

Example 3.22: Let X = I1 + · · · + In, where I1, . . . , In are independent Bernoulli variables with success probability p. Find E(X). Is independence of the Ii needed? •

Example 2.16 (Matching, ctd): Show that the expected number of men who leave with the correct hats is 1, for all n. •

Example 3.23 (Indicator random variables): Let IA, IB, . . . denote the indicators of events A, B, . . .. Show that
    IA∩B = IA IB,    IA∪B = 1 − (1 − IA)(1 − IB),    E(IA) = P(A),
and hence establish the inclusion-exclusion formulae. •

Moments of a Distribution

Definition: If X has a pmf f(x) such that ∑_x |x|^r f(x) < ∞, then
(a) the rth moment of X is E(X^r);
(b) the rth central moment of X is E[{X − E(X)}^r];
(c) the rth factorial moment of X is E{X(X − 1) · · · (X − r + 1)};
(d) the variance of X is var(X) = E[{X − E(X)}²].

Note: Of these, the mean and variance are the most important, as they measure the location and the spread of fX. The variance is analogous to the moment of inertia in mechanics.

Example 3.24: Find the variance of the score when a die is cast. •

Properties of Variance

Theorem: Let X be a random variable whose variance exists, and let a, b be constants. Then
    var(X) = E(X²) − E(X)² = E{X(X − 1)} + E(X) − E(X)²;
    var(aX + b) = a² var(X);
    var(X) = 0 ⇒ X is constant with probability 1.

Example 3.25: Find the various moments of a Poisson random variable. •

Theorem: If X takes values in {0, 1, . . .}, r ≥ 2, and E(X) < ∞, then
    E(X) = ∑_{x=1}^∞ P(X ≥ x),
    E{X(X − 1) · · · (X − r + 1)} = r ∑_{x=r}^∞ (x − 1) · · · (x − r + 1) P(X ≥ x). •

Example 3.26: Let X ∼ Geom(p). Find E(X) and var(X). •

Example 3.27 (Coupons): Each packet of some product is equally likely to contain any one of n different types of coupon, independently of every other packet. What is the expected number of packets you must buy in order to obtain at least one coupon of each type? •

3.3 Conditional Distributions

Definition: Let (Ω, F, P) be a probability space on which a random variable X is defined, and let B ∈ F satisfy P(B) > 0. Then the conditional probability mass function of X given B is
    fX(x | B) = P(X = x | B) = P(Ax ∩ B)/P(B),
where Ax = {ω ∈ Ω : X(ω) = x}.

Theorem: The function fX(x | B) satisfies
    fX(x | B) ≥ 0,    ∑_x fX(x | B) = 1,
and so is a well-defined probability mass function. •

Example 3.28: Find the conditional pmf of the result of tossing a die, given that the result is odd. •

Example 3.29: Find the conditional pmf of X ∼ Geom(p), given that X ≤ n. •

Definition: Suppose that ∑_x |g(x)| fX(x | B) < ∞. Then the conditional expectation of g(X) given B is
    E{g(X) | B} = ∑_x g(x) fX(x | B).
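Below is a minimal added sketch, not part of the original notes, computing a conditional pmf and conditional expectation directly from these definitions for the situation of Example 3.28: a fair die conditioned on the event B = {result is odd}.

values = range(1, 7)                  # a fair die
f = {x: 1 / 6 for x in values}        # unconditional pmf
B = {1, 3, 5}                         # the result is odd

pB = sum(f[x] for x in B)
f_cond = {x: (f[x] / pB if x in B else 0.0) for x in values}   # fX(x | B)
e_cond = sum(x * f_cond[x] for x in values)                    # E(X | B)
print(f_cond)   # mass 1/3 on each of 1, 3, 5
print(e_cond)   # 3.0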
Theorem: Let X be a random variable with mean E(X), and let B be an event with P(B), P(B^c) > 0. Then
    E(X) = E(X | B) P(B) + E(X | B^c) P(B^c).
More generally, whenever {Bi, i = 1, 2, . . .} is a partition of Ω with P(Bi) > 0 for all i, and the sum is absolutely convergent,
    E(X) = ∑_{i=1}^∞ E(X | Bi) P(Bi). •

Example 3.30: The truncated Poisson distribution is defined by taking X ∼ Pois(λ) and B = {X > 0}. Find the conditional probability mass function, mean, and variance of this distribution. •

Example 3.31: A coin is tossed repeatedly. Find the expected number of tosses until the first head, and until the first two consecutive heads. •

Example 3.32: Bilbo the hobbit and Smaug the dragon have b and s gold coins respectively. They play a series of independent games in which the loser gives the winner a gold coin, stopping when one of them has no coins remaining. If Bilbo wins each game with probability p (and p ≠ q = 1 − p), find the expected number of games before they stop. They then redivide the b + s coins by tossing them all: one player gets those showing a head, and the other gets the rest. Now they play as before. What is the expected number of games until one or other player has all the coins? •

3.4 Convergence of Distributions

In applications we often want to approximate one distribution by another. The mathematical basis for doing so is provided by convergence of distributions.

Example 3.33 (Law of small numbers): Let Xn ∼ B(n, p), and suppose that np → λ > 0 as n → ∞. Show that the limiting probability mass function of Xn is that of Pois(λ). •

Example 3.34 (Matching, again): In Example 2.16 we saw that the probability of exactly r fixed points in a random permutation of n objects is
    (1/r!) ∑_{k=0}^{n−r} (−1)^k / k!  →  e^{−1}/r!    as n → ∞.
Thus the number of fixed points has a limiting Pois(1) distribution. •

[Figure: law of small numbers, showing the probability mass functions of B(10, 0.5), B(20, 0.25), B(50, 0.1), and the limiting Pois(5).]

Definition: Let f(x) be a probability mass function which is non-zero for x ∈ D and zero for x ∈ R\D = C, and let F(x) be the corresponding distribution function
    F(x) = ∑_{xi ≤ x} f(xi).
A sequence of distribution functions Fn(x) is said to converge to F(x) if Fn(x) → F(x) for all x ∈ C as n → ∞. The corresponding random variables {Xn} are then said to converge in distribution (or in law) to a random variable X, written Xn →D X, where Xn has distribution function Fn and X has distribution function F. •

If D ⊂ Z, then Fn(x) → F(x) for all x if fn(x) → f(x) for all x as n → ∞.

Example 3.35: Let XN have the hypergeometric probability mass function
    P(XN = i) = (m choose i) (N − m choose n − i) / (N choose n),    i = max(0, m + n − N), . . . , min(m, n).
This is the distribution of the number of white balls obtained when a random sample of size n is taken without replacement from an urn containing m white and N − m black balls. Show that as N, m → ∞ in such a way that m/N → p, where 0 < p < 1,
    P(XN = i) → (n choose i) p^i (1 − p)^(n−i),    i = 0, . . . , n.
Thus the limiting distribution of XN is B(n, p). •
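As an added numerical illustration of Example 3.35, not part of the original notes (the particular values of n, p, and N are arbitrary choices of mine), the sketch below compares the hypergeometric pmf with its binomial limit B(n, p) as N grows with m/N ≈ p.

from math import comb

def hyper_pmf(i, N, m, n):
    # P(XN = i): i white balls in a sample of size n drawn without replacement.
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

def binom_pmf(i, n, p):
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

n, p = 5, 0.3
for N in (20, 200, 2000):
    m = int(p * N)                                           # so that m/N -> p
    print(N, [round(hyper_pmf(i, N, m, n), 4) for i in range(n + 1)])
print('B(n,p)', [round(binom_pmf(i, n, p), 4) for i in range(n + 1)])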
Inequalities

Theorem (Basic inequality): If h(x) is a non-negative function, then for a > 0,
    P{h(X) ≥ a} ≤ E{h(X)}/a.

Theorem: Let a > 0 and let g be a convex function. Then
    P(|X| ≥ a) ≤ E(|X|)/a    (Markov's inequality),
    P(|X| ≥ a) ≤ E(X²)/a²    (Chebyshov's inequality),
    P{X − E(X) ≥ a} ≤ var(X)/{a² + var(X)}    (one-sided Chebyshov inequality),
    E{g(X)} ≥ g{E(X)}    (Jensen's inequality).

Theorem: If var(X) = 0, then X is constant with probability one.

3.5 Moment Generating Functions

Definition: The moment generating function of a random variable X is defined as
    MX(t) = E(e^{tX}),
for t ∈ R such that MX(t) < ∞.

Example 3.36: Find MX(t) when: (a) X is an indicator random variable; (b) X ∼ B(n, p); (c) X ∼ Pois(λ). •

Theorem: There is a one-one correspondence between distribution functions FX(x) and moment generating functions MX(t). •

Example 3.37: Let X ∼ B(n, p) and Y ∼ Pois(λ). Show that as n → ∞ and p → 0 in such a way that np → λ, X →D Y, that is, X converges in distribution to Y. •
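As an added closing check of Example 3.37, not in the original notes (the values of λ and t are mine), the sketch below evaluates the binomial moment generating function (1 − p + p e^t)^n with p = λ/n and watches it approach the Poisson moment generating function exp{λ(e^t − 1)} as n grows; these two closed forms are the standard answers to parts (b) and (c) of Example 3.36.

from math import exp

def mgf_binomial(t, n, p):
    # MX(t) = (1 - p + p * e**t)**n for X ~ B(n, p)
    return (1 - p + p * exp(t)) ** n

def mgf_poisson(t, lam):
    # MY(t) = exp(lam * (e**t - 1)) for Y ~ Pois(lam)
    return exp(lam * (exp(t) - 1))

lam, t = 2.0, 0.5
for n in (10, 100, 1000, 10000):
    print(n, mgf_binomial(t, n, lam / n))   # approaches the Poisson value below
print('Pois', mgf_poisson(t, lam))          # about 3.66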