Probability

Dr J. Marchini

January 10, 2011

Contents

1 Lecture 1
  1.1 Random Walks
    1.1.1 Reminders
    1.1.2 Random Walks - simplest case
    1.1.3 Random Walks - Questions of interest
2 Lecture 2
  2.1 Random Walks Continued
  2.2 Using Conditioning and Expectation
    2.2.1 Some reminders
    2.2.2 Application to Random Walks
3 Lecture 3
  3.1 Using Conditioning and Expectation Continued
    3.1.1 Application to Random Walks Continued
4 Lecture 4
  4.1 Using Conditioning and Expectation Continued
  4.2 Random Samples
  4.3 Sums of Random Variables
5 Lecture 5
  5.1 Continuous Random Variables
    5.1.1 Reminder: Discrete Random Variables
    5.1.2 Continuous Random Variables
    5.1.3 Properties: pdf's and pmf's
6 Lecture 6
  6.1 Continuous Random Variables Continued
    6.1.1 Expectation of a Continuous Random Variable
    6.1.2 Functions of Random Variables and their pdfs
7 Lecture 7
  7.1 Continuous Random Variables Continued
    7.1.1 Functions of Random Variables and their pdfs Continued
    7.1.2 The Normal Distribution
8 Lecture 8
  8.1 Continuous Random Variables Continued
    8.1.1 Jointly Continuous Distributions
    8.1.2 Expectation with Joint Distributions
    8.1.3 Independence
    8.1.4 Random Samples for Continuous Random Variables

1 Lecture 1

1.1 Random Walks

1.1.1 Reminders

We will need the Partition Theorem, which rests on the definition of conditional probability:

    P(A | B) = P(A ∩ B) / P(B)

Definition 1 A partition {B_i} satisfies B_1 ∪ … ∪ B_n = Ω with B_i ∩ B_j = ∅ when i ≠ j, where Ω is the set of all possible outcomes (called the sample space).

Theorem 1 (Partition Theorem)

    P(A) = Σ_{i=1}^n P(A | B_i) P(B_i)

Proof

    P(A) = P(A ∩ Ω)
         = P(A ∩ (B_1 ∪ … ∪ B_n))
         = P((A ∩ B_1) ∪ … ∪ (A ∩ B_n))     (distributive law)
         = P(A ∩ B_1) + … + P(A ∩ B_n)       (B_i disjoint)
         = Σ_{i=1}^n P(A ∩ B_i)

Now use the definition of conditional probability to get P(A) = Σ_{i=1}^n P(A | B_i) P(B_i).

Difference Equations - second reminder

Example 1 (1st order) u_n − 2u_{n−1} = 2^n for n = 1, 2, 3, ….

Complementary solution: u_n − 2u_{n−1} = 0. Try u_n = A·λ^n, which gives λ − 2 = 0, so u_n = A·2^n.

Particular solution: it is no good trying α·2^n as it is part of the complementary solution. Try u_n = α·n·2^n, which gives

    α·n·2^n − 2α(n − 1)2^{n−1} = 2^n   ⇒   α·2^n = 2^n   ⇒   α = 1.

Hence u_n = A·2^n + n·2^n. We would require a value of u_0, say, to fix A.

Example 2 (2nd order) u_n − 3u_{n−1} + 2u_{n−2} = 3^n for n = 2, 3, ….

Complementary solution: u_n − 3u_{n−1} + 2u_{n−2} = 0. Try u_n = A·λ^n, so λ² − 3λ + 2 = 0, giving λ = 1 or 2, hence u_n = A + B·2^n.

Particular solution: try u_n = α·3^n:

    α·3^n − 3α·3^{n−1} + 2α·3^{n−2} = 3^n   ⇒   α(3² − 3·3 + 2) = 3²   ⇒   α = 9/2.

Hence u_n = A + B·2^n + (9/2)·3^n, and we would need u_0 and u_1 to determine A and B.
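As a quick check (not part of the original notes), the following Python sketch iterates each recurrence and compares it with the stated general solution; the starting values u_0 and u_1 are arbitrary illustrative choices.

# Sketch (not from the notes): check the closed forms of Examples 1 and 2
# by iterating the recurrences directly.

def check_example1(u0, n_max=10):
    # Recurrence: u_n = 2*u_{n-1} + 2**n; closed form u_n = A*2**n + n*2**n with A = u0.
    u, A = u0, u0
    for n in range(1, n_max + 1):
        u = 2 * u + 2 ** n
        assert u == A * 2 ** n + n * 2 ** n

def check_example2(u0, u1, n_max=10):
    # Recurrence: u_n = 3*u_{n-1} - 2*u_{n-2} + 3**n;
    # closed form u_n = A + B*2**n + (9/2)*3**n, with A, B fixed by u0 and u1.
    B = (u1 - u0) - 9.0          # from u1 - u0 = B + (9/2)(3 - 1)
    A = u0 - B - 4.5
    prev2, prev1 = u0, u1
    for n in range(2, n_max + 1):
        un = 3 * prev1 - 2 * prev2 + 3 ** n
        assert abs(un - (A + B * 2 ** n + 4.5 * 3 ** n)) < 1e-6
        prev2, prev1 = prev1, un

check_example1(u0=5)
check_example2(u0=1, u1=2)
print("closed forms agree with the recurrences")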
1.1.2 Random Walks - simplest case

Consider a random walk on the integers 0, 1, 2, …, N. If the walk gets to 0 or N it stops, so these points are called absorbing boundaries. At each (discrete) unit of time the walk moves to an adjacent point. So suppose we are at point k:

    p = probability of moving to k + 1 from k
    q = probability of moving to k − 1 from k

and in the simplest case p + q = 1 and p, q do not depend on k.

Example 3 (Gambler's ruin) A gambler plays a series of games in a casino where in each game he has probability p of winning £1 and probability q of losing £1. He has £k in his pocket. He stops if he loses all £k, or wins and has £N, in which case he leaves. In this case we can assume p < q.

At the start we set X_0 = k, say. Let X_1 be the position after 1 move, X_2 the position after 2 moves, and so on. If X_0 = k, then

    P(X_1 = k + 1) = p,   P(X_1 = k − 1) = q,   P(X_1 = j) = 0 for j ≠ k + 1, k − 1

and

    P(X_2 = k + 2) = p²,   P(X_2 = k) = 2pq,   P(X_2 = k − 2) = q²,   P(X_2 = j) = 0 for j ≠ k + 2, k, k − 2.

So we can build up probabilities of where the walk is after 2 moves, 3 moves and even n moves, but it gets very complicated. The gambler is interested in the probability of going broke or winning £N (some given large amount). This is a simpler question.

1.1.3 Random Walks - Questions of interest

Question 1 What is the probability of absorption at 0 (i.e. arriving at 0 before N)?

Solution The outcome is a path on the integers 0 to N, ending at either 0 or N, consisting of the entire set of moves to absorption. Let w_k be the probability of reaching 0 before N starting from k. (So the question is: what is the probability of absorption at 0 rather than N?)

The partition we use is the set of first moves, k → k + 1 or k → k − 1:

    P(0 before N from k) = P(0 before N from k | k → k + 1)P(k → k + 1)
                           + P(0 before N from k | k → k − 1)P(k → k − 1)
    ⇒ w_k = P(0 before N from k + 1)·p + P(0 before N from k − 1)·q
    ⇒ w_k = p·w_{k+1} + q·w_{k−1}

with boundary conditions w_0 = 1 and w_N = 0. We now solve

    p·w_{k+1} − w_k + q·w_{k−1} = 0,

which gives the auxiliary equation

    pλ² − λ + q = 0,   i.e. (pλ − q)(λ − 1) = 0,

hence λ = q/p or 1, and w_k = A(q/p)^k + B, assuming p ≠ q. Furthermore:

    from w_0:  1 = A + B
    from w_N:  0 = A(q/p)^N + B
    ⇒ 1 = A(1 − (q/p)^N),

which gives

    w_k = (q/p)^k / (1 − (q/p)^N) − (q/p)^N / (1 − (q/p)^N) = ((q/p)^k − (q/p)^N) / (1 − (q/p)^N).

NB If you solve the problem of absorption at N before 0 you can check that P(absorption) = 1 from any starting point. What happens if p = q? Then λ = 1 twice and you need to recall how to deal with a repeated root.

Question 2 Suppose we know absorption occurs at 0. What is the probability that the first move was k → k − 1? (We assume p ≠ q.)

Solution

    P(k → k − 1 | 0 from k) = P((k → k − 1) ∩ (0 from k)) / P(0 from k)
                            = P(0 from k | k → k − 1)P(k → k − 1) / P(0 from k)
                            = P(0 from k − 1)P(k → k − 1) / P(0 from k)
                            = w_{k−1}·q / w_k
                            = ((q/p)^{k−1} − (q/p)^N)·q / ((q/p)^k − (q/p)^N).

We would expect this result to be bigger than q - it is! Check this result for p > q and q > p.
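A numerical sanity check of the formula for w_k (not part of the original notes; it assumes NumPy is available and uses the illustrative values p = 0.45, N = 10, k = 4):

# Sketch (not from the notes): Monte Carlo check of
# w_k = ((q/p)**k - (q/p)**N) / (1 - (q/p)**N).
import numpy as np

def simulate_ruin(k, N, p, n_runs=50_000, rng=np.random.default_rng(0)):
    hits_zero = 0
    for _ in range(n_runs):
        pos = k
        while 0 < pos < N:
            pos += 1 if rng.random() < p else -1   # step right with prob p, else left
        hits_zero += (pos == 0)
    return hits_zero / n_runs

p, q, N, k = 0.45, 0.55, 10, 4
r = q / p
w_k = (r**k - r**N) / (1 - r**N)
print("closed form:", round(w_k, 4))
print("simulation :", round(simulate_ruin(k, N, p), 4))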
Example 4 Now suppose we randomly allocate the initial point on the integers 0, 1, …, N (again assuming p ≠ q). What is the probability that absorption occurs at 0 and not N?

Partitioning over the starting points 0, 1, …, N, the Partition Theorem gives

    P(absorption at 0) = Σ_{k=0}^N P(absorption at 0 | k) P(k).

Random selection of the starting point gives P(k) = 1/(N + 1) for k = 0, 1, …, N. Hence:

    P(absorption at 0) = Σ_{k=0}^N w_k · 1/(N + 1)
        = (1/(N + 1)) Σ_{k=0}^N ((q/p)^k − (q/p)^N) / (1 − (q/p)^N)
        = [Σ_{k=0}^N (q/p)^k − (N + 1)(q/p)^N] / [(N + 1)(1 − (q/p)^N)]
        = [1 − (q/p)^{N+1} − (N + 1)(1 − q/p)(q/p)^N] / [(N + 1)(1 − q/p)(1 − (q/p)^N)]
        = [1 − (N + 1)(q/p)^N + N(q/p)^{N+1}] / [(N + 1)(1 − q/p)(1 − (q/p)^N)].

2 Lecture 2

2.1 Random Walks Continued

Example 5 In principle, we can deal with different move possibilities and probabilities:

    P(k → k − 1) = q,   P(k → k + 1) = p_1,   P(k → k + 2) = p_2,

with q + p_1 + p_2 = 1. Partitioning over these cases gives:

    P(0 from k) = P(0 from k | k → k − 1)q + P(0 from k | k → k + 1)p_1 + P(0 from k | k → k + 2)p_2.

Let w_k = P(getting to 0 before N); then we have:

    w_k = q·w_{k−1} + p_1·w_{k+1} + p_2·w_{k+2}
    ⇒ 0 = p_2·w_{k+2} + p_1·w_{k+1} − w_k + q·w_{k−1}.

Obviously a much more complicated system - a third order difference equation. In principle, trying w_k = A·λ^k means we need to solve the cubic

    p_2·λ³ + p_1·λ² − λ + q = 0,

but since we know that λ = 1 is still a root, in practice we only need to solve a quadratic. Note that the boundary conditions might become rather complicated and we would need 3 conditions.

2.2 Using Conditioning and Expectation

2.2.1 Some reminders

Definition 2 (Conditional Expectation) If X is a discrete random variable and A is an event with P(A) > 0, then the expectation of X given A, denoted by E(X | A), is defined by

    E(X | A) = Σ_x x·P(X = x | A).

Example 6 A coin with probability p of throwing a head is repeatedly tossed. Let a run be an unbroken series of heads or tails. What is the expected number in the run given that the first in the run is a head?

Solution Let H be the event that the first throw in the sequence is a head, and X be the number in the run. Then:

    P(X = k | H) = p^{k−1}·q   for q = 1 − p and k = 1, 2, ….

Therefore

    E(X | H) = Σ_{k=1}^∞ k·P(X = k | H) = Σ_{k=1}^∞ k·p^{k−1}·q = q·Σ_{k=1}^∞ k·p^{k−1} = q/(1 − p)² = 1/q.

Theorem 2 (Partition Theorem for Expectation) For a discrete random variable X and a partition of the sample space {A_1, A_2, …, A_n} such that P(A_i) > 0 for each i,

    E(X) = Σ_{i=1}^n E(X | A_i) P(A_i).

Proof Note that E(X | A_i) = Σ_x x·P(X = x | A_i). This gives:

    Σ_i E(X | A_i) P(A_i) = Σ_i Σ_x x·P(X = x | A_i) P(A_i).

Interchanging the order of summation (assuming absolute convergence), this equals

    Σ_x x·(Σ_i P(X = x | A_i) P(A_i)),

and then, by the Partition Theorem, this equals Σ_x x·P(X = x) = E(X).

Example 7 Suppose in the coin experiment of the previous example, we now ask: what is the expected length of the first run?

Solution Let X be the length of the first run, and partition over the two events:

    H = {1st toss results in a head},   T = {1st toss results in a tail}.

Since E(X | H) = 1/q and, by symmetry, E(X | T) = 1/p,

    E(X) = E(X | H)P(H) + E(X | T)P(T) = (1/q)·p + (1/p)·q = (1 − 2pq)/(pq).

Example 8 Suppose a coin is tossed (with p = q = 1/2) until 2 heads appear consecutively for the first time. Calculate the expected number of throws of the coin.

Solution Partition on the 1st and 2nd throws:

    T  = {1st result is a tail}                          ⇒ start all over again
    HT = {1st result is a head, 2nd result is a tail}    ⇒ start all over again
    HH = {1st result is a head, 2nd result is a head}    ⇒ stop

and let X be the total number of throws. Then:

    E(X) = E(X | T)P(T) + E(X | HT)P(HT) + E(X | HH)P(HH)
         = (1 + E(X))·(1/2) + (2 + E(X))·(1/4) + 2·(1/4)
         = 3/2 + (3/4)E(X)
    ⇒ E(X) = 6.

Try extending this argument to 3 heads consecutively.
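A short simulation of Example 8 (not part of the original notes; plain Python): the average number of tosses of a fair coin until the first HH should be close to 6. The same loop can be adapted to the three-heads extension.

# Sketch (not from the notes): estimate the expected number of fair-coin tosses
# until two consecutive heads first appear (theory: 6).
import random

def tosses_until_hh(rng=random.Random(1)):
    count, prev_head = 0, False
    while True:
        count += 1
        head = rng.random() < 0.5
        if head and prev_head:
            return count
        prev_head = head

n_runs = 200_000
mean = sum(tosses_until_hh() for _ in range(n_runs)) / n_runs
print("simulated mean:", round(mean, 3), "(theory: 6)")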
Example 9 Suppose that household i, for i = 1, 2, …, n, has probability p_i of owning at least one computer. From the n households a subsample of m houses is to be selected at random.

(a) If one household is selected at random, what is the probability that it has at least one computer? Suppose

    X = 1 if the selected household has at least one computer, and X = 0 otherwise.

What is E(X)?

(b) If m households are randomly selected, what is the expected number with at least one computer?

Solution

(a) Suppose that household i is selected. Then we see P(X = 1 | i selected) = P(X = 1 | i) = p_i. Partitioning on household i for i = 1, …, n gives:

    P(X = 1) = Σ_{i=1}^n P(X = 1 | i) P(i selected) = Σ_{i=1}^n p_i·(1/n) = (1/n) Σ_{i=1}^n p_i.

Again, using the Partition Theorem for Expectation:

    E(X) = Σ_{i=1}^n E(X | i) P(i selected) = Σ_{i=1}^n (1·p_i + 0·(1 − p_i))·(1/n) = (1/n) Σ_{i=1}^n p_i.

(b) Let Z be the number of households with at least one computer if m households are randomly selected, and let

    X'_j = 1 if the j-th selected house has a computer, and X'_j = 0 otherwise,

with j = 1, 2, …, m indexing the j-th selection (as opposed to the j-th household). [Note that the {X'_j} are not independent.] Then Z = X'_1 + … + X'_m, so

    E(Z) = E(X'_1 + … + X'_m) = E(X'_1) + … + E(X'_m)     (by linearity of E(·))
         = (m/n) Σ_{i=1}^n p_i,

as the X'_j are identically distributed and E(X'_j) = (1/n) Σ p_i, from part (a).

2.2.2 Application to Random Walks

What is the expected number of steps to absorption (at either 0 or N)? Let X_k be the number of steps to absorption from k. Then

    E(X_k) = E(X_k | k → k − 1)P(k → k − 1) + E(X_k | k → k + 1)P(k → k + 1),

and if we set e_k = E(X_k), then

    e_k = (1 + e_{k−1})q + (1 + e_{k+1})p
    ⇒ p·e_{k+1} − e_k + q·e_{k−1} = −q − p = −1,

with boundary conditions e_0 = e_N = 0. Solving:

Complementary solution The homogeneous equation p·e_{k+1} − e_k + q·e_{k−1} = 0 gives the auxiliary equation

    pλ² − λ + q = 0   ⇒   λ = 1 or q/p, assuming p ≠ q
    ⇒ e_k = A + B(q/p)^k.

Particular solution Try e_k = αk (e_k = constant is no good as this is part of the complementary solution). Then

    pα(k + 1) − αk + qα(k − 1) = −1   ⇒   (p − q)α = −1   ⇒   α = −1/(p − q).

General solution Hence e_k = A + B(q/p)^k − k/(p − q). Solving for the boundary conditions:

    e_0 = 0:  0 = A + B
    e_N = 0:  0 = A + B(q/p)^N − N/(p − q)   ⇒   B(1 − (q/p)^N) = −N/(p − q).

Hence

    e_k = N/[(p − q)(1 − (q/p)^N)] − N(q/p)^k/[(p − q)(1 − (q/p)^N)] − k/(p − q).

[Similarly, we can solve for p = q = 1/2.]

3 Lecture 3

3.1 Using Conditioning and Expectation Continued

3.1.1 Application to Random Walks Continued

Example 10 (Reflecting barrier at N, absorption at 0) Suppose for some random walk with p ≠ q, there is a barrier at N which simply reflects back.

Solution Let e_k be the expected number of steps to absorption (which can now only happen at 0). The general equation is the same:

    p·e_{k+1} − e_k + q·e_{k−1} = −1

with e_0 = 0, but at N

    e_N = E(X_N) = E(X_N | N → N − 1)P(N → N − 1) = (1 + e_{N−1})·1,

so the boundary condition is e_N = 1 + e_{N−1}. Solving:

    e_k = A + B(q/p)^k − k/(p − q)     (from the previous lecture)

with e_0 = 0 = A + B, and e_N = e_{N−1} + 1 gives

    A + B(q/p)^N − N/(p − q) = A + B(q/p)^{N−1} − (N − 1)/(p − q) + 1
    ⇒ B = −[2p²/(p − q)²]·(p/q)^{N−1} = −A
    ⇒ e_k = [2p²/(p − q)²]·(p/q)^{N−1}·(1 − (q/p)^k) − k/(p − q)

for p ≠ q and k = 0, 1, …, N. Note that we need to re-solve the difference equation when p = q = 1/2.
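The closed form of Example 10 can be checked by simulation. The sketch below (not part of the original notes; it assumes NumPy) uses the illustrative values p = 0.4, q = 0.6, N = 10, k = 5.

# Sketch (not from the notes): compare the reflecting-barrier closed form
# e_k = (2p^2/(p-q)^2)(p/q)^(N-1)(1 - (q/p)^k) - k/(p-q) with a Monte Carlo estimate.
import numpy as np

def steps_to_zero(k, N, p, rng):
    pos, steps = k, 0
    while pos > 0:
        steps += 1
        if pos == N:                      # reflecting barrier: from N always step to N-1
            pos -= 1
        else:
            pos += 1 if rng.random() < p else -1
    return steps

p, q, N, k = 0.4, 0.6, 10, 5
e_k = (2 * p**2 / (p - q)**2) * (p / q)**(N - 1) * (1 - (q / p)**k) - k / (p - q)
rng = np.random.default_rng(2)
sim = np.mean([steps_to_zero(k, N, p, rng) for _ in range(50_000)])
print("closed form:", round(e_k, 2), " simulation:", round(sim, 2))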
Corollary 1 (to the Partition Theorem for Expectation) If Y = h(X) is a function of a discrete random variable X, and B_1, B_2, …, B_n is a partition of the sample space, then

    E(Y) = E(h(X)) = Σ_{i=1}^n E(h(X) | B_i) P(B_i) = Σ_{i=1}^n E(Y | B_i) P(B_i).

In particular, if the probability generating function (p.g.f.) of X is

    G_X(s) = Σ_{x=0}^∞ p_x s^x = E(s^X),

then

    E(s^X) = Σ_{i=1}^n E(s^X | B_i) P(B_i).

Proof Any function of a random variable is itself a random variable.

Example 11 Suppose that there are N red balls, N white balls and 1 blue ball in an urn. A ball is selected at random and then replaced. Let X be the number of red balls selected before a blue ball is chosen. Find:

(a) the probability generating function of X,
(b) E(X),
(c) Var X.

Solution Condition on the first selected ball, i.e. partition on the colour of the first selection, Red (R), White (W) or Blue (B). Then:

(a)

    G_X(s) = E(s^X) = E(s^X | R)P(R) + E(s^X | W)P(W) + E(s^X | B)P(B)
           = E(s^{1+X})·N/(2N+1) + E(s^X)·N/(2N+1) + E(s^0)·1/(2N+1)
           = s·G_X(s)·N/(2N+1) + G_X(s)·N/(2N+1) + 1/(2N+1)
    ⇒ G_X(s) = 1/(N + 1 − Ns),

which is the p.g.f. of a geometric distribution with parameter p = 1/(N + 1).

(b)

    E(X) = G'_X(1) = N/(N + 1 − Ns)² evaluated at s = 1, i.e. E(X) = N = 1/p − 1.

(c)

    Var X = G''_X(1) + G'_X(1) − (G'_X(1))²
          = 2N²/(N + 1 − Ns)³ at s = 1, plus N − N²
          = 2N² + N − N² = N(N + 1) = q/p².

If we were asked for just E(X), it would be easier to calculate:

    E(X) = E(X | R)P(R) + E(X | W)P(W) + E(X | B)P(B)
         = (1 + E(X))·N/(2N+1) + E(X)·N/(2N+1) + 0·1/(2N+1)
    ⇒ E(X) = N   (as before).

To calculate Var X, it is easier to find G_X(s) = E(s^X) first.

Corollary 2 Consider the special case when the partition is given by another discrete random variable Z, so that we want E(X) and the partition is given by {Z = n}, n = 0, 1, 2, …; then

    E(X) = Σ_{n=0}^∞ E(X | Z = n) P(Z = n),

provided this sum converges absolutely. For those with an eye to mathematical elegance, this is more succinctly expressed as E(E(X | Z)) = E(X).

Example 12 In a commercial market garden, let X_i be the number of fruit produced by a plant which germinates from a seed and K be the number of seeds which germinate from a total of n seeds planted. Let the X_i, for i = 1, 2, …, K, be independent random variables having Poisson distributions with mean µ, let K ∼ B(n, p), and let Z = Σ_{i=1}^K X_i be the total number of fruit which the commercial grower has from the planted seeds. Find the expected number of fruit and the variance.

Solution First of all:

    E(s^{X_i}) = Σ_{k=0}^∞ s^k P(X_i = k) = Σ_{k=0}^∞ s^k µ^k e^{−µ}/k! = e^{µs−µ}.

Hence E(X_i) = Var X_i = µ. Also K ∼ B(n, p), hence:

    G_K(s) = E(s^K) = (q + ps)^n
    ⇒ E(K) = G'_K(1) = np
    ⇒ Var K = G''_K(1) + G'_K(1) − (G'_K(1))² = np(1 − p).

This gives (writing C(n, k) for the binomial coefficient):

    E(Z) = Σ_{k=0}^n E(Z | K = k) P(K = k)
         = Σ_{k=0}^n E(X_1 + … + X_k) P(K = k)
         = Σ_{k=0}^n k·E(X_1) P(K = k)
         = Σ_{k=0}^n k·µ·C(n, k) p^k (1 − p)^{n−k}
         = µ·E(K) = µnp.

4 Lecture 4

4.1 Using Conditioning and Expectation Continued

Solution (continued)

    E(s^Z) = Σ_{k=0}^n E(s^Z | K = k) P(K = k)
           = Σ_{k=0}^n E(s^{X_1+…+X_k}) C(n, k) p^k (1 − p)^{n−k}
           = Σ_{k=0}^n (e^{µs−µ})^k C(n, k) p^k (1 − p)^{n−k}     (X_1, …, X_k independent)
           = (1 − p + p·e^{µs−µ})^n
    ⇒ G'_Z(s) = µnp·e^{µs−µ}(1 − p + p·e^{µs−µ})^{n−1}
    ⇒ G''_Z(s) = µnp·µe^{µs−µ}(1 − p + p·e^{µs−µ})^{n−1}
                 + µnp·e^{µs−µ}·(n − 1)pµe^{µs−µ}(1 − p + p·e^{µs−µ})^{n−2}
    ⇒ Var Z = G''_Z(1) + G'_Z(1) − (G'_Z(1))²
            = µ²np + µ²np²(n − 1) + µnp − µ²n²p²
            = µ²np − µ²np² + µnp
            = µ²np(1 − p) + µnp.
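A simulation check of E(Z) = µnp and Var Z = µnp + µ²np(1 − p) (not part of the original notes; it assumes NumPy; the values n = 20, p = 0.3, µ = 2.5 are arbitrary illustrative choices):

# Sketch (not from the notes): simulate Z = X_1 + ... + X_K with K ~ B(n, p)
# and X_i ~ Poisson(mu), and compare sample moments with the formulas above.
import numpy as np

rng = np.random.default_rng(3)
n, p, mu, runs = 20, 0.3, 2.5, 100_000

K = rng.binomial(n, p, size=runs)                       # number of germinating seeds
Z = np.array([rng.poisson(mu, size=k).sum() for k in K])

print("E(Z)  theory:", mu * n * p, "  sample:", round(Z.mean(), 3))
print("VarZ  theory:", mu * n * p + mu**2 * n * p * (1 - p), "  sample:", round(Z.var(), 3))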
Alternatively we could find the variance directly. X_1, …, X_K are independent (here all Poisson) and K (here Binomial) is independent of the X_i. We require

    Var Z = Var(X_1 + … + X_K) = E(Z²) − (E(Z))².

Hence

    E(Z²) = Σ_{k=0}^n E(Z² | K = k) P(K = k) = Σ_{k=0}^n E((X_1 + … + X_k)²) P(K = k).

Expanding the square, E((X_1 + … + X_k)²) contains k terms of the form E(X_i²) and k(k − 1) cross terms of the form E(X_i X_j), i ≠ j. Also

    E(X_1²) = Var X_1 + µ² = µ + µ²     (k terms)
    E(X_1 X_2) = E(X_1)·E(X_2) = µ²     (k(k − 1) terms, by independence of X_1, X_2).

So

    E(Z²) = Σ_{k=0}^n (k(µ + µ²) + k(k − 1)µ²) P(K = k)
          = Σ_{k=0}^n (kµ + k²µ²) P(K = k)
          = µ Σ_k k P(K = k) + µ² Σ_k k² P(K = k)
          = µE(K) + µ²(Var K + (E(K))²)
          = µnp + µ²(np(1 − p) + n²p²)
    ⇒ Var Z = µnp + µ²np(1 − p) + µ²n²p² − µ²n²p² = µnp + µ²np(1 − p).

4.2 Random Samples

Definition 3 Let X_1, X_2, …, X_n denote n independent random variables, each of which has the same distribution. These random variables are said to constitute a random sample from the distribution.

Statistics often involves random samples where the distribution ("parent distribution") is unknown. A realisation of such a random sample is used to make inferences about the distribution.

Definition 4 The sample mean is X̄ = (1/n) Σ_{i=1}^n X_i.

This is a key random variable which itself has an expectation and a variance.

Expectation of X̄

    E(X̄) = E((1/n) Σ_{i=1}^n X_i) = (1/n) Σ_{i=1}^n E(X_i)     (by linearity of E)
          = (1/n)·n·E(X_i)                                       (X_i identically distributed)
          = E(X_i) = µ_X, say.

Variance of X̄

    Var X̄ = E((X̄ − µ_X)²).

Lemma 3 Var(X + Y) = Var X + Var Y + 2Cov(X, Y), where the covariance is defined to be Cov(X, Y) = E((X − µ_X)(Y − µ_Y)).

Proof

    E((X + Y − µ_X − µ_Y)²) = E(((X − µ_X) + (Y − µ_Y))²)
        = E((X − µ_X)²) + 2E((X − µ_X)(Y − µ_Y)) + E((Y − µ_Y)²)     (by linearity of E)
        = Var X + 2Cov(X, Y) + Var Y.

If X, Y are independent then Cov(X, Y) = 0, since E((X − µ_X)(Y − µ_Y)) = E(X − µ_X)·E(Y − µ_Y) = 0 by independence.

By a straightforward, but cumbersome, extension,

    Var(X_1 + … + X_k) = Σ_{i=1}^k Var X_i + Σ_{i=1}^k Σ_{j≠i} Cov(X_i, X_j),

and so if the X_i, X_j are all independent then Var(X_1 + … + X_k) = Σ_{i=1}^k Var X_i.

So returning to X̄ we have

    Var X̄ = Var((1/n)(X_1 + … + X_n)) = (1/n²) Var(X_1 + … + X_n)
           = (1/n²) Σ_{i=1}^n Var X_i     (since the X_i are independent)
           = (1/n²)·n·Var X_i             (since X_1, …, X_n are identically distributed)
           = Var X_i / n.

This is the variance of X̄ for a random sample.

Example 13 Let X_1, …, X_n be a random sample from a Bernoulli distribution with parameter p. E(X_i) = p, Var X_i = p(1 − p). Hence E(X̄) = p and Var(X̄) = p(1 − p)/n.

4.3 Sums of Random Variables

Because the sample mean X̄ is important, it is generally the case that the sum X_1 + … + X_n is also important.

Example 14 Sum of two independent Poisson variables. Suppose X_1 ∼ Poi(µ), X_2 ∼ Poi(µ). What is the distribution of Z = X_1 + X_2?

Solution

    E(s^{X_1}) = Σ_{k=0}^∞ s^k P(X_1 = k) = Σ_{k=0}^∞ s^k µ^k e^{−µ}/k! = e^{µ(s−1)}.

Hence

    G_{X_1+X_2}(s) = E(s^{X_1+X_2}) = E(s^{X_1}·s^{X_2}) = E(s^{X_1})·E(s^{X_2})     (X_1, X_2 independent)
                   = e^{µ(s−1)}·e^{µ(s−1)} = e^{2µ(s−1)}
    ⇒ X_1 + X_2 ∼ Poi(2µ).

Similarly, for a random sample X_1, …, X_n with X_i ∼ Poi(µ),

    G_{X_1+…+X_n}(s) = E(s^{X_1+…+X_n}) = E(s^{X_1})·E(s^{X_2})·…·E(s^{X_n}) = e^{nµ(s−1)}     (by independence)
    ⇒ X_1 + … + X_n ∼ Poi(nµ).

A summary of discrete random variables is given in Table 1 below. Finally:

    E(X) = G'_X(1),    Var X = G''_X(1) + G'_X(1) − (G'_X(1))².
Table 1: Summary of Discrete Random Variables

• Bernoulli B(1, p): P(X = 1) = p, P(X = 0) = 1 − p = q; range {0, 1}; G_X(s) = q + ps; mean p; variance pq.
• Binomial B(n, p) (sum of n independent identically distributed Bernoullis): P(X = k) = C(n, k) p^k q^{n−k}, k = 0, 1, …, n; G_X(s) = (q + ps)^n; mean np; variance np(1 − p) = npq.
• Poisson Poi(µ) (a sum of independent Poissons is Poisson): P(X = k) = µ^k e^{−µ}/k!, k = 0, 1, 2, …; G_X(s) = e^{µ(s−1)}; mean µ; variance µ.
• Geometric (a), min X = 1: P(X = k) = q^{k−1} p, k = 1, 2, …; G_X(s) = ps/(1 − qs); mean 1/p; variance q/p².
• Geometric (b), min X = 0: P(X = k) = q^k p, k = 0, 1, …; G_X(s) = p/(1 − qs); mean q/p; variance q/p².
• Negative Binomial (sum of k independent Geometric type (a)): P(X = n) = C(n−1, k−1) p^k q^{n−k}, n = k, k+1, k+2, …; G_X(s) = (ps/(1 − qs))^k; mean k/p; variance kq/p².

5 Lecture 5

5.1 Continuous Random Variables

5.1.1 Reminder: Discrete Random Variables

A discrete random variable is a function

    X : Ω → a countable subset (could be finite) of R.

For each outcome there is an associated real number which is the observed value of X. For example, toss a fair coin 3 times, so that P(H) = 1/2, and let X be the number of heads; then

    X : {TTT, HTT, …, HHH} → {0, 1, 2, 3}

and, for example, P(X = 2) = P({HHT, HTH, THH}) = 3·(1/2)³.

X has a probability mass function (pmf):

    P(X = x) = p_x for x in the image of X, with p_x > 0 and Σ_x p_x = 1.

Then if the coin is fair, we know p_0 = p_3 = 1/8 and p_1 = p_2 = 3/8.

There is a fundamental function which we can also define, which is shared with continuous random variables.

Definition 5 The cumulative distribution function (cdf) of a random variable X is the function F_X, with domain R and codomain [0, 1], defined by F_X(x) = P(X ≤ x) for all x ∈ R.

Example 15 For a fair coin tossed 3 times, p_0 = 1/8, p_1 = p_2 = 3/8, p_3 = 1/8. Then

    F_X(x) = P(X ≤ x) = 0                      for x < 0
           = 1/8                                for 0 ≤ x < 1
           = 1/8 + 3/8 = 1/2                    for 1 ≤ x < 2
           = 1/8 + 3/8 + 3/8 = 7/8              for 2 ≤ x < 3
           = 1                                  for x ≥ 3.

Properties of F (cdf)

1. Non-decreasing.
2. 0 ≤ F(x) ≤ 1.
3. As x → −∞, F(x) → 0; as x → ∞, F(x) → 1.
4. F(x) is right continuous, i.e. lim_{h↓0} F(x + h) = F(x).
5. P(X ∈ (a, b]) = F(b) − F(a).

5.1.2 Continuous Random Variables

Definition 6 If X associates a real number with each possible outcome in the sample space Ω (X : Ω → R) in such a way that F_X(x) exists for all x and is left continuous (as well as right continuous), then X is said to be a continuous random variable.

What this actually means for us is that F_X(x) is continuous for all x. In this course, for continuous random variables F_X(x) is not only continuous but also differentiable almost everywhere. Suppose f(x) = F'_X(x); then

    P(X ∈ (x − h, x]) = F_X(x) − F_X(x − h) = h·F'_X(x) + O(h²)

by taking the Taylor expansion. Taking the limit as h → 0 gives

    lim_{h↓0} P(X ∈ (x − h, x]) / h = f_X(x)  (= F'_X(x)).

Definition 7 If F_X(x) is differentiable (almost everywhere) and is continuous, then f_X(x) = F'_X(x) = dF_X/dx is called the probability density function (pdf) of X.

The Fundamental Theorem of Calculus then gives:

    P(X ∈ (a, b]) = F(b) − F(a) = ∫_a^b f_X(x) dx.

Hence

    P(X < ∞) = "F(∞) − F(−∞)" = ∫_{−∞}^∞ f_X(x) dx = 1.

Additionally, F being non-decreasing gives that f_X(x) = F'_X(x) ≥ 0.

N.B. If F is a step function, then X is a discrete random variable.

5.1.3 Properties: pdf's and pmf's

pdf (continuous):  f_X(x) ≥ 0 for all x ∈ R;  ∫_{−∞}^∞ f_X(x) dx = 1;  F_X(x) = ∫_{−∞}^x f_X(u) du.
pmf (discrete):    p_x ≥ 0 for all x in the image of X;  Σ_x p_x = 1;  F_X(x) = Σ_{k ≤ x} p_k.
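A small illustration of Example 15 (not part of the original notes; plain Python): building the step-function cdf from the pmf and checking property 5, P(X ∈ (a, b]) = F(b) − F(a).

# Sketch (not from the notes): cdf of the number of heads in three fair tosses.
from math import comb

pmf = {k: comb(3, k) * 0.5**3 for k in range(4)}   # p_0 = p_3 = 1/8, p_1 = p_2 = 3/8

def F(x):
    # F_X(x) = sum of p_k over k <= x (a step function)
    return sum(p for k, p in pmf.items() if k <= x)

for x in (-1, 0, 0.5, 1, 2, 2.9, 3, 10):
    print(f"F({x}) = {F(x):.3f}")

# Property 5: P(X in (0, 2]) = F(2) - F(0) = 7/8 - 1/8 = 3/4
print("P(X in (0,2]) =", F(2) - F(0))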
Why do we need continuous random variables with their pdf's?

A. Suppose you wish to calculate the number of working adults who regularly contribute to charity. You might model this number as X out of n, where n is the total number of working adults in the UK. We could, in theory, model this as a binomial B(n, p), where p = P(adult contributes), but n is measured in millions. So instead consider Y ≈ X/n as a continuous random variable, the "proportion", with observations in [0, 1].

B. Some outcomes are essentially continuous. Suppose you are making a precision part for a piece of sophisticated equipment (e.g. a NASA rocket) and a specific length is required. There will be tolerances below and above which parts have to be rejected. Modern measuring techniques imply looking at a continuous outcome for the length of the part.

Example 16 In one gram of soil there are possibly 10^10 microbes. If one was looking for the number of the most abundant species, one would not use a discrete r.v. Instead, we would use a continuous r.v., say X, to be the proportion of the most abundant species. Hence X takes values in (0, 1).

Earthquake data We have data on the time in days between successive serious earthquakes worldwide which measured at least 7.5 on the Richter Scale, or killed over 1000 people. Data was recorded between 16/12/1902 and 4/03/1977 for 63 earthquakes, which gives 62 recorded waiting times, with a minimum of 9 days and a maximum of 1901 days. If earthquakes occur at random, an exponential model should fit.

Histogram: the area of each bar is proportional to the frequency of waiting times within the corresponding interval on the time axis (x-axis).

6 Lecture 6

6.1 Continuous Random Variables Continued

Example 17 Due to variations in a commercial coffee blender, the proportion of coffee in a coffee-chicory mix is represented by a continuous random variable X with pdf

    f_X(x) = c·x²(1 − x) for x ∈ [0, 1],   and f_X(x) = 0 otherwise.

Find the constant c and an expression for the cdf.

Solution

• The constant c:

    1 = ∫_{−∞}^∞ f_X(x) dx = ∫_0^1 c·x²(1 − x) dx = c·[x³/3 − x⁴/4]_0^1 = c/12   ⇒   c = 12.

• The cdf:

    F_X(x) = ∫_{−∞}^x f_X(u) du = 0 for x < 0;  = ∫_0^x f_X(u) du for 0 ≤ x < 1;  = 1 for x ≥ 1.

Since ∫_0^x 12u²(1 − u) du = 12(x³/3 − x⁴/4),

    F_X(x) = 0 for x < 0;   = 12(x³/3 − x⁴/4) for 0 ≤ x < 1;   = 1 for x ≥ 1.

Example 18 The duration in minutes of mobile phone calls made by students is represented by a random variable X with pdf

    f_X(x) = (1/6)e^{−x/6} for x > 0,   and f_X(x) = 0 otherwise.

What is the probability that a call lasts: (i) between 3 and 6 minutes, (ii) more than 6 minutes?

Solution

(i) P(X ∈ (3, 6)) = ∫_3^6 (1/6)e^{−x/6} dx = [−e^{−x/6}]_3^6 = e^{−1/2} − e^{−1}.

(ii) P(X > 6) = ∫_6^∞ (1/6)e^{−x/6} dx = [−e^{−x/6}]_6^∞ = e^{−1}.

6.1.1 Expectation of a Continuous Random Variable

Recall the definitions of mean and variance for a discrete r.v.:

    µ_X = E(X) = Σ_x x·p_x
    σ²_X = Var X = Σ_x (x − µ_X)² p_x = E(X²) − µ²_X

and in general E(h(X)) = Σ_x h(x) p_x.

Definition 8 Let X be a continuous r.v. with pdf f_X and let h be a real function such that ∫_{−∞}^∞ h(x) f_X(x) dx exists and is absolutely convergent, i.e. ∫_{−∞}^∞ |h(x)| f_X(x) dx < ∞. Then we define the expectation of h(X) to be

    E(h(X)) = ∫_{−∞}^∞ h(x) f_X(x) dx.

In particular:

(a) mean: µ_X = E(X) = ∫_{−∞}^∞ x f_X(x) dx;

(b) variance: σ²_X = Var X = E(X²) − µ²_X, where E(X²) = ∫_{−∞}^∞ x² f_X(x) dx.

Properties of expectation (assuming all exist as appropriate)

1. If c is a constant and h(x) = c for all x, then E(c) = ∫ c f_X(x) dx = c ∫ f_X(x) dx = c.
2. E(c·h(X)) = c·E(h(X)) (from properties of integrals).
3. E(c_1 h_1(X) + c_2 h_2(X)) = c_1 E(h_1(X)) + c_2 E(h_2(X)) (from properties of integrals).

These properties hold whether X is continuous or discrete.

[NB E(s^{a+X}) = E(s^a·s^X) = s^a E(s^X), because s is a real-valued variable rather than a random variable.]
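As an illustration of Definition 8 (not part of the original notes; it assumes SciPy is available), the sketch below evaluates E(X) and Var X for the coffee-proportion pdf of Example 17 numerically. The exact values 3/5 and 1/25 quoted in the comments are my own calculation rather than results stated in the notes.

# Sketch (not from the notes): E(h(X)) = integral of h(x) f(x) dx for f(x) = 12 x^2 (1-x).
from scipy.integrate import quad

f = lambda x: 12 * x**2 * (1 - x)

total, _ = quad(f, 0, 1)                        # should be 1
mean, _  = quad(lambda x: x * f(x), 0, 1)       # exact value: 3/5
ex2, _   = quad(lambda x: x**2 * f(x), 0, 1)
var = ex2 - mean**2                             # exact value: 1/25

print("integral of f:", total)
print("E(X):", mean, "  Var(X):", var)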
Example 19 An archer fires at a target which has four concentric circles of radii 15, 30, 45 and 60 cm. A shot is subject to error and the distance in cm from the centre is represented by a random variable X with pdf:

    f_X(x) = (x/100)e^{−x/10} for x > 0,   and f_X(x) = 0 for x < 0.

Find

(i) the expectation of X,
(ii) the probability that he hits gold (inside the centre circle),
(iii) the variance of X.

Solution

(i)

    E(X) = ∫_0^∞ (x²/100)e^{−x/10} dx
         = (1/100){[−10x²e^{−x/10}]_0^∞ + 10∫_0^∞ 2x e^{−x/10} dx}
         = 20 ∫_0^∞ (x/100)e^{−x/10} dx = 20 × 1 = 20,

since ∫_0^∞ (x/100)e^{−x/10} dx = 1. (Alternatively, integrate by parts again.)

(ii)

    P(X ≤ 15) = ∫_0^15 (x/100)e^{−x/10} dx
              = (1/100){[−10x e^{−x/10}]_0^15 + 10[−10e^{−x/10}]_0^15}
              = (1/100){−150e^{−1.5} − 100e^{−1.5} + 100}
              = 1 − 2.5e^{−1.5} = 0.442….

(iii)

    E(X²) = ∫_0^∞ (x³/100)e^{−x/10} dx = [−(x³/100)·10e^{−x/10}]_0^∞ + 30∫_0^∞ (x²/100)e^{−x/10} dx
          = 30·E(X) = 30 × 20 = 600
    ⇒ Var X = E(X²) − (E(X))² = 600 − 400 = 200.

The standard deviation is √(Var X) = 10√2.

Example 20 A gambling game works as follows: bets are placed on the position of a light on the horizontal axis. The light is generated by an arm which rotates with uniform speed about O. It is randomly stopped and the light appears at the point P on the horizontal axis.

(i) What is the expected distance of the light from O?
(ii) Where should you bet if the winner is the one whose guess is closest to the light when it stops?

Answer

(i) Let Θ be a random variable representing the angle when the arm is stopped. Then Θ ∼ U(0, 2π), the uniform distribution on (0, 2π), and so

    f_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π,   and f_Θ(θ) = 0 otherwise.

Then the length OP is |cos Θ|, hence

    E(|cos Θ|) = ∫_0^{2π} |cos θ|·(1/2π) dθ = 4∫_0^{π/2} (1/2π) cos θ dθ = (2/π)[sin θ]_0^{π/2} = 2/π.

(ii) Intuition suggests that we should bet on the extremities of the diameter. (Why?) Can we derive this?

6.1.2 Functions of Random Variables and their pdfs

We have Θ in the previous example, together with its pdf. Can we find the pdf of the position on the horizontal diameter?

Example 21 (1-1 transformation) Let R represent the distance from a given tree to its nearest neighbour in a forest. Suppose it has pdf:

    f_R(r) = (r/λ)e^{−r²/2λ} for r > 0,   and f_R(r) = 0 elsewhere.

Find the distribution of the "tree-free" area A around the given tree, so that A = πR².

Solution We use the cdf, F_R(r) = ∫_{−∞}^r f_R(u) du:

    F_R(r) = P(R ≤ r) = 0 for r < 0;   = [−e^{−u²/2λ}]_0^r = 1 − e^{−r²/2λ} for r ≥ 0.

Hence

    F_A(a) = P(A ≤ a) = P(πR² ≤ a) = P(R ≤ √(a/π))
           = 0 for a < 0;   = 1 − e^{−a/2πλ} for a ≥ 0
    ⇒ f_A(a) = 0 for a < 0;   = (1/2πλ)e^{−a/2πλ} for a ≥ 0.

Hence A is distributed exponentially, with mean 2πλ. The method used is to consider the cdf and re-write it in terms of A.

7 Lecture 7

7.1 Continuous Random Variables Continued

7.1.1 Functions of Random Variables and their pdfs Continued

Example 22 (extra) Let X be a random variable with pdf:

    f_X(x) = 4x e^{−2x} for x > 0,   and f_X(x) = 0 elsewhere.

Find (a) F_X(x), (b) E(X) and E(1/X).

Solution

(a)

    F_X(x) = ∫_{−∞}^x f_X(u) du = 0 for x < 0;   = ∫_0^x 4u e^{−2u} du = 1 − (1 + 2x)e^{−2x} for x ≥ 0.

(b)

(i) E(X) = ∫_{−∞}^∞ x f_X(x) dx = ∫_0^∞ 4x² e^{−2x} dx = 1.

(ii) E(1/X) = ∫_{−∞}^∞ (1/x) f_X(x) dx = ∫_0^∞ 4 e^{−2x} dx = [−2e^{−2x}]_0^∞ = 2.

NB In general, E(1/X) ≠ 1/E(X).
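A sampling check of Example 22 (not part of the original notes; it assumes NumPy): f(x) = 4x e^{−2x} is a Gamma density with α = 2 and λ = 2 (see Table 2 below), so we can sample it directly. Note that 1/X has infinite variance here, so the estimate of E(1/X) converges slowly.

# Sketch (not from the notes): check E(X) = 1 and E(1/X) = 2 for f(x) = 4x e^{-2x}.
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, scale=0.5, size=500_000)   # scale = 1/rate, so rate = 2

print("E(X)   ~", round(x.mean(), 3), " (theory: 1)")
print("E(1/X) ~", round((1 / x).mean(), 3), " (theory: 2, slow convergence)")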
Example 23 (transformation which is not 1-1) Let Θ be the random variable (as before) with pdf:

    f_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π,   and f_Θ(θ) = 0 elsewhere.

Let X be the position on the diameter (as before), so that X = cos Θ. Find the pdf of X.

Solution

    F_Θ(θ) = P(Θ ≤ θ) = 0 for θ < 0;   = θ/(2π) for 0 ≤ θ < 2π;   = 1 for θ ≥ 2π.

We want F_X(x) = P(X ≤ x) = P(cos Θ ≤ x). We note that observable values of x satisfy −1 ≤ x ≤ 1, and that for each x there are 2 values of θ. Hence, selecting arccos x ∈ [0, π]:

    F_X(x) = P(cos Θ ≤ x)
           = P(arccos x ≤ Θ ≤ 2π − arccos x)         (2 values of θ for each x)
           = F_Θ(2π − arccos x) − F_Θ(arccos x)
           = (1/2π)(2π − arccos x) − (1/2π) arccos x
           = (1/2π)(2π − 2 arccos x)
           = 1 − (1/π) arccos x         (for −1 ≤ x ≤ 1)
    ⇒ f_X(x) = 1/(π√(1 − x²)) for −1 < x < 1,   and 0 otherwise (undefined at x = ±1).

Hence intuition is satisfied in Example 20, as f_X → ∞ as x → ±1 (even though the area under the curve is 1).

A summary of common continuous distributions is shown in Table 2 below.

Table 2: Common Continuous Distributions

• Exponential exp(λ): pdf f_X(x) = λe^{−λx}, x > 0; cdf F_X(x) = 1 − e^{−λx}, x > 0; mean 1/λ; variance 1/λ².
• Gamma Γ(α, λ): pdf f_X(x) = (λ^α/Γ(α)) x^{α−1} e^{−λx}, x > 0; mean α/λ; variance α/λ².
• Normal N(µ, σ²): pdf f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/2σ²}, x ∈ R; cdf F_X(x) = Φ((x − µ)/σ); mean µ; variance σ².
• Standard Normal N(0, 1): pdf f_X(x) = (1/√(2π)) e^{−x²/2}, x ∈ R; cdf Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du; mean 0; variance 1.
• Uniform U(a, b): pdf f_X(x) = 1/(b − a), a < x < b; cdf F_X(x) = (x − a)/(b − a), a < x < b; mean (a + b)/2; variance (b − a)²/12.

• Exponential (and Gamma) distributions can arise from measuring "lifetime", e.g. genuinely the life of a person.

• Normal distributions can arise from very many distributions in the following way: generate 100 samples, each containing 62 observations, from an exponential distribution with (parameter) mean of 437, i.e. X ∼ exp(1/437). Calculate x̄ = (1/62)Σ x_i for each of the 100 samples. Then x̄ is approximately drawn from a normal distribution with mean 437 and standard deviation 437/√62.

7.1.2 The Normal Distribution

A Normally distributed random variable, X ∼ N(µ, σ²), has pdf:

    f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/2σ²},   x ∈ R.

If we set Z = (X − µ)/σ, then

    f_Z(z) = (1/√(2π)) e^{−z²/2},   z ∈ R,

hence Z ∼ N(0, 1), where N(0, 1) is the standard normal distribution.

To justify the normalising constant: let I = ∫_{−∞}^∞ e^{−z²/2} dz; then

    I² = ∫_{−∞}^∞ e^{−z²/2} dz · ∫_{−∞}^∞ e^{−y²/2} dy = ∫_{−∞}^∞ ∫_{−∞}^∞ e^{−(y²+z²)/2} dy dz.

Let y = r cos θ, z = r sin θ; then |J| = |∂(y, z)/∂(r, θ)| = r, and

    I² = ∫_{r=0}^∞ ∫_{θ=0}^{2π} e^{−r²/2} r dθ dr = [−e^{−r²/2}]_0^∞ · [θ]_0^{2π} = 2π
    ⇒ ∫_{−∞}^∞ (1/√(2π)) e^{−z²/2} dz = 1.

Mean and variance:

• N(0, 1):

    E(Z) = ∫_{−∞}^∞ (1/√(2π)) z e^{−z²/2} dz = 0,

since the integrand is odd. Hence, integrating by parts (setting dv/dz = z e^{−z²/2} and u = z),

    Var Z = E(Z²) = (1/√(2π)) ∫_{−∞}^∞ z² e^{−z²/2} dz
          = (1/√(2π)) [−z e^{−z²/2}]_{−∞}^∞ + (1/√(2π)) ∫_{−∞}^∞ e^{−z²/2} dz = 0 + 1.

So Var Z = 1.

• N(µ, σ²): X ∼ N(µ, σ²) ⇔ Z = (X − µ)/σ ∼ N(0, 1). So:

    E(X) = E(σZ + µ) = σE(Z) + µ = µ     (E is linear)

and

    Var X = Var(σZ + µ) = E((σZ + µ − µ)²) = E(σ²Z²) = σ²E(Z²) = σ².

N(0, 1) cdf: let

    Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−z'²/2} dz'     (standard notation)
         = P(Z ≤ z) = F_Z(z).

Then for X ∼ N(µ, σ²),

    F_X(x) = P(X ≤ x) = P(σZ + µ ≤ x) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ).
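In practice Φ is tabulated, but it can also be computed from the error function via Φ(z) = (1 + erf(z/√2))/2. A minimal sketch (not part of the original notes; standard-library Python only):

# Sketch (not from the notes): Phi via the error function, and
# P(X <= x) = Phi((x - mu)/sigma) for X ~ N(mu, sigma^2).
from math import erf, sqrt

def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    return phi((x - mu) / sigma)

print(phi(0.0))                              # 0.5
print(phi(1.96))                             # approx 0.975
print(normal_cdf(5.0, mu=3.0, sigma=2.0))    # = Phi(1.0), approx 0.841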
8 Lecture 8

8.1 Continuous Random Variables Continued

8.1.1 Jointly Continuous Distributions

Definition 9 Let X be a continuous r.v. with pdf f_X(x) and cdf F_X(x) = ∫_{−∞}^x f_X(u) du, and similarly define Y, f_Y(y) and F_Y(y). Then X, Y are jointly continuous if there exists f_{X,Y} such that for all x, y ∈ R

    F_{X,Y}(x, y) = ∫_{−∞}^y ∫_{−∞}^x f_{X,Y}(u, v) du dv = P(X ≤ x, Y ≤ y).

Then f_{X,Y}(x, y) is called the joint density function of X and Y.

For example, in 1 gram of soil, let X be the proportion of the largest species of microbes, and Y be the proportion of the second largest. These are certainly not independent, since X ≥ Y ≥ 0 and X + Y ≤ 1.

Properties

1. f_{X,Y}(x, y) ≥ 0 for all x, y.
2. ∫_{−∞}^∞ ∫_{−∞}^∞ f_{X,Y}(x, y) dx dy = 1.

Example 24 Let the joint pdf of X and Y be

    f(x, y) = (1/2)(x + y) for x ∈ [0, 1], y ∈ [1, 2],   and f(x, y) = 0 otherwise.

Is f(x, y) a pdf?

Solution

1. Clearly f ≥ 0.
2.
    ∫∫ f dx dy = ∫_{x=0}^1 ∫_{y=1}^2 (1/2)(x + y) dy dx = ∫_0^1 [xy/2 + y²/4]_{y=1}^2 dx
               = ∫_0^1 (x/2 + 3/4) dx = [x²/4 + 3x/4]_0^1 = 1.

Marginal pdf Consider the previous example. Suppose we wish to find P(X > 1/2). There are two methods:

1.
    P(X > 1/2) = ∫_{x=1/2}^1 ∫_{y=1}^2 (1/2)(x + y) dy dx = ∫_{1/2}^1 (x/2 + 3/4) dx = 9/16.

2. Alternatively, find the marginal distribution of X by integrating Y out:

    f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy = ∫_1^2 (1/2)(x + y) dy = x/2 + 3/4,   0 ≤ x ≤ 1.

Then P(X > 1/2) = ∫_{1/2}^1 f_X(x) dx = 9/16.

Definition 10 If X, Y have joint density function f_{X,Y}(x, y), then the marginal pdf of X is

    f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy,

and similarly for Y the marginal pdf is f_Y(y) = ∫_{−∞}^∞ f_{X,Y}(x, y) dx.

8.1.2 Expectation with Joint Distributions

    E(h(X, Y)) = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x, y) f_{X,Y}(x, y) dx dy.

In particular,

    Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = E(XY) − µ_X µ_Y,

where µ_X = E(X) and µ_Y = E(Y).

Example 25 As in the previous example, let X and Y have joint pdf

    f(x, y) = (1/2)(x + y) for x ∈ [0, 1], y ∈ [1, 2],   and f(x, y) = 0 otherwise.

Find (i) E(X), E(Y), (ii) Cov(X, Y).

Solution

(i)

    E(X) = ∫_0^1 ∫_1^2 x·(1/2)(x + y) dy dx = 13/24,

or equivalently E(X) = ∫_0^1 x f_X(x) dx = ∫_0^1 x(x/2 + 3/4) dx = 13/24. Similarly

    E(Y) = ∫_0^1 ∫_1^2 y·(1/2)(x + y) dy dx = 37/24.

(ii)

    E(XY) = ∫_0^1 ∫_1^2 xy·(1/2)(x + y) dy dx = ∫_0^1 (1/2)((3/2)x² + (7/3)x) dx = (1/2)(1/2 + 7/6) = 5/6
    ⇒ Cov(X, Y) = E(XY) − µ_X µ_Y = 5/6 − (13/24)·(37/24) = −1/576.

Note that E(XY) ≠ E(X)E(Y) and so X, Y are not independent. If X and Y are independent, then Cov(X, Y) = 0, as in the discrete case.

8.1.3 Independence

Definition 11 Let the continuous random variables X, Y have joint pdf f_{X,Y} and respective marginal pdfs f_X, f_Y. Then X and Y are independent if

    f_{X,Y}(x, y) = f_X(x) f_Y(y)

for almost all points (x, y) in R².

Example 26 Are X, Y independent if

(a) f_{X,Y}(x, y) = 2, for 0 < x < y < 1,
(b) f_{X,Y}(x, y) = x + y, for 0 < x < 1, 0 < y < 1,
(c) f_{X,Y}(x, y) = (1/2π) e^{−(x²+y²)/2}, for x, y ∈ R?

Solution

(a) f_X(x) = ∫_x^1 2 dy = 2(1 − x) for 0 < x < 1, and f_Y(y) = ∫_0^y 2 dx = 2y for 0 < y < 1. Hence f_{X,Y} ≠ f_X f_Y.

(b) f_X(x) = ∫_0^1 (x + y) dy = x + 1/2 for 0 < x < 1, and f_Y(y) = ∫_0^1 (x + y) dx = 1/2 + y for 0 < y < 1. Hence f_{X,Y} ≠ f_X f_Y.

(c)

    f_X(x) = ∫_{−∞}^∞ (1/√(2π)) e^{−x²/2} · (1/√(2π)) e^{−y²/2} dy
           = (1/√(2π)) e^{−x²/2} · ∫_{−∞}^∞ (1/√(2π)) e^{−y²/2} dy     (the integral equals 1)
           = (1/√(2π)) e^{−x²/2},   x ∈ R.

Similarly f_Y(y) = (1/√(2π)) e^{−y²/2} for y ∈ R. Hence f_{X,Y} = f_X f_Y everywhere in R².

Hence X, Y are independent in (c) only.

8.1.4 Random Samples for Continuous Random Variables

Recall that X_1, …, X_n are a random sample if they are independent and identically distributed, with pdf f(x), say. Then the joint pdf is

    f_{X_1,…,X_n}(x_1, …, x_n) = f(x_1)·…·f(x_n) = Π_{i=1}^n f(x_i).

Mean of sample X̄ = (1/n)(X_1 + … + X_n):

• E(X̄) = (1/n) Σ_{i=1}^n E(X_i) = E(X_i), as before.

• Var(X̄) = Var((1/n) Σ_{i=1}^n X_i) = (1/n²) Σ_{i=1}^n Var X_i = Var X_i / n,

since the X_i are independent and identically distributed (NB Cov(X_i, X_j) = 0).

Example 27 Suppose that X_i ∼ U(a, b) and that our random sample is X_1, …, X_n. Find the mean and variance of X̄.

Solution We quote the following results:

    E(X_i) = (a + b)/2,   Var X_i = (b − a)²/12.

So E(X̄) = (a + b)/2 and Var X̄ = (b − a)²/(12n).
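A final simulation check of Example 27 (not part of the original notes; it assumes NumPy; the values of a, b and n are arbitrary illustrative choices):

# Sketch (not from the notes): for a random sample of size n from U(a, b),
# the sample mean should have E(Xbar) = (a+b)/2 and Var(Xbar) = (b-a)^2/(12n).
import numpy as np

rng = np.random.default_rng(5)
a, b, n, reps = 2.0, 10.0, 25, 100_000

xbar = rng.uniform(a, b, size=(reps, n)).mean(axis=1)

print("E(Xbar)   theory:", (a + b) / 2, "  sample:", round(xbar.mean(), 4))
print("Var(Xbar) theory:", (b - a)**2 / (12 * n), "  sample:", round(xbar.var(), 4))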