* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download B 1
Survey
Document related concepts
Transcript
Discrete Probability Sample space (set) S of elementary event eg. An event A is a subset of S eg. The 36 ways of 2 dices can fall Rolling 7 with 2 dices A probability distribution Pr{} is a map from events of S to R Probability Axiom: Pr{ A} 0 Pr{S } 1 Pr{ A B} Pr{ A} Pr{B}, if A B Theorem : Pr A B Pr A Pr B Pr A B Pr A Pr B Theorem : For discrete probability distributions Pr A Pr s sA A random variable (r.v.) X is a function from S to R The event “X = x” is defined as {sS : X(s) = x} eg. Rolling 2 dices: |S|=36 possible outcomes Uniform distribution: Each element has the same probability 1/|S|=1/36 Let X be the sum of dice Pr{ X = 5 } = 4/36, {(1, 4), (2, 3), (3, 2), (4, 1)} Expected value: E X x x Pr X x Linearity: E aX Y aE X E Y X1: number on dice 1 X2: number on dice 2 X=X1+X2, E[X1]=E[X2]=1/6(1+2+3+4+5+6)=21/6 Independence Two random variables X and Y are independent if x, y, Pr X x and Y y Pr X x Pr Y y If X and Y are independent, then E XY E X E Y Indicator random variables Given a sample space S and an event A, the indicator random variable I{A} associated with event A is defined as: I A 1 0 if A occurs o/w E.g.: Consider flipping a fair coin: Sample space S = { H,T } Define random variable Y with Pr{ Y=H } = Pr{ Y=T }=1/2 We can define an indicator r.v. XH associated with the coin coming up heads, i.e. Y=H X H I Y H 1 0 if Y H if Y T E X H E I Y H 1 Pr Y H 0 Pr Y T 1 Pr Y H 2 Lemma : Given a sample space S and an event A in the sample space S, let X A I { A} Then E X A Pr A Proof : E X A E I A 1 Pr A 0 Pr A Pr A The birthday paradox: How many people must there be in a room before there is a 50% chance that two of them born on the same day of the year? (1) Suppose there are k people and there are n days in a year, bi : i-th person’s birthday, i =1,…,k Pr{bi=r}=1/n, for i =1,…,k and r=1,2,…,n Pr{bi=r, bj=r}=Pr{bi=r}.Pr{bj=r} = 1/n2 1 Pr b r , b r i j n r 1 Define event Ai : Person i’s birthday is different from person j’s for j < i k Bk A : the event that k people have distinct birthday i 1 i Bk 1 Ak Pr{Bk} = Pr{Bk-1∩Ak} = Pr{Bk-1}Pr{Ak|Bk-1} where Pr{B1} = Pr{A1}=1 Pr b b i n j Pr{Bk } Pr{Bk 1}Pr{ Ak | Bk 1} Pr{Bk 2 }Pr{ Ak 1 | Bk 2 }Pr{ Ak | Bk 1} ... Pr{B1}Pr{ A2 | B1}Pr{ A3 | B2 }...Pr{ Ak | Bk 1} 1 ( nn1 )( n n 2 )...( n nk 1 ) 1n n2 ( k 1) n 1 (1 1n )(1 n2 )...(1 kn1 ) e e e 1 x ex k 1 k ( k 1) i / n 2n i 1 e e 12 where k (2kn1) ln( 12 ) k (k 1) 2n ln 2 the prob. 12 , For n 365, we have k 23 k (1 1 (8ln 2)n ) / 2 ■ (2) Analysis using indicator random variables For each pair (i, j) of the k people in the room, define the indicator r.v. Xij, for 1≤ i < j ≤ k, by X ij I person i and person j have the same birthday 1 i and j have the same birthday 0 o/w E X ij Pr person i and j have the same birthday 1n k k Let X i 1 j i 1 X ij k k E X E i 1 j i 1 X ij k (k 1) k k k i 1 j i 1 E X ij / n 2n 2 k(k-1) ≥ 2n, the expected number of pairs of people with the same birthday is at least 1 When 1 1 8n k k 2n 0 k 2 k ( n ), n 365 k 28, we expect to find at least one matching pair ■ 2 Balls and bins problem: Randomly toss identical balls into b bins, numbered 1,2,…,b. The probability that a tossed ball lands in any given bin is 1/b (a) How many balls fall in a given bin? If n balls are tossed, the expected number of balls that fall in the given bin is n/b (b) How many balls must one toss, on the average, until a given bin contains a ball? By geometric distribution with probability 1/b e 1 b1 2 (1 b1 ) b1 3 (1 b1 ) 2 b1 ... e (1 b1 )e b1 (1 b1 ) b1 (1 b1 ) 2 b1 ... b1 ( 1(11 1 ) ) 1 e 1 e b 1 b b (c) (Coupon collector’s problem) How many balls must one toss until every bin contains at least one ball? Want to know the expected number n of tosses required to get b hits. The ith stage consists of the tosses after the (i-1)st hit until the ith hit. For each toss during the ith stage, there are i-1 bins that contain balls and b-i+1 empty bins Thus, for each toss in the ith stage, the probability of obtaining a hit is (b-i+1)/b Let ni be the number of tosses in the ith stage. Thus the number of tosses required to get b hits is n=∑bi=1 ni Each ni has a geometric distribution with probability of success (b-i+1)/b → E[ni]=b/b-i+1 b b b b 1 b E n E i 1 ni i 1 E ni i 1 b i 1 b i 1 i b(ln b O(1)) O(b ln b) ■ Streaks Flip a fair coin n times, what is the longest streak of consecutive heads? Ans:θ(lg n) Let Aik be the event that a streak of heads of length at least k begins with the ith coin flip Pr Aik 1/ 2k For k 2 lg n Pr Ai ,2lg n 2 2 lg n 1 n2 For j=0,1,2,…,n, let Lj be the event that the longest streak of heads has Length exactly j, and let L be the length of the longest streak. n E L j 0 j Pr L j Note that the events L j for j 0,1,...,n are disjoint. So the probability that a streak of heads of length 2 lg n begins anywhere is j 2lg n Pr L j n 1 Thus, j 2 lg n Pr L j n n Pr L 1 Pr L 1. We have E L j Pr L j Pr L j Pr L (2 lg n ) Pr L n Pr L 2 lg n Pr L n Pr L while 2 lg n 1 n j 0 n j 0 j j 0 2 lg n 1 j j n j 0 2 lg n 1 j 2 lg n j j n j 0 j 2 lg n 1 j 0 j 2 lg n 1 n (1/ n) O(lg n) j 2 lg n n j 2 lg n j j r lg n r Pr A 1 2 1 n i , r lg n The probability is n n r 1 n r 1 that the largest streak is r lg n Claim : The expected length of the longest streak of heads in n coin flips is lg n We look for streaks of length s by partitioning the n flips into approximately n/s groups of s flips each. Take s (lg n) / 2 Pr Ai ,(lg n ) / 2 1 2 s s s n (lg n ) / 2 1 n The probability that a streak of heads of length (lg n) / 2 does not begin in position i is 1 1 n The n groups are mutually exclusive, ind. coin flips, (lg n) / 2 the prob. that every one of the groups fails to be a streak of length (lg n) / 2 is at most (1 1 n) n (lg n ) / 2 1 n / (lg n ) / 2 1 n 2 n / lg n 1 / n (1 e ) (1 1 2 n / lg n 1 n ) O e lg n O 1n Thus, the prob. that the longest streak exceeds (lg n) / 2 is n j (lg n ) / 2 1 E L n j Pr L j j 0 (lg n ) / 2 j 0 n Pr L j 1 O 1/ n WHY? j Pr L j j (lg n ) / 2 1 j Pr L j j (lg n ) / 2 1 n (lg n) / 2 Pr L j (lg n) / 2 j (lg n ) / 2 1 Pr L j n (lg n) / 2 1 O 1/ n (lg n) ■ Using indicator r.v. : Let X ik I Aik n k 1 Let X X ik i 1 E X E n k 1 X ik i 1 n k 1 n k 1 n k 1 i 1 E X ik i 1 Pr Aik i 1 1/ 2k n k 1 2k k c lg n, for some constant c, n c lg n 1 n c lg n 1 1 (c lg n 1) / n EX 2c lg n nc nc 1 nc 1 1 ( c 1 ) n If If c is large, the expected number of streaks of length clgn is very small. 1 1 If c , then we obtain E X 1 1 n 2 n2 and we expect that there will be a large number of streaks of length ( 12 ) lg n 1 2 Therefore, one streak of such a length is very likely to occur. Conclusion : The length of the longest streak is (lg n) ■ The on-line hiring problem: To hire an assistant, an employment agency sends one candidate each day. After interviewing that person you decide to either hire that person or not. The process stops when a person is hired. What is the trade-off between minimizing the amount of interviewing and maximizing the quality of the candidate hired? What is the best k? Let M(j) = max 1ij{score(i)}. Let S be the event that the best-qualified applicant is chosen. Let Si be the event the best-qualified applicant chosen is the i-th one interviewed. Si are disjoint and we have Pr{S}= ji=1Pr{Si}. If the best-qualified applicant is one of the first k, we have that Pr{Si}=0 and thus Pr{S}= ji=k+1Pr{Si}. Let Bi be the event that the best-qualified applicant must be in position i. Let Oi denote the event that none of the applicants in position k+1 through i-1 are chosen If Si happens, then Bi and Oi must both happen. Bi and Oi are independent! Why? Pr{Si} = Pr{Bi Oi} = Pr{Bi} Pr{Oi}. Clearly, Pr{Bi} = 1/n. Pr{Oi} = k/(i-1). Why??? Thus Pr{Si} = k/(n(i-1)). n Pr{S} = Pr{Si } i k 1 n k i k 1 n(i 1) n 1 ( k / n) i k 1 (i 1) n 1 1 ( k / n) i k i k n1 n1 1 1 dx dx k 1 x x i i k n1 k (ln n ln k ) Pr{S} k (ln(n 1) ln(k 1)). n n k Differentiate (ln n ln k )with respect to k. n 1 We have (ln n ln k 1) 0. n Thus k n / e and Pr{S} 1/ e.