The Chernoff Bound

Speaker: Chuang-Chieh Lin
Coworker: Chih-Chieh Hung
Advisor: Maw-Shang Chang
National Chung-Cheng University and National Chiao-Tung University

Outline
• Introduction
• The Chernoff bound
• Markov's inequality
• Moment generating functions
• The Chernoff bound for a sum of Poisson trials
• The Chernoff bound for special cases
• The set balancing problem
• References

Introduction
• Goal: The Chernoff bound can be used to analyze the tail of the distribution of a sum of independent random variables, with some extensions to the case of dependent or correlated random variables.
• Markov's inequality and moment generating functions, which we introduce below, will be needed throughout.

Math tool
Professor Herman Chernoff's bound, Annals of Mathematical Statistics, 1952.

Chernoff bounds
In its most general form, the Chernoff bound for a random variable $X$ is obtained as follows: for any $t > 0$,
$\Pr[X \ge a] \le \dfrac{E[e^{tX}]}{e^{ta}}$,
or equivalently, $\ln \Pr[X \ge a] \le -ta + \ln E[e^{tX}]$.
Here $E[e^{tX}]$ is a moment generating function. The value of $t$ that minimizes $E[e^{tX}]/e^{ta}$ gives the best possible bound.

Markov's Inequality
For any random variable $X \ge 0$ and any $a > 0$,
$\Pr[X \ge a] \le \dfrac{E[X]}{a}$.
We can use Markov's inequality to derive the famous Chebyshev's inequality:
$\Pr[|X - E[X]| \ge a] = \Pr[(X - E[X])^2 \ge a^2] \le \dfrac{\mathrm{Var}[X]}{a^2}$.

Proof of the Chernoff bound
It follows directly from Markov's inequality: for any $t > 0$,
$\Pr[X \ge a] = \Pr[e^{tX} \ge e^{ta}] \le \dfrac{E[e^{tX}]}{e^{ta}}$.
So, how do we calculate the term $E[e^{tX}]$?

Moment Generating Functions
$M_X(t) = E[e^{tX}]$.
This function gets its name because we can generate the $i$-th moment of $X$ by differentiating $M_X(t)$ $i$ times and then evaluating the result at $t = 0$:
$\dfrac{d^i}{dt^i} M_X(t)\Big|_{t=0} = E[X^i]$, the $i$-th moment of the r.v. $X$.

Moment Generating Functions (cont'd)
We can easily see why the moment generating function works, for a discrete r.v. $X$:
$\dfrac{d^i}{dt^i} M_X(t)\Big|_{t=0} = \dfrac{d^i}{dt^i} E[e^{tX}]\Big|_{t=0} = \dfrac{d^i}{dt^i} \sum_s e^{ts}\Pr[X=s]\Big|_{t=0} = \sum_s \dfrac{d^i}{dt^i} e^{ts}\Pr[X=s]\Big|_{t=0} = \sum_s s^i e^{ts}\Pr[X=s]\Big|_{t=0} = \sum_s s^i \Pr[X=s] = E[X^i]$.

Moment Generating Functions (cont'd)
• Fact: If $M_X(t) = M_Y(t)$ for all $t \in (-c, c)$ for some $c > 0$, then $X$ and $Y$ have the same distribution.
• Fact: If $X$ and $Y$ are two independent random variables, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$.
• Fact: Let $X_1, \ldots, X_k$ be independent random variables with mgf's $M_1(t), \ldots, M_k(t)$. Then the mgf of the random variable $Y = \sum_{i=1}^{k} X_i$ is $M_Y(t) = \prod_{i=1}^{k} M_i(t)$.

Chernoff bound for the sum of Poisson trials
• Poisson trials: the distribution of a sum of independent 0-1 random variables, which need not be identically distributed.
• Bernoulli trials: the same as above, except that all the random variables are identically distributed.

Chernoff bound for the sum of Poisson trials (cont'd)
Let $X_i$, $i = 1, \ldots, n$, be mutually independent 0-1 random variables with $\Pr[X_i = 1] = p_i$ and $\Pr[X_i = 0] = 1 - p_i$. Let $X = X_1 + \cdots + X_n$ and $E[X] = \mu = p_1 + \cdots + p_n$. Then
$M_{X_i}(t) = E[e^{tX_i}] = p_i e^{t \cdot 1} + (1 - p_i) e^{t \cdot 0} = p_i e^{t} + (1 - p_i) = 1 + p_i(e^{t} - 1) \le e^{p_i(e^{t} - 1)}$  (since $1 + y \le e^{y}$).
Hence
$M_X(t) = E[e^{tX}] = M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t) \le e^{(p_1 + p_2 + \cdots + p_n)(e^{t} - 1)} = e^{(e^{t} - 1)\mu}$,
since $\mu = p_1 + p_2 + \cdots + p_n$. We will use this result later.
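To make the general recipe concrete, here is a minimal numerical sketch in Python (assuming Python 3.8+ for math.comb; the function names chernoff_upper_tail and exact_binomial_tail are illustrative, not from the slides). It evaluates the bound $\Pr[X \ge a] \le \min_{t>0} e^{-ta} E[e^{tX}]$ for a sum of identical Poisson trials, using the estimate $E[e^{tX}] \le e^{(e^{t}-1)\mu}$ derived above, and compares it with the exact binomial tail.

```python
import math

def chernoff_upper_tail(mu, a):
    """Generic Chernoff bound Pr[X >= a] <= min_{t>0} exp(-t*a) * E[exp(tX)],
    using the estimate E[exp(tX)] <= exp((e^t - 1) * mu) for a sum of
    Poisson trials with mean mu.  The minimum is taken over a grid of t > 0."""
    t_grid = [k / 1000.0 for k in range(1, 5001)]
    return min(math.exp((math.exp(t) - 1) * mu - t * a) for t in t_grid)

def exact_binomial_tail(n, p, a):
    """Exact Pr[X >= a] for X ~ Binomial(n, p) (identical p_i, for comparison)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(math.ceil(a), n + 1))

if __name__ == "__main__":
    n, p = 100, 0.1          # 100 Poisson trials, all with p_i = 0.1
    mu = n * p               # mu = 10
    for a in (15, 20, 30):
        print(f"a={a}: exact tail = {exact_binomial_tail(n, p, a):.3e}, "
              f"Chernoff bound = {chernoff_upper_tail(mu, a):.3e}")
```

The minimizing $t$ here is $t = \ln(a/\mu)$, which matches the choice $t = \ln(1+\delta)$ used in the proof of Theorem 1 below; the grid search simply avoids relying on that closed form.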
Chernoff bound for the sum of Poisson trials (cont'd)

Theorem 1: Let $X = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are $n$ independent Poisson trials such that $\Pr[X_i = 1] = p_i$ for each $i = 1, 2, \ldots, n$, and let $\mu = E[X]$. Then
(1) for any $\delta > 0$, $\Pr[X \ge (1+\delta)\mu] \le \left(\dfrac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$;
(2) for $\delta \in (0, 1]$, $\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}$;
(3) for $R \ge 6\mu$, $\Pr[X \ge R] \le 2^{-R}$.

Proof of Theorem 1:
By Markov's inequality (for any random variable $X \ge 0$ and any $a > 0$, $\Pr[X \ge a] \le E[X]/a$), for any $t > 0$ we have
$\Pr[X \ge (1+\delta)\mu] = \Pr[e^{tX} \ge e^{t(1+\delta)\mu}] \le E[e^{tX}]/e^{t(1+\delta)\mu} \le e^{(e^{t}-1)\mu}/e^{t(1+\delta)\mu}$.
For any $\delta > 0$, setting $t = \ln(1+\delta) > 0$ gives (1).
To prove (2), we need to show that for $0 < \delta \le 1$, $e^{\delta}/(1+\delta)^{1+\delta} \le e^{-\delta^2/3}$. Taking the logarithm of both sides, this becomes $\delta - (1+\delta)\ln(1+\delta) + \delta^2/3 \le 0$, which can be proved with calculus.
To prove (3), let $R = (1+\delta)\mu$. Then, for $R \ge 6\mu$, $\delta = R/\mu - 1 \ge 5$. Hence, using (1),
$\Pr[X \ge (1+\delta)\mu] \le \left(\dfrac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu} \le \left(\dfrac{e}{1+\delta}\right)^{(1+\delta)\mu} \le (e/6)^{R} \le 2^{-R}$.

(Figure: the probability mass of $X$, with the two tail regions $X \le \mu - \delta\mu$ and $X \ge \mu + \delta\mu$ marked.)

Similarly, we have:

Theorem: Let $X = \sum_{i=1}^{n} X_i$, where $X_1, \ldots, X_n$ are $n$ independent Poisson trials such that $\Pr[X_i = 1] = p_i$. Let $\mu = E[X]$. Then, for $0 < \delta < 1$:
(1) $\Pr[X \le (1-\delta)\mu] \le \left(\dfrac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}$;
(2) $\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}$.

Corollary: For $0 < \delta < 1$, $\Pr[|X - \mu| \ge \delta\mu] \le 2e^{-\mu\delta^2/3}$.

• Example: Let $X$ be the number of heads in $n$ independent fair coin flips, so $\mu = n/2$. Applying the above Corollary, we have
$\Pr[|X - n/2| \ge \sqrt{6n \ln n}/2] \le 2\exp\!\left(-\tfrac{1}{3}\cdot\tfrac{n}{2}\cdot\tfrac{6\ln n}{n}\right) = 2e^{-\ln n} = 2/n$,
$\Pr[|X - n/2| \ge n/4] \le 2\exp\!\left(-\tfrac{1}{3}\cdot\tfrac{n}{2}\cdot\tfrac{1}{4}\right) = 2e^{-n/24}$.
Better!! By Chebyshev's inequality, i.e. $\Pr[|X - E[X]| \ge a] \le \mathrm{Var}[X]/a^2$, we only get $\Pr[|X - n/2| \ge n/4] \le 4/n$.

Better bounds for special cases

Theorem: Let $X = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are $n$ independent random variables with $\Pr[X_i = 1] = \Pr[X_i = -1] = 1/2$. For any $a > 0$, $\Pr[X \ge a] \le e^{-a^2/2n}$.

Proof: For any $t > 0$, $E[e^{tX_i}] = e^{t}/2 + e^{-t}/2$. Since $e^{t} = 1 + t + t^2/2! + \cdots + t^i/i! + \cdots$ and $e^{-t} = 1 - t + t^2/2! + \cdots + (-1)^i t^i/i! + \cdots$, using the Taylor series we have
$E[e^{tX_i}] = \sum_{i \ge 0} \dfrac{t^{2i}}{(2i)!} \le \sum_{i \ge 0} \dfrac{(t^2/2)^i}{i!} = e^{t^2/2}$.
Hence $E[e^{tX}] = \prod_{i=1}^{n} E[e^{tX_i}] \le e^{t^2 n/2}$, and $\Pr[X \ge a] = \Pr[e^{tX} \ge e^{ta}] \le E[e^{tX}]/e^{ta} \le e^{t^2 n/2}/e^{ta}$. Setting $t = a/n$, we have $\Pr[X \ge a] \le e^{-a^2/2n}$. By symmetry, we also have $\Pr[X \le -a] \le e^{-a^2/2n}$.

Better bounds for special cases (cont'd)

Corollary: Let $X = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are $n$ independent random variables with $\Pr[X_i = 1] = \Pr[X_i = -1] = 1/2$. For any $a > 0$, $\Pr[|X| \ge a] \le 2e^{-a^2/2n}$.

Letting $Y_i = (X_i + 1)/2$, we have the following corollary.

Better bounds for special cases (cont'd)

Corollary: Let $Y = Y_1 + \cdots + Y_n$, where $Y_1, \ldots, Y_n$ are $n$ independent random variables with $\Pr[Y_i = 1] = \Pr[Y_i = 0] = 1/2$. Let $\mu = E[Y] = n/2$.
(1) For any $a > 0$, $\Pr[Y \ge \mu + a] \le e^{-2a^2/n}$.
(2) For any $\delta > 0$, $\Pr[Y \ge (1+\delta)\mu] \le e^{-\delta^2\mu}$.
(3) For any $0 < a < \mu$, $\Pr[Y \le \mu - a] \le e^{-2a^2/n}$.
(4) For any $0 < \delta < 1$, $\Pr[Y \le (1-\delta)\mu] \le e^{-\delta^2\mu}$.
Note: The details are left as exercises. (See [MU05], pp. 70-71.)

Exercise
Let $X$ be a random variable such that $X \sim \mathrm{Geometric}(p)$. Derive the Chernoff bound for $X$, i.e., prove that for any $a > 0$, $\Pr[X > a] \le K e^{-ap}$ for some constant $K$.
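Before turning to the application, here is a small numerical sanity check of the coin-flip example and of the special-case corollary above (a minimal Python sketch; the helper name deviation_probability_mc is illustrative). It compares a Monte Carlo estimate of $\Pr[|X - n/2| \ge n/4]$ with the general Chernoff-type bound $2e^{-n/24}$, the sharper special-case bound $2e^{-2a^2/n}$, and Chebyshev's bound $4/n$.

```python
import math
import random

def deviation_probability_mc(n, a, trials=100_000):
    """Monte Carlo estimate of Pr[|X - n/2| >= a] for X = #heads in n fair coin flips."""
    hits = 0
    for _ in range(trials):
        x = sum(random.getrandbits(1) for _ in range(n))
        if abs(x - n / 2) >= a:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    n = 100
    a = n / 4                       # deviation n/4, i.e. delta = 1/2 with mu = n/2
    mu, delta = n / 2, 0.5
    print("Monte Carlo estimate :", deviation_probability_mc(n, a))
    print("Chernoff (corollary) :", 2 * math.exp(-mu * delta ** 2 / 3))  # 2 e^{-n/24}
    print("Chernoff (special)   :", 2 * math.exp(-2 * a ** 2 / n))       # 2 e^{-2a^2/n}
    print("Chebyshev            :", (n / 4) / a ** 2)                    # Var[X]/a^2 = 4/n
```

For $n = 100$ the true deviation probability is tiny (the Monte Carlo estimate will typically be 0), and both Chernoff-type bounds are far smaller than Chebyshev's $4/n$, illustrating the "Better!!" remark above.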
An Application: Set Balancing

• Given an $n \times m$ matrix $A$ with entries in $\{0, 1\}$, let

$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_m \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}$

• Suppose that we are looking for a vector $v$ with entries in $\{-1, 1\}$ that minimizes
$\|Av\|_{\infty} = \max_{i=1,\ldots,n} |c_i|$.

Set Balancing (cont'd)
• The problem arises in designing statistical experiments.
• Each column of matrix $A$ represents a subject in the experiment and each row represents a feature.
• The vector $v$ partitions the subjects into two disjoint groups, so that each feature is roughly as balanced as possible between the two groups.

Set Balancing (cont'd)
For example:

               zebra  tiger  whale  penguin
carnivorous      0      1      0      0
terrestrial      1      1      0      0
mammal           1      1      1      0
egg-laying       0      0      0      1

With $v = (1, 1, -1, 1)^{T}$ we get $Av = (1, 2, 1, 1)^{T}$, so $\|Av\|_{\infty} = 2$.

Set Balancing (cont'd)
Set balancing: Given an $n \times m$ matrix $A$ with entries 0 or 1, let $v$ be an $m$-dimensional vector with entries in $\{1, -1\}$ and let $c$ be the $n$-dimensional vector such that $Av = c$.

Theorem: For a random vector $v$ with entries chosen independently and with equal probability from the set $\{1, -1\}$,
$\Pr\left[\max_i |c_i| \ge \sqrt{4m \ln n}\right] \le 2/n$.

(Figure: $A$ is $n \times m$, $v$ is a randomly chosen $m \times 1$ vector, and $c = Av$ is $n \times 1$.)

Proof of Set Balancing:
Proof: Consider the $i$-th row of $A$: $a_i = (a_{i,1}, \ldots, a_{i,m})$. Suppose there are $k$ ones in $a_i$. If $k \le \sqrt{4m \ln n}$, then clearly $|a_i \cdot v| \le \sqrt{4m \ln n}$. Suppose $k > \sqrt{4m \ln n}$. Then there are $k$ non-zero terms in $Z_i = \sum_{j=1}^{m} a_{i,j} v_j$, which are independent random variables, each with probability 1/2 of being either $+1$ or $-1$. By the Chernoff bound for this special case and the fact that $m \ge k$, we have
$\Pr\left[|Z_i| > \sqrt{4m \ln n}\right] \le 2e^{-4m \ln n/(2k)} \le 2e^{-2\ln n} = 2/n^2$.
By the union bound, $\Pr\left[\bigcup_{i=1}^{n} \left(|Z_i| > \sqrt{4m \ln n}\right)\right] \le n \cdot 2/n^2 = 2/n$, i.e., some row exceeds the threshold with probability at most $2/n$. (A small simulation sketch of this theorem appears after the references.)

References
• [MR95] Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
• [MU05] Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.
• Lecture slides by Prof. Shi-Chun Tsai (蔡錫鈞).
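As a closing illustration of the set-balancing theorem, here is a minimal simulation sketch (the helper name set_balancing_failure_rate is illustrative, and the choice of a random 0/1 matrix $A$ is only for demonstration; the theorem holds for any fixed $A$). It estimates how often a uniformly random $v \in \{-1, 1\}^m$ gives $\max_i |c_i| \ge \sqrt{4m \ln n}$ and compares the estimate with the guarantee $2/n$.

```python
import math
import random

def set_balancing_failure_rate(n, m, trials=2000):
    """Estimate Pr[max_i |c_i| >= sqrt(4 m ln n)] for a fixed 0/1 matrix A
    (picked at random here) and a uniformly random v in {-1, +1}^m, c = A v."""
    A = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]
    threshold = math.sqrt(4 * m * math.log(n))
    failures = 0
    for _ in range(trials):
        v = [random.choice((-1, 1)) for _ in range(m)]
        # discrepancy = ||A v||_inf = max_i |c_i|
        discrepancy = max(abs(sum(a_ij * v_j for a_ij, v_j in zip(row, v)))
                          for row in A)
        if discrepancy >= threshold:
            failures += 1
    return failures / trials

if __name__ == "__main__":
    n = m = 50
    print("estimated failure rate :", set_balancing_failure_rate(n, m))
    print("theorem's guarantee 2/n:", 2 / n)
```

In practice the observed failure rate is usually far below $2/n$; the Chernoff-based guarantee is deliberately conservative.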