CSIS8601: Probabilistic Method & Randomized Algorithms

Lecture 3: Chernoff Bound: Measure Concentration

Lecturer: Hubert Chan. Date: 16 Sept 2009

These lecture notes are supplementary materials for the lectures. They are by no means substitutes for attending lectures or a replacement for your own notes!

1 Measure Concentration

As we have seen in the previous lectures, the objective function of a problem can be expressed as some random variable X, and we analyze the performance of a randomized algorithm in terms of the expectation (or mean) E[X]. We often wish to show that, with large probability, the random variable X is near its mean E[X]. We shall see that if X is a sum of independent random variables, then this is indeed the case. This phenomenon is known as measure concentration.

1.1 Example: Chebyshev's Inequality

Suppose X_0, X_1, ..., X_{n-1} are independent {0,1}-random variables such that for each i, Pr(X_i = 1) = p and Pr(X_i = 0) = 1 - p. Let X := Σ_i X_i. We have E[X] = np.

Remark 1.1 Using only pairwise independence of the X_i's, we have var[X] = np(1 - p).

Using Chebyshev's Inequality, we have for 0 < ε < 1,

    Pr(|X - E[X]| ≥ εE[X]) ≤ var[X] / (εE[X])² = (1 - p) / (ε²np) ≤ 1 / (ε²np).

We have only used the fact that any two different random variables X_i and X_j are independent. The goal is to show that if we fully exploit the fact that all the random variables X_0, X_1, ..., X_{n-1} are independent of one another, we can obtain a much better result.

Theorem 1.2 (Basic Chernoff Bound) Suppose X is the sum of n independent {0,1}-random variables X_i such that for each i, Pr(X_i = 1) = p. Let μ := E[X] = np. Then, for 0 < ε < 1,

    Pr(|X - E[X]| ≥ εE[X]) ≤ 2 exp(-ε²np / 3).
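To get a feel for the gap between the two bounds, here is a minimal Python sketch (the parameters n, p and eps are arbitrary illustrative choices, not taken from the notes) that evaluates the Chebyshev bound of Section 1.1 and the Chernoff bound of Theorem 1.2 on the same deviation event, alongside a Monte Carlo estimate of the true probability.

    import math
    import random

    n, p, eps = 10_000, 0.5, 0.1   # illustrative parameters (assumed)
    mean = n * p

    # Chebyshev bound from Section 1.1: 1 / (eps^2 * n * p).
    chebyshev = 1.0 / (eps**2 * n * p)

    # Basic Chernoff bound from Theorem 1.2: 2 * exp(-eps^2 * n * p / 3).
    chernoff = 2.0 * math.exp(-(eps**2) * n * p / 3.0)

    # Monte Carlo estimate of Pr(|X - np| >= eps * np).
    trials, hits = 500, 0
    for _ in range(trials):
        x = sum(1 for _ in range(n) if random.random() < p)
        if abs(x - mean) >= eps * mean:
            hits += 1

    print(f"Chebyshev bound:    {chebyshev:.3e}")
    print(f"Chernoff bound:     {chernoff:.3e}")
    print(f"empirical estimate: {hits / trials:.3e}")

For these parameters, Chebyshev gives 0.02 while Chernoff gives roughly 1.2e-7; the deviation here is a 10-standard-deviation event, so the empirical estimate is essentially 0.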
2 Using Moment Generating Functions

The bound in Theorem 1.2 measures, in terms of E[X], how far the random variable X is away from its mean E[X]. One can instead measure this in terms of the total number of random variables n, i.e., one can analyze the probability Pr(|X - E[X]| ≥ εn). Of course, a different bound would be obtained.

There are a number of variations of this inequality: Hoeffding's Inequality, Azuma's Inequality, McDiarmid's Inequality. Each of them has slightly different assumptions, and it would be confusing to learn them separately. Fortunately, there is a generic method for obtaining all of them: the method of moment generating functions. We describe it in general terms.

Suppose X_0, X_1, ..., X_{n-1} are independent random variables. They can take any value (not necessarily in {0,1}), and need not even be identically distributed. Let X := Σ_i X_i and μ := E[X]. The goal is to give an upper bound on the probability Pr[|X - μ| ≥ α], for some value α > 0. We outline the steps in the following.

2.1 Transform the Inequality into a Convenient Form

We first use the union bound:

    Pr[|X - μ| ≥ α] ≤ Pr[X - μ ≥ α] + Pr[X - μ ≤ -α].    (2.1)

We bound each of the terms on the right hand side separately. Recall that X := Σ_i X_i. Sometimes it is convenient to first rescale each random variable X_i. For example:

1. Y_i := X_i. The simplest case; we just work with X_i.
2. Y_i := X_i - E[X_i]. We then have E[Y_i] = 0.
3. Y_i := X_i / M. If X_i is in the range [0, M], then we now have Y_i ∈ [0, 1].

Since the X_i's are independent, the Y_i's are also independent. After the transformation, the two terms in (2.1) have the form

    (i) Pr[Σ_i Y_i ≥ β], or
    (ii) Pr[Σ_i Y_i ≤ β].

Note that the β in each case is different. The direction of the inequality is also different.

We use a trick to turn both inequalities into the same form. In case (i), let t > 0; in case (ii), let t < 0. Now, both inequalities have the same form

    Pr[t Σ_i Y_i ≥ tβ].    (2.2)

The value of t will be chosen later to get the best possible bound. Note that we have to remember whether t is positive or negative.

Example. As part of the Basic Chernoff Bound, suppose we wish to consider the part Pr[X - μ ≤ -εμ]. In this case, we just let Y_i := X_i and let t < 0 to obtain

    Pr[X - μ ≤ -εμ] = Pr[t Σ_i Y_i ≥ t(1 - ε)μ].

2.2 Using Moment Generating Functions and Independence

Notation: we write exp(x) = e^x = Σ_{i≥0} x^i / i!.

Observe that the exponential function is strictly increasing, i.e., x < y iff exp(x) < exp(y). Hence,

    Pr[t Σ_i Y_i ≥ tβ] = Pr[exp(t Σ_i Y_i) ≥ exp(tβ)].

Notice that now both sides of the inequality are positive. Hence, by Markov's Inequality, we have

    Pr[exp(t Σ_i Y_i) ≥ exp(tβ)] ≤ exp(-tβ) · E[exp(t Σ_i Y_i)].

The next step is where we use the fact that the Y_i's are independent:

    E[exp(t Σ_i Y_i)] = Π_i E[exp(t Y_i)].

Definition 2.1 Given a random variable Y, its moment generating function is the mapping t ↦ E[e^{tY}].

Hence, it suffices to find an upper bound for E[e^{tY_i}], for each i.

Remark 2.2 We wish to find an upper bound of the form E[e^{tY_i}] ≤ exp(f_i(t)) for some appropriate function f_i(t). This is often the most technical part of the proof, and requires tools from calculus.

Hence, we obtain the bound

    Pr[t Σ_i Y_i ≥ tβ] ≤ exp(-tβ) Π_i E[e^{tY_i}] ≤ exp(-tβ) Π_i exp(f_i(t)) = exp(-tβ + Σ_i f_i(t)) = exp(g(t)),

where g(t) := -tβ + Σ_i f_i(t).

Example. Continuing with our example, if Y_i = X_i is a {0,1}-random variable such that Pr(Y_i = 1) = p, then we have

    E[e^{tY_i}] = (1 - p) · e^0 + p · e^t = 1 + p(e^t - 1) ≤ exp(p(e^t - 1)),

where we have used the inequality 1 + x ≤ e^x, which holds for all real numbers x. Hence,

    Pr[t Σ_i Y_i ≥ t(1 - ε)μ] ≤ exp{-t(1 - ε)μ + np(e^t - 1)} = exp(g(t)),

where g(t) := μ(e^t - t(1 - ε) - 1), using μ = np.

2.3 Find the Best Value of t to Minimize g(t)

We find the value of t that minimizes the function g(t) := -tβ + Σ_i f_i(t). Be careful to remember whether t is positive or negative!

Example. In our example, we have g(t) := μ(e^t - t(1 - ε) - 1). Note that g'(t) = μ(e^t - (1 - ε)) and g''(t) = μe^t > 0. It follows that g attains its minimum when g'(t) = 0, i.e., when t = ln(1 - ε) < 0. We check that, as our example requires, t < 0. So, we can set t := ln(1 - ε).

Using the expansion -ln(1 - ε) = Σ_{i≥1} ε^i / i, for 0 < ε < 1, we have g(ln(1 - ε)) ≤ -ε²μ/2 = -ε²np/2. So, we have one part of the Basic Chernoff Bound:

    Pr[X - μ ≤ -εμ] ≤ exp(-ε²np/2) ≤ exp(-ε²np/3).

Theorem 2.3 Suppose X_0, X_1, ..., X_{n-1} are independent {0,1}-random variables, each having expectation p. Let X := Σ_i X_i and μ := E[X]. Then, for 0 < ε < 1,

    Pr[X ≤ (1 - ε)μ] ≤ exp(-ε²μ/2).
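The three steps above can be checked numerically. The following minimal Python sketch (with assumed illustrative parameters n, p, eps) takes g(t) = μ(e^t - t(1 - ε) - 1) from the running example, confirms on a grid that t* = ln(1 - ε) minimizes g, and compares the optimized bound exp(g(t*)) with the cleaner bound exp(-ε²μ/2) of Theorem 2.3.

    import math

    n, p, eps = 1000, 0.3, 0.2   # illustrative parameters (assumed)
    mu = n * p

    def g(t):
        # g(t) = mu * (e^t - t*(1 - eps) - 1), from the example in Section 2.3.
        return mu * (math.exp(t) - t * (1 - eps) - 1)

    # Step 2.3: calculus gives the minimizer t* = ln(1 - eps) < 0.
    t_star = math.log(1 - eps)

    # Sanity check on a grid around t*: no nearby point does better.
    window = [t_star + d / 1000.0 for d in range(-500, 501)]
    assert all(g(t_star) <= g(t) + 1e-12 for t in window)

    bound_exact = math.exp(g(t_star))         # exp(g(t*))
    bound_clean = math.exp(-eps**2 * mu / 2)  # Theorem 2.3
    print(f"t* = {t_star:.4f}")
    print(f"exp(g(t*))         = {bound_exact:.3e}")
    print(f"exp(-eps^2 mu / 2) = {bound_clean:.3e}")

As expected, exp(g(t*)) ≤ exp(-ε²μ/2): the series-expansion step only loosens the optimized bound.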
3 The Other Half of Chernoff

To complete the proof of the Chernoff Bound, one also needs an upper bound for Pr[X - μ ≥ εμ]. The same technique of moment generating functions can be applied, though the calculations are slightly different. We leave the details as a homework problem.

Lemma 3.1 Suppose X_0, X_1, ..., X_{n-1} are independent {0,1}-random variables, each having expectation p. Let X := Σ_i X_i and μ := E[X]. Then, for all ε > 0,

    Pr[X ≥ (1 + ε)μ] ≤ (e^ε / (1 + ε)^{1+ε})^μ.

Corollary 3.2 For 0 < ε < 1, using the inequality (1 + ε) ln(1 + ε) ≥ ε + ε²/3, we have:

    Pr[X ≥ (1 + ε)μ] ≤ exp(-ε²μ/3).

Corollary 3.3 (Chernoff Bound with Large ε) For all ε > 0, using the inequality ln(1 + ε) > 2ε / (2 + ε), we have:

    Pr[X ≥ (1 + ε)μ] ≤ exp(-ε²μ / (2 + ε)).

4 2-Coloring Subsets: Revisited

Consider a finite set V and subsets S_1, S_2, ..., S_n of V such that each S_i has size |S_i| = k, where k > 12 ln n. Is it possible to color each element of V red or blue such that each set S_i contains roughly the same number of red and blue elements? Suppose each element of V is colored red or blue independently and uniformly at random.

Proposition 4.1 Fix a subset S_i, and let R_i be the number of red elements in S_i. Then,

    Pr[|R_i - k/2| ≥ √(3k ln n)] ≤ 2/n².

Proof: Note that E[R_i] = k/2. By the Chernoff Bound, for 0 < ε < 1, Pr[|R_i - k/2| ≥ εE[R_i]] ≤ 2 exp(-ε²E[R_i]/3). Substituting ε := √(12 ln n / k), which is less than 1 because k > 12 ln n, we have the result.

Corollary 4.2 By the union bound, Pr[∃i, |R_i - k/2| ≥ √(3k ln n)] ≤ 2/n.

5 n Balls into n Bins: Load Balancing

Suppose one throws n balls into n bins, independently and uniformly at random. We wish to analyze the maximum number of balls in any single bin. A similar situation arises when there are n jobs independently and randomly assigned to n machines, and we wish to analyze the number of jobs assigned to the busiest machine.

Consider the first bin, and let Z_1 be the number of balls in it. Note that Z_1 is a sum of n independent {0,1}-random variables, each having expectation 1/n.

Proposition 5.1 Pr[Z_1 ≥ 4 ln n + 1] ≤ 1/n².

Proof: Observe that E[Z_1] = 1. We use the Chernoff Bound with large ε > 0 (Corollary 3.3):

    Pr[Z_1 ≥ 1 + ε] ≤ exp(-ε² / (2 + ε)).

We wish to find a value for ε so that the last quantity is at most 1/n². For ε ≥ 2, we have ε²/(2 + ε) ≥ ε²/(2ε) = ε/2. Hence, the last quantity is at most exp(-ε/2), which equals 1/n² if we set ε := 4 ln n ≥ 2.

Corollary 5.2 By the union bound, the probability that there exists a bin with more than 1 + 4 ln n balls is at most 1/n.

6 Homework Preview

1. The Other Half of Chernoff. Suppose X_0, X_1, ..., X_{n-1} are independent {0,1}-random variables, each having expectation p. Let X := Σ_i X_i and μ := E[X]. Using the method of moment generating functions, prove the following.

   For all ε > 0, Pr[X - μ ≥ εμ] ≤ (e^ε / (1 + ε)^{1+ε})^μ.

2. n Balls into n Bins (Revisited). Using the Chernoff Bound from the previous question, we can obtain a better bound for the balls and bins problem. Suppose n balls are thrown independently and uniformly at random into n bins. Let Z_1 be the number of balls in the first bin.

   (a) Find a number T in terms of n such that Pr[Z_1 ≥ T] ≤ 1/n². Please give the exact form and do not use big-O notation for this part of the question. (Hint: if you need to find a number x such that x ln x ≥ ln n, try setting x := c ln n / ln ln n, for some constant c > 0. You can also assume that n is large enough, say n ≥ 100.)

   (b) Show that with probability at least 1 - 1/n, no bin contains more than Θ(log n / log log n) balls.
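As an empirical companion to Section 5, here is a minimal simulation sketch (the bin counts and random seed are illustrative assumptions) that records the maximum load when n balls are thrown into n bins, alongside the 1 + 4 ln n guarantee of Corollary 5.2 and the ln n / ln ln n scale that Homework 2 asks you to establish.

    import math
    import random

    rng = random.Random(0)  # fixed seed so the sketch is reproducible

    def max_load(n):
        # Throw n balls into n bins independently and uniformly at random.
        bins = [0] * n
        for _ in range(n):
            bins[rng.randrange(n)] += 1
        return max(bins)

    for n in (100, 1_000, 10_000, 100_000):
        print(f"n = {n:>6}: max load = {max_load(n)}, "
              f"1 + 4 ln n = {1 + 4 * math.log(n):5.1f}, "
              f"ln n / ln ln n = {math.log(n) / math.log(math.log(n)):4.1f}")

The observed maximum load grows much more slowly than 1 + 4 ln n, consistent with the Θ(log n / log log n) bound of Homework 2(b).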