CSIS8601: Probabilistic Method & Randomized Algorithms
Lecture 3: Chernoff Bound: Measure Concentration
Lecturer: Hubert Chan
Date: 16 Sept 2009
These lecture notes are supplementary materials for the lectures. They are by no means substitutes
for attending lectures or replacement for your own notes!
1 Measure Concentration
As we have seen in the previous lectures, the objective function of a problem can be expressed as
some random variable Y, and we analyze the performance of a randomized algorithm in terms of
the expectation (or mean) E[Y]. We often wish to show that, with large probability, the random
variable Y is near its mean E[Y]. We will see that if Y is a sum of independent random variables,
then this is indeed the case. This phenomenon is known as measure concentration.
1.1 Example: Chebyshev’s Inequality
Suppose X_0, X_1, ..., X_{n-1} are independent {0, 1}-random variables such that for each i,
Pr(X_i = 1) = p and Pr(X_i = 0) = 1 - p. Let Y := Σ_i X_i. We have E[Y] = np.
Remark 1.1 By using pairwise independence of the X_i’s, we have var[Y] = np(1 - p).
Using Chebyshev’s Inequality, we have, for 0 < ε < 1,
Pr(|Y - E[Y]| ≥ εE[Y]) ≤ var[Y] / (εE[Y])² = (1/ε²) · ((1 - p)/p) · (1/n).
We have only used the fact that any two different random variables X_i and X_j are independent.
The goal is to show that if we fully exploit the fact that all the random variables X_0, X_1, ..., X_{n-1}
are independent of one another, we can obtain a much better result.
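As an optional illustration, the following Python sketch evaluates the Chebyshev bound above for
one concrete choice of parameters and compares it with the empirical frequency of the deviation
event, estimated by simulation. The values n = 1000, p = 0.5, ε = 0.1 and the number of trials are
arbitrary choices made only for this example.

    # Compare the Chebyshev bound (1/eps^2) * ((1-p)/p) * (1/n) with the empirical
    # frequency of the event |Y - E[Y]| >= eps * E[Y].  All parameters are arbitrary.
    import random

    n, p, eps, trials = 1000, 0.5, 0.1, 5000
    chebyshev_bound = (1.0 / eps**2) * ((1.0 - p) / p) * (1.0 / n)

    count = 0
    for _ in range(trials):
        y = sum(1 for _ in range(n) if random.random() < p)  # Y = sum of n Bernoulli(p)
        if abs(y - n * p) >= eps * n * p:
            count += 1

    print("Chebyshev bound:", chebyshev_bound)     # 0.1 for these parameters
    print("empirical frequency:", count / trials)  # typically much smaller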
Theorem 1.2 (Basic Chernoff Bound) Suppose Y is the sum of n independent {0, 1}-random
variables X_i’s such that for each i, Pr(X_i = 1) = p. Let μ := E[Y] = np. Then, for 0 < ε < 1,
Pr(|Y - E[Y]| ≥ εE[Y]) ≤ 2 exp(-ε²np/3).
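To see how much stronger Theorem 1.2 is, the short Python sketch below tabulates the Chebyshev
bound and the Chernoff bound side by side as n grows; the values p = 0.5 and ε = 0.1 are again
arbitrary choices for illustration.

    # The Chebyshev bound decays like 1/n, while the Chernoff bound of Theorem 1.2
    # decays exponentially in n.  The parameters p and eps are arbitrary.
    from math import exp

    p, eps = 0.5, 0.1
    for n in (100, 1000, 10000):
        chebyshev = (1.0 / eps**2) * ((1.0 - p) / p) / n
        chernoff = 2.0 * exp(-eps**2 * n * p / 3.0)
        print(n, chebyshev, chernoff)
    # At n = 10000, Chebyshev gives 0.01 while Chernoff gives roughly 1e-7.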
2 Using Moment Generating Function
The bound in Theorem 1.2 measures, in terms of E[Y], how far the random variable Y is away from
its mean E[Y]. One can instead measure this in terms of the total number of random variables n,
i.e., one wants to analyze the probability Pr(|Y - E[Y]| ≥ εn). Of course, a different bound would
be obtained. There are a number of variations of this inequality: Hoeffding’s Inequality, Azuma’s
Inequality, McDiarmid’s Inequality. Each of them has slightly different assumptions, and it would
be confusing to learn them separately. Fortunately, there is a generic method to obtain all of them:
the method of moment generating functions.
We describe the method in general terms. Suppose X_0, X_1, ..., X_{n-1} are independent random
variables. They can take any value (not necessarily in {0, 1}), and need not even be identically
distributed. Let Y := Σ_i X_i and μ := E[Y]. The goal is to give an upper bound on the probability
Pr[|Y - μ| ≥ α], for some value α > 0. We outline the steps in the following.
2.1 Transform the Inequality into a Convenient Form
We first use the union bound:
Pr[|Y - μ| ≥ α] ≤ Pr[Y - μ ≥ α] + Pr[Y - μ ≤ -α].    (2.1)
We bound each of the terms on the right hand side separately. Recall that Y := Σ_i X_i. Sometimes
it is convenient to first rescale each random variable X_i. For example,
1. Z_i := X_i. The simplest case. We can just work with X_i.
2. Z_i := X_i - E[X_i]. We have E[Z_i] = 0.
3. Z_i := X_i / R. If X_i is in the range [0, R], then we now have Z_i ∈ [0, 1].
Since the X_i’s are independent, the Z_i’s are also independent. After the transformation, the two
terms in (2.1) have the form
(i) Pr[Σ_i Z_i ≥ β], or
(ii) Pr[Σ_i Z_i ≤ β].
Note that the β in each case is different. The direction of the inequality is also different. We use
a trick to turn both inequalities into the same form. In case (i), let t > 0; in case (ii), let t < 0.
Now, both inequalities have the same form
Pr[t Σ_i Z_i ≥ tβ].    (2.2)
The value of t will be chosen later to get the best possible bound. Note that we have to remember
whether t is positive or negative.
Example.
As part of the Basic Chernoff Bound, suppose we wish to consider the part Pr[Y - μ ≤ -εμ]. In
this case, we just let Z_i := X_i and let t < 0 to obtain
Pr[Y - μ ≤ -εμ] = Pr[t Σ_i X_i ≥ t(1 - ε)μ].
(Indeed, Y - μ ≤ -εμ is equivalent to Y ≤ (1 - ε)μ, and multiplying both sides by t < 0 reverses
the direction of the inequality.)
2.2 Using Moment Generating Function and Independence
Notation: we write exp(x) = e^x = Σ_{i=0}^∞ x^i / i!.
Observe that the exponential function is strictly increasing, i.e., x ≤ y if and only if exp(x) ≤ exp(y).
Hence,
Pr[t Σ_i Z_i ≥ tβ] = Pr[exp(t Σ_i Z_i) ≥ exp(tβ)].
Notice that both sides of the inequality are now positive. Hence, by Markov’s Inequality, we have
Pr[exp(t Σ_i Z_i) ≥ exp(tβ)] ≤ exp(-tβ) · E[exp(t Σ_i Z_i)].
The next step is where we use the fact that the Z_i’s are independent:
E[exp(t Σ_i Z_i)] = Π_i E[exp(t Z_i)].
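As a quick sanity check, the following Python sketch estimates both sides of this identity by Monte
Carlo for independent Bernoulli variables; the choices of n, p, t and the number of samples are
arbitrary, and the two estimates agree only up to sampling error.

    # Estimate E[exp(t * sum_i Z_i)] and prod_i E[exp(t * Z_i)] by simulation for
    # independent Bernoulli(p) variables; both should be close to the closed form
    # (1 + p*(e^t - 1))^n.  All parameters below are arbitrary illustrative choices.
    import random
    from math import exp

    n, p, t, trials = 5, 0.3, -0.7, 200000
    random.seed(0)

    lhs_sum = 0.0
    factor_sums = [0.0] * n
    for _ in range(trials):
        zs = [1 if random.random() < p else 0 for _ in range(n)]
        lhs_sum += exp(t * sum(zs))
        for i, z in enumerate(zs):
            factor_sums[i] += exp(t * z)

    lhs = lhs_sum / trials
    rhs = 1.0
    for s in factor_sums:
        rhs *= s / trials

    exact = (1 + p * (exp(t) - 1)) ** n
    print(lhs, rhs, exact)  # all three values should be close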
Definition 2.1 Given a random variable Z, the moment generating function of Z is given by the
mapping t ↦ E[e^{tZ}].
Hence, it suffices to find an upper bound for E[e^{tZ_i}], for each i.
Remark 2.2 We wish to find an upper bound of the form E[e^{tZ_i}] ≤ exp(g_i(t)) for some appropriate
function g_i(t). Note that this is often the most technical part of the proof, and requires tools from
calculus.
Hence, we obtain the bound
Pr[t Σ_i Z_i ≥ tβ] ≤ exp(-tβ) · Π_i E[e^{tZ_i}] ≤ exp(-tβ) · Π_i exp(g_i(t)) = exp(-tβ + Σ_i g_i(t)) = exp(g(t)),
where g(t) := -tβ + Σ_i g_i(t).
Example.
Continuing with our example, if Z_i = X_i is a {0, 1}-random variable such that Pr(X_i = 1) = p,
then we have
E[e^{tZ_i}] = (1 - p) · e^0 + p · e^t = 1 + p(e^t - 1) ≤ exp(p(e^t - 1)),
where we have used the inequality 1 + x ≤ e^x, for all real numbers x.
Hence,
Pr[t Σ_i X_i ≥ t(1 - ε)μ] ≤ exp{-t(1 - ε)μ + np(e^t - 1)} = exp(g(t)),
where g(t) := μ(e^t - t(1 - ε) - 1).
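As a small numerical aside, the sketch below checks the moment generating function computed in
this example and the bound 1 + p(e^t - 1) ≤ exp(p(e^t - 1)) for a few values of t; the value p = 0.3
and the grid of t values are arbitrary.

    # For a {0,1}-variable X with Pr(X = 1) = p, check that E[e^{tX}] = 1 + p*(e^t - 1)
    # and that this is at most exp(p*(e^t - 1)), by the inequality 1 + x <= e^x.
    from math import exp

    p = 0.3  # arbitrary illustrative value
    for t in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0):
        mgf = (1 - p) * exp(t * 0) + p * exp(t * 1)  # exact E[e^{tX}]
        bound = exp(p * (exp(t) - 1))                # the upper bound used above
        assert mgf <= bound
        print(t, mgf, bound)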
2.3 Find the Best Value for t to Minimize g(t)
We find the value of t that minimizes the function g(t) := -tβ + Σ_i g_i(t). Be careful to remember
whether t is positive or negative!
Example.
In our example, we have g(t) := μ(e^t - t(1 - ε) - 1).
Note that g'(t) = μ(e^t - (1 - ε)) and g''(t) = μe^t > 0. It follows that g attains its minimum when
g'(t) = 0, i.e., when t = ln(1 - ε) < 0.
We check that in our example, t < 0. So, we can set the value t := ln(1 - ε). Using the expansion
-ln(1 - ε) = Σ_{i≥1} ε^i / i, valid for 0 < ε < 1, we have g(ln(1 - ε)) ≤ -ε²μ/2 = -ε²np/2.
So, we have one part of the Basic Chernoff Bound,
Pr[Y - μ ≤ -εμ] ≤ exp(-ε²np/2) ≤ exp(-ε²np/3).
Theorem 2.3 Suppose X_0, X_1, ..., X_{n-1} are independent {0, 1}-random variables, each having
expectation p. Let Y := Σ_i X_i and μ := E[Y].
Then, for 0 < ε < 1, Pr[Y ≤ (1 - ε)μ] ≤ exp(-ε²μ/2).
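As an optional check of the computation behind Theorem 2.3, the following Python sketch minimizes
g(t) over a grid and confirms that the minimizer is close to ln(1 - ε) and that g(ln(1 - ε)) ≤ -ε²μ/2;
the values μ = 50 and ε = 0.2 are arbitrary.

    # Minimize g(t) = mu*(e^t - t*(1 - eps) - 1) numerically and compare with the
    # closed-form minimizer t* = ln(1 - eps); also check g(t*) <= -eps^2 * mu / 2.
    from math import exp, log

    mu, eps = 50.0, 0.2  # arbitrary illustrative values

    def g(t):
        return mu * (exp(t) - t * (1 - eps) - 1)

    grid = [-2 + i * 1e-4 for i in range(20001)]  # fine grid over [-2, 0]
    t_grid = min(grid, key=g)
    t_star = log(1 - eps)

    print(t_grid, t_star)               # both close to -0.223
    print(g(t_star), -eps**2 * mu / 2)  # about -1.07 <= -1.0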
3 The Other Half of Chernoff
To complete the proof of the Chernoff Bound, one also needs to obtain an upper bound for
Pr[Y - μ ≥ εμ]. The same moment generating function technique can be applied, although the
calculations are slightly different. We leave the details as a homework problem.
Lemma 3.1 Suppose X_0, X_1, ..., X_{n-1} are independent {0, 1}-random variables, each having
expectation p. Let Y := Σ_i X_i and μ := E[Y].
Then, for all ε > 0, Pr[Y ≥ (1 + ε)μ] ≤ (e^ε / (1 + ε)^{1+ε})^μ.
Corollary 3.2 For 0 < ε < 1, using the inequality (1 + ε) ln(1 + ε) ≥ ε + ε²/3, we have:
Pr[Y ≥ (1 + ε)μ] ≤ exp(-ε²μ/3).
Corollary 3.3 (Chernoff Bound with Large ε) For all ε > 0, using the inequality ln(1 + ε) >
2ε/(2 + ε), we have:
Pr[Y ≥ (1 + ε)μ] ≤ exp(-ε²μ/(2 + ε)).
4 2-Coloring Subsets: Revisited
Consider a finite set U and subsets S_1, S_2, ..., S_m of U such that each S_i has size |S_i| = l, where
l > 12 ln m. Is it possible to color each element of U red or blue such that each set S_i contains
roughly the same number of red and blue elements?
Proposition 4.1 Color each element of U red or blue independently and uniformly at random.
Fix a subset S_i, and let X_i be the number of red elements in S_i.
Then, Pr[|X_i - l/2| ≥ √(3 l ln m)] ≤ 2/m².
Proof: Note that E[X_i] = l/2. By the Chernoff Bound, for 0 < ε < 1,
Pr[|X_i - l/2| ≥ εE[X_i]] ≤ 2 exp(-ε²E[X_i]/3).
Substituting ε := √(12 ln m / l) < 1, we have the result.
Corollary 4.2 By the union bound, Pr[∃i, |X_i - l/2| ≥ √(3 l ln m)] ≤ 2/m.
5 n Balls into n Bins: Load Balancing
Suppose one throws n balls into n bins, independently and uniformly at random. We wish to
analyze the maximum number of balls in any single bin. A similar situation arises when there are
n jobs independently and randomly assigned to n machines, and we wish to analyze the number of
jobs assigned to the busiest machine.
Consider the first bin, and let Y_1 be the number of balls in it. Note that Y_1 is a sum of n independent
{0, 1}-random variables, each having expectation 1/n.
Proposition 5.1 Pr[Y_1 ≥ 4 ln n + 1] ≤ 1/n².
Proof: Observe that E[Y_1] = 1. We use the Chernoff Bound with large ε > 0 (Corollary 3.3). We
have:
Pr[Y_1 ≥ 1 + ε] ≤ exp(-ε²/(2 + ε)).
We wish to find a value for ε so that the last quantity is at most 1/n². For ε ≥ 2, we have
ε²/(2 + ε) ≥ ε²/(2ε) = ε/2. Hence, the last quantity is at most exp(-ε/2), which equals 1/n² if
we set ε := 4 ln n ≥ 2.
Corollary 5.2 Using the union bound, the probability that there exists a bin with at least 1 + 4 ln n
balls is at most 1/n.
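To see the bound in action, the Python sketch below throws n balls into n bins repeatedly and
compares the maximum load with the threshold 1 + 4 ln n; the value n = 1000 and the number of
trials are arbitrary.

    # Throw n balls into n bins uniformly at random and compare the maximum load
    # with the threshold 1 + 4*ln(n); by Corollary 5.2 it is reached with
    # probability at most 1/n.
    import random
    from math import log

    n, trials = 1000, 200   # arbitrary illustrative values
    threshold = 1 + 4 * log(n)

    exceed = 0
    for _ in range(trials):
        loads = [0] * n
        for _ in range(n):
            loads[random.randrange(n)] += 1   # each ball picks a uniform bin
        if max(loads) >= threshold:
            exceed += 1

    print("threshold:", threshold)                             # about 28.6 for n = 1000
    print("fraction of trials reaching it:", exceed / trials)  # bound: 1/n = 0.001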
6 Homework Preview
1. The Other Half of Chernoff. Suppose X_0, X_1, ..., X_{n-1} are independent {0, 1}-random
variables, each having expectation p. Let Y := Σ_i X_i and μ := E[Y]. Using the method of
moment generating functions, prove the following.
For all ε > 0, Pr[Y - μ ≥ εμ] ≤ (e^ε / (1 + ε)^{1+ε})^μ.
2. n Balls into n Bins (Revisited). Using the Chernoff Bound from the previous question,
we can obtain a better bound for the balls and bins problem. Suppose n balls are thrown
independently and uniformly at random into n bins. Let Y_1 be the number of balls in the
first bin.
(a) Find a number N in terms of n such that Pr[Y_1 ≥ N] ≤ 1/n². Please give the exact form
and do not use big O notation for this part of the question.
(Hint: if you need to find a number W such that W ln W ≥ ln n, try setting W := λ ln n / ln ln n,
for some constant λ > 0. You can also assume that n is large enough, say n ≥ 100.)
(b) Show that with probability at least 1 - 1/n, no bin contains more than Θ(log n / log log n) balls.