The Chernoff bound
Speaker: Chuang-Chieh Lin
Coworker: Chih-Chieh Hung
Advisor: Maw-Shang Chang
National Chung-Cheng University and
National Chiao-Tung University
Outline
• Introduction
• The Chernoff bound
• Markov's Inequality
• Moment Generating Functions
• The Chernoff Bound for a Sum of Poisson Trials
• The Chernoff Bound for Special Cases
• Set Balancing Problem
• References
Introduction
• Goal:
  – The Chernoff bound can be used to analyze the tail of the distribution of a sum of independent random variables, with some extensions to the case of dependent or correlated random variables.
• Markov's Inequality and moment generating functions, which we introduce first, will be needed throughout.
Math tool
Professor Herman Chernoff's bound, Annals of Mathematical Statistics, 1952.
Chernoff bounds
In its most general form, the Chernoff bound for a random variable X is obtained as follows: for any t > 0,

  Pr[X ≥ a] ≤ E[e^{tX}] / e^{ta},

where E[e^{tX}] is a moment generating function, or equivalently,

  ln Pr[X ≥ a] ≤ −ta + ln E[e^{tX}].

The value of t that minimizes E[e^{tX}] / e^{ta} gives the best possible bound.
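As a quick illustration (not part of the original slides), the following Python sketch evaluates the generic bound E[e^{tX}] / e^{ta} on a grid of t > 0 for a Binomial(n, p) variable, whose mgf is known in closed form, and compares the best bound with the exact tail probability. All parameter values are arbitrary.

```python
# A minimal sketch, assuming X ~ Binomial(n, p); the numbers are illustrative.
# It evaluates the generic Chernoff bound  Pr[X >= a] <= E[e^{tX}] / e^{ta}
# over a grid of t > 0, keeps the smallest value, and compares it with the
# exact tail probability.
import numpy as np
from scipy.stats import binom

n, p, a = 100, 0.5, 65

ts = np.linspace(1e-3, 5, 1000)           # candidate values of t > 0
mgf = (1 - p + p * np.exp(ts)) ** n       # E[e^{tX}] for Binomial(n, p)
bounds = mgf / np.exp(ts * a)             # E[e^{tX}] / e^{ta} for each t
print("best Chernoff bound:  ", bounds.min())
print("exact tail Pr[X >= a]:", binom.sf(a - 1, n, p))
```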
Markov's Inequality
For any random variable X ≥ 0 and any a > 0,

  Pr[X ≥ a] ≤ E[X] / a.

We can use Markov's Inequality to derive the famous Chebyshev's Inequality:

  Pr[|X − E[X]| ≥ a] = Pr[(X − E[X])² ≥ a²] ≤ Var[X] / a².
Proof of the Chernoff bound
It follows directly from Markov's inequality: since e^{tX} is a nonnegative random variable, for any t > 0,

  Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E[e^{tX}] / e^{ta}.

So, how do we calculate the term E[e^{tX}]?
Moment Generating Functions

  M_X(t) = E[e^{tX}].

This function gets its name because we can generate the i-th moment by differentiating M_X(t) i times and then evaluating the result at t = 0:

  (d^i/dt^i) M_X(t) |_{t=0} = E[X^i],

the i-th moment of the random variable X.
Moment Generating Functions (cont'd)
We can easily see why the moment generating function works as follows (taking X to be discrete):

  (d^i/dt^i) M_X(t) |_{t=0} = (d^i/dt^i) E[e^{tX}] |_{t=0}
                            = (d^i/dt^i) Σ_s e^{ts} Pr[X = s] |_{t=0}
                            = Σ_s (d^i/dt^i) e^{ts} Pr[X = s] |_{t=0}
                            = Σ_s s^i e^{ts} Pr[X = s] |_{t=0}
                            = Σ_s s^i Pr[X = s]
                            = E[X^i].
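A short symbolic check of this fact (not from the slides; it assumes SymPy is available): for a Poisson(λ) variable the mgf is exp(λ(e^t − 1)), and differentiating it once and twice at t = 0 should recover the moments λ and λ + λ².

```python
# A minimal sketch, assuming SymPy is available; the Poisson example is
# illustrative.  Differentiating the mgf i times and evaluating at t = 0
# gives the i-th moment E[X^i].
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))                   # mgf of Poisson(lam)

first = sp.simplify(sp.diff(M, t, 1).subs(t, 0))    # E[X]   = lam
second = sp.expand(sp.diff(M, t, 2).subs(t, 0))     # E[X^2] = lam**2 + lam
print(first, second)
```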
Moment Generating Functions (cont'd)
• Fact: If M_X(t) = M_Y(t) for all t ∈ (−c, c) for some c > 0, then X and Y have the same distribution.
• If X and Y are two independent random variables, then
    M_{X+Y}(t) = M_X(t) M_Y(t).
• Let X_1, ..., X_k be independent random variables with mgf's M_1(t), ..., M_k(t). Then the mgf of the random variable Y = Σ_{i=1}^{k} X_i is given by
    M_Y(t) = Π_{i=1}^{k} M_i(t).
Chernoff bound for the sum of Poisson trials
• Poisson trials:
  – The distribution of a sum of independent 0-1 random variables, which may not be identically distributed.
• Bernoulli trials:
  – The same as above, except that all the random variables are identically distributed.
Chernoff bound for the sum of Poisson
trials (cont’d)
F X i : i = 1; : : : ; n; mut ually independent 0-1 random variables wit h
P r [X i = 1] = pi and P r [X i = 0] = 1 ¡ pi .
Let X = X 1 + : : : + X n and E[X ] = ¹ = p1 + : : : + pn .
) M X (t) = E[et X i ] = pi et ¢1 + (1 ¡ pi )et ¢0 = pi et + (1 ¡ pi )
i
= 1 + pi (et ¡ 1) · epi ( et ¡
1) .
(Since 1 + y ≤ e y.)
F M X (t) = E[et X ] = M X (t)M X (t) : : : M X (t) · e(p1 + p2 + :::+ pn )(et ¡
1
= e( et ¡
1) ¹
2
1)
n
,
since ¹ = p1 + p2 + : : : + pn .
We will use this result later.
12
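The following short numerical check (not from the slides; the p_i values are arbitrary) confirms that the exact mgf of a sum of Poisson trials never exceeds the bound e^{(e^t − 1)μ} derived above.

```python
# A minimal sketch with illustrative p_i values (not from the slides).
# Exact mgf of the sum: prod_i (1 + p_i (e^t - 1)); bound: exp((e^t - 1) * mu).
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.0, 1.0, size=20)      # arbitrary success probabilities
mu = p.sum()

for t in (0.1, 0.5, 1.0, 2.0):
    exact = np.prod(1 + p * (np.exp(t) - 1))   # product of individual mgfs
    bound = np.exp((np.exp(t) - 1) * mu)
    print(t, exact <= bound)                   # True for every t
```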
Chernoff bound for the sum of Poisson trials (cont'd)
Theorem 1: Let X = X_1 + ... + X_n, where X_1, ..., X_n are n independent Poisson trials such that Pr[X_i = 1] = p_i for each i = 1, 2, ..., n, and let μ = E[X]. Then,
  (1) for any δ > 0, Pr[X ≥ (1 + δ)μ] ≤ ( e^δ / (1 + δ)^{1+δ} )^μ;
  (2) for δ ∈ (0, 1], Pr[X ≥ (1 + δ)μ] ≤ e^{−μδ²/3};
  (3) for R ≥ 6μ, Pr[X ≥ R] ≤ 2^{−R}.
Proof of Theorem 1:
By Markov's inequality (for any random variable X ≥ 0 and any a > 0, Pr[X ≥ a] ≤ E[X]/a), for any t > 0 we have

  Pr[X ≥ (1 + δ)μ] = Pr[e^{tX} ≥ e^{t(1+δ)μ}] ≤ E[e^{tX}] / e^{t(1+δ)μ} ≤ e^{(e^t − 1)μ} / e^{t(1+δ)μ}.

For any δ > 0, setting t = ln(1 + δ) > 0 gives (1).

To prove (2), we need to show that for 0 < δ ≤ 1, e^δ / (1 + δ)^{1+δ} ≤ e^{−δ²/3}. Taking the logarithm of both sides, this is equivalent to δ − (1 + δ) ln(1 + δ) + δ²/3 ≤ 0, which can be proved with calculus.

To prove (3), let R = (1 + δ)μ. Then, for R ≥ 6μ, δ = R/μ − 1 ≥ 5. Hence, using (1),

  Pr[X ≥ (1 + δ)μ] ≤ ( e^δ / (1 + δ)^{1+δ} )^μ ≤ ( e / (1 + δ) )^{(1+δ)μ} ≤ (e/6)^R ≤ 2^{−R}.
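An illustrative simulation (not part of the slides; all parameter values are arbitrary) comparing the empirical upper tail of a sum of Poisson trials with bound (2) of Theorem 1:

```python
# A minimal sketch with illustrative parameters (not from the slides):
# compare the empirical probability of X >= (1 + delta) * mu with the
# Chernoff bound exp(-mu * delta**2 / 3) from Theorem 1 (2).
import numpy as np

rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=50)       # arbitrary success probabilities
mu, delta, trials = p.sum(), 0.2, 200_000

samples = (rng.random((trials, p.size)) < p).sum(axis=1)   # draws of X
empirical = np.mean(samples >= (1 + delta) * mu)
bound = np.exp(-mu * delta**2 / 3)
print(f"empirical tail: {empirical:.4f}   Chernoff bound: {bound:.4f}")
```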
• Similarly, we have bounds for the lower tail:

(Figure: the probability distribution of X, with μ − δμ, μ, and μ + δμ marked.)

Theorem: Let X = Σ_{i=1}^{n} X_i, where X_1, ..., X_n are n independent Poisson trials such that Pr[X_i = 1] = p_i. Let μ = E[X]. Then, for 0 < δ < 1:
  (1) Pr[X ≤ (1 − δ)μ] ≤ ( e^{−δ} / (1 − δ)^{1−δ} )^μ;
  (2) Pr[X ≤ (1 − δ)μ] ≤ e^{−μδ²/2}.

Corollary: For 0 < δ < 1, Pr[|X − μ| ≥ δμ] ≤ 2 e^{−μδ²/3}.
• Example: Let X be the number of heads in n independent fair coin flips. Applying the above Corollary, we have:

  Pr[ |X − n/2| ≥ (1/2)√(6 n ln n) ] ≤ 2 exp( −(1/3)(n/2)(6 ln n / n) ) = 2/n;

  Pr[ |X − n/2| ≥ n/4 ] ≤ 2 exp( −(1/3)(n/2)(1/4) ) = 2 e^{−n/24}.

By Chebyshev's inequality, i.e. Pr[|X − E[X]| ≥ a] ≤ Var[X]/a², we only get Pr[|X − n/2| ≥ n/4] ≤ 4/n, so the Chernoff bound is much better!
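A quick numerical comparison of the two bounds in this example (not from the slides; the values of n are arbitrary):

```python
# A minimal sketch (illustrative n values): for the event |X - n/2| >= n/4,
# compare the Chebyshev bound 4/n with the Chernoff bound 2*exp(-n/24).
import math

for n in (50, 100, 200, 400):
    chebyshev = 4 / n
    chernoff = 2 * math.exp(-n / 24)
    print(f"n={n:4d}  Chebyshev: {chebyshev:.4f}   Chernoff: {chernoff:.2e}")
```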
Better bounds for special cases
Theorem: Let X = X_1 + ... + X_n, where X_1, ..., X_n are n independent random variables with Pr[X_i = 1] = Pr[X_i = −1] = 1/2. For any a > 0, Pr[X ≥ a] ≤ e^{−a²/2n}.

Proof: For any t > 0, E[e^{tX_i}] = e^{t·1}/2 + e^{t·(−1)}/2. Since e^t = 1 + t + t²/2! + ... + t^i/i! + ... and e^{−t} = 1 − t + t²/2! + ... + (−1)^i t^i/i! + ..., using the Taylor series we have

  E[e^{tX_i}] = Σ_{i≥0} t^{2i}/(2i)! ≤ Σ_{i≥0} (t²/2)^i / i! = e^{t²/2}.

Hence E[e^{tX}] = Π_{i=1}^{n} E[e^{tX_i}] ≤ e^{t²n/2}, and Pr[X ≥ a] = Pr[e^{tX} ≥ e^{ta}] ≤ E[e^{tX}] / e^{ta} ≤ e^{t²n/2} / e^{ta}. Setting t = a/n, we have Pr[X ≥ a] ≤ e^{−a²/2n}. By symmetry, we also have Pr[X ≤ −a] ≤ e^{−a²/2n}.
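An illustrative empirical check of this special-case bound (not from the slides; n, a, and the number of trials are arbitrary):

```python
# A minimal sketch (illustrative parameters): empirical tail of a sum of n
# independent +/-1 variables versus the bound exp(-a**2 / (2*n)).
import numpy as np

rng = np.random.default_rng(2)
n, a, trials = 100, 20, 100_000
X = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)
print("empirical Pr[X >= a]:", np.mean(X >= a))
print("bound exp(-a^2/2n):  ", np.exp(-a**2 / (2 * n)))
```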
Better bounds for special cases (cont'd)
Corollary: Let X = X_1 + ... + X_n, where X_1, ..., X_n are n independent random variables with Pr[X_i = 1] = Pr[X_i = −1] = 1/2. For any a > 0, Pr[|X| ≥ a] ≤ 2 e^{−a²/2n}.

Letting Y_i = (X_i + 1)/2, we have the following corollary.
Better bounds for special cases (cont'd)
Corollary: Let Y = Y_1 + ... + Y_n, where Y_1, ..., Y_n are n independent random variables with Pr[Y_i = 1] = Pr[Y_i = 0] = 1/2. Let μ = E[Y] = n/2.
  (1) For any a > 0, Pr[Y ≥ μ + a] ≤ e^{−2a²/n}.
  (2) For any δ > 0, Pr[Y ≥ (1 + δ)μ] ≤ e^{−δ²μ}.
  (3) For any μ > a > 0, Pr[Y ≤ μ − a] ≤ e^{−2a²/n}.
  (4) For any 1 > δ > 0, Pr[Y ≤ (1 − δ)μ] ≤ e^{−δ²μ}.

Note: The details can be left as exercises. (See [MU05], pp. 70-71.)
Exercise
Let X be a random variable such that X ~ Geometric(p). Please derive the Chernoff bound for X, i.e., prove that for any a > 0,

  Pr[X > a] ≤ K e^{−ap},

for some constant K.
An Application: Set Balancing
• Given an n × m matrix A with entries in {0, 1}, consider the product

  [ a_11 a_12 ... a_1m ] [ v_1 ]   [ c_1 ]
  [ a_21 a_22 ... a_2m ] [ v_2 ]   [ c_2 ]
  [  :    :         :  ] [  :  ] = [  :  ]
  [ a_n1 a_n2 ... a_nm ] [ v_m ]   [ c_n ]

• Suppose that we are looking for a vector v with entries in {−1, 1} that minimizes

  ||A v||_∞ = max_{i=1,...,n} |c_i|.
Set Balancing (cont’d)
• The problem arises in designing statistical experiments.
• Each column of matrix A represents a subject in the
experiment and each row represents a feature.
• The vector v partitions the subjects into two disjoint
groups, so that each feature is roughly as balanced as
possible between the two groups.
Set Balancing (cont'd)
For example, take A with rows indexed by features and columns by animals (Zebra, Tiger, Whale, Penguin):

  Carnivorous:  0 1 0 0
  Terrestrial:  1 1 0 0
  Mammal:       1 1 1 0
  Lays eggs:    0 0 0 1

Choosing v = (1, 1, −1, −1), i.e., putting Zebra and Tiger in one group and Whale and Penguin in the other, gives c = A v = (1, 2, 1, −1).

We obtain ||A v||_∞ = 2.
Set Balancing (cont'd)
Set balancing: Given an n × m matrix A with entries 0 or 1, let v be an m-dimensional vector with entries in {1, −1} and let c be the n-dimensional vector such that A v = c.

Theorem: For a random vector v with entries chosen randomly and with equal probability from the set {1, −1},

  Pr[ max_i |c_i| ≥ √(4 m ln n) ] ≤ 2/n.

(Figure: the n × m matrix A times the randomly chosen m-vector v gives the n-vector c.)
Proof of Set Balancing:
Proof: Consider the i-th row of A: a_i = (a_{i,1}, ..., a_{i,m}). Suppose there are k 1's in a_i. If k ≤ √(4 m ln n), then clearly |a_i · v| ≤ √(4 m ln n). Suppose k > √(4 m ln n); then there are k non-zero terms in Z_i = Σ_{j=1}^{m} a_{i,j} v_j, which are independent random variables, each with probability 1/2 of being either +1 or −1. By the Chernoff bound and the fact that m ≥ k, we have

  Pr[ |Z_i| > √(4 m ln n) ] ≤ 2 e^{−4 m ln n / (2k)} ≤ 2/n².

By the union bound,

  Pr[ ∪_{i=1}^{n} ( |Z_i| > √(4 m ln n) ) ] ≤ n · (2/n²) = 2/n.
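An illustrative simulation of the set-balancing theorem (not from the slides; the dimensions and trial count are arbitrary):

```python
# A minimal sketch (illustrative dimensions): draw a random 0/1 matrix A and
# a random +/-1 vector v, and estimate how often max_i |c_i| reaches
# sqrt(4 * m * ln n); the theorem says this happens with probability <= 2/n.
import numpy as np

rng = np.random.default_rng(3)
n, m, trials = 100, 200, 1000
threshold = np.sqrt(4 * m * np.log(n))

failures = 0
for _ in range(trials):
    A = rng.integers(0, 2, size=(n, m))     # random 0/1 entries
    v = rng.choice([-1, 1], size=m)         # random +/-1 signs
    c = A @ v
    failures += np.max(np.abs(c)) >= threshold
print("empirical failure rate:", failures / trials, "  bound 2/n =", 2 / n)
```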
References
• [MR95] Rajeev Motwani and Prabhakar Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
• [MU05] Michael Mitzenmacher and Eli Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.
• Lecture slides of Professor Shi-Chun Tsai (蔡錫鈞).