Machine Learning theory (CS 6783)
Problem set 0
September 2, 2014
This assignment is not counted towards your grade and you don't need to submit it. It's meant to brush up on some basics and get your hands wet.
Some facts/results you will need for this assignment (a quick numerical sanity check of these bounds follows the list):
1. Markov inequality: For any non-negative integrable random variable $X$, and any $\epsilon > 0$,
$$P(X \geq \epsilon) \leq \frac{E[X]}{\epsilon}$$
2. Hoeffding inequality: Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables such that each $X_i \in [a, b]$ and let $\mu$ be the expected value of these random variables. Then, for any $\epsilon > 0$,
$$P\left(\left|\frac{1}{n}\sum_{t=1}^{n} X_t - \mu\right| \geq \epsilon\right) \leq 2\exp\left(-\frac{2n\epsilon^2}{(b-a)^2}\right)$$
3. Bernstein inequality: Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables such that each $X_i \in [a, b]$, let $\mu$ be the expected value of these random variables, and let $\sigma^2$ be their variance. Then, for any $\epsilon > 0$,
$$P\left(\left|\frac{1}{n}\sum_{t=1}^{n} X_t - \mu\right| \geq \epsilon\right) \leq 2\exp\left(-\frac{n\epsilon^2}{2\sigma^2 + (b-a)\epsilon/3}\right)$$
4. Hoeffding-Azuma inequality: Let $(X_t)_{t \geq 0}$ be a martingale such that for any $t \geq 1$, $|X_t - X_{t-1}| \leq c$. Then, for any $n$ and any $\epsilon > 0$,
$$P\left(|X_n - X_0| \geq \epsilon\right) \leq 2\exp\left(-\frac{\epsilon^2}{2nc^2}\right)$$
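As a quick sanity check of facts 2 and 3 (not part of the original assignment), the sketch below estimates the deviation probability of a sample mean by Monte Carlo and compares it against the Hoeffding and Bernstein bounds. The Bernoulli distribution and the values of n, p, and ε below are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative sanity check: sample mean of iid Bernoulli(p) variables on [0, 1];
# empirical tail probability vs. the Hoeffding and Bernstein upper bounds.
rng = np.random.default_rng(0)
n, p, eps, trials = 100, 0.1, 0.1, 20000   # arbitrary illustrative choices
a, b = 0.0, 1.0
mu, sigma2 = p, p * (1 - p)

samples = rng.binomial(1, p, size=(trials, n))
deviations = np.abs(samples.mean(axis=1) - mu)
empirical = np.mean(deviations >= eps)

hoeffding = 2 * np.exp(-2 * n * eps**2 / (b - a) ** 2)
bernstein = 2 * np.exp(-n * eps**2 / (2 * sigma2 + (b - a) * eps / 3))

print(f"empirical P(|mean - mu| >= eps) ≈ {empirical:.4f}")
print(f"Hoeffding bound                 = {hoeffding:.4f}")
print(f"Bernstein bound                 = {bernstein:.4f}")
```

With a small variance (here σ² = 0.09), Bernstein comes out noticeably tighter than Hoeffding, which is the kind of comparison Q2 asks about.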
Q1 Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables, possibly unbounded, having expected value $\mu$. Also assume that for any $i \in [n]$, $|X_i| \leq c$. Use the Markov inequality to provide a bound of the form
$$P\left(\left|\frac{1}{n}\sum_{t=1}^{n} X_t - \mu\right| \geq \epsilon\right) \leq F(n, c, \epsilon)$$
What is the form of $F(n, c, \epsilon)$?
Hint:
(a) Write $\mu$ as $E\left[\frac{1}{n}\sum_{t=1}^{n} X'_t\right]$ where $X'_1, \ldots, X'_n$ are drawn iid from the same distribution.
(b) Notice that $X_t - X'_t$ has the same distribution as $\epsilon_t (X_t - X'_t)$ where each $\epsilon_t$ is a Rademacher random variable (i.e., a $\{\pm 1\}$-valued random variable which is either $+1$ or $-1$ with equal probability).
(c) Use the fact that $E\left[\left|\frac{1}{n}\sum_{t=1}^{n} \epsilon_t\right|\right] \leq \sqrt{2/n}$ (a quick numerical check of this fact follows).
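The following Monte Carlo sketch (not part of the original assignment) checks hint fact (c) numerically; the values of n and the number of trials are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative check of hint (c): estimate E|1/n * sum_t eps_t| for iid Rademacher
# eps_t by Monte Carlo and compare it to the stated upper bound sqrt(2/n).
rng = np.random.default_rng(0)
n, trials = 100, 50000                     # arbitrary illustrative choices

eps = rng.choice([-1.0, 1.0], size=(trials, n))
estimate = np.mean(np.abs(eps.mean(axis=1)))

print(f"Monte Carlo estimate of E|1/n sum eps_t| ≈ {estimate:.4f}")
print(f"sqrt(2/n)                                = {np.sqrt(2.0 / n):.4f}")
```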
Q2 Markov vs. Hoeffding vs. Bernstein
Let $D$ be some distribution over the interval $[a, b]$ such that the expectation of a random variable $X$ drawn from $D$ is $\mu$ and its variance is $\sigma^2$. We want the following statement to hold:
For any $\delta > 0$ and $\epsilon > 0$, as long as $n > n(\epsilon, \delta)$, if $X_1, \ldots, X_n$ are drawn iid from $D$, then with probability at least $1 - \delta$,
$$\left|\frac{1}{n}\sum_{t=1}^{n} X_t - \mu\right| \leq \epsilon$$
What is $n(\epsilon, \delta)$ implied by
(a) the Markov bound (more specifically, the bound from Q1)
(b) Hoeffding inequality
(c) Bernstein inequality
Which of the three is better when
(a) $\sigma^2$ is much smaller compared to $(b - a)^2$ and $\delta$ is large (say $1/2$)?
(b) $\sigma^2$ is much smaller compared to $(b - a)^2$ and $\delta$ is small?
(c) $\sigma^2$ is large and $\delta$ is small?
Basically, get a feel for when each of these bounds is useful.
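To help build that feel (not part of the original assignment), the sketch below numerically finds, for a given ε and δ, roughly how large n must be before each bound drops below δ. The interval, variance, ε, and δ are arbitrary illustrative choices, and the Markov-based bound is left as a placeholder to be filled in with your answer from Q1.

```python
import numpy as np

# Illustrative comparison: scan n and report (approximately) the smallest sample size
# at which each tail bound falls below delta.
a, b = 0.0, 1.0
sigma2 = 0.01                     # variance much smaller than (b - a)**2
eps, delta = 0.05, 0.5            # arbitrary illustrative choices

def hoeffding_bound(n):
    return 2 * np.exp(-2 * n * eps**2 / (b - a) ** 2)

def bernstein_bound(n):
    return 2 * np.exp(-n * eps**2 / (2 * sigma2 + (b - a) * eps / 3))

def markov_bound(n):
    # Placeholder: fill in your F(n, c, eps) from Q1 here.
    raise NotImplementedError

def smallest_n(bound, n_max=10**7):
    for n in np.unique(np.logspace(0, np.log10(n_max), 300).astype(int)):
        if bound(n) <= delta:
            return n
    return None

print("Hoeffding n(eps, delta) ≈", smallest_n(hoeffding_bound))
print("Bernstein n(eps, delta) ≈", smallest_n(bernstein_bound))
# print("Markov    n(eps, delta) ≈", smallest_n(markov_bound))  # after filling in Q1
```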
Q3 In this question we will learn to derive what is called the bounded difference inequality, or McDiarmid's inequality, using the Hoeffding-Azuma inequality. The bounded difference inequality states that:
Theorem 1. Consider independent $\mathcal{X}$-valued random variables $X_1, \ldots, X_n$. Let $F : \mathcal{X}^n \mapsto \mathbb{R}$ be any function such that for any $x_1, \ldots, x_n$, $x'_1, \ldots, x'_n$ and any $i \in [n]$,
$$\left|F(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - F(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n)\right| \leq c$$
(the above is called the bounded difference property). Then we have that
$$P\left(|F(X_1, \ldots, X_n) - E[F]| \geq \epsilon\right) \leq 2\exp\left(-\frac{\epsilon^2}{2nc^2}\right)$$
Prove the above theorem using the Hoeffding-Azuma bound.
Hint:
(a) Define the right martingale sequence using conditional expectations of the function.
(b) Use the bounded difference property to show that the premise of the Hoeffding-Azuma bound holds.
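As an illustrative sanity check of Theorem 1 (not part of the original assignment), take F to be the sample mean of n iid Uniform(0, 1) variables, which satisfies the bounded difference property with c = 1/n; the values of n, ε, and the number of trials below are arbitrary choices.

```python
import numpy as np

# Illustrative check of the bounded difference inequality: F = sample mean of n iid
# Uniform(0, 1) variables. Changing one coordinate changes F by at most c = 1/n, so
# Theorem 1 gives P(|F - E[F]| >= eps) <= 2 * exp(-eps**2 / (2 * n * c**2)).
rng = np.random.default_rng(0)
n, eps, trials = 500, 0.1, 20000   # arbitrary illustrative choices
c = 1.0 / n

X = rng.uniform(0.0, 1.0, size=(trials, n))
F = X.mean(axis=1)
empirical = np.mean(np.abs(F - 0.5) >= eps)        # E[F] = 0.5 for Uniform(0, 1)
bound = 2 * np.exp(-eps**2 / (2 * n * c**2))

print(f"empirical P(|F - E[F]| >= eps) ≈ {empirical:.4f}")
print(f"bounded-difference bound       = {bound:.4f}")
```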