Machine Learning Theory (CS 6783)

Problem set 0

September 2, 2014

This assignment is not counted towards your grade and you don't need to submit it. It's meant to brush up on some basics and get your feet wet.

Some facts/results you will need for this assignment:

1. Markov inequality: For any non-negative integrable random variable $X$ and any $\epsilon > 0$,

\[ P(X \ge \epsilon) \le \frac{E[X]}{\epsilon} \]

2. Hoeffding inequality: Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables such that each $X_i \in [a, b]$, and let $\mu$ be the expected value of these random variables. Then,

\[ P\left( \left| \frac{1}{n} \sum_{t=1}^{n} X_t - \mu \right| \ge \epsilon \right) \le 2 \exp\left( -\frac{2 n \epsilon^2}{(b - a)^2} \right) \]

3. Bernstein inequality: Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables such that each $X_i \in [a, b]$, let $\mu$ be the expected value of these random variables, and let $\sigma^2$ be their variance. Then,

\[ P\left( \left| \frac{1}{n} \sum_{t=1}^{n} X_t - \mu \right| \ge \epsilon \right) \le 2 \exp\left( -\frac{n \epsilon^2}{2 \sigma^2 + (b - a)\epsilon / 3} \right) \]

4. Hoeffding-Azuma inequality: Let $(X_t)_{t \ge 0}$ be a martingale such that $|X_t - X_{t-1}| \le c$ for every $t \ge 1$. Then, for any $n$,

\[ P\left( |X_n - X_0| \ge \epsilon \right) \le 2 \exp\left( -\frac{\epsilon^2}{2 n c^2} \right) \]

Q1

Let $X_1, \ldots, X_n$ be $n$ independent identically distributed (iid) random variables that are possibly unbounded having expected value $\mu$. Also assume that for any $i \in [n]$, $|X_i| \le c$. Use the Markov inequality to provide a bound of the form

\[ P\left( \left| \frac{1}{n} \sum_{t=1}^{n} X_t - \mu \right| \ge \epsilon \right) \le F(n, c, \epsilon) \]

What is the form of $F(n, c, \epsilon)$?

Hint:

(a) Write $\mu$ as $E\left[ \frac{1}{n} \sum_{t=1}^{n} X'_t \right]$, where $X'_1, \ldots, X'_n$ are drawn iid from the same distribution.

(b) Notice that $X_t - X'_t$ has the same distribution as $\epsilon_t (X_t - X'_t)$, where each $\epsilon_t$ is a Rademacher random variable (i.e. a $\{\pm 1\}$-valued random variable which is either $+1$ or $-1$ with equal probability).

(c) Use the fact that $E\left| \frac{1}{n} \sum_{t=1}^{n} \epsilon_t \right| \le \sqrt{\frac{2}{n}}$.

Q2 Markov vs. Hoeffding vs. Bernstein

Let $D$ be some distribution over the interval $[a, b]$ such that the expectation of a random variable $X$ drawn from $D$ is $\mu$ and its variance is $\sigma^2$. We want the following statement to hold: for any $\delta > 0$ and $\epsilon > 0$, as long as $n > n(\epsilon, \delta)$, if $X_1, \ldots, X_n$ are drawn iid from $D$, then with probability at least $1 - \delta$,

\[ \left| \frac{1}{n} \sum_{t=1}^{n} X_t - \mu \right| \le \epsilon \]

What is the $n(\epsilon, \delta)$ implied by

(a) the Markov bound (more specifically, the bound from Q1)

(b) the Hoeffding inequality

(c) the Bernstein inequality

Which of the three is better when

(a) $\sigma^2$ is much smaller than $(b - a)^2$ and $\delta$ is large (say $1/2$)

(b) $\sigma^2$ is much smaller than $(b - a)^2$ and $\delta$ is small

(c) $\sigma^2$ is large and $\delta$ is small

Basically, get a feel for when each of these bounds is useful.

Q3

In this question we will learn to derive what is called the bounded difference inequality, or McDiarmid's inequality, using the Hoeffding-Azuma inequality. The bounded difference inequality states that:

Theorem 1. Consider independent $\mathcal{X}$-valued random variables $X_1, \ldots, X_n$. Let $F : \mathcal{X}^n \to \mathbb{R}$ be any function such that for any $x_1, \ldots, x_n, x'_1, \ldots, x'_n$ and any $i \in [n]$,

\[ \left| F(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - F(x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n) \right| \le c \]

(the above is called the bounded difference property). Then we have that

\[ P\left( \left| F(X_1, \ldots, X_n) - E[F] \right| \ge \epsilon \right) \le 2 \exp\left( -\frac{\epsilon^2}{2 n c^2} \right) \]

Prove the above theorem using the Hoeffding-Azuma bound.

Hint:

(a) Define the right martingale sequence using conditional expectations of the function.

(b) Use the bounded difference property to show that the premise of the Hoeffding-Azuma bound holds.
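As a numerical sanity check on the Hoeffding and Bernstein inequalities listed at the top of this problem set, the following minimal Python sketch evaluates their right-hand sides and compares them against a Monte Carlo estimate of the deviation probability. It is only an illustration under stated assumptions: the distribution is taken to be Bernoulli($p$) on $[0, 1]$, the function names (`hoeffding_bound`, `bernstein_bound`, `empirical_tail`) are hypothetical, and the Bernstein denominator follows the form written in this problem set (constants vary slightly across textbooks). It does not answer Q1 to Q3.

```python
import math
import random


def hoeffding_bound(n, eps, a, b):
    """Right-hand side of the Hoeffding inequality as stated in this problem set."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / (b - a) ** 2)


def bernstein_bound(n, eps, a, b, var):
    """Right-hand side of the Bernstein inequality as stated in this problem set."""
    return 2.0 * math.exp(-n * eps ** 2 / (2.0 * var + (b - a) * eps / 3.0))


def empirical_tail(n, eps, p, trials, rng):
    """Monte Carlo estimate of P(|sample mean - p| >= eps) for n Bernoulli(p) draws.

    Assumption for illustration: X_t ~ Bernoulli(p), so a = 0, b = 1, mu = p.
    """
    hits = 0
    for _ in range(trials):
        mean = sum(1 for _ in range(n) if rng.random() < p) / n
        # Small tolerance so floating-point rounding does not drop boundary cases.
        if abs(mean - p) >= eps - 1e-12:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for reproducibility
    n, eps, p = 200, 0.1, 0.05      # low-variance regime: var = p(1 - p) is small
    a, b = 0.0, 1.0
    var = p * (1.0 - p)
    print("empirical estimate:", empirical_tail(n, eps, p, 2000, rng))
    print("Hoeffding bound:   ", hoeffding_bound(n, eps, a, b))
    print("Bernstein bound:   ", bernstein_bound(n, eps, a, b, var))
```

Varying `p` (and hence the variance) and `eps` in this sketch gives a rough empirical feel for the regimes Q2 asks about; the Monte Carlo estimate is noisy for very small probabilities, so it should be read as a plausibility check rather than a verification.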