* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 14.1 Covering and Packing - Department of Statistics, Yale
Survey
Document related concepts
Transcript
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 14: Packing, covering, and consequences on minimax risk Lecturer: Yihong Wu Scribe: Yingxiang Yang, Mar 10, 2016 [Ed. Mar 12] Last lecture, we lower bounded minkθ−θ̂k I(θ; θ̂) using Shannon lower bound, and we saw that for the p dimensional n sample GLM, R∗ (Rp ) & 1 2 p nvol (Bk·k ) with respect to a loss `(θ, θ̂) = kθ − θ̂k2 and an arbitrary norm k · k. To understand why some sort of volume shows up, we further extend the lower bound obtained using Fano’s method. We first introduce the concept of packing, covering, relate them to the notion of volume, and then plug them into the lower bound obtained using the Fano’s inequality. When applied to GLM, this alternative method gives the same dependence on the dimension and the sample size for `q norms with q < ∞. 14.1 Covering and Packing Definition 14.1 (-covering). Let (V, k · k) be a normed space, and Θ ⊂ V . {V1 , ..., VN } is an -covering of Θ if Θ ⊂ ∪N i=1 B(Vi , ), or equivalently, ∀θ ∈ Θ, ∃i such that kθ − Vi k ≤ . Definition 14.2 (-packing). Let (V, k·k) be a normed space, and Θ ⊂ V . {θ1 , ..., θM } is an -packing of Θ if mini6=j kθi − θj k > (notice the inequality is strict), or equivalently ∩M i=1 B(θi , /2) = ∅. Upon defining -covering and -packing, one naturally asks what is the minimal number of -balls one needs in order to cover Θ, and what is the maximal number of /2-balls one can pack in Θ. Those numbers are defined as covering and packing numbers. Definition 14.3 (Covering number). N (Θ, k · k, ) := min{n : ∃-covering over Θ of size n}. Definition 14.4 (Packing number). M (Θ, k · k, ) := max{m : ∃-packing of Θ of size m}. Remark 14.1. Some basic remarks. • M (Θ, k · k, ) and N (Θ, k · k, ) are often abbreviated as M (), N (). • For -covering, the balls need not be disjoint. • N (Θ, k · k, ) is a decreasing function of when the norm and Θ are fixed. That is, if 0 < 1 , N and {V1 , ..., VN } is an -covering of Θ, then Θ ⊂ ∪N i=1 B(Vi , 0 ) ⊂ ∪i=1 B(Vi , 1 ). • Metric entropy: log M () and log N (). • N () < ∞ ∀ > 0 ⇔ Θ is totally bounded (In topology, a metric space is said to be totally bounded if for every > 0 there is a finite covering of the space by -balls). For example, a metric space is compact iff it is complete and totally bounded. Hence a compact metric space is totally bounded. 1 The relation between the packing number and the covering number is described in the following theorem. Theorem 14.1. Let (V, k · k) be a normed space, and Θ ⊂ V . Then (a) (b) M (Θ, k · k, 2) ≤ N (Θ, k · k, ) ≤ M (Θ, k · k, ). Proof. First prove part (b). Suppose E = {θ1 , ..., θM } is a maximal packing. Then ∀θ ∈ Θ\E, ∃i such that kθ − θi k ≤ (if this does not hold for θ then we can construct a bigger packing with θM +1 = θ). Hence E is automatically an -covering. Since N (Θ, k · k, ) is the minimal size of all possible coverings, we have M (Θ, k · k, ) ≥ N (Θ, k · k, ). We next prove part (a) by contradiction. Suppose there exists a 2-packing {θ1 , ..., θM } and an -covering {x1 , ..., xN } such that M ≥ N + 1. Then by pigeonhole, we must have θi and θj belonging to the same -ball B(xk , ) for some i = 6 j and k. This means that the distance between θi and θj cannot be more than the diameter of the ball, i.e., kθi − θj k ≤ 2, which leads to a contradiction since kθi − θj k > 2 for a 2-packing. Hence the size of any 2-packing is less or equal to the size of any -covering. Hence M (Θ, k · k, 2), the maximal size of a 2-packing is at most N (Θ, k · k, ), the minimal size of an -covering. When (V, k · k) is the d-dimensional Euclidean space, we can extend the previous theorem by further lower/upper bounding the covering/packing numbers. The result is given as follows. Theorem 14.2. Let Θ ⊂ V = Rd . Then d (b) vol(Θ + B) 1 vol(Θ) (a) 2 ≤ N (Θ, k · k, ) ≤ M (Θ, k · k, ) ≤ vol(B) vol( 2 B) (c) ≤ Θ convex B⊂Θ vol( 23 Θ) = vol( 2 B) d 3 vol(Θ) . vol(B) where + is the Minkovski sum, and B() is the norm ball with radius and B is the unit norm ball. Proof. First prove (a). For a covering of minimal size, Θ ⊂ ∪ni=1 B(Xi , ). Hence N () N () vol(Θ) ≤ vol(∪i=1 B(Xi , )) ≤ X vol(B(Xi , )). i=1 Since vol(B(Xi , )) = d vol(B), we have vol(Θ) ≤ N ()d vol(B). Hence (a) is proved. M () Next we prove (b). For an -packing, the balls B(θi , /2) are disjoint, and ∪i=1 B(θi , /2) ⊂ Θ + 2 B. Taking the volume on both sides, we have M () vol(Θ + B) ≥ vol(∪i=1 B(θi , /2)) = M ()vol( B). 2 2 This proves (b). To prove (c), we prove two statements. (1) When B ⊂ Θ, Θ + 2 B ⊂ Θ + 12 Θ, and (2) when Θ is convex, Θ + 12 Θ = 32 Θ. To prove (1), notice for any z ∈ Θ + 2 B, we have z = x + y where x ∈ 2 B and y ∈ Θ. Since x ∈ 2 B ⇒ x ∈ Θ, we immediately have z ∈ Θ + 21 Θ. 2 To prove (2), first notice that ∀θ ∈ 32 Θ, θ = 13 θ + 23 θ. Since 13 θ ∈ 12 Θ, and 23 θ ∈ Θ, 32 Θ ⊆ Θ + 12 Θ. On the other hand, for any x ∈ Θ + 12 Θ, we have x = y + 12 z with y, z ∈ Θ. When Θ is convex, 2 2 1 3 1 3 3 x = 3 y + 3 z ∈ Θ. Hence x ∈ 2 Θ, implying Θ + 2 Θ ⊆ 2 Θ. With (1) and (2), (c) follows immediately. Remark 14.2. Why is Theorem 14.1? • (a) is a converse, saying that the minimal covering size cannot be too small. When combined with N () ≤ M (), this turns into a existential statement: It is possible to construct a packing of size at least vol(Θ)/vol(B()). From the proof we see that this corresponds to a greedy construction. Furthermore, for Hamming space and Hamming distance, this is exactly the Gilbert-Varshanov bound. • (b) is a converse, saying that the maximal packing size cannot be too large. When combined with N () ≤ M (), this turns into a existence statement: there exists a small covering. Example 14.1 (Euclidean norm ball). Consider N (B2 (1), k·k2 , ).1 When ≥ 1, N (B2 (1), k·k2 , ) = 1. When < 1, activating the previous theorem we have (noticing Θ = B2 ) d d vol((1 + 2 )B) 2 d 1 3 vol(Θ) = 1+ = ≤ N () ≤ ≤ . vol(B2 ) vol( 2 B) Hence d log 1 ≤ log N () ≤ d log 3 . This relationship holds for all metric norms (those such that vol(B)d = vol(B)). If we fix the dimension and drive → 0, then because all norms in Euclidean space are equivalent, whenever Θ has interior, log N () = (d + o(1)) log 1 . 14.2 Applying metric entropy + Fano’s inequality to minimax risk We now apply metric entropy and Fano’s inequality to lower bound the minimax risk. The key idea is to reduce estimation over Θ to testing between a packing E = {θ1 , ..., θM } within T ⊂ Θ. Then R∗ (Θ) ≥ R∗ (T ) ≥ Rπ∗ where π is equi-probable over E. Let E = {θ1 , ..., θM } be an -packing on T ⊂ Θ. Let θ̃ be the quantized version of θ̂ restricted to E (θ − X − θ̂ − θ̃), and consider a quadratic loss function `(θ, θ̂) = kθ − θ̂k2 . Recall that radKL (T ) = inf sup D(Pθ kQ), Q θ∈T and diamKL (T ) = sup D(Pθ0 kPθ ). θ,θ0 ∈T 1 For unit ball B2 (1) := {x ∈ Rd : kxk2 ≤ 1} we abbreviate it as B2 . 3 We immediately have i h i M arkov 2 h i 2 h P kθ − θ̂k ≥ P θ 6= θ̃ E kθ − θ̂k2 ≥ ≥ 2 2 2 F ano I(θ; X) + log 2 ≥ 1− 2 log M () 2 radKL (T ) + log 2 1− ≥ 4 log M () 2 diamKL (T ) + log 2 ≥ sup , 1− T ⊂Θ,>0 4 log vol(T ) (14.1) vol(B) where in the last step, the inequality holds true for all choices of T and , and the supremum is placed to obtain a better bound. For GLM, we can use the above method (Fano+packing) to obtain the same result (up to constant factor) by Shannon Lower Bound. Example 14.2 (p-dimensional n-sample GLM). Let Θ = Rp , and T = B2 (s). Then diamKL (T ) = supθ,θ0 ∈T D(Pθ kPθ0 ) = supθ,θ0 ∈T D(N (θ, Ip )⊗n kN (θ0 , Ip )⊗n ) = supθ,θ0 n2 kθ − θ0 k2 = n2 diam2 (T ) = n 2 2 s . By (14.1), we have n 2 2 2 s + log 2 diam (T ) + log 2 KL 1 − = 1 − 2 p . R∗ ≥ vol(T ) 4 4 log log sp vol(B2 ) vol(B) vol(Bk·k ) We now choose and s. Denote vol(Bk·k ) = V , and recall that vol1/p (B2 ) √1 , p i.e., c1 √1p < vol1/p (B2 ) < c2 √1p . If we choose r s = c3 then c24 R∗ ≥ 4nV 2/p p 1 , = c4 √ 1/p , n nV c23 p 2 + log 2 1− p log c1c4c3 ! c24 ≥ 4nV 2/p c23 2 + log 2 1− log c1c4c3 ! . c2 As long as we choose c1 , c2 , c3 , c4 such that ( 23 + log 2)/ log c1c4c3 < c < 1, we have 1 c24 (1 − c) & . (14.2) 2/p 4nV nV 2/p Remark 14.3. When the specified norm is k · k∞ , the norm ball becomes a cube, and the volume is (for fixed values of p) V = 2p . R∗ ≥ Hence R∗ & n1 ; however, we know R∗ log p n and we lose the dependence on the dimension p. So what is to be blamed? It turns out our mutual information method, and in fact, its further relaxation via packing and Fano’s inequality is tight in this case. What is loose is the volume ratio bound on packing number in Theorem 14.1. In the next lecture, we will prove ( p log √1 p , . √1p log N (B, k · k, ) . 1 log(p2 ), & √1p 2 4 This will lead to the tight result R∗ log p n . 5