Intrinsic Volumes of Convex Cones: Theory and Applications
Martin Lotz
School of Mathematics, The University of Manchester
with the collaboration of Dennis Amelunxen, Michael B. McCoy, Joel A. Tropp
Chicago, July 11, 2014

Outline

- Problems involving cones
- Some conic integral geometry
- Concentration

Problems involving cones

Conic problems

Problem: find a "structured" solution $x_0$ of the $m \times d$ system ($m < d$)

$$Ax = b$$

by minimizing a convex regularizer:

$$\text{minimize } f(x) \text{ subject to } Ax = b. \qquad (\star)$$

Examples include:

- $x_0$ sparse: $f(x) = \|x\|_1$;
- $X_0$ a low-rank matrix: $f(X)$ the nuclear norm;
- models with simultaneous structures (→ talk by M. Fazel), atomic norms (→ talk by B. Recht).

Example: $\ell_1$ minimization

Let $x_0$ be $s$-sparse and $b = Ax_0$ for a random $A \in \mathbb{R}^{m \times d}$ ($s < m < d$):

$$\text{minimize } \|x\|_1 \text{ subject to } Ax = b.$$

[Figure: empirical probability of success versus the number of equations $m$, for $s = 50$, $d = 200$, as $m$ sweeps from 25 to 200; the success probability jumps sharply from 0 to 1.]

Phase transitions

[Figure: phase transition diagrams for $\ell_1$ and nuclear norm minimization.]
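The experiments behind these plots are straightforward to reproduce. Below is a minimal sketch (not from the slides): it recasts $\ell_1$ minimization with equality constraints as a linear program and estimates the success probability empirically. The helper name, trial counts, and the use of SciPy's `linprog` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def l1_success_prob(s, m, d, trials=20, tol=1e-5):
    """Empirical probability that l1 minimization recovers an s-sparse
    x0 from m Gaussian measurements in R^d (hypothetical helper)."""
    rng = np.random.default_rng(0)
    successes = 0
    for _ in range(trials):
        x0 = np.zeros(d)
        support = rng.choice(d, size=s, replace=False)
        x0[support] = rng.standard_normal(s)
        A = rng.standard_normal((m, d))
        b = A @ x0
        # LP reformulation: minimize sum(t) subject to -t <= x <= t, Ax = b.
        c = np.concatenate([np.zeros(d), np.ones(d)])
        I = np.eye(d)
        A_ub = np.block([[I, -I], [-I, -I]])   # x - t <= 0 and -x - t <= 0
        A_eq = np.hstack([A, np.zeros((m, d))])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * d), A_eq=A_eq, b_eq=b,
                      bounds=[(None, None)] * d + [(0, None)] * d)
        if res.success and np.linalg.norm(res.x[:d] - x0) < tol:
            successes += 1
    return successes / trials

# Sweep m as in the plots: s = 50, d = 200.
for m in range(25, 201, 25):
    print(m, l1_success_prob(50, m, 200))
```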
Conic problems

Problem: find a "structured" solution $x_0$ of the $m \times d$ system ($m < d$) $Ax = b$ by minimizing a convex regularizer:

$$\text{minimize } f(x) \text{ subject to } Ax = b. \qquad (\star)$$

$(\star)$ has $x_0$ as its unique solution if and only if the optimality condition

$$\ker A \cap D(f, x_0) = \{0\}$$

is satisfied, where $D(f, x_0)$ is the convex descent cone of $f$ at $x_0$:

$$D(f, x_0) := \bigcup_{\tau > 0} \{ y \in \mathbb{R}^d : f(x_0 + \tau y) \le f(x_0) \}.$$

Success of $\ell_1$-minimization

$$\text{minimize } \|x\|_1 \text{ subject to } Ax = b$$

[Figure: the affine space $\{Ax = b\}$ touching the ball $\{\|x\|_1 \le \|x_0\|_1\}$ only at $x_0$ ("success"), cutting through it ("failure"), and the descent cone at $x_0$.]

$\ell_1$ minimization succeeds at finding a sparse vector $x_0$ if and only if the kernel of $A$ misses the cone of descent directions of $\|\cdot\|_1$ at $x_0$.

Conic problems

Problem: reconstruct two signals $x_0, y_0$ from the observation $z_0 = x_0 + Q y_0$, where $Q \in O(d)$, by solving

$$\text{minimize } f(x) \text{ subject to } g(y) \le g(y_0) \text{ and } z_0 = x + Qy \qquad (\star\star)$$

for suitable convex functions $f$ and $g$. Examples:

- both $x_0$ and $y_0$ sparse;
- $x_0$ sparse (corruption), $y_0 \in \{\pm 1\}^d$ (message);
- $x_0$ a low-rank matrix, $y_0$ sparse (corruption).

$(\star\star)$ uniquely recovers $x_0, y_0$ if and only if

$$D(f, x_0) \cap -Q\, D(g, y_0) = \{0\}$$

(McCoy-Tropp (2012, 2013) → Mike's upcoming talk).

Conic problems

Projections of polytopes: let $A : \mathbb{R}^d \to \mathbb{R}^m$, $m < d$, let $P \subset \mathbb{R}^d$ be a polytope and $F \subseteq P$ a face. Then $AF$ is a face of $AP$ if and only if $\ker A \cap T_F(P) = \{0\}$, where $T_F(P)$ is the tangent cone of $P$ at $F$ (Donoho-Tanner (2006-)).

Compressive separation: given disjoint convex sets $S_1, S_2 \subseteq \mathbb{R}^d$, we have $AS_1 \cap AS_2 = \emptyset$ if and only if $\ker A \cap \operatorname{cone}(S_1 - S_2) = \{0\}$ (Bandeira-Mixon-Recht (2014)).

The mathematical problem

The problems mentioned motivate the following question. Given closed convex cones $C, D \subseteq \mathbb{R}^d$ and a random orthogonal transformation $Q$, what is the probability that they intersect nontrivially:

$$C \cap QD \ne \{0\}?$$

- Bounds on the probability that a cone intersects a linear subspace follow from Gordon's escape through the mesh argument.
- Exact formulas for the probability of intersection are based on the kinematic formula from (spherical) integral geometry → the topic of this talk.

Some conic integral geometry

The kinematic formula

The probability that a randomly rotated cone intersects another (not both linear subspaces) is given in terms of a discrete probability distribution, the spherical intrinsic volumes $v_0(C), \dots, v_d(C)$:

$$\mathbb{P}\{C \cap QD \ne \{0\}\} = 2 \sum_{k \text{ odd}} \sum_{i+j = d+k} v_i(C)\, v_j(D).$$

For the case where $D = L$ is a linear subspace of codimension $m$, we have $v_i(L) = 1$ if $i = d - m$ and $v_i(L) = 0$ otherwise, so

$$\mathbb{P}\{C \cap QL \ne \{0\}\} = 2 \sum_{k \text{ odd}} v_{m+k}(C). \qquad (\dagger)$$

$(\dagger)$ is essentially the tail of a discrete probability distribution.

Spherical intrinsic volumes

[Figure: a planar cone $C$, with the regions of the plane whose projection lands in the interior ($v_2$), on one of the two edges ($v_1$), or at the apex ($v_0$).]

Let $C \subseteq \mathbb{R}^d$ be a polyhedral cone and $\mathcal{F}_k(C)$ its set of $k$-dimensional faces. The $k$-th (spherical) intrinsic volume of $C$ is defined as

$$v_k(C) = \sum_{F \in \mathcal{F}_k(C)} \mathbb{P}\{\Pi_C(g) \in \operatorname{relint}(F)\},$$

where $g$ is a standard Gaussian vector and $\Pi_C$ the Euclidean projection onto $C$.

- For non-polyhedral cones, the definition extends by approximation or via the Steiner formula.
- A long history in geometry; they have also appeared in statistics (as the weights of $\bar\chi^2$ distributions).
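This definition can be tested by simulation whenever the projection is explicit. For the orthant $\mathbb{R}^d_{\ge 0}$, $\Pi_C(g)$ is the componentwise positive part of $g$, and the face containing the projection in its relative interior is indexed by the positive coordinates; counting them recovers $v_k = \binom{d}{k} 2^{-d}$ (the formula on the next slide). A small illustrative sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import binom

d, n = 25, 100_000
rng = np.random.default_rng(1)
g = rng.standard_normal((n, d))
# Projection onto R^d_{>=0} is max(g, 0); the face hit by the projection
# has dimension equal to the number of strictly positive coordinates.
k = (g > 0).sum(axis=1)
v_empirical = np.bincount(k, minlength=d + 1) / n
v_exact = binom.pmf(np.arange(d + 1), d, 0.5)   # v_k = C(d, k) * 2^{-d}
print(np.abs(v_empirical - v_exact).max())      # -> small for large n
```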
Spherical intrinsic volumes: examples

- Linear subspace $L$: $v_k(L) = 1$ if $\dim L = k$, and $v_k(L) = 0$ otherwise.
- Orthant $\mathbb{R}^d_{\ge 0}$: $v_k(\mathbb{R}^d_{\ge 0}) = \binom{d}{k}\, 2^{-d}$.
- Second-order cones: $v_k\big(\operatorname{Circ}(d, \alpha)\big) = \frac{1}{2} \binom{(d-2)/2}{(k-1)/2} \sin^{k-1}(\alpha) \cos^{d-k-1}(\alpha)$.
- Asymptotics for tangent cones at faces of the simplex and the $\ell_1$-ball (Vershik-Sporyshev (1992), Donoho (2006)).
- Integral representations for the semidefinite cone (Amelunxen-Bürgisser (2012)).
- Combinatorial expressions for regions of hyperplane arrangements (Klivans-Swartz (2011)).

[Figure: intrinsic volume profiles $v_k$ for $d = 25$: a subspace $L$ with $\dim L = k$; the orthant $\mathbb{R}^d_{\ge 0}$; the circular cone $\operatorname{Circ}(d, \pi/4)$; and the cone $\{x : x_1 \le \dots \le x_d\}$.]

Concentration

Associate to a cone $C$ the discrete random variable $X_C$ with

$$\mathbb{P}\{X_C = k\} = v_k(C),$$

and define the statistical dimension as the average

$$\delta(C) = \mathbb{E}[X_C] = \sum_{k=0}^d k\, v_k(C).$$

In the examples it appears that $X_C$ concentrates around $\delta(C)$.

Theorem [ALMT14]. Let $C$ be a convex cone, and $X_C$ a discrete random variable with distribution $\mathbb{P}\{X_C = k\} = v_k(C)$. Let $\delta(C) = \mathbb{E}[X_C]$. Then for $\lambda \ge 0$,

$$\mathbb{P}\{|X_C - \delta(C)| > \lambda\} \le 4 \exp\left( \frac{-\lambda^2/8}{\omega(C) + \lambda} \right),$$

where $\omega(C) := \min\{\delta(C),\, d - \delta(C)\}$.

- Improved bounds by McCoy-Tropp (Discrete Comput. Geom. 2014).

Approximate kinematic formula

Applying the concentration result to the kinematic formula

$$\mathbb{P}\{C \cap QD \ne \{0\}\} = 2 \sum_{k \text{ odd}} \sum_{i+j = d+k} v_i(C)\, v_j(D)$$

gives rise to:

Theorem [ALMT14]. Fix a tolerance $\eta \in (0, 1)$. Assume one of $C$, $D$ is not a subspace. Then

$$\delta(C) + \delta(D) \le d - a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap QD = \{0\}\} \ge 1 - \eta;$$
$$\delta(C) + \delta(D) \ge d + a_\eta \sqrt{d} \implies \mathbb{P}\{C \cap QD = \{0\}\} \le \eta,$$

where $a_\eta := 4\sqrt{\log(4/\eta)}$ ($a_{0.01} < 10$ and $a_{0.001} < 12$).

- Interpretation: in high dimensions, convex cones behave like linear subspaces of dimension $\delta(C)$, $\delta(D)$.
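Both theorems can be sanity-checked numerically whenever the intrinsic volumes are known explicitly. For $C = \mathbb{R}^d_{\ge 0}$ the variable $X_C$ is Binomial$(d, 1/2)$ and $\delta(C) = d/2$, so the concentration bound can be compared with the exact tail. A small sketch, with illustrative values of $d$ and $\lambda$ and the bound as reconstructed above:

```python
import numpy as np
from scipy.stats import binom

# For C = R^d_{>=0}: X_C ~ Binomial(d, 1/2) and delta(C) = d/2.
d = 200
delta = d / 2
omega = min(delta, d - delta)        # omega(C) = min(delta, d - delta)
ks = np.arange(d + 1)
pmf = binom.pmf(ks, d, 0.5)
for lam in (5, 10, 20, 40):
    tail = pmf[np.abs(ks - delta) > lam].sum()
    bound = 4 * np.exp(-(lam ** 2 / 8) / (omega + lam))
    print(lam, tail, bound)          # exact tail vs. concentration bound
```

The bound is loose for this cone, but it captures the $\sqrt{d}$ scale of the fluctuations of $X_C$, which is exactly what sets the width of the phase transition.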
Statistical dimension: basic properties

- Orthogonal invariance: $\delta(QC) = \delta(C)$ for each $Q \in O(d)$.
- Subspaces: for a subspace $L \subseteq \mathbb{R}^d$, $\delta(L) = \dim(L)$.
- Totality: $\delta(C) + \delta(C^\circ) = d$. This generalises $\dim(L) + \dim(L^\perp) = d$ for linear $L$. [Figure: a planar cone $C$ and its polar cone $C^\circ$.]
- Direct products: for each closed convex cone $K$, $\delta(C \times K) = \delta(C) + \delta(K)$. In particular, invariance under embedding.
- Monotonicity: $C \subseteq K$ implies $\delta(C) \le \delta(K)$.
- Expected squared Gaussian projection: $\delta(C) = \mathbb{E}\, \|\Pi_C(g)\|^2$; this quantity previously appeared in various contexts, among others as a proxy for the Gaussian width. Spherical formulation: $\delta(C) = d\, \mathbb{E}\, \|\Pi_C(\theta)\|^2$, where $\theta \sim \operatorname{Uniform}(S^{d-1})$. Relation to the Gaussian width $w(C) = \mathbb{E} \sup_{x \in C \cap S^{d-1}} \langle x, g \rangle$:

$$w(C)^2 \le \delta(C) \le w(C)^2 + 1.$$

The Gaussian width has played a role in the analysis of recovery via Gordon's comparison inequality (Rudelson-Vershynin (2008), Stojnic (2009-), Oymak-Hassibi (2010-), Chandrasekaran et al. (2012)).

Examples

- Linear subspaces: $\delta(L) = \dim L$.
- Non-negative orthant: $\delta(\mathbb{R}^d_{\ge 0}) = d/2$.
- Self-dual cones: since $\delta(C) + \delta(C^\circ) = d$, we get $\delta(C) = d/2$ for any self-dual cone (for example, the positive semidefinite matrices).
- Second-order (ice cream) cones of angle $\alpha$: $\operatorname{Circ}(d, \alpha) := \{ x \in \mathbb{R}^d : x_1 / \|x\| \ge \cos(\alpha) \}$. Then $\delta\big(\operatorname{Circ}(d, \alpha)\big) \approx d \sin^2(\alpha)$.
- The cone $C_A = \{x : x_1 \le \dots \le x_d\}$: $\delta(C_A) = \sum_{k=1}^d \frac{1}{k} \sim \log(d)$.

Computing the statistical dimension

In some cases the statistical dimension of a convex cone can be computed exactly from the intrinsic volumes:

- spherical cones;
- the descent cone of $f = \|\cdot\|_\infty$;
- regions of hyperplane arrangements with high symmetry.

For descent cones of convex regularizers, asymptotic expressions follow from a blueprint developed by Stojnic (2008) and refined since (see the sketch after this list):

- $x_0$ $s$-sparse, $f = \|\cdot\|_1$: an asymptotic formula for $\delta(D(\|\cdot\|_1, x_0))$ follows from Stojnic 2009.
- $X_0$ a rank-$r$ matrix, $f = \|\cdot\|_{S_1}$ (nuclear norm): an asymptotic formula based on the Marčenko-Pastur characterisation of the empirical eigenvalue distribution of Wishart matrices.
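As a concrete instance of this blueprint, [ALMT14] show that, up to a small error term, $\delta(D(f, x_0)) \approx \min_{\tau \ge 0} \mathbb{E}\, \operatorname{dist}^2\big(g, \tau \cdot \partial f(x_0)\big)$, and for $f = \|\cdot\|_1$ the distance to the scaled subdifferential splits coordinate-wise. The following Monte Carlo sketch is illustrative (the function name, sample size, and $\tau$-grid are assumptions, not from the talk):

```python
import numpy as np

def delta_l1_descent(s, d, n=20_000):
    """Estimate delta(D(||.||_1, x0)) for an s-sparse x0 via the recipe
    min over tau of E dist(g, tau * subdifferential of ||.||_1 at x0)^2."""
    rng = np.random.default_rng(2)
    g = rng.standard_normal((n, d))
    # By symmetry, take x0 supported on the first s coordinates with signs +1.
    gs, gc = g[:, :s], g[:, s:]
    best = np.inf
    for tau in np.linspace(0.0, 10.0, 401):
        on = ((gs - tau) ** 2).sum(axis=1)                          # support
        off = (np.maximum(np.abs(gc) - tau, 0.0) ** 2).sum(axis=1)  # off-support
        best = min(best, (on + off).mean())
    return best

print(delta_l1_descent(50, 200))  # compare with the empirical transition above
```

For $s = 50$, $d = 200$ this lands at roughly $0.58\, d$, consistent with the location of the jump in the $\ell_1$ experiment above.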
Computing the statistical dimension: an example

- $C_A = \{x : x_1 \le x_2 \le \dots \le x_d\}$.
- Normal cone to a vertex of the permutahedron, suggested as a convex regularizer for the "vectors from lists" problem (Chandrasekaran et al. (2012)).
- Using combinatorics (Klivans-Swartz (2011), Stanley):

$$v(t) = \sum_{k=0}^d v_k(C_A)\, t^k = \frac{1}{d!}\, t (t+1) \cdots (t+d-1).$$

- Statistical dimension:

$$\delta(C_A) = \frac{d}{dt} v(t) \Big|_{t=1} = \sum_{k=1}^d \frac{1}{k} \approx \log(d).$$

Concentration

The spherical Steiner formula

Recall the characterization $\delta(C) = \mathbb{E}\, \|\Pi_C(g)\|^2$. The measure of the set of points on the sphere within angle $\arccos(\sqrt{\varepsilon})$ of the cone $C$ is given by the

Spherical Steiner formula (Herglotz, Allendoerfer, Santaló):

$$\mathbb{P}\{\|\Pi_C(\theta)\|^2 \ge \varepsilon\} = \sum_{k=1}^d \mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \ge \varepsilon\}\, v_k(C),$$

- $L_k$: a $k$-dimensional subspace;
- $\theta$: uniform on $S^{d-1}$.

→ substantially generalized by M. McCoy (McCoy-Tropp (2014)).

Volume of neighbourhoods of subspheres. The coefficients $\mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \ge \varepsilon\}$ are tails of Beta distributions. The volume of the $\arccos(\sqrt{\varepsilon})$-neighbourhood of a $k$-dimensional subsphere satisfies

$$\mathbb{P}\{\|\Pi_{L_k}(\theta)\|^2 \ge \varepsilon\} \approx \begin{cases} 0 & \text{if } \varepsilon > k/d, \\ 1 & \text{if } \varepsilon < k/d, \end{cases}$$

so that

$$\mathbb{P}\{\|\Pi_C(\theta)\|^2 \ge \varepsilon\} \approx \sum_{k=\lceil \varepsilon d \rceil}^d v_k(C).$$

Measure concentration. Moreover,

$$\sum_{k=\lceil \varepsilon d \rceil}^d v_k(C) \approx \begin{cases} 0 & \text{if } \varepsilon > \delta(C)/d, \\ 1 & \text{if } \varepsilon < \delta(C)/d. \end{cases}$$

This follows from concentration of measure, since the squared projection is Lipschitz and concentrates near its expected value $\delta(C)$.

Now let $X_C$ be a random variable with distribution given by the spherical intrinsic volumes, $\mathbb{P}\{X_C = k\} = v_k(C)$. By the spherical Steiner formula,

$$\mathbb{P}\{X_C \ge \varepsilon d\} = \sum_{k=\lceil \varepsilon d \rceil}^d v_k(C) \approx \begin{cases} 0 & \text{if } \varepsilon > \delta(C)/d, \\ 1 & \text{if } \varepsilon < \delta(C)/d. \end{cases}$$

- A rigorous implementation uses more advanced concentration-of-measure technology.

Some problems

- Spherical Hadwiger conjecture: each continuous, rotation-invariant valuation on closed convex cones is a linear combination of spherical intrinsic volumes.
- Are the spherical intrinsic volumes log-concave: $v_k(C)^2 \ge v_{k-1}(C) \cdot v_{k+1}(C)$?
- Is the variance of $X_C$ maximised by the Lorentz cone $\operatorname{Circ}(d, \pi/4)$?
- Further develop the combinatorial approach to computing intrinsic volumes, with a view towards cones of interest in statistics (isotonic regression).

For more details:

- D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: phase transitions in convex programs with random data. Information and Inference, 2014. arXiv:1303.6672.
- M. B. McCoy and J. A. Tropp. From Steiner formulas for cones to concentration of intrinsic volumes. Discrete Comput. Geom., 2014.
- D. Amelunxen and M. Lotz. Gordon's inequality and condition numbers in convex optimization. To appear on arXiv.

Thank You!