Intrinsic Volumes of Convex Cones: Theory and Applications

Manchester Numerical Analysis
Intrinsic Volumes of Convex Cones
Theory and Applications
Martin Lotz
School of Mathematics
The University of Manchester
with the collaboration of
Dennis Amelunxen, Michael B. McCoy, Joel A. Tropp
Chicago, July 11, 2014
Outline
Problems involving cones
Some conic integral geometry
Concentration
Conic problems
Problem: find a "structured" solution x0 of an m × d system (m < d)
    Ax = b
by minimizing a convex regularizer:
    minimize f(x) subject to Ax = b.    (⋆)
Examples include:
• x0 sparse: f(x) = ‖x‖1;
• X0 a low-rank matrix: f(X) the nuclear norm;
• models with simultaneous structures (→ talk by M. Fazel), atomic norms (→ talk by B. Recht).
1 / 27
Example: ℓ1 minimization
Let x0 be s-sparse, b = Ax0 for random A ∈ R^{m×d} (s < m < d).
    minimize ‖x‖1 subject to Ax = b.
[Figure: empirical probability of success against the number of equations m (0 to 200), for s = 50, d = 200; the animation steps through m = 25, 50, 75, 100, 125, 150, 175, 200.]
2 / 27
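The experiment behind these plots can be reproduced in miniature. The following is a hedged Monte Carlo sketch, assuming numpy and scipy are available; the sizes s, m, d and the trial counts are illustrative (much smaller than the s = 50, d = 200 of the slides), and the split x = u − v is a standard LP reformulation of ℓ1 minimization, not necessarily the solver used for the original figures.

```python
# Hedged sketch of the l1-recovery experiment with illustrative sizes.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def l1_recovers(s, m, d, tol=1e-5):
    """Draw a Gaussian A and an s-sparse x0; solve min ||x||_1 s.t. Ax = Ax0."""
    A = rng.standard_normal((m, d))
    x0 = np.zeros(d)
    x0[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
    b = A @ x0
    # Split x = u - v with u, v >= 0; at the LP optimum ||x||_1 = sum(u) + sum(v).
    res = linprog(np.ones(2 * d), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=(0, None))
    if res.status != 0:
        return False
    x_hat = res.x[:d] - res.x[d:]
    return np.linalg.norm(x_hat - x0) <= tol * np.linalg.norm(x0)

# Below the phase transition recovery typically fails; above it, it succeeds.
few = np.mean([l1_recovers(s=5, m=10, d=50) for _ in range(10)])
many = np.mean([l1_recovers(s=5, m=40, d=50) for _ in range(10)])
```

Sweeping m between those extremes reproduces the S-shaped success curves of the slides.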
Phase transitions
[Figure: phase transitions for ℓ1 and nuclear norm minimization.]
3 / 27
Conic problems
Problem: find a "structured" solution x0 of an m × d system (m < d)
    Ax = b
by minimizing a convex regularizer:
    minimize f(x) subject to Ax = b.    (⋆)
(⋆) has x0 as its unique solution if and only if the optimality condition
    ker A ∩ D(f, x0) = {0}
is satisfied, where D(f, x0) is the convex descent cone of f at x0:
    D(f, x0) := ⋃_{τ>0} { y ∈ R^d : f(x0 + τy) ≤ f(x0) }.
4 / 27
Success of ℓ1-minimization
    minimize ‖x‖1 subject to Ax = b
[Figure: the affine space {Ax = b} through x0 and the ball {‖x‖1 ≤ ‖x0‖1}, together with the descent cone at x0; "success" when the affine space meets the ball only at x0, "failure" when it cuts through it.]
ℓ1 minimization succeeds at finding a sparse vector x0 if and only if the kernel of A misses the cone of descent directions of ‖·‖1 at x0.
5 / 27
Conic problems
Problem: reconstruct two signals x0, y0 from the observation
    z0 = x0 + Qy0,
where Q ∈ O(d), by solving
    minimize f(x) subject to g(y) ≤ g(y0) and z0 = x + Qy    (⋆)
for suitable convex functions f and g. Examples:
• both x0 and y0 sparse;
• x0 sparse (corruption), y0 ∈ {±1}^d (message);
• x0 a low-rank matrix, y0 sparse (corruption).
(⋆) uniquely recovers x0, y0 if and only if
    D(f, x0) ∩ −Q D(g, y0) = {0}
(McCoy-Tropp (2012, 2013) → Mike's upcoming talk).
6 / 27
Conic problems
Projections of polytopes: Let A : R^d → R^m, m < d, P ⊂ R^d a polytope, and F ⊆ P a face. Then AF is a face of AP if and only if
    ker A ∩ T_F(P) = {0},
where T_F(P) is the tangent cone of P at F (Donoho-Tanner (2006-)).
Compressive separation: Given disjoint convex sets S1, S2 ⊆ R^d, then AS1 ∩ AS2 = ∅ if and only if
    ker A ∩ cone(S1 − S2) = {0}
(Bandeira-Mixon-Recht (2014)).
7 / 27
The mathematical problem
The problems mentioned motivate the following question:
Given closed convex cones C, D ⊆ R^d and a random orthogonal transformation Q, what is the probability that they intersect:
    C ∩ QD ≠ {0}?    (⋆)
• Bounds on the probability that a cone intersects a linear subspace follow from Gordon's escape through the mesh argument;
• Exact formulas for the intersection probability are based on the kinematic formula from (spherical) integral geometry → the topic of this talk.
8 / 27
Outline
Problems involving cones
Some conic integral geometry
Concentration
The kinematic formula
The probability that a randomly rotated cone intersects another (not both linear subspaces) is given by a discrete probability distribution, the spherical intrinsic volumes v_0(C), ..., v_d(C):
    P{C ∩ QD ≠ {0}} = 2 Σ_{k odd} Σ_{i+j=d+k} v_i(C) v_j(D).
For the case where D = L is a linear subspace of codimension m, v_i(L) = 1 if i = d − m and v_i(L) = 0 otherwise, so
    P{C ∩ QL ≠ {0}} = 2 Σ_{k odd} v_{m+k}(C).    (⋆)
(⋆) is essentially the tail of a discrete probability distribution.
9 / 27
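For a cone whose intrinsic volumes are known, the subspace formula above can be evaluated exactly. A minimal sketch for the orthant R^d_{≥0}, using the fact (stated later in the talk) that v_k(R^d_{≥0}) = binom(d, k) 2^{−d}; the choice d = 24 is illustrative.

```python
# Exact tail formula P{C ∩ QL != {0}} = 2 * sum_{k odd} v_{m+k}(C)
# for the orthant, whose intrinsic volumes are binomial probabilities.
from fractions import Fraction
from math import comb

def orthant_vk(d, k):
    return Fraction(comb(d, k), 2**d)

def intersect_prob(d, m):
    # m = codimension of the random subspace L; sum over odd offsets k
    return 2 * sum(orthant_vk(d, m + k) for k in range(1, d - m + 1, 2))

d = 24
probs = [float(intersect_prob(d, m)) for m in range(d + 1)]
# The transition sits near m = d - delta(C) = d/2.
```

For m = 0 the formula returns exactly 1 (L = R^d always meets C), and for m = d it returns 0, as it should.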
Spherical intrinsic volumes
[Figure: a planar cone C, with the regions of the plane whose points project onto the interior of C (v2), onto its two edges (v1), and onto the apex 0 (v0).]
Let C ⊆ R^d be a polyhedral cone and F_k(C) the set of its k-dimensional faces. The k-th (spherical) intrinsic volume of C is defined as
    v_k(C) = Σ_{F ∈ F_k(C)} P{Π_C(g) ∈ relint(F)},
where g is a standard Gaussian vector and Π_C the projection onto C.
• Extended to non-polyhedral cones by approximation, or via the Steiner formula.
• A long history in geometry; they have also appeared in statistics (as the weights of chi-bar-squared distributions).
10 / 27
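The face-based definition can be checked by Monte Carlo for the orthant: the projection of a Gaussian vector g onto R^d_{≥0} is max(g, 0), which lies in the relative interior of the face indexed by the positive coordinates of g, so the face dimension is simply the number of positive entries. A sketch assuming numpy; d and the sample size are illustrative.

```python
# Monte Carlo estimate of v_k for the orthant via the face-based definition.
import numpy as np
from math import comb

rng = np.random.default_rng(1)
d, n = 8, 200_000
g = rng.standard_normal((n, d))
face_dim = (g > 0).sum(axis=1)      # dimension of the face containing Pi_C(g)
v_hat = np.bincount(face_dim, minlength=d + 1) / n

# Known closed form for comparison: v_k = binom(d, k) * 2^{-d}.
v_exact = np.array([comb(d, k) / 2**d for k in range(d + 1)])
```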
Spherical intrinsic volumes: examples
• Linear subspace L: v_k(L) = 1 if dim L = k, and v_k(L) = 0 otherwise.
• Orthant R^d_{≥0}:
    v_k(R^d_{≥0}) = binom(d, k) 2^{−d}.
• Second-order cones:
    v_k(Circ(d, α)) = (1/2) binom((d−2)/2, (k−1)/2) sin^{k−1}(α) cos^{d−k−1}(α).
• Asymptotics for tangent cones at faces of the simplex and the ℓ1-ball (Vershik-Sporyshev (1992), Donoho (2006)).
• Integral representations for the semidefinite cone (Amelunxen-Bürgisser (2012)).
• Combinatorial expressions for regions of hyperplane arrangements (Klivans-Swartz (2011)).
11 / 27
Spherical intrinsic volumes: examples
[Figure: four plots of the intrinsic volume profiles k ↦ v_k (horizontal axis k = 0, ..., 25): a subspace L with dim L = k (a point mass), the orthant v_k(R^d_{≥0}), the circular cone Circ(d, π/4), and v_k({x : x1 ≤ ··· ≤ xd}).]
12 / 27
Concentration
Associate to a cone C the discrete random variable X_C with
    P{X_C = k} = v_k(C),
and define the statistical dimension as the average
    δ(C) = E[X_C] = Σ_{k=0}^{d} k v_k(C).
[Figure: the profile k ↦ v_k(Circ(d, π/4)), sharply peaked around its mean.]
In the examples it appears that X_C concentrates around δ(C).
13 / 27
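The definition δ(C) = Σ k v_k(C) can be evaluated exactly for the orthant, whose intrinsic volumes are binomial probabilities; a minimal sketch, with d = 16 an illustrative choice.

```python
# Exact statistical dimension of the orthant from its intrinsic volumes.
from fractions import Fraction
from math import comb

d = 16
vk = [Fraction(comb(d, k), 2**d) for k in range(d + 1)]
delta = sum(k * v for k, v in enumerate(vk))
# For the self-dual orthant, delta = d/2.
```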
Concentration
Theorem [ALMT14]
Let C be a convex cone, and X_C a discrete random variable with distribution P{X_C = k} = v_k(C). Let δ(C) = E[X_C]. Then for λ ≥ 0,
    P{|X_C − δ(C)| > λ} ≤ 4 exp( −λ² / (8(ω(C) + λ)) ),
where ω(C) := min{δ(C), d − δ(C)}.
• Improved bounds by McCoy-Tropp (Discrete Comput. Geom. 2014).
14 / 27
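The bound can be compared against the exact tail for the orthant, where X_C ~ Binomial(d, 1/2), δ(C) = d/2 and ω(C) = d/2. A sketch assuming the bound has the form 4 exp(−λ²/(8(ω(C)+λ))); d and the values of λ are illustrative.

```python
# Exact binomial tail of X_C for the orthant versus the concentration bound.
from fractions import Fraction
from math import comb, exp

d = 30
delta = d / 2                        # statistical dimension of the orthant
vk = [Fraction(comb(d, k), 2**d) for k in range(d + 1)]

def tail(lam):
    # exact P{|X_C - delta| > lam} for X_C ~ Binomial(d, 1/2)
    return float(sum(v for k, v in enumerate(vk) if abs(k - delta) > lam))

def bound(lam):
    # concentration bound with omega(C) = min(delta, d - delta) = d/2
    return 4 * exp(-lam**2 / (8 * (d / 2 + lam)))
```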
Approximate kinematic formula
Applying the concentration result to the kinematic formula
    P{C ∩ QD ≠ {0}} = 2 Σ_{k odd} Σ_{i+j=d+k} v_i(C) v_j(D)
gives rise to
Theorem [ALMT14]
Fix a tolerance η ∈ (0, 1). Assume one of C, D is not a subspace. Then
    δ(C) + δ(D) ≤ d − a_η √d  ⟹  P{C ∩ QD = {0}} ≥ 1 − η;
    δ(C) + δ(D) ≥ d + a_η √d  ⟹  P{C ∩ QD = {0}} ≤ η,
where a_η := 4 √(log(4/η))  (a_{0.01} < 10 and a_{0.001} < 12).
• Interpretation: in high dimensions, convex cones behave like linear subspaces of dimension δ(C), δ(D).
15 / 27
Statistical dimension: basic properties
• Orthogonal invariance. δ(QC) = δ(C) for each Q ∈ O(d).
• Subspaces. For a subspace L ⊂ R^d, δ(L) = dim(L).
• Totality. δ(C) + δ(C°) = d. This generalises dim(L) + dim(L⊥) = d for linear L.
[Figure: a planar cone C and its polar cone C°.]
• Direct products. For each closed convex cone K, δ(C × K) = δ(C) + δ(K). In particular, δ is invariant under embedding into higher dimensions.
• Monotonicity. C ⊂ K implies δ(C) ≤ δ(K).
16 / 27
Statistical dimension: basic properties
• Expected squared Gaussian projection:
    δ(C) = E‖Π_C(g)‖²,
which previously appeared in various contexts, among others as a proxy for the Gaussian width.
• Spherical formulation:
    δ(C) = d · E‖Π_C(θ)‖², where θ ~ Uniform(S^{d−1}).
• Relation to the Gaussian width w(C) = E sup_{x ∈ C ∩ S^{d−1}} ⟨x, g⟩:
    w(C)² ≤ δ(C) ≤ w(C)² + 1.
The Gaussian width has played a role in the analysis of recovery via Gordon's comparison inequality (Rudelson-Vershynin (2008), Stojnic (2009-), Oymak-Hassibi (2010-), Chandrasekaran et al. (2012)).
17 / 27
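The Gaussian characterisation combines neatly with the Moreau decomposition g = Π_C(g) + Π_{C°}(g): for the orthant the two projections are max(g, 0) and min(g, 0), so the totality property δ(C) + δ(C°) = d is visible sample by sample. A Monte Carlo sketch assuming numpy; sizes are illustrative.

```python
# Monte Carlo check of delta(C) = E||Pi_C(g)||^2 and totality for the orthant,
# whose polar cone is -R^d_{>=0}.
import numpy as np

rng = np.random.default_rng(2)
d, n = 10, 100_000
g = rng.standard_normal((n, d))

proj_C = np.maximum(g, 0.0)    # projection onto R^d_{>=0}
proj_Cp = np.minimum(g, 0.0)   # projection onto the polar cone -R^d_{>=0}

delta_C = (proj_C ** 2).sum(axis=1).mean()
delta_Cp = (proj_Cp ** 2).sum(axis=1).mean()
# Moreau: ||g||^2 = ||Pi_C(g)||^2 + ||Pi_{C°}(g)||^2 holds pointwise, so the
# two estimates sum to the sample mean of ||g||^2, which is close to d.
```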
Examples
• Linear subspaces. δ(L) = dim L.
• Non-negative orthant. δ(R^d_{≥0}) = d/2.
• Self-dual cones. We have δ(C) + δ(C°) = d, so that δ(C) = d/2 for any self-dual cone (for example, the positive semidefinite matrices).
• Second-order (ice cream) cones of angle α:
    Circ(d, α) := {x ∈ R^d : x1 ≥ ‖x‖ cos(α)}.
  Then δ(Circ(d, α)) ≈ d sin²(α).
• The cone C_A = {x : x1 ≤ ··· ≤ xd}:
    δ(C_A) = Σ_{k=1}^{d} 1/k ∼ log(d).
18 / 27
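The last example can be evaluated directly; a tiny sketch confirming that δ(C_A), the d-th harmonic number, grows like log(d).

```python
# delta(C_A) is the d-th harmonic number H_d = log(d) + gamma + o(1).
from math import log

def harmonic(d):
    return sum(1.0 / k for k in range(1, d + 1))

gamma = 0.5772156649  # Euler-Mascheroni constant
# harmonic(d) - log(d) approaches gamma as d grows.
```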
Computing the statistical dimension
In some cases the statistical dimension of a convex cone can be computed exactly from the intrinsic volumes:
• spherical cones;
• the descent cone of f = ‖·‖∞;
• regions of hyperplane arrangements with high symmetry.
For descent cones of convex regularizers, asymptotic expressions follow from a blueprint developed by Stojnic (2008) and refined since:
• x0 s-sparse, f = ‖·‖1: an asymptotic formula for δ(D(‖·‖1, x0)) follows from Stojnic (2009).
• X0 a rank-r matrix, f = ‖·‖_{S1}: an asymptotic formula based on the Marčenko-Pastur characterisation of the empirical eigenvalue distribution of Wishart matrices.
19 / 27
Computing the statistical dimension: an example
• C_A = {x : x1 ≤ x2 ≤ ··· ≤ xd}.
• Normal cone to a vertex of the permutahedron, suggested as a convex regularizer for the "vectors from lists" problem (Chandrasekaran et al. (2012)).
• Using combinatorics (Klivans-Swartz (2011), Stanley):
    v(t) = Σ_{k=0}^{d} v_k(C_A) t^k = (1/d!) · t(t + 1) ··· (t + d − 1).
• Statistical dimension:
    δ(C_A) = (d/dt) v(t)|_{t=1} = Σ_{k=1}^{d} 1/k ≈ log(d).
20 / 27
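The generating polynomial can be expanded by repeated convolution, recovering both Σ v_k = 1 and δ(C_A) = H_d in exact arithmetic. A sketch; d = 12 is an illustrative choice.

```python
# Intrinsic volumes of C_A from v(t) = t(t+1)...(t+d-1)/d!, expanded exactly.
from fractions import Fraction
from math import factorial

def chamber_intrinsic_volumes(d):
    coeffs = [Fraction(1)]                    # the constant polynomial 1
    for a in range(d):                        # multiply by (t + a)
        shifted = [Fraction(0)] + coeffs      # t * current polynomial
        scaled = [a * c for c in coeffs] + [Fraction(0)]
        coeffs = [x + y for x, y in zip(shifted, scaled)]
    return [c / factorial(d) for c in coeffs]

d = 12
v = chamber_intrinsic_volumes(d)
total = sum(v)                                 # v(1) = 1
delta = sum(k * vk for k, vk in enumerate(v))  # v'(1) = H_d
```

Taking the logarithmic derivative of v(t) at t = 1 gives δ(C_A) = Σ_{a=0}^{d−1} 1/(1+a), the harmonic number, matching the slide.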
Outline
Problems involving cones
Some conic integral geometry
Concentration
The spherical Steiner formula
Recall the characterization
    δ(C) = E‖Π_C(g)‖².
The measure of the set of points on the sphere within angle arccos(√ε) of the cone C is given by the
Spherical Steiner Formula (Herglotz, Allendoerfer, Santaló)
    P{‖Π_C(θ)‖² ≥ ε} = Σ_{k=1}^{d} P{‖Π_{L_k}(θ)‖² ≥ ε} v_k(C)
• L_k: a k-dimensional subspace;
• θ: uniform on S^{d−1}.
→ substantially generalized by M. McCoy (McCoy-Tropp (2014)).
21 / 27
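The formula can be tested numerically for the orthant: ‖Π_{L_k}(θ)‖² is the sum of k squared coordinates of a uniform point on the sphere, which follows a Beta(k/2, (d−k)/2) law (a standard fact), and the orthant's intrinsic volumes are binomial. A Monte Carlo sketch assuming numpy and scipy; d, ε, and the sample size are illustrative.

```python
# Monte Carlo check of the spherical Steiner formula for the orthant.
import numpy as np
from math import comb
from scipy.stats import beta

rng = np.random.default_rng(3)
d, n, eps = 12, 200_000, 0.4

g = rng.standard_normal((n, d))
theta = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on the sphere
lhs = np.mean((np.maximum(theta, 0.0) ** 2).sum(axis=1) >= eps)

def subspace_tail(k):
    # P{||Pi_{L_k}(theta)||^2 >= eps}; the k = 0 and k = d cases are degenerate
    if k == 0:
        return 0.0
    if k == d:
        return 1.0
    return beta.sf(eps, k / 2, (d - k) / 2)

rhs = sum(subspace_tail(k) * comb(d, k) / 2**d for k in range(d + 1))
```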
The spherical Steiner formula
Volume of neighbourhoods of subspheres:
    P{‖Π_C(θ)‖² ≥ ε} = Σ_{k=1}^{d} P{‖Π_{L_k}(θ)‖² ≥ ε} v_k(C),
where ‖Π_{L_k}(θ)‖² is Beta distributed.
• The volume of the arccos(√ε)-neighbourhood of a k-dimensional subsphere satisfies
    P{‖Π_{L_k}(θ)‖² ≥ ε} ≈ 0 if ε > k/d, and ≈ 1 if ε < k/d.
22 / 27
The spherical Steiner formula
Volume of neighbourhoods of subspheres:
    P{‖Π_C(θ)‖² ≥ ε} ≈ Σ_{k=⌈εd⌉}^{d} v_k(C).
• The volume of the arccos(√ε)-neighbourhood of a k-dimensional subsphere satisfies
    P{‖Π_{L_k}(θ)‖² ≥ ε} ≈ 0 if ε > k/d, and ≈ 1 if ε < k/d.
23 / 27
The spherical Steiner formula
Measure concentration:
    P{‖Π_C(θ)‖² ≥ ε} ≈ Σ_{k=⌈εd⌉}^{d} v_k(C) ≈ 0 if ε > δ(C)/d, and ≈ 1 if ε < δ(C)/d.
This follows from concentration of measure, since the squared projection norm is Lipschitz and concentrates near its expected value δ(C).
24 / 27
The spherical Steiner formula
Let X_C be a random variable with distribution given by the spherical intrinsic volumes,
    P{X_C = k} = v_k(C).
By the spherical Steiner formula we have
    P{X_C ≥ εd} ≈ Σ_{k=⌈εd⌉}^{d} v_k(C) ≈ 0 if ε > δ(C)/d, and ≈ 1 if ε < δ(C)/d.
• A rigorous implementation uses more advanced concentration-of-measure technology.
25 / 27
Some problems
• Spherical Hadwiger conjecture: every continuous, rotation-invariant valuation on closed convex cones is a linear combination of spherical intrinsic volumes.
• Are the spherical intrinsic volumes log-concave:
    v_k(C)² ≥ v_{k−1}(C) · v_{k+1}(C)?
• Is the variance of X_C maximised by the Lorentz cone Circ(d, π/4)?
• Further develop the combinatorial approach to computing intrinsic volumes, with a view towards cones of interest in statistics (isotonic regression).
26 / 27
For more details:
D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp. Living on the edge: phase transitions in convex programs with random data. Information and Inference, 2014. arXiv:1303.6672.
M. B. McCoy and J. A. Tropp. From Steiner formulas for cones to concentration of intrinsic volumes. Discrete Comput. Geom., 2014.
D. Amelunxen and M. Lotz. Gordon's inequality and condition numbers in convex optimization. To appear on arXiv.
Thank You!
27 / 27