Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
L. Vandenberghe EE236C (Spring 2016) 7. Conjugate functions • closed functions • conjugate function • duality 7-1 Closed set a set C is closed if it contains its boundary: xk ∈ C, xk → x̄ =⇒ x̄ ∈ C Operations that preserve closedness • the intersection of (finitely or infinitely many) closed sets is closed • the union of a finite number of closed sets is closed • inverse under linear mapping: {x | Ax ∈ C} is closed if C is closed Conjugate functions 7-2 Image under linear mapping the image of a closed set under a linear mapping is not necessarily closed Example C = {(x1, x2) ∈ R2+ | x1x2 ≥ 1}, A= 1 0 , AC = R++ Sufficient condition: AC is closed if • C is closed and convex • and C does not have a recession direction in the nullspace of A, i.e., Ay = 0, x̂ ∈ C, x̂ + αy ∈ C for all α ≥ 0 =⇒ y=0 in particular, this holds for any matrix A if C is bounded Conjugate functions 7-3 Closed function Definition: a function is closed if its epigraph is a closed set Examples • f (x) = − log(1 − x2) with dom f = {x | |x| < 1} • f (x) = x log x with dom f = R+ and f (0) = 0 • indicator function of a closed set C : δC (x) = 0 x∈C +∞ otherwise Not closed • f (x) = x log x with dom f = R++, or with dom f = R+ and f (0) = 1 • indicator function of a set C if C is not closed Conjugate functions 7-4 Properties Sublevel sets: f is closed if and only if all its sublevel sets are closed Minimum: if f is closed with bounded sublevel sets then it has a minimizer Common operations on convex functions that preserve closedness • sum: f = f1 + f2 is closed if f1 and f2 are closed • composition with affine mapping: f = g(Ax + b) is closed if g is closed • supremum: f (x) = supα fα(x) is closed if each function fα is closed in each case, we assume dom f 6= ∅ Conjugate functions 7-5 Outline • closed functions • conjugate function • duality Conjugate function the conjugate of a function f is f ∗(y) = sup (y T x − f (x)) x∈dom f f ∗ is closed and convex (even when f is not) Fenchel’s inequality: the definition implies that f (x) + f ∗(y) ≥ xT y for all x, y this is an extension to non-quadratic convex f of the inequality 1 T 1 T x x + y y ≥ xT y 2 2 Conjugate functions 7-6 Quadratic function 1 T f (x) = x Ax + bT x + c 2 Strictly convex case (A 0) 1 f ∗(y) = (y − b)T A−1(y − b) − c 2 General convex case (A 0) 1 f (y) = (y − b)T A†(y − b) − c, 2 ∗ Conjugate functions dom f ∗ = range(A) + b 7-7 Negative entropy and negative logarithm Negative entropy f (x) = n X f ∗(y) = xi log xi i=1 n X eyi−1 i=1 Negative logarithm f (x) = − n X log xi f ∗(y) = − i=1 n X log(−yi) − n i=1 Matrix logarithm f (X) = − log det X (dom f = Sn++) Conjugate functions f ∗(Y ) = − log det(−Y ) − n 7-8 Indicator function and norm Indicator of convex set C : conjugate is the support function of C δC (x) = 0 x∈C +∞ x ∈ 6 C ∗ δC (y) = sup y T x x∈C Indicator of convex cone C : conjugate is indicator of polar (negative dual) cone ∗ δC (y) = δ−C ∗ (y) = δC ∗ (−y) = 0 y T x ≤ 0 ∀x ∈ C +∞ otherwise Norm: conjugate is indicator of unit ball for dual norm f (x) = kxk ∗ f (y) = 0 kyk∗ ≤ 1 +∞ kyk∗ > 1 (see next page) Conjugate functions 7-9 Proof. recall the definition of dual norm: kyk∗ = sup xT y kxk≤1 to evaluate f ∗(y) = supx (y T x − kxk) we distinguish two cases • if kyk∗ ≤ 1, then (by definition of dual norm) y T x ≤ kxk for all x and equality holds if x = 0; therefore supx (y T x − kxk) = 0 • if kyk∗ > 1, there exists an x with kxk ≤ 1, xT y > 1; then f ∗(y) ≥ y T (tx) − ktxk = t(y T x − kxk) and right-hand side goes to infinity if t → ∞ Conjugate functions 7-10 Calculus rules Separable sum f (x1, x2) = g(x1) + h(x2) f ∗(y1, y2) = g ∗(y1) + h∗(y2) Scalar multiplication (α > 0) f (x) = αg(x) f ∗(y) = αg ∗(y/α) f (x) = αg(x/α) f ∗(y) = αg ∗(y) • the operation f (x) = αg(x/α) is sometimes called ‘right scalar multiplication’ • a convenient notation is f = gα for the function (gα)(x) = αg(x/α) • conjugates can be written concisely as (gα)∗ = αg ∗ and (αg)∗ = g ∗α Conjugate functions 7-11 Calculus rules Addition to affine function f (x) = g(x) + aT x + b f ∗(y) = g ∗(y − a) − b Translation of argument f (x) = g(x − b) f ∗(y) = bT y + g ∗(y) Composition with invertible linear mapping (A square and nonsingular) f (x) = g(Ax) f ∗(y) = g ∗(A−T y) Infimal convolution f (x) = inf (g(u) + h(v)) u+v=x Conjugate functions f ∗(y) = g ∗(y) + h∗(y) 7-12 The second conjugate f ∗∗(x) = sup (xT y − f ∗(y)) y∈dom f ∗ • f ∗∗ is closed and convex • from Fenchel’s inequality, xT y − f ∗(y) ≤ f (x) for all y and x; therefore f ∗∗(x) ≤ f (x) for all x equivalently, epi f ⊆ epi f ∗∗ (for any f ) • if f is closed and convex, then f ∗∗(x) = f (x) for all x equivalently, epi f = epi f ∗∗ (if f is closed convex); proof on next page Conjugate functions 7-13 Proof (by contradiction): assume f is closed and convex, and epi f ∗∗ 6= epi f suppose (x, f ∗∗(x)) 6∈ epi f ; then there is a strict separating hyperplane: a b T z−x s − f ∗∗(x) ≤c<0 ∀(z, s) ∈ epi f for some a, b, c with b ≤ 0 (b > 0 gives a contradiction as s → ∞) • if b < 0, define y = a/(−b) and maximize left-hand side over (z, s) ∈ epi f : f ∗(y) − y T x + f ∗∗(x) ≤ c/(−b) < 0 this contradicts Fenchel’s inequality • if b = 0, choose ŷ ∈ dom f ∗ and add small multiple of (ŷ, −1) to (a, b): a + ŷ − T z−x s − f ∗∗(x) ∗ T ∗∗ ≤ c + f (ŷ) − x ŷ + f (x) < 0 now apply the argument for b < 0 Conjugate functions 7-14 Conjugates and subgradients if f is closed and convex, then y ∈ ∂f (x) ⇐⇒ x ∈ ∂f ∗(y) ⇐⇒ xT y = f (x) + f ∗(y) Proof. if y ∈ ∂f (x), then f ∗(y) = supu (y T u − f (u)) = y T x − f (x); hence f ∗(v) = sup (v T u − f (u)) u ≥ v T x − f (x) = xT (v − y) − f (x) + y T x = f ∗(y) + xT (v − y) this holds for all v ; therefore, x ∈ ∂f ∗(y) reverse implication x ∈ ∂f ∗(y) =⇒ y ∈ ∂f (x) follows from f ∗∗ = f Conjugate functions 7-15 Conjugate of strongly convex function assume f is closed and strongly convex with parameter µ > 0 • f ∗ is defined for all y (i.e., dom f ∗ = Rn) • f ∗ is differentiable everywhere, with gradient ∇f ∗(y) = argmax (y T x − f (x)) x • ∇f ∗ is Lipschitz continuous with constant 1/µ 1 k∇f (y) − ∇f (y )k2 ≤ ky − y 0k2 for all y and y 0 µ ∗ Conjugate functions ∗ 0 7-16 Proof: if f is strongly convex and closed • y T x − f (x) has a unique maximizer x for every y • x maximizes y T x − f (x) if and only if y ∈ ∂f (x); from page 7-15 y ∈ ∂f (x) ⇐⇒ x ∈ ∂f ∗(y) = {∇f ∗(y)} hence ∇f ∗(y) = argmaxx (y T x − f (x)) • from convexity of f (x) − (µ/2)xT x: (y − y 0)T (x − x0) ≥ µkx − x0k22 if y ∈ ∂f (x), y 0 ∈ ∂f (x0) • this is co-coercivity of ∇f ∗ (which implies Lipschitz continuity) (y − y 0)T (∇f ∗(y) − ∇f ∗(y 0)) ≥ µk∇f ∗(y) − ∇f ∗(y 0)k22 Conjugate functions 7-17 Outline • closed functions • conjugate function • duality Duality primal: minimize f (x) + g(Ax) dual: maximize −g ∗(z) − f ∗(−AT z) • follows from Lagrange duality applied to reformulated primal minimize subject to f (x) + g(y) Ax = y dual function for the formulated problem is: inf (f (x) + z T Ax + g(y) − z T y) = −f ∗(−AT z) − g ∗(z) x,y • Slater’s condition (for convex f , g ): strong duality holds if there exists an x̂ with x̂ ∈ int dom f, Ax̂ ∈ int dom g this also guarantees that the dual optimum is attained, if optimal value is finite Conjugate functions 7-18 Set constraint minimize subject to f (x) Ax − b ∈ C Primal and dual problem primal: minimize f (x) + δC (Ax − b) dual: ∗ maximize −bT z − δC (z) − f ∗(−AT z) Examples constraint set C ∗ support function δC (z) Ax = b {0} 0 norm inequality kAx − bk ≤ 1 unit k · k-ball kzk∗ conic inequality Ax K b −K δK ∗ (z) equality Conjugate functions 7-19 Norm regularization minimize f (x) + kAx − bk • take g(y) = ky − bk in general problem minimize f (x) + g(Ax) • conjugate of k · k is indicator of unit ball for dual norm g ∗(z) = bT z + δB (z) where B = {z | kzk∗ ≤ 1} • hence, dual problem can be written as maximize subject to Conjugate functions −bT z − f ∗(−AT z) kzk∗ ≤ 1 7-20 Optimality conditions minimize subject to f (x) + g(y) Ax = y assume f , g are convex and Slater’s condition holds Optimality conditions: x is optimal if and only if there exists a z such that 1. primal feasibility: x ∈ dom f and y = Ax ∈ dom g 2. x and y = Ax are minimizers of the Lagrangian f (x) + z T Ax + g(y) − z T y : −AT z ∈ ∂f (x), z ∈ ∂g(Ax) if g is closed, this can be written symmetrically as −AT z ∈ ∂f (x), Conjugate functions Ax ∈ ∂g ∗(z) 7-21 References • J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algoritms (1993), chapter X. • D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization (2003), chapter 7. • R. T. Rockafellar, Convex Analysis (1970). Conjugate functions 7-22