L. Vandenberghe
EE236C (Spring 2016)
7. Conjugate functions
• closed functions
• conjugate function
• duality
Closed set
a set C is closed if it contains its boundary:
\[ x_k \in C, \quad x_k \to \bar{x} \quad \Longrightarrow \quad \bar{x} \in C \]
Operations that preserve closedness
• the intersection of (finitely or infinitely many) closed sets is closed
• the union of a finite number of closed sets is closed
• inverse under linear mapping: {x | Ax ∈ C} is closed if C is closed
Image under linear mapping
the image of a closed set under a linear mapping is not necessarily closed
Example
\[ C = \{ (x_1, x_2) \in \mathbf{R}^2_+ \mid x_1 x_2 \geq 1 \}, \qquad A = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad AC = \mathbf{R}_{++} \]
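To see why the image in this example is not closed, one can follow a sequence of points in C whose images converge to a point outside AC. The particular sequence (1/k, k) below is an illustrative choice and is not taken from the slides.

```latex
\[
\begin{bmatrix} 1/k \\ k \end{bmatrix} \in C \quad (k = 1, 2, \ldots), \qquad
A \begin{bmatrix} 1/k \\ k \end{bmatrix} = \frac{1}{k} \to 0, \qquad
0 \notin AC = \mathbf{R}_{++}
\]
```

The limit 0 is excluded because x1 x2 ≥ 1 with x1, x2 ≥ 0 forces x1 > 0.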
Sufficient condition: AC is closed if
• C is closed and convex
• and C does not have a recession direction in the nullspace of A, i.e.,
\[ Ay = 0, \quad \hat{x} \in C, \quad \hat{x} + \alpha y \in C \ \text{for all } \alpha \geq 0 \quad \Longrightarrow \quad y = 0 \]
in particular, this holds for any matrix A if C is bounded
Closed function
Definition: a function is closed if its epigraph is a closed set
Examples
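• $f(x) = -\log(1 - x^2)$ with $\operatorname{dom} f = \{x \mid |x| < 1\}$
• $f(x) = x \log x$ with $\operatorname{dom} f = \mathbf{R}_+$ and $f(0) = 0$
• indicator function of a closed set $C$:
\[ \delta_C(x) = \begin{cases} 0 & x \in C \\ +\infty & \text{otherwise} \end{cases} \]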
Not closed
• f (x) = x log x with dom f = R++, or with dom f = R+ and f (0) = 1
• indicator function of a set C if C is not closed
Properties
Sublevel sets: f is closed if and only if all its sublevel sets are closed
Minimum: if f is closed with bounded sublevel sets then it has a minimizer
Common operations on convex functions that preserve closedness
• sum: f = f1 + f2 is closed if f1 and f2 are closed
• composition with affine mapping: f = g(Ax + b) is closed if g is closed
• supremum: $f(x) = \sup_\alpha f_\alpha(x)$ is closed if each function $f_\alpha$ is closed
in each case, we assume $\operatorname{dom} f \neq \emptyset$
Conjugate function
the conjugate of a function f is
\[ f^*(y) = \sup_{x \in \operatorname{dom} f} \left( y^T x - f(x) \right) \]
f ∗ is closed and convex (even when f is not)
Fenchel’s inequality: the definition implies that
\[ f(x) + f^*(y) \geq x^T y \quad \text{for all } x, y \]
this is an extension to non-quadratic convex f of the inequality
\[ \frac{1}{2} x^T x + \frac{1}{2} y^T y \geq x^T y \]
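The definition and Fenchel's inequality can be checked numerically in one dimension. The following is a minimal sketch, assuming the illustrative choice f(x) = x^2/2 (whose conjugate is y^2/2) and a hypothetical helper `conjugate_on_grid` that replaces the supremum by a maximum over a finite grid.

```python
def conjugate_on_grid(f, xs, y):
    """Approximate f*(y) = sup_x (y*x - f(x)) by a maximum over the grid xs."""
    return max(y * x - f(x) for x in xs)

def f(x):
    # Illustrative choice (an assumption of this sketch): f(x) = x^2 / 2.
    return 0.5 * x * x

# Grid on [-10, 10] with step 0.01; the resolution is an arbitrary choice.
xs = [i / 100.0 for i in range(-1000, 1001)]

ys = [-3.0, -1.0, 0.0, 0.5, 2.0]
conj = {y: conjugate_on_grid(f, xs, y) for y in ys}

# Compare the grid maximum with the closed form y^2 / 2.
max_error = max(abs(conj[y] - 0.5 * y * y) for y in ys)

# Fenchel's inequality f(x) + f*(y) >= x*y on a few sample points.
fenchel_ok = all(
    f(x) + conj[y] >= x * y - 1e-12
    for x in (-2.0, 0.0, 1.5)
    for y in ys
)
print(max_error, fenchel_ok)
```

A grid maximum can only underestimate the true supremum, so the comparison is meaningful only when the maximizer lies inside the grid range.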
Quadratic function
\[ f(x) = \frac{1}{2} x^T A x + b^T x + c \]
Strictly convex case ($A \succ 0$)
\[ f^*(y) = \frac{1}{2} (y - b)^T A^{-1} (y - b) - c \]
General convex case ($A \succeq 0$)
\[ f^*(y) = \frac{1}{2} (y - b)^T A^\dagger (y - b) - c, \qquad \operatorname{dom} f^* = \operatorname{range}(A) + b \]
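For the strictly convex case, the formula can be recovered by maximizing over x directly. The following short derivation uses only the definition of the conjugate; the symbol $x^\star$ for the maximizer is introduced here for illustration.

```latex
\begin{align*}
\nabla_x \left( y^T x - \tfrac{1}{2} x^T A x - b^T x - c \right) = 0
  &\iff x^\star = A^{-1} (y - b), \\
f^*(y) = (y - b)^T x^\star - \tfrac{1}{2} (x^\star)^T A x^\star - c
  &= \tfrac{1}{2} (y - b)^T A^{-1} (y - b) - c.
\end{align*}
```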
Negative entropy and negative logarithm
Negative entropy
\[ f(x) = \sum_{i=1}^n x_i \log x_i, \qquad f^*(y) = \sum_{i=1}^n e^{y_i - 1} \]
Negative logarithm
\[ f(x) = -\sum_{i=1}^n \log x_i, \qquad f^*(y) = -\sum_{i=1}^n \log(-y_i) - n \]
Matrix logarithm
\[ f(X) = -\log\det X \quad (\operatorname{dom} f = \mathbf{S}^n_{++}), \qquad f^*(Y) = -\log\det(-Y) - n \]
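Both scalar formulas (n = 1) can be compared against a brute-force maximization. This is a rough sketch, assuming a finite grid on (0, 50] as a stand-in for the supremum over x > 0; the helper names are hypothetical.

```python
import math

def sup_on_grid(g, xs):
    """Maximum of g over the grid xs (a crude stand-in for the supremum)."""
    return max(g(x) for x in xs)

# Positive grid (0, 50] with step 0.001; the resolution is an arbitrary choice.
xs = [i / 1000.0 for i in range(1, 50001)]

def neg_entropy_conj(y):
    # sup over x > 0 of (y*x - x log x); closed form is exp(y - 1)
    return sup_on_grid(lambda x: y * x - x * math.log(x), xs)

def neg_log_conj(y):
    # sup over x > 0 of (y*x + log x); finite only for y < 0,
    # where the closed form is -log(-y) - 1
    return sup_on_grid(lambda x: y * x + math.log(x), xs)

entropy_errors = [
    abs(neg_entropy_conj(y) - math.exp(y - 1.0)) for y in (-1.0, 0.0, 1.0, 2.0)
]
log_errors = [
    abs(neg_log_conj(y) - (-math.log(-y) - 1.0)) for y in (-4.0, -1.0, -0.5)
]
print(max(entropy_errors), max(log_errors))
```

The test values of y are chosen so that the maximizers, exp(y - 1) and -1/y respectively, fall well inside the grid range.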
Indicator function and norm
Indicator of convex set C : conjugate is the support function of C
\[ \delta_C(x) = \begin{cases} 0 & x \in C \\ +\infty & x \notin C \end{cases} \qquad\qquad \delta_C^*(y) = \sup_{x \in C} y^T x \]
Indicator of convex cone C : conjugate is indicator of polar (negative dual) cone
\[ \delta_C^*(y) = \delta_{-C^*}(y) = \delta_{C^*}(-y) = \begin{cases} 0 & y^T x \leq 0 \ \ \forall x \in C \\ +\infty & \text{otherwise} \end{cases} \]
Norm: conjugate is indicator of unit ball for dual norm
\[ f(x) = \|x\|, \qquad f^*(y) = \begin{cases} 0 & \|y\|_* \leq 1 \\ +\infty & \|y\|_* > 1 \end{cases} \]
(see the proof below)
Proof. recall the definition of dual norm:
\[ \|y\|_* = \sup_{\|x\| \leq 1} x^T y \]
to evaluate $f^*(y) = \sup_x (y^T x - \|x\|)$ we distinguish two cases
• if $\|y\|_* \leq 1$, then (by definition of dual norm)
\[ y^T x \leq \|x\| \quad \text{for all } x \]
and equality holds if $x = 0$; therefore $\sup_x (y^T x - \|x\|) = 0$
• if $\|y\|_* > 1$, there exists an $x$ with $\|x\| \leq 1$, $x^T y > 1$; then
\[ f^*(y) \geq y^T (tx) - \|tx\| = t (y^T x - \|x\|) \]
and the right-hand side goes to infinity if $t \to \infty$
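The two cases of the proof can be observed numerically for a concrete norm. The sketch below assumes the 1-norm on R^2, whose dual norm is the infinity-norm, and maximizes y^T x - ||x||_1 over boxes of growing radius; the test vectors and helper names are illustrative choices.

```python
def l1_norm(x):
    return sum(abs(v) for v in x)

def objective(y, x):
    """y^T x - ||x||_1, the quantity maximized in the conjugate of the 1-norm."""
    return sum(a * b for a, b in zip(y, x)) - l1_norm(x)

def sup_over_box(y, radius, steps=40):
    """Maximum of the objective over a grid on [-radius, radius]^2."""
    pts = [radius * (2.0 * i / steps - 1.0) for i in range(steps + 1)]
    return max(objective(y, (u, v)) for u in pts for v in pts)

y_inside = (0.5, -1.0)    # infinity-norm equal to 1: conjugate should be 0
y_outside = (1.5, 0.2)    # infinity-norm above 1: supremum is unbounded

radii = (1.0, 10.0, 100.0)
inside_values = [sup_over_box(y_inside, r) for r in radii]
outside_values = [sup_over_box(y_outside, r) for r in radii]
print(inside_values, outside_values)
```

For the first vector the maximum stays at zero regardless of the radius, while for the second it grows linearly with the radius, matching the scaling argument t(y^T x - ||x||) in the proof.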
Calculus rules
Separable sum
\[ f(x_1, x_2) = g(x_1) + h(x_2), \qquad f^*(y_1, y_2) = g^*(y_1) + h^*(y_2) \]
Scalar multiplication ($\alpha > 0$)
\[ f(x) = \alpha g(x), \qquad f^*(y) = \alpha g^*(y/\alpha) \]
\[ f(x) = \alpha g(x/\alpha), \qquad f^*(y) = \alpha g^*(y) \]
• the operation $f(x) = \alpha g(x/\alpha)$ is sometimes called ‘right scalar multiplication’
• a convenient notation is $f = g\alpha$ for the function $(g\alpha)(x) = \alpha g(x/\alpha)$
• conjugates can be written concisely as $(g\alpha)^* = \alpha g^*$ and $(\alpha g)^* = g^* \alpha$
Calculus rules
Addition to affine function
\[ f(x) = g(x) + a^T x + b, \qquad f^*(y) = g^*(y - a) - b \]
Translation of argument
\[ f(x) = g(x - b), \qquad f^*(y) = b^T y + g^*(y) \]
Composition with invertible linear mapping ($A$ square and nonsingular)
\[ f(x) = g(Ax), \qquad f^*(y) = g^*(A^{-T} y) \]
Infimal convolution
\[ f(x) = \inf_{u + v = x} \left( g(u) + h(v) \right), \qquad f^*(y) = g^*(y) + h^*(y) \]
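As an example of how these rules follow from the definition, the infimal convolution rule can be derived in three lines. This is a sketch of the standard argument: the infimum inside a negated term becomes a supremum, and the joint supremum then separates.

```latex
\begin{align*}
f^*(y) &= \sup_x \Big( y^T x - \inf_{u + v = x} \big( g(u) + h(v) \big) \Big) \\
       &= \sup_{u, v} \big( y^T (u + v) - g(u) - h(v) \big) \\
       &= \sup_u \big( y^T u - g(u) \big) + \sup_v \big( y^T v - h(v) \big)
        = g^*(y) + h^*(y).
\end{align*}
```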
The second conjugate
\[ f^{**}(x) = \sup_{y \in \operatorname{dom} f^*} \left( x^T y - f^*(y) \right) \]
• f ∗∗ is closed and convex
• from Fenchel’s inequality, xT y − f ∗(y) ≤ f (x) for all y and x; therefore
f ∗∗(x) ≤ f (x) for all x
equivalently, epi f ⊆ epi f ∗∗ (for any f )
• if f is closed and convex, then
f ∗∗(x) = f (x) for all x
equivalently, epi f = epi f ∗∗ (if f is closed convex); the proof follows
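Proof (by contradiction): assume f is closed and convex, and $\operatorname{epi} f^{**} \neq \operatorname{epi} f$
suppose $(x, f^{**}(x)) \notin \operatorname{epi} f$; then there is a strict separating hyperplane:
\[ \begin{bmatrix} a \\ b \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \leq c < 0 \quad \forall (z, s) \in \operatorname{epi} f \]
for some a, b, c with $b \leq 0$ ($b > 0$ gives a contradiction as $s \to \infty$)
• if $b < 0$, define $y = a/(-b)$ and maximize left-hand side over $(z, s) \in \operatorname{epi} f$:
\[ f^*(y) - y^T x + f^{**}(x) \leq c/(-b) < 0 \]
this contradicts Fenchel’s inequality
• if $b = 0$, choose $\hat{y} \in \operatorname{dom} f^*$ and add a small multiple $\epsilon > 0$ of $(\hat{y}, -1)$ to $(a, b)$:
\[ \begin{bmatrix} a + \epsilon \hat{y} \\ -\epsilon \end{bmatrix}^T \begin{bmatrix} z - x \\ s - f^{**}(x) \end{bmatrix} \leq c + \epsilon \left( f^*(\hat{y}) - x^T \hat{y} + f^{**}(x) \right) < 0 \]
now apply the argument for $b < 0$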
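The inequality f** ≤ f, and the fact that it can be strict when f is not convex, can be observed on a grid. The following sketch assumes an illustrative nonconvex "double well" function and replaces both suprema by maxima over finite grids; the function, grid ranges, and helper names are hypothetical choices, not taken from the slides.

```python
def f(x):
    # Illustrative nonconvex function: f(x) = min((x - 1)^2, (x + 1)^2).
    return min((x - 1.0) ** 2, (x + 1.0) ** 2)

# Grids for x and for the slope variable y (ranges and step are arbitrary).
xs = [i / 20.0 for i in range(-60, 61)]     # [-3, 3], step 0.05
ys = [i / 20.0 for i in range(-80, 81)]     # [-4, 4], step 0.05

# First conjugate on the y grid: f*(y) ~ max_x (y*x - f(x)).
f_conj = {y: max(y * x - f(x) for x in xs) for y in ys}

def f_biconj(x):
    """Second conjugate on the grid: f**(x) ~ max_y (x*y - f*(y))."""
    return max(x * y - f_conj[y] for y in ys)

samples = (-1.0, 0.0, 0.5, 1.0, 2.0)
values = {x: (f(x), f_biconj(x)) for x in samples}
for x in samples:
    print(x, values[x][0], values[x][1])
```

Between the two wells the second conjugate lies strictly below f (it fills in the nonconvex part), while outside the wells, where f already coincides with its convex envelope, the two agree.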
Conjugates and subgradients
if f is closed and convex, then
\[ y \in \partial f(x) \quad \Longleftrightarrow \quad x \in \partial f^*(y) \quad \Longleftrightarrow \quad x^T y = f(x) + f^*(y) \]
Proof. if $y \in \partial f(x)$, then $f^*(y) = \sup_u (y^T u - f(u)) = y^T x - f(x)$; hence
\begin{align*}
f^*(v) &= \sup_u \left( v^T u - f(u) \right) \\
       &\geq v^T x - f(x) \\
       &= x^T (v - y) - f(x) + y^T x \\
       &= f^*(y) + x^T (v - y)
\end{align*}
this holds for all $v$; therefore, $x \in \partial f^*(y)$
reverse implication $x \in \partial f^*(y) \Longrightarrow y \in \partial f(x)$ follows from $f^{**} = f$
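For a differentiable example the three equivalent conditions can be checked directly. This sketch assumes f(x) = exp(x), whose conjugate for y > 0 is the standard closed form y log y - y; neither the example nor the function names come from the slides.

```python
import math

def f(x):
    # Illustrative closed convex function: f(x) = exp(x).
    return math.exp(x)

def f_conj(y):
    # Conjugate of exp for y > 0: f*(y) = y log y - y
    # (a standard closed form, assumed here rather than taken from the slides).
    return y * math.log(y) - y

def f_grad(x):
    return math.exp(x)        # f'(x)

def f_conj_grad(y):
    return math.log(y)        # (f*)'(y)

gaps = []
for x in (-2.0, 0.0, 0.7, 3.0):
    y = f_grad(x)                            # y is the (sub)gradient of f at x
    x_back = f_conj_grad(y)                  # gradient of f* at y should return x
    fenchel_gap = f(x) + f_conj(y) - x * y   # should vanish for such a pair
    gaps.append((abs(x_back - x), abs(fenchel_gap)))
print(gaps)
```

In this smooth case the statement reduces to the gradient maps of f and f* being inverses of each other, with Fenchel's inequality holding with equality along matched pairs.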
Conjugate of strongly convex function
assume f is closed and strongly convex with parameter µ > 0
• $f^*$ is defined for all $y$ (i.e., $\operatorname{dom} f^* = \mathbf{R}^n$)
• $f^*$ is differentiable everywhere, with gradient
\[ \nabla f^*(y) = \operatorname*{argmax}_x \left( y^T x - f(x) \right) \]
• $\nabla f^*$ is Lipschitz continuous with constant $1/\mu$
\[ \|\nabla f^*(y) - \nabla f^*(y')\|_2 \leq \frac{1}{\mu} \|y - y'\|_2 \quad \text{for all } y \text{ and } y' \]
Proof: if f is strongly convex and closed
• $y^T x - f(x)$ has a unique maximizer $x$ for every $y$
• $x$ maximizes $y^T x - f(x)$ if and only if $y \in \partial f(x)$; from the result on conjugates and subgradients,
\[ y \in \partial f(x) \quad \Longleftrightarrow \quad x \in \partial f^*(y) = \{ \nabla f^*(y) \} \]
hence $\nabla f^*(y) = \operatorname*{argmax}_x (y^T x - f(x))$
• from convexity of $f(x) - (\mu/2) x^T x$:
\[ (y - y')^T (x - x') \geq \mu \|x - x'\|_2^2 \quad \text{if } y \in \partial f(x), \ y' \in \partial f(x') \]
• this is co-coercivity of $\nabla f^*$ (which implies Lipschitz continuity)
\[ (y - y')^T (\nabla f^*(y) - \nabla f^*(y')) \geq \mu \|\nabla f^*(y) - \nabla f^*(y')\|_2^2 \]
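The step from co-coercivity to the Lipschitz bound is not spelled out in the proof. One standard way to fill it in uses the Cauchy–Schwarz inequality; the shorthand $d$ is introduced here only for this sketch.

```latex
\[
\mu \|d\|_2^2 \;\leq\; (y - y')^T d \;\leq\; \|y - y'\|_2 \, \|d\|_2,
\qquad d = \nabla f^*(y) - \nabla f^*(y')
\]
```

Dividing by $\mu \|d\|_2$ when $d \neq 0$ gives $\|d\|_2 \leq (1/\mu) \|y - y'\|_2$, and the bound is trivial when $d = 0$.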
Duality
primal:
\[ \text{minimize} \quad f(x) + g(Ax) \]
dual:
\[ \text{maximize} \quad -g^*(z) - f^*(-A^T z) \]
• follows from Lagrange duality applied to reformulated primal
\[ \begin{array}{ll} \text{minimize} & f(x) + g(y) \\ \text{subject to} & Ax = y \end{array} \]
dual function for the reformulated problem is:
\[ \inf_{x, y} \left( f(x) + z^T A x + g(y) - z^T y \right) = -f^*(-A^T z) - g^*(z) \]
• Slater’s condition (for convex f, g): strong duality holds if there exists an $\hat{x}$ with
\[ \hat{x} \in \operatorname{int} \operatorname{dom} f, \qquad A\hat{x} \in \operatorname{int} \operatorname{dom} g \]
this also guarantees that the dual optimum is attained, if the optimal value is finite
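The expression for the dual function follows because the Lagrangian separates in x and y, so the infimum can be taken over each variable independently. A short sketch of the two pieces, using only the definition of the conjugate:

```latex
\begin{align*}
\inf_x \left( f(x) + z^T A x \right)
  &= -\sup_x \left( (-A^T z)^T x - f(x) \right) = -f^*(-A^T z), \\
\inf_y \left( g(y) - z^T y \right)
  &= -\sup_y \left( z^T y - g(y) \right) = -g^*(z).
\end{align*}
```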
Set constraint
\[ \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & Ax - b \in C \end{array} \]
Primal and dual problem
primal:
\[ \text{minimize} \quad f(x) + \delta_C(Ax - b) \]
dual:
\[ \text{maximize} \quad -b^T z - \delta_C^*(z) - f^*(-A^T z) \]
Examples
                    constraint               set $C$                    support function $\delta_C^*(z)$
equality            $Ax = b$                 $\{0\}$                    $0$
norm inequality     $\|Ax - b\| \leq 1$      unit $\|\cdot\|$-ball      $\|z\|_*$
conic inequality    $Ax \preceq_K b$         $-K$                       $\delta_{K^*}(z)$
Norm regularization
\[ \text{minimize} \quad f(x) + \|Ax - b\| \]
• take $g(y) = \|y - b\|$ in general problem
\[ \text{minimize} \quad f(x) + g(Ax) \]
• conjugate of $\|\cdot\|$ is indicator of unit ball for dual norm
\[ g^*(z) = b^T z + \delta_B(z) \quad \text{where } B = \{ z \mid \|z\|_* \leq 1 \} \]
• hence, dual problem can be written as
\[ \begin{array}{ll} \text{maximize} & -b^T z - f^*(-A^T z) \\ \text{subject to} & \|z\|_* \leq 1 \end{array} \]
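The primal and dual problems can be compared on a small scalar instance. The following sketch assumes f(x) = x^2/2 (so f*(y) = y^2/2), the absolute value as the norm (which is its own dual norm in one dimension), and hypothetical data a = 2, b = 3; both problems are solved by brute-force grid search.

```python
def primal_objective(x, a, b):
    # f(x) + |a*x - b| with the illustrative choice f(x) = x^2 / 2.
    return 0.5 * x * x + abs(a * x - b)

def dual_objective(z, a, b):
    # -b*z - f*(-a*z), using f*(y) = y^2 / 2 for f(x) = x^2 / 2.
    return -b * z - 0.5 * (a * z) ** 2

a, b = 2.0, 3.0   # hypothetical problem data for this sketch

# Grid search over x in [-5, 5] and over the dual feasible set |z| <= 1.
xs = [i / 1000.0 for i in range(-5000, 5001)]
zs = [i / 1000.0 for i in range(-1000, 1001)]

primal_value = min(primal_objective(x, a, b) for x in xs)
dual_value = max(dual_objective(z, a, b) for z in zs)
print(primal_value, dual_value)
```

Since both domains are all of R, Slater's condition holds for this instance, and the two printed values should coincide up to the grid resolution.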
Optimality conditions
\[ \begin{array}{ll} \text{minimize} & f(x) + g(y) \\ \text{subject to} & Ax = y \end{array} \]
assume f, g are convex and Slater’s condition holds
Optimality conditions: x is optimal if and only if there exists a z such that
1. primal feasibility: $x \in \operatorname{dom} f$ and $y = Ax \in \operatorname{dom} g$
2. $x$ and $y = Ax$ are minimizers of the Lagrangian $f(x) + z^T A x + g(y) - z^T y$:
\[ -A^T z \in \partial f(x), \qquad z \in \partial g(Ax) \]
if g is closed, this can be written symmetrically as
\[ -A^T z \in \partial f(x), \qquad Ax \in \partial g^*(z) \]
References
• J.-B. Hiriart-Urruty, C. Lemaréchal, Convex Analysis and Minimization Algorithms (1993), chapter X.
• D.P. Bertsekas, A. Nedić, A.E. Ozdaglar, Convex Analysis and Optimization
(2003), chapter 7.
• R. T. Rockafellar, Convex Analysis (1970).