Semidefinite and Second Order Cone Programming Seminar
Fall 2001
Lecture 9
Instructor: Farid Alizadeh
Scribe: David Phillips
Date: 11/12/2001
1 Overview
We complete last lecture's discussion of the Newton method for LP, SDP, and SOCP. Recall that these problems shared the following structure when Newton's method was applied to primal feasibility, dual feasibility, and a relaxed form of the complementary slackness conditions that arose from applying first-order conditions to the logarithmic barrier function:

    A ∆x = r_p                  (primal feasibility)
    A^T ∆y + ∆z = r_d           (dual feasibility)
    E ∆x + F ∆z = r_c           (complementary slackness conditions)

We complete the derivation of the E and F matrices for LP, SDP, and SOCP in order to motivate the unifying framework of Jordan algebras. Finally, we introduce Jordan algebras.
2 Newton Method for LP, SDP, and SOCP

2.1 Linear Programming
We include the results for LP here for review; the specifics of the derivation are in the previous lecture.

    Ax = b                      (primal feasibility)
    A^T y + z = c               (dual feasibility)
and any one of the following mathematically equivalent forms of the relaxed complementary slackness conditions:

    (1)  x_i − µ z_i^{-1} = 0,    i = 1, ..., n
    (2)  z_i − µ x_i^{-1} = 0,    i = 1, ..., n
    (3)  x_i z_i = µ,             i = 1, ..., n
In LP the matrices E and F have the forms:

    (1)  E = I,               F = µ Diag(z^{-2})
    (2)  E = µ Diag(x^{-2}),  F = I
    (3)  E = Diag(z),         F = Diag(x)

where, for a vector v ∈ R^n,

    v^{-2} = (1/v_1^2, 1/v_2^2, ..., 1/v_n^2)^T.
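As a toy illustration of this structure (a minimal numpy sketch with made-up data and sizes, not part of the lecture), the full LP Newton system for variant (3) can be assembled and solved directly:

```python
import numpy as np

# Hypothetical LP data, used only to illustrate the block structure of the
# Newton system for variant (3): E = Diag(z), F = Diag(x).
rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.uniform(0.5, 2.0, n)        # strictly positive primal iterate
z = rng.uniform(0.5, 2.0, n)        # strictly positive dual slack
y = rng.standard_normal(m)
b = A @ x                            # iterate made primal feasible, so r_p = 0
c = A.T @ y + z                      # ... and dual feasible, so r_d = 0
mu = (x @ z) / (2 * n)               # an arbitrary target barrier parameter

r_p = b - A @ x
r_d = c - A.T @ y - z
r_c = mu * np.ones(n) - x * z
E, F = np.diag(z), np.diag(x)

# Block system:  A dx = r_p,  A^T dy + dz = r_d,  E dx + F dz = r_c
K = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              np.eye(n)       ],
    [E,                np.zeros((n, m)), F               ],
])
sol = np.linalg.solve(K, np.concatenate([r_p, r_d, r_c]))
dx, dy, dz = np.split(sol, [n, n + m])
print(np.allclose(A @ dx, r_p), np.allclose(z * dx + x * dz, r_c))
```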
2.2 Semidefinite Programming
Letting

        [ vec^T(A_1) ]
    A = [    ...     ]
        [ vec^T(A_m) ],

the specific relaxed form for SDP becomes:

    A vec(∆X) = r_p             (primal feasibility)
    A^T ∆y + vec(∆Z) = r_d      (dual feasibility)
    XZ = µI                     (complementary slackness conditions)
The relaxed complementary slackness condition has many mathematically equivalent forms, three of which are:

    (1)  X − µ Z^{-1} = 0
    (2)  Z − µ X^{-1} = 0
    (3)  XZ + ZX = 2µI    (recall Lemma 1 of Lecture 8)
The situation here is quite analogous to LP. Consider replacing X with X + ∆X and Z with Z + ∆Z in (2):

    Z + ∆Z − µ (X + ∆X)^{-1} = 0.

Since X is in the interior of the feasible region, it is positive definite, implying that X^{-1} exists, so this equation can be rewritten as:

    Z + ∆Z − µ [X(I + X^{-1} ∆X)]^{-1} = 0.
But X(I + X^{-1}∆X) is not necessarily symmetric, so instead we will use the following:

    Z + ∆Z − µ [X^{1/2}(I + X^{-1/2} ∆X X^{-1/2}) X^{1/2}]^{-1} = 0
    Z + ∆Z − µ X^{-1/2}(I + X^{-1/2} ∆X X^{-1/2})^{-1} X^{-1/2} = 0
Now, recall that for z ∈ R, if |z| < 1, then the following identity holds:

    1/(1 + z) = 1 − z + z^2 − z^3 + ...

For matrices, the analogous identity is: for a square matrix Z all of whose eigenvalues have absolute value less than 1,

    (I + Z)^{-1} = I − Z + Z^2 − Z^3 + ...
Using this identity, we obtain

    Z + ∆Z − µ X^{-1/2}(I − X^{-1/2}∆X X^{-1/2} + (X^{-1/2}∆X X^{-1/2})^2 − ...) X^{-1/2} = 0.

Applying Newton's method means dropping all terms that are nonlinear in the ∆'s:

    Z + ∆Z − µ X^{-1/2}(I − X^{-1/2}∆X X^{-1/2}) X^{-1/2} = 0
    Z + ∆Z − µ X^{-1} + µ X^{-1}∆X X^{-1} = 0

It is clear that F = I; it will, however, be easier to represent E as E(X), where E(X) : ∆X ↦ µ X^{-1}∆X X^{-1}. Since E(X) is a linear transformation, it can be represented as a matrix depending on X, acting as E(X) vec(∆X).
Before continuing, it will simplify the notation to introduce Kronecker products.
2.2.1 Kronecker Products
Let A and B be arbitrary matrices of dimensions m × n and p × q respectively. Then let ⊗ be the binary operator such that

    A ⊗ B = C,

where C is an mp × nq block matrix of the form

        [ a_{11}B  a_{12}B  ...  a_{1n}B ]
    C = [   ...                          ]
        [ a_{m1}B     ...       a_{mn}B  ]
where the ij-th block entry of the product is the p × q matrix B scaled by
the ij-th entry of the A matrix (i.e., aij ), with i = 1, . . . , m and j = 1, . . . , n.
There are a number of properties that make it easy to algebraically manipulate
expressions involving Kronecker products. The fundamental property is stated
in the following
Lemma 1  Let A = (a_{ij}), B = (b_{ij}), C = (c_{ij}), and D = (d_{ij}) be matrices with consistent dimensions, so that the products AC and BD are well-defined. Then

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).    (1)

Proof: To prove (1) we recall the property of matrix multiplication which states that if X = (X_{ij}) and Y = (Y_{ij}) are two matrices partitioned into blocks, so that the X_{ij} and Y_{ij} are themselves matrices, then Z = XY can be written as a block partitioned matrix Z = (Z_{ij}) with Z_{ij} = Σ_k X_{ik} Y_{kj}. The only requirement is that the partitionings of X and Y be such that all products X_{ik} Y_{kj}, for all values of i, j and k, are well-defined. Now let us look at the ij block of (AC) ⊗ (BD). By definition this block equals (AC)_{ij} (BD) = Σ_k a_{ik} c_{kj} (BD). On the other hand, the ij block of (A ⊗ B)(C ⊗ D) equals

    [(A ⊗ B)(C ⊗ D)]_{ij} = Σ_k (A ⊗ B)_{ik} (C ⊗ D)_{kj} = Σ_k (a_{ik}B)(c_{kj}D) = Σ_k a_{ik} c_{kj} BD,

which is equal to the ij block of (AC) ⊗ (BD).
The Kronecker product has a number of properties, all of which can be derived easily from (1). For example,

    (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.    (2)

This is proved by observing that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = (AA^{-1}) ⊗ (BB^{-1}) = I ⊗ I, and it is obvious from the definition that I ⊗ I = I.
Eigenvalues and eigenvectors of Kronecker products can also be determined
easily. Let Ax = λx and By = ωy. Then Kronecker multiplying the two
sides of these equations we get (Ax) ⊗ (By) = (λx) ⊗ (ωy). Using (1) we get
(A ⊗ B)(x ⊗ y) = (λω)(x ⊗ y). Since for vectors x and y, x ⊗ y is also a vector,
we have proved that
Lemma 2 If λ is an eigenvalue of A with corresponding eigenvector x, and ω
is an eigenvalue of B with corresponding eigenvector y, then λω is an eigenvalue
of A ⊗ B with corresponding eigenvector x ⊗ y.
Other properties of Kronecker products are

    (A ⊗ B)^T = A^T ⊗ B^T
    vec(ABC) = (C^T ⊗ A) vec(B)

Exercise: Prove the above properties.
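A quick numerical spot-check of these identities (not a proof; random matrices, with numpy's column-major flatten playing the role of vec):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
C = rng.standard_normal((4, 3))
D = rng.standard_normal((5, 2))
vec = lambda M: M.flatten(order="F")   # stack columns, as in the lecture's vec()

# (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# (A ⊗ B)^T = A^T ⊗ B^T
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))

# vec(ABC) = (C^T ⊗ A) vec(B), with conformable shapes
A2, B2, C2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 5)), rng.standard_normal((5, 2))
print(np.allclose(vec(A2 @ B2 @ C2), np.kron(C2.T, A2) @ vec(B2)))

# (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} for square nonsingular A, B
As, Bs = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
print(np.allclose(np.linalg.inv(np.kron(As, Bs)),
                  np.kron(np.linalg.inv(As), np.linalg.inv(Bs))))
```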
2.2.2 Back to SDP
Returning to the SDP relaxation, we obtain:

    (1)  E = I,                             F = E(Z) = µ (Z^{-1} ⊗ Z^{-1})
    (2)  E = E(X) = µ (X^{-1} ⊗ X^{-1}),    F = I
and for (3):

    (X + ∆X)(Z + ∆Z) + (Z + ∆Z)(X + ∆X) = 2µI.

As usual, dropping the terms that are nonlinear in the ∆'s, we get

    X∆Z + ∆Z X + ∆X Z + Z∆X = 2µI − XZ − ZX,

which in Kronecker product notation can be written as

    (I ⊗ X + X ⊗ I) vec(∆Z) + (I ⊗ Z + Z ⊗ I) vec(∆X) = r_c.

Thus, E = I ⊗ Z + Z ⊗ I, and F = I ⊗ X + X ⊗ I.
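A small numerical check (random symmetric data, numpy, column-major flatten as vec; an illustration rather than part of the lecture) that these Kronecker matrices act as claimed:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
I = np.eye(n)
mu = 0.1
vec = lambda M: M.flatten(order="F")     # column-stacking vec, as in the lecture

def rand_sym():
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

X = rand_sym() + n * I                   # symmetric, positive definite with high probability
Z = rand_sym() + n * I
dX = rand_sym()

# Variant (3): E = I⊗Z + Z⊗I acts on vec(∆X) as vec(Z ∆X + ∆X Z); F acts likewise with X.
E3 = np.kron(I, Z) + np.kron(Z, I)
print(np.allclose(E3 @ vec(dX), vec(Z @ dX + dX @ Z)))

# Variant (2): E(X) vec(∆X) = µ vec(X^{-1} ∆X X^{-1}), realized by the matrix µ (X^{-1} ⊗ X^{-1}).
Xinv = np.linalg.inv(X)
print(np.allclose(mu * np.kron(Xinv, Xinv) @ vec(dX), mu * vec(Xinv @ dX @ Xinv)))
```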
2.3 Second-Order Cone Programming
For simplicity of notation, we will consider the single-block problem. The relaxed form for this is:
    A ∆x = r_p                  (primal feasibility)
    A^T ∆y + ∆z = r_d           (dual feasibility)
and the relaxed complementarity relations (as developed in our last lecture) were:

    (1)  (x_0, x_1, ..., x_n)^T = ρ (z_0, −z_1, ..., −z_n)^T = ρ R z,

or, in a mathematically equivalent form,

    (2)  (z_0, z_1, ..., z_n)^T = γ (x_0, −x_1, ..., −x_n)^T = γ R x,

where ρ = 2µ / (z_0^2 − ||z̄||^2) in (1), γ = 2µ / (x_0^2 − ||x̄||^2) in (2), and

        [ 1   0  ...   0 ]
    R = [ 0  −1  ...   0 ]
        [ ...            ]
        [ 0   0  ...  −1 ]
Since x = ρRz, multiplying by z^T we get

    z^T x = ρ z^T R z = ρ (z_0^2 − Σ_{i=1}^n z_i^2) = 2µ.

By the comments at the end of section 2.3 of the last lecture, these conditions also imply that

    x_0 z_i + z_0 x_i = 0,    i = 1, ..., n.
Thus, we get

    (3)  x^T z = 2µ,    x_0 z_i + x_i z_0 = 0,    i = 1, ..., n,

or, more succinctly,

    (3)  Arw(x) z = Arw(x) Arw(z) e = 2µ e,

where e = (1, 0, ..., 0)^T and, as we have already defined,

             [ x_0  x_1  x_2  ...  x_n ]
             [ x_1  x_0   0   ...   0  ]
    Arw(x) = [ x_2   0   x_0  ...   0  ]
             [  .                   .  ]
             [ x_n   0   ...        x_0 ]
But then, for (3), replacing x and z with x + ∆x and z + ∆z, dropping the terms that are nonlinear in the ∆'s, and collecting components, we get

    (i)   x_0∆z_i + z_i∆x_0 + x_i∆z_0 + z_0∆x_i = −(x_0 z_i + x_i z_0),    i = 1, ..., n,

and

    (ii)  x_0∆z_0 + z_0∆x_0 + x_1∆z_1 + z_1∆x_1 + ... + x_n∆z_n + z_n∆x_n = 2µ − x^T z.

But (i) and (ii) are together equivalent to:

    Arw(x)∆z + Arw(z)∆x = r_c

So, for (3), E = Arw(z) and F = Arw(x).
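A brief numpy sketch (illustrative only, with random vectors) of the Arw operator, checking that Arw(x)z reproduces the componentwise product above and that Arw(x)∆z + Arw(z)∆x is indeed the first-order change of Arw(x)z:

```python
import numpy as np

def arw(x):
    """Arrow-shaped matrix Arw(x) for x = (x_0, x_1, ..., x_n)."""
    n = x.size - 1
    M = x[0] * np.eye(n + 1)
    M[0, 1:] = x[1:]
    M[1:, 0] = x[1:]
    return M

rng = np.random.default_rng(3)
n = 4
x, z = rng.standard_normal(n + 1), rng.standard_normal(n + 1)
dx, dz = 1e-6 * rng.standard_normal(n + 1), 1e-6 * rng.standard_normal(n + 1)

# Arw(x) z equals the componentwise product used above: (x^T z, x_0 z_i + x_i z_0).
prod = np.concatenate(([x @ z], x[0] * z[1:] + z[0] * x[1:]))
print(np.allclose(arw(x) @ z, prod))

# First-order expansion: Arw(x+dx)(z+dz) - Arw(x)z ≈ Arw(x)dz + Arw(z)dx.
lhs = arw(x + dx) @ (z + dz) - arw(x) @ z
rhs = arw(x) @ dz + arw(z) @ dx
print(np.allclose(lhs, rhs, atol=1e-10))
```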
Exercise: Show that for case (1), E = I and F = µ (Arw(z))^{-2}, and for case (2), E = µ (Arw(x))^{-2} and F = I.
2.4 Summary

The following two tables give the different versions of the relaxed complementarity conditions and the corresponding E and F matrices.
    LP:    (1) x_i − µ z_i^{-1} = 0          (2) z_i − µ x_i^{-1} = 0          (3) x_i z_i = µ
    SDP:   (1) X − µ Z^{-1} = 0              (2) Z − µ X^{-1} = 0              (3) XZ + ZX = 2µI
    SOCP:  (1) x − [2µ/(z_0^2 − ||z̄||^2)] Rz = 0
           (2) z − [2µ/(x_0^2 − ||x̄||^2)] Rx = 0
           (3) Arw(x) Arw(z) e = 2µe
Then the matrix forms for E and F are:

    E and F Matrix Forms

                   E                         F
    LP    (1)      I                         µ Diag(z^{-2})
          (2)      µ Diag(x^{-2})            I
          (3)      Diag(z)                   Diag(x)
    SDP   (1)      I                         µ (Z^{-1} ⊗ Z^{-1})
          (2)      µ (X^{-1} ⊗ X^{-1})       I
          (3)      I ⊗ Z + Z ⊗ I             I ⊗ X + X ⊗ I
    SOCP  (1)      I                         µ (Arw(z))^{-2}
          (2)      µ (Arw(x))^{-2}           I
          (3)      Arw(z)                    Arw(x)
3 Euclidean Jordan Algebras
To unify the presentation of interior point algorithms for LP, SDP, and SOCP, it is convenient to introduce an algebraic structure that provides us with tools for analyzing these three cases (and several more). This algebraic structure is called a Euclidean Jordan algebra. We first introduce Jordan algebras. (Jordan refers to Pascual Jordan, the 20th-century German physicist who, along with Max Born and Werner Heisenberg, was responsible for the so-called matrix formulation of quantum mechanics. It does not refer to another famous Jordan, namely Camille Jordan, the 19th-century French mathematician of Jordan block and Jordan curve theorem fame.)
First we have to make a comment on terminology. The word "algebra" is used in two different contexts. In the first context it names a branch of mathematics: the one that deals with algebraic manipulations at the elementary level, or with the study of groups, rings, fields, vector spaces, etc. at a higher level. In the second context the word refers to a particular algebraic structure. Here are the definitions.
3.1 Definitions
For the purposes of the definitions, let V be a vector space defined over the field of real numbers, that is, V = R^n for some integer n. Although most of what we will say is also valid over an arbitrary field F (with the corresponding scalar multiplication and addition), this level of generality is not needed in our course and therefore will not be treated.
Definition 1 (Algebra)  An algebra, (V, ∗), is a vector space V with a binary operation ∗ such that for any x, y, z ∈ V and α, β ∈ R,

    x ∗ y ∈ V                                    (closure),
    x ∗ (αy + βz) = α(x ∗ y) + β(x ∗ z)
    (αy + βz) ∗ x = α(y ∗ x) + β(z ∗ x)          (distributive laws).
Notice that the distributive laws imply that the binary operation x ∗ y is a bilinear function of x and y. In other words, there are n × n matrices Q_1, Q_2, ..., Q_n such that

    x ∗ y = (x^T Q_1 y, x^T Q_2 y, ..., x^T Q_n y)^T    (bilinearity).

Thus, it is immediate that determining the matrices Q_1, ..., Q_n determines the multiplication, which in turn determines the algebra. Further, for a given x ∈ V we have x ∗ y = L(x)y for all y ∈ V, where L(x) is a matrix depending linearly on x. Thus, L(·) also determines the algebra.
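As a concrete toy illustration (my own example, not from the lecture): complex multiplication, viewed as a bilinear product on R^2, has Q_1 = [[1, 0], [0, −1]], Q_2 = [[0, 1], [1, 0]], and L(x) = [[x_0, −x_1], [x_1, x_0]]. A quick numpy check:

```python
import numpy as np

# Complex multiplication as a bilinear product on R^2:
# z = x * y with z_k = x^T Q_k y, or equivalently z = L(x) y.
Q1 = np.array([[1.0, 0.0], [0.0, -1.0]])
Q2 = np.array([[0.0, 1.0], [1.0, 0.0]])

def L(x):
    return np.array([[x[0], -x[1]], [x[1], x[0]]])

rng = np.random.default_rng(4)
x, y = rng.standard_normal(2), rng.standard_normal(2)

via_Q = np.array([x @ Q1 @ y, x @ Q2 @ y])
via_L = L(x) @ y
via_complex = (x[0] + 1j * x[1]) * (y[0] + 1j * y[1])

print(np.allclose(via_Q, via_L))
print(np.allclose(via_L, [via_complex.real, via_complex.imag]))
```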
Definition 2 (Identity Element)  For a given algebra (V, ∗), if there exists an element e ∈ V such that for all x ∈ V

    e ∗ x = x ∗ e = x,

then e is the identity element of (V, ∗).

Exercise: Prove that if (V, ∗) has an identity element e, then e is unique.
Definition 3 (Associative Algebra)  An algebra (V, ∗) is an associative algebra if for all x, y, z ∈ V,

    (x ∗ y) ∗ z = x ∗ (y ∗ z).

Example 1 (Matrices under matrix multiplication)  Let M_n be the set of all square matrices of dimension n. Then ordinary matrix multiplication (denoted ·) makes (M_n, ·) an associative (but not commutative) algebra. Note that, for X ∈ M_n, L(X) = I ⊗ X (acting on vec(Y)).
Example 2 (A commutative, but not associative, algebra)  Consider (M_n, ◦), where for A, B ∈ M_n, A ◦ B = (AB + BA)/2. This is an algebra which is commutative but not associative. For X ∈ M_n, L(X) = (I ⊗ X + X ⊗ I)/2.
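A small numerical check (random matrices, column-major vec; not part of the lecture) that the L(X) matrices given in Examples 1 and 2 reproduce the two products:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
vec = lambda M: M.flatten(order="F")   # column-stacking vec
I = np.eye(n)
Y = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
X = (M + M.T) / 2                      # take X symmetric (the SDP-relevant case)

# Example 1: ordinary product, L(X) = I ⊗ X, i.e. (I ⊗ X) vec(Y) = vec(XY).
print(np.allclose(np.kron(I, X) @ vec(Y), vec(X @ Y)))

# Example 2: Jordan product X ◦ Y = (XY + YX)/2, L(X) = (I ⊗ X + X ⊗ I)/2.
# (For a non-symmetric X the second Kronecker term would produce Y X^T rather than Y X.)
LX = (np.kron(I, X) + np.kron(X, I)) / 2
print(np.allclose(LX @ vec(Y), vec((X @ Y + Y @ X) / 2)))

# The ◦ product is commutative but in general not associative:
jord = lambda P, Q: (P @ Q + Q @ P) / 2
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
print(np.allclose(jord(A, B), jord(B, A)))                     # True
print(np.allclose(jord(jord(A, B), C), jord(A, jord(B, C))))   # typically False
```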
Consider now an associative algebra (V, ∗), with the matrix function L(·) determining ∗. We have

    (x ∗ y) ∗ z = L(x ∗ y) z    and    x ∗ (y ∗ z) = L(x)(L(y)z) = L(x)L(y) z.

Since associativity makes these equal for all z, it follows that L(x ∗ y) = L(x)L(y) for all x, y ∈ V. Thus, (V, ∗) is isomorphic to some subalgebra of matrices under matrix multiplication. Let us define these terms.
Definition 4  Let (V, ∗) and (U, ⋆) be two algebras. Let f : V → U be a function such that f(V) is a subspace of U and f(u ∗ v) = f(u) ⋆ f(v). Then f is a homomorphism from (V, ∗) to (U, ⋆). If f is one-to-one and onto, then f is called an isomorphism. In that case, the algebras (V, ∗) and (U, ⋆) are essentially the same thing.
Definition 5 (Subalgebra)  Given an algebra (V, ∗) and a subspace U of V, (U, ∗) is a subalgebra if and only if U is closed under ∗. More generally, we say that an algebra U is a subalgebra of V if it is isomorphic to a subalgebra of V in the above sense.
Thus we have shown that all associative algebras are essentially homomorphic to some algebra of square matrices. Let us see some examples of subalgebras and isomorphisms within the algebra of matrices.
Example 3 (Some subalgebras of (M_n, ·))

1. All n × n diagonal matrices form a subalgebra of (M_n, ·).

2. All n × n upper triangular matrices form a subalgebra of (M_n, ·), and so do all block upper triangular matrices. All n × n lower triangular matrices also form a subalgebra of M_n, as do block lower triangular matrices. Also notice that under the function f(X) = X^T, the algebra of upper triangular matrices is isomorphic to the algebra of lower triangular matrices.

3. All matrices of the form

       [ A  0 ]
       [ 0  0 ]

   where A ∈ M_k for k < n form a subalgebra of M_n. Also notice that this subalgebra is isomorphic to (M_k, ·), so it is accurate to say that M_k is a subalgebra of M_n for all k ≤ n.

4. Someone in class mentioned the set of orthogonal matrices, that is, matrices Q such that QQ^T = I. This is not a subalgebra. Although it is indeed closed under matrix multiplication, the set of orthogonal matrices is not a subspace of M_n: for two orthogonal matrices Q_1 and Q_2, in general Q_1 + Q_2 is not orthogonal, nor is αQ_i for real numbers α with |α| ≠ 1.
Definition 6  The subalgebra generated by x, denoted V(x), is the (intersection-wise) smallest subalgebra of V that contains x. More generally, if S ⊂ V is any subset, then V(S) is the smallest subalgebra of V that contains S.

V(x) can be thought of as being created as follows: start with V(x) containing only x. Then add αx for all α ∈ R. Then multiply, with respect to ∗, everything we already have with everything else, and add the products to V(x). Then form all possible linear combinations and add the results to V(x). Continue this process of multiplying and forming linear combinations until no new element is created. You are left with the smallest subalgebra of V that contains x.
Definition 7 (Power Associative Algebra)  For an algebra (V, ∗), if, for every x ∈ V, the subalgebra V(x) is associative, then (V, ∗) is called a power associative algebra.

To show that an algebra is power associative it suffices to show that in the product x ∗ x ∗ ··· ∗ x the order in which the multiplications are carried out does not matter. In that case it is easily seen that

    V(x) := { v ∈ V : v = Σ_{i=0}^k α_i x^i,  α_i ∈ R,  k an integer or k = ∞ },

that is, the elements of V(x) are polynomials or infinite power series in the element x.
In the case of a power-associative algebra V, for each element x ∈ V we may also define

    V[x] := { v ∈ V : v = Σ_{i=0}^k α_i x^i,  α_i ∈ R,  k an integer },

that is, V[x] is the subalgebra of V(x) made of polynomials in x.

Remark 1  Later we shall see that V[x] and V(x) are actually equal.
Remark 2  (M_n, ◦) is power associative.

Proof: For p a nonnegative integer and X ∈ M_n, let X^{◦p} = X ◦ X^{◦(p−1)}, with X^{◦0} = I. We first show that X^{◦p} = X^p (the ordinary matrix power, whose multiplication is associative), which we can do by induction. For the base case, note that X^{◦0} = X^0 = I.
Now, assume that X^{◦k} = X^k. Then

    X^{◦(k+1)} = X ◦ X^{◦k}
               = X ◦ X^k
               = (X X^k + X^k X)/2
               = X^{k+1}.

Now, for arbitrary integers q, r, s and real numbers α_i, β_j, γ_k (i = 0, ..., q, j = 0, ..., r, k = 0, ..., s), let

    U = Σ_{i=0}^q α_i X^i,    V = Σ_{i=0}^r β_i X^i,    W = Σ_{i=0}^s γ_i X^i.

Since powers of X commute, U ◦ V = U · V, and thus

    U ◦ (V ◦ W) = (U ◦ V) ◦ W.
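A small numerical illustration of this argument (random X, numpy; not part of the lecture): ◦-powers agree with ordinary matrix powers, and polynomials in X associate under ◦.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
X = rng.standard_normal((n, n))
jord = lambda P, Q: (P @ Q + Q @ P) / 2   # the ◦ product

# ◦-powers agree with ordinary matrix powers.
Xo3 = jord(X, jord(X, X))
print(np.allclose(Xo3, np.linalg.matrix_power(X, 3)))

# Polynomials in X associate under ◦ (U, V, W are arbitrary quadratics in X).
I = np.eye(n)
U = 2 * I + X - 3 * (X @ X)
V = -I + 0.5 * X
W = X @ X + 4 * X
print(np.allclose(jord(U, jord(V, W)), jord(jord(U, V), W)))
```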
Example 4 (The algebra associated with SOCP)  Let V = R^{n+1} and let ◦ : V × V → V be such that, for x, y ∈ V,

    (x_0, x_1, ..., x_n)^T ◦ (y_0, y_1, ..., y_n)^T = (x^T B y, x_1 y_0 + y_1 x_0, x_2 y_0 + y_2 x_0, ..., x_n y_0 + y_n x_0)^T,

where B is a nonsingular symmetric matrix. Note that:

1. The right-hand side is bilinear and thus (R^{n+1}, ◦) is indeed an algebra satisfying the distributive laws.

2. The multiplication ◦ is in general non-associative, but it is commutative, since B is symmetric.

3. That ◦ is power associative will be shown in the next lecture.

4. When B = I, the complementary slackness theorem for SOCP can be expressed succinctly as x ◦ z = 0.
5. For B = I, the identity element is e = (1, 0, ..., 0)^T. More generally, if B is such that Be = e, then e is the identity element.
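A short numpy sketch of this product for B = I (an illustration, not from the lecture notes): it checks commutativity, the identity element e, a generic failure of associativity, and the fact that Arw(x)y computes x ◦ y.

```python
import numpy as np

def circ(x, y, B=None):
    """The SOCP product: (x^T B y, x_1 y_0 + y_1 x_0, ..., x_n y_0 + y_n x_0)."""
    B = np.eye(x.size) if B is None else B
    return np.concatenate(([x @ B @ y], x[1:] * y[0] + y[1:] * x[0]))

def arw(x):
    n = x.size - 1
    M = x[0] * np.eye(n + 1)
    M[0, 1:] = x[1:]
    M[1:, 0] = x[1:]
    return M

rng = np.random.default_rng(7)
n = 4
x, y, z = (rng.standard_normal(n + 1) for _ in range(3))
e = np.zeros(n + 1); e[0] = 1.0

print(np.allclose(circ(x, y), circ(y, x)))                    # commutative
print(np.allclose(circ(e, x), x))                             # e is the identity (B = I)
print(np.allclose(circ(x, circ(y, z)), circ(circ(x, y), z)))  # typically False: not associative
print(np.allclose(arw(x) @ y, circ(x, y)))                    # L(x) = Arw(x) when B = I
```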
Definition 8 (Jordan Algebra (not necessarily Euclidean))  Let (J, ◦) be an algebra. (J, ◦) is a Jordan algebra if and only if

    x ◦ y = y ◦ x,                     ∀x, y ∈ J,    (3)
    x ◦ (x^2 ◦ y) = x^2 ◦ (x ◦ y),     ∀x, y ∈ J,    (4)

where x^2 = x ◦ x.
Thus, a Jordan algebra is a commutative algebra which has a property similar to, but weaker than, associativity.
Remark 3  Let x ◦ y = L(x)y. Then property (4) implies that L(x)L(x^2)y = L(x^2)L(x)y for all y. In other words, (4) is equivalent to

    L(x)L(x^2) = L(x^2)L(x),    (5)

that is, L(x) and L(x^2) commute.
Remark 4  (M_n, ◦) is a Jordan algebra.

Proof: Note that (3) is trivial by commutativity of matrix addition. To see (4), note that, for X, Y ∈ M_n,

    X ◦ (X^{◦2} ◦ Y) = X ◦ ( (X^2 Y + Y X^2)/2 )
                     = ( X(X^2 Y + Y X^2) + (X^2 Y + Y X^2)X ) / 4
                     = ( X^3 Y + X Y X^2 + X^2 Y X + Y X^3 ) / 4
                     = ( X^2 (XY + YX) + (XY + YX) X^2 ) / 4
                     = X^{◦2} ◦ (X ◦ Y).
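A quick numerical check of (3) and (4) for (M_n, ◦) with random matrices (numpy; illustration only):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5
X, Y = rng.standard_normal((n, n)), rng.standard_normal((n, n))
jord = lambda P, Q: (P @ Q + Q @ P) / 2   # the ◦ product

X2 = jord(X, X)                            # X^{◦2}, equal to the ordinary square
print(np.allclose(X2, X @ X))

# (3) commutativity and (4) the Jordan identity:
print(np.allclose(jord(X, Y), jord(Y, X)))
print(np.allclose(jord(X, jord(X2, Y)), jord(X2, jord(X, Y))))
```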
Remark 5  If (V, ∗) is an associative algebra, then it induces a Jordan algebra (V, ◦) where x ◦ y = (x ∗ y + y ∗ x)/2. The proof given to show that (M_n, ◦) is a Jordan algebra is also valid for (V, ◦).
Lemma 3 Jordan algebras are power associative.
When a Jordan algebra is induced by an associative algebra, it is easy to show that it is power associative; indeed, the proof given above that (M_n, ◦) is power associative applies verbatim to such algebras. It turns out that there are Jordan algebras that are not induced by any associative algebra. Such algebras are called exceptional Jordan algebras. The proof that an arbitrary Jordan algebra is power associative is a bit more complicated and is omitted.