Semidefinite and Second Order Cone Programming Seminar
Fall 2001, Lecture 9
Instructor: Farid Alizadeh        Scribe: David Phillips        Date: 11/12/2001

1  Overview

We complete last lecture's discussion of Newton's method for LP, SDP, and SOCP. Recall that these problems shared the following structure when Newton's method was applied to primal feasibility, dual feasibility, and a relaxed form of the complementary slackness conditions arising from the first-order conditions for the logarithmic barrier function:

    A Δx           = r_p    (primal feasibility)
    A^T Δy + Δz    = r_d    (dual feasibility)
    E Δx + F Δz    = r_c    (complementary slackness)

We complete the derivation of the matrices E and F for LP, SDP, and SOCP in order to motivate the unifying framework of Jordan algebras. Finally, we introduce Jordan algebras.

2  Newton's Method for LP, SDP, and SOCP

2.1  Linear Programming

We include the results for LP here for review; the details of the derivation are in the previous lecture. The feasibility conditions are

    A x        = b    (primal feasibility)
    A^T y + z  = c    (dual feasibility)

together with any one of the following mathematically equivalent forms of the relaxed complementary slackness conditions:

    (1)  x_i − µ z_i^{-1} = 0,    i = 1, ..., n
    (2)  z_i − µ x_i^{-1} = 0,    i = 1, ..., n
    (3)  x_i z_i = µ,             i = 1, ..., n

In LP the matrices E and F have the forms:

    (1)  E = I,               F = µ Diag(z^{-2})
    (2)  E = µ Diag(x^{-2}),  F = I
    (3)  E = Diag(z),         F = Diag(x)

where, for a vector v ∈ R^n, v^{-2} denotes the vector (1/v_1², ..., 1/v_n²)^T.

2.2  Semidefinite Programming

Letting

    A = [vec^T(A_1); ... ; vec^T(A_m)],

the relaxed system for SDP becomes:

    A vec(ΔX)         = r_p    (primal feasibility)
    A^T Δy + vec(ΔZ)  = r_d    (dual feasibility)
    XZ = µI                     (complementary slackness)

The relaxed complementary slackness condition has many mathematically equivalent forms, three of which are:

    (1)  X − µZ^{-1} = 0
    (2)  Z − µX^{-1} = 0
    (3)  XZ + ZX = 2µI    (recall Lemma 1 of Lecture 8)

The situation here is quite analogous to LP. Consider replacing X with X + ΔX and Z with Z + ΔZ in (2):

    Z + ΔZ − µ(X + ΔX)^{-1} = 0.
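As a concrete sketch of the LP system above, the following Python/NumPy snippet assembles and solves one Newton system using form (3), where E = Diag(z) and F = Diag(x). The function name `lp_newton_step`, the dense block assembly, and the toy data below are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def lp_newton_step(A, x, y, z, b, c, mu):
    """One Newton step for the relaxed LP optimality conditions,
    using form (3): E = Diag(z), F = Diag(x).  Illustrative sketch;
    assumes x, z > 0 and A has full row rank, so the system is nonsingular."""
    m, n = A.shape
    rp = b - A @ x                   # primal residual
    rd = c - A.T @ y - z             # dual residual
    rc = mu * np.ones(n) - x * z     # relaxed complementarity residual
    E, F = np.diag(z), np.diag(x)
    # Assemble the block system
    #   [ A   0    0 ] [dx]   [rp]
    #   [ 0   A^T  I ] [dy] = [rd]
    #   [ E   0    F ] [dz]   [rc]
    K = np.zeros((m + 2 * n, m + 2 * n))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = E
    K[m + n:, n + m:] = F
    d = np.linalg.solve(K, np.concatenate([rp, rd, rc]))
    return d[:n], d[n:n + m], d[n + m:]   # dx, dy, dz

# Toy usage on a hypothetical one-constraint LP
A = np.array([[1.0, 1.0]])
b, c = np.array([1.0]), np.array([1.0, 2.0])
x, y = np.array([0.5, 0.5]), np.array([0.0])
z = c - A.T @ y
dx, dy, dz = lp_newton_step(A, x, y, z, b, c, mu=0.1)
```

In practice one would eliminate dz and dx to reach the much smaller normal-equations system in dy; the dense assembly here is only meant to mirror the block structure on the page.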
Since X is in the interior of the feasible region, it is positive definite, so X^{-1} exists and the equation can be rewritten as

    Z + ΔZ − µ [X(I + X^{-1}ΔX)]^{-1} = 0.

But X(I + X^{-1}ΔX) is not necessarily symmetric, so instead we use the following:

    Z + ΔZ − µ [X^{1/2}(I + X^{-1/2} ΔX X^{-1/2}) X^{1/2}]^{-1} = 0
    Z + ΔZ − µ X^{-1/2}(I + X^{-1/2} ΔX X^{-1/2})^{-1} X^{-1/2} = 0

Now recall that for z ∈ R with |z| < 1, the following identity holds:

    1/(1 + z) = 1 − z + z² − z³ + ...

The matrix analogue is: for a square matrix Z whose eigenvalues all have absolute value less than 1,

    (I + Z)^{-1} = I − Z + Z² − Z³ + ...

Using this identity, we obtain

    Z + ΔZ − µ X^{-1/2}(I − X^{-1/2} ΔX X^{-1/2} + (X^{-1/2} ΔX X^{-1/2})² − ...) X^{-1/2} = 0.

Applying Newton's method means dropping all terms nonlinear in the Δ quantities:

    Z + ΔZ − µ X^{-1/2}(I − X^{-1/2} ΔX X^{-1/2}) X^{-1/2} = 0
    Z + ΔZ − µ X^{-1} + µ X^{-1} ΔX X^{-1} = 0

It is clear that F = I; it will, however, be easier to represent E as µ E(X), where

    E(X) : ΔX ↦ X^{-1} ΔX X^{-1}.

Since E(X) is a linear transformation, it can be represented as a matrix depending on X, acting on vec(ΔX) as E(X) vec(ΔX). Before continuing, it will simplify the notation to introduce Kronecker products.

2.2.1  Kronecker Products

Let A and B be arbitrary matrices of dimensions m × n and p × q respectively. The Kronecker product ⊗ is the binary operator with A ⊗ B = C, where C is the mp × nq block matrix

    C = [ a_11 B   a_12 B   ...   a_1n B
            ...                     ...
          a_m1 B            ...   a_mn B ]

That is, the ij-th block of C is the p × q matrix B scaled by the ij-th entry a_ij of A, for i = 1, ..., m and j = 1, ..., n. A number of properties make it easy to manipulate expressions involving Kronecker products algebraically.
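Before using Kronecker products in the derivation, it may help to confirm numerically that they represent maps like E(X). The sketch below (an illustration in NumPy; the helper `vec` uses column-major flattening to match the column-stacking definition of vec) checks that vec(X^{-1} ΔX X^{-1}) = (X^{-1} ⊗ X^{-1}) vec(ΔX) for a random symmetric positive definite X.

```python
import numpy as np

def vec(M):
    # column-stacking vec, matching the convention vec(ABC) = (C^T ⊗ A) vec(B)
    return M.flatten(order="F")

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 3))
X = S @ S.T + 3 * np.eye(3)          # symmetric positive definite X
dX = rng.standard_normal((3, 3))
dX = (dX + dX.T) / 2                 # symmetric direction ΔX
Xi = np.linalg.inv(X)

lhs = vec(Xi @ dX @ Xi)              # the map E(X): ΔX -> X^{-1} ΔX X^{-1}
rhs = np.kron(Xi, Xi) @ vec(dX)      # its matrix representation X^{-1} ⊗ X^{-1}
assert np.allclose(lhs, rhs)
```

Note that X^{-1} ⊗ X^{-1} equals (X^{-1})^T ⊗ X^{-1} here because X is symmetric; that is the form the vec identity of the next subsection produces.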
The fundamental property is stated in the following lemma.

Lemma 1  Let A, B, C, and D be matrices with dimensions such that the products AC and BD are well-defined. Then

    (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).    (1)

Proof: To prove (1) we recall the property of block matrix multiplication: if X = (X_ij) and Y = (Y_ij) are two matrices partitioned into blocks, so that the X_ij and Y_ij are themselves matrices, then Z = XY can be written as a block-partitioned matrix Z = (Z_ij) with Z_ij = Σ_k X_ik Y_kj. The only requirement is that the partitionings of X and Y be such that all products X_ik Y_kj, for all values of i, j, and k, are well-defined. Now consider the ij block of (AC) ⊗ (BD). By definition it equals (AC)_ij (BD) = Σ_k a_ik c_kj (BD). On the other hand, the ij block of (A ⊗ B)(C ⊗ D) equals

    ((A ⊗ B)(C ⊗ D))_ij = Σ_k (A ⊗ B)_ik (C ⊗ D)_kj = Σ_k (a_ik B)(c_kj D) = Σ_k a_ik c_kj BD,

which is equal to the ij block of (AC) ⊗ (BD).

The Kronecker product has a number of properties, all of which can be derived easily from (1). For example,

    (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.    (2)

This is proved by observing that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = (AA^{-1}) ⊗ (BB^{-1}) = I ⊗ I, and it is obvious from the definition that I ⊗ I = I.

Eigenvalues and eigenvectors of Kronecker products can also be determined easily. Let Ax = λx and By = ωy. Kronecker-multiplying the two sides of these equations, we get (Ax) ⊗ (By) = (λx) ⊗ (ωy). Using (1), we get

    (A ⊗ B)(x ⊗ y) = (λω)(x ⊗ y).

Since for vectors x and y, x ⊗ y is also a vector, we have proved:

Lemma 2  If λ is an eigenvalue of A with corresponding eigenvector x, and ω is an eigenvalue of B with corresponding eigenvector y, then λω is an eigenvalue of A ⊗ B with corresponding eigenvector x ⊗ y.

Other properties of Kronecker products are:

    (A ⊗ B)^T = A^T ⊗ B^T
    vec(ABC) = (C^T ⊗ A) vec(B)

Exercise: Prove the above properties.
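These properties are easy to spot-check numerically. A minimal sketch (NumPy; the helper `vec` uses column-major flattening to match the column-stacking definition of vec):

```python
import numpy as np

rng = np.random.default_rng(1)
A, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
B, D = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

# fundamental property (1): (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# consequences: inverse and transpose distribute over ⊗
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# eigenpairs multiply: (A ⊗ B)(x ⊗ y) = (λω)(x ⊗ y)
lam, V = np.linalg.eig(A)
om, W = np.linalg.eig(B)
xy = np.kron(V[:, 0], W[:, 0])
assert np.allclose(np.kron(A, B) @ xy, lam[0] * om[0] * xy)

# vec(ABC) = (C^T ⊗ A) vec(B), with column-stacking vec
def vec(M):
    return M.flatten(order="F")

M = rng.standard_normal((3, 3))
assert np.allclose(vec(A @ M @ C), np.kron(C.T, A) @ vec(M))
```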
2.2.2  Back to SDP

Returning to the SDP relaxation, we obtain:

    (1)  E = I,                              F = µ E(Z) = µ (Z^{-1} ⊗ Z^{-1})
    (2)  E = µ E(X) = µ (X^{-1} ⊗ X^{-1}),  F = I

and for (3):

    (X + ΔX)(Z + ΔZ) + (Z + ΔZ)(X + ΔX) = 2µI.

As usual, dropping terms nonlinear in the Δ quantities, we get

    X ΔZ + ΔZ X + ΔX Z + Z ΔX = 2µI − XZ − ZX,

which in Kronecker product notation can be written as

    (I ⊗ X + X ⊗ I) vec(ΔZ) + (I ⊗ Z + Z ⊗ I) vec(ΔX) = r_c.

Thus, E = I ⊗ Z + Z ⊗ I and F = I ⊗ X + X ⊗ I.

2.3  Second-Order Cone Programming

For simplicity of notation, we consider the single-block problem. The relaxed system is

    A Δx           = r_p    (primal feasibility)
    A^T Δy + Δz    = r_d    (dual feasibility)

and the relaxed complementarity relations (as developed in our last lecture) were

    (1)  x = ρ Rz,  i.e.  (x_0, x_1, ..., x_n)^T = ρ (z_0, −z_1, ..., −z_n)^T,

or, in a mathematically equivalent form,

    (2)  z = γ Rx,  i.e.  (z_0, z_1, ..., z_n)^T = γ (x_0, −x_1, ..., −x_n)^T,

where

    ρ = 2µ / (z_0² − ||z̄||²),    γ = 2µ / (x_0² − ||x̄||²),    R = Diag(1, −1, ..., −1).

Since x = ρ Rz, multiplying by z^T we get

    z^T x = ρ z^T R z = ρ (z_0² − Σ_{i=1}^n z_i²) = 2µ.

By the comments at the end of Section 2.3 of the last lecture, these conditions also imply that

    x_0 z_i + z_0 x_i = 0,    i = 1, ..., n.

Thus, we get

    (3)  x^T z = 2µ  and  x_0 z_i + x_i z_0 = 0  for i = 1, ..., n,

or more succinctly,

    (3)  Arw(x) z = Arw(x) Arw(z) e = 2µ e,

where e = (1, 0, ..., 0)^T and, as we have already defined,

    Arw(x) = [ x_0   x_1   x_2   ...   x_n
               x_1   x_0    0    ...    0
               x_2    0    x_0   ...    0
               ...                     ...
               x_n    0    ...         x_0 ]

Now, for (3), replacing x and z with x + Δx and z + Δz and dropping terms nonlinear in the Δ quantities gives

    (i)   x_0 Δz_i + z_i Δx_0 + x_i Δz_0 + z_0 Δx_i = −(x_0 z_i + z_0 x_i),    i = 1, ..., n
    (ii)  x_0 Δz_0 + z_0 Δx_0 + x_1 Δz_1 + z_1 Δx_1 + ... + x_n Δz_n + z_n Δx_n = 2µ − x^T z.

But (i) and (ii) together are equivalent to

    Arw(z) Δx + Arw(x) Δz = r_c.

So, for (3), E = Arw(z) and F = Arw(x).
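The arrow matrix and the identity Arw(x) z = Arw(x) Arw(z) e can be checked directly. A small sketch (the helper name `arw` and the sample vectors are assumptions for illustration):

```python
import numpy as np

def arw(x):
    """Arrow matrix Arw(x) = [[x0, xbar^T], [xbar, x0 * I]]."""
    n = len(x) - 1
    M = x[0] * np.eye(n + 1)
    M[0, 1:] = x[1:]
    M[1:, 0] = x[1:]
    return M

x = np.array([2.0, 0.5, -0.3])
z = np.array([1.5, 0.2, 0.4])
e = np.array([1.0, 0.0, 0.0])

# Arw(x) z has first entry x^T z and i-th entry x0*z_i + z0*x_i
v = arw(x) @ z
assert np.isclose(v[0], x @ z)
assert np.allclose(v[1:], x[0] * z[1:] + z[0] * x[1:])

# and Arw(x) Arw(z) e = Arw(x) z, since Arw(z) e = z
assert np.allclose(arw(x) @ (arw(z) @ e), v)
```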
Exercise: Show that for case (1), E = I and F = µ Arw(z)^{-2}, and for case (2), E = µ Arw(x)^{-2} and F = I.

2.4  Summary

The following two tables collect the different forms of the relaxed complementarity conditions and the corresponding E and F matrices.

         LP                     SDP                SOCP
    (1)  x_i − µ z_i^{-1} = 0   X − µZ^{-1} = 0    x − (2µ / (z_0² − ||z̄||²)) Rz = 0
    (2)  z_i − µ x_i^{-1} = 0   Z − µX^{-1} = 0    z − (2µ / (x_0² − ||x̄||²)) Rx = 0
    (3)  x_i z_i = µ            XZ + ZX = 2µI      Arw(x) Arw(z) e = 2µ e

The matrix forms for E and F are:

                   E                       F
    LP    (1)  I                       µ Diag(z^{-2})
          (2)  µ Diag(x^{-2})          I
          (3)  Diag(z)                 Diag(x)
    SDP   (1)  I                       µ (Z^{-1} ⊗ Z^{-1})
          (2)  µ (X^{-1} ⊗ X^{-1})     I
          (3)  I ⊗ Z + Z ⊗ I           I ⊗ X + X ⊗ I
    SOCP  (1)  I                       µ Arw(z)^{-2}
          (2)  µ Arw(x)^{-2}           I
          (3)  Arw(z)                  Arw(x)

3  Euclidean Jordan Algebras

To unify the presentation of interior point algorithms for LP, SDP, and SOCP, it is convenient to introduce an algebraic structure that provides us with tools for analyzing these three cases (and several more). This structure is called a Euclidean Jordan algebra. We first introduce Jordan algebras. (Here "Jordan" refers to Pascual Jordan, the 20th-century German physicist who, along with Max Born and Werner Heisenberg, was responsible for the so-called matrix interpretation of quantum mechanics. It does not refer to the other famous Jordan, Camille Jordan, the 19th-century French mathematician of Jordan block and Jordan curve theorem fame.)

First we have to make a comment on terminology. The word "algebra" is used in two different contexts. In the first, it names a branch of mathematics: algebraic manipulation at the elementary level, or the study of groups, rings, fields, vector spaces, etc., at a higher level. In the second, it is a technical term denoting a particular algebraic structure.
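As one sanity check on the table, the sketch below verifies the SDP row for form (3) numerically: the first-order change of XZ + ZX along a direction (ΔX, ΔZ) matches E vec(ΔX) + F vec(ΔZ) with E = I ⊗ Z + Z ⊗ I and F = I ⊗ X + X ⊗ I. Column-stacking vec and symmetric positive definite X, Z are assumed, matching the derivation.

```python
import numpy as np

def vec(M):
    return M.flatten(order="F")      # column-stacking vec

rng = np.random.default_rng(6)
n = 3
S = rng.standard_normal((n, n)); X = S @ S.T + n * np.eye(n)
S = rng.standard_normal((n, n)); Z = S @ S.T + n * np.eye(n)
dX = rng.standard_normal((n, n)); dX = (dX + dX.T) / 2
dZ = rng.standard_normal((n, n)); dZ = (dZ + dZ.T) / 2

I = np.eye(n)
E = np.kron(I, Z) + np.kron(Z, I)    # table row SDP (3)
F = np.kron(I, X) + np.kron(X, I)

# first-order change of XZ + ZX along (dX, dZ)
lin = dX @ Z + Z @ dX + X @ dZ + dZ @ X
assert np.allclose(E @ vec(dX) + F @ vec(dZ), vec(lin))
```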
Here are the definitions.

3.1  Definitions

For the purposes of these definitions, let V be a vector space over the field of real numbers, that is, V = R^n for some integer n. Although most of what we will say is also valid over an arbitrary field F (with the usual scalar multiplication and addition), this level of generality is not needed in our course and therefore will not be treated.

Definition 1 (Algebra)  An algebra (V, ∗) is a vector space V with a binary operation ∗ such that for all x, y, z ∈ V and α, β ∈ R:

    x ∗ y ∈ V                                   (closure)
    x ∗ (αy + βz) = α(x ∗ y) + β(x ∗ z)
    (αy + βz) ∗ x = α(y ∗ x) + β(z ∗ x)         (distributive laws)

Notice that the distributive laws imply that x ∗ y is a bilinear function of x and y. In other words, there are n × n matrices Q_1, Q_2, ..., Q_n such that

    x ∗ y = (x^T Q_1 y, x^T Q_2 y, ..., x^T Q_n y)^T    (bilinearity)

Thus, it is immediate that determining the matrices Q_1, ..., Q_n determines the multiplication, which in turn determines the algebra. Further, for a given x ∈ V we have x ∗ y = L(x) y for all y ∈ V, where L(x) is a matrix depending linearly on x. Thus L(·) also determines the algebra.

Definition 2 (Identity Element)  For a given algebra (V, ∗), if there exists an element e ∈ V such that for all x ∈ V

    e ∗ x = x ∗ e = x,

then e is the identity element of (V, ∗).

Exercise: Prove that if (V, ∗) has an identity element e, then e is unique.

Definition 3 (Associative Algebra)  An algebra (V, ∗) is an associative algebra if for all x, y, z ∈ V,

    (x ∗ y) ∗ z = x ∗ (y ∗ z).

Example 1 (Matrices under matrix multiplication)  Let M_n be the set of all square matrices of dimension n. Then ordinary matrix multiplication (denoted ·) makes (M_n, ·) an associative (but not commutative) algebra. Note that, for X ∈ M_n, L(X) = I ⊗ X.

Example 2 (A commutative, but not associative, algebra)  Consider (M_n, ◦),
where for A, B ∈ M_n, A ◦ B = (AB + BA)/2. This is a commutative but not associative algebra. For X ∈ M_n, L(X) = (I ⊗ X + X ⊗ I)/2.

Consider now an associative algebra (V, ∗) with matrices L(·) determining ∗. We have that

    (x ∗ y) ∗ z = L(x ∗ y) z  and  x ∗ (y ∗ z) = L(x)(L(y) z).

Since these are equal for all z, it follows that for all x, y ∈ V,

    L(x ∗ y) = L(x) L(y).

Thus, (V, ∗) is isomorphic to some subalgebra of matrices under matrix multiplication. Let us define these terms.

Definition 4  Let (V, ∗) and (U, ⋆) be two algebras. Let f : V → U be a function such that f(V) is a subspace of U and f(u ∗ v) = f(u) ⋆ f(v). Then f is a homomorphism from (V, ∗) to (U, ⋆). If f is one-to-one and onto, then f is called an isomorphism; in that case, the algebras (V, ∗) and (U, ⋆) are essentially the same thing.

Definition 5 (Subalgebra)  Given an algebra (V, ∗) and a subspace U of V, (U, ∗) is a subalgebra if and only if U is closed under ∗. More generally, we say that an algebra U is a subalgebra of V if it is isomorphic to a subalgebra of V.

Thus we have shown that all associative algebras are essentially homomorphic to some algebra of square matrices. Let us see some examples of subalgebras and isomorphisms within the algebra of matrices.

Example 3 (Some subalgebras of (M_n, ·))

1. All n × n diagonal matrices form a subalgebra of (M_n, ·).

2. All n × n upper triangular matrices form a subalgebra of (M_n, ·), as do all block upper triangular matrices. All n × n lower triangular matrices also form a subalgebra of M_n, as do block lower triangular matrices. Also notice that under the function f(X) = X^T, the algebra of upper triangular matrices is isomorphic to the algebra of lower triangular matrices.

3. All matrices of the form

       [ A  0
         0  0 ]

   where A ∈ M_k for k < n form a subalgebra of M_n. Also notice that this subalgebra is isomorphic to (M_k, ·), so it is accurate to say that M_k is a subalgebra of M_n for all k ≤ n.

4.
Someone in class mentioned the set of orthogonal matrices, that is, matrices Q such that QQ^T = I. This is not a subalgebra: although it is indeed closed under matrix multiplication, the set of orthogonal matrices is not a subspace of M_n. For two orthogonal matrices Q_1 and Q_2, in general Q_1 + Q_2 is not orthogonal, nor is αQ_i for real numbers α ≠ ±1.

Definition 6  The subalgebra generated by x, written V(x), is the (intersection-wise) smallest subalgebra of V that contains x. More generally, if S ⊆ V is any subset, then V(S) is the smallest subalgebra of V that contains S.

V(x) can be thought of as being created as follows: start with V(x) containing only x. Then add αx for all α ∈ R. Then ∗-multiply everything we already have with everything else and add the products to V(x). Then form all possible linear combinations and add the results to V(x). Continue this process of multiplying and forming linear combinations until no new element is created. What remains is the smallest subalgebra of V that contains x.

Definition 7 (Power Associative Algebra)  If, for every x ∈ V, the subalgebra V(x) is associative, then (V, ∗) is called a power associative algebra.

To show that an algebra is power associative, it suffices to show that in the product x ∗ x ∗ ··· ∗ x the order in which the multiplications are carried out does not matter. In that case it is easily seen that

    V(x) = { v ∈ V : v = Σ_{i=0}^{k} α_i x^i,  α_i ∈ R,  k an integer or k = ∞ },

that is, the elements of V(x) are polynomials or infinite power series in the element x. In the case of a power associative algebra V, for each element x ∈ V we may also define

    V[x] = { v ∈ V : v = Σ_{i=0}^{k} α_i x^i,  α_i ∈ R,  k an integer },

that is, V[x] is the subalgebra of V(x) made up of polynomials in x.

Remark 1  Later we shall see that V[x] and V(x) are actually equal.

Remark 2  (M_n, ◦) is power associative.

Proof: For p a nonnegative integer and X ∈ M_n, let X^{◦p} = X ◦ X^{◦(p−1)}, with X^{◦0} = I.
We first show by induction that X^{◦p} = X^p (ordinary matrix multiplication, which is associative). For the base case, note that X^{◦0} = X^0 = I. Now assume X^{◦k} = X^k. Then

    X^{◦(k+1)} = X ◦ X^{◦k} = X ◦ X^k = (X X^k + X^k X)/2 = X^{k+1}.

Now, for arbitrary integers q, r, s and real numbers α_i, β_j, γ_k (i = 0, ..., q; j = 0, ..., r; k = 0, ..., s), let

    U = Σ_{i=0}^{q} α_i X^i,    V = Σ_{j=0}^{r} β_j X^j,    W = Σ_{k=0}^{s} γ_k X^k.

Then U ◦ V = U · V, and thus U ◦ (V ◦ W) = (U ◦ V) ◦ W.

Example 4 (The algebra associated with SOCP)  Let V = R^{n+1} and let ◦ : V × V → V be defined, for x, y ∈ V, by

    x ◦ y = (x^T B y,  x_1 y_0 + y_1 x_0,  x_2 y_0 + y_2 x_0,  ...,  x_n y_0 + y_n x_0)^T,

where B is a nonsingular symmetric matrix. Note that:

1. The right-hand side is bilinear, and thus (R^{n+1}, ◦) is indeed an algebra satisfying the distributive laws.

2. The multiplication ◦ is in general non-associative, but it is commutative, since B is symmetric.

3. That ◦ is power associative will be shown in the next lecture.

4. When B = I, the complementary slackness theorem for SOCP can be expressed succinctly as x ◦ z = 0.

5. For B = I, the identity element is e = (1, 0, ..., 0)^T. More generally, if B is such that Be = e, then e is the identity element.

Definition 8 (Jordan Algebra (not necessarily Euclidean))  Let (J, ◦) be an algebra. (J, ◦) is a Jordan algebra if and only if, for all x, y ∈ J,

    x ◦ y = y ◦ x                        (3)
    x ◦ (x² ◦ y) = x² ◦ (x ◦ y)          (4)

where x² = x ◦ x. Thus, a Jordan algebra is a commutative algebra which has a property similar to, but weaker than, associativity.

Remark 3  Let x ◦ y = L(x) y. Then property (4) implies that L(x) L(x²) y = L(x²) L(x) y for all y. In other words, (4) is equivalent to

    L(x) L(x²) = L(x²) L(x),    (5)

that is, L(x) and L(x²) commute.

Remark 4  (M_n, ◦) is a Jordan algebra.

Proof: Note that (3) is trivial by commutativity of matrix addition.
To see (4), note that for X, Y ∈ M_n,

    X ◦ (X^{◦2} ◦ Y) = X ◦ ((X²Y + YX²)/2)
                     = (X(X²Y + YX²) + (X²Y + YX²)X)/4
                     = (X³Y + XYX² + X²YX + YX³)/4,

while

    X^{◦2} ◦ (X ◦ Y) = X² ◦ ((XY + YX)/2)
                     = (X²(XY + YX) + (XY + YX)X²)/4
                     = (X³Y + X²YX + XYX² + YX³)/4,

and the two are equal.

Remark 5  If (V, ∗) is an associative algebra, then it induces a Jordan algebra (V, ◦) with x ◦ y = (x ∗ y + y ∗ x)/2. The proof given above that (M_n, ◦) is a Jordan algebra is valid verbatim for (V, ◦).

Lemma 3  Jordan algebras are power associative.

When a Jordan algebra is induced by an associative algebra, it is easy to show that it is power associative; indeed, the proof given above that (M_n, ◦) is power associative applies verbatim to such algebras. It turns out, however, that there are Jordan algebras that are not induced by any associative algebra; such algebras are called exceptional Jordan algebras. The proof that an arbitrary Jordan algebra is power associative is a bit complicated and is omitted.
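The Jordan-algebra facts above are easy to test numerically. The sketch below checks, for the matrix product A ◦ B = (AB + BA)/2 of Example 2, commutativity, failure of associativity, the Jordan identity (4), and X^{◦p} = X^p; it then checks the identity element, commutativity, and the Jordan identity for the SOCP product of Example 4 with B = I. The helper names `jord` and `circ` are illustrative choices.

```python
import numpy as np

def jord(A, B):
    """The Jordan product of Example 2 on square matrices."""
    return (A @ B + B @ A) / 2

def circ(x, y):
    """The SOCP product of Example 4, with B = I."""
    out = np.empty_like(x)
    out[0] = x @ y                       # x^T B y with B = I
    out[1:] = x[0] * y[1:] + y[0] * x[1:]
    return out

rng = np.random.default_rng(0)

# (M_n, ◦): commutative, not associative, Jordan identity, X^{◦p} = X^p
X, Y, W = (rng.standard_normal((3, 3)) for _ in range(3))
assert np.allclose(jord(X, Y), jord(Y, X))                         # (3)
assert not np.allclose(jord(jord(X, Y), W), jord(X, jord(Y, W)))   # not assoc.
X2 = jord(X, X)
assert np.allclose(jord(X, jord(X2, Y)), jord(X2, jord(X, Y)))     # (4)
P = np.eye(3)
for _ in range(4):                       # X^{◦4}, one Jordan product at a time
    P = jord(X, P)
assert np.allclose(P, np.linalg.matrix_power(X, 4))

# (R^{n+1}, ◦): identity element e, commutative, Jordan identity
x, y = rng.standard_normal(4), rng.standard_normal(4)
e = np.zeros(4); e[0] = 1.0
assert np.allclose(circ(x, e), x) and np.allclose(circ(e, x), x)
assert np.allclose(circ(x, y), circ(y, x))
x2 = circ(x, x)
assert np.allclose(circ(x, circ(x2, y)), circ(x2, circ(x, y)))
```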