Semidefinite and Second Order Cone Programming Seminar, Fall 2001
Lecture 12 — 12/03/2001
Instructor: Farid Alizadeh        Scribe: Anton Riabov

1 Overview

In this lecture we continue to study the cone of squares: we define the inner product with respect to a Euclidean Jordan algebra and prove convexity, self-duality, homogeneity, and symmetry of the cone of squares. We also define direct sums and simple algebras. We state without proof a theorem that there are only five classes of simple Euclidean Jordan algebras, and describe these classes. Finally, we briefly outline the application of Jordan algebra theory to solving optimization problems over symmetric cones using interior point methods.

2 Cone of Squares

Suppose (E, ∘) is a Euclidean Jordan algebra. In the previous lecture we gave the following definition.

Definition 1. K_E := {x² : x ∈ E} is the cone of squares of the algebra (E, ∘).

K_E is indeed a cone, since for every α ≥ 0 we have αx² = (√α x)² ∈ K_E. In this section we will show that cones of squares of Euclidean Jordan algebras are convex, self-dual, and homogeneous.

2.1 Inner Product

Definition 2. For any x, y ∈ E the inner product ⟨x, y⟩ with respect to the Euclidean Jordan algebra (E, ∘) is defined as ⟨x, y⟩ := tr(x ∘ y).

Note that this inner product is bilinear and ⟨x, y⟩ = ⟨y, x⟩, so it conforms to the definition of an inner product.

Fact 1. The inner product is associative: ⟨x ∘ y, z⟩ = ⟨x, y ∘ z⟩.

The proof of this statement is not straightforward; in fact, we do not yet have the machinery required to prove it, and will accept it as a fact. The following definition may be needed in future discussions.

Definition 3. τ(x, y) := Trace(L(x ∘ y)).

Note that associativity holds here as well: τ(x ∘ y, z) = τ(x, y ∘ z).

Lemma 1. L(x) is a symmetric matrix with respect to ⟨·, ·⟩.

Proof: We need to show that ⟨L(x)y, z⟩ = ⟨y, L(x)z⟩, i.e. that ⟨x ∘ y, z⟩ = ⟨y, x ∘ z⟩, which follows from the commutativity of ∘ and the associativity of ⟨·, ·⟩.
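The notes contain no code; as a quick numerical sanity check (our sketch, using NumPy), the following verifies the symmetry of ⟨x, y⟩ = Trace(x ∘ y) and the associativity claimed in Fact 1 for the algebra of symmetric matrices, where x ∘ y = (xy + yx)/2:

```python
import numpy as np

rng = np.random.default_rng(0)

def sym(n):
    """Random symmetric n x n matrix."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2

def jordan(x, y):
    """Jordan product on symmetric matrices: x o y = (xy + yx)/2."""
    return (x @ y + y @ x) / 2

def inner(x, y):
    """Inner product <x, y> = Trace(x o y) = Trace(xy)."""
    return np.trace(jordan(x, y))

x, y, z = sym(4), sym(4), sym(4)

sym_ok = np.isclose(inner(x, y), inner(y, x))      # <x, y> = <y, x>
assoc_ok = np.isclose(inner(jordan(x, y), z),      # <x o y, z> = <x, y o z>
                      inner(x, jordan(y, z)))
```

Both checks reduce to the cyclic property of the trace, which is why associativity is easy in this special case even though the general proof is not.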
Example 1 (SOCP algebra (R^{n+1}, ∘)). Here ⟨x, y⟩ = tr(x ∘ y) = 2xᵀy, so ⟨·, ·⟩ agrees with the usual inner product up to the factor 2, and L is a symmetric matrix in the usual sense.

Example 2 (Symmetric matrices (Sⁿ, ∘)). Here ⟨X, Y⟩ = Trace(X ∘ Y) = Trace(XY) = X • Y.

2.2 Convexity and Self-Duality of K_E

To prove that K_E is convex we first note:

Proposition 1. If K is a cone, then K* is a convex cone.

Proof: By definition, K* = {y : ⟨x, y⟩ ≥ 0 for all x ∈ K}. For any y₁, y₂ ∈ K* we have ⟨y₁, x⟩ ≥ 0 and ⟨y₂, x⟩ ≥ 0 for all x ∈ K. Adding these two inequalities, we obtain ⟨y₁ + y₂, x⟩ ≥ 0 for all x ∈ K, so y₁ + y₂ ∈ K*; and clearly αy₁ ∈ K* for every α ≥ 0.

Corollary 1. Every self-dual cone is convex.

Now we only need to prove the self-duality of K_E, and convexity will follow.

Lemma 2. K*_E = {y : L(y) ⪰ 0}.

Proof:

y ∈ K*_E  ⇔  ⟨y, x²⟩ ≥ 0 for all x
          ⇔  ⟨y, x ∘ x⟩ ≥ 0 for all x
          ⇔  ⟨y ∘ x, x⟩ ≥ 0 for all x
          ⇔  ⟨L(y)x, x⟩ ≥ 0 for all x,

which means that L(y) ⪰ 0.

Lemma 3. If c is an idempotent (i.e. c² = c), then the eigenvalues of c are 0 and 1, and L(c) ⪰ 0.

Proof: Since c is an idempotent, we can write c² − c = 0. The minimal polynomial t² − t = 0 has two roots, {0, 1}. If {c₁, ..., c_r} is a Jordan frame (with c = c₁), then c = 1·c₁ + 0·c₂ + ... + 0·c_r, so c has one eigenvalue equal to one and r − 1 eigenvalues equal to zero.

In the previous lecture we derived the following identity:

L(y² ∘ z) + 2L(y)L(z)L(y) − 2L(y ∘ z)L(y) − L(y²)L(z) = 0.

Now substitute y ← c, z ← c (using c² = c and hence c³ = c):

L(c³) + 2L³(c) − 2L(c²)L(c) − L(c²)L(c) = 0  ⇒  L(c)[2L²(c) − 3L(c) + I] = 0,

i.e. L(c) satisfies t(2t² − 3t + 1) = 0, which is equivalent to t(2t − 1)(t − 1) = 0, so the eigenvalues of L(c) lie in the set {0, 1/2, 1}. In particular, all its eigenvalues are nonnegative, therefore L(c) ⪰ 0.

Corollary 2. If x = Σᵢ λᵢcᵢ and λᵢ ≥ 0 for all i, then L(x) ⪰ 0.

Proof: L(x) = Σᵢ λᵢL(cᵢ), and we know that L(cᵢ) ⪰ 0.

Fact 2. Let x = λ₁c₁ + ... + λ_r c_r. Then the eigenvalues of L(x) are (λᵢ + λⱼ)/2, and the eigenvalues of Q_x are λᵢλⱼ (for 1 ≤ i ≤ j ≤ r).

Proof: We accept this statement as a fact, since in the general case we would need to know more about Jordan algebras to prove it. We note, however, that for the matrix algebra we have seen the proof in previous lectures, and for the SOCP case the proof is also easy. We refer the reader to the recent survey paper by F. Alizadeh and D. Goldfarb for more details.

Theorem 1. K_E = K*_E.

Proof: First, we show that K_E ⊆ K*_E. Choose an element x ∈ K_E, so x = Σᵢ λᵢ²cᵢ. Then L(x) = Σᵢ λᵢ²L(cᵢ), with L(cᵢ) ⪰ 0 and λᵢ² ≥ 0 for all i. Therefore L(x) ⪰ 0, and thus x ∈ K*_E by Lemma 2.

Now we show that K*_E ⊆ K_E. Choose y ∈ K*_E and write y = Σⱼ λⱼcⱼ. Hence

⟨y, cᵢ⟩ = Σⱼ λⱼ⟨cⱼ, cᵢ⟩ = λᵢ⟨cᵢ, cᵢ⟩ = λᵢ tr(cᵢ²).

The second equality follows from the fact that ⟨cⱼ, cᵢ⟩ = 0 if i ≠ j. Now we can obtain an expression for λᵢ:

λᵢ = ⟨y, cᵢ⟩ / tr(cᵢ²).

We know that tr(cᵢ²) > 0, so we only need to show that ⟨y, cᵢ⟩ ≥ 0; this will imply that all eigenvalues λᵢ of y are nonnegative and y ∈ K_E. Indeed,

⟨y, cᵢ⟩ = ⟨y, cᵢ ∘ cᵢ⟩ = ⟨L(y)cᵢ, cᵢ⟩ ≥ 0,

where the last inequality follows from L(y) ⪰ 0, and this completes the proof.

2.3 Homogeneous Cones

Definition 4. A cone K is homogeneous if it is proper and for all x, y ∈ Int K there exists a linear transformation T such that T(x) = y and T(K) = K.

Example 3 (P^{n×n} = K_{(Sⁿ,∘)}). Choose any X ≻ 0, Y ≻ 0. Using eigenvalue decompositions, write X = QΛQᵀ and Y = PΩPᵀ. The transformation T is then defined as the following sequence of linear transformations:

X  --(Q⁻¹ • Q⁻ᵀ)-->  Λ  --(Λ^{-1/2} • Λ^{-1/2})-->  I  --(Ω^{1/2} • Ω^{1/2})-->  Ω  --(P • Pᵀ)-->  Y,

where A • B denotes the map Z ↦ AZB, i.e. multiplication from the left and from the right. Each of these steps maps P^{n×n} onto itself, because if A is nonsingular then AZAᵀ ⪰ 0 if, and only if, Z ⪰ 0. Thus P^{n×n} is homogeneous.
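Fact 2 can be checked numerically in the SOCP algebra, where L(x) has the explicit form [[x₀, x̄ᵀ], [x̄, x₀I]] and the Jordan eigenvalues of x are λ₁,₂ = x₀ ± ‖x̄‖. The sketch below (ours, not part of the notes) compares the spectrum of L(x) with the multiset {(λᵢ + λⱼ)/2}; since r = 2, the pair i = j gives λ₁ and λ₂, and the pair i < j gives (λ₁ + λ₂)/2 = x₀ with multiplicity n − 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4                                   # x lives in R^{n+1}
x = rng.standard_normal(n + 1)
x0, xbar = x[0], x[1:]

# L(x) for the SOCP algebra: L(x)y = x o y = (x^T y; x0*ybar + y0*xbar)
L = np.zeros((n + 1, n + 1))
L[0, 0] = x0
L[0, 1:] = xbar
L[1:, 0] = xbar
L[1:, 1:] = x0 * np.eye(n)

# Jordan eigenvalues of x in this algebra
lam1 = x0 + np.linalg.norm(xbar)
lam2 = x0 - np.linalg.norm(xbar)

# Fact 2: the eigenvalues of L(x) are (lam_i + lam_j)/2 for i <= j
expected = np.sort([lam1, lam2] + [(lam1 + lam2) / 2] * (n - 1))
actual = np.sort(np.linalg.eigvalsh(L))
match = np.allclose(expected, actual)
```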
Example 4 (Second-Order Cones). The following set of three operations is sufficient to obtain the required transformation for any x and y in the interior of the second-order cone, and therefore second-order cones are homogeneous.

[Figure: x and y on the same "circle" around the x₀ axis of the cone.] If the points x and y happen to be on the same "circle", as in the first picture, a rotation (about the x₀ axis) can be applied. It is easy to see that this transformation has all the required properties.

[Figure: x and y on the same ray from the origin.] If the points x and y are on the same ray, we can apply a dilation, multiplying by (y₀/x₀)I. All the required properties are satisfied.

[Figure: a hyperbola inside the cone, with the cone boundary as asymptotes.] An operation called hyperbolic rotation can be constructed similarly to the usual rotation by replacing sin and cos with sinh and cosh. This operation can be constructed to "rotate" points along a hyperbola whose asymptotes are the boundaries of the cone.

Any point in the interior of the second-order cone can be transformed into another interior point by a combination of dilations, rotations about the x₀ axis, and hyperbolic rotations, as follows. Let x = λ₁c₁ + λ₂c₂ and y = ω₁d₁ + ω₂d₂, where c₁, c₂ is a Jordan frame and likewise d₁, d₂ is a Jordan frame. To transform x to y, we first rotate c₁ to d₁; this automatically maps c₂ to d₂, because c₁ ⊥ c₂ and d₁ ⊥ d₂. So now we have x′ = λ₁d₁ + λ₂d₂. Next, in the plane spanned by d₁ and d₂, the vector y has coordinates (ω₁, ω₂) with respect to the basis d₁, d₂, and x′ has coordinates (λ₁, λ₂). Applying the dilation √(ω₁ω₂/(λ₁λ₂)) I maps x′ to the point x″ = λ″₁d₁ + λ″₂d₂, where λ″₁ = √(λ₁ω₁ω₂/λ₂) and λ″₂ = √(λ₂ω₁ω₂/λ₁). Now both y and x″ are on the same branch of the hyperbola a₁a₂ = ω₁ω₂; thus a hyperbolic rotation maps x″ to y.

We claim that the cone of squares K_E is homogeneous. But before we can prove this, we need the following theorem.

Theorem 2. If x is invertible, then Q_x(Int K_E) = Int K_E.
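A hyperbolic rotation is easy to exhibit concretely in the plane. The sketch below (our illustration, not from the notes) shows that H(t) = [[cosh t, sinh t], [sinh t, cosh t]] preserves the quadratic form x₀² − x₁² (so it maps each hyperbola branch, and hence the cone, to itself), and that a suitable t maps one interior point of the 2-dimensional second-order cone x₀ ≥ |x₁| to another point on the same hyperbola:

```python
import numpy as np

def hyp_rot(t):
    """Hyperbolic rotation in the (x0, x1)-plane."""
    return np.array([[np.cosh(t), np.sinh(t)],
                     [np.sinh(t), np.cosh(t)]])

x = np.array([2.0, 1.0])           # interior point: x0 > |x1|
y = np.array([np.sqrt(3.0), 0.0])  # same invariant: x0^2 - x1^2 = 3

# H(t) preserves x0^2 - x1^2 for any t ...
z = hyp_rot(0.7) @ x
inv_ok = np.isclose(z[0]**2 - z[1]**2, x[0]**2 - x[1]**2)

# ... and solving H(t)x = y (here y1 = 0 forces tanh(t) = -x1/x0)
# gives the hyperbolic rotation that maps x to y
t_star = np.arctanh(-x[1] / x[0])
maps_ok = np.allclose(hyp_rot(t_star) @ x, y)
```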
Proof: First note that the set of invertible elements is disconnected. For example, in the case of second-order cones there are three regions of invertible elements, separated by the boundary of the cone, as illustrated in the figure.

[Figure: the second-order cone with the three connected regions of invertible elements labeled 1, 2, 3.]

In the algebra of symmetric matrices, each orthant of the eigenvalue space forms a region of invertible elements. Intuitively, this is explained by the fact that if the eigenvalues of two symmetric matrices have different sign patterns, then there exists a linear combination of these matrices having an eigenvalue 0.

One of these connected regions is Int K_E. If y ∈ Int K_E, then Q_x y is also invertible, since

(Q_x y)⁻¹ = Q_{x⁻¹} y⁻¹.

Therefore Q_x(Int K_E) cannot cross any boundary, and either (a) it is contained in Int K_E entirely, i.e. Q_x(Int K_E) ⊆ Int K_E, or (b) it has no points in common with it, i.e. Q_x(Int K_E) ∩ Int K_E = ∅. We know that Q_x e = x², which lies in Int K_E because x is invertible. Thus (a) holds, and Q_x(Int K_E) ⊆ Int K_E for all invertible x.

For the reverse inclusion, recall that (Q_x)⁻¹ = Q_{x⁻¹}. Given y ∈ Int K_E, applying (a) to the invertible element x⁻¹ gives Q_{x⁻¹} y ∈ Int K_E, and y = Q_x(Q_{x⁻¹} y). Hence Int K_E ⊆ Q_x(Int K_E).

Corollary 3. K_E is a homogeneous cone.

Proof: Suppose we are given x², y² ∈ Int K_E. The following composition of linear transformations can be used to prove that K_E is homogeneous:

x²  --(Q_{x⁻¹})-->  e  --(Q_y)-->  y².

Each of the steps maps Int K_E onto itself, by Theorem 2.

2.4 Symmetric Cones, Direct Sums and Simple Algebras

Definition 5. A cone is symmetric if it is proper, self-dual, and homogeneous.

Clearly, the cone of squares K_E of any Euclidean Jordan algebra (E, ∘) is symmetric. It turns out that the converse is also true.

Fact 3. If K is a symmetric cone, then it is the cone of squares of some Euclidean Jordan algebra.

In fact, there are not many significantly different classes of symmetric cones. But before we can define these classes, we need to introduce direct sums.

Definition 6. Let (E₁, ∗) and (E₂, ⋄) be Euclidean Jordan algebras.
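In the algebra of symmetric matrices the quadratic representation takes the explicit form Q_X(Y) = XYX, so both ingredients of the proof above can be checked numerically. The following sketch (ours, not from the notes) verifies that Q_X maps a positive definite Y to a positive definite matrix, and that (Q_X Y)⁻¹ = Q_{X⁻¹} Y⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

def rand_spd(n):
    """Random symmetric positive definite matrix."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

X = rand_spd(n)   # invertible element of the algebra S^n
Y = rand_spd(n)   # a point in Int K_E (positive definite)

# Quadratic representation in S^n: Q_X(Y) = X Y X
QXY = X @ Y @ X

# Q_X maps the interior of the PSD cone into itself ...
interior_ok = np.all(np.linalg.eigvalsh(QXY) > 0)

# ... and (Q_X Y)^{-1} = Q_{X^{-1}} Y^{-1}
Xinv = np.linalg.inv(X)
inv_identity_ok = np.allclose(np.linalg.inv(QXY),
                              Xinv @ np.linalg.inv(Y) @ Xinv)
```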
Then the direct sum of these algebras is (E₁, ∗) ⊕ (E₂, ⋄) := (E₁ × E₂, ∘), where for all x₁, x₂ ∈ E₁ and y₁, y₂ ∈ E₂,

(x₁, y₁) ∘ (x₂, y₂) := (x₁ ∗ x₂, y₁ ⋄ y₂).

Proposition 2. Let (E₁ × E₂, ∘) be a direct sum of Euclidean Jordan algebras and let x ∈ E₁ and y ∈ E₂. The following properties hold:

1. L((x, y)) = L(x) ⊕ L(y) = [ L(x)  0    ]
                             [ 0     L(y) ]

2. Q_{(x, y)} = Q_x ⊕ Q_y = [ Q_x  0   ]
                            [ 0    Q_y ]

3. p_{E₁⊕E₂}((x, y)) = p_{E₁}(x) p_{E₂}(y), where p(·) is the corresponding characteristic polynomial.

4. tr_{E₁⊕E₂}((x, y)) = tr_{E₁}(x) + tr_{E₂}(y)

5. det((x, y)) = det(x) det(y)

6. ‖(x, y)‖²_{F/E₁⊕E₂} = ‖x‖²_{F/E₁} + ‖y‖²_{F/E₂}

7. ‖(x, y)‖_{2/E₁⊕E₂} = max{‖x‖_{2/E₁}, ‖y‖_{2/E₂}}

8. K_{E₁⊕E₂} = K_{E₁} × K_{E₂}

9. rk(E₁ ⊕ E₂) = rk(E₁) + rk(E₂)

Example 5 (Direct sums in SOCPs).

min  c₁ᵀx₁ + ... + c_rᵀx_r
s.t. A₁x₁ + ... + A_rx_r = b
     xᵢ ⪰_Q 0,  1 ≤ i ≤ r

The cone constraint in this SOCP restricts x to a direct product of quadratic cones: x ∈ Q₁ × Q₂ × ... × Q_r.

Example 6 (Direct sums in LPs). The usual, boring algebra of real numbers (R, ·) is a Euclidean Jordan algebra, where "·" stands for ordinary multiplication. The algebra underlying linear programs is a direct sum of n such algebras:

(Rⁿ, ∗) = (R, ·) ⊕ (R, ·) ⊕ ... ⊕ (R, ·).

The multiplication operator "∗" is defined componentwise:

(x₁, x₂, ..., xₙ)ᵀ ∗ (y₁, y₂, ..., yₙ)ᵀ := (x₁y₁, x₂y₂, ..., xₙyₙ)ᵀ.

Note that

L(x)y = Diag(x₁, x₂, ..., xₙ) y = x ∗ y,

i.e. L(x) = Diag(x).

Since direct sums of Euclidean Jordan algebras are Euclidean Jordan algebras, the theory we have developed covers any combination of these algebras: LP variables can be combined with SOCP variables, with SDP variables, and so on. It would be interesting to find out what the "minimal" algebras with respect to the direct sum are; in a sense, we want to find a "basis" of all possible Euclidean Jordan algebras. The following definition and a theorem (given without proof) answer these questions.
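For symmetric-matrix summands the direct sum is just a block-diagonal matrix, which makes several items of Proposition 2 easy to check numerically. The following sketch (our illustration, not part of the notes) verifies the trace, determinant, and cone-membership properties for an element of S² ⊕ S³:

```python
import numpy as np

rng = np.random.default_rng(3)

def sym(n):
    """Random symmetric n x n matrix."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2

# An element (x, y) of the direct sum S^2 (+) S^3, stored block-diagonally
x, y = sym(2), sym(3)
xy = np.block([[x, np.zeros((2, 3))],
               [np.zeros((3, 2)), y]])

# Proposition 2: trace is additive and determinant is multiplicative
tr_ok = np.isclose(np.trace(xy), np.trace(x) + np.trace(y))
det_ok = np.isclose(np.linalg.det(xy), np.linalg.det(x) * np.linalg.det(y))

# Membership in the cone of squares is componentwise (property 8)
psd = lambda m: bool(np.all(np.linalg.eigvalsh(m) >= -1e-12))
cone_ok = (psd(xy) == (psd(x) and psd(y)))
```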
Definition 7. A Euclidean Jordan algebra is simple if it is not isomorphic to a direct sum of other Euclidean Jordan algebras.

Theorem 3. There exist only five different classes of simple Euclidean Jordan algebras.

In the remaining part of this subsection we briefly describe these classes.

1°. SOCP algebra (R^{n+1}, ∘). This is the familiar algebra associated with SOCP where B = I. (If B is any symmetric positive definite matrix, the corresponding Jordan algebra is Euclidean, and in fact isomorphic to the case where B = I.)

2°. Symmetric matrices (S^{n×n}, ∘). Again, this is the familiar algebra of symmetric matrices that we have discussed in previous lectures.

3°. Complex Hermitian matrices (H^{n×n}, ∘). A matrix X of complex numbers is Hermitian (X ∈ H^{n×n}) if X = X*. The operation (·)* denotes the conjugate transpose, defined as follows: if (X)_{lk} = a_{lk} + i b_{lk}, then (X*)_{lk} = a_{kl} − i b_{kl}. For an n × n complex matrix, one can provide a real matrix of size 2n × 2n for which the algebra operations carry through in exactly the same way. To achieve this, each entry is replaced by a 2 × 2 block:

a + ib  →  [  a  b ]
           [ −b  a ].

Consider an example:

[ a       c + di ]      [ a   0   c  d ]
[ c − di  b      ]  →   [ 0   a  −d  c ]
                        [ c  −d   b  0 ]
                        [ d   c   0  b ]        (1)

Therefore, it is easy to see that (H^{n×n}, ∘) is a subalgebra of S^{2n×2n}. Even though H^{n×n} is a subalgebra of S^{2n×2n}, its rank is only n. Let u be a unit length complex vector. Transforming it to a real matrix by (1) maps u to an n × 2 matrix, and uu* to a rank 2 real matrix. This rank 2 real matrix is not a primitive idempotent within S^{2n×2n}, but it is primitive in H^{n×n}.

4°. Hermitian quaternion matrices. Quaternions are an extension of complex numbers.
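The embedding (1) can be written down in a few lines. The sketch below (ours, not from the notes) implements the entrywise replacement a + bi → [[a, b], [−b, a]] and checks that it respects matrix products and sends Hermitian matrices to real symmetric matrices:

```python
import numpy as np

def embed(Z):
    """Map an n x n complex matrix to the 2n x 2n real matrix obtained by
    replacing each entry a + bi with the 2 x 2 block [[a, b], [-b, a]]."""
    n = Z.shape[0]
    R = np.zeros((2 * n, 2 * n))
    for l in range(n):
        for k in range(n):
            a, b = Z[l, k].real, Z[l, k].imag
            R[2*l:2*l+2, 2*k:2*k+2] = [[a, b], [-b, a]]
    return R

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# The embedding respects matrix products (it is an algebra homomorphism) ...
hom_ok = np.allclose(embed(A @ B), embed(A) @ embed(B))

# ... and it sends Hermitian matrices to real symmetric matrices
H = A + A.conj().T
herm_ok = np.allclose(embed(H), embed(H).T)
```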
Each quaternion is a sum a + bi + cj + dk, where a, b, c, d ∈ R and i, j, k are such that:

i² = j² = k² = −1,
ij = k = −ji,
jk = i = −kj,
ki = j = −ik.

Analogously to how a complex number can be expressed as a pair of real numbers, a quaternion can be expressed as a pair of complex numbers:

a + bi + cj + dk = (a + bi) + (c + di)j.

The conjugate transpose X* of a quaternion matrix X is defined as follows: if (X)_{pq} = a + bi + cj + dk, then (X*)_{qp} = a − bi − cj − dk. Hermitian quaternion matrices satisfy X = X*. In this algebra multiplication is defined as

X ∘ Y := (XY + YX)/2.

5°. Hermitian matrices of octonions of size 3 × 3. Octonions are an extension of quaternions in the same way that quaternions are an extension of complex numbers. Introduce a number l such that l² = −1. Then an octonion can be written as p₁ + lp₂, where p₁ and p₂ are quaternions. By definition,

(p₁ + lp₂)(q₁ + lq₂) := (p₁q₁ − q̄₂p₂) + (q₂p₁ + p₂q̄₁)l.

The main difference between octonions and quaternions is that multiplication of octonions is not associative. Thus, if we build matrices out of octonions, the matrix multiplication will not be associative either. However, by an amazing coincidence, the set of Hermitian 3 × 3 octonion matrices with the multiplication X ∘ Y := (XY + YX)/2 is a Euclidean Jordan algebra. It is, however, a Jordan algebra that is not induced by an associative algebra, and in fact it can be shown that it is isomorphic to no Jordan algebra that is a subalgebra of a Jordan algebra induced by an associative algebra. Therefore, this algebra is often called the exceptional Jordan algebra, or the Albert algebra, named after Adrian Albert, who discovered it. It can be shown that this algebra has rank 3.
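The quaternion relations above determine the multiplication completely. As a numerical check (our sketch, not from the notes), the following implements the Hamilton product on coefficient vectors [a, b, c, d] and verifies the defining relations:

```python
import numpy as np

def qmul(p, q):
    """Multiply quaternions given as arrays [a, b, c, d] = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
        a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i part
        a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j part
        a1*d2 + b1*c2 - c1*b2 + d1*a2,   # k part
    ])

i = np.array([0., 1., 0., 0.])
j = np.array([0., 0., 1., 0.])
k = np.array([0., 0., 0., 1.])
minus_one = np.array([-1., 0., 0., 0.])

# The defining relations: i^2 = j^2 = k^2 = -1, ij = k = -ji, etc.
sq_ok = all(np.allclose(qmul(u, u), minus_one) for u in (i, j, k))
ij_ok = np.allclose(qmul(i, j), k) and np.allclose(qmul(j, i), -k)
jk_ok = np.allclose(qmul(j, k), i) and np.allclose(qmul(k, j), -i)
ki_ok = np.allclose(qmul(k, i), j) and np.allclose(qmul(i, k), -j)
```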
The underlying vector space of this algebra is 27-dimensional: there are 3 real numbers on the diagonal and three octonions above the diagonal, and since octonions are 8-dimensional, the set of such matrices yields a (3 + 3·8 = 27)-dimensional algebra.

3 Symmetric Cone LP

We give a brief sketch of how this theory is applied to describe interior point methods. Suppose we are given the program:

min  ⟨c, x⟩
s.t. ⟨aᵢ, x⟩ = bᵢ,  i = 1, ..., m
     x ⪰_{K_E} 0

Its dual is:

max  bᵀy
s.t. Σᵢ yᵢaᵢ + z = c
     z ⪰_{K_E} 0

The complementary slackness conditions are:

x ⪰_{K_E} 0,  z ⪰_{K_E} 0,  ⟨x, z⟩ = 0  ⇒  x ∘ z = 0.

As we discussed earlier, −ln det x is an appropriate barrier for the primal:

min  ⟨c, x⟩ − μ ln det x
s.t. Ax = b

The Lagrangian is L(x, y) = ⟨c, x⟩ − μ ln det x + yᵀ(b − Ax). It can be shown that the gradient ∇ₓ ln det x = x⁻¹. Thus the optimality conditions imply

∇ₓL = cᵀ − μ(x⁻¹)ᵀ − yᵀA = 0,
∇_yL = b − Ax = 0.

Define z := c − Aᵀy. So we have to solve

Ax = b,
Aᵀy + z = c,
z − μx⁻¹ = 0.

The following equations are equivalent (for x, z in the interior of the cone):

z − μx⁻¹ = 0,
x − μz⁻¹ = 0,
x ∘ z = μe.

If we replace x ← x + Δx, y ← y + Δy, z ← z + Δz as before and linearize, we obtain the Newton system

[ A   0   0 ] [Δx]   [r_p]
[ 0   Aᵀ  I ] [Δy] = [r_d]
[ E   0   F ] [Δz]   [r_c]

And now we only need to define E and F, depending on which of the equivalent centrality equations we linearize (using the fact that the derivative of x⁻¹ is −Q_{x⁻¹}):

z − μx⁻¹ = 0  →  E = μQ_{x⁻¹},  F = I
x − μz⁻¹ = 0  →  E = I,         F = μQ_{z⁻¹}
x ∘ z = μe    →  E = L(z),      F = L(x)

These relations unify the LP, SOCP, and SDP formulations of interior point methods. In fact, in Jordan algebraic notation we can express any optimization problem with any combination of nonnegativity, second order cone, or semidefinite constraints. The analysis of interior point algorithms is also streamlined in the Jordan algebraic formulation.
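In the LP case the operators specialize to E = L(z) = Diag(z) and F = L(x) = Diag(x), so the linearized centrality condition reads z ∗ Δx + x ∗ Δz = μe − x ∗ z. The following sketch (our illustration, with an arbitrarily chosen starting point and μ, not from the notes) assembles and solves one such Newton system with NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 2, 4
A = rng.standard_normal((m, n))

# A strictly feasible starting point for the LP algebra (R^n, componentwise *):
# pick x, z > 0 and back out b and c so that all equalities hold exactly
x = np.ones(n)
z = np.ones(n)
y = np.zeros(m)
b = A @ x
c = A.T @ y + z
mu = 0.5

# Residuals of the system Ax = b, A^T y + z = c, x * z = mu * e
rp = b - A @ x
rd = c - A.T @ y - z
rc = mu * np.ones(n) - x * z

# Newton system with E = L(z) = Diag(z), F = L(x) = Diag(x)
K = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              np.eye(n)],
    [np.diag(z),       np.zeros((n, m)), np.diag(x)],
])
sol = np.linalg.solve(K, np.concatenate([rp, rd, rc]))
dx, dy, dz = sol[:n], sol[n:n + m], sol[n + m:]

# The computed step satisfies all three linearized equations
lin_ok = (np.allclose(A @ dx, rp)
          and np.allclose(A.T @ dy + dz, rd)
          and np.allclose(z * dx + x * dz, rc))
```

Replacing the two diagonal blocks by L(z) and L(x) of an SOCP or SDP algebra gives the corresponding interior point step, which is exactly the unification the notes describe.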