Semidefinite and Second Order Cone Programming Seminar, Fall 2012, Lecture 10
Instructor: Farid Alizadeh    Scribe: Joonhee Lee    Date: 11/19/2012

1 Overview

In this lecture, we show that a number of sets and functions related to the notion of sum-of-squares (SOS) are SD-representable. We start with nonnegative polynomials. Then we introduce a general algebraic framework in which the notion of sum-of-squares can be formulated in a very general setting.

2 Polynomials

Recall the cone of nonnegative univariate polynomials:

    P_{2d}[t] = { p(t) = p_0 + p_1 t + p_2 t^2 + ... + p_{2d} t^{2d} | p(t) ≥ 0 ∀t ∈ R }.

Earlier we examined this cone and showed that it is SD-representable. We now consider the case of multivariate polynomials. The set of nonnegative polynomials of degree d in n variables is

    P_{n,d}[t_1, ..., t_n] = { p(t_1, ..., t_n) | p(t_1, ..., t_n) ≥ 0 ∀t ∈ R^n }.

Recall that in the case of univariate polynomials, p(t) ≥ 0 for all t ∈ R if and only if there are two polynomials p_1(t), p_2(t) such that p(t) = p_1^2(t) + p_2^2(t). In other words, a univariate polynomial is nonnegative over the real line if and only if it is a sum of squares. For multivariate polynomials this is no longer true.

Example 1 (Motzkin polynomial) Consider the following polynomial:

    p(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1.

This polynomial is nonnegative for all x, y ∈ R because

    (x^4 y^2 + x^2 y^4 + 1) / 3 ≥ (x^6 y^6)^{1/3} = x^2 y^2

by the arithmetic-geometric mean inequality. On the other hand, it cannot be a sum of squares of polynomials. If there were polynomials p_1(x, y), ..., p_N(x, y) such that Σ_i p_i^2(x, y) = p(x, y), then each p_i(x, y) would have to be of the form

    p_i(x, y) = x^2 (a_i y^2 + b_i y + c_i) + x (d_i y^2 + e_i y + f_i) + (g_i y^2 + h_i y + k_i).

Comparing the coefficients of x^4 y^4, x^4, y^4, x^2, and y^2 (all of which are zero in p), it is immediate that Σ_i a_i^2 = Σ_i c_i^2 = Σ_i g_i^2 = Σ_i f_i^2 = Σ_i h_i^2 = 0, and thus a_i = c_i = f_i = g_i = h_i = 0 for all i = 1, ..., N; otherwise the sum of squares would contain terms which do not appear in p(x, y).
This leaves us with

    p(x, y) = Σ_i ( b_i x^2 y + d_i x y^2 + e_i x y + k_i )^2.

But then the coefficient of x^2 y^2 gives Σ_i e_i^2 = -3, which is impossible. This completes the proof that the Motzkin polynomial is not a sum of squares of polynomials.

Define

    Σ_{n,2d}[t_1, ..., t_n] := { p(t_1, ..., t_n) | p(t_1, ..., t_n) = Σ_{i=1}^N p_i^2(t_1, ..., t_n) }.

It is clear that Σ_{n,2d} ⊆ P_{n,2d}, since any sum-of-squares polynomial is nonnegative. The Motzkin example shows that the inclusion can be proper. When can we guarantee P = Σ? It turns out that the following are the only cases where P = Σ:

1. Case 1: n = 1. This is the case of univariate polynomials, which we treated earlier.

2. Case 2: d = 2. Every nonnegative polynomial of degree two can be written in the form t^T A t + b^T t + c with A ⪰ 0. To see this (assuming for simplicity that A is nonsingular), write A = B^T B and let d = B^{-T} b. Then

    t^T A t + b^T t + c = (B t + d/2)^T (B t + d/2) + (c - d^T d / 4),

and the constant c - d^T d / 4 is nonnegative (it is the value of the polynomial at its minimizer t = -A^{-1} b / 2), so the polynomial is a sum of squares.

3. Case 3: d = 4, n = 2. A very special case is bivariate polynomials of degree four (equivalently, after homogenization, ternary quartic forms), where nonnegative polynomials are always sums of squares. The proof is somewhat involved.

Hilbert's seventeenth problem asks whether every nonnegative polynomial is a sum of squares of rational functions: given p(t) = p(t_1, ..., t_n) with p(t) ≥ 0 for all t ∈ R^n, are there polynomials p_1(t), ..., p_N(t), q(t) such that

    q^2(t) p(t) = p_1^2(t) + ... + p_N^2(t)?

Hilbert's seventeenth problem was answered affirmatively by E. Artin in 1927, and in the process he laid the foundation of a field of algebra known as real algebraic geometry. Even assuming that the greatest common divisor of q(t) and the p_i(t) is 1, there is still no satisfactory bound on N, the number of squares, or on D, the largest degree among q(t) and the p_i(t). In fact, the known bounds for N and D depend not only on n, the number of variables, and d, the degree of p(t), but also on the coefficients of p.
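As a numerical sanity check of Example 1 (a sketch, not part of the original notes, and of course not a proof), one can evaluate the Motzkin polynomial on a grid and verify that it is nonnegative but touches zero, so it lies on the boundary of P_{2,6}:

```python
import numpy as np

# The Motzkin polynomial p(x, y) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 + 1.
def motzkin(x, y):
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Sample on a grid: p is nonnegative everywhere (the AM-GM argument above),
# and the AM-GM bound is tight at |x| = |y| = 1, where p vanishes.
xs = np.linspace(-3.0, 3.0, 201)
X, Y = np.meshgrid(xs, xs)
vals = motzkin(X, Y)
assert vals.min() >= -1e-9            # nonnegative on the grid
assert abs(motzkin(1.0, 1.0)) < 1e-12  # p(1, 1) = 1 + 1 - 3 + 1 = 0
```

Since p attains the value 0 yet is not a sum of squares, no amount of numerical sampling can certify its nonnegativity; that is exactly why the algebraic SOS machinery of the next section is needed.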
3 General Algebra

Consider a triple (A, B, ∘), where ∘ : A × A → B is a bilinear operator and A, B are finite-dimensional real linear spaces with dim(A) = m and dim(B) = n. The bilinearity assumption is equivalent to the distributive laws:

- a ∘ (αb + βc) = α a ∘ b + β a ∘ c
- (αb + βc) ∘ a = α b ∘ a + β c ∘ a

Note also that bilinearity means that there are matrices Q_i such that (a ∘ b)_i = a^T Q_i b. Indeed, to each element a ∈ A we can associate a linear transformation L_a : A → B defined by L_a b = a ∘ b. The transformation L_a may be represented by an n × m matrix (also written L_a) whose entries are linear forms in the coordinates of a.

Our object of interest is the set

    Σ_∘ = { Σ_i a_i ∘ a_i | a_i ∈ A } ⊆ B,

which we call the sum-of-squares cone (or SOS cone) associated with the algebra. Note that Σ_∘ is a convex cone, since the sum of two sums of squares, or a nonnegative multiple of one, is again a sum of squares.

Example 2 Suppose A = P_d[t], the set of univariate polynomials of degree at most d, and B = P_{2d}[t], the set of univariate polynomials of degree at most 2d. Then (P_d[t], P_{2d}[t], ∗) forms an algebra, with ∗ denoting multiplication of polynomials. If we represent each polynomial by the vector of its coefficients, then ∗ is the convolution operation:

    (p_0, p_1, ..., p_d) ∗ (q_0, q_1, ..., q_d) = (p_0 q_0, p_0 q_1 + p_1 q_0, ..., p_0 q_k + p_1 q_{k-1} + ... + p_k q_0, ..., p_d q_d).

For this algebra Σ_∗ is the set of polynomials of degree 2d which are sums of squares of polynomials. As we know, in this case this cone is exactly the cone of polynomials of degree 2d which are nonnegative for every t ∈ R.

We make three assumptions on the algebra (A, B, ∘), without loss of generality, which make the presentation cleaner and more streamlined.

1. ∘ is commutative: a ∘ b = b ∘ a. If it is not, we can replace ∘ with its anti-commutator ⊙, defined by a ⊙ b = (a ∘ b + b ∘ a) / 2. Note that for the algebras (A, B, ∘) and (A, B, ⊙) we have Σ_∘ = Σ_⊙.

2. B = span(A ∘ A). This assumption ensures that B does not contain elements that are not somehow generated by elements of A, and in turn implies that Σ_∘ is full-dimensional in B.

3. The mapping L : A → R^{n×m}, x ↦ L_x, is injective: L_{x_1} = L_{x_2} ⇒ x_1 = x_2. Without this assumption there is no way to distinguish x_1 from x_2. If it fails, observe that L_{x_1} = L_{x_2} defines an equivalence relation on A:

    x_1 ≃ x_2  ⇔  L_{x_1} = L_{x_2}.

We then replace A by A/≃, the set of equivalence classes, and define ∘ on A/≃ by

    [x] ∘ [y] := [x ∘ y].

Using commutativity it is easy to see that this definition is consistent; that is, if x_1 ≃ x_2 and y_1 ≃ y_2 then x_1 ∘ y_1 ≃ x_2 ∘ y_2. Therefore (A/≃, B, ∘) satisfies the third assumption.

Lemma 1 Under assumptions 1, 2 and 3:

1. Σ_∘ is a full-dimensional convex cone.
2. Every element of Σ_∘ is a sum of at most n = dim(B) squares.

Proof: 1) Since the sum of two sums of squares is again a sum of squares, Σ_∘ is a convex cone. To prove that it is full-dimensional we claim that B = Σ_∘ - Σ_∘. First note that for any a, b ∈ A,

    a ∘ b = ((a + b)/2) ∘ ((a + b)/2) - ((a - b)/2) ∘ ((a - b)/2);

this follows from commutativity. On the other hand, by assumption 2 every element of B is of the form Σ_i a_i ∘ b_i. This shows that B = Σ_∘ - Σ_∘, and thus Σ_∘ is full-dimensional.

2) By Caratheodory's theorem for cones, every element of Σ_∘ is a sum of at most n points on extreme rays. But the extreme rays of Σ_∘ are among the perfect squares a ∘ a, so each element of Σ_∘ is a sum of at most n squares. ∎

(A trivial example) Let A = B = C, the complex numbers under ordinary multiplication. Then Σ = C.

Lemma 2 If (A, B, ∘) is formally real¹, then Σ_∘ is pointed. Conversely, if Σ_∘ is pointed and there are no nilpotent² elements, then (A, B, ∘) is formally real.

¹ The algebra (A, B, ∘) is formally real if Σ_i a_i ∘ a_i = 0 implies a_i = 0 for all i.
² An element a ≠ 0 is nilpotent if a ∘ a = 0.

Proof: (Pointedness) Suppose the algebra is formally real and that some x = Σ_i a_i ∘ a_i satisfies -x ∈ Σ_∘ as well. Then there are b_i such that

    Σ_i b_i ∘ b_i = - Σ_i a_i ∘ a_i  ⇒  Σ_i b_i ∘ b_i + Σ_i a_i ∘ a_i = 0  ⇒  a_i = 0 and b_i = 0 for all i (by formal reality),

so Σ_∘ ∩ (-Σ_∘) = {0}, that is, Σ_∘ is pointed. (Converse) Suppose Σ_∘ is pointed and Σ_i a_i ∘ a_i = 0. Since each square a_i ∘ a_i lies in the pointed cone Σ_∘ and they sum to zero, each a_i ∘ a_i = 0, and then a_i = 0 since there are no nilpotent elements. ∎

The dual of Σ_∘ is Σ_∘* = { z | ⟨a, z⟩ ≥ 0 ∀a ∈ Σ_∘ }. Then:

Theorem 3 Σ_∘ is a proper cone iff Σ_∘* is.
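For the polynomial algebra of Example 2, the product ∗ on coefficient vectors is exactly convolution. A small sketch (using NumPy, which is not part of the original notes) of building an element of Σ_∗ and confirming that it is a nonnegative polynomial:

```python
import numpy as np

# In the algebra (P_d[t], P_{2d}[t], *), multiplying polynomials means
# convolving their coefficient vectors (lowest degree first).
p1 = np.array([1.0, -1.0])   # p1(t) = 1 - t
p2 = np.array([0.0, 1.0])    # p2(t) = t

# An element of the SOS cone: p = p1 * p1 + p2 * p2.
p = np.convolve(p1, p1) + np.convolve(p2, p2)
assert np.allclose(p, [1.0, -2.0, 2.0])   # p(t) = 1 - 2t + 2t^2

# Every element of Sigma_* is nonnegative on all of R; check on a grid.
ts = np.linspace(-5.0, 5.0, 101)
vals = np.polyval(p[::-1], ts)   # np.polyval wants highest degree first
assert vals.min() > 0.0
```

Here p(t) = (1 - t)^2 + t^2 has minimum value 1/2 at t = 1/2, consistent with membership in the cone of nonnegative polynomials of degree 2d = 2.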
The Λ and Λ* operators

Given the algebra (A, B, ∘) and its cones Σ_∘ and Σ_∘*, define the operator

    Λ : B → S_A,

where S_A is the space of symmetric bilinear forms on A, by

    Λ(w)(a, b) := ⟨w, a ∘ b⟩_B   for a, b ∈ A, w ∈ B.

By commutativity of ∘, Λ(w)(a, b) = Λ(w)(b, a), so Λ(w) is indeed symmetric; in matrix form, a^T Λ(w) b = b^T Λ(w) a.

Theorem 4 w ∈ Σ_∘* iff Λ(w) ⪰ 0.

Proof: (⇒) If w ∈ Σ_∘*, then for every a ∈ A,

    Λ(w)(a, a) = ⟨w, a ∘ a⟩ ≥ 0   (since w ∈ Σ_∘* and a ∘ a ∈ Σ_∘),

hence Λ(w) ⪰ 0.

(⇐) If Λ(w) ⪰ 0, then for every x = Σ_i a_i ∘ a_i ∈ Σ_∘,

    ⟨w, x⟩ = Σ_i ⟨w, a_i ∘ a_i⟩ = Σ_i Λ(w)(a_i, a_i) ≥ 0,

so w ∈ Σ_∘*. ∎

The adjoint Λ* : S_A → B is defined by the identity

    ⟨X, Λ(w)⟩_{S_A} = ⟨Λ*(X), w⟩_B   for all w ∈ B and X ∈ S_A.

Theorem 5 u ∈ Σ_∘ iff there exists Y ⪰ 0 such that u = Λ*(Y).

Proof: (⇐) If Y ⪰ 0 and Λ*(Y) = u, then for all v ∈ Σ_∘*,

    ⟨u, v⟩_B = ⟨Λ*(Y), v⟩_B = ⟨Y, Λ(v)⟩_{S_A} ≥ 0   (since Y ⪰ 0 and Λ(v) ⪰ 0 by Theorem 4),

and therefore u ∈ Σ_∘** = Σ_∘.

(⇒) If u ∈ Σ_∘, then there are a_i such that u = Σ_i a_i ∘ a_i. Let Y = Σ_i a_i a_i^T ⪰ 0 (identifying each a_i with its coordinate vector). For any v ∈ B,

    ⟨u, v⟩_B = Σ_i ⟨a_i ∘ a_i, v⟩_B = Σ_i Λ(v)(a_i, a_i) = Σ_i ⟨Λ(v), a_i a_i^T⟩_{S_A} = ⟨Λ(v), Y⟩_{S_A} = ⟨v, Λ*(Y)⟩_B.

Hence u = Λ*(Y). ∎

Example 3 For the algebra of univariate polynomials, the Λ operator can be computed as follows. If we represent Λ(w) as a matrix, its (i, j) entry is by definition ⟨w, e_i ∗ e_j⟩. But e_i ∗ e_j = e_{i+j}, so (Λ(w))_{ij} = w_{i+j}. This means the (i, j) entry of Λ(w) depends only on i + j; that is, all entries of Λ(w) on the same reverse diagonal are equal, and Λ(w) is the Hankel matrix

    Λ(w) = [ w_0      w_1      ...  w_n
             w_1      w_2      ...  w_{n+1}
             ...      ...      ...  ...
             w_n      w_{n+1}  ...  w_{2n} ].

3.1 Squared functional systems

Of particular interest are algebras induced by spaces of functions. Let F = {f_1, f_2, ..., f_m}, where each f_i : Δ → R is a real-valued function, and let F = span(F) = { Σ_i α_i f_i } be the linear space spanned by the f_i, where (α_i f_i + α_j f_j)(x) = α_i f_i(x) + α_j f_j(x). Now define S = { f_i · f_j | i, j = 1, ..., m }, with

    (f_i · f_j)(x) = f_i(x) f_j(x),

and let S = span(S). Then (F, S, ·) is an algebra, and Σ_F = { Σ_i g_i^2 : g_i ∈ F }.
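The Hankel structure of Example 3 and the criterion of Theorem 4 can be illustrated numerically (a sketch using NumPy, not part of the original notes; the particular moment vector is chosen for illustration):

```python
import numpy as np

# w_k = integral of t^k over [0, 1] = 1/(k+1): moments of a nonnegative
# measure, so <w, p^2> = integral of p^2 >= 0 and w lies in the dual cone.
w = np.array([1.0 / (k + 1) for k in range(5)])   # w_0, ..., w_4

# Lambda(w)_{ij} = w_{i+j}: a 3x3 Hankel (here, Hilbert) matrix.
Lam = np.array([[w[i + j] for j in range(3)] for i in range(3)])
assert np.allclose(Lam, Lam.T)        # symmetric, as Lambda(w) must be

eigs = np.linalg.eigvalsh(Lam)
assert eigs.min() > 0.0               # PSD, as Theorem 4 predicts for w in the dual cone
```

Here Λ(w) is even positive definite, reflecting that integration against a measure with full support lies in the interior of Σ*.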
The algebra (F, S, ·), along with its SOS cone Σ_F, is called a squared functional system. The univariate polynomial example given earlier is the special case where F = {1, t, t^2, ..., t^d} and S = {1, t, t^2, ..., t^{2d}}.

3.2 The semidefinite and second order cones as SOS cones

For two p × q matrices A and B, define the Cracovian multiplication

    A ∘ B = A B^T,

symmetrized as before to the anti-commutator A ⊙ B = (A B^T + B A^T) / 2. Then, for the algebra (R^{p×q}, R^{p×p}, ∘), the SOS cone Σ_∘ is exactly the cone of positive semidefinite p × p matrices.

4 Operations on algebras and their SOS cones

4.1 Bijective linear transformations

Let (A, B, ∘) be an algebra which, as usual, satisfies assumptions 1, 2 and 3, and let C be another linear space with a bijective linear transformation F : B → C. (Note that this means necessarily dim(B) = n = dim(C).) Define a new binary operation ◦ : A × A → C by L_◦ = F L_∘, that is, a ◦ b = F(a ∘ b). Then (A, C, ◦) is an algebra satisfying assumptions 1, 2 and 3, and furthermore Σ_◦ = F Σ_∘.

Definition 6 Two cones K_1 and K_2 are linearly isomorphic if there is a bijective linear transformation F such that K_1 = F K_2.

Thus, if Σ_1 is an SOS cone and Σ_2 is a cone linearly isomorphic to Σ_1, then Σ_2 is also an SOS cone for some algebra.

4.2 Isomorphism and linear isomorphism among algebras

Let (A_1, B_1, ∘_1) and (A_2, B_2, ∘_2) be two algebras, and let there be two linear transformations

    F : A_1 → A_2,    G : B_1 → B_2.

If both F and G are bijective and

    G(a ∘_1 b) = F(a) ∘_2 F(b),

we say that these algebras are isomorphic. Note that, as opposed to ordinary algebraic structures, we need two maps to define an isomorphism.

Lemma 7 If (A_1, B_1, ∘_1) and (A_2, B_2, ∘_2) are two algebras isomorphic to each other, then their SOS cones Σ_1 and Σ_2 are linearly isomorphic.

Proof: Let y ∈ Σ_2. Then

    y = Σ_i y_i ∘_2 y_i
      = Σ_i F(x_i) ∘_2 F(x_i)    (for some x_i ∈ A_1, since F is surjective)
      = Σ_i G(x_i ∘_1 x_i)       (by the definition of isomorphism)
      = G( Σ_i x_i ∘_1 x_i ) ∈ G Σ_1    (by linearity).
The sequence of implications above goes through in both directions, establishing that Σ_2 = G Σ_1. Since G is bijective, it is a linear isomorphism between Σ_1 and Σ_2. ∎

4.3 Direct sums of algebras

For k algebras (A_1, B_1, ∘_1), ..., (A_k, B_k, ∘_k), define a new algebra (A_1 × ... × A_k, B_1 × ... × B_k, ∘) with

    (a_1, ..., a_k) ∘ (b_1, ..., b_k) = (a_1 ∘_1 b_1, ..., a_k ∘_k b_k).

This new algebra is called the direct sum algebra. It is immediate that

    Σ_∘ = Σ_1 × ... × Σ_k,

and the Λ operator is given by Λ_∘(w_1, ..., w_k) = Λ_1(w_1) ⊕ ... ⊕ Λ_k(w_k).

4.4 Minkowski sum of algebras

Consider the algebras (A_1, B, ∘_1), ..., (A_k, B, ∘_k) with possibly different A_i, but all having the same B. The Minkowski sum algebra is the algebra (A_1 × ... × A_k, B, ∘), with

    (a_1, ..., a_k) ∘ (b_1, ..., b_k) = a_1 ∘_1 b_1 + ... + a_k ∘_k b_k.

Then we have

    Σ_∘ = Σ_1 + ... + Σ_k   and   Λ_∘(w) = Λ_1(w) ⊕ ... ⊕ Λ_k(w),

the latter acting block-diagonally on A_1 × ... × A_k.

4.5 Weighted sums of squares

Combining the Minkowski sum and linear transformations, we can show that a kind of weighted sum of squares (WSOS) is in fact also a sum of squares. This follows from the following observation: let (A_i, B_i, ∘_i), for i = 1, ..., k, be algebras, and let F_i be linear transformations, each mapping B_i to a common space B. Then the cone F_1 Σ_1 + ... + F_k Σ_k is also an SOS cone. Here is an example.

Example 4 Let P^d_{[0,∞)} = { p(t) | deg(p) = d, p(t) ≥ 0 ∀t ≥ 0 }. Clearly this set is a convex cone. Let us show that it is in fact an SOS cone:

    P^d_{[0,∞)} = P^d + t P^{d-2}      if d is even,
    P^d_{[0,∞)} = P^{d-1} + t P^{d-1}  if d is odd.

Proof: We have p(t) ≥ 0 for all t ≥ 0 ⇔ p(t^2) ≥ 0 for all t ∈ R. Thus p(t^2) is a nonnegative univariate polynomial, and so p(t^2) = q^2(t) + r^2(t) for some polynomials q and r.
Separating the even- and odd-degree terms of q(t) and r(t), write q(t) = q_1(t^2) + t q_2(t^2) and r(t) = r_1(t^2) + t r_2(t^2). Then

    p(t^2) = (q_1(t^2) + t q_2(t^2))^2 + (r_1(t^2) + t r_2(t^2))^2
           = q_1^2(t^2) + t^2 q_2^2(t^2) + r_1^2(t^2) + t^2 r_2^2(t^2) + 2t ( q_1(t^2) q_2(t^2) + r_1(t^2) r_2(t^2) ).

Since all terms of p(t^2) have even degree, the odd part must vanish: q_1(t^2) q_2(t^2) + r_1(t^2) r_2(t^2) = 0. Replacing t^2 by t, we obtain

    p(t) = ( q_1^2(t) + r_1^2(t) ) + t ( q_2^2(t) + r_2^2(t) ),

a weighted sum of squares. ∎

Thus we have shown that the cone P^d_{[0,∞)}[t] is a weighted sum of squares. Note, moreover, that multiplication by t is a bijective linear transformation mapping the space of degree-d polynomials to the space of degree-(d+1) polynomials with zero constant term. Thus t P^d[t] is an SOS cone, and the Minkowski sum P^d[t] + t P^{d-2}[t] is also an SOS cone.³

³ More precisely, if d is even then P^d_{[0,∞)}[t] = P^d[t] + t P^{d-2}[t], and if d is odd then P^d_{[0,∞)}[t] = P^{d-1}[t] + t P^{d-1}[t].

4.6 Isomorphism by change of basis and change of variables

Our presentation of algebras has been basis-free, in that all arguments are made independently of any particular basis for A or B in the algebra (A, B, ∘). Of course, in practice the multiplication operator L_∘ is represented by a matrix, and this representation in turn depends on the particular bases chosen for A and B. Changing the basis of A amounts to replacing L_∘(x) with L_∘(Fx), where F is the change-of-basis matrix; similarly, changing the basis of B amounts to applying the change-of-basis matrix G of B to L_∘(x). Needless to say, the resulting algebras are all isomorphic to each other, and thus the resulting SOS cones are linearly isomorphic.

Polynomials, and in general squared functional systems, are linear spaces of functions. As such, there are many different ways of choosing a basis for them. For instance, for polynomials, in the ordinary representation p_0 + p_1 t + ... + p_d t^d we are using the basis {1, t, t^2, ..., t^d}.
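As a quick numerical sketch (using NumPy's polynomial helpers, which are not part of the original notes), converting between the monomial basis and an orthogonal-polynomial basis is an invertible linear map on coefficient vectors, so it carries the SOS cone onto a linearly isomorphic cone:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Illustrative coefficients: in the Chebyshev basis, c represents
# T_0(t) + T_2(t) = 1 + (2t^2 - 1) = 2t^2.
c = np.array([1.0, 0.0, 1.0])
mono = C.cheb2poly(c)             # the same polynomial in the basis {1, t, t^2}
assert np.allclose(mono, [0.0, 0.0, 2.0])

# The change of basis is linear and invertible.
assert np.allclose(C.poly2cheb(mono), c)
```

The cone of nonnegative polynomials is unchanged under such a conversion; only its coordinate description moves.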
However, there are many other bases: for instance {1, t+1, (t+1)^2, ..., (t+1)^d} is another basis, and orthogonal polynomials (Chebyshev, Legendre, Laguerre, etc.) provide yet other ways of representing polynomials. Clearly, changing the basis in which polynomials are represented does not affect the SOS cone, nor does it change the fact that, in the univariate case, it equals the cone of nonnegative polynomials.

The second observation concerns the effect of a change of variables, even a nonlinear one. In general, consider the set of polynomials of degree d which are nonnegative over a set Δ ⊆ R. Let H : Ω → Δ be an onto mapping from a set Ω to Δ. Note that Ω need not be a subset of R; it is entirely arbitrary. Then the cone

    P_Ω[H] = { f | f(x) = p_0 + p_1 H(x) + ... + p_d H^d(x) ≥ 0 ∀x ∈ Ω }

is a convex cone linearly isomorphic to P_Δ[t], since

    f(x) = p_0 + p_1 H(x) + ... + p_d H^d(x) ≥ 0 ∀x ∈ Ω  ⇔  p(t) = p_0 + p_1 t + ... + p_d t^d ≥ 0 ∀t ∈ Δ,

because H maps Ω onto Δ. This means that from polynomials we can construct other SOS cones by functional composition, and possibly by change of basis. Two examples follow.

Example 5 Consider the set of polynomials which are nonnegative over a finite interval, say [0, 1]:

    P^d_{[0,1]} = { p(t) | deg(p) = d, p(t) ≥ 0 ∀t ∈ [0, 1] }.

Set H(t) = t / (1 + t), so that H maps [0, ∞) onto [0, 1) (the endpoint 1 is handled by continuity). Then p(s) ≥ 0 for all s ∈ [0, 1] iff p(t / (1 + t)) ≥ 0 for all t ∈ [0, ∞). Expanding this function, and observing that multiplying it by (1 + t)^d does not change its sign over [0, ∞), we see that

    p(t / (1 + t)) ≥ 0 ∀t ∈ [0, ∞)  ⇔  q(t) := (1 + t)^d p(t / (1 + t)) ≥ 0 ∀t ∈ [0, ∞),

where q is a polynomial of degree at most d. This implies that P_{[0,1]}[t] ≅ P_{[0,∞)}[t]; both are weighted SOS, and thus SOS cones.

Example 6 Cosine polynomials are functions of the form p_0 + p_1 cos(t) + ... + p_d cos(dt).
We are interested in the cone

    P_cos[t] = { p | p(t) = p_0 + p_1 cos(t) + ... + p_d cos(dt) ≥ 0 ∀t ∈ R }.

Since the range of the function cos is the interval [-1, 1], and since each cos(kt) is a polynomial of degree k in cos(t) (the Chebyshev polynomial of the first kind, T_k), we conclude that

    P_cos[t] ≅ P_{[-1,1]}[t] ≅ P_{[0,1]}[t] ≅ P_{[0,∞)}[t].

References

[1] Nesterov, Y., Squared functional systems and optimization problems. In: High Performance Optimization, pp. 405-440, Kluwer Academic Publishers, Dordrecht, 2000.

[2] Faybusovich, L., On Nesterov's approach to semi-infinite programming. Acta Applicandae Mathematicae 74 (2002), pp. 195-215.

[3] Papp, D., and Alizadeh, F., Semidefinite characterization of sum-of-squares cones in algebras. RUTCOR Research Report RRR 11-14, Rutgers, the State University of New Jersey; accepted for publication in SIAM Journal on Optimization.