Semidefinite and Second Order Cone Programming Seminar
Fall 2012, Lecture 10
Instructor: Farid Alizadeh
Scribe: Joonhee Lee
Date: 11/19/2012
1 Overview

In this lecture we show that a number of sets and functions related to the notion of sum-of-squares (SOS) are SD-representable. We start with nonnegative polynomials; then we introduce a general algebraic framework in which the notion of sum-of-squares can be formulated in a very general setting.
2 Polynomials

Recall the cone of nonnegative univariate polynomials:

    P_{2d}[t] = { p(t) = p_0 + p_1 t + p_2 t^2 + ... + p_{2d} t^{2d} | p(t) ≥ 0 ∀t ∈ R }.

Earlier we examined this cone and showed that it is SD-representable. We now consider the case of multivariate polynomials. The set of nonnegative polynomials is

    P_{n,d}[t_1, ..., t_n] = { p(t_1, ..., t_n) | p(t_1, ..., t_n) ≥ 0 ∀t ∈ R^n }.
Recall that in the case of univariate polynomials, p(t) ≥ 0 for all t ∈ R if and only if there are two polynomials p_1(t), p_2(t) such that p(t) = p_1^2(t) + p_2^2(t). In other words, a univariate polynomial is nonnegative over the real line if and only if it is a sum of squares. For multivariate polynomials this is no longer true.
Example 1 (Motzkin polynomial) Consider the following polynomial:

    p(x, y) = x^4 y^2 + x^2 y^4 - 3x^2 y^2 + 1.

This polynomial is nonnegative for all x, y ∈ R because

    (x^4 y^2 + x^2 y^4 + 1)/3 ≥ (x^4 y^2 · x^2 y^4 · 1)^{1/3} = x^2 y^2

by the arithmetic-geometric mean inequality, hence x^4 y^2 + x^2 y^4 + 1 ≥ 3x^2 y^2, that is, p(x, y) ≥ 0. On the other hand, p cannot be a sum of squares of polynomials. If there were polynomials p_1(x, y), ..., p_n(x, y) such that Σ_i p_i^2(x, y) = p(x, y), then each p_i(x, y) must have degree at most 2 in x and at most 2 in y, that is, be of the form

    p_i(x, y) = x^2 (a_i y^2 + b_i y + c_i) + x (d_i y^2 + e_i y + f_i) + (g_i y^2 + h_i y + k_i).

Comparing the coefficients of x^4 y^4, x^4, y^4, x^2, and y^2 on both sides, it is immediate that

    Σ_i a_i^2 = Σ_i c_i^2 = Σ_i f_i^2 = Σ_i g_i^2 = Σ_i h_i^2 = 0,

and thus a_i = c_i = f_i = g_i = h_i = 0 for all i = 1, ..., n; otherwise the sum of squares would contain terms which do not appear in p(x, y). This leaves us with

    p(x, y) = Σ_i (b_i x^2 y + d_i x y^2 + e_i x y + k_i)^2.

But comparing the coefficients of x^2 y^2 then implies that Σ_i e_i^2 = -3, which is impossible. This completes the proof that the Motzkin polynomial is not a sum of squares of other polynomials.
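As a quick numerical sanity check (our own, on an arbitrary grid, not part of the argument above), the Motzkin polynomial never goes negative, and its minimum value 0 is attained at |x| = |y| = 1:

    import numpy as np

    # Evaluate the Motzkin polynomial on a grid and confirm nonnegativity.
    def motzkin(x, y):
        return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

    xs = np.linspace(-2.0, 2.0, 401)             # grid step 0.01, includes +/-1
    X, Y = np.meshgrid(xs, xs)

    print(motzkin(X, Y).min())                   # 0.0: never negative on the grid
    print(motzkin(1.0, 1.0))                     # 0.0: the minimum is attained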
Define

    Σ_{n,2d}[t_1, ..., t_n] := { p(t_1, ..., t_n) | p(t_1, ..., t_n) = Σ_{i=1}^N p_i^2(t_1, ..., t_n) }.

It is clear that Σ_{n,2d} ⊆ P_{n,2d}, since any sum-of-squares polynomial is nonnegative. The Motzkin example shows that the inclusion is proper.
When can we guarantee P = Σ? It turns out that the following are the only possible cases where P = Σ:

1. Case 1: n = 1. This is the case of univariate polynomials, which we have proved earlier.

2. Case 2: d = 2. All polynomials of degree two can be written in the form t^T A t + b^T t + c. To see this, first suppose A ⪰ 0 is nonsingular, write A = B^T B, and let d = B^{-T} b / 2. Then

    t^T A t + b^T t + c = (Bt + d)^T (Bt + d) + (c - d^T d),

and nonnegativity of the polynomial (evaluate at t = -B^{-1} d) forces c - d^T d ≥ 0, so this is a sum of squares. A numerical sketch is given after this list.

3. Case 3: d = 4, n = 3. A very special case is homogeneous three-variable polynomials of degree four (ternary quartics), where nonnegative polynomials are always sums of squares. The proof is somewhat hairy.
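Here is the numerical sketch promised in Case 2 (our own code; it assumes A is positive definite and picks c large enough that the quadratic is nonnegative):

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((3, 3))
    A = M @ M.T + 3 * np.eye(3)          # positive definite A
    b = rng.standard_normal(3)

    L = np.linalg.cholesky(A)            # A = L L^T, so with B = L^T, A = B^T B
    B = L.T
    d = np.linalg.solve(L, b) / 2        # d = B^{-T} b / 2, so 2 d^T B t = b^T t
    c = d @ d + 1.0                      # c >= d^T d makes the quadratic nonnegative

    # t^T A t + b^T t + c equals ||B t + d||^2 + (c - d^T d), a sum of squares
    for _ in range(5):
        t = rng.standard_normal(3)
        quad = t @ A @ t + b @ t + c
        sos = np.sum((B @ t + d) ** 2) + (c - d @ d)
        assert np.isclose(quad, sos)
    print("nonnegative quadratic written as a sum of squares")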
Hilbert’s seventeenth problem asks whether all nonnegative polynomials are sums of squares of rational functions; in other words, whether for each p(t) = p(t_1, ..., t_n) with p(t) ≥ 0 for all t ∈ R^n, there are polynomials p_1(t), ..., p_N(t), q(t) such that

    q^2(t) p(t) = p_1^2(t) + ... + p_N^2(t).

Hilbert’s seventeenth problem was answered affirmatively by E. Artin in 1927, and in the process he laid out the foundation of a field of algebra known as real algebraic geometry. Assuming that the greatest common divisor of q(t) and the p_i(t) is 1, there is still no satisfactory bound on N, the number of squares, or on D, the largest degree among q(t) and the p_i(t). In fact, the bounds that are known for N and D depend not only on n, the number of variables, and d, the degree of p(t), but on the coefficients of p as well.
3 General Algebra

Consider (A, B, ⋄), where ⋄ : A × A → B is a bilinear operator, and A and B are finite-dimensional real linear spaces with dim(A) = m and dim(B) = n. Note that the bilinearity assumption is equivalent to the distributive laws:

• a ⋄ (αb + βc) = α(a ⋄ b) + β(a ⋄ c)

• (αb + βc) ⋄ a = α(b ⋄ a) + β(c ⋄ a)

Note also that bilinearity means that there are matrices Q_i such that (a ⋄ b)_i = a^T Q_i b. Indeed, to each element a ∈ A we can associate a linear transformation L_a mapping A → B, that is, L_a b = a ⋄ b. The linear transformation L_a may be represented by an n × m matrix (also written L_a) whose entries are linear forms in the coordinates of a.
Our object of interest is the following set:

    Σ_⋄ = { Σ_i a_i ⋄ a_i | a_i ∈ A } ⊆ B,

which we call the sum-of-squares cone (or the SOS cone) associated with the algebra. Note that Σ_⋄ is a convex cone, since adding two sums of squares creates another sum of squares.
Example 2 Suppose A = P_d[t], the space of univariate polynomials of degree at most d, and B = P_{2d}[t], the space of univariate polynomials of degree at most 2d. Then (P_d[t], P_{2d}[t], ∗) forms an algebra, with ∗ indicating the multiplication of polynomials. If we represent each polynomial by the vector of its coefficients, then ∗ is the convolution operation:

    (p_0, p_1, ..., p_d) ∗ (q_0, q_1, ..., q_d) = (p_0 q_0, p_0 q_1 + p_1 q_0, ..., p_0 q_k + p_1 q_{k-1} + ... + p_k q_0, ..., p_d q_d).

For this algebra Σ_∗ is the set of polynomials of degree 2d which are sums of squares of polynomials. As we know, in this case this cone is exactly the cone of polynomials of degree 2d which are nonnegative for every t ∈ R.
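For illustration (our own sketch, with names of our choosing), the operator L_p for this algebra is the "convolution matrix" of p, and L_p q recovers the coefficients of p(t)q(t):

    import numpy as np

    d = 3
    rng = np.random.default_rng(1)
    p = rng.standard_normal(d + 1)       # coefficients p_0, ..., p_d
    q = rng.standard_normal(d + 1)

    prod = np.convolve(p, q)             # coefficients of p(t) q(t), degree 2d

    # Build L_p, a (2d+1) x (d+1) matrix: column j holds p shifted down by j
    L_p = np.zeros((2 * d + 1, d + 1))
    for j in range(d + 1):
        L_p[j : j + d + 1, j] = p

    assert np.allclose(L_p @ q, prod)
    print("L_p q equals the convolution p * q")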
We make three assumptions on the algebra (A, B, ⋄), without loss of generality, which will make the presentation cleaner and more streamlined.

1. ⋄ is commutative: a ⋄ b = b ⋄ a. If ⋄ is not commutative, then we can replace it with its anti-commutator ⊙:

    a ⊙ b = (a ⋄ b + b ⋄ a) / 2.

Note that for the algebras (A, B, ⋄) and (A, B, ⊙) we have Σ_⋄ = Σ_⊙, since a ⋄ a = a ⊙ a.

2. B = span(A ⋄ A). This assumption ensures that B does not contain elements that are not somehow generated by elements from A, and in turn results in Σ_⋄ being full-dimensional in B.

3. The mapping L : A → R^{n×m}, a ↦ L_a, is injective:

    L_{x_1} = L_{x_2} ⇒ x_1 = x_2.

Without this assumption there is no way to distinguish x_1 and x_2. In that case we observe that the relation L_{x_1} = L_{x_2} defines an equivalence relation on A:

    x_1 ≃ x_2 ⇔ L_{x_1} = L_{x_2}.

Then we replace A with A/≃, the set of equivalence classes, and define ⋄ on A/≃ by

    [x] ⋄ [y] := [x ⋄ y].

Using commutativity it is easy to see that this definition is consistent, that is, if x_1 ≃ x_2 and y_1 ≃ y_2 then x_1 ⋄ y_1 ≃ x_2 ⋄ y_2. Therefore, (A/≃, B, ⋄) satisfies the third assumption.
Lemma 1 With assumptions 1, 2, and 3,

1. Σ_⋄ is a full-dimensional convex cone.

2. Every element of Σ_⋄ is a sum of at most n = dim(B) squares.

Proof:
1) Since the sum of two sums of squares is another sum of squares, Σ_⋄ is convex. To prove it is full-dimensional we claim that B = Σ_⋄ - Σ_⋄. First note that for any a, b ∈ A,

    a ⋄ b = ((a + b)/2) ⋄ ((a + b)/2) - ((a - b)/2) ⋄ ((a - b)/2);

this follows from commutativity. On the other hand, by assumption 2, every element of B is of the form Σ_i a_i ⋄ b_i. This shows that B = Σ_⋄ - Σ_⋄, and thus Σ_⋄ is full-dimensional.
2) By Carathéodory's theorem for cones, every element of Σ_⋄ is a sum of at most n extreme rays. But the extreme rays of Σ_⋄ are among the perfect squares a ⋄ a, so each element of Σ_⋄ is a sum of at most n squares.
(A trivial example) A = B = C, the complex numbers under ordinary multiplication. Then Σ = C, since every complex number is the square of another.
Lemma 2 If (A, B, ⋄) is formally real¹, then Σ_⋄ is pointed. Conversely, if Σ_⋄ is pointed and there are no nilpotent² elements, then (A, B, ⋄) is formally real.

Proof:
First suppose the algebra is formally real and -Σ_i a_i ⋄ a_i ∈ Σ_⋄. Then there are b_i such that

    Σ_i b_i ⋄ b_i = -Σ_i a_i ⋄ a_i  ⇒  Σ_i b_i ⋄ b_i + Σ_i a_i ⋄ a_i = 0  ⇒  all a_i = 0 and b_i = 0 (by formal reality),

so Σ_⋄ ∩ (-Σ_⋄) = {0}, that is, Σ_⋄ is pointed. Conversely, suppose Σ_⋄ is pointed and Σ_i a_i ⋄ a_i = 0. Then each a_i ⋄ a_i lies in Σ_⋄ ∩ (-Σ_⋄) = {0}, and thus

    a_i ⋄ a_i = 0  ⇒  a_i = 0 (since there are no nilpotent elements),

that is, the algebra is formally real.

¹ The algebra (A, B, ⋄) is formally real if Σ_i a_i ⋄ a_i = 0 implies a_i = 0 for all i.
² An element a ≠ 0 is nilpotent if a ⋄ a = 0.
The dual of Σ_⋄ is Σ*_⋄ = { z ∈ B | ⟨a, z⟩ ≥ 0 ∀a ∈ Σ_⋄ }. Then:

Theorem 3 Σ_⋄ is a proper cone iff Σ*_⋄ is.

Define the Λ and Λ* operators as follows. Λ_⋄ : B → S_A, where S_A is the set of symmetric bilinear forms on A, is given by

    Λ_⋄(w)(a, b) := ⟨w, a ⋄ b⟩_B.

By commutativity of ⋄, Λ_⋄(w) is indeed symmetric: a^T Λ_⋄(w) b = b^T Λ_⋄(w) a for all a, b ∈ A and w ∈ B.
Theorem 4 w ∈ Σ*_⋄ iff Λ_⋄(w) ⪰ 0.

Proof:
(⇒) Suppose w ∈ Σ*_⋄. For every a ∈ A we have a ⋄ a ∈ Σ_⋄, and thus

    Λ_⋄(w)(a, a) = ⟨w, a ⋄ a⟩ ≥ 0,

so Λ_⋄(w) ⪰ 0.
(⇐) Suppose Λ_⋄(w) ⪰ 0, so that Λ_⋄(w)(a, a) ≥ 0 for all a ∈ A. Every x ∈ Σ_⋄ is of the form x = Σ_i a_i ⋄ a_i, and

    ⟨w, x⟩ = Σ_i ⟨w, a_i ⋄ a_i⟩ = Σ_i Λ_⋄(w)(a_i, a_i) ≥ 0,

so w ∈ Σ*_⋄.
Note that the adjoint Λ*_⋄ : S_A → B is defined by

    ⟨X, Λ_⋄(w)⟩_{S_A} = ⟨Λ*_⋄(X), w⟩_B  ∀w ∈ B and X ∈ S_A.
Theorem 5 u ∈ Σ_⋄ iff there exists Y ⪰ 0 such that u = Λ*_⋄(Y).

Proof:
(⇐) If Y ⪰ 0 and Λ*_⋄(Y) = u, then for all v ∈ Σ*_⋄,

    ⟨u, v⟩_B = ⟨Λ*_⋄(Y), v⟩_B = ⟨Y, Λ_⋄(v)⟩_{S_A} ≥ 0  (∵ Y ⪰ 0 and Λ_⋄(v) ⪰ 0 by Theorem 4),

and therefore u ∈ Σ**_⋄ = Σ_⋄.
(⇒) If u ∈ Σ_⋄, then there are a_i such that u = Σ_i a_i ⋄ a_i. Let v ∈ B. Then

    ⟨u, v⟩_B = Σ_i ⟨a_i ⋄ a_i, v⟩_B = Σ_i Λ_⋄(v)(a_i, a_i) = Σ_i ⟨Λ_⋄(v), a_i a_i^T⟩_{S_A} = ⟨Λ_⋄(v), Y⟩_{S_A} = ⟨v, Λ*_⋄(Y)⟩_B,

where Y = Σ_i a_i a_i^T ⪰ 0. Since this holds for every v ∈ B, u = Λ*_⋄(Y).
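For the univariate polynomial algebra, Theorem 5 can be checked by hand: Λ*(Y) takes anti-diagonal sums of Y, and when Y = Σ_i a_i a_i^T these sums are precisely the coefficients of Σ_i a_i(t)^2. A short sketch (our own illustration, not from the lecture):

    import numpy as np

    d = 2
    rng = np.random.default_rng(2)
    a_list = [rng.standard_normal(d + 1) for _ in range(3)]
    Y = sum(np.outer(a, a) for a in a_list)      # Gram matrix, PSD by construction

    u = np.zeros(2 * d + 1)                      # u = Lambda^*(Y): u_k = sum_{i+j=k} Y_ij
    for i in range(d + 1):
        for j in range(d + 1):
            u[i + j] += Y[i, j]

    # coefficients of sum_i a_i(t)^2, computed directly by convolution
    sos_coeffs = sum(np.convolve(a, a) for a in a_list)
    assert np.allclose(u, sos_coeffs)
    print("Lambda^*(Y) recovers the coefficients of the SOS polynomial")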
Example 3 For the algebra of univariate polynomials, the Λ operator can be computed as follows. If we represent Λ(w) as a matrix, then its (i, j) entry, by definition, is given by ⟨w, e_i ∗ e_j⟩. But e_i ∗ e_j = e_{i+j}, and thus Λ(w)_{ij} = w_{i+j}. This means that the (i, j) entry of Λ(w) in this case depends only on i + j, that is, all entries of Λ(w) along each reverse diagonal are equal, so Λ(w) is a Hankel matrix:

    Λ(w) = [ w_0      w_1      ...  w_n
             w_1      w_2      ...  w_{n+1}
             ...      ...      ...  ...
             w_n      w_{n+1}  ...  w_{2n} ]
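Combining Example 3 with Theorem 4: w lies in the dual cone Σ* exactly when its Hankel matrix is PSD. A small numerical sketch (ours; the moment vectors below are illustrative):

    import numpy as np

    def hankel(w):
        n = (len(w) - 1) // 2
        return np.array([[w[i + j] for j in range(n + 1)] for i in range(n + 1)])

    # Moments w_k = E[t^k] of the standard normal: 1, 0, 1, 0, 3. Such a moment
    # vector satisfies <w, p> = E[p(t)] >= 0 for every SOS p, so it is in Sigma*.
    w_moments = np.array([1.0, 0.0, 1.0, 0.0, 3.0])
    print(np.linalg.eigvalsh(hankel(w_moments)))   # all eigenvalues >= 0

    w_bad = np.array([1.0, 0.0, -1.0, 0.0, 3.0])   # "E[t^2]" < 0 is impossible
    print(np.linalg.eigvalsh(hankel(w_bad)))       # has a negative eigenvalue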
3.1 Squared functional systems
Of particular interest are algebras that are induced by functional spaces. Let F = {f_1, f_2, ..., f_m}, where each f_i : ∆ → R is a real-valued function, and let F = span(F) = { Σ_i α_i f_i } be the linear space spanned by the f_i, where (α_i f_i + α_j f_j)(x) = α_i f_i(x) + α_j f_j(x). Now define S = { f_i · f_j }, where (f_i · f_j)(x) = f_i(x) f_j(x), and let S = span(S). Then (F, S, ·) is an algebra, and Σ_F = { Σ_i g_i^2 : g_i ∈ F }. The algebra (F, S, ·) along with its SOS cone Σ_F is called a squared functional system. The univariate polynomial example given earlier is a special case, where F = {1, t, t^2, ..., t^d} and S = {1, t, t^2, ..., t^{2d}}.
3.2 The semidefinite and second order cones as SOS cones

For two p × q matrices A and B, define the Cracovian multiplication and its anti-commutator as follows:

    A ⋄ B = AB^T  and  A ⊙ B = (AB^T + BA^T)/2.

Then, for the algebra (R^{p×q}, R^{p×p}, ⊙), the SOS cone Σ is exactly the cone of positive semidefinite p × p matrices.
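A quick numerical sketch of this algebra (ours): sums of Cracovian squares A ⋄ A = A A^T are PSD, and conversely a Cholesky factorization writes any positive definite matrix as a single square:

    import numpy as np

    rng = np.random.default_rng(3)
    p, q = 3, 3
    S = sum(A @ A.T for A in (rng.standard_normal((p, q)) for _ in range(4)))
    print(np.linalg.eigvalsh(S))                 # all >= 0: S is in Sigma

    L = np.linalg.cholesky(S)                    # S = L L^T, one Cracovian square
    assert np.allclose(L @ L.T, S)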
4 Operations on algebras and their SOS cones

4.1 Bijective linear transformations

Let (A, B, ⋄) be an algebra which, as usual, satisfies assumptions 1, 2, and 3, and let C be another linear space such that the linear transformation F : B → C is bijective. (Note that this means that necessarily dim(B) = n = dim(C).) Define a new binary operation ◦ : A × A → C by L_◦ = F L_⋄, that is, a ◦ b = F(a ⋄ b). Then (A, C, ◦) is an algebra satisfying assumptions 1, 2, and 3. Furthermore, Σ_◦ = F(Σ_⋄).

Definition 6 Two cones K_1 and K_2 are linearly isomorphic if there is a bijective linear transformation F such that K_1 = F(K_2).

Thus, if Σ_1 is an SOS cone and Σ_2 is a cone linearly isomorphic to Σ_1, then Σ_2 is also an SOS cone for some algebra.
4.2 Isomorphism and linear isomorphism among algebras

Let (A_1, B_1, ⋄_1) and (A_2, B_2, ⋄_2) be two algebras, and let there be two linear transformations F and G such that

    F : A_1 → A_2,  G : B_1 → B_2.

If both F and G are bijective, and we have

    G(a ⋄_1 b) = F(a) ⋄_2 F(b),
we say that these algebras are isomorphic. Note that, as opposed to ordinary algebraic structures, we need two maps to define an isomorphism.

Lemma 7 If (A_1, B_1, ⋄_1) and (A_2, B_2, ⋄_2) are two algebras isomorphic to each other, then their SOS cones Σ_1 and Σ_2 are linearly isomorphic.

Proof: Let y ∈ Σ_2. Then

    y = Σ_i y_i ⋄_2 y_i = Σ_i F(x_i) ⋄_2 F(x_i)  (for some x_i ∈ A_1, since F is surjective)
      = Σ_i G(x_i ⋄_1 x_i)  (by the definition of isomorphism)
      = G(Σ_i x_i ⋄_1 x_i) ∈ G(Σ_1)  (by linearity).

The sequence of implications above goes through in both directions, establishing that Σ_2 = G(Σ_1). Since G is bijective, it is a linear isomorphism between Σ_1 and Σ_2.
4.3 Direct sums of algebras

For k algebras (A_1, B_1, ⋄_1), ..., (A_k, B_k, ⋄_k) define a new algebra

    (A_1 × ... × A_k, B_1 × ... × B_k, ⋄)

with

    (a_1, ..., a_k) ⋄ (b_1, ..., b_k) = (a_1 ⋄_1 b_1, ..., a_k ⋄_k b_k).

This new algebra is called the direct sum algebra. It is immediate that

    Σ_⋄ = Σ_1 × ... × Σ_k,

and the Λ operator is given by

    Λ_⋄(w_1, ..., w_k) = Λ_1(w_1) ⊕ ... ⊕ Λ_k(w_k).
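For instance (our own sketch, reusing the Hankel picture from Example 3), in a direct sum of two univariate polynomial algebras Λ_⋄ is block diagonal, so it is PSD exactly when each block is:

    import numpy as np
    from scipy.linalg import block_diag

    def hankel(w):
        n = (len(w) - 1) // 2
        return np.array([[w[i + j] for j in range(n + 1)] for i in range(n + 1)])

    w1 = np.array([1.0, 0.0, 1.0])               # PSD Hankel block
    w2 = np.array([2.0, 1.0, 1.0, 1.0, 2.0])     # PSD Hankel block
    Lam = block_diag(hankel(w1), hankel(w2))     # Lambda(w1, w2)
    print(np.linalg.eigvalsh(Lam))               # eigenvalues of both blocks together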
4.4 Minkowski sum of algebras

Consider the algebras (A_1, B, ⋄_1), ..., (A_k, B, ⋄_k), with possibly different A_i but all having the same B. The Minkowski sum algebra is the algebra (A_1 × ... × A_k, B, ⋄) with

    (a_1, ..., a_k) ⋄ (b_1, ..., b_k) = a_1 ⋄_1 b_1 + ... + a_k ⋄_k b_k.

Then we have

    Σ_⋄ = Σ_1 + ... + Σ_k  and  Λ_⋄(w) = Λ_1(w) ⊕ ... ⊕ Λ_k(w).
4.5 Weighted sum-of-squares
Combining Minkowski sums and linear transformations, we can show that a kind of weighted sum-of-squares (WSOS) is in fact also a sum-of-squares. This follows from the following observation: let (A_i, B_i, ⋄_i), for i = 1, ..., k, be algebras, and let F_i be linear transformations, each mapping B_i to a common space B. Then the cone

    F_1(Σ_1) + ... + F_k(Σ_k)

is also an SOS cone. Here is an example:

Example 4 Let P^d_{[0,∞)} = { p(t) | deg(p) ≤ d, p(t) ≥ 0 ∀t ≥ 0 }. Clearly this set is a convex cone. Let us show that it is in fact an SOS cone. We claim that

    P^d_{[0,∞)} = P^d + t P^{d-2}   if d is even,
    P^d_{[0,∞)} = P^{d-1} + t P^{d-1}   if d is odd.
Proof:
We have p(t) ≥ 0 ∀t ≥ 0 ⇔ p(t^2) ≥ 0 ∀t ∈ R. Thus p(t^2) = q^2(t) + r^2(t) for some polynomials q and r, since a nonnegative univariate polynomial is a sum of two squares. Separating the odd- and even-degree terms of q(t) and r(t), write q(t) = q_1(t^2) + t q_2(t^2) and r(t) = r_1(t^2) + t r_2(t^2). Then

    p(t^2) = (q_1(t^2) + t q_2(t^2))^2 + (r_1(t^2) + t r_2(t^2))^2
           = q_1^2(t^2) + t^2 q_2^2(t^2) + r_1^2(t^2) + t^2 r_2^2(t^2) + 2t [q_1(t^2) q_2(t^2) + r_1(t^2) r_2(t^2)],

where the bracketed term must vanish, since all terms of p(t^2) have even degree. Changing t^2 to t, we obtain

    p(t) = q_1^2(t) + r_1^2(t) + t (q_2^2(t) + r_2^2(t)),

and counting degrees gives the claimed decomposition.
Thus, we have shown that the cone P^d_{[0,∞)}[t] is a weighted sum of squares. However, note that the operation of multiplying by t is a bijective linear transformation mapping the space of degree-d polynomials to the space of degree-(d+1) polynomials with a zero constant term. Thus tP[t] is an SOS cone, and its Minkowski sum P[t] + tP[t] is also an SOS cone³.

³ More precisely, if d is even then P^d_{[0,∞)}[t] = P^d[t] + t P^{d-2}[t], and if d is odd then P^d_{[0,∞)}[t] = P^{d-1}[t] + t P^{d-1}[t].
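The even/odd splitting in the proof is mechanical on coefficient vectors. A small sketch (our own; the example polynomial and helper names are ours): given an SOS certificate g_1, ..., g_N for q(t) = p(t^2), slicing out even- and odd-indexed coefficients yields the weighted-SOS certificate for p:

    import numpy as np

    def split_certificate(gs):
        # g(t) = g1(t^2) + t*g2(t^2): even-indexed coefficients give g1,
        # odd-indexed coefficients give g2 (coefficients in ascending order)
        return [g[0::2] for g in gs], [g[1::2] for g in gs]

    def polyval(c, t):
        # evaluate sum_k c_k t^k
        return sum(ck * t**k for k, ck in enumerate(c))

    # p(t) = t(t-2)^2 = 4t - 4t^2 + t^3 is nonnegative on [0, inf), and
    # p(t^2) = t^2 (t^2 - 2)^2 = (t^3 - 2t)^2 is a single square.
    p = np.array([0.0, 4.0, -4.0, 1.0])
    g = np.array([0.0, -2.0, 0.0, 1.0])          # coefficients of t^3 - 2t
    q1s, q2s = split_certificate([g])

    for t in np.linspace(0.0, 3.0, 7):
        lhs = polyval(p, t)
        rhs = sum(polyval(q1, t)**2 for q1 in q1s) + t * sum(polyval(q2, t)**2 for q2 in q2s)
        assert np.isclose(lhs, rhs)
    print("p = sum q1_i^2 + t * sum q2_i^2 verified")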
4.6 Isomorphism by change of basis and change of variables

Our presentation of algebras has been basis-free, in that all arguments are made independent of any particular basis for A or B in the algebra (A, B, ⋄). Of course, in practice the multiplication operator L_⋄ is represented by a matrix, and this representation in turn depends on the particular basis chosen for A
and B. When we change the basis of A, it is tantamount to replacing L_⋄(x) with L_⋄(Fx), where F is the change-of-basis matrix. Similarly, changing the basis of B is the same as replacing L_⋄(x) with G L_⋄(x), where G is the change-of-basis matrix in B. Needless to say, the resulting algebras are all isomorphic to each other, and thus the resulting SOS cones are linearly isomorphic.
Polynomials, and in general squared functional systems, are functional linear spaces. As such, there are many different ways of choosing a basis for them. For instance, for polynomials, in the ordinary representation p_0 + p_1 t + ... + p_d t^d we are using the basis {1, t, t^2, ..., t^d}. However, there are many other bases: for instance {1, t+1, (t+1)^2, ..., (t+1)^d} is another basis, and the use of orthogonal polynomials (such as Chebychev, Legendre, Laguerre, etc.) gives yet other ways of representing polynomials. Clearly, changing the basis in which polynomials are represented does not affect the SOS cone, nor does it change the fact that it equals the cone of nonnegative polynomials in the univariate case.
The second observation concerns the effect of a change of variable, even a nonlinear one. In general, consider the set of polynomials of degree d which are nonnegative over a set ∆ ⊆ R. Let H : Ω → ∆ be an onto mapping from a set Ω to ∆. Note that Ω need not be a subset of R; it is entirely arbitrary. Then the cone

    P_Ω[H] = { f | f(x) = p_0 + p_1 H(x) + ... + p_d H^d(x) ≥ 0 ∀x ∈ Ω }

is a convex cone linearly isomorphic to P_∆[t]:

    f(x) ≥ 0 ∀x ∈ Ω ⇔ p_0 + p_1 H(x) + ... + p_d H^d(x) ≥ 0 ∀x ∈ Ω ⇔ p(t) ≥ 0 ∀t ∈ ∆.

This means that from polynomials we can construct other sets of SOS cones by functional composition, and possibly by change of basis. Two examples follow:
Example 5 Consider the set of polynomials which are nonnegative over a finite interval, say [0, 1]:

    P^d_{[0,1]} = { p(t) | deg(p) ≤ d, p(t) ≥ 0 ∀t ∈ [0, 1] }.

Setting H(t) = t/(1+t), we see that H maps [0, ∞) onto [0, 1). Then p(s) ≥ 0 for all s ∈ [0, 1] iff p(t/(1+t)) ≥ 0 for all t ∈ [0, ∞) (by continuity, nonnegativity on [0, 1) implies nonnegativity on [0, 1]). Expanding this function, and observing that multiplying it by (1+t)^d does not change its sign over [0, ∞), we see that

    p(t/(1+t)) = q(t)/(1+t)^d ≥ 0 ∀t ∈ [0, ∞) ⇔ q(t) ≥ 0 ∀t ∈ [0, ∞),

where q(t) = (1+t)^d p(t/(1+t)) is a polynomial of degree at most d. This implies that P_{[0,1]}[t] ≃ P_{[0,∞)}[t], and both are weighted SOS, and thus SOS cones.
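The substitution H(t) = t/(1+t) acts linearly on coefficient vectors. The following sketch (our own; the binomial-expansion helper is one possible implementation, not code from the lecture) computes q from p and checks the sign equivalence numerically:

    import numpy as np
    from math import comb

    def moebius(p):                       # p: ascending coefficients, degree d
        # q(t) = (1+t)^d p(t/(1+t)) = sum_k p_k t^k (1+t)^(d-k)
        d = len(p) - 1
        q = np.zeros(d + 1)
        for k, pk in enumerate(p):
            for m in range(d - k + 1):
                q[k + m] += pk * comb(d - k, m)
        return q

    p = np.array([0.0, 1.0, -1.0])        # p(s) = s(1 - s) >= 0 exactly on [0, 1]
    q = moebius(p)                        # q(t) = (1+t)^2 p(t/(1+t)) = t
    print(q)                              # [0., 1., 0.]

    ts = np.linspace(0.0, 50.0, 11)
    assert np.all(sum(qk * ts**k for k, qk in enumerate(q)) >= 0)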
Example 6 Cosine polynomials are functions of the form p_0 + p_1 cos(t) + ... + p_d cos(dt). We are interested in the cone

    P_cos[t] = { p | p(t) = p_0 + p_1 cos(t) + ... + p_d cos(dt) ≥ 0 ∀t ∈ R }.

Since cos takes values in [-1, 1], and since each cos(kt) is a polynomial of degree k in cos(t) (given by the Chebychev polynomials of the first kind), we conclude that

    P_cos[t] ≃ P_{[-1,1]}[t] ≃ P_{[0,1]}[t] ≃ P_{[0,∞)}[t].
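Since cos(kt) = T_k(cos t), converting a cosine polynomial to an ordinary polynomial in s = cos(t) is exactly a Chebyshev-to-power-basis change. A brief sketch using numpy's Chebyshev module (our own example coefficients):

    import numpy as np
    from numpy.polynomial import chebyshev, polynomial

    p = np.array([1.5, -1.0, 0.5])                # p(t) = 1.5 - cos t + 0.5 cos 2t
    c = chebyshev.cheb2poly(p)                    # power-series coefficients in s

    for t in np.linspace(-np.pi, np.pi, 9):
        lhs = p[0] + p[1] * np.cos(t) + p[2] * np.cos(2 * t)
        rhs = polynomial.polyval(np.cos(t), c)    # same polynomial at s = cos(t)
        assert np.isclose(lhs, rhs)
    print("cosine polynomial equals a polynomial in cos(t) on [-1, 1]")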