ST 810, Advanced computing
Eric B. Laber & Hua Zhou
Department of Statistics, North Carolina State University
February 4, 2013

"Now all that's left to do is minimize over $\theta$." -- Famous last words.

Convex optimization problems

- A general optimization problem over $\mathbb{R}^p$:
  \[ \min_{x \in \mathbb{R}^p} f(x) \quad \text{s.t.} \quad x \in S. \]
- If the optimization problem can be written so that $f(x)$ is a convex function and $S$ is a convex set, then the problem is said to be convex.
- Roadmap:
  - Convex sets and special cases (today)
  - Convex functions and special cases (next time)
  - Special cases of convex optimization problems (two weeks)

Convex sets

Definition. A set $C \subseteq \mathbb{R}^p$ is said to be affine if any linear combination of points in $C$ belongs to $C$ provided the coefficients sum to one. I.e., if $x_1, \ldots, x_k \in C$ and $\theta_1, \ldots, \theta_k \in \mathbb{R}$ satisfy $\sum_{i=1}^k \theta_i = 1$, then
\[ \sum_{i=1}^k \theta_i x_i \in C. \]

Line segment and affine set

- The line segment between $x_1$ and $x_2$ is $\{\theta x_1 + (1-\theta) x_2 : \theta \in [0,1]\}$.
- The line through $x_1$ and $x_2$ is $\{\theta x_1 + (1-\theta) x_2 : \theta \in \mathbb{R}\}$.

[Figure: the points $x_1$ and $x_2$.]

Affine sets

Claim. If $C$ is an affine set and $x_0 \in C$, then $V \triangleq C - x_0$ is a subspace.

Proof. Let $\nu_1, \nu_2 \in V$ and $\alpha, \beta \in \mathbb{R}$. We want to show that $\alpha\nu_1 + \beta\nu_2 \in V$, which is equivalent to showing
\[ \alpha\nu_1 + \beta\nu_2 + x_0 \in C. \]
The left-hand side of the above display is equal to
\[ \alpha(\nu_1 + x_0) + \beta(\nu_2 + x_0) + (1 - \alpha - \beta)x_0, \]
which is an affine combination of the points $\nu_1 + x_0$, $\nu_2 + x_0$, and $x_0$, all of which belong to $C$. Since $C$ is closed under affine combinations, the result is proved.

Affine sets cont'd

- Examples of affine sets:
  - The solution set of linear equalities $\{x : Ax = b\}$ is affine. How would we show this?
  - Any subspace is affine.

Definition. The affine hull of a set $C \subseteq \mathbb{R}^p$ is the smallest affine set that contains $C$. It is given by
\[ \operatorname{aff} C \triangleq \left\{ \sum_{i=1}^k \theta_i x_i : x_i \in C,\ \theta_i \in \mathbb{R},\ i = 1, \ldots, k,\ \sum_{i=1}^k \theta_i = 1 \right\}. \]

What is the affine hull of $C = \{x \in \mathbb{R}^2 : \|x\|_2^2 = 1\}$?

Convex sets

Definition. A set $C \subseteq \mathbb{R}^p$ is said to be convex if the line segment between any two points in $C$ is also in $C$.
I.e., if $x_1, x_2 \in C$ then $\{\theta x_1 + (1-\theta) x_2 : \theta \in [0,1]\} \subseteq C$.

Convex or not?

[Figure: sets to classify as convex or not.]

Convex combinations

Definition. Let $\theta_1, \ldots, \theta_k \in \mathbb{R}_+$ satisfy $\sum_{i=1}^k \theta_i = 1$. Then $\sum_{i=1}^k \theta_i x_i$ is called a convex combination of $x_1, \ldots, x_k$.

Definition. The convex hull of a set $C$ is the smallest convex set that contains $C$. The convex hull is given by
\[ \operatorname{conv} C \triangleq \left\{ \sum_{i=1}^k \theta_i x_i : \theta_i \in \mathbb{R}_+,\ x_i \in C,\ i = 1, \ldots, k,\ \sum_{i=1}^k \theta_i = 1 \right\}. \]

Convex hull

[Figures: two convex hull examples.]

Infinite convex combinations

- Let $\theta_1, \theta_2, \ldots \in \mathbb{R}_+$ with $\sum_{i=1}^\infty \theta_i = 1$ and $x_1, x_2, \ldots \in C$. Then $\sum_{i=1}^\infty \theta_i x_i$ is called a convex combination of $x_1, x_2, \ldots$ provided the series converges.
- If $f(x)$ is a density over $C$, then
  \[ \int_C x f(x)\, dx \]
  is called a convex combination of points in $C$ provided the integral is finite.
- Suppose $X$ is a random variable with support $C \subseteq \mathbb{R}^p$ and finite expectation. When is $EX \in C$?

Cones

Definition. A set $C \subseteq \mathbb{R}^p$ is said to be a cone if for each $x \in C$ the set $\{\theta x : \theta \ge 0\}$ is contained in $C$. A cone that is also convex is called a convex cone.

Cones

[Figure: a convex cone and a nonconvex cone, each with apex at $(0, 0)$.]

Cones cont'd

Claim. Let $C$ be a convex cone. If $x_1, x_2 \in C$ and $\theta_1, \theta_2 \in \mathbb{R}_+$, then $\theta_1 x_1 + \theta_2 x_2 \in C$.

Proof. If $\theta_1 = \theta_2 = 0$ then we're done because $0 \in C$. Otherwise note that
\[ \theta_1 x_1 + \theta_2 x_2 = (\theta_1 + \theta_2) \left[ \frac{\theta_1}{\theta_1 + \theta_2} x_1 + \frac{\theta_2}{\theta_1 + \theta_2} x_2 \right]. \]
The term inside the square brackets on the right-hand side of the above display is a convex combination of elements in $C$ and thus belongs to $C$ (since $C$ is convex). Because $C$ is a cone it is closed under nonnegative scaling, so the right-hand side belongs to $C$.

Conic combinations

Definition. Let $\theta_1, \ldots, \theta_k \in \mathbb{R}_+$. Then $\sum_{i=1}^k \theta_i x_i$ is called a conic combination of the points $x_1, \ldots, x_k \in \mathbb{R}^p$.

How should we define the conic hull of a set $C$?

Examples of convex sets

- Any singleton is affine and thus convex.
- $\mathbb{R}^p$ is convex.
- Any line is convex. If the line passes through the origin, it is a subspace and a convex cone.
- A ray $\{x_0 + \theta x : \theta \in \mathbb{R}_+\}$ is convex but not affine. A ray is a convex cone if it starts at the origin ($x_0 = 0$).
- Any subspace is affine and a convex cone.

Working with definitions

Claim. If $U \subseteq \mathbb{R}^p$ is a subspace, then it is affine and a convex cone.

Proof. In class...

Hyperplanes and half-spaces

Definition. A hyperplane is a set of the form $\{x : a^\top x = b\}$ for some nonzero $a \in \mathbb{R}^p$ and $b \in \mathbb{R}$. Equivalently, we can express the foregoing hyperplane as $a^\perp + x_0$, where $x_0$ is any point in the hyperplane and $a^\perp \triangleq \{x : a^\top x = 0\}$.

- A hyperplane partitions the space $\mathbb{R}^p$ into two half-spaces, $\{x : a^\top x \le b\}$ and $\{x : a^\top x > b\}$.

Quick review: norms

- A norm $\|\cdot\|$ on $\mathbb{R}^p$ is a map from $\mathbb{R}^p$ into $\mathbb{R}_+$ which satisfies:
  1. $\|x\| \ge 0$ and $\|x\| = 0 \Leftrightarrow x = 0$,
  2. $\|\theta x\| = |\theta|\, \|x\|$,
  3. $\|x + y\| \le \|x\| + \|y\|$
  for all $x, y \in \mathbb{R}^p$ and $\theta \in \mathbb{R}$.
- A norm ball of radius $r > 0$ centered at $x_c$ is defined as
  \[ B(x_c, r, \|\cdot\|) \triangleq \{x : \|x - x_c\| \le r\}. \]
  It is standard to suppress the norm and write $B(x_c, r)$.

Norm balls

Claim. Let $\|\cdot\|$ be any norm on $\mathbb{R}^p$. Then for any $r > 0$ and $x_c \in \mathbb{R}^p$ the norm ball $B(x_c, r)$ is convex.

Proof. Let $\nu_1, \nu_2 \in B(x_c, r)$ and $\theta \in [0, 1]$. We need to show $\theta\nu_1 + (1-\theta)\nu_2 \in B(x_c, r)$. Note that we can write $\nu_1 = x_c + u_1$, $\nu_2 = x_c + u_2$ with $u_1, u_2 \in B(0, r)$. Then:
\[ \|\theta\nu_1 + (1-\theta)\nu_2 - x_c\| = \|\theta u_1 + (1-\theta) u_2\| \le \theta\|u_1\| + (1-\theta)\|u_2\| \le \theta r + (1-\theta) r = r. \]
Thus, $\theta\nu_1 + (1-\theta)\nu_2 \in B(x_c, r)$.

Norm balls

[Figures: norm balls plotted in the $(x_1, x_2)$ plane.]

Norm cones

Definition. The norm cone associated with a norm $\|\cdot\|$ on $\mathbb{R}^p$ is defined as $\{(x, t) : \|x\| \le t\} \subseteq \mathbb{R}^{p+1}$.

Claim. The norm cone $C = \{(x, t) : \|x\| \le t\} \subseteq \mathbb{R}^{p+1}$ is a convex cone.

Proof. Let $\nu_1, \nu_2 \in C$ and $\theta \in [0, 1]$. We want to show $\theta\nu_1 + (1-\theta)\nu_2 = (x, t) \in C$. Note that $\nu_i = (x_i, t_i)$ with $\|x_i\| \le t_i$ for $i = 1, 2$. Thus
\[ \|x\| = \|\theta x_1 + (1-\theta) x_2\| \le \theta\|x_1\| + (1-\theta)\|x_2\| \le \theta t_1 + (1-\theta) t_2 = t. \]
Thus $\|x\| \le t$ and $(x, t) \in C$, so $C$ is convex. It is a cone because for any $(x, t) \in C$ and $\theta \ge 0$ we have $\|\theta x\| = \theta\|x\| \le \theta t$.

Second-order cone

- When $\|\cdot\|$ is the Euclidean norm, the norm cone is called the second-order cone.
- In two dimensions (so $(x, t) \in \mathbb{R}^3$) the second-order cone is called the Lorentz cone or ice-cream cone.
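The norm-cone claim above can be spot-checked numerically. The following is a minimal sketch, not part of the original slides: it samples points of the second-order cone and confirms that random conic combinations stay inside it. All function names, the sampling scheme, and the tolerance are assumptions made for illustration only.

```python
import math
import random


def euclidean_norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x))


def in_second_order_cone(x, t, tol=1e-12):
    """True if (x, t) satisfies ||x||_2 <= t, up to a small tolerance."""
    return euclidean_norm(x) <= t + tol


def random_cone_point(p, rng):
    """Sample (x, t) with ||x||_2 <= t (hypothetical sampling scheme)."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    t = euclidean_norm(x) + rng.uniform(0.0, 1.0)
    return x, t


def conic_combination(theta1, point1, theta2, point2):
    """Return theta1 * (x1, t1) + theta2 * (x2, t2) for theta1, theta2 >= 0."""
    (x1, t1), (x2, t2) = point1, point2
    x = [theta1 * a + theta2 * b for a, b in zip(x1, x2)]
    t = theta1 * t1 + theta2 * t2
    return x, t


def check_closed_under_conic_combinations(p=2, trials=1000, seed=0):
    """Return False as soon as a conic combination leaves the cone."""
    rng = random.Random(seed)
    for _ in range(trials):
        v1 = random_cone_point(p, rng)
        v2 = random_cone_point(p, rng)
        theta1 = rng.uniform(0.0, 5.0)
        theta2 = rng.uniform(0.0, 5.0)
        x, t = conic_combination(theta1, v1, theta2, v2)
        if not in_second_order_cone(x, t, tol=1e-9):
            return False
    return True
```

A passing check is only evidence, not a proof; the triangle-inequality argument in the proof above is what establishes the claim for every norm.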
Positive semidefinite cone

- Let $S^p$ denote the space of symmetric $p \times p$ real matrices and $S^p_+$ the set of symmetric nonnegative definite matrices.
- In class: show $S^p_+$ is a convex cone.

Polyhedra

Definition. A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:
\[ P = \left\{ x : a_l^\top x \le b_l,\ l = 1, \ldots, m,\ \ c_l^\top x = d_l,\ l = 1, \ldots, r \right\}. \]

Claim. The intersection of convex sets is convex.

Proof. In class...

Corollary. A polyhedron is convex.

Ex. polyhedron

- Succinctly write a polyhedron as $\{x : Ax \preceq b,\ Cx = d\}$.
- Ex. the positive orthant $\{x : x_j \ge 0,\ j = 1, \ldots, p\}$ is a polyhedron.

Simplexes

Definition. The $k + 1$ points $x_0, x_1, \ldots, x_k \in \mathbb{R}^p$ are said to be affinely independent if $x_1 - x_0, \ldots, x_k - x_0$ are linearly independent.

Definition. Let $x_0, x_1, \ldots, x_k$ be affinely independent points in $\mathbb{R}^p$. The simplex determined by $x_0, x_1, \ldots, x_k$ is
\[ \operatorname{conv}\{x_0, x_1, \ldots, x_k\} = \left\{ \sum_{i=0}^k \theta_i x_i : \theta \succeq 0,\ \sum_{i=0}^k \theta_i = 1 \right\}. \]

- Simplexes are polyhedra.

Simplex examples

- In class: what does a simplex look like in 1D? What about 2D?
- In 3D the simplex is a tetrahedron.
- The unit simplex is the $p$-dimensional simplex determined by the zero vector and the unit vectors, $0, e_1, \ldots, e_p \in \mathbb{R}^p$:
  \[ \left\{ x : x \succeq 0,\ \sum_{j=1}^p x_j \le 1 \right\}. \]
- The probability simplex is generated by $e_1, \ldots, e_p \in \mathbb{R}^p$:
  \[ \left\{ x : x \succeq 0,\ \sum_{j=1}^p x_j = 1 \right\}. \]

Converting to standard form

- Ex. How to convert the probability simplex to standard form?
  - Use basis elements $e_1, \ldots, e_p \in \mathbb{R}^p$; these are affinely independent.
  - The convex hull of $e_1, \ldots, e_p$ is
    \[ S = \left\{ x : x = \sum_{j=1}^p \theta_j e_j,\ \theta \succeq 0,\ \sum_{j=1}^p \theta_j = 1 \right\}. \]
    Note that this corresponds to all discrete distributions over $p$ elements. The $i$th component of an element $x \in S$ is the probability of observing the $i$th element.

Writing a simplex as a polyhedron

- Let $x_0, x_1, \ldots, x_k \in \mathbb{R}^p$ be affinely independent and let $S$ denote the generated simplex
  \[ S \triangleq \left\{ \sum_{i=0}^k \theta_i x_i : \theta \succeq 0,\ \sum_{i=0}^k \theta_i = 1 \right\} = \left\{ x_0 + \sum_{i=1}^k \theta_i (x_i - x_0) : \theta \succeq 0,\ \sum_{i=1}^k \theta_i \le 1 \right\}. \]
- Define $B = (x_1 - x_0, \ldots, x_k - x_0) \in \mathbb{R}^{p \times k}$. Then $\operatorname{Rank}(B) = k$ by affine independence, and there exists a nonsingular $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} \in \mathbb{R}^{p \times p}$ so that
  \[ AB = \begin{bmatrix} A_1 B \\ A_2 B \end{bmatrix} = \begin{bmatrix} I \\ 0 \end{bmatrix}. \]

Writing a simplex as a polyhedron, cont'd

- For any $x \in S$ we have $x = x_0 + By$ with $y \succeq 0$ and $\sum_{i=1}^k y_i \le 1$; thus
  \[ Ax = Ax_0 + ABy, \]
  and matching the block equations above gives
  \[ A_1 x = A_1 x_0 + y, \quad y \succeq 0, \quad \sum_{i=1}^k y_i \le 1, \quad A_2 x = A_2 x_0. \]
  Equivalently,
  \[ A_1 x \succeq A_1 x_0, \quad A_2 x = A_2 x_0, \quad \mathbf{1}^\top (A_1 x - A_1 x_0) \le 1. \]

Building new sets from old

- Strategy for building classes of objects with a given property (e.g., convexity, finite VC-dimension, measurability, etc.):
  1. Find special cases.
  2. Find property-preserving operations to transform and combine special cases.
  3. Build up large classes of objects by transforming and combining simple cases.

Operations that preserve convexity

- Arbitrary intersection: if $S_\alpha$ is convex for each $\alpha \in \mathcal{A}$, then $\bigcap_{\alpha \in \mathcal{A}} S_\alpha$ is convex.
  - Ex. the positive semidefinite cone $S^p_+$ can be written as
    \[ \bigcap_{z \ne 0} \left\{ x \in S^p : z^\top x z \ge 0 \right\}. \]
    Since $x \mapsto z^\top x z$ is linear in $x$, $S^p_+$ is the intersection of half-spaces and hence convex.
  - We will prove (time permitting) the following converse: if $S$ is a closed convex set, then
    \[ S = \bigcap \{ H : H \text{ a half-space},\ S \subseteq H \}. \]

Operations that preserve convexity cont'd

- Affine functions: if $S$ is convex and $f : \mathbb{R}^p \to \mathbb{R}^m$ is affine (i.e., $f(x) = Ax + b$), then $f(S) \triangleq \{f(x) : x \in S\}$ is convex.
  - Ex. Scaling and/or translating.
  - Ex. Projection of a convex set onto some of its coordinates. If $S \subseteq \mathbb{R}^p \times \mathbb{R}^m$, then writing $x = (x_1, x_2)$ with $x_1 \in \mathbb{R}^p$, $x_2 \in \mathbb{R}^m$, the map $f(x) = x_1$ is affine.
  - Ex. Sum of two convex sets. Let $S_1, S_2 \subseteq \mathbb{R}^p$ be convex; then $S_1 + S_2$ is convex. To see this, note that $S_1 \times S_2$ is convex, the map $f(x_1, x_2) = x_1 + x_2$ is affine, and $S_1 + S_2 = f(S_1 \times S_2)$.
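The reason affine maps preserve convexity is the identity $f(\theta x + (1-\theta) y) = \theta f(x) + (1-\theta) f(y)$: the image of a line segment is the line segment between the images. The sketch below is an illustrative check of that identity, not part of the original slides. It uses the sum map $f(x_1, x_2) = x_1 + x_2$ on $\mathbb{R}^2 \times \mathbb{R}^2$ as the example; the helper names and the matrix `A_SUM` are assumptions introduced here.

```python
import random


def affine_map(A, b, x):
    """Return f(x) = A x + b, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def convex_combination(theta, x, y):
    """Return theta * x + (1 - theta) * y for theta in [0, 1]."""
    return [theta * xi + (1.0 - theta) * yi for xi, yi in zip(x, y)]


def max_gap(A, b, trials=1000, seed=0):
    """Largest observed gap between f(theta x + (1 - theta) y) and
    theta f(x) + (1 - theta) f(y) over random x, y, theta."""
    rng = random.Random(seed)
    p = len(A[0])
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        theta = rng.random()
        lhs = affine_map(A, b, convex_combination(theta, x, y))
        rhs = convex_combination(theta, affine_map(A, b, x), affine_map(A, b, y))
        worst = max(worst, max(abs(l - r) for l, r in zip(lhs, rhs)))
    return worst


# Hypothetical example: the map (x1, x2) -> x1 + x2 on R^2 x R^2,
# written as a 2 x 4 matrix acting on the stacked vector (x1, x2).
A_SUM = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]
B_SUM = [0.0, 0.0]
```

Calling `max_gap(A_SUM, B_SUM)` should return a value at the level of floating-point rounding, which is consistent with the argument that $S_1 + S_2$ inherits convexity from $S_1 \times S_2$.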
Operations that preserve convexity cont'd

- Inverse images of affine functions: if $S$ is convex and $g : \mathbb{R}^m \to \mathbb{R}^p$ is affine, then $g^{-1}(S) \triangleq \{x : g(x) \in S\}$ is convex.
  - Ex. Linear matrix inequality. Let $A_1, \ldots, A_p, B \in S^p$, and define $A(x) = \sum_{j=1}^p x_j A_j$. Then the set $\{x : A(x) \prec B\}$ is convex. Why?

Discrete probability distn examples

- Let $X$ be a real-valued discrete random variable with $P(X = a_i) = p_i$, $i = 1, \ldots, n$, with $a_1 < a_2 < \cdots < a_n$. Then $p$ belongs to the probability simplex
  \[ \mathcal{P} = \left\{ p : p \succeq 0,\ \sum_{i=1}^n p_i = 1 \right\}. \]
- Which of the following define convex subsets of $\mathcal{P}$? [1]
  1. $\alpha \le E f(X) \le \beta$ for a known function $f$
  2. $P(X > \alpha) \le \beta$
  3. $E|X|^3 \le \alpha E|X|$
  4. $\operatorname{Var}(X) \le \alpha$
  5. $q_\tau(X) \ge \alpha$, where $q_\tau(X) \triangleq \inf\{\beta : P(X \le \beta) \ge \tau\}$.

[1] Adapted from problem 2.15 in Boyd and Vandenberghe.

Discrete probability distn examples: parts 1-4

Ans 1: $E f(X) = \sum_{i=1}^n p_i f(a_i)$ is linear in $p$; hence the constraint is a pair of linear inequalities and thus convex.

Ans 2: $P(X > \alpha) = \sum_{i : a_i > \alpha} p_i$; hence the constraint is linear and thus convex.

Ans 3: $E|X|^3 - \alpha E|X| = \sum_{i=1}^n p_i (|a_i|^3 - \alpha |a_i|)$; hence the constraint is linear and thus convex.

Ans 4: $\operatorname{Var}(X) = \sum_{i=1}^n p_i a_i^2 - \left( \sum_{i=1}^n p_i a_i \right)^2$, which we can write as
\[ \sum_{i=1}^n p_i a_i^2 - \sum_{i,j} p_i p_j a_i a_j. \]
If $n = 2$, $a_1 = 1$, $a_2 = 0$, then $\operatorname{Var}(X) \le \alpha$ becomes $p_1 - p_1^2 \le \alpha$, which is not convex for $0 < \alpha < 1/4$: both $p_1 = 0$ and $p_1 = 1$ satisfy it, but their midpoint $p_1 = 1/2$ gives $1/4 > \alpha$.

Discrete probability distn example: part 5

Ans 5: Define $k^* \triangleq \max\{i : a_i < \alpha\}$. Then $q_\tau(X) \ge \alpha$ is equivalent to $\sum_{i=1}^{k^*} p_i < \tau$, a strict linear inequality in $p$. Thus, the constraint $q_\tau(X) \ge \alpha$ is convex.

Linear fractional functions

- Linear fractional functions are more general than affine functions yet still preserve convexity.
- A linear fractional function $f : \mathbb{R}^p \to \mathbb{R}^m$ has the form
  \[ f(x) = \frac{Ax + b}{c^\top x + d}, \]
  with domain $\{x : c^\top x + d > 0\}$.
- If $C$ is convex and $C \subseteq \operatorname{dom} f$, then $f(C)$ is convex.
- Ex. Suppose $X$ and $Y$ are discrete r.v.s taking values in $\{1, \ldots, n\}$ and $\{1, \ldots, m\}$ respectively, and let $p_{ij} \triangleq P(X = i, Y = j)$. Then
  \[ P(X = i \mid Y = j) = \frac{p_{ij}}{\sum_{k=1}^n p_{kj}} \]
  is a linear fractional function of $p$.
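One way to see why linear fractional functions preserve convexity: for $x, y$ in the domain and $z = \theta x + (1-\theta) y$, we have $f(z) = \mu f(x) + (1-\mu) f(y)$ with $\mu = \theta (c^\top x + d) / (c^\top z + d) \in [0, 1]$, so segments map to segments. The sketch below, which is illustrative and not part of the original slides, evaluates a linear fractional function, computes that weight $\mu$, and writes the conditional distribution of $X$ given $Y = j$ as an instance. It assumes the joint table is stored as `joint[i][j]` for $P(X = i, Y = j)$ with 0-based indices; all names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_fractional(A, b, c, d, x):
    """Evaluate f(x) = (A x + b) / (c^T x + d) on {x : c^T x + d > 0}."""
    denom = dot(c, x) + d
    if denom <= 0:
        raise ValueError("x is outside the domain {x : c^T x + d > 0}")
    return [(dot(row, x) + b_i) / denom for row, b_i in zip(A, b)]


def segment_weight(c, d, x, y, theta):
    """Weight mu with f(theta x + (1 - theta) y) = mu f(x) + (1 - mu) f(y)."""
    wx = theta * (dot(c, x) + d)
    wy = (1.0 - theta) * (dot(c, y) + d)
    return wx / (wx + wy)


def conditional_as_linear_fractional(joint, j):
    """P(X = i | Y = j) for all i, as a linear fractional map of the joint.

    The joint table is flattened row-major into a vector p. Row i of A
    selects p_ij, c sums column j, and b = 0, d = 0.
    """
    n, m = len(joint), len(joint[0])
    p = [joint[r][k] for r in range(n) for k in range(m)]
    A = [[1.0 if (r == i and k == j) else 0.0
          for r in range(n) for k in range(m)]
         for i in range(n)]
    c = [1.0 if k == j else 0.0 for r in range(n) for k in range(m)]
    return linear_fractional(A, [0.0] * n, c, 0.0, p)
```

For instance, with the hypothetical joint table `[[0.1, 0.2], [0.3, 0.4]]`, conditioning on the first column divides `0.1` and `0.3` by their sum `0.4`. The domain check raises an error when the conditioning event has probability zero, which matches the requirement $c^\top x + d > 0$.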