Download A NICE PROOF OF FARKAS LEMMA 1. Introduction Let - IME-USP

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Jordan normal form wikipedia , lookup

Hilbert space wikipedia , lookup

Cartesian tensor wikipedia , lookup

System of linear equations wikipedia , lookup

Congruence lattice problem wikipedia , lookup

Vector space wikipedia , lookup

Invariant convex cone wikipedia , lookup

Bra–ket notation wikipedia , lookup

Fundamental theorem of algebra wikipedia , lookup

Linear algebra wikipedia , lookup

Dual space wikipedia , lookup

Basis (linear algebra) wikipedia , lookup

Transcript
A NICE PROOF OF FARKAS LEMMA
DANIEL VICTOR TAUSK
Abstract. The goal of this short note is to present a nice proof of Farkas
Lemma which states that if C is the convex cone spanned by a finite set and
if x is not in C then there exists a linear functional α that is nonnegative on
C with α(x) < 0. This implies in particular that the convex cone C is closed.
1. Introduction
Let V be a real vector space. Recall that a subset C of V is said to be convex
if for all x, y ∈ C and all t ∈ [0, 1] we have (1 − t)x + ty ∈ C; a simple induction
Pk
argument shows that C is convex if and only if i=1 ti xi ∈ C, for every integer
Pk
k ≥ 1, all x1 , . . . , xk ∈ C and all t1 , . . . , tk ≥ 0 with i=1 ti = 1. The set C is said
to be a cone if for all x ∈ C, t ≥ 0 we have tx ∈ C. It is easy to show that C is a
Pk
convex cone if and only if i=1 ti xi ∈ C, for every integer k ≥ 1, all x1 , . . . , xk ∈ C
and all t1 , . . . , tk ≥ 0.
If {v1 , . . . , vk } is a finite nonempty subset of C then the set:
(1.1)
k
nX
o
ti vi : t1 , . . . , tk ≥ 0
i=1
is easily seen to be the smallest convex cone containing {v1 , . . . , vk }; we call it the
convex cone spanned by {v1 , . . . , vk }.
We have the following:
1.1. Theorem (Farkas lemma). Let V be a real vector space and {v1 , . . . , vk } a
finite nonempty subset of V . If x ∈ V is not on the convex cone (1.1) spanned by
{v1 , . . . , vk } then there exists a linear functional α ∈ V ∗ such that α(x) < 0 and
α(vi ) ≥ 0, for all i = 1, . . . , k.
Clearly, if a linear functional α is nonnegative on v1 , . . . , vk then α is nonnegative
on the entire convex cone (1.1); in particular, the condition of x not being in (1.1)
is also necessary for the existence of the linear functional α with the properties
stated in Theorem 1.1.
The following is a simple consequence of Theorem 1.1:
1.2. Corollary. The convex cone spanned by a finite subset of IRn is closed in IRn .
Proof. If C is the convex cone spanned by a finite subset of IRn and if x ∈ IRn
is not in C then there exists a linear functional α ∈ IRn ∗ with α(x) < 0 which is
nonnegative on C. Hence the open half-hyperplane y ∈ IRn : α(y) < 0 is an open
neighborhood of x contained in the complement of C.
Date: October 2005.
1
A NICE PROOF OF FARKAS LEMMA
2
If one doesn’t use Farkas Lemma, the thesis of Corollary 1.2 apparently has no
immediate proof for it, although it may seem to be a fairly intuitive result. The
following example shows that one should be careful with intuition in this matter.
1.3. Example. If C is an arbitrary compact convex subset of IRn then the cone:
(1.2)
tx : x ∈ C, t ≥ 0
is not necessarily closed in IRn . For instance, let C be the closed disc in IR2 of
radius 1 centered at the point (1, 0).In this case, the cone (1.2) is the union of the
open half-plane (x, y) ∈ IR2 : x > 0 with the origin. Hence it is not closed in IR2 .
1.4. Remark. Corollary 1.2 actually holds when IRn is replaced by an arbitrary
Hausdorff topological vector space V . Namely, the convex cone spanned by a finite
subset of V is contained in a finite-dimensional subspace S of V , which is complete
and thus closed in V . Moreover, S is linearly homeomorphic to IRn and thus
Corollary 1.2 implies that the convex cone is closed in S.
2. The proof of Farkas lemma
Let us prove Theorem 1.1. We start by observing that there is no loss of generality in assuming that the vector space V is finite-dimensional. Namely, let V0
be the subspace of V spanned by {v1 , . . . , vk , x}; assuming the thesis valid for
finite-dimensional spaces, we obtain a linear functional α ∈ V0∗ with α(x) < 0 and
α(vi ) ≥ 0, for all i = 1, . . . , k. The conclusion is thus obtained by considering an
arbitrary linear extension of α to V .
We also observe that we can assume that the vectors vi are nonzero, since we
can obviously delete the zero vectors from the list v1 , . . . , vk .
To prove Theorem 1.1 we need the following:
2.1. Lemma (supporting hyperplane). Let V be a finite-dimensional nonzero real
vector space, C be a convex subset of V and x ∈ V a point not in C. Then there
exists a nonzero linear functional α ∈ V ∗ such that α(y) ≥ α(x), for all y ∈ C.
We leave the proof of Lemma 2.1 to the appendix. Geometrically, the statement
of the lemma can be interpreted as
follows: if the point
x is not on the convex set
C then there exists a hyperplane y ∈ V : α(y) = c passing
through x such that
C is contained on the closed half-space y ∈ V : α(y) ≥ c .
The following is a simple consequence of Lemma 2.1:
2.2. Corollary. Let C be a convex cone on a finite-dimensional nonzero real vector
space V and let x ∈ V be a point not in C. Then there exists a nonzero linear
functional α ∈ V ∗ with α(x) ≤ 0 and α(y) ≥ 0, for all y ∈ C.
Proof. If C is empty the result is trivial, so assume C is nonempty; this implies
0 ∈ C. Let α ∈ V ∗ be as in the statement of Lemma 2.1; then 0 = α(0) ≥ α(x),
for 0 ∈ C. Given y ∈ C, we claim that α(y) ≥ 0; otherwise, α(y) < 0 and
t α(y) < α(x) for t > 0 sufficiently large. Then α(ty) < α(x) and since ty ∈ C, we
obtain a contradiction.
We can now begin the proof of Theorem 1.1. The proof is by induction on the
dimension of V . If V is one-dimensional then the nonzero vector v1 is a basis for V
and the hypothesis that x is not on the convex cone spanned by {v1} implies that
A NICE PROOF OF FARKAS LEMMA
3
x = tv1 with t < 0; we may then choose α ∈ V ∗ to be the linear functional with
α(v1 ) = 1.
Assume now that V is a real vector space with dim(V ) > 1 and assume the
theorem valid for vector spaces of dimension less than dim(V ). We will show that
the theorem holds for V . This will be done by induction on the number k of vectors
on the list v1 , . . . , vk . If k = 1, we consider two possibilities: if v1 and x are linearly
independent, there exists a linear functional α ∈ V ∗ with α(v1 ) = 1 and α(x) = −1
(since the set {v1 , x} can be extended to a basis of V ); otherwise, if v1 and x are
linearly dependent then x = tv1 with t < 0 and thus it suffices to choose a linear
functional α with α(v1 ) = 1.
Assume now that k > 1 and that the result holds when the convex cone is
spanned by less than k vectors. Let x ∈ V be a point outside the convex cone
spanned by {v1 , . . . , vk } and let us show that there exists a linear functional α ∈ V ∗
with α(x) < 0 and α(vi ) ≥ 0, for all i = 1, . . . , k. Corollary 2.2 gives us a nonzero
linear functional α ∈ V ∗ with α(x) ≤ 0 and α(vi ) ≥ 0, for all i = 1, . . . , k. If
α(x) < 0, we are done; thus, we only have to worry about the case where α(x) = 0.
If α(vi ) = 0 for all i = 1, . . . , k then x and v1 , . . . , vk are all in the kernel of
α, whose dimension is less than dim(V ). But we are assuming the result to hold
for spaces of dimension less than dim(V ), so that there exists a linear functional
β defined on the kernel of α with β(x) < 0 and β(vi ) ≥ 0, for i = 1, . . . , k. The
linear functional β has a linear extension to the entire space V , and thus the thesis
is obtained in this case.
Let us assume that α(vi ) > 0 for at least one index i; then the set:
vi : α(vi ) = 0
has cardinality less than k and thus there exists a linear functional β ∈ V ∗ with
β(x) < 0 and such that β(vi ) ≥ 0 for all those i with α(vi ) = 0 (we know nothing
about β(vi ) for those i with α(vi ) > 0). Given ε > 0, consider the linear functional
α̃ = α + εβ. We have α̃(x) = εβ(x) < 0 and α̃(vi ) = εβ(vi ) ≥ 0 for those i
with α(vi ) = 0. But by choosing ε > 0 sufficiently small, we can assure that
α̃(vi ) = α(vi ) + εβ(vi ) is positive for those i with α(vi ) > 0. Thus the linear
functional α̃ has the desired properties and the proof is complete.
Appendix A. The proof of the supporting hyperplane lemma
In this appendix we prove Lemma 2.1. We observe that this lemma is just a
consequence of a version of the Hahn–Banach’s Theorem stated in many Functional
Analysis books. We start by proving such version of Hahn–Banach’s Theorem.
A.1. Theorem (Hahn–Banach). Let V be a real vector space and let p : V → IR be
a nonnegative map satisfying the following two conditions:
(a) p(x + y) ≤ p(x) + p(y), for all x, y ∈ V (subadditivity);
(b) p(cx) = cp(x), for all x ∈ V and all c ≥ 0 (positive homogeneity).
Let α ∈ S ∗ be a linear functional defined in a subspace S of V . If α(x) ≤ p(x) for
all x ∈ S then α extends to a linear functional α̃ ∈ V ∗ such that α̃(x) ≤ p(x) for
all x ∈ V .
A.2. Remark. If V is a real vector space then a nonnegative map p : V → IR
satisfying the conditions p(x + y) ≤ p(x) + p(y) and p(cx) = cp(x), for all x, y ∈ V ,
c ∈ IR is usually called a semi-norm; if, in addition, p(x) > 0 for all x 6= 0, then p is
A NICE PROOF OF FARKAS LEMMA
4
called a norm. Note that conditions (a) and (b) on the statement of Theorem A.1
do not imply that p be a semi-norm, since condition (b) only requires p(cx) = cp(x)
for nonnegative c. A nonnegative map p : V → IR is a semi-norm if and only if
it satisfies (a), (b) and the additional condition p(−x) = p(x), for all x ∈ V . We
observe that if p is a semi-norm and α ∈ V ∗ then the inequality α(x) ≤ p(x) holds
for all x ∈ V if and only if the inequality |α(x)| ≤ p(x) does. Many books state
the Hahn-Banach theorem only for semi-norms, but the proof in this more general
context is exactly the same.
The proof of Theorem A.1 uses the following:
A.3. Lemma (simple extension). Under the hypothesis of Theorem A.1, given a
vector v ∈ V not in S and denoting by S 0 the subspace spanned by S and v, then α
extends to a linear functional α0 on S 0 such that α0 (x) ≤ p(x), for all x ∈ S 0 .
Proof. A linear extension α0 of α to S 0 is uniquely determined by its value on the
vector v; more explicitly, every vector of S 0 can be uniquely written in the form
x + tv, with x ∈ S, t ∈ IR and setting α0 (v) = c for some real number c we obtain:
α0 (x + tv) = α(x) + tc,
for all x ∈ S, t ∈ IR. We wish to determine a value for c ∈ IR that makes the
inequality:
(A.1)
α(x) + tc ≤ p(x + tv)
true, for all x ∈ S, t ∈ IR. For t = 0, inequality (A.1) holds. For t > 0, (A.1) is
equivalent to:
(A.2)
α xt + c ≤ p xt + v
and for t < 0, (A.1) is equivalent to:
(A.3)
α xt + c ≥ −p − xt − v .
Clearly (A.2) holds for all x ∈ S and all t > 0 if and only if α(y) + c ≤ p(y + v), for
all y ∈ S; this is equivalent to:
c ≤ p(y + v) − α(y),
for all y ∈ S. Similarly, (A.3) holds for all x ∈ S and all t < 0 if and only if:
c ≥ −p(z − v) + α(z),
for all z ∈ S. Thus, the proof will be concluded
if we are able to find
a number
c ∈ IR which is an upper bound for the set −p(z − v) + α(z) : z ∈ S and a lower
bound for the set p(y + v) − α(y) : y ∈ S . But this is possible if and only if the
inequality:
(A.4)
−p(z − v) + α(z) ≤ p(y + v) − α(y)
holds, for all y, z ∈ S. Inequality (A.4) is equivalent to:
(A.5)
α(z + y) ≤ p(y + v) + p(z − v),
and (A.5) follows from the subadditivity of p, as we show below:
α(z + y) ≤ p(z + y) = p (y + v) + (z − v) ≤ p(y + v) + p(z − v).
A NICE PROOF OF FARKAS LEMMA
5
Proof of Theorem A.1. If V is finite-dimensional (which is the case that actually
matters to us), one simply applies Lemma A.3 a finite number of times, extending α
successively until we get the desired linear functional α̃ on V . For the general case,
one uses Zorn’s lemma to obtain a maximal extension of α to a linear functional
e then Lemma A.3 implies that
α̃ : Se → IR satisfying α̃(x) ≤ p(x), for all x ∈ S;
Se = V , otherwise we would be able to find a proper extension of α̃.
A.4. Definition. A subset C of a real vector space V is called absorbent in V if
for all x ∈ V there exists ε > 0 such that tx ∈ C for 0 ≤ t ≤ ε.
Obviously absorbent sets must contain the origin.
A.5. Example. If C is a neighborhood of the origin in IRn then C is absorbent;
namely, for x ∈ IRn , the continuous map IR 3 t 7→ tx ∈ IRn sends t = 0 to the
origin and thus we must have tx ∈ C for |t| sufficiently small.
A.6. Lemma. Let V be a real vector space and let C be an absorbent convex subset
of V . Then there exist a nonnegative map p : V → IR satisfying conditions (a) and
(b) in the statement of Theorem A.1, such that:
(A.6)
x ∈ V : p(x) < 1 ⊂ C ⊂ x ∈ V : p(x) ≤ 1 .
Proof. For each x ∈ V , consider the set:
Ix = t > 0 :
x
t
∈C .
The fact that C is absorbent implies that Ix is nonempty; thus, it has a nonnegative
infimum. Set p(x) = inf Ix , for all x ∈ V . Observe that if x = 0 then Ix = ]0, +∞[
and thus p(0) = 0; this proves (b) for c = 0. If c > 0 then clearly t ∈ Icx if and
only if ct ∈ Ix , which show that Icx = cIx ; thus:
p(cx) = inf Icx = c inf Ix = cp(x).
This proves (b). Let us prove (a). Let x, y ∈ V be fixed, set t = p(x), s = p(y) and
choose ε > 0. We know that there exists t0 ∈ Ix , s0 ∈ Iy with t ≤ t0 < t + ε and
s ≤ s0 < s + ε. Thus tx0 ∈ C, sy0 ∈ C and the convexity of C implies that:
t0 x
s0
y
x+y
= 0
+ 0
0
0
0
0
t +s
t +s t
t + s0 s0
is also in C; thus t0 + s0 ∈ Ix+y and:
p(x + y) ≤ t0 + s0 < t + s + 2ε = p(x) + p(y) + 2ε.
Since ε > 0 is arbitrary, we obtain (a). Finally, let us prove (A.6). Given x ∈ V
with p(x) < 1 then there exists t ∈ Ix with p(x) ≤ t < 1. Thus xt ∈ C, 0 ∈ C
and the convexity of C implies that x = t xt + (1 − t)0 is in C. Moreover, given
x ∈ C then obviously 1 ∈ Ix , so that p(x) ≤ 1. This proves (A.6) and concludes
the proof.
A.7. Remark. The map p constructed in the proof of Lemma A.6 is not in general
a semi-norm, unless C is symmetric, i.e., if x ∈ C implies −x ∈ C.
A.8. Corollary. Lemma 2.1 holds if we add the hypothesis that the convex set C be
absorbent (in which case the hypothesis of finite-dimensionality for V is no longer
necessary).
A NICE PROOF OF FARKAS LEMMA
6
Proof. Choose p as in the statement of Lemma A.6. Since 0 ∈ C and x 6∈ C, we
have x 6= 0 and thus {x} is a basis for a one-dimensional subspace S of V . Let
α ∈ S ∗ be the linear functional such that α(x) = 1. Notice that p(x) ≥ 1 for,
otherwise, (A.6) would imply x ∈ C. Thus, for t ≥ 0 we have:
α(tx) = t ≤ tp(x) = p(tx);
obviously, for t < 0 we have α(tx) = t < 0 ≤ p(tx). We have thus shown that
α(y) ≤ p(y) for all y ∈ S, and now Theorem A.1 gives us a linear extension α̃ of
α to V such that α̃(y) ≤ p(y), for all y ∈ V . For all y ∈ C, we have p(y) ≤ 1 and
thus:
α̃(y) ≤ p(y) ≤ 1 = α̃(x).
Hence −α̃(y) ≥ −α̃(x), for all y ∈ C, so that −α̃ is the linear functional we are
looking for.
A.9. Lemma. If V is a finite-dimensional real vector space and C is a nonempty
convex subset of V then there exists
a point p ∈ C and a subspace S of V such that
the set C − p = x − p : x ∈ C is contained in S and is absorbent in S.
Proof. Choose an arbitrary point q ∈ C and set C 0 = C − q, so that C 0 is a
convex subset of V containing the origin. Now let {b1 , . . . , bk } be a maximal linearly independent subset of C 0 ; the existence of such subset follows easily from the
finite-dimensionality of V . The subspace S spanned by {b1 , . . . , bk } contains C 0 ;
otherwise, there would exist x ∈ C 0 \ S and thus {b1 , . . . , bk , x} would be a linearly
independent subset of C 0 properly containing {b1 , . . . , bk }. Let L : IRk → S be the
linear isomorphism that carries the canonical basis of IRk to {b1 , . . . , bk }. Then
L−1 (C 0 ) is a convex subset of IRk containing the origin and the canonical basis;
this implies that L−1 (C 0 ) also contains the set:
Pk
∆k = (t1 , . . . , tk ) ∈ IRk : i=1 ti ≤ 1, ti ≥ 0, i = 1, . . . , k .
1
1
1
, k+1
, . . . , k+1
∈ IRk is clearly an interior point of ∆k , so
The point u = k+1
that ∆k − u is a neighborhood of the origin in IRk . Thus L−1 (C 0 ) − u is also
a neighborhood of the origin in IRk and it is therefore an absorbent subset of IRk
(recall Example A.5). The image of L−1 (C 0 )−u under the isomorphism L : IRk → S
is equal to C 0 − L(u) and it is an absorbent subset of S. But
C 0 − L(u) = C − q + L(u)
and the conclusion is obtained by setting p = q + L(u).
Proof of Lemma 2.1. If C is empty, any nonzero linear functional α will do. Assume
that C is nonempty and choose p and S as in the statement of Lemma A.9. If the
point x−p is not in S then there exists a linear functional α ∈ V ∗ such that α|S = 0
and α(x − p) = −1. Then, for all y ∈ C we have y − p ∈ S and thus:
α(y) − α(p) = α(y − p) = 0 > −1 = α(x − p) = α(x) − α(p),
which implies α(y) > α(x), for all y ∈ C. On the other hand, if x − p is in S then,
since C − p is convex and absorbent in S, Corollary A.8 gives us a nonzero linear
functional α on S with α(z) ≥ α(x − p), for all z ∈ C − p; this implies:
α(y) − α(p) = α(y − p) ≥ α(x − p) = α(x) − α(p),
and thus α(y) ≥ α(x), for all y ∈ C. The conclusion is obtained by considering an
arbitrary linear extension of α to V .
A NICE PROOF OF FARKAS LEMMA
Departamento de Matemática,
Universidade de São Paulo, Brazil
E-mail address: [email protected]
URL: http://www.ime.usp.br/~tausk
7