Smoothed Fischer-Burmeister Equation Methods for the
Complementarity Problem¹
Houyuan Jiang
CSIRO Mathematical and Information Sciences
GPO Box 664, Canberra, ACT 2601, Australia
Email: [email protected]
Abstract: By introducing another variable and an additional equation, we describe a technique
to reformulate the nonlinear complementarity problem as a square system of equations. Some
useful properties of this new reformulation are explored. These properties show that the new
reformulation compares favourably with pure nonsmooth equation reformulations and with
smoothing reformulations, because it combines advantages of both nonsmooth equation
based methods and smoothing methods. A damped generalized Newton method is proposed for
solving the reformulated equation. Global and local superlinear convergence can be established
under mild assumptions. Numerical results are reported for a set of standard test
problems from the library MCPLIB.
AMS (MOS) Subject Classifications. 90C33, 65K10, 49M15.
Key Words. Nonlinear complementarity problem, Fischer-Burmeister functional,
semismooth equation, Newton method, global convergence, superlinear convergence.
1 Introduction
We are concerned with the solution of the nonlinear complementarity problem (NCP)
[35]. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be continuously differentiable. Then the NCP is to find a vector
$x \in \mathbb{R}^n$ such that
$$x \ge 0, \quad F(x) \ge 0, \quad F(x)^T x = 0. \eqno(1)$$
Reformulating the NCP as a constrained or unconstrained smooth optimization
problem, or as a constrained or unconstrained system of smooth or nonsmooth equations, has been a popular strategy in the last decade. Based on these reformulations,
many algorithms such as merit function methods, smooth or nonsmooth equation methods, smoothing methods, and interior point methods have been proposed. In almost
all these methods, one usually tries to apply techniques from traditional nonlinear programming or from systems of smooth equations to the reformulated problem.
Different descent methods have been developed for the NCP by solving the system
of nonsmooth equations obtained by means of the Fischer-Burmeister functional
[18]; see for example [10, 16, 17, 19, 25, 26, 27, 37, 42, 45]. In particular, global
convergence of the damped generalized Newton method and the damped modified
Gauss-Newton method for the Fischer-Burmeister functional reformulation of the NCP
has been established in [25].
¹ This work was carried out initially at The University of Melbourne and was supported by the
Australian Research Council.
A number of researchers have proposed and studied different smoothing methods.
We refer the reader to [1, 2, 3, 4, 5, 6, 7, 14, 15, 20, 21, 23, 29, 30, 31, 32, 33, 41, 43, 44]
and the references therein. The main feature of smoothing methods is to reformulate
the NCP as a system of nonsmooth equations, and then to approximate this system
by a sequence of systems of smooth equations obtained by introducing one or more parameters. Newton-type methods are applied to these smooth equations. Under certain
assumptions, the solutions of the smooth systems converge to a solution of the NCP
as these parameters are controlled appropriately. It seems that a great deal of effort is
usually needed to establish global convergence of smoothing methods. The introduction of parameters results in underdetermined systems of equations, which, from our
viewpoint, may be the reason that global convergence analysis is complicated.
The use of smoothing methods by means of the Fischer-Burmeister functional starts
from Kanzow [29] for the linear complementarity problem. It has now become one of
the main smoothing tools for solving the NCP and related problems. In particular,
Kanzow [30] and Xu [44] have proved global as well as local superlinear convergence
of their respective smoothing methods for the NCP with uniform $P$-functions. Burke
and Xu [1] proved global linear convergence of their smoothing method for the linear
complementarity problem with both the $P$-matrix and $S$-matrix properties. Global
convergence and local fast convergence analysis is usually complicated because special
techniques are required to drive the smoothing parameter to zero. This feature
seems to be shared by the other smoothing methods mentioned in the last paragraph.
Motivated by the above points, we shall introduce a technique to approximate the
system of nonsmooth equations by a square system of smooth equations. This is
fulfilled by introducing a new parameter and a new equation. The solvability of
the generalized Newton equation of this system can be guaranteed under very mild
conditions. Since the reformulated system still gives rise to a smooth merit function, it
turns out that global convergence of the generalized Newton method can be established by following the standard analysis with some minor modifications. Moreover,
the damped modified Gauss-Newton method for smooth equations can be extended
to our system of nonsmooth equations without difficulty. We use the
Fischer-Burmeister functional [18] to demonstrate our new technique, though it may
be adapted for other smoothing methods.
In Section 2, the NCP is reformulated as a square system of equations by introducing a parameter and an additional equation and using the Fischer-Burmeister functional.
We then study various properties, including semismoothness of the new system,
equivalence between the new system and the NCP, and differentiability of the least
squares merit function of the new system. Section 3 is devoted to the study of sufficient
conditions that ensure, respectively, nonsingularity of generalized Newton equations,
that a stationary point of the least squares merit function is a solution of the NCP,
and boundedness of the level sets associated with the least squares merit function. In Section 4, we propose a damped generalized Newton method for solving this new system.
Its global and local superlinear convergence can be established under mild conditions.
Numerical results are reported for a set of test problems from the library MCPLIB.
We conclude the paper by offering some remarks in the last section.
The following notation is used throughout the paper. For vectors $x, y \in \mathbb{R}^n$, $x^T$
is the transpose of $x$ and thus $x^T y$ is the inner product of $x$ and $y$; $\|x\|$ denotes the
Euclidean norm of $x \in \mathbb{R}^n$. For a given matrix $M = (m_{ij}) \in \mathbb{R}^{n \times n}$ and
index sets $\mathcal{I}, \mathcal{J} \subseteq \{1, \ldots, n\}$, $M_{\mathcal{I}\mathcal{J}}$ denotes the submatrix of $M$ associated with the row
indexes in $\mathcal{I}$ and the column indexes in $\mathcal{J}$. For a continuously differentiable functional
$f : \mathbb{R}^n \to \mathbb{R}$, its gradient at $x$ is denoted by $\nabla f(x)$. If the function $F : \mathbb{R}^n \to \mathbb{R}^n$ is
continuously differentiable at $x$, then $F'(x)$ denotes its Jacobian at $x$. If $F : \mathbb{R}^n \to \mathbb{R}^n$
is locally Lipschitz continuous at $x$, then $\partial F(x)$ denotes its Clarke generalized
Jacobian at $x$ [8]. The notation $(A) \Longleftrightarrow (B)$ means that the statements (A) and (B)
are equivalent.
2 Reformulations and Equivalence
In order to reformulate the NCP (1), let us recall two basic functions. The first one is
now known as the Fischer-Burmeister functional [18], which is defined by $\phi : \mathbb{R}^2 \to \mathbb{R}$,
$$\phi(b, c) := \sqrt{b^2 + c^2} - (b + c).$$
The second one, denoted by $\psi : \mathbb{R}^3 \to \mathbb{R}$, is a modification of $\phi$, or a variation of its
counterpart in $\mathbb{R}^3$. More precisely, $\psi : \mathbb{R}^3 \to \mathbb{R}$ is defined by
$$\psi(a, b, c) := \sqrt{a^2 + b^2 + c^2} - (b + c).$$
Note that the function $\psi$ was introduced to study linear complementarity problems by
Kanzow in [29], where $a$ is treated as a parameter rather than an independent variable.
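To make the smoothing effect concrete, here is a small numerical sketch (in Python with NumPy, added for illustration and not part of the original paper) of $\phi$ and $\psi$; it shows that $\phi$ vanishes exactly on the complementarity set, while $\psi(a, \cdot, \cdot)$ is differentiable everywhere whenever $a \neq 0$.

    import numpy as np

    def phi(b, c):
        # Fischer-Burmeister functional: zero iff b >= 0, c >= 0, b*c = 0
        return np.sqrt(b**2 + c**2) - (b + c)

    def psi(a, b, c):
        # Smoothed variant: smooth wherever a**2 + b**2 + c**2 > 0
        return np.sqrt(a**2 + b**2 + c**2) - (b + c)

    # phi vanishes exactly on the complementarity set
    assert abs(phi(0.0, 3.0)) < 1e-12 and abs(phi(2.0, 0.0)) < 1e-12
    assert phi(-1.0, 1.0) > 0       # a violated sign constraint is detected

    a = 1e-3
    # gradient of psi at (a, 0, 0) is (a/|a|, -1, -1): well-defined since a != 0,
    # whereas phi has a kink at the origin
    g = np.array([a / abs(a), -1.0, -1.0])
    print(psi(a, 0.0, 0.0), g)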
Using these two functionals, we define two functions associated with the NCP as
follows. For any given $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, define $H : \mathbb{R}^n \to \mathbb{R}^n$ by
$$H(x) := \begin{pmatrix} \phi(x_1, F_1(x)) \\ \vdots \\ \phi(x_n, F_n(x)) \end{pmatrix}$$
and $G : \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$, $\tilde{G} : \mathbb{R}^{n+1} \to \mathbb{R}^n$ by
$$G(\mu, x) := \begin{pmatrix} e^{\mu} - 1 \\ \psi(\mu, x_1, F_1(x)) \\ \vdots \\ \psi(\mu, x_n, F_n(x)) \end{pmatrix} =: \begin{pmatrix} e^{\mu} - 1 \\ \tilde{G}(\mu, x) \end{pmatrix},$$
where $e$ is Euler's number (the base of the natural logarithm). Consequently, we may
define two systems of equations:
$$H(x) = 0 \eqno(2)$$
and
$$G(\mu, x) = 0. \eqno(3)$$
Note that the first system has been extensively studied for the NCP (see for example
[10, 16, 17, 19, 25, 26, 27, 37, 42, 45] and the references therein). If the first equation
is removed from the second system, then it reduces to the system introduced by Kanzow
[29] for proposing smoothing or continuation methods to solve the LCP. Thereafter,
this smoothing technique has been used for solving other related problems (see for
example [1, 15, 20, 23, 29, 30, 31, 44]).
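As an illustration, the following sketch (Python/NumPy, added here; the two-dimensional affine mapping $F$ is a hypothetical example, not from the paper) assembles $G(\mu, x)$ directly from the definition above.

    import numpy as np

    def F(x):
        # Hypothetical test mapping F(x) = M x + q, chosen only for demonstration
        M = np.array([[2.0, 1.0], [1.0, 2.0]])
        q = np.array([-1.0, -1.0])
        return M @ x + q

    def psi(a, b, c):
        return np.sqrt(a**2 + b**2 + c**2) - (b + c)

    def G(mu, x):
        # Square (n+1)-dimensional system: the first row drives mu to zero,
        # the remaining rows are the smoothed Fischer-Burmeister residuals
        Fx = F(x)
        top = np.array([np.exp(mu) - 1.0])
        tail = np.array([psi(mu, x[i], Fx[i]) for i in range(len(x))])
        return np.concatenate([top, tail])

    print(G(0.1, np.array([0.5, 0.5])))   # residual at a trial point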
The novelty of this paper is the introduction of the first equation, which makes (3) a
square system. As will be seen later, this new feature overcomes some difficulties
encountered by generalized Newton-type methods based on the system (2), and
facilitates the analysis of global convergence, which is, from our point of view, usually
complicated in smoothing methods. Some nice properties of the methods based
on the system (2) can be established for the similar methods based on (3). Moreover,
our analysis is much closer to the spirit of the classical Newton method than smoothing
methods are. The global convergence analysis of the generalized Newton and the modified
Gauss-Newton method for the system (2) has been done in [25]. In the sequel, the
second system will be the main one under consideration, although some connections and
differences between (2) and (3) are explored.
One may define other functions which could play the same role as $e^{\mu} - 1$. For
simplicity of analysis, we use this special function in the sequel. See the discussion in
Section 6 for more details on how to define such functions.
The least squares merit functions of $H$ and $G$ are denoted by $\theta$ and $\Theta$, namely,
$$\theta(x) := \frac{1}{2}\|H(x)\|^2, \qquad \Theta(\mu, x) := \frac{1}{2}\|G(\mu, x)\|^2.$$
$\theta$ and $\Theta$ are usually called merit functions.
The definitions of the functions $H$ and $G$ heavily depend on the functionals $\phi$ and $\psi$,
respectively. Certainly, the study of some fundamental properties of $\phi$ and $\psi$ will
help to gain more insight into the functions $H$ and $G$. Let $E : \mathbb{R}^n \to \mathbb{R}^n$ be locally
Lipschitz continuous at $x \in \mathbb{R}^n$. Then the Clarke generalized Jacobian $\partial E(x)$ of $E$ at
$x$ is well-defined and can be characterized as the convex hull of the following set:
$$\Big\{ \lim_{x^k \to x} E'(x^k) \;\Big|\; E \text{ is differentiable at } x^k \in \mathbb{R}^n \Big\}.$$
$\partial E(x)$ is a nonempty, convex and compact set for any fixed $x$ [8]. $E$ is said to be
semismooth at $x \in \mathbb{R}^n$ if it is directionally differentiable at $x$, i.e., $E'(x; d)$ exists for
any $d \in \mathbb{R}^n$, and if
$$V d - E'(x; d) = o(\|d\|)$$
for any $d \to 0$ and $V \in \partial E(x + d)$. $E$ is said to be strongly semismooth at $x$ if it is
semismooth at $x$ and
$$V d - E'(x; d) = O(\|d\|^2).$$
See [39, 36, 19] for other characterizations and the differential calculus of semismoothness
and strong semismoothness.
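A standard one-dimensional example (added here, not in the original text) may help fix ideas: the absolute value function is strongly semismooth. For $E(x) = |x|$ one has $\partial E(0) = [-1, 1]$, the convex hull of the two limiting derivatives $\pm 1$. For $d \neq 0$ and any $V \in \partial E(0 + d) = \{\operatorname{sign}(d)\}$,
$$V d - E'(0; d) = |d| - |d| = 0,$$
so the defining estimates $o(\|d\|)$ and $O(\|d\|^2)$ hold trivially, and $E$ is strongly semismooth at the origin.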
We now present some properties of $\psi$, $G$ and $\Theta$. Note that similar properties of
$\phi$, $H$ and $\theta$ have been studied in [10, 17, 18, 22, 27, 28].

Lemma 2.1 (i) When $a = 0$, $\psi(a, b, c) = 0$ if and only if $b \ge 0$, $c \ge 0$ and
$bc = 0$.

(ii) $\psi$ is locally Lipschitz, directionally differentiable and strongly semismooth on
$\mathbb{R}^3$. Furthermore, if $a^2 + b^2 + c^2 > 0$, then $\psi$ is continuously differentiable at
$(a, b, c) \in \mathbb{R}^3$. Namely, $\psi$ is continuously differentiable except at $(0, 0, 0)$. The
generalized Jacobian of $\psi$ at $(0, 0, 0)$ is
$$\partial \psi(0, 0, 0) = \Omega := \left\{ (\alpha, \beta, \gamma)^T \;\middle|\; \alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 \le 1 \right\}.$$

(iii) $\psi^2$ is smooth on $\mathbb{R}^3$. The gradient of $\psi^2$ at $(a, b, c) \in \mathbb{R}^3$ is
$$\nabla \psi^2(a, b, c) = 2\psi(a, b, c)\,\partial \psi(a, b, c).$$

(iv) $\partial_b \psi^2(a, b, c)\,\partial_c \psi^2(a, b, c) \ge 0$ for any $(a, b, c) \in \mathbb{R}^3$. If $\psi(0, b, c) \neq 0$, then
$\partial_b \psi^2(0, b, c)\,\partial_c \psi^2(0, b, c) > 0$.

(v) $\psi(0, b, c) = 0 \iff \partial_b \psi^2(0, b, c) = 0 \iff \partial_c \psi^2(0, b, c) = 0 \iff \partial_b \psi^2(0, b, c) = \partial_c \psi^2(0, b, c) = 0$.
Proof. (i) Note that $\psi(0, b, c) = \phi(b, c)$. The result can be verified easily.

(ii) Note that $\sqrt{a^2 + b^2 + c^2}$ is the Euclidean norm of the vector $(a, b, c)^T$. Then
$\sqrt{a^2 + b^2 + c^2}$ is locally Lipschitz, directionally differentiable and strongly semismooth
on $\mathbb{R}^3$. $-(b + c)$ is continuously differentiable on $\mathbb{R}^3$, hence locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. Fischer [19] has proved that the
composition of strongly semismooth functions is still strongly semismooth. Therefore,
$\psi$ is locally Lipschitz, directionally differentiable and strongly semismooth on $\mathbb{R}^3$. If
$a^2 + b^2 + c^2 > 0$, $\sqrt{a^2 + b^2 + c^2}$ is continuously differentiable at $(a, b, c)$, and so is $\psi$.

Let $d \in \mathbb{R}^3$ and $d \neq 0$. Then $\psi$ is continuously differentiable at $td$ for any $t > 0$, and
$$\nabla \psi(td) = \left( \frac{d_1}{\sqrt{d_1^2 + d_2^2 + d_3^2}},\; \frac{d_2}{\sqrt{d_1^2 + d_2^2 + d_3^2}} - 1,\; \frac{d_3}{\sqrt{d_1^2 + d_2^2 + d_3^2}} - 1 \right)^T.$$
For simplicity, let $\nabla \psi(td)$ be denoted by $(\alpha, \beta, \gamma)^T$. Clearly,
$$\alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 = 1.$$
Let $t$ tend to zero. By the semicontinuity property of the Clarke generalized Jacobian, we obtain
that
$$(\alpha, \beta, \gamma) \in \partial \psi(0, 0, 0).$$
It follows from the convexity of the generalized Jacobian that
$$\Omega \subseteq \partial \psi(0, 0, 0).$$
On the other hand, for any $(a, b, c) \neq 0$,
$$(\nabla_a \psi(a, b, c))^2 + (\nabla_b \psi(a, b, c) + 1)^2 + (\nabla_c \psi(a, b, c) + 1)^2 = 1.$$
By the definition of the Clarke generalized Jacobian, one may conclude that
$$\partial \psi(0, 0, 0) \subseteq \Omega.$$
This shows that $\partial \psi(0, 0, 0) = \Omega$.

(iii) Since $\psi$ is smooth everywhere on $\mathbb{R}^3$ except at $(0, 0, 0)$, the origin is the only point
at which $\psi^2$ is possibly not smooth. But it is easy to prove that $\psi^2$ is also smooth at
$(0, 0, 0)$. Therefore, $\psi^2$ is smooth on $\mathbb{R}^3$. Furthermore,
$$\nabla \psi^2(a, b, c) = 2\psi(a, b, c)\,\partial \psi(a, b, c).$$
Note that $2\psi(0, 0, 0)\,\partial \psi(0, 0, 0) = \{0\}$ is a singleton even though $\partial \psi(0, 0, 0) = \Omega$ is a set.

(iv) By (ii), for any $(a, b, c) \in \mathbb{R}^3$ and any $(\alpha, \beta, \gamma)^T \in \partial \psi(a, b, c)$, we have
$$\alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 \le 1.$$
This shows that $\beta\gamma \ge 0$. Suppose $\psi(0, b, c) \neq 0$. Then either $\min\{b, c\} < 0$
or $bc \neq 0$. In both cases, (ii) implies that $\beta \neq 0$ and $\gamma \neq 0$. Consequently, $\beta\gamma > 0$.

(v) Clearly, if $\psi(0, b, c) = 0$, then (iii) implies all the other results. If either
$\partial_b \psi^2(0, b, c) = 0$ or $\partial_c \psi^2(0, b, c) = 0$, then we must have $\psi(0, b, c) = 0$. If this were not
so, (iv) would imply that $\partial_b \psi^2(0, b, c)\,\partial_c \psi^2(0, b, c) > 0$, a contradiction. The proof is
complete. □
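The gradient formula in part (ii) is easy to check numerically; the following small sketch (Python/NumPy, added for illustration) verifies that $\nabla\psi$ along any ray satisfies $\alpha^2 + (\beta + 1)^2 + (\gamma + 1)^2 = 1$.

    import numpy as np

    def grad_psi(a, b, c):
        # gradient of psi(a,b,c) = sqrt(a^2+b^2+c^2) - (b+c), valid away from 0
        r = np.sqrt(a**2 + b**2 + c**2)
        return np.array([a / r, b / r - 1.0, c / r - 1.0])

    rng = np.random.default_rng(0)
    for _ in range(5):
        d = rng.standard_normal(3)
        g = grad_psi(*(0.37 * d))          # any point t*d on the ray, t > 0
        lhs = g[0]**2 + (g[1] + 1.0)**2 + (g[2] + 1.0)**2
        assert abs(lhs - 1.0) < 1e-12      # lies on the sphere bounding Omega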
Proposition 2.1 (i) If $(\mu, x)$ is a solution of (3), then $\mu = 0$. And $x$ is a solution
of the NCP if and only if $(0, x)$ is a solution of (3), i.e., $G(0, x) = 0$.

(ii) $G$ is continuously differentiable at $(\mu, x)$ when $\mu \neq 0$ and $F$ is continuously
differentiable at $x$. $G$ is semismooth on $\mathbb{R}^{n+1}$ if $F$ is continuously differentiable
on $\mathbb{R}^n$, and $G$ is strongly semismooth on $\mathbb{R}^{n+1}$ if $F'$ is Lipschitz continuous
on $\mathbb{R}^n$. If $V \in \partial G(\mu, x)$, then $V$ is of the following form:
$$V = \begin{pmatrix} e^{\mu} & 0 \\ C & DF'(x) + E \end{pmatrix},$$
where $C \in \mathbb{R}^n$, and both $D$ and $E$ are diagonal matrices in $\mathbb{R}^{n \times n}$ satisfying
$$C_i = \frac{\mu}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}}, \quad
D_{ii} = \frac{x_i}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}} - 1, \quad
E_{ii} = \frac{F_i(x)}{\sqrt{\mu^2 + x_i^2 + F_i(x)^2}} - 1$$
if $\mu^2 + x_i^2 + F_i(x)^2 > 0$, and
$$C_i = \alpha_i, \quad D_{ii} = \beta_i, \quad E_{ii} = \gamma_i,$$
with $\alpha_i^2 + (\beta_i + 1)^2 + (\gamma_i + 1)^2 \le 1$, if $\mu^2 + x_i^2 + F_i(x)^2 = 0$.

(iii) $\Theta(\mu, x) \ge 0$ for any $(\mu, x) \in \mathbb{R}^{n+1}$. And when the NCP has a solution, $x$ is a
solution of the NCP if and only if $(0, x)$ is a global minimizer of $\Theta$ over $\mathbb{R}^{n+1}$.

(iv) $\Theta$ is continuously differentiable on $\mathbb{R}^{n+1}$. The gradient of $\Theta$ at $(\mu, x)$ is
$$\nabla \Theta(\mu, x) = V^T G(\mu, x) = \begin{pmatrix} e^{\mu}(e^{\mu} - 1) + C^T \tilde{G}(\mu, x) \\ F'(x)^T D \tilde{G}(\mu, x) + E \tilde{G}(\mu, x) \end{pmatrix}$$
for any $V \in \partial G(\mu, x)$.

(v) In (iv), for any $\mu$ and $x$,
$$(D \tilde{G}(\mu, x))_i (E \tilde{G}(\mu, x))_i \ge 0, \quad 1 \le i \le n.$$
If $\tilde{G}_i(0, x) \neq 0$, then
$$(D \tilde{G}(0, x))_i (E \tilde{G}(0, x))_i > 0.$$

(vi) The following four statements are equivalent:
$$\tilde{G}_i(0, x) = 0; \quad (D \tilde{G}(0, x))_i = 0; \quad (E \tilde{G}(0, x))_i = 0; \quad (D \tilde{G}(0, x))_i = (E \tilde{G}(0, x))_i = 0.$$
Proof. (i) If $G(\mu, x) = 0$, then $e^{\mu} - 1 = 0$, i.e., $\mu = 0$. The rest follows from (i) of
Lemma 2.1.

(ii) When $\mu \neq 0$ and $F$ is continuously differentiable at $x$, $\psi(\mu, x_i, F_i(x))$ is continuously differentiable at $(\mu, x)$ for $1 \le i \le n$. Hence $G$ is continuously differentiable at $(\mu, x)$.
Note that the composition of any two semismooth functions or strongly semismooth
functions is semismooth or strongly semismooth, respectively (see [19]). Since $\psi$ is strongly semismooth on $\mathbb{R}^3$ by (ii) of Lemma 2.1, semismoothness or strong semismoothness of $G$
follows if $F$ is smooth at $x$ or if $F'$ is Lipschitz continuous at $x$, respectively. The
form of an element $V$ in $\partial G(\mu, x)$ follows from the chain rule (Theorem
2.3.9 of [8]) and the form of the generalized Jacobian of $\psi$ in (ii) of Lemma 2.1. It should be
pointed out that, unlike for $\partial \psi$, we only manage to give an outer estimate of $\partial G(\mu, x)$.
Nevertheless, this outer estimate is enough for the following analysis.

(iii) Trivially, $\Theta(\mu, x) \ge 0$ for any $(\mu, x)$. If $x$ is a solution of the NCP, (i) shows
that $G(0, x) = 0$, i.e., $(0, x)$ is a global minimizer of $\Theta$. Conversely, if the NCP has a
solution, then the global minimum of $\Theta$ is zero. If, in addition, $(0, x)$ is a global
minimizer of $\Theta$, then $\Theta(0, x) = 0$ and $G(0, x) = 0$. The desired result follows from (i)
again.

(iv) $\Theta$ can be rewritten as follows:
$$\Theta(\mu, x) = \frac{1}{2}(e^{\mu} - 1)^2 + \frac{1}{2}\sum_{i=1}^{n} \psi(\mu, x_i, F_i(x))^2.$$
The smoothness of $\Theta$ over $\mathbb{R}^{n+1}$ follows from the smoothness of $F$ and $\psi^2$. The form
of $\nabla \Theta$ follows from the chain rule and the smoothness of $\psi^2$.

(v) and (vi) The proof is analogous to that of (iv) and (v) of Lemma 2.1. It is
omitted. □
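For the smooth case $\mu \neq 0$, the matrix $V$ and the gradient $\nabla\Theta = V^T G$ can be assembled directly from the formulas in parts (ii) and (iv). The sketch below (Python/NumPy, added for illustration; the affine mapping $F$ is a hypothetical example) does exactly that.

    import numpy as np

    M = np.array([[2.0, 1.0], [1.0, 2.0]])   # hypothetical example data
    q = np.array([-1.0, -1.0])
    F = lambda x: M @ x + q
    Fprime = lambda x: M

    def G_and_V(mu, x):
        # Assumes mu != 0, so G is continuously differentiable and
        # V is its ordinary Jacobian (Proposition 2.1 (ii)).
        Fx, n = F(x), len(x)
        r = np.sqrt(mu**2 + x**2 + Fx**2)
        C = mu / r
        D = np.diag(x / r - 1.0)
        E = np.diag(Fx / r - 1.0)
        Gtilde = r - (x + Fx)                 # psi(mu, x_i, F_i(x)) componentwise
        G = np.concatenate([[np.exp(mu) - 1.0], Gtilde])
        V = np.zeros((n + 1, n + 1))
        V[0, 0] = np.exp(mu)
        V[1:, 0] = C
        V[1:, 1:] = D @ Fprime(x) + E
        return G, V

    G, V = G_and_V(0.1, np.array([0.5, 0.5]))
    grad_Theta = V.T @ G                      # Proposition 2.1 (iv)
    print(grad_Theta)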
Remark. Let $W$ denote the set of all matrices $DF'(x) + E$ such that there exists a
vector $C$ which makes the matrix
$$\begin{pmatrix} 1 & 0 \\ C & DF'(x) + E \end{pmatrix}$$
an element of $\partial G(0, x)$. On the one hand, any element of $\partial G(0, x)$ is very much like
an element of $\partial H(x)$, and $\partial H(x) \subseteq W$. Because of this similarity, some standard
analysis of $\partial H(x)$ can be extended to $\partial G(0, x)$, as we shall see in the next section. On
the other hand, we must be aware that $\partial H(x)$ and $W$ are not the same in general; see
[8] for more details. Therefore, some extra care needs to be taken when we say that some
techniques for $\partial H$ can be extended to $W$ or $\partial G(0, x)$.
The results below reveal that $\psi$, $\tilde{G}$ and $\Theta$ reduce to $\phi$, $H$ and $\theta$, respectively, when $\mu = 0$.
Further relationships between them could be explored, but we do not proceed here.

Lemma 2.2 (i) $\psi(0, b, c) = \phi(b, c)$ for any $b, c \in \mathbb{R}$.

(ii) $\tilde{G}(0, x) = H(x)$ for any $x \in \mathbb{R}^n$.

(iii) $\Theta(0, x) = \theta(x)$ for any $x \in \mathbb{R}^n$.
3 Basic Properties
In this section, some basic properties of the functions $G$ and $\Theta$ are investigated. These
properties include nonsingularity of the generalized Jacobian of $G$, sufficient conditions
for a stationary point of $\Theta$ to be a solution of the NCP, and boundedness of the
level sets of the merit function $\Theta$.
In the context of nonlinear complementarity, the notions of monotone matrices,
monotone functions and other related concepts play important roles. We review some
of them in the following.
A matrix $M \in \mathbb{R}^{n \times n}$ is called a $P$-matrix ($P_0$-matrix) if each of its principal minors
is positive (nonnegative). A function $F : \mathbb{R}^n \to \mathbb{R}^n$ is said to be a $P_0$-function over the
open set $S \subseteq \mathbb{R}^n$ if for any $x, y \in S$ with $x \neq y$, there exists $i$ such that $x_i \neq y_i$ and
$$(x_i - y_i)(F_i(x) - F_i(y)) \ge 0.$$
$F$ is a uniform $P$-function over $S$ if there exists a positive constant $\kappa$ such that for any
$x, y \in S$,
$$\max_{1 \le i \le n} (x_i - y_i)(F_i(x) - F_i(y)) \ge \kappa \|x - y\|^2.$$
Obviously, a $P$-matrix must be a $P_0$-matrix, and a uniform $P$-function must be a $P$-function. It is well known that the Jacobian of a $P_0$-function is always a $P_0$-matrix
and the Jacobian of a uniform $P$-function is a $P$-matrix (see [9, 34]).

The following characterization of a $P_0$-matrix can be found in Theorem 3.4.2 of
[9].

Lemma 3.1 A matrix $M \in \mathbb{R}^{n \times n}$ is a $P_0$-matrix if and only if for every nonzero $x$
there exists an index $i$ ($1 \le i \le n$) such that $x_i \neq 0$ and $x_i (Mx)_i \ge 0$.
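As a concrete example (added here for illustration), the nonsymmetric matrix
$$M = \begin{pmatrix} 1 & 3 \\ 0 & 1 \end{pmatrix}$$
has principal minors $1$, $1$ and $\det M = 1$, all positive, so it is a $P$-matrix; yet its symmetric part $\frac{1}{2}(M + M^T)$ has eigenvalues $-\frac{1}{2}$ and $\frac{5}{2}$, which shows that the $P$-matrix property is strictly weaker than positive definiteness.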
To guarantee nonsingularity of the generalized Jacobian of $G$ at a solution of (3),
the $R$-regularity introduced by Robinson [40] will be proved to be one of the sufficient
conditions. Suppose $x^*$ is a solution of the NCP (1). Define three index sets:
$$\mathcal{I} := \{1 \le i \le n \mid x_i^* > 0 = F_i(x^*)\},$$
$$\mathcal{J} := \{1 \le i \le n \mid x_i^* = 0 = F_i(x^*)\},$$
$$\mathcal{K} := \{1 \le i \le n \mid x_i^* = 0 < F_i(x^*)\}.$$
The NCP is said to be $R$-regular at $x^*$ if the submatrix $F'(x^*)_{\mathcal{I}\mathcal{I}}$ of $F'(x^*)$ is nonsingular
and the Schur complement
$$F'(x^*)_{\mathcal{J}\mathcal{J}} - F'(x^*)_{\mathcal{J}\mathcal{I}}\, F'(x^*)_{\mathcal{I}\mathcal{I}}^{-1}\, F'(x^*)_{\mathcal{I}\mathcal{J}}$$
is a $P$-matrix.
Proposition 3.1 (i) If $\mu \neq 0$ and $F'(x)$ is a $P_0$-matrix, then $V$ is nonsingular for
any $V \in \partial G(\mu, x)$.

(ii) If $F'(x)$ is a $P$-matrix, then $V$ is nonsingular for any $V \in \partial G(\mu, x)$.

(iii) If $\mu = 0$ and the NCP is $R$-regular at $x^*$, then $V$ is nonsingular for any $V \in \partial G(0, x^*)$.

Proof. From the definition of the generalized Jacobian of $G(\mu, x)$, it follows that for
any $V \in \partial G(\mu, x)$, $V$ is nonsingular if and only if the following submatrix of $V$ is
nonsingular:
$$DF'(x) + E.$$

(i) If $\mu \neq 0$, then both $-D$ and $-E$ are positive definite diagonal matrices. The
nonsingularity of $DF'(x) + E$ is equivalent to the nonsingularity of the matrix $F'(x) + D^{-1}E$, with $D^{-1}E$ a positive definite diagonal matrix. It follows that $F'(x) + D^{-1}E$
is a $P$-matrix, hence nonsingular, if $F'(x)$ is a $P_0$-matrix.

(ii) If $F'(x)$ is a $P$-matrix, as remarked after Proposition 2.1, the technique to prove
nonsingularity of the matrix $DF'(x) + E$ is quite standard. We omit the details here
and refer the reader to [27] for a proof.

(iii) If $\mu = 0$ and the NCP is $R$-regular at $x^*$, the techniques to prove nonsingularity
of $DF'(x^*) + E$ are also standard; see for example [17]. Therefore, nonsingularity of
any element of $\partial G(0, x^*)$ follows from nonsingularity of $DF'(x^*) + E$. □
The next result provides a sufficient condition under which a stationary point of the
least squares merit function yields a solution of the NCP.

Proposition 3.2 If $(\mu, x)$ is a stationary point of $\Theta$ and $F'(x)$ is a $P_0$-matrix, then
$\mu = 0$ and $x$ is a solution of the NCP.
Proof. Suppose $(\mu, x)$ is a stationary point of $\Theta$, i.e., $\nabla \Theta(\mu, x) = 0$. By Proposition
2.1 (iv), $\nabla \Theta(\mu, x) = V^T G(\mu, x) = 0$ for any $V \in \partial G(\mu, x)$. We now prove that $\mu = 0$.
Otherwise, assume $\mu \neq 0$. Then $V$ is nonsingular by Proposition 3.1. This shows that
$G(\mu, x) = 0$, which implies $\mu = 0$. This is a contradiction. Therefore, $\mu = 0$. In this
case, $V^T G(0, x) = 0$ implies that
$$F'(x)^T D \tilde{G}(0, x) + E \tilde{G}(0, x) = 0,$$
and hence, multiplying the $i$-th component by $(D\tilde{G}(0, x))_i$,
$$(D\tilde{G}(0, x))_i (F'(x)^T D \tilde{G}(0, x))_i + (D\tilde{G}(0, x))_i (E \tilde{G}(0, x))_i = 0.$$
Suppose $\tilde{G}_i(0, x) \neq 0$ for some index $i$. By (v) and (vi) of Proposition 2.1,
$$(D\tilde{G}(0, x))_i (F'(x)^T D \tilde{G}(0, x))_i < 0$$
for any index $i$ such that $\tilde{G}_i(0, x) \neq 0$. By Lemma 3.1, $F'(x)^T$ and $F'(x)$ are not
$P_0$-matrices. This is a contradiction. Therefore,
$$\tilde{G}(0, x) = 0,$$
which, together with $\mu = 0$, shows that $G(\mu, x) = 0$. The desired result follows from
(i) of Proposition 2.1. □
Lemma 3.2 If $F$ is a uniform $P$-function on $\mathbb{R}^n$ and $\{x^k\}$ is an unbounded sequence,
then there exists $i$ ($1 \le i \le n$) such that both the sequences $\{x_i^k\}$ and $\{F_i(x^k)\}$ are
unbounded.

Proof. See the proof of Proposition 4.2 of Jiang and Qi [27]. □
Lemma 3.3 Suppose that $\{(a^k, b^k, c^k)\}$ is a sequence such that $\{a^k\}$ is bounded while $\{b^k\}$
and $\{c^k\}$ are unbounded. Then $\{\psi(a^k, b^k, c^k)\}$ is unbounded.

Proof. Without loss of generality, we may assume that $b^k \to \infty$ and $c^k \to \infty$ as $k$
tends to infinity. By the definition of $\psi$, it is clear that $\psi^k := \psi(a^k, b^k, c^k) \to +\infty$ if either $b^k$ or $c^k$
tends to $-\infty$. Now assume that $b^k \to +\infty$ and $c^k \to +\infty$. Then for sufficiently large
$k$, it follows that
$$|\psi(a^k, b^k, c^k)| = \frac{-(a^k)^2 + 2 b^k c^k}{\sqrt{(a^k)^2 + (b^k)^2 + (c^k)^2} + b^k + c^k}
= \frac{-(a^k)^2 + 2 \max\{b^k, c^k\} \min\{b^k, c^k\}}{\sqrt{(a^k)^2 + (b^k)^2 + (c^k)^2} + b^k + c^k}
\ge \frac{-(a^k)^2 + 2 \max\{b^k, c^k\} \min\{b^k, c^k\}}{\sqrt{(a^k)^2 + 2 (\max\{b^k, c^k\})^2} + 2 \max\{b^k, c^k\}}.$$
Hence, it follows from the boundedness of $\{a^k\}$ that $\{\psi^k\}$ is unbounded. This completes
the proof. □

Proposition 3.3 If $F$ is a uniform $P$-function on $\mathbb{R}^n$ and $\{\mu^k\}$ is bounded, then the
union of the level sets
$$L_{\mu^k}(\gamma) := \{(\mu^k, x) : \Theta(\mu^k, x) \le \gamma\}$$
is bounded for any $\gamma \ge 0$.

Proof. Assume that the union of the sets $L_{\mu^k}(\gamma)$ is unbounded. Then there exists an unbounded sequence $\{(\mu^k, x^k)\}$
such that $\Theta(\mu^k, x^k) \le \gamma$. This implies that $\{x^k\}$ is unbounded, by
the boundedness of $\{\mu^k\}$. By Lemma 3.2, there exists an index $i$ such that both $\{x_i^k\}$ and
$\{F_i(x^k)\}$ are unbounded. Lemma 3.3 shows that $\{\psi(\mu^k, x_i^k, F_i(x^k))\}$ is unbounded. Consequently,
$\{\Theta(\mu^k, x^k)\}$ is unbounded. This is a contradiction. Therefore, $L_{\mu^k}(\gamma)$ is
bounded for any $\gamma \ge 0$. □
4 A Damped Generalized Newton Method and Convergence
In this section, we develop a generalized Newton method for the system (3). The
method contains two main steps. The first one is to define a search direction, which
we call the Newton step, by solving the following so-called generalized Newton equation:
$$V d = -G(\mu, x), \eqno(4)$$
where $V \in \partial G(\mu, x)$. The generalized Newton equation can be rewritten as
$$e^{\mu} d_{\mu} = -(e^{\mu} - 1),$$
$$C d_{\mu} + (DF'(x) + E) d_x = -\tilde{G}(\mu, x),$$
where $\tilde{G}(\mu, x)$ is defined as in Section 2. The second main step is to do a line search
along the generalized Newton step to decrease the merit function $\Theta$. The full description of our method is stated as follows. For simplicity, let $z = (\mu, x)$, $z^+ = (\mu^+, x^+)$
and $z^k = (\mu^k, x^k)$. Similarly, $d^k = (d_{\mu}^k, d_x^k)$, etc.

Algorithm 1 (Damped generalized Newton method)

Step 1 (Initialization) Choose an initial starting point $z^0 = (\mu^0, x^0) \in \mathbb{R}^{n+1}$ such
that $\mu^0 > 0$, two scalars $\sigma, \rho \in (0, 1)$, and let $k := 0$.

Step 2 (Search direction) Choose $V_k \in \partial G(z^k)$ and solve the generalized Newton
equation (4) with $\mu = \mu^k$, $z = z^k$ and $V = V_k$. Let $d^k$ be a solution of this
equation. If $d = 0$ is a solution of the generalized Newton equation, the algorithm
terminates. Otherwise, go to Step 3.

Step 3 (Line search) Let $\alpha_k = \rho^{i_k}$, where $i_k$ is the smallest nonnegative integer $i$
such that
$$\Theta(z^k + \rho^i d^k) - \Theta(z^k) \le \sigma \rho^i \nabla \Theta(z^k)^T d^k.$$

Step 4 (Update) Let $z^{k+1} := z^k + \alpha_k d^k$ and $k := k + 1$. Go to Step 2.
The above generalized Newton method reduces to the classical damped Newton
method if $G$ is smooth; see Dennis and Schnabel [11]. A similar algorithm for solving
the system (2) is proposed in [25]. It has been recognized for a long time that non-monotone line search strategies are superior to the monotone line search strategy from
a numerical point of view. As shall be seen later, we implement a non-monotone
line search in our numerical experiments. In a non-monotone version of the damped
generalized Newton method, $\Theta(z^k)$ on the left-hand side of the inequality in Step 3 is
replaced by
$$\max\{\Theta(z^k), \Theta(z^{k-1}), \ldots, \Theta(z^{k-l})\},$$
where $l$ is a positive integer. When $l = 0$, the non-monotone line search
coincides with the monotone line search. A sketch of the monotone version is given below.
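The following Python/NumPy sketch (added for illustration; it is not the author's MATLAB implementation, and the affine test mapping $F$ is hypothetical) implements the monotone version of Algorithm 1 under the simplifying assumption that $\mu^k > 0$ throughout, so that $V_k$ is the ordinary Jacobian of $G$.

    import numpy as np

    M = np.array([[2.0, 1.0], [1.0, 2.0]])    # hypothetical example data
    q = np.array([-1.0, -1.0])
    F = lambda x: M @ x + q
    Fprime = lambda x: M

    def G_and_V(mu, x):
        Fx, n = F(x), len(x)
        r = np.sqrt(mu**2 + x**2 + Fx**2)
        G = np.concatenate([[np.exp(mu) - 1.0], r - (x + Fx)])
        V = np.zeros((n + 1, n + 1))
        V[0, 0] = np.exp(mu)
        V[1:, 0] = mu / r
        V[1:, 1:] = np.diag(x / r - 1.0) @ Fprime(x) + np.diag(Fx / r - 1.0)
        return G, V

    def damped_newton(z0, sigma=1e-4, rho=0.5, tol=1e-10, itmax=100):
        z = z0.copy()
        for _ in range(itmax):
            G, V = G_and_V(z[0], z[1:])
            Theta = 0.5 * G @ G
            if Theta < tol:
                break
            d = np.linalg.solve(V, -G)        # generalized Newton equation (4)
            grad = V.T @ G                    # gradient of the merit function
            t = 1.0
            # Armijo backtracking: Theta(z + t d) - Theta(z) <= sigma t grad' d
            while True:
                Gt, _ = G_and_V(z[0] + t * d[0], z[1:] + t * d[1:])
                if 0.5 * Gt @ Gt - Theta <= sigma * t * (grad @ d):
                    break
                t *= rho
            z = z + t * d
        return z

    z = damped_newton(np.array([1.0, 1.0, 1.0]))   # z = (mu, x1, x2)
    print(z)   # mu tends to 0 and x approaches a solution of the NCP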
Lemma 4.1 If $G(z) \neq 0$ and the generalized Newton equation (4) is solvable at $z$, then
its solution $d$ is a descent direction for the merit function $\Theta$ at $z$, that is, $\nabla \Theta(z)^T d < 0$.
Furthermore, the line search step is well-defined at $z$.

Proof. This follows trivially from the differentiability of $\Theta$ and the generalized Newton
equation. □
Since $\Theta$ is continuously differentiable on $\mathbb{R}^{n+1}$, it is easy to see that Algorithm 1
is well-defined provided that the generalized Newton direction is well-defined at each
step. In Step 2, the existence of the search direction depends on the solvability of the
generalized Newton equation. By Proposition 3.1, the generalized Newton equation
is solvable if $F'(x)$ is a $P_0$-matrix and $\mu \neq 0$.
We repeat that the main difference between (2) and (3) is that (3) has one more
variable $\mu$ and one more equation than (2). This additional variable must be driven
to zero in order to obtain a solution of (3), and hence a solution of the NCP, from Algorithm
1. So we next present a result on $\mu$ and $d_{\mu}$.

Lemma 4.2 When $\mu > 0$, then $d_{\mu} \in (-\mu, 0)$. Moreover,
$$\mu + t d_{\mu} \in (0, \mu)$$
for any $t \in (0, 1]$.
Proof. By the first equation of the generalized Newton equation (4) and the Taylor
series of $e^{\mu}$, we have
$$d_{\mu} = -\frac{e^{\mu} - 1}{e^{\mu}}
= -\frac{\sum_{i=1}^{\infty} \mu^i / i!}{\sum_{i=0}^{\infty} \mu^i / i!}
= -\mu \,\frac{\sum_{i=0}^{\infty} \mu^i / (i + 1)!}{\sum_{i=0}^{\infty} \mu^i / i!},$$
which implies that $d_{\mu} \in (-\mu, 0)$ when $\mu > 0$, since the last quotient lies in $(0, 1)$. It is
then easy to see that $\mu + t d_{\mu} \in (0, \mu)$ for any $t \in (0, 1]$. □
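Note that the first equation of (4) in fact gives the closed form $d_{\mu} = e^{-\mu} - 1$, so the $\mu$-update can be read off directly. For instance (a check added here for illustration), at $\mu = 1$ a full step gives
$$\mu^+ = \mu + d_{\mu} = 1 + (e^{-1} - 1) = e^{-1} \approx 0.368 \in (0, 1),$$
consistent with Lemma 4.2.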
Simply speaking, the above result says that after each step, the variable $\mu$ will
be closer to zero than its previous value. Namely, $\mu$ is driven to zero automatically.
However, $\mu$ always remains positive. This implies two important observations. Firstly, $G$
is continuously differentiable at $z^k = (\mu^k, x^k)$, which is nice. Secondly, the solvability
of the generalized Newton equation is more easily achievable in the case $\mu \neq 0$ than in
the case $\mu = 0$; see Proposition 3.1.
Theorem 4.1 Suppose the generalized Newton equation in Step 2 is solvable for each
$k$. Assume that $\bar{z} = (\bar{\mu}, \bar{x})$ is an accumulation point of the sequence $\{z^k\}$ generated by the damped
generalized Newton method. Then the following statements hold:

(i) $\bar{x}$ is a solution of the NCP if $\{d^k\}$ is bounded.

(ii) $\bar{x}$ is a solution of the NCP and $\{z^k\}$ converges to $\bar{z}$ superlinearly if $\partial G(\bar{z})$ is
nonsingular and $\sigma \in (0, \frac{1}{2})$. The convergence rate is quadratic if $F'$ is Lipschitz
continuous on $\mathbb{R}^n$.

Proof. The proof is similar to that of Theorem 4.1 in [25], where the damped generalized Newton method is applied to the system (2). The details are given in the Attachment at the end of the paper. □
Corollary 4.1 Suppose $F$ is a $P_0$-function on $\mathbb{R}^n$ and $\sigma \in (0, \frac{1}{2})$. Then Algorithm 1
is well-defined. Assume $\bar{z} = (\bar{\mu}, \bar{x})$ is an accumulation point of $\{z^k\}$ and $\partial G(\bar{z})$
is nonsingular or $F'(\bar{x})$ is a $P$-matrix. Then $\bar{\mu} = 0$, $\bar{x}$ is a solution of the NCP,
and $z^k$ converges to $(0, \bar{x})$ superlinearly. If $F'$ is Lipschitz continuous on $\mathbb{R}^n$, then the
convergence rate is quadratic.

Proof. By Lemma 4.2, $\mu^k > 0$ for any $k$. Since $F$ is a $P_0$-function, it follows from
Proposition 3.1 that $\partial G(\mu^k, x^k)$ is nonsingular, which implies that the generalized
Newton equation is solvable for any $k$. The result follows from Theorem 4.1. □
Corollary 4.2 Suppose $F$ is a uniform $P$-function on $\mathbb{R}^n$ and $\sigma \in (0, \frac{1}{2})$. Then Algorithm 1 is well-defined, $\{z^k\}$ is bounded and $z^k$ converges superlinearly to $\bar{z} = (0, \bar{x})$,
with $\bar{x}$ the unique solution of the NCP; the convergence rate is quadratic if $F'$ is
Lipschitz continuous on $\mathbb{R}^n$.

Proof. The results follow from Proposition 3.3 and Corollary 4.1. □
Remark. One point worth mentioning concerns the calculation of the generalized
Jacobian of $G(\mu, x)$, since we only managed to give an outer estimate of $\partial G(\mu, x)$ in
Proposition 2.1. However, we never have to worry about this in Algorithm 1. The
reason is that the parameter $\mu^k$ is never equal to zero for any $k$. This implies that
$G$ is actually smooth at $(\mu^k, x^k)$ for any $k$. Therefore, the generalized Jacobian of $G$
reduces to the Jacobian of $G$, which is a singleton and easy to calculate.
5 Numerical Results
In this section, we present numerical experiments with Algorithm 1 of Section 4,
using a non-monotone line search strategy. We chose $l = 3$ for $k \ge 4$ and $l = k - 1$
for $k = 2, 3$, where $k$ is the iteration index. We also made the following change in
our implementation: $\mu^k$ is replaced by $10^{-6}$ whenever $\mu^k < 10^{-6}$, because our experience
showed that numerical difficulties sometimes occur if $\mu^k$ is too close to zero.
Algorithm 1 was implemented in MATLAB and run on a Sun SPARC workstation.
The following parameters were used for all the test problems: $\mu^0 = 10.0$, $\sigma = 10^{-4}$,
$\rho = 0.5$. The default initial starting point was used for each test problem in the library
MCPLIB [12, 13]. The algorithm is terminated when one of the following criteria is
satisfied: (i) the iteration number reaches 500; (ii) the line search steplength is less than
$10^{-10}$; (iii) the minimum of
$$\|\min(F(x^k), x^k)\|_{\infty} \quad \text{and} \quad \|\nabla \Theta(z^k)\|_2$$
is less than or equal to $10^{-6}$.
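For reference, the natural-residual part of criterion (iii) is one line of code; the sketch below (Python/NumPy, added here, with the min taken componentwise; the affine $F$ is a hypothetical example) computes it.

    import numpy as np

    def ncp_residual(F, x):
        # Infinity norm of the componentwise min(F(x), x); it is zero
        # exactly at a solution of the NCP
        return np.linalg.norm(np.minimum(F(x), x), ord=np.inf)

    F = lambda x: np.array([[2.0, 1.0], [1.0, 2.0]]) @ x + np.array([-1.0, -1.0])
    print(ncp_residual(F, np.array([1/3, 1/3])))   # ~0 at the solution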
We tested the nonlinear and linear complementarity problems from the library
MCPLIB [12, 13]. The numerical results are summarized in Table 1, where Dim
denotes the number of variables in the problem, Iter the number of iterations (which
is also equal to the number of Jacobian evaluations for the function $F$), NF the number
of function evaluations for the function $F$, and $\varepsilon$ the final value of $\|\min(F(\bar{x}), \bar{x})\|_{\infty}$
at the found solution $\bar{x}$.
The algorithm initially failed to solve bishop, colvdual, powell and shubik.
Therefore, we perturbed the Jacobian matrices for these problems by adding $\delta I$ to
$F'(x^k)$, where $\delta > 0$ is a small constant and $I$ is the identity matrix. We used $\delta = 10^{-5}$
for bishop, powell and shubik, and $\delta = 10^{-2}$ for colvdual. Our code failed to solve
tinloi within 500 iterations whether or not the Jacobian perturbation was used. However,
our experiments showed that it did not make any meaningful progress from the 33rd
iteration to the 500th iteration. In fact, $\varepsilon = 2.07 \times 10^{-6}$ at both iterations, and this value is
very close to the tolerance $10^{-6}$ used for termination.
All other problems were solved successfully. One may see that most problems
were solved in a small number of iterations. One important observation is that the
number of function evaluations is very close to the number of iterations for most of
the test problems. This implies that full Newton steps are taken most of the time, and
superlinear convergence follows.
Problem       Dim    Iter        NF            ε
bertsekas      15      16          17      1.08e-07
billups         1      14          15      6.09e-07
bishop       1645      83         176      4.87e-07
colvdual       20      17          18      2.40e-07
colvnlp        15      16          17      1.25e-08
cycle           1      14          15      1.23e-11
degen           2      14          15      1.21e-10
explcp         16      14          15      1.75e-10
hanskoop       14      22          33      7.03e-08
jel             6      14          15      7.20e-11
josephy         4      14          15      1.32e-10
kojshin         4      15          16      8.59e-07
mathinum        3      22          23      6.45e-07
mathisum        3      15          16      4.86e-07
nash           10      14          15      6.68e-11
pgvon106      106      39          71      3.44e-07
powell         16      15          23      2.22e-09
scarfanum      13      19          20      1.29e-08
scarfasum      14      21          23      1.83e-08
scarfbsum      40      22          32      4.12e-08
shubik         45     169        1093      7.45e-07
simple-red     13      14          15      2.27e-08
sppe           27      14          15      1.57e-10
tinloi        146   32 (500)  118 (14540)  2.07e-06
tobin          42      14          15      1.18e-10

Table 1: Numerical results for the problems from MCPLIB
6 Concluding Remarks
By introducing another variable and an additional equation, we have reformulated the
NCP as a square system of nonsmooth equations. It has been proved that this reformulation shares desirable properties of both nonsmooth equation reformulations
and smoothing techniques. The semismoothness of the equation and the smoothness
of its least squares merit function enable us to propose the damped generalized Newton method, and to prove global as well as local superlinear convergence under mild
conditions. Encouraging numerical results have been reported.
The main feature of the proposed method is the introduction of the additional
equation
$$e^{\mu} - 1 = 0.$$
As we have seen, $\{\mu^k\}$ is a monotonically decreasing positive sequence if $\mu^0 > 0$. This
property ensures the following important consequences: (i) the reformulated system
is smooth at each iteration, which might not be so important for our method since
the system is semismooth everywhere; (ii) the linearized system has a unique solution
at any iteration $k$ under mild conditions such as the $P_0$-property; (iii) the requirement that $\mu^k$
be driven to zero, which is usually needed to ensure the right kind of convergence (i.e., that
the accumulation point is a solution of the equation or a stationary point of
the least squares merit function), is satisfied automatically.
One may find other functions which can play a similar role. For example, $e^{\mu} + \mu - 1 = 0$ might be an alternative. In general, the equation $e^{\mu} - 1 = 0$ can be replaced by
the equation $\zeta(\mu) = 0$, where $\zeta$ satisfies the following conditions:

(i) $\zeta : \mathbb{R} \to \mathbb{R}$ is continuously differentiable with $\zeta'(\mu) > 0$ for any $\mu$;

(ii) $\zeta(\mu) = 0$ implies that $\mu = 0$;

(iii) $d_{\mu} = -\dfrac{\zeta(\mu)}{\zeta'(\mu)} \in (-\mu, 0)$ for any $\mu > 0$.
Some comments on the requirements imposed on the function $\zeta$ are in order. Condition (i) ensures that $G$ is smooth in $\mu$ and that $d_{\mu}$ is well-defined.
Condition (ii) guarantees that $G(\mu, x) = 0$ implies $\mu = 0$ and $x$ is a solution
of the NCP, and that a stationary point of the merit function is a solution of the NCP
under some mild conditions; see Propositions 2.1 and 3.2. Condition (iii) implies
that $0 < \mu + t d_{\mu} < \mu$ for any $t \in (0, 1]$, which is required in the Armijo line search of
Algorithm 1, and which also ensures that $\mu$ always remains positive and in a bounded
set.
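As a quick check (added here) that the suggested alternative $\zeta(\mu) = e^{\mu} + \mu - 1$ meets conditions (i)-(iii):
$$\zeta'(\mu) = e^{\mu} + 1 > 0, \qquad \zeta(\mu) = 0 \iff \mu = 0,$$
and for $\mu > 0$,
$$d_{\mu} = -\frac{e^{\mu} + \mu - 1}{e^{\mu} + 1} < 0, \qquad
d_{\mu} > -\mu \iff e^{\mu} + \mu - 1 < \mu(e^{\mu} + 1) \iff e^{\mu} - 1 < \mu e^{\mu},$$
which holds since $e^{\mu} - 1 = \int_0^{\mu} e^t \, dt < \mu e^{\mu}$.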
In [38], Qi, Sun and Zhou also treated smoothing parameters as independent variables in their smoothing methods. In their algorithm, these smoothing parameters
are updated according to both the line search rule and the quality of the approximate
solution of the problem considered; see that paper for more details. As has
been seen in Algorithm 1, our smoothing parameter is updated by the line search rule alone.
The techniques introduced in this paper seem to be applicable to variational inequalities, mathematical programs with equilibrium constraints, semidefinite mathematical programs and related problems. The technique of introducing an additional
equation may be useful in other methods for solving the NCP and related problems
whenever parameters need to be introduced. In an early version [24] of this paper,
a damped modified Gauss-Newton method and another damped generalized Newton
method based on a modified functional of $\psi$ were proposed, and global as well as local fast convergence results were established. The interested reader is referred to the
report [24] for more details.
Acknowledgements. The author is grateful to Dr. Danny Ralph for numerous
motivating discussions and many constructive suggestions and comments, and to Dr.
Steven Dirkse for providing the test problems and a MATLAB interface to access
them. The author is also thankful to the anonymous referees and Professor Liqun Qi for
their valuable comments.
References
[1] J. Burke and S. Xu, The global linear convergence of a non-interior path-following
algorithm for linear complementarity problems, Mathematics of Operations Research 23 (1998) 719-735.
[2] B. Chen and X. Chen, A global and local superlinear continuation-smoothing
method for $P_0 + R_0$ and monotone NCP, SIAM Journal on Optimization 9 (1999)
624-645.
[3] B. Chen and P.T. Harker, A continuation method for monotone variational inequalities, Mathematical Programming (Series A) 69 (1995) 237-254.
[4] B. Chen and P.T. Harker, Smooth approximations to nonlinear complementarity
problems, SIAM Journal on Optimization 7 (1997) 403-420.
[5] C. Chen and O.L. Mangasarian, Smoothing methods for convex inequalities and
linear complementarity problems, Mathematical Programming 71 (1995) 51-69.
[6] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and
mixed complementarity problems, Computational Optimization and Applications
5 (1996) 97-138.
[7] X. Chen, L. Qi and D. Sun, Global and superlinear convergence of the smoothing Newton method and its application to general box constrained variational
inequalities, Mathematics of Computation 67 (1998) 519-540.
[8] F.H. Clarke, Optimization and Nonsmooth Analysis, Wiley, New York, 1983.
[9] R.W. Cottle, J.-S. Pang and R.E. Stone, The Linear Complementarity Problem,
Academic Press, New York, 1992.
[10] T. De Luca, F. Facchinei and C. Kanzow, A semismooth equation approach to
the solution of nonlinear complementarity problems, Mathematical Programming
75 (1996) 407-439.
[11] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall, Englewood Cliffs, New Jersey, 1983.
[12] S.P. Dirkse, MCPLIB and MATLAB interface MPECLIB and MCPLIB models,
http://www.gams.com/mpec, 2001.
[13] S.P. Dirkse and M.C. Ferris, MCPLIB: A collection of nonlinear mixed complementarity problems, Optimization Methods and Software 5 (1995) 319-345.
[14] J. Eckstein and M. Ferris, Smooth methods of multipliers for complementarity
problems, Mathematical Programming 86 (1999) 65-90.
[15] F. Facchinei, H. Jiang and L. Qi, A smoothing method for mathematical programs
with equilibrium constraints, Mathematical Programming 85 (1999) 81-106.
[16] F. Facchinei and C. Kanzow, A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems, Mathematical Programming (Series B) 76 (1997) 493-512.
[17] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity
problems and a related algorithm, SIAM Journal on Optimization 7 (1997) 225-247.
[18] A. Fischer, A special Newton-type optimization method, Optimization 24 (1992)
269-284.
[19] A. Fischer, Solution of monotone complementarity problems with locally Lipschitzian functions, Mathematical Programming 76 (1997) 513-532.
[20] M. Fukushima, Z.-Q. Luo and J.-S. Pang, A globally convergent sequential
quadratic programming algorithm for mathematical programs with linear complementarity constraints, Computational Optimization and Applications 10 (1998)
5-34.
[21] S.A. Gabriel and J.J. Moré, Smoothing of mixed complementarity problems, in:
Complementarity and Variational Problems, Michael C. Ferris and Jong-Shi Pang,
eds., SIAM Publications, 1997, pp. 105-116.
[22] C. Geiger and C. Kanzow, On the resolution of monotone complementarity problems, Computational Optimization and Applications 5 (1996) 155-173.
[23] K. Hotta and A. Yoshise, Global convergence of a class of non-interior-point algorithms using Chen-Harker-Kanzow functions for nonlinear complementarity problems, Mathematical Programming 86 (1999) 105-133.
[24] H. Jiang, Smoothed Fischer-Burmeister Equation Methods for the complementarity problem, Manuscript, Department of Mathematics, The University of Melbourne, June 1997.
[25] H. Jiang, Global convergence analysis of the generalized Newton and GaussNewton methods for the Fischer-Burmeister equation for the complementarity
problem, Mathematics of Operations Research 24 (1999) 529-543.
[26] H. Jiang, M. Fukushima, L. Qi and D. Sun, A trust region method for solving
generalized complementarity problems, SIAM Journal on Optimization 8 (1998)
140-157.
[27] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarity problems, SIAM Journal on Control and Optimization 35 (1997)
178-193.
[28] C. Kanzow, An unconstrained optimization technique for large-scale linearly constrained convex minimization problems, Computing 53 (1994) 101-117.
[29] C. Kanzow, Some noninterior continuation methods for linear complementarity
problems, SIAM Journal on Matrix Analysis and Applications 17 (1996) 851-868.
[30] C. Kanzow, A new approach to continuation methods for complementarity problems with uniform P-functions, Operations Research Letters 20 (1997) 85-92.
[31] C. Kanzow and H. Jiang, A continuation method for (strongly) monotone variational inequalities, Mathematical Programming 81 (1998) 103-125.
[32] M. Kojima, N. Megiddo and S. Mizuno, A general framework of continuation
methods for complementarity problems, Mathematics of Operations Research 18
(1993) 945-963.
[33] M. Kojima, S. Mizuno and T. Noma, Limiting behaviour of trajectories generated
by a continuation method for monotone complementarity problems, Mathematics
of Operations Research 15 (1990) 662-675.
[34] J.J. Moré and W.C. Rheinboldt, On $P$- and $S$-functions and related classes of
$n$-dimensional nonlinear mappings, Linear Algebra and its Applications 6 (1973)
45-68.
[35] J.-S. Pang, Complementarity problems, in: R. Horst and P. Pardalos, eds., Handbook of Global Optimization, Kluwer Academic Publishers, Boston, 1994, pp. 271-338.
[36] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations,
Mathematics of Operations Research 18 (1993) 227-244.
[37] L. Qi, Regular pseudo-smooth NCP and BVIP functions and globally and quadratically convergent generalized Newton methods for complementarity and variational inequality problems, Mathematics of Operations Research 24 (1999) 440-471.
[38] L. Qi, D. Sun and G. Zhou, A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequalities, Mathematical Programming 87 (2000) 1-35.
[39] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.
[40] S.M. Robinson, Strongly regular generalized equations, Mathematics of Operations
Research 5 (1980) 43-62.
[41] H. Sellami and S. Robinson, Implementations of a continuation method for normal
maps, Mathematical Programming (Series B) 76 (1997) 563-578.
[42] P. Tseng, Growth behavior of a class of merit functions for the nonlinear complementarity problem, Journal of Optimization Theory and Applications 89 (1996)
17-38.
[43] P. Tseng, An infeasible path-following method for monotone complementarity
problems, SIAM Journal on Optimization 7 (1997) 386-402.
[44] S. Xu, The global linear convergence of an infeasible non-interior path-following
algorithm for complementarity problems with uniform $P$-functions, Mathematical
Programming 87 (2000) 501-517.
[45] N. Yamashita and M. Fukushima, Modied Newton methods for solving a semismooth reformulation of monotone complementarity problems, Mathematical Programming 76 (1997) 469-491.
Attachment
Proof of Theorem 4.1:
(i) The generalized Newton direction in Step 2 is well-defined by the solvability
assumption on the generalized Newton equation. By the generalized Newton equation
and the smoothness of $\Theta$, we have
$$\nabla \Theta(z^k)^T d^k = G(z^k)^T V_k d^k = -\|G(z^k)\|^2 = -2\Theta(z^k) < 0.$$
In view of the facts that $d^k \neq 0$ and that $d = 0$ is not a solution of the generalized Newton
equation, it follows that $d^k$ is a descent direction for the merit function $\Theta$ at $z^k$. Therefore,
the well-definedness of the line search step (Step 3), and hence of the algorithm, follows from
the differentiability of the merit function $\Theta$.

Without loss of generality, we may assume that $\bar{z}$ is the limit of the subsequence $\{z^k\}_{k \in K}$, where $K$ is a subset of $\{1, 2, \ldots\}$. If $\{\alpha_k\}_{k \in K}$ is bounded
away from zero, then, using a standard argument based on the decrease of the merit
function at each iteration and the nonnegativity of the merit function over $\mathbb{R}^{n+1}$,
we obtain $\sum_{k \in K} -\sigma \alpha_k \nabla \Theta(z^k)^T d^k < +\infty$, which implies that $\sum_{k \in K} \Theta(z^k) < +\infty$. Hence,
$\lim_{k \to \infty, k \in K} \Theta(z^k) = \Theta(\bar{z}) = 0$ and $\bar{z}$ is a solution of (3). On the other hand, if
$\{\alpha_k\}_{k \in K}$ has a subsequence converging to zero, we may pass to the subsequence and
assume that $\lim_{k \to \infty, k \in K} \alpha_k = 0$. From the line search step, we may show that for all
sufficiently large $k \in K$,
$$\Theta(z^k + \alpha_k d^k) - \Theta(z^k) \le \sigma \alpha_k \nabla \Theta(z^k)^T d^k,$$
$$\Theta(z^k + \rho^{-1} \alpha_k d^k) - \Theta(z^k) > \sigma \rho^{-1} \alpha_k \nabla \Theta(z^k)^T d^k.$$
Since $\{d^k\}$ is bounded, by passing to a further subsequence we may assume that $\lim_{k \to \infty, k \in K} d^k = \bar{d}$. By some algebraic manipulations and passing to the limit, we obtain
$$\nabla \Theta(\bar{z})^T \bar{d} = \sigma \nabla \Theta(\bar{z})^T \bar{d},$$
which means that $\nabla \Theta(\bar{z})^T \bar{d} = 0$. By the generalized Newton equation, it follows
that
$$G(z^k)^T G(z^k) + G(z^k)^T V_k d^k = G(z^k)^T G(z^k) + \nabla \Theta(z^k)^T d^k = 0.$$
This shows that $\lim_{k \to \infty, k \in K} G(z^k)^T G(z^k) = G(\bar{z})^T G(\bar{z}) = 0$, namely, $\bar{z}$ is a solution
of (3).

(ii) Since $\partial G(\bar{z})$ is nonsingular, it follows that
$$\|(V_k)^{-1}\| \le c$$
for some positive constant $c$ and all sufficiently large $k \in K$. The generalized Newton
equation then implies that $\{d^k\}_{k \in K}$ is bounded. Therefore, (i) implies that $G(\bar{z}) = 0$.

We next turn to the convergence rate. From the semismoothness of $G$ at $\bar{z}$, for any
sufficiently large $k \in K$,
$$G(z^k + d^k) = G(\bar{z} + (z^k + d^k - \bar{z})) - G(\bar{z}) = U(z^k + d^k - \bar{z}) + o(\|z^k + d^k - \bar{z}\|),$$
where $U \in \partial G(z^k + d^k)$, and
$$G(z^k) = G(\bar{z} + (z^k - \bar{z})) - G(\bar{z}) = V(z^k - \bar{z}) + o(\|z^k - \bar{z}\|),$$
where $V \in \partial G(z^k)$. Let $V = V_k$ in the last equality. Then the generalized Newton
equation and the uniform nonsingularity of $V_k$ ($k \in K$) imply that
$$\|z^k + d^k - \bar{z}\| = o(\|z^k - \bar{z}\|) \eqno(5)$$
and $\|d^k\| = \|z^k - \bar{z}\| + o(\|z^k - \bar{z}\|)$, which implies that $\lim_{k \to \infty, k \in K} d^k = 0$. Consequently,
it follows from the nonsingularity of $\partial G(\bar{z})$ that, for all sufficiently large $k \in K$,
$$\lim_{k \to \infty, k \in K} \frac{\|G(z^k)\|}{\|z^k - \bar{z}\|} > 0, \qquad
\lim_{k \to \infty, k \in K} \frac{\|G(z^k + d^k)\|}{\|z^k + d^k - \bar{z}\|} > 0.$$
Hence, (5) shows that
$$\|G(z^k + d^k)\| = o(\|G(z^k)\|).$$
By the generalized Newton equation and $\sigma \in (0, \frac{1}{2})$, we obtain that $\alpha_k = 1$ for all
sufficiently large $k \in K$, i.e., the full generalized Newton step is taken. In other
words, when $k$ is sufficiently large, both $z^k$ and $z^k + d^k$ are in a small neighborhood of
$\bar{z}$ by (5), and the damped Newton method becomes the full-step generalized Newton method.
Convergence and the convergence rate then follow from Theorem 3.2 of [39]. □