124   IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-18, NO. 2, APRIL 1973
Dimitri P. Bertsekas (S'70-M'71) was born in Athens, Greece, in 1942. He received the Mechanical and Electrical Engineering Diploma from the National Technical University of Athens, Athens, Greece, in 1965, the M.S.E.E. degree from George Washington University, Washington, D.C., in 1969, and the Ph.D. degree in system science from the Massachusetts Institute of Technology, Cambridge, in 1971.
From 1966 to 1967 he performed research at the National Technical University of Athens, and from 1967 to 1969 he was with the U.S. Army Research Laboratories, Fort Belvoir, Va. In the summer of 1971 he worked at Systems Control, Inc., Palo Alto, Calif. Since 1971 he has been an Acting Assistant Professor in the Department of Engineering-Economic Systems, Stanford University, Stanford, Calif., and has taught courses in optimization by vector space methods and nonlinear programming. His present and past research interests include the areas of estimation and control of uncertain systems, minimax problems, dynamic programming, optimization problems with nondifferentiable cost functionals, and nonlinear programming algorithms.
Ian B. Rhodes (M'67) was born in Melbourne, Australia, on May 29, 1941. He received the B.E. and M.Eng.Sc. degrees in electrical engineering from the University of Melbourne, Melbourne, Australia, in 1963 and 1965, respectively, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, Calif., in 1968.
In January 1968 he was appointed Assistant Professor of Electrical Engineering at the Massachusetts Institute of Technology, Cambridge, and taught there until September 1970, when he joined the faculty of Washington University, St. Louis, Mo., as Associate Professor of Engineering and Applied Science in the graduate program of Control Systems Science and Engineering. His research interests lie in mathematical system theory and its applications.
Dr. Rhodes is a member of the Society for Industrial and Applied Mathematics and Sigma Xi. He is an Associate Editor of the International Federation of Automatic Control journal Automatica, an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, and Chairman of the Technical Committee on Large Systems, Differential Games of the IEEE Control Systems Society.
Optimal Stochastic Linear Systems with Exponential
Performance Criteria and Their Relation to
Deterministic Differential Games
DAVID H. JACOBSON
Abstract—Two stochastic optimal control problems are solved whose performance criteria are the expected values of exponential functions of quadratic forms. The optimal controller is linear in both cases but depends upon the covariance matrix of the additive process noise, so that the certainty equivalence principle does not hold. The controllers are shown to be equivalent to those obtained by solving a cooperative and a noncooperative quadratic (differential) game, and this leads to some interesting interpretations and observations. Finally, some stability properties of the asymptotic controllers are discussed.
Manuscript received February 10, 1972; revised October 20, 1972. Paper recommended by I. B. Rhodes, Chairman of the IEEE S-CS Large Systems, Differential Games Committee. This work was supported by the Joint Services Electronics Program under Contracts N00014-67-A-0295-0006, 0003, and 0008.
The author was with the Division of Engineering and Applied Physics, Harvard University, Cambridge, Mass. He is now with the Department of Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa.

¹This is a problem with linear dynamics disturbed by additive Gaussian noise, together with a performance criterion which is the expected value of a positive-semidefinite quadratic form.
²This is the same as the LQG problem, but with noise set to zero.

I. INTRODUCTION

THE SO-CALLED linear-quadratic-Gaussian (LQG) problem¹ of optimal stochastic control [1] possesses a number of interesting features. First, the optimal feedback controller is a linear (time-varying) function of the state variables. Second, this linear controller is identical to that which is obtained by neglecting the additive Gaussian noise and solving the resultant deterministic linear-quadratic problem (LQP)² (certainty equivalence principle). Thus the controller for the stochastic system is independent of the statistics of the additive noise. This is appealing for small noise intensity, but for large noise (large covariance) one has the intuitive feeling that perhaps a different controller would be more appropriate.

In this paper we consider optimal control of linear systems disturbed by additive Gaussian noise, whose associated performance criteria are the expected values of exponential functions of negative-semidefinite and positive-semidefinite quadratic forms. We shall refer to the former case as the LE−G problem and the latter as the LE+G problem, and to their deterministic counterparts as LE−P and LE+P, respectively. In the deterministic cases LE±P, the solutions are identical to that for the LQP (the natural logarithm of the exponential performance criteria yields quadratic forms). However, when noise is present, the optimal LE±G controllers are different from that of the LQG problem. In particular, though, as in the case of the LQG problem, these are linear functions of the state
variables, they depend explicitly upon the covariance matrices of the additive Gaussian noise. For small noise intensity (small covariance) the solutions of the LE±G and LQG problems are close, but for large noise intensity there is a marked difference. In particular, as the noise intensity tends to infinity the optimal gains for the LE−G problem tend to zero; intuitively this implies that if the random input is "very wild" little can be gained (in the sense of reducing the value of this particular performance criterion) by controlling the system. In the LE+G problem the optimal controller ceases to exist if the noise intensity is sufficiently large (that is, the performance criterion becomes infinite, regardless of the control input).

These new controllers, which retain the simplicity of the solution of the LQG problem, could prove to be attractive in certain applications.

In addition to formulating and solving the LE±G problems, we demonstrate that their solutions are equivalent to the solutions of cooperative and noncooperative linear-quadratic zero-sum (differential) games. These equivalences provide interpretations for the stochastic controllers in terms of solutions of deterministic zero-sum games, and vice versa. It is hoped that these equivalences will aid in the quest for new formulations and (proofs of existence of) solutions of stochastic nonlinear systems and nonlinear differential games.

We investigate briefly the infinite-time version of the LE±G problems and point out that the steady-state optimal controller for the LE−G problem is not necessarily stable. On the other hand, the steady-state optimal controller for the LE+G problem, if it exists, is stable. Thus the LE+G formulation may be preferable in the infinite-time case.
II. FORMULATION OF DISCRETE-TIME LE±G PROBLEMS

A. Dynamics

We shall consider a linear discrete-time dynamic system described by

    x_{k+1} = A_k x_k + B_k u_k + Γ_k ω_k;   k = 0, ..., N − 1;   x_0 given,   (1)

where the "state" vector x_k ∈ R^n, the control vector u_k ∈ R^m, and the Gaussian noise input ω_k ∈ R^q. The matrices A_k, B_k, Γ_k have appropriate dimensions and depend upon the time k.

B. Noise

The noise input is a sequence {ω_k} of independently distributed Gaussian random variables having probability density

    p_ω(ω_0, ..., ω_{N−1}) = ∏_{k=0}^{N−1} p(ω_k; k)   (2)

where p_ω: R^{qN} → R^+ and p: R^q × I_+ → R^+ are given by

    p(ω_k; k) = [(2π)^q |P_k^{−1}|]^{−1/2} exp{−½ ω_k^T P_k ω_k};   k = 0, ..., N − 1,   (3)

and

    P_k > 0 (positive definite);   k = 0, ..., N − 1.   (4)

Note that

    E[ω_k] = 0,   E[ω_k ω_k^T] = P_k^{−1};   k = 0, ..., N − 1,   (5)

where E denotes expectation.

C. Performance Criterion

The performance of the stochastic linear systems is measured by the criterion (with σ = − for LE−G and σ = + for LE+G)

    V^σ(x_0) ≜ σ E[ ∏_{k=0}^{N−1} φ_x^σ(x_k; k) φ_u^σ(u_k; k) · φ_x^σ(x_N; N) ]   (6)

where

    φ_x^σ(x_k; k) = exp{σ ½ x_k^T Q_k x_k};   k = 0, ..., N,   (7)
    φ_u^σ(u_k; k) = exp{σ ½ u_k^T R_k u_k};   k = 0, ..., N − 1,   (8)

and

    Q_k ≥ 0 (positive semidefinite);   k = 0, ..., N,   (9)
    R_k > 0 (positive definite);   k = 0, ..., N − 1.   (10)

Note that (6) can be written as

    V^σ(x_0) = σ E[ exp{ σ ½ ( ∑_{k=0}^{N−1} (x_k^T Q_k x_k + u_k^T R_k u_k) + x_N^T Q_N x_N ) } ].   (11)

D. Problem

We are required to find a policy

    u_k^σ = C_k^σ(X_k),   k = 0, ..., N − 1;   X_k ≜ {x_0, x_1, ..., x_k},   (12)

which minimizes performance criterion (11). Note that V^−(x_0) and V^+(x_0) for arbitrary controls {u_k} are bounded as follows:

    −1 ≤ V^−(x_0) ≤ 0,   1 ≤ V^+(x_0) ≤ ∞.   (13)

III. FORMULATION OF LE±P

If no noise is present,

    ω_k ≡ 0;   k = 0, ..., N − 1.   (14)

Minimization of (11) is then equivalent to minimization of

    ½ [ ∑_{k=0}^{N−1} (x_k^T Q_k x_k + u_k^T R_k u_k) + x_N^T Q_N x_N ]   (15)

subject to

    x_{k+1} = A_k x_k + B_k u_k;   k = 0, ..., N − 1,   (16)

which is a standard LQP. Thus LE−P and LE+P are equivalent, and both will be referred to as LEP. As the solution of the LQP is well known, we state it now without proof. The optimal controller for the LEP (LQP) is the linear feedback law

    u_k = −D_k x_k;   k = 0, ..., N − 1,

where D_k ≜ (R_k + B_k^T S_{k+1} B_k)^{−1} B_k^T S_{k+1} A_k and S_k is generated backward from S_N = Q_N by the standard Riccati difference equation.
IV. SOLUTION OF DISCRETE-TIME LE±G PROBLEMS

The dynamic programming equation for criterion (11) is

    J^σ(x_k; k) = min_{u_k} E[ φ_x^σ(x_k; k) φ_u^σ(u_k; k) J^σ(x_{k+1}; k+1) ],   J^σ(x_N; N) = σ φ_x^σ(x_N; N).   (28)

Its solution is of the form

    J^σ(x_k; k) = σ F_k^σ exp{σ ½ x_k^T W_k^σ x_k}   (30)

where the scalars F_k^σ and matrices W_k^σ are generated backward by

    F_k^σ = F_{k+1}^σ |P_k|^{1/2} |P_k − σ Γ_k^T W_{k+1}^σ Γ_k|^{−1/2},   (31)
    W_k^σ = Q_k + A_k^T W̃_{k+1}^σ A_k − A_k^T W̃_{k+1}^σ B_k (R_k + B_k^T W̃_{k+1}^σ B_k)^{−1} B_k^T W̃_{k+1}^σ A_k,   (32)

where

    W̃_{k+1}^σ ≜ W_{k+1}^σ + σ W_{k+1}^σ Γ_k (P_k − σ Γ_k^T W_{k+1}^σ Γ_k)^{−1} Γ_k^T W_{k+1}^σ,   (33)

with boundary condition

    W_N^σ = Q_N.   (34)

In addition, we have that

    F_N^σ = 1   (35)

and the optimal policy is

    u_k^σ = −C_k^σ x_k   (36)

where

    C_k^σ ≜ (R_k + B_k^T W̃_{k+1}^σ B_k)^{−1} B_k^T W̃_{k+1}^σ A_k;   k = 0, ..., N − 1.   (37)

In order to prove that (30) and (36) solve (28), we need the following probably well-known but underexploited lemma.

Lemma: If (P_k − σ Γ_k^T W_{k+1}^σ Γ_k) > 0, then

    E[ exp{σ ½ (A_k x_k + B_k u_k + Γ_k ω_k)^T W_{k+1}^σ (A_k x_k + B_k u_k + Γ_k ω_k)} ]
        = |P_k|^{1/2} |P_k − σ Γ_k^T W_{k+1}^σ Γ_k|^{−1/2} exp{σ ½ (A_k x_k + B_k u_k)^T W̃_{k+1}^σ (A_k x_k + B_k u_k)}   (38)

where W̃_{k+1}^σ is defined in (33).

Proof: See the Appendix.

Substituting (30) into (28) and using the Lemma and (35), we obtain

    σ F_k^σ exp{σ ½ x_k^T W_k^σ x_k} = min_{u_k} σ F_{k+1}^σ φ_x^σ(x_k; k) φ_u^σ(u_k; k) |P_k|^{1/2} |P_k − σ Γ_k^T W_{k+1}^σ Γ_k|^{−1/2} exp{σ ½ (A_k x_k + B_k u_k)^T W̃_{k+1}^σ (A_k x_k + B_k u_k)}.   (39)

Equation (39) is satisfied by (32), (36), and (37), so that the LE±G problem is indeed solved. As in the LEP (LQP), it is easy to verify that, under assumptions (4), (9), and (10), W_k^− and W̃_k^− are positive semidefinite for k = 0, ..., N, so that

    (R_k + B_k^T W̃_{k+1}^− B_k) > 0,

which ensures that (32), (33), (35), and (37) are well defined for negative σ.³

³Alternatively, the development could be continued using (28), and identical results would be obtained.
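The backward pass (32)-(37) is mechanical to implement. The sketch below is illustrative code, not the paper's; representing the time-varying matrices as length-N Python lists is an assumption of this sketch. It generates W_k^σ and the gains C_k^σ, checking the Lemma's positivity condition at each step.

```python
import numpy as np

def leg_gains(sigma, A, B, G, Q, R, QN, P):
    """Backward pass (32)-(37).  A, B, G, Q, R, P are length-N lists of
    matrices indexed by k; returns the gains C_k and the matrices W_k."""
    N = len(A)
    Ws = [None] * (N + 1)
    Ws[N] = QN                                   # (34): W_N = Q_N
    gains = [None] * N
    for k in range(N - 1, -1, -1):
        W = Ws[k + 1]
        M = P[k] - sigma * G[k].T @ W @ G[k]     # Lemma condition (38)
        assert np.all(np.linalg.eigvalsh(M) > 0), "criterion infinite"
        Wt = W + sigma * W @ G[k] @ np.linalg.inv(M) @ G[k].T @ W   # (33)
        C = np.linalg.solve(R[k] + B[k].T @ Wt @ B[k],
                            B[k].T @ Wt @ A[k])                     # (37)
        Ws[k] = Q[k] + A[k].T @ Wt @ (A[k] - B[k] @ C)              # (32)
        gains[k] = C
    return gains, Ws

I = np.eye(1)
# near-zero covariance (huge precision): gains approach the LQP gains D_k
g_plus, _ = leg_gains(+1.0, [I]*3, [I]*3, [I]*3, [I]*3, [I]*3, I,
                      [1e8 * I]*3)
assert abs(g_plus[0][0, 0] - 1.6 / 2.6) < 1e-4   # D_0 of this scalar LQP
# huge covariance with sigma = -1: gains collapse toward zero
g_minus, _ = leg_gains(-1.0, [I]*3, [I]*3, [I]*3, [I]*3, [I]*3, I,
                       [1e-6 * I]*3)
assert abs(g_minus[0][0, 0]) < 1e-3
```

The two assertions mirror the limiting cases discussed in Section V: small covariance recovers the LQP gains, while very wild noise drives the LE−G gains to zero.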
V. PROPERTIES OF SOLUTIONS OF DISCRETE-TIME LE±G PROBLEMS

A. The LE−G Problem

The optimal feedback controller for the LE−G problem is a linear function of the system state,

    u_k^− = −C_k^− x_k;   k = 0, ..., N − 1.   (40)

The main difference between this and the feedback law for the LQG problem is that C_k^− depends upon P_k^{−1}, the covariance matrix of the Gaussian additive disturbance ω_k. In the LQG case the optimal feedback law is independent of the covariance of the input noise and, indeed, is the same as that for the deterministic LQP (so-called certainty equivalence principle). Here, in the case where our criterion is the expected value of minus an exponential function of a negative-semidefinite quadratic form, the certainty equivalence principle does not hold. Of major interest are the cases in which

    C_k^− ≠ D_k;   k = 0, ..., N − 1,   (41)

for which the new controller (36) offers an alternative to the standard LQG solution.

It is interesting to investigate two limiting cases: the first, in which λ_min(P_k) → ∞ (input ω_k → 0, k = 0, ..., N − 1); and the second, in which λ_min(P_k^{−1}) → ∞ (input "infinitely wild").

Case 1 (λ_min(P_k) → ∞; k = 0, ..., N − 1): In this case it is clear⁴ from (30), (32), and (33) that

    C_k^− → D_k;   k = 0, ..., N − 1,   (42)

the optimal gains for the LQP (LEP). Note, from (30) and (33), that

    J^−(x_k; k) → −exp{−½ x_k^T W_k^− x_k},   k = 0, ..., N.   (43)

Thus, for small noise intensities (P_k^{−1} small, k = 0, ..., N − 1), the solution of the LE−G problem is close to that of the LEP, LQP, and LQG problem.

Case 2 (λ_min(P_k^{−1}) → ∞; k = 0, ..., N − 1): Here we shall assume that

    Γ_k^T Q_{k+1} Γ_k > 0,   k = 0, ..., N − 1,   (44)
    0 ≤ P_k^{−1} < ∞;   k = 0, ..., N − 1,   (45)

so that, from (31)-(33),

    Γ_k^T W_{k+1}^− Γ_k > 0;   k = 0, ..., N − 1,   (46)

and, from (30) and (35),

    J^−(x_k; k) → 0;   k = 0, ..., N − 1.   (47)

As P_k → 0, then, we have

    W̃_{k+1}^− → W_{k+1}^− − W_{k+1}^− Γ_k (Γ_k^T W_{k+1}^− Γ_k)^{−1} Γ_k^T W_{k+1}^−;   k = 0, ..., N − 1,   (48)

so that

    C_k^− → 0;   k = 0, ..., N − 1.   (49)

(Note that, if Γ_k has rank n for k = 0, ..., N − 1, the right-hand side of (48) vanishes.) An explanation for (49) is: if all components of x_k are disturbed by an infinitely wild additive noise, then there is no point [as far as performance criterion (6) is concerned] in exercising control to try to counteract these infinite unpredictable disturbances.

B. The LE+G Problem

Here, from (33) with σ = +,

    W̃_{k+1}^+ = W_{k+1}^+ + W_{k+1}^+ Γ_k (P_k − Γ_k^T W_{k+1}^+ Γ_k)^{−1} Γ_k^T W_{k+1}^+;   k = 0, ..., N − 1.   (50)

As in the LE−G problem, the certainty equivalence principle does not hold because C_k^+ depends upon the covariance of the additive process noise. We again consider the two limiting cases of zero noise and "infinite" noise.

Case 1 (λ_min(P_k) → ∞; k = 0, ..., N − 1): In this case, as the covariance matrix tends to zero, we see that

    C_k^+ → D_k;   k = 0, ..., N − 1,   (51)

and

    J^+(x_k; k) → exp{+½ x_k^T W_k^+ x_k},   k = 0, ..., N,   (52)

so that for small noise intensity the solution of the LE+G problem is close to that of the LEP, LQP, and LQG problem.

Case 2 (λ_min(P_k^{−1}) → ∞; k = 0, ..., N − 1): For P_k sufficiently small (i.e., large covariance), J^+(x_k; k) can cease to exist. To see this, let us assume that

    Γ_k^T Q_{k+1} Γ_k > 0;   k = 0, ..., N − 1,   (53)

and that

    P_j − Γ_j^T W_{j+1}^+ Γ_j > 0;   j = k + 1, ..., N − 1.   (54)

From (53), (54), (32), and (33) we have that

    Γ_k^T W_{k+1}^+ Γ_k > 0,   (55)

so that for P_k sufficiently small

    P_k − Γ_k^T W_{k+1}^+ Γ_k ≯ 0,   (56)

which implies that the left-hand side of (38) is infinite. Clearly, then,

    J^+(x_k; k) is infinite.   (57)

Since k is arbitrary, k ∈ {0, ..., N − 1}, we can conclude that if the noise covariance is sufficiently large, the performance criterion V^+(x_0) is infinite, regardless of the choice of controls {u_k}. We shall have more to say about this interesting case when we treat the continuous-time LE+G problem in Section VIII.

⁴These limiting cases can be argued rigorously; the arguments are straightforward and are left to the reader.
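The breakdown of the LE+G problem in Case 2 is easy to observe numerically. In this scalar sketch (illustrative code, not the paper's; all system parameters set to 1 are assumptions), the recursion is run with σ = + and reports whether the Lemma's condition P_k − Γ_k W_{k+1} Γ_k > 0 survives the whole horizon; for sufficiently wild noise (small precision p) it fails, i.e., the criterion is infinite as in (56) and (57).

```python
def exists_leplus(p, A=1.0, B=1.0, G=1.0, Q=1.0, R=1.0, QN=1.0, N=5):
    """Scalar LE+G recursion; returns False as soon as the condition
    P_k - G*W_{k+1}*G > 0 of the Lemma fails, i.e. the criterion is
    infinite, as in (56)-(57)."""
    W = QN
    for _ in range(N):
        M = p - G * W * G                    # Lemma condition, sigma = +
        if M <= 0:
            return False
        Wt = W + W * G * (1.0 / M) * G * W   # (33)
        C = B * Wt * A / (R + B * Wt * B)    # (37)
        W = Q + A * Wt * (A - B * C)         # (32)
    return True

# large precision p (small covariance): the controller exists;
# small p (wild noise): the criterion becomes infinite (Case 2)
assert exists_leplus(50.0)
assert not exists_leplus(1.0)
```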
VI. THE DISCRETE-TIME LE±G PROBLEMS AND DETERMINISTIC GAMES

A. The LE−G Problem

The solution of the LE−G problem is, by inspection (or short calculation), equivalent to the solution of the following cooperative deterministic game (LQP):

    ½ x_k^T W_k^− x_k = min_{{u_i}} min_{{α_i}} [ ½ ∑_{i=k}^{N−1} (x_i^T Q_i x_i + u_i^T R_i u_i + α_i^T P_i α_i) + ½ x_N^T Q_N x_N ]   (58)

subject to the dynamic constraint

    x_{i+1} = A_i x_i + B_i u_i + Γ_i α_i;   i = k, ..., N − 1;   x_k given.   (59)

It turns out that the optimal u-gains of this game are precisely the gains C_k^− of (37). Note that in the above formulation we determine optimal control laws (policies)

    u_k^− = −C_k^− x_k,   α_k^− = −Λ_k^− x_k;   k = 0, ..., N − 1.   (61)

We now have a new interpretation for the linear-quadratic game. If player u_k assumes that player α_k will cooperate in minimizing the quadratic criterion (even though u_k knows that α_k behaves as a Gaussian random variable), then the feedback controller (policy) that is obtained for u_k, upon solving (58) and (59), namely,

    u_k^− = −C_k^− x_k;   k = 0, ..., N − 1,   (62)

is optimal also for the LE−G problem. Thus the policy for u_k obtained by treating α_k as a cooperative player makes sense when interpreted as the solution of the stochastic LE−G problem.

B. The LE+G Problem

Here, the deterministic game that has an equivalent solution is noncooperative, namely,

    ½ x_k^T W_k^+ x_k = min_{{u_i}} max_{{α_i}} [ ½ ∑_{i=k}^{N−1} (x_i^T Q_i x_i + u_i^T R_i u_i − α_i^T P_i α_i) + ½ x_N^T Q_N x_N ]   (63)

subject to (59), where the feedback laws (policies) u_k^+ and α_k^+ are determined as

    u_k^+ = −C_k^+ x_k,   α_k^+ = −Λ_k^+ x_k;   k = 0, ..., N − 1.   (64)

It is well known that if

    P_k − Γ_k^T W_{k+1}^+ Γ_k > 0;   k = 0, ..., N − 1,   (65)

then

    ½ x_k^T W_k^+ x_k = max_{{α_i}} min_{{u_i}} [ ½ ∑_{i=k}^{N−1} (x_i^T Q_i x_i + u_i^T R_i u_i − α_i^T P_i α_i) + ½ x_N^T Q_N x_N ].   (66)

If the determinant of the left-hand side of (65) is nonzero but the matrix fails to be positive definite then, as is well known, (63) ceases to be bounded. However, if the left-hand side of (65) is singular for some values of k ∈ {0, ..., N − 1} then (63) may exist. Thus, provided

    |P_k − Γ_k^T W_{k+1}^+ Γ_k| ≠ 0;   k = 0, ..., N − 1,   (67)

we have that J^+(x_k; k) is finite if and only if (63) is finite.

Our interpretation of the above noncooperative deterministic game is as follows: If player u_k assumes that α_k will not cooperate in minimizing the quadratic criterion (even though u_k knows that α_k behaves as a Gaussian random variable), then the feedback controller (policy) that is obtained for u_k, upon solving (63), namely,

    u_k^+ = −C_k^+ x_k;   k = 0, ..., N − 1,   (68)

is optimal for the LE+G problem. Thus this rather conservative game formulation, in which the noise α_k is treated as a noncooperative player, gives rise to a control policy which solves the LE+G stochastic control problem. When looked at from this viewpoint, the min-max game solution for u_k ("worst case design") does not appear to be too pessimistic, since the performance criterion of the LE+G problem is rather appealing.

VII. FORMULATION OF CONTINUOUS-TIME LE±G PROBLEMS

In continuous time, the LE±G problems take the form (with σ = − for LE−G and σ = + for LE+G): minimize

    V^σ = σ E[ exp{ σ ½ ( ∫_{t_0}^{t_f} (x^T Q x + u^T R u) dt + x^T(t_f) Q_f x(t_f) ) } ]   (69)

subject to

    ẋ = Ax + Bu + Γα;   x(t_0) given,   (70)

where, for notational simplicity, time dependence of the variables has been suppressed⁵ and where α(·) is a Gaussian white-noise process having

    E[α(t)] = 0;   t ∈ [t_0, t_f],   (71)
    E[α(t) α^T(s)] = P^{−1} δ(t − s);   t, s ∈ [t_0, t_f],   (72)

where δ is the Dirac delta function.

⁵Note that Q ≥ 0, R > 0, P > 0 for all t ∈ [t_0, t_f], and that Q_f ≥ 0.
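The inner extremization over the noise player that underlies the game equivalences of Section VI can be checked directly in a scalar one-stage case: with p − ΓWΓ > 0, maximizing ½W(z + Γa)² − ½pa² over a yields ½W̃⁺z² with W̃⁺ as in (50) (equivalently, (33) with σ = +). A small numerical check with illustrative values:

```python
import numpy as np

# One-stage check of the saddle construction behind (63)/(50), scalar case:
# max over a of 0.5*W*(z + G*a)**2 - 0.5*p*a**2 equals 0.5*Wt*z**2 with
# Wt = W + W*G*(p - G*W*G)**-1*G*W, provided p - G*W*G > 0.
W, G, p, z = 1.5, 1.0, 4.0, 0.7
grid = np.linspace(-10.0, 10.0, 200_001)         # brute-force search over a
val = np.max(0.5 * W * (z + G * grid) ** 2 - 0.5 * p * grid ** 2)
Wt = W + W * G * (1.0 / (p - G * W * G)) * G * W
assert abs(val - 0.5 * Wt * z ** 2) < 1e-6
```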
Note that in (69) we seek an optimal control policy in feedback (policy) form

    u^σ(x, t) = C^σ(x, t);   t ∈ [t_0, t_f],   (73)

where C^σ: R^n × R^1 → R^m is a measurable function of its arguments.

VIII. SOLUTION OF CONTINUOUS-TIME LE±G PROBLEMS AND RELATION TO DIFFERENTIAL GAMES

A. Solution of LE±G Problems

We can solve the continuous-time LE±G problems either by formally taking the limit of the solutions for the discrete-time cases or by solving the "generalized" Hamilton-Jacobi-Bellman equation

    −∂J^σ/∂t (x, t) = min_u [ σ ½ (x^T Q x + u^T R u) J^σ(x, t) + J_x^σ(x, t)(Ax + Bu) + ½ tr{Γ P^{−1} Γ^T J_xx^σ(x, t)} ],

where we require the optimal controls in feedback (policy) form. Using either method, we find that the optimal policy is

    u^σ(x, t) = −R^{−1} B^T S^σ(t) x,

where S^σ(t) satisfies the Riccati differential equation

    −Ṡ^σ = Q + S^σ A + A^T S^σ − S^σ (B R^{−1} B^T − σ Γ P^{−1} Γ^T) S^σ;   S^σ(t_f) = Q_f.

B. Relation to Continuous-Time Differential Games

By inspection we see that the optimal controller for the LE−G problem (σ negative) is obtained from the solution of the following cooperative differential game:

    ½ x^T(t) S^−(t) x(t) = min_{u(·)} min_{α(·)} [ ∫_t^{t_f} ½ (x^T Q x + u^T R u + α^T P α) dt + ½ x^T(t_f) Q_f x(t_f) ]   (82)

subject to

    ẋ = Ax + Bu + Γα;   x(t) given.   (83)

Because of our assumptions of positive (semi)definiteness of Q, R, P, and Q_f, it is known that S^−(t) exists for all t ∈ [t_0, t_f], so that (69) is well posed.

In the case of the LE+G problem, the appropriate differential game is noncooperative, namely,

    ½ x^T(t) S^+(t) x(t) = min_{u(·)} max_{α(·)} [ ∫_t^{t_f} ½ (x^T Q x + u^T R u − α^T P α) dt + ½ x^T(t_f) Q_f x(t_f) ]   (88)

subject to (83). The optimal feedback laws are

    u^+ = −R^{−1} B^T S^+(t) x   and   α^+ = P^{−1} Γ^T S^+(t) x,

provided that the Riccati differential equation

    −Ṡ^+ = Q + S^+ A + A^T S^+ − S^+ (B R^{−1} B^T − Γ P^{−1} Γ^T) S^+;   S^+(t_f) = Q_f   (89)

has a solution in [t, t_f]. Note that, by standard results on Riccati differential equations, (89) has a solution for all t ∈ [t_0, t_f] if

    B R^{−1} B^T − Γ P^{−1} Γ^T ≥ 0,   t ∈ [t_0, t_f],   (90)

and so (90) guarantees the existence of J^+(x, t); t ∈ [t_0, t_f]. If (90) is not satisfied [say for λ_min(P^{−1}) sufficiently large], then (89) may exhibit a finite escape time (S(t) → ∞ for some t ∈ [t_0, t_f]), which would imply that (88) is unbounded and also that J^+(x_0; t_0) is unbounded.
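The finite escape time just mentioned is easy to reproduce. The scalar sketch below (illustrative Euler integration, not the paper's; the step size and blow-up cap are arbitrary choices) integrates the Riccati equation (89) backward from S(t_f) = Q_f; when (90) holds the solution survives the whole interval, and when it fails badly S(t) blows up inside it.

```python
def backward_riccati(Qf, Q, A, B, R, G, p, T=2.0, dt=1e-4, cap=1e7):
    """Integrate (89) backward from S(tf) = Qf in the scalar case:
    -dS/dt = Q + 2*A*S - (B*B/R - G*G/p)*S*S.  Returns S at the start
    of the interval, or None if S blows up (finite escape time)."""
    S = Qf
    coeff = B * B / R - G * G / p        # left-hand side of (90), scalar
    for _ in range(int(T / dt)):
        S += dt * (Q + 2 * A * S - coeff * S * S)   # Euler step toward t0
        if S > cap:
            return None                  # finite escape: J+ unbounded
    return S

# (90) satisfied (coeff >= 0): solution exists on the whole interval
assert backward_riccati(1.0, 1.0, 0.0, 1.0, 1.0, 1.0, p=2.0) is not None
# (90) violated strongly: finite escape time, as discussed above
assert backward_riccati(1.0, 1.0, 0.0, 1.0, 1.0, 2.0, p=1.0) is None
```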
IX. SOME STABILITY PROPERTIES OF UNDISTURBED LINEAR SYSTEMS CONTROLLED BY SOLUTIONS OF LE±G PROBLEMS

In this section we assume that all parameters are time invariant, and we briefly investigate stability of the system

    ẋ = Ax + Bu;   u = −C_∞^σ x.   (91)

A. Stability Properties of C_∞^−

Here we assume that the pair (A, B) is controllable and that Q > 0. These assumptions guarantee the existence of S_∞^−, the unique positive-definite steady-state solution of the Riccati equation. That is, S_∞^− > 0 satisfies

    Q + S_∞^− A + A^T S_∞^− − S_∞^− (B R^{−1} B^T + Γ P^{−1} Γ^T) S_∞^− = 0,   (92)

and we have the steady-state feedback gain

    C_∞^− = R^{−1} B^T S_∞^−.   (93)

We now define

    L^− ≜ ½ x^T S_∞^− x,   (94)

which is positive definite. Along trajectories of (91) we have

    L̇^− = ½ x^T (S_∞^− A + A^T S_∞^−) x − x^T S_∞^− B R^{−1} B^T S_∞^− x,   (95)

which, upon using (92), is

    L̇^− = −½ x^T [Q + S_∞^− (B R^{−1} B^T − Γ P^{−1} Γ^T) S_∞^−] x.   (96)

Now, if

    B R^{−1} B^T − Γ P^{−1} Γ^T ≥ 0,   (97)

we have

    L̇^− < 0,   ∀ x ≠ 0,   (98)

and system (91), with controller C_∞^−, is asymptotically stable. Note that simple examples show that (91) can be unstable if condition (97) is violated.

B. Stability Properties of C_∞^+

In this case we assume condition (90), namely,

    B R^{−1} B^T − Γ P^{−1} Γ^T ≥ 0,   (99)

and also that Q > 0. Note that because of (99) we can write

    B̄ B̄^T ≜ B R^{−1} B^T − Γ P^{−1} Γ^T.   (100)

If we assume now that the pair (A, B̄) is controllable, then it follows that there exists a unique positive-definite matrix S_∞^+ which satisfies

    Q + S_∞^+ A + A^T S_∞^+ − S_∞^+ (B R^{−1} B^T − Γ P^{−1} Γ^T) S_∞^+ = 0   (101)

and

    C_∞^+ = R^{−1} B^T S_∞^+.   (102)

Define

    L^+ ≜ ½ x^T S_∞^+ x.   (103)

It is easy to verify that L^+ is a Lyapunov function and that (91) with controller C_∞^+ is asymptotically stable. Note the interesting point that (97) is sufficient to guarantee asymptotic stability of (91) with controllers C_∞^− or C_∞^+. In the first case, (97) is used to guarantee negativity of L̇^−, while in the second it is used to guarantee existence of S_∞^+.

X. CONCLUSION

In this paper we have presented explicit (modulo solution of Riccati difference or differential equations) solutions of stochastic control problems having linear dynamics, additive Gaussian noise, and exponential objective functions. These solutions are linear feedback control policies which depend upon the covariance matrix of the additive process noise, so that the certainty equivalence principle of LQG theory does not hold. In certain applications these new controllers may be preferable, especially perhaps in economics, where multiplicative objective functions are of intrinsic interest.

By demonstrating certain equivalences between our stochastic control formulations and deterministic differential games, we are able to give a stochastic interpretation to min-max (worst case) design of linear systems. This suggests that the "pessimistic" min-max design is not unattractive, since it corresponds, in a stochastic setting, to minimization of the expected value of an exponential function of a quadratic form, which is quite an appealing criterion. Another significant result of these equivalences is that existence of solutions of the stochastic control problems implies and is implied by existence of solutions of the differential games. Hopefully these notions can be extended to provide existence results for nonlinear stochastic control problems and nonlinear differential games.

Certain stability properties of the steady-state solutions of the stochastic control problems are also investigated. In particular, we point out that the steady-state controller for the LE−G problem can result in an unstable dynamic system, while the steady-state controller for the LE+G problem, if it exists, always stabilizes the dynamic system. In this sense, the LE+G formulation is preferable.

Note that we have not considered in this paper the more complex problem in which noisy measurements of the state are made, viz.,

    z_k = H_k x_k + β_k;   k = 0, ..., N − 1,   (104)

where {β_k, ω_k, x_0} are independent Gaussian random variables. In this case the optimal controls are restricted to the form

    u_k^σ = C_k^σ(Z_k);   Z_k ≜ {z_0, z_1, ..., z_k};   k = 0, ..., N − 1,   (105)

where σ is − or +, and where the appropriate performance criterion is

    V^σ ≜ σ E[ exp{ σ ½ ( ∑_{k=0}^{N−1} (x_k^T Q_k x_k + u_k^T R_k u_k) + x_N^T Q_N x_N ) } ].   (106)

The above problem appears to be intrinsically much harder than the perfect-state case and could be the topic of a future paper.
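The stability dichotomy of Section IX can be illustrated with the scalar steady state of (92): the closed-loop pole A − BC_∞^− is negative whenever condition (97) holds, but a sufficiently wild noise (Γ²/p > B²/R) can make the LE−G steady-state controller destabilizing. A scalar sketch with illustrative values (not from the paper):

```python
import math

def closed_loop_LEminusG(A, B, R, Q, G, p):
    """Scalar steady state of (92): Q + 2*A*S - c*S*S = 0 with
    c = B*B/R + G*G/p.  Returns the closed-loop pole A - B*C with
    C = B*S/R as in (93)."""
    c = B * B / R + G * G / p
    S = (A + math.sqrt(A * A + c * Q)) / c   # positive root of (92)
    return A - B * (B * S / R)

# (97) holds (B^2/R >= G^2/p): closed loop is stable (negative pole)
assert closed_loop_LEminusG(1.0, 1.0, 1.0, 1.0, 1.0, 2.0) < 0
# (97) violated: the LE-G steady-state controller leaves (91) unstable
assert closed_loop_LEminusG(1.0, 1.0, 1.0, 1.0, 2.0, 1.0) > 0
```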
APPENDIX

Lemma: If P_k − σ Γ_k^T W_{k+1}^σ Γ_k > 0, then

    E[ exp{σ ½ (A_k x_k + B_k u_k + Γ_k ω_k)^T W_{k+1}^σ (A_k x_k + B_k u_k + Γ_k ω_k)} ]
        = |P_k|^{1/2} |P_k − σ Γ_k^T W_{k+1}^σ Γ_k|^{−1/2} exp{σ ½ (A_k x_k + B_k u_k)^T W̃_{k+1}^σ (A_k x_k + B_k u_k)}   (108)

where W̃_{k+1}^σ is defined in (33).

Proof: The left-hand side of (108) is, using (1), an integral over ω_k which, after completion of the square, equals the right-hand side of (108) multiplied by the integral (109) of a Gaussian probability density function having mean

    ε_k ≜ (P_k − σ Γ_k^T W_{k+1}^σ Γ_k)^{−1} σ Γ_k^T W_{k+1}^σ (A_k x_k + B_k u_k)   (110)

and covariance

    (P_k − σ Γ_k^T W_{k+1}^σ Γ_k)^{−1}.   (111)

The Lemma is proved by (109), since the integral of a probability density function is unity.

ACKNOWLEDGMENT

The author wishes to thank L. Zadeh for stimulating discussions, during the Spring of 1971, on fuzzy set theory, which contributed to the development of certain results in this paper. Also, critical comments from D. Mayne, Y. Ho, and J. Speyer are appreciated.

REFERENCES

[1] M. Athans, "The role and use of the stochastic linear-quadratic-Gaussian problem in control system design," IEEE Trans. Automat. Contr. (Special Issue on the Linear-Quadratic-Gaussian Problem), vol. AC-16, pp. 529-552, Dec. 1971.
David H. Jacobson (M'69) was born in Johannesburg, South Africa, on February 23, 1943. He received the B.Sc. degree in engineering from the University of the Witwatersrand, Johannesburg, South Africa, in 1963, and the Ph.D. degree and D.I.C. in engineering from the Imperial College of Science and Technology, University of London, London, England, in 1967.
He was a Postdoctoral Fellow in the Division of Engineering and Applied Physics, Harvard University, Cambridge, Mass., and in 1968 was appointed Assistant Professor. In July 1971 he became Associate Professor of Applied Mathematics. Currently he is Professor in the Department of Applied Mathematics, University of the Witwatersrand. His interests are in the areas of optimal control theory and applications, and computing methods for the solution of dynamic optimization problems. In the latter area he has developed and applied the technique of differential dynamic programming. He is coauthor of the book Differential Dynamic Programming (New York: Elsevier, 1970).
Dr. Jacobson is a graduate of the South African Institute of Electrical Engineers and was an Associate Editor of the IEEE TRANSACTIONS ON AUTOMATIC CONTROL from September 1971 to June 1972, when he returned to South Africa.