Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
A QUARTET OF SEMIGROUPS FOR MODEL SPECIFICATION, ROBUSTNESS, PRICES OF RISK, AND MODEL DETECTION Evan W. Anderson University of North Carolina Lars Peter Hansen Thomas J. Sargent New York University and Hoover Institution University of Chicago Abstract A representativeagent fears thathis model, a continuoustime Markovprocess with jump and diffusion components, is misspecified and therefore uses robust control theory to make decisions. Under the decision maker's approximatingmodel, cautious behavior puts adjustments for model misspecification into market prices for risk factors. We use a statistical theory of detection to quantifyhow much model misspecificationthe decision maker should fear, given his historical data record. A semigroup is a collection of objects connected by something like the law of iteratedexpectations. The law of iteratedexpectations defines the semigroup for a Markov process, while similar laws define other semigroups. Related semigroupsdescribe (1) an approximatingmodel; (2) a model misspecificationadjustmentto the continuationvalue in the decision maker's Bellman equation;(3) asset prices; and (4) the behavior of the model detection statistics that we use to calibratehow much robustness the decision makerprefers.Semigroups2, 3, and 4 establish a tight link between the marketprice of uncertaintyand a bound on the error in statistically discriminatingbetween an approximating and a worst case model. (JEL: COO,D51, D81, El, G12) 1. Introduction 1.1 Rational Expectationsand Model Misspecification A rational expectationseconometricianor calibratortypically attributesno concernaboutspecificationerrorto agentseven as he shuttlesamongalternative specifications.1Decision makersinside a rationalexpectationsmodel know the Acknowledgments:We thank FernandoAlvarez, Pierre-Andre Chiappori,Jose Mazoy, Eric Renault,Jose Scheinkman,GraceTsiang,andNeng Wangfor commentson earlierdraftsandNan Li for valuable researchassistance. This paper supersedesour earlier manuscriptRisk and Robustnessin Equilibrium(1998). This researchprovided the impetus for subsequentwork andWilliams(2002). HansenandSargentgratefully includingHansen,Sargent,Turmuhambetova, acknowledgesupportfromthe NationalScience Foundation. E-mailaddresses:Anderson:evan_anderson@ unc.edu;Hansen:[email protected]; Sargent: [email protected] 1. For example,see the two papersaboutspecificationerrorin rationalexpectationsmodels by Sims (1993) and Hansenand Sargent(1993). © 2003 The EuropeanEconomicAssociation E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 69 model.2 Their confidence contrasts with the attitudes of both econometricians and calibrators. Econometricians routinely use likelihood-based specification tests (information criteria or IC) to organize comparisons between models and empirical distributions. Less formally, calibrators sometimes justify their estimation procedures by saying that they regard their models as incorrect and unreliable guides to parameterselection if taken literally as likelihood functions. But the agents inside a calibrator's model do not share the model-builder's doubts about specification. By equating agents' subjective probability distributions to the objective one implied by the model, the assumption of rational expectations precludes any concerns that agents should have about the model's specification. The empirical power of the rational expectations hypothesis comes from having decision makers' beliefs be outcomes, not inputs, of the model-building enterprise. A standard argument that justifies equating objective and subjective probability distributions is that agents would eventually detect any difference between them, and would adjust their subjective distributions accordingly. This argument implicitly gives agents an infinite history of observations, a point that is formalized by the literature on convergence of myopic learning algorithms to rational expectations equilibria of games and dynamic economies.3 Specification tests leave applied econometricians in doubt because they have too few observations to discriminate among alternative models. Econometricians with finite data sets thus face a model detection problem that builders of rational expectations models let agents sidestep by endowing them with infinite histories of observations "before time zero." This paper is about models with agents whose data bases are finite, such as econometricians and calibrators. Their limited data leave agents with model specification doubts that are quantitatively similar to those of econometricians and that make them value decision rules that perform well across a set of models. In particular, agents fear misspecifications of the state transition law that are sufficiently small that they are difficult to detect because they are obscured by random shocks that impinge on the dynamical system. Agents adjust decision rules to protect themselves against modelling errors, a precaution that puts model uncertainty premia into equilibrium security market prices. Because we work with Markov models, we can avail ourselves of a powerful tool called a semigroup. 2. Thisassumptionis so widelyusedthatit rarelyexcitescommentwithinmacroeconomics.Kurz (1997) is an exception.The rationalexpectationscritiqueof earlierdynamicmodelswith adaptive expectationswas that they implicitlycontainedtwo models, one for the econometricianand a worse one for the agentswho are forecastinginside the model. See Jorgenson(1967) and Lucas (1976). Rationalexpectationsmodellingrespondedto this critiqueby attributinga commonmodel to the econometricianand the agents within this model. Econometriciansand agents can have differentinformationsets, but they agree aboutthe model (stochasticprocess). 3. See Evansand Honkapohja(2001) and Fudenbergand Levine (1998). 70 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 1.2 IteratedLaws and Semigroups The law of iteratedexpectationsimposesconsistencyrequirementsthatcause a collection of conditional expectations operatorsassociated with a Markov process to form a mathematicalobject called a semigroup.The operatorsare indexedby the time thatelapses betweenwhen the forecastis made and when the randomvariablebeingforecastis realized.This semigroupandits associated generatorcharacterizethe Markovprocess. Because we consider forecasting randomvariablesthat are functions of a Markov state, the currentforecast dependsonly on the currentvalue of the Markovstate.4 The law of iteratedvalues embodies analogousconsistencyrequirements for a collection of economic values assigned to claims to payoffs that are functionsof futurevalues of a Markovstate.The family of valuationoperators indexedby the time thatelapsesbetweenwhen the claims are valuedand when theirpayoffs arerealizedformsanothersemigroup.Justas a Markovprocessis characterizedby its semigroup,so prices of payoffs that are functions of a Markovstate can be characterizedby a semigroup.Hansen and Scheinkman (2002) exploitedthis insight.Here we extendtheirinsightto othersemigroups. In particular,we describefour semigroups:(1) one that describesa Markov process; (2) anotherthat adjusts continuationvalues in a way that rewards decision rules that are robustto misspecificationof the approximatingmodel; (3) anotherthatmodelsthe equilibriumpricingof securitieswith payoff datesin the future;and (4) anotherthat governs statisticsfor discriminatingbetween alternativeMarkovprocessesusing a finite time series datarecord.5We show the close connectionsthatbind these four semigroups. 1.3 Model Detection Errors and MarketPrices of Risk In earlierworkHansen,Sargent,andTallarini(1999), henceforthdenotedHST, and Hansen,Sargent,and Wang (2002), henceforthdenotedHSW, we studied various discretetime asset pricing models in which decision makers'fear of model misspecificationput model uncertaintypremiainto marketpricesof risk, therebypotentiallyhelpingto accountfor the equitypremium.Transcendingthe detaileddynamicsof our exampleswas a tight relationshipbetweenthe market price of risk and the probabilityof distinguishingthe representativedecision maker's approximatingmodel from a worst-case model that emerges as a by-productof his cautiousdecision-makingprocedure.Althoughwe hadoffered only a heuristicexplanationfor thatrelationship,we neverthelessexploitedit to help us calibratethe set of alternativemodels that the decision makershould 4. The semigroupformulationof Markovprocesses is common in the literatureon applied probability.See Ethierand Kurtz(1986) for a generaltreatmentof semigroupsand Hansenand Scheinkman(1995) for theiruse in studyingthe identificationof continuous-timeMarkovmodels. 5. Here the operatoris indexedby the time horizonof the availabledata. In effect there is a statisticaldetectionoperatorthat measuresthe statisticalvalue of informationavailableto discriminatebetweentwo Markovprocesses. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 71 plausibly seek robustness against. In the context of continuous time Markov models, this paper analytically establishes a precise link between the uncertainty component of risk prices and a bound on the probability of distinguishing the decision maker's approximating and worst case models. We also develop new ways of representing decision makers' concerns about model misspecification and their equilibrium consequences. 1.4 Related Literature In the context of a discrete-time, linear-quadratic permanent income model, HST considered model misspecifications measured by a single robustness parameter. HST showed how robust decision making promotes behavior like that induced by risk aversion. They interpreteda preference for robustness as a decision maker's response to Knightian uncertainty and calculated how much concern about robustness would be required to put market prices of risk into empirically realistic regions. Our fourth semigroup, which describes model detection errors, provides a statistical method for judging whether the required concern about robustness is plausible. HST and HSW allowed the robust decision maker to consider only a limited array of specification errors, namely, shifts in the conditional mean of shocks that are i.i.d. and normally distributed under an approximating model. In this paper, we consider more general approximating models and motivate the form of potential specification errors by using specification test statistics. We show that HST's perturbationsto the approximatingmodel emerge in linear-quadratic, Gaussian control problems as well as in a more general class of control problems in which the stochastic evolution of the state is a Markov diffusion process. However, we also show that misspecifications different from HST's must be entertained when the approximating model includes Markov jump components. As in HST, our formulation of robustness allows us to reinterpret one of Epstein and Zin's (1989) recursions as reflecting a preference for robustness rather than aversion to risk. As we explain in Hansen, Sargent, Turmuhambetova,and Williams (henceforth HSTW) (2002), the robust control theory described in Section 5 is closely connected to the min-max expected utility or multiple priors model of Gilboa and Schmeidler (1989). A main theme of the present paper is to advocate a workable strategy for actually specifying those multiple priors in applied work. Our strategy is to use detection error probabilities to surroundthe single model that is typically specified in applied work with a set of empirically plausible but vaguely specified alternatives. 7.5 Robustness versus Learning A convenient feature of rational expectations models is that the model builder imputes a unique and explicit model to the decision maker. Our analysis shares 72 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 this analyticalconvenience.While an agent distrustshis model, he still uses it to guide his decisions.6But the agent uses his model in a way thatrecognizes To quantifyapproximation, we measurediscrepancy thatit is an approximation. between the approximatingmodel and other models with relativeentropy,an expectedlog-likelihoodratio,wherethe expectationis takenwith respectto the distributionfromthe alternativemodel.Relativeentropyis used in the theoryof large deviations, a powerful mathematicaltheory about the rate at which uncertaintyaboutunknowndistributionsis resolvedas the numberof observations grows.7An advantageof using entropyto restrainmodel perturbationsis that we can appealto the theoryof statisticaldetectionto provideinformation abouthow much concernaboutrobustnessis quantitativelyreasonable. Ourdecision makerconfrontsalternativemodels thatcan be discriminated among only with substantialamountsof data, so much data that, because he discountsthe future,the robustdecisionmakersimply acceptsmodel misspecificationas a permanentsituation.He designs robustcontrols,and does not use datato improvehis modelspecificationover time.He adoptsthis stancebecause relativeto his discountfactor,it would take too much time for enough datato accruefor him to disposeof the alternativemodelsthatconcernhim. In contrast, many formulationsof learninghave decision makersfully embracean approximatingmodel when makingtheirchoices.8Despite theirdifferentorientations, learnersandrobustdecisionmakersbothneed a convenientway to measurethe proximityof two probabilitydistributions.This fact builds technicalbridges betweenrobustdecisiontheoryandlearningtheory.The sameexpressionsfrom large deviation theory that govern bounds on rates of learningalso provide boundson value functionsacrossalternativepossiblemodelsin robustdecision theory.9 More importantlyhere, we shall show that the tight relationship between detection errorprobabilitiesand the marketprice of risk that was encounteredby HST and HSW can be explainedby formallystudyingthe rate at which detectionerrorsdecreaseas sample size grows. 6. The assumptionof rationalexpectationsequatesa decision maker'sapproximatingmodel to the objectivedistribution.Empiricalapplicationsof models with robustdecision makerssuch as HSTandHSWhaveequatedthosedistributionstoo. The statementthatthe agentregardshis model as an approximation, and thereforemakescautiousdecisions,leaves open the possibilitythatthe agent's concern about model misspecificationis "just in his head,"meaningthat the data are actually generatedby the approximatingmodel. The "just in his head" assumptionjustifies model with the econometrician'smodel, a step thatallows us equatingthe agent'sapproximating to bringto bearmuchof the powerfulempiricalapparatusof rationalexpectationseconometrics. In particular,it providesthe same economicalway of imputingan approximatingmodel to the agentsas rationalexpectationsdoes. The differenceis thatwe allow the agent'sdoubtsaboutthat model to affect his decisions. 7. See Cho, Williams,and Sargent(2002) for a recentapplicationof largedeviationtheoryto a model of learningdynamicsin macroeconomics. 8. See Bray (1982) and Kreps(1998). 9. See Hansenand Sargent(forthcoming)for discussionsof these bounds. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 73 1.6 Reader's Guide A reader interested only in our main results can read Section 2, then jump to the empirical applications in Section 9. 2. Overview This section briefly tells how our main results apply in the special case in which the approximating model is a diffusion. Later sections provide technical details and show how things change when we allow jump components. A representative agent's model asserts that the state of an economy xt in a state space D follows a diffusion10 dxt = n(xt)dt + A(xt)dBt (2.1) where Bt is a Brownian vector. The agent wants decision rules that work well not just when (2.1) is true but also when the data conform to models that are statistically difficult to distinguish from (2.1). A robust control problem to be studied in Section 5 leads to such a robust decision rule together with a value function V(xt) and a process y(xt) for the marginal utility of consumption of a representativeagent. As a by-product of the robust control problem, the decision maker computes a worst-case diffusion that takes the form dxt = [juU) + A(xt)g(xt)]dt + A(xt)dBh (2.2) where g = -(l/d)A'(dV/dx) and 6 > 0 is a parameter measuring the size of potential model misspecifications. Notice that (2.2) modifies the drift but not the volatility relative to (2.1). The formula for g tells us that large values of 6 are associated with g/s that are small in absolute value, making model (2.2) difficult to distinguish statistically from model (2.1). The diffusion (2.6) below lets us quantify just how difficult this statistical detection problem is. Without a preference for robustness to model misspecification, the usual approachto asset pricing is to compute the expected discounted value of payoffs with respect to the 'risk-neutral' probability measure that is associated with the following twisted version of the physical measure (diffusion (2.1)): (2.3) dxt = |>(jt,) + A(xt)g(xt)]dt + A(xt)dBt. In using the risk-neutral measure to price assets, future expected returns are discounted at the risk-free rate p(xt), obtained as follows. The marginal utility of + ay(xt)dBr Then the representative household y(xt) conforms to dyt = jjLy(xt)dt the risk-free rate is p(xt) = 8 - [iLy(xt)/y(xt)],where 8 is the instantaneous rate at which the household discounts future utilities; the risk-free rate thus equals the negative of the expected growth rate of the representative household's contingent on a Markov state in marginal utility. The price of a payoff <\>{xN) N is then period 10. Diffusion(2.1) describesthe approximating'physicalprobabilitymeasure.' 74 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 - \ (Np{xu)du](f)(xN)xo = x\\ E exp - 0 (2.4) / where E is the expectation evaluated with respect to the distribution generated by (2.3). This formula gives rise to a pricing operator for every horizon N. Relative to the approximating model, the diffusion (2.3) for the risk-neutral measure distorts the drift in the Brownian motion by adding the term A(x)g(xt), where g = A'[d log y(x)/dx]. Here g is a vector of 'factor risk prices' or 'market prices of risk.' The equity premium puzzle is the finding that with plausible quantitative specifications for the marginal utility yQt), factor risk prices g are too small relative to their empirically estimated counterparts. In Section 7, we show that when the planner and a representative consumer want robustness, the diffusion associated with the risk-neutral measure appropriate for pricing becomes dxt = M*,) + A(xt)[g(xt) + g(xt)])dt + A(xt)dBh (2.5) where g is the same process that appears in (2.2). With robustness sought over a set of alternative models that is indexed by 0, factor risk prices become augmented to g + g. The representative agent's concerns about model misspecification contribute the g component of the factor risk prices. To evaluate the quantitative potential for attributing parts of the market prices of risk to agents' concerns about model misspecification, we need to calibrate 6 and therefore \g\. To calibrate 6 and g, we turn to a closely related fourth diffusion that governs the probability distribution of errors from using likelihood ratio tests to detect which of two models generated a continuous record of length N of observations on xr Here the key idea is that we can represent the average error in using a likelihood ratio test to detect the difference between the two models (2. 1) and (2.2) from a continuous record of data of length N as .5£'(min{exp(€7V), 1}|jc0 = jc) where E is evaluated with respect to model (2.1) and (,N is a likelihood ratio of the data record of model (2.2) with respect to model (2. 1). For each a E (0, 1), we can use the inequality £'(min{exp(€^v), 1}|jc0 = jc) < E({exp(a£N)}\x0 = x) to attain a bound on the detection error probability. For each a, we show that the bound can be calculated by forming a new diffusion that uses (2.1) and (2.2) as ingredients, and in which the drift distortion g from (2.2) plays a key role. In particular, for a E (0, 1), define dxat= |>(*,) + aA(xt)g(xt)]dt + A(xt)dB» (2.6) and define the local rate function pa(x) = [(1 a)a/2]g(x)'g(x). Then the bound on the average error in using a likelihood ratio test to discriminate between the approximating model (2.1) and the worst case model (2.2) from a continuous data record of length TVis E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection \ I (" av error < .5Ea exp - \ 1 pa(xt) \dt xo = x , 75 (2.7) where Ea is the mathematical expectation evaluated with respect to the diffusion (2.6). The error rate pa(x) is maximized by setting a = .5. Notice that the right side of (2.7) is one-half the price of a pure discount bond that pays off one unit of consumption for sure N periods in the future, treating pa as the risk-free rate and the measure induced by (2.6) as the risk-neutral probability measure. It is remarkablethat the three diffusions (2.2), (2.5), and (2.6) that describe the worst case model, asset pricing under a preference for robustness, and the local behavior of a bound on model detection errors, respectively, are all obtained by perturbingthe drift in the approximatingmodel (2.1) with functions of the same drift distortion g(x) that emerges from the robust control problem. To the extent that the bound on detection probabilities is informative about the detection probabilities themselves, our theoretical results thus neatly explain the pattern that was observed in the empirical applications of HST and HSW, namely, that there is a tight link between calculated detection errorprobabilities and the market price of risk. That link transcends all details of the model specification.11 In Section 9, we shall encounter this tight link again when we calibrate the contribution to market prices of risk that can plausibly be attributed to a preference for robustness in the context of three continuous time asset pricing models. Subsequent sections of this paper substantiate these and other results in a more general Markov setting that permits x to have jump components, so that jump distortions also appear in the Markov processes for the worst case model, asset pricing, and model detection error. We shall exploit and extend the asset-pricing structure of formulas such as (2.4) and (2.7) by recognizing that they reflect that collections of expectations, values, and bounds on detection error rates can all be described with semigroups. 3. Mathematical Preliminaries The remainder of this paper studies continuous-time Markov formulations of model specification, robust decision making, pricing, and statistical model detection. We use Feller semigroups indexed by time for all four purposes. This section develops the semigroup theory needed for our paper. 3.1 Semigroups and Their Generators Let D be a Markov state space that is a locally compact and separable subset of Rm. We distinguish two cases. First, when D is compact, we let C denote the space of continuous functions mapping D into R. Second, when we want to 11. See Figure8 of HSW. 76 Journal of the European Economic Association March 2003 1(1):68- 123 studycases in whichthe state space is unboundedso thatD is not compact,we shall use a one-pointcompactificationthatenlargesthe state space by addinga point at oo.In this case we let C be the spaceof continuousfunctionsthatvanish at oo.We can thinkof suchfunctionsas havingdomainD or domainDU°°. The compactificationis used to limit the behaviorof functionsin the tails when the state space is unbounded.We use the sup-normto measurethe magnitudeof functionson C and to define a notion of convergence. We areinterestedin a stronglycontinuoussemigroupof operators{ift : t > 0} with an [email protected] [ift : t > 0} to be a semigroupwe requirethatif0 = 3 and &)t+r= iftifr for all t, t > 0. A semigroupis strongly continuousif = <f> lim if T</> where the convergenceis uniformfor each (f>in C. Continuityallows us to computea time derivativeand to define a generator -^- (f> = lim-ifT(f)-. <8<f> T (3.1) t|0 This is again a uniformlimit and it is well definedon a dense subset of C. A generatordescribesthe instantaneousevolution of a semigroup.A semigroup can be constructedfrom a generatorby solving a differentialequation.Thus applyingthe semigrouppropertygives lim - = <S3>,<k (3.2) a differentialequationfor a semigroupthatis subjectto the initialconditionthat if0 is the identityoperator.The solutionto differentialequation(3.2) is depicted heuristicallyas: $ft= exp(/<S) and thus satisfiesthe semigrouprequirements.The exponentialformulacan be justifiedrigorouslyusing a Yosida approximation,which formallyconstructsa semigroupfrom its generator. In what follows, we will use semigroupsto model Markov processes, intertemporalprices, and statistical discrimination.Using a formulationof Hansenand Scheinkman(2002), we firstexaminesemigroupsthatare designed to model Markovprocesses. 3.2 Representationof a Generator We describe a convenient representationresult for a strongly continuous, positive, contractionsemigroup.Positivity requiresthat ift maps nonnegative functions</> into nonnegativefunctions<f)for each t. When the semigroupis a E. W. Anderson,L. P. Hansen,& T. J. Sargent SemigroupsandModel Detection 77 contraction,it is referredto as a Feller semigroup.The contractionproperty restrictsthe normof ift to be less thanor equalto one for each t andis satisfied for semigroupsassociatedwith Markovprocesses.Generatorsof Feller semigroupshave a convenientcharacterization: whereX has the productform = jf<Mx) - <Kx)M<ty\x) [<t>(y) (3.4) wherep is a nonnegativecontinuousfunction,/x is an m-dimensionalvectorof continuousfunctions, 2 is a matrix of continuousfunctions that is positive semidefiniteon the state space, and tj(#|jc)is a finite measurefor each x and continuousin x for Borel subsetsof D. We requirethatX map C\ into C where C\ is the subspace of functions that are twice continuouslydifferentiable functionswith compactsupportin D. Formula(3.4) is valid at least on C2K}2 To depict equilibriumprices we will sometimesgo beyond Feller semigroups.Pricing semigroupsare not necessarilycontractionsemigroupsunless the instantaneousyield on a realdiscountbondis nonnegative.Whenwe use this approachfor pricing,we will allow p to be negative.Althoughthis puts us out of the realmof Fellersemigroups,as arguedby HansenandScheinkman(2002), knownresultsfor Fellersemigroupscan oftenbe extendedto pricingsemigroups. We can thinkof the generator(3.3) as being composedof threeparts.The firsttwo componentsare associatedwith well-knowncontinuous-timeMarkov processmodels,namely,diffusionandjumpprocesses.The thirdpartdiscounts. The next three subsectionswill interpretthese componentsof equation(3.3). 3.2.1 DiffusionProcesses. The generatorof a Markovdiffusion process is a second-orderdifferentialoperator: (d<f)\ l / #<j>\ 12. See Theorem1.13 in ChapterVII of Revuz and Yor (1994). Revuz and Yor give a more generalrepresentationthat is valid providedthat the functionsin C£ are in the domainof the generator.Their representationdoes not requirethat tj(*|jc)be a finite measurefor each x but imposesa weakerrestrictionon this measure.As we will see, when tj(#|jc)is finite,we can define a jump intensity.Weakerrestrictionspermitthereto be an infinitenumberof expectedjumpsin finite intervalsthat are arbitrarilysmall in magnitude.As a consequence,this extra generality involves more cumbersomenotationand contributesnothingessentialto our analysis. 78 Journal of the European Economic Association March 2003 1(1):68- 123 where the coefficientvector /x is the driftor local mean of the processand the coefficientmatrix2 is the diffusionor local covariancematrix.The corresponding stochasticdifferentialequationis: dxt = ix(xt)dt+ A(xt)dBt where {Bt} is a multivariatestandardBrownianmotion and AA' = X. Sometimes the resultingprocess will have attainableboundaries,in which case we either stop the process at the boundaryor impose otherboundaryprotocols. 3.2.2 JumpProcesses. The generatorfor a Markovjump process is: = A[a^- <fl = J((f> <%n$ (3.5) where the coefficient X = fr](dy\x) is a possibly state-dependentPoisson intensity parameterthat sets the jump probabilitiesand 2, is a conditional expectationoperatorthat encodes the transitionprobabilitiesconditionedon a jump takingplace. Withoutloss of generality,we can assumethatthe transition distributionassociatedwith the operator2, assignsprobabilityzero to the event y = x providedthatx ± oo,wherex is the currentMarkovstate andy the state aftera jumptakesplace.Thatis, conditionedon a jumptakingplace,the process cannotstay put with positive probabilityunless it reachesa boundary. The jump and diffusion componentscan be combined in a model of a Markovprocess.Thatis, +*+ ^ +*«*=*•(&J+2trace Is &a?) (3'6) is the generatorof a family (semigroup)of conditionalexpectationoperatorsof = x]. a Markovprocess {jc,},say ift<\)(x)= E[<f>(xt)\x0 3.2.3 Discounting. The third part of (3.3) accounts for discounting.Thus, considera Markovprocess {xt} with generatorcSd+ cSn.Constructthe semigroup: = eI exp SP,<K*) *o= x\ <K*r) p(xT)dr on C We can think of this semigroupas discountingthe future state at the stochastic rate p(x). Discount rates will play essential roles in representing shadow prices from a robust resource allocation problem and in measuring statisticaldiscriminationbetween competingmodels.13 13. When p > 0, the semigroup is a contraction. In this case, we can use ^ as a generator of a Markov process in which the process is curtailed at rate p. Formally, we can let o° be a terminal state at which the process stays put. Starting the process at state x i=-o°, £(exp[- J*op(jct)^t]|jc0= x) is the probability that the process is not curtailed after t units of time. See Revuz and Yor (1994, E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 79 3.3 Extendingthe Domain to Bounded Functions While it is mathematicallyconvenientto constructthe semigroupon C, sometimes it is necessaryfor us to extendthe domainto a largerclass of functions. For instance, indicatorfunctions 1D of nondegeneratesubsets D are omitted from C. Moreover,lD is not in C when D is not compact;nor can this function be approximateduniformly.Thus to extend the semigroupto bounded,Borel measurablefunctions,we need a weakernotionof convergence.Let {<f)j:j= 1, 2, . . . } be a sequenceof uniformlyboundedfunctionsthatconvergespointwise to a boundedfunction4>o.We can thenextendthe ifTsemigroupto 4>ousing the formula: j ^ wherethe limit notionis now pointwise.The choice of approximatingsequence does not matterand the extensionis unique.14 Withthis construction,we definethe instantaneousdiscountor interestrate as the pointwisederivative 1 -lim-logyrlD= T p t|0 when the derivativeexists. 3.4 Extendingthe Generatorto UnboundedFunctions Value functionsfor controlproblemson noncompactstate spaces are often not bounded.Thus for our study of robustcounterpartsto optimization,we must extend the semigroupand its generatorto unboundedfunctions.We adoptan approachthatis specificto a Markovprocessandhencewe studythis extension only for a semigroupgeneratedby <S= c@d+ c§n. We extend the generatorusing martingales.To understandthis approach, in the domainof the generator, we firstremarkthat for a given <j> - <j>(x0) ~ Mt = <f>(xt) ^(xT)dr Jo is a martingale.In effect, we producea martingaleby subtractingthe integralof This martingaleconstructionsuggests the local meansfromthe process {(f>(xt)}. a way to build the extendedgenerator.Given 4>we find a functiont//such that p. 280) for a discussion.As in Hansen and Scheinkman(2002), we will use the discounting interpretationof the semigroupand not use p as a curtailmentrate. Discountingwill play an importantrole in our discussionof detectionand pricing.In pricingproblems,p can be negative in some statesas might occur in a real economy,an economywith a consumptionnumeraire. 14. This extensionwas demonstrated by Dynkin(1956). Specifically,Dynkindefinesa weak (in to this semigroupandshowsthatthereis a weakextensionof the sense of functionals)counterpart this semigroupto bounded,Borel measurablefunctions. 80 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 - cj>(x0) Mt = <]>(xt) il*(xT)dT (3.7) is a local martingale(a martingaleunderall membersof a sequenceof stopping times that increasesto o°).We then define^ = i//.This constructionextends the operator^ to a largerclass of functionsthanthose for which the operator in the domainof the generator, differentiation(3.1) is well defined.Forevery </> not in the domain if/= ^ in (3.7) producesa martingale.However,thereare </>s In the case of a of the generatorfor which (3.7) also producesa martingale.15 Feller process definedon a state-spaceD that is an open subset of Rm, this extended domain contains at least functions in C2, functions that are twice continuouslydifferentiableon D. Such functionscan be unboundedwhen the originalstate space D is not compact. 4. A Tour of Four Semigroups In the remainderof the paperwe will studyfour semigroups.Before describing each in detail, it is useful to tabulatethe four semigroupsand their uses. We have alreadyintroducedthe firstsemigroup,which describesthe evolutionof a state vector process {xt}. This semigroupportraysa decision maker'sapproximatingmodel. It has the generatordisplayedin (3.3) with p = 0, which we repeathere for convenience: / ^ \ (d(f)\ 1 Althoughuntil now we used ^ to denote a generic generator,from this point forwardwe will reserveit for the approximatingmodel. We can think of the decision makeras using the semigroupgeneratedby ^ to forecast functions 4>(xt).This semigroupfor the approximatingmodel can have both jump and Browniancomponents,but the discountrate p is zero. In some settings, the semigroupassociatedwith the approximatingmodel includes a descriptionof endogenousstatevariablesandthereforeembedsrobustdecisionrulesof one or more decision makers,as for examplewhen the approximatingmodel emerges from a robustresourceallocationproblemof the kind to be describedin Section 5. With our first semigroupas a point of reference,we will consider three additionalsemigroups.The second semigrouprepresentsan endogenousworstcase model that a decision maker uses to promote robustnessto possible misspecificationof his approximatingmodel (4.1). For reasonsthatwe discuss in Section8, we shallfocus the decisionmaker'sattentionon worst-casemodels 15. Thereare otherclosely relatednotionsof an extendedgeneratorin the probabilityliterature. Sometimescalendartime dependenceis introducedinto the function<f),or martingalesareused in place of local martingales. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 81 that are absolutely continuous with respect to his approximating model. Following Kunita (1969), we shall assume that the decision maker believes that the data are actually generated by a member of a class of models that are obtained as Markov perturbations of the approximating model (4.1). We parameterize this class of models by a pair of functions (g, h), where g is a continuous function of the Markov state x that has the same number of coordinates as the underlying Brownian motion, and h is a nonnegative function of (y, jc) that distorts the jump intensities. For the worst-case model, we have the particular settings g = g and h = h. Then we can represent the worst-case generator ^ as i ~ /- #<M + <8=^U) 2tnce(2^)+^ (d(f)\ (4"2) where ji = /i, + A| t =2 r,(dy\x) = h(y, x)if(dy\x). The distortion g to the diffusion and the distortion ft to the jump component in the worst case model will also play essential roles both in asset pricing and in the detection probabilities formulas. From (3.5), it follows that the jump intensity under this parameterizationis given by A(jc)= / ft{y, x)r)(dy\x) and the jump distribution conditioned on x is [h(y, x)/k(x)]r)(dy\x). A generator of the form (4.2) emerges from a robust decision problem, the perturbationpair (g, h) being chosen by a malevolent player, as we discuss next. Our third semigroup modifies one that Hansen and Scheinkman (2002) developed for computing the at time t. Hansen time zero price of a state contingent claim that pays off <^{x^) and Scheinkman showed that the time zero price can be computed with a risk-free rate p and a risk-neutralprobability measure embedded in a semigroup with generator: /- ^\ (d<f>\ 1 +2^ (2 **= -**+*'U) ^ +*_ * (43a) Here /jt = /jl + Att I = 2 ^(rfy|jc) (4.3b) = n(y, jc)T|(rfy|jc). In the absence of a concern about robustness, rr = g is a vector of prices for the Brownian motion factors and fl = h encodes the jump risk prices. In Markov settings without a concern for robustness, (4.3b) represents the connection 82 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 betweenthe physicalprobabilityand the so-calledrisk-neutralprobabilitythat is widely used for asset pricingalong with the interestrate adjustment. We altergenerator(4.3) to incorporatea representativeconsumer'sconcern about robustnessto model misspecification.Specifically a preferencefor robustnesschanges the ordinaryformulasfor tt and ft that are based solely on pricing risks under the assumptionthat the approximatingmodel is true. A concern about robustnessalters the relationshipbetween the semigroupsfor representingthe underlyingMarkovprocessesand pricing.With a concernfor robustness,we representfactorrisk prices by relatingjl to the worst-casedrift /x: /I = jx + Kg andrisk-basedjumppricesby relatingfj to the worst-casejump measurefj: r\(dy\x)= h(y, x)r)(dy\x).Combiningthis decompositionwith the relationbetween the worst-caseand the approximatingmodels gives the new vectors of pricingfunctions K= g + g n = hfi where the pair (g, ft) is used to portraythe (constrained)worst-casemodel in (4.2). Laterwe will supplyformulasfor (p, g, h). A fourth semigroupstatisticallyquantifiesthe discrepancybetween two competingmodels as a functionof the time intervalof availabledata.We are particularlyinterestedin measuringthe discrepancybetweenthe approximating andworstcase models.Foreach a E (0, 1), we developa boundon a detection errorprobabilityin terms of a semigroupand what looks like an associated 'risk-free interest rate.' The counterpartto the risk-free rate serves as an instantaneousdiscrimination rate.Foreach a, the generatorfor the boundon the detectionerrorprobabilitycan be representedas: V* = -M + m»• y + j trace (s» ^) + *•*, where /xa = ix + Aga £* = £ if(dy\x) = ha(y,x)i,(dy\x). The semigroupgeneratedby ^ governsthe behavioras samplesize grows of a boundon the fractionof errorsmadewhen distinguishingtwo Markovmodels using likelihoodratiosor posteriorodds ratios.The a associatedwith the best boundis determinedon a case by case basis andis especiallyeasy to findin the special case thatthe Markovprocess is a purediffusion. Table 1 summarizesour parameterization of these four semigroups.Subsections formulas for in this table. the entries sequent supply E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 83 Table 1. Parameterizations of the Generators of Four Semigroups Generator Semigroup Approximating model Worst-case model ^00 <S Rate Drift distortion 0 g(x) Pricing <§ p(x) Detection ^a pa(x) tr(x) = g(x) + g(x) Jump distortion density _ 1 d(y, x) U(x) = h(y, x)d(y,x) £(x) ha(y, x) The rate modifies the generatorassociated with the approximatingmodel by adding -p<f>to the generatorfor a test to the generatorassociated with the approximatingmodel. The function </>. The drift distortionadds a term Ag • (d<f>/dx) jump distortiondensity is h(y, x)r](dy\x)instead of the jump distributiont)(dy\x)in the generatorfor the approximating model. 5. Model Misspedfication and Robust Control We now study the continuous-timerobust resource allocation problem. In additionto an approximatingmodel, this analysis will producea constrained worst-casemodel that by helping the decision makerto assess the fragilityof any given decisionrulecan be used as a device to choose a robustdecisionrule. 5./ LyapunovEquation under a MarkovApproximatingModel and a Fixed Decision Rule Under a Markovapproximatingmodel with generator<§ and a fixed policy functioni(x), the decision maker'svalue functionis exp(-8t)E[U[xt, i(xt)]\x0= x]dt. V(x)= Jo The value functionV satisfiesthe continuous-timeLyapunovequation: (5.1) 8V(x) = U[x, i'(jc)]+ <SV(jc). Because V may not be bounded,we interpret<§as the weak extensionof the generator(3.6) definedusing local martingales.The local martingaleassociated with this equationis: Mt = V(xt)- V(xo)~ (8V(xs)- U[xs, i(xs)])ds. As in (3.6), this generatorcan includediffusionandjump contributions. We will eventuallybe interestedin optimizingover a control /, in which case the generator^ will dependexplicitlyon the control.Fornow we suppress thatdependence.We referto ^ as the approximating model;^ can be modelled using the triple (/a, 2, 17)as in (3.6). The pair (jll,2) consists of the drift and diffusioncoefficientswhile the conditionalmeasure17encodes both the jump intensityand the jump distribution. We want to modify the Lyapunovequation(5.1) to incorporatea concern about model misspecification.We shall accomplishthis by replacing<§with 84 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 anothergeneratorthat expresses the decision maker's precautionabout the specificationof CS. 5.2 EntropyPenalties We now introduceperturbations to the decision maker'sapproximatingmodel that are designed to make finite horizon transitiondensities of the perturbed model be absolutelycontinuouswith respect to those of the approximating model. We use a notion of absolute continuitythat pertainsonly to finite intervalsof time. In particular,imaginea Markovprocessevolving for a finite lengthof time. Ournotionof absolutecontinuityrestrictsprobabilitiesinduced by the path {xT:0 < t < t] for all finite t. See HSTW (2002), who discuss the notionas well as an infinitehistoryversionof absolutecontinuity.Kunita(1969) shows how to preserveboth the Markovstructureand absolutecontinuity. thatcan FollowingKunita(1969), we shall considera Markovperturbation be parameterizedby a pair (g, h), where g is a continuousfunction of the Markov state x and has the same numberof coordinatesas the underlying Brownianmotion, and h is a nonnegativefunctionof (y, x) used to model the jump intensities.In Section 8, we will have more to say aboutthese perturbations includinga discussionof why we do not perturb2. Forthe pair(g, h), the perturbedgeneratoris portrayedusing a driftjll + Ag, a diffusionmatrix2, and a jump measureh(y, x)r)(dy\x).Thus the perturbedgeneratoris <S(g, h)cf>(x)= <3<Kx)+ [A(x)g(x)] • ^^ox + [h(y, x) l][<Ky) <Xx)Mdy\x). Forthis perturbedgeneratorto be a Fellerprocesswouldrequirethatwe impose additionalrestrictionson h. For analyticaltractabilitywe will only limit the perturbationsto have finite entropy.We will be compelledto show, however, that the perturbationused to implementrobustnessdoes indeed generate a will be constructedformallyas the solution Markovprocess.This perturbation to a constrainedminimizationproblem.In whatfollows, we continueto use the notation<Sto be the approximatingmodel in place of the moretedious^(0, 1). 5.3 ConditionalRelativeEntropy At this point, it is useful to have a local measure of conditional relative entropy.16Conditionalrelativeentropyplays a prominentrole in largedeviation 16. This will turnout to be a limitingversionof a local Chemoffmeasurepa to be definedin Section 8. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 85 theoryand in classical statisticaldiscriminationwhere it is sometimesused to studythe decay in the so-calledtype II errorprobabilities,holdingfixed type I errors(Stein's Lemma).For the purposesof this section, we will use relative entropy as a discrepancymeasure. In Section 8 we will elaborate on its connectionto the theoryof statisticaldiscrimination.As a measureof discrepancy, it has been axiomatizedby Csiszar(1991) althoughhis defense shall not concernus here. By it we denote the log of the ratioof the likelihoodof model one to the likelihoodof model zero, given a datarecordof lengtht. For now, let the data be either a continuousor a discretetime sample.The relativeentropyconditionedon x0 is definedto be: E(£t\x0, model 1) = £[€, exp(€,)|jtb, model 0] = (5.2) model 0]|a=1, -^ E[exp(a€,)|*b, where we have assumed that the model zero probability distribution is absolutelycontinuouswith respect to the model one probabilitydistribution. To evaluate entropy, the second relation differentiatesthe moment-generating function for the log-likelihood ratio. The same informationinequality thatjustifies maximumlikelihood estimationimplies that relative entropyis nonnegative. When the model zero transitiondistributionis absolutelycontinuouswith respectto the model one transitiondistribution,entropycollapsesto zero as the lengthof the datarecordt - » 0. Therefore,with a continuousdatarecord,we shalluse a conceptof conditionalrelativeentropyas a rate,specificallythe time derivativeof (5.2). Thus, as a local counterpartto (5.2), we have the following measure: e(g, h)(x) = 8{X)28{X)+ [1 - h(y, x) + h(y, jc)log h(y, x)Mdy\x) (5.3) wheremodelzero is parameterized by by (0, 1) andmodelone is parameterized fromthe diffusioncontribution,andthe comes (g, h). The quadraticformg'g/2 term [1 - h(y9x) + h(y,*)log h(y,Jc)]r|(rfy|jc) measuresthe discrepancyin thejump intensitiesanddistributions.It is positive by the convexity of h log h in h. 86 Journal of the European Economic Association March 2003 1(1):68- 123 Let A denote the space of all such perturbationpairs (g, h). Conditional relativeentropye is convex in (g, h). It will be finite only when 0< h(y, x)r)(dy\x)< oo. When we introduceadjustmentsfor model misspecification,we modify Lyapunovequation(5.1) in the following way to penalizeentropy 8V(x) = min U[x, /(*)] + Oe(g,h) + <S(g,h)V(x), (g,h)<=A where 0 > 0 is a penaltyparameter.We areled to the followingentropypenalty problem. ProblemA inf 0e(g9h) + <S(g,h)V. J(V)= (5.4) Theorem5.1. Supposethat(i) Vis in C2and (ii) / exp[-V(y)/6]r)(dy\x)< ^for all x. The minimizerof ProblemA is ^) = 1 -0A(*)' u x), = exp\V(x) h{y, -dV(x) ~ V(y)l (5.5a) . The optimizedvalue of the criterionis: 4eXP(~^)l j(V)= -e i J expr e) (5-5b) Finally, the impliedmeasureof conditionalrelativeentropyis: e* ~ VS[exp(-V/0)] - «[Vexp(-V/e)] 0exp(-W0) Proof. The proof is in the Appendix. - 0S[exp(-V/0)] • (5>5c) E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 87 The formulas (5.5a) for the distortions will play a key role in our applications to asset pricing and statistical detection. 5.4 Risk-Sensitivity as an Alternative Interpretation In light of Theorem 5.1, our modified version of Lyapunov equation (5.1) is 8V(x) = min U[x, i(x)] + Oe(g, h) + <8(g, h)V(x) (g,A)GA (5.6) = "kfr>]-* 4exp(-^)|« r v(x)i exp[" tJ If we ignore the minimization prompted by fear of model misspecification and instead simply start with that modified Lyapunov equation as a description of preferences, then replacing ^V in the Lyapunov equation (5.1) by 0{cS[exp(- V/0)]/exp(- VIS)} can be interpreted as adjusting the continuation value for risk. For undiscounted problems, the connection between risk-sensitivity and robustness is developed in literatureon risk-sensitive control (e.g., see James 1992 and Runolfsson 1994). Hansen and Sargent's (1995) recursive formulation of risk sensitivity accommodates discounting. The connection between the robustness and the risk-sensitivity interpretations is most evident when ^ = c&dso that x is a diffusion. Then expr ~ej In this case, (5.6) is a partial differential equation. Notice that -1/20 scales (dV/dx)fX(dV/dx),the local variance of the value function process { V(xt)}. The interpretationof (5.6) under risk sensitive preferences would be that the decision maker is concerned not about robustness but about both the local mean and the local variance of the continuation value process. The parameter 0 is inversely related to the size of the risk adjustment. Larger values of 0 assign a smaller concern about risk. The term 1/0 is the so-called risk sensitivity parameter. Runolfsson (1994) deduced the S = 0 (ergodic control) counterpartto (5.6) to obtain a robust interpretationof risk sensitivity. Partial differential equation (5.6) is also a special case of the equation system that Duffie and Epstein (1992), Duffie and Lions (1992), and Schroder and Skiadas (1999) have analyzed for stochastic differential utility. They showed that for diffusion models, the recursive utility generalization introduces a variance multiplier that can be state dependent. The counterpart to this multiplier in our setup is state independent and equal to the risk sensitivity parameter 1/0. For a robust decision maker, this 88 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 and alternative variancemultiplierrestrainsentropybetweenthe approximating models. The mathematicalconnectionsbetween robustness,on the one hand, and risk sensitivityand recursiveutility, on the other,let us draw on a set of analyticalresultsfrom those literatures.17 5.5 The O-ConstrainedWorstCase Model Given a value function,Theorem5.1 reportsthe formulasfor the distortions (g, fi) for a worst-casemodelused to enforcerobustness.This worstcase model is Markovand depictedin termsof the value function.This theoremthus gives us a generator^ andshows us how to fill out the secondrow in Table 1. In fact, a separateargumentis needed to show formallythat%does in fact generatea Fellerprocessor moregenerallya Markovprocess.Thereis a host of alternative sufficientconditionsin the probabilitytheoryliterature.Kunita(1969) gives one of the more generaltreatmentsof this problemand goes outside the realm of Feller semigroups.Also, Ethierand Kurtz(1986, Chapter8) give some sufficient conditionsfor operatorsto generateFeller semigroups,includingrestrictions on the jump component%nof the operator. of % we can applyTheorem8.1 to Using the Theorem5.1 characterization obtain the generatorof a detection semigroupthat measures the statistical discrepancybetweenthe approximatingmodel and the worst-casemodel. 5.6 An AlternativeEntropyConstraint We brieflyconsideran alternativebutclosely relatedway to computeworst-case models and to enforcerobustness.In particular,we consider: ProblemB 7*(V)= inf <§(g9h)V. (5.7) (g,/0eA,€(g,/0<e This problemhas the same solutionas thatgiven by ProblemA except that 6 mustnow be chosenso thatthe relativeentropyconstraintis satisfied.Thatis, 0 shouldbe chosen so that e(g, h) satisfiesthe constraint.The resulting0 will typicallydependon x. The optimizedobjectivemustnow be adjustedto remove the penalty: VcS[exp(-W0)] _ <S[Vexp(-V/0)] _ ae* = - -- Jj*(V) j(Y) - U€ {V) = AV) exp(-W0) which follows from (5.5c). These formulassimplifygreatlywhen the approximatingmodel is a diffusion. Then 6 satisfies 17. See Section9.2 for alternativeinterpretations of a particularempiricalapplicationin termsof risk-sensitivityandrobustness.For thatexample,we show how the robustnessinterpretation helps us to restrict0. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 89 This formulationembedsa versionof the continuous-timepreferenceorderthat ChenandEpstein(2001) proposedto captureuncertaintyaversion.We had also suggestedthe diffusion version of this robustadjustmentin our earlierpaper (Anderson,Hansen,and Sargent1998). 5.7 Enlarging the Class of Perturbations to an approximating In thispaperwe focus on misspecificationsor perturbations Markovmodel that themselvesare Markovmodels. But in HSTW, we took a more general approachand began with a family of absolutely continuous to an approximatingmodel that is a Markovdiffusion.Absolute perturbations even finiteintervalsputsa precisestructureon the perturbations, over continuity when the Markov specificationis not imposed on these perturbations.As a specconsequence,HSTWfollow James(1992) by consideringpath-dependent ificationsof the driftof the BrownianmotionJqgsds,wheregs is constructedas a general function of past jc's. Given the Markov structureof this control functionof the state problem,its solutioncan be representedas a time-invariant = vectorxt thatwe denotegt g(xt). 5.8 Adding Controlsto the OriginalState Equation We now allow the generatorto depend on a control vector. Consider an approximatingMarkov control law of the form i(x) and let the generator associatedwith an approximatingmodel be ^(Z).For this generator,we introduce perturbation(g, h) as before. We write the correspondinggeneratoras ^(g, h, i). To attaina robustdecision rule, we use the Bellmanequationfor a two-playerzero-sumMarkovmultipliergame: 8V= max min U(x, i) + de(g, h) + <S(g,h9i)V. / (5.8) (g,/i)GA The Bellmanequationfor a correspondingconstraintgame is: 8V = max i min U(x, i) + <S(g,h, i)V. (g,/i)EA(i),e(g,/i)<e to terminalconditionsmust be imSometimesinfinite-horizon counterparts Bellman to these on the solutions equations.Moreover,applicationof a posed to needed will be VerificationTheorem guaranteethatthe impliedcontrollaws Bellman these equationspresumethatthe value actuallysolve the game.Finally, is It functionis twice continuouslydifferentiable. well knownthatthis differentiability is not always presentin problemsin which the diffusionmatrixcan be to each thereis typicallya viscositygeneralization singular.In thesecircumstances and Soner of these Bellmanequationswith very similarstructures. (See Fleming Markov 1991for a developmentof the viscosityapproachto controlled processes.) 90 Journal of the European Economic Association March 2003 1(1):68- 123 6. Portfolio Allocation To put some of the results of Section 5 to work, we now consider a robust portfolio problem. In Section 7 we will use this problem to exhibit how asset prices can be deduced from the shadow prices of a robust resource allocation problem. We depart somewhat from our previous notation and let [xt : t > 0} denote a state vector that is exogenous to the individual investor. The investor influences the evolution of his wealth, which we denote by wr Thus the investor's composite state at date t is {wt,xt). We first consider the case in which the exogenous component of the state vector evolves as a diffusion process. Later we let it be a jump process. Combining the diffusion and jump pieces is straightforward.We focus on the formulation with the entropy penalty used in Problem (5.4), but the constraint counterpart is similar. 6.1 Diffusion An investor confronts asset markets that are driven by a Brownian motion. Under an approximating model, the Brownian increment factors have date / prices given by ir(xt) and xt evolves according to a diffusion: dxt = iL(xt)dt+ A(xt)dBt. (6.1) Equivalently, the x process has a generator cSdthat is a second-order differential operator with drift /x and diffusion matrix 2 = AA\ A control vector bt entitles the investor to an instantaneous payoff bt • dBt with a price ir(xt) • bt in terms of the consumption numeraire. This cost can be positive or negative. Adjusting for cost, the investment has payoff - Tr(xt)• bjdt + bt • dBt. There is also a market in a riskless security with an instantaneous risk free rate p(x). The wealth dynamics are therefore dwt = [wtp(xt) - ir(xt) • bt - ct]dt + bt • dBt, (6.2) where ct is date t consumption. The control vector is /' = (b\ c). Only consumption enters the instantaneous utility function. By combining (6.1) and (6.2), we form the evolution for a composite Markov process. But the investor has doubts about this approximating model and wants a robust decision rule. Therefore he solves a version of game (5.8) with (6.1), (6.2) governing the dynamics of his composite state vector w, x. With only the diffusion component, the investor's Bellman equation is 8V(w, x) = max min U(c) + 0e(g) + ^(g, b, c)V (c,b) g where ^(g, b, c) is constructed using drift vector i r i»ix) + A(x)g - ir(x) • b - c + b • [wp(x) g\ 91 E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection and diffusion matrix The choice of the worst case shock g satisfies the first-order condition: dg + Vwb+ A'VX= O (6.3) where Vw= dV/dwand similarly for Vx.Solving (6.3) for g gives a special case of the formula in (5.5a). The resulting worst-case shock would depend on the control vector b. In what follows we seek a solution that does not depend on b. The first-order condition for consumption is Vw(w,x) = Uc(c), and the first-order condition for the risk allocation vector b is -Vwtt+ VJb + A'VXW+Vwg= 0. (6.4) In the limiting case in which the robustness penalty parameter is set to oo, we obtain the familiar result that b= irVw-A'Vxw v ww , in which the portfolio allocation rule has a contribution from risk aversion measured by - Vw/wVwwand a hedging demand contributed by the dynamics of the exogenous forcing process x. Take the Markov perfect equilibrium of the relevant version of game (5.8). Provided that Vwwis negative, the same equilibrium decision rules prevail no matter whether one player or the other chooses first, or whether they choose simultaneously. The first-orderconditions (6.3) and (6.4) are linear in b and g. Solving these two linear equations gives the control laws for b and g as a function of the composite state (w, x): « 67rVw- OA'V^ + VWA'VX b _qy _ (vw) (y \2 vvww g (65) - (vg2 evww Notice how the robustness penalty adds terms to the numeratorand denominator of the portfolio allocation rule. Of course, the value function V also changes when we introduce 6. Notice also that (6.5) gives decision rules of the form b = £(w,x) g = g(w, x), (66) 92 Journal of the European Economic Association March 2003 1(1):68- 123 and in particularhow the worst case shock g feeds back on the consumer's endogenousstatevariablew. Permittingg to dependon w expandsthe kinds of misspecificationsthatthe consumerconsiders. 6.1.1 RelatedFormulations.So far we have studiedportfoliochoice in the case of a constant robustnessparameter6. Maenhout(2001) considers portfolio problemsin whichthe robustnesspenaltydependson the continuationvalue. In his case, the preferencefor robustnessis designedso thatasset demandsare not sensitive to wealth levels as is typical in constant0 formulations.Lei (2000) uses the instantaneousconstraintformulationof robustnessdescribedin section 5.6 to investigateportfoliochoice. His formulationalso makes 6 state dependent, because 6 now formally plays the role of a Lagrangemultiplierthat restrictsconditionalentropyat every instant.Lei specificallyconsidersthe case of incompleteassetmarketsin whichthe counterpart to b has a lowerdimension thanthe Brownianmotion. 6.7.2 Ex-Post Bayesian Interpretation.While the dependenceof g on the endogenousstatew seems reasonableas a way to enforcerobustness,it can be unattractiveif we wish to interpretthe implied worst case model as one with misspecifiedexogenousdynamics.It is sometimesasked whethera prescribed decision rule can be rationalizedas being optimalfor some set of beliefs, and then to find what those beliefs must be. The dependenceof the shock distributions on an endogenousstate variablesuch as wealthw mightbe regardedas a peculiarset of beliefs becauseit is egotisticalto let an adversenaturefeedback on personalstate variables. But thereis a way to makethis featuremoreacceptable.It requiresusing a dynamiccounterpartto an argumentof BlackwellandGirshick(1954). We can producea differentrepresentationof the solution to the decision problemby forming an exogenous state vector W that conforms to the Markov perfect equilibriumof the game. We can confronta decision makerwith this law of motionfor the exogenousstate vector,have him not be concernedwith robustness againstmisspecificationof this law by setting6 = o°?andpose an ordinary decisionproblemin whichthe decisionmakerhas a uniquemodel.We initialize the exogenous state at Wo= w0. The optimaldecision processesfor {(bt, ct)} (but not the controllaws) will be identicalfor this decision problemand for game (5.8) (see HSWT). It can be said that this alternativeproblemgives a Bayesianrationalefor the robustdecision procedure. 6.2 Jumps Suppose now that the exogenous state vector {xt} evolves according to a Markovjump process with jump measure17.To accommodateportfolio allocation, introducethe choice of a functiona that specifies how wealth changes when a jump takes place. Consideran investorwho faces asset marketswith date-stateArrowsecurityprices given by Yl(y,xt) where {xt} is an exogenous E. W. Anderson, L. P. Hansen, & T. J. Sargent Semigroups and Model Detection 93 state vector with jump dynamics.In particular,a choice a with instantaneous in termsof payoff a(y) if the statejumps to y has a price J Il(y, xt)a(y)r](dy\xt) the consumptionnumeraire.This cost can be positiveor negative.Whena jump does not take place, wealthevolves accordingto dwt = p(xtJ)wt-- U(y, xtJ)a{y)r)(dy\xt-)- c,_ dt where p(x) is the risk-freerate given state x and for any variablez, zt- = limTf t zT.If the statex jumpsto y at datet, the new wealthis a(y). The Bellman equationfor this problemis 8V(w,x) = max min U(c) + Vw(w,x)\ p(x)wtc,a /i£A + 0 + II (j, x)a(y)ri(dy\x)- c J [ [1 - h(y, x) + h(y, *)log h(y, x)Mdy\x) h(y,x)(V[a(y),y] - V(w, jc))i)(rfy|*) The first-orderconditionfor c is the same as for the diffusion case and equatesVwto the marginalutilityof consumption.The first-orderconditionfor a requires h(y, x)VJLa(y), y] = Vw{w, x)U(y, x), and the first-orderconditionfor h requires - 6 log h(y, x) = V[d(y), y] - V(w,x) . Solving this second conditionfor fi gives the jump counterpartto the solution assertedin Theorem5.1. Thus the robusta satisfies: VJid(y),y]= Vw(w,x) II(y,s) (~V[a(y),y] + V(x)\- j o expl case 8 = oo,ft is set to one. Since In the limitingno-concern-about-robustness Vwis equatedto the marginalutility for consumption,the first-ordercondition for a equatesthe marginalrateof substitutionof consumptionbefore and after thejumpto the priceTl(y,x). Introducingrobustnessscalesthe priceby thejump distributiondistortion. In this portrayal,the worst case h dependson the endogenousstate w, but it is again possible to obtain an alternativerepresentationof the probability 94 Journal of the European Economic Association March 2003 1(1):68- 123 distortion that would give an ex post Bayesian justification for the decision process of a. 7. Pricing Risky Claims By building on findings of Hansen and Scheinkman (2002), we now consider a third semigroup that is to be used to price risky claims. We denote this semigroup by {2P,: t > 0} where 9^ assigns a price at date zero to a date t payoff (^(Xf).That pricing can be described by a semigroup follows from the Law of Iterated Values: a date 0 state-date claim (j)(xt)can be replicated by first and then at time t - r buying a claim 4>(xt).Like our buying a claim 2PT</>(;t,_T) other semigroups, this one has a generator, say (5, that we write as in (3.3): where ti<t>= [<Hy)- <Kx)]rt(dy\x). The coefficient on the level term p is the instantaneous riskless yield to be given in formula (7.3). It is used to price locally riskless claims. Taken together, the remaining terms comprise the generator of the so-called risk neutral probabilities. The risk neutral evolution is Markov. As discussed by Hansen and Scheinkman (2002), we should expect there to be a connection between the semigroup underlying the Markov process and the semigroup that underlies pricing. Like the semigroup for Markov processes, a pricing semigroup is positive: it assigns nonnegative prices to nonnegative functions of the Markov state. We can thus relate the semigroups by importing the measure-theoretic notion of equivalence. Prices of contingent claims that pay off only in probability measure zero events should be zero. Conversely, when the price of a contingent claim is zero, the event associated with that claim should occur only with measure zero, which states the principle of no-arbitrage. We can capture these properties by specifying that the generator^ of the pricing semigroup satisfies: ji(jc) = jj,(jc)+ A(x)v(x) 2W = 2(jc) j,(x) = h(y9 x)i,(dy\x) (7.1) E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 95 where II is strictly positive. Thus we construct equilibrium prices by producing a triple (p, tt, II). We now show how to construct this triple both with and without a preference for robustness. 7.1 Marginal Rate of Substitution Pricing To compute prices, we follow Lucas (1978) and focus on the consumption side of the market. While Lucas used an endowment economy, Brock (1982) showed that the essential thing in Lucas's analysis was not the pure endowment feature. Instead it was the idea of pricing assets from marginal utilities that are evaluated at a candidate equilibrium consumption process that can be computed prior to computing prices. In contrast to Brock, we use a robust planning problem to generate a candidate equilibrium allocation. As in Breeden (1979), we use a continuous-time formulation that provides simplicity along some dimensions.18 7.2 Pricing without a Concern for Robustness First consider the case in which the consumer has no concern about model misspecification. Proceeding in the spirit of Lucas (1978) and Brock (1982), we can construct market prices of risk from the shadow prices of a planning problem. Following Lucas and Prescott (1971) and Prescott and Mehra (1980), we solve a representative agent planning problem to get a state process {xt}, an associated control process {/,}, and a marginal utility of consumption process {y,}, respectively. We let <§* denote the generator for the state vector process that emerges when the optimal controls from the resource allocation problem with no concern for robustness are imposed. In effect, <§*is the generator for the 0 = oo robust control problem. We construct a stochastic discount factor process by evaluating the marginal rate of substitution at the proposed equilibrium consumption process: mrs, = exp(-&) ^-^ where y(x) denotes the marginal utility process for consumption as a function of the state x. Without a preference for robustness, the pricing semigroup satisfies 9Mx) = E*[mrsMxt)\xo = x] where the expectation operator E* is the one implied by <S*. 18. This analysisdiffersfrom thatof Breeden(1979) by its inclusionof jumps. (7.2) 96 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 Individualssolve a version of the portfolioproblemdescribedin Section of 6withouta concernfor robustness.This supportsthe followingrepresentation the generatorfor the equilibriumpricingsemigroup2P,: p= y -+ 8 d log 7 Jl = ju* + A*^- = )u* + A*A*' -^ (7.3) r,(dy\x)= U(y, *)Tj*(dy|*)= ^Jr,*^!*)These are the usual rationalexpectationsrisk prices. The risk-freerate is the subjective rate of discount reduced by the local mean of the equilibrium marginalutilityprocessscaledby the marginalutility.The vectorrrof Brownian motionriskpricesareweightson the Brownianincrementin the evolutionof the marginalutilityof consumption,againscaledby the marginalutility.Finallythe jump risk prices ft are given by the equilibriummarginalrate of substitution between consumptionbefore and aftera jump. 7.3 Pricing with a Concernfor Robustnessunder the WorstCase Model As in our previous analysis, let ^ denote the approximatingmodel. This is the model that emerges after imposing the robust control law I while assumingthatthereis no model misspecification(g = 0 and h = 1). It differs from c@*9 which also assumes no model misspecificationbut instead imposes a rule derived without any preferencefor robustness.But simply attributing the beliefs <$to private agents in (7.3) will not give us the correct equilibrium prices when there is a preference for robustness. However, formula (7.3) will yield the correct equilibriumprices if we in effect impute to the individual agents the worst-case generator<$instead of ^* as their model of state evolution when making their decisions without any concerns about its possible misspecification. To substantiatethis claim, we consider individualdecision makerswho, when choosingtheirportfolios,use the worst-casemodel%as if it were correct (i.e., they have no concern about the misspecificationof that model, so that a familyof models,the individualscommitto the worstratherthanentertaining case ^ as a model of the state vector {xt : t > 0}). The pricingsemigroupthen becomes (7.4) 9>,<K*)= £[mrs,<K*,)|*o= x] where E denotes the mathematicalexpectationwith respect to the distorted measuredescribedby the generator<S.The generatorfor this pricingsemigroup is parameterizedby E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 97 kj i p= y -+ 8 j],= ji+Ag=ji d log y + AA' -±J- (7.5) V(dy\x) = My, x)ftdy{x) = ^^(dy\x). As in Section 7.2, y(x) is the marginal utility of consumption except it is evaluated at the solution of the robust planning problem. Individuals solve the portfolio problem described in Section^ using the worst-case model of the state {xt} with pricing functions rr = g and H = h specified relative to the worst-case model. We refer to g and h as risk prices because they are equilibrium prices that emerge from an economy in which individual agents use the worst-case model as if it were the correct model to assess risk. The vector g contains the so-called factor risk prices associated with the vector of Brownian motion increments. Similarly, h prices jump risk. Comparison of (7.3) and (7.5) shows that the formulas for factor risk prices and the risk free rate are identical except that we have used the distortedgenerator <§in place of <§*. This comparison shows that we can use standard characterizations of asset pricing formulas if we simply replace the generator for the approximating model <§with the distorted generator <S.19 7.4 Pricing under the Approximating Model There is another portrayal of prices that uses the approximating model ^ as a reference point and that provides a vehicle for defining model uncertainty prices and for distinguishing between the contributions of risk and model uncertainty. The g and h from Section 7.3 give the risk components. We now use the discrepancy between ^ and <§to produce the model uncertainty prices. To formulate model uncertainty prices, we consider how prices can be representedunder the approximatingmodel when the consumer has a preference for robustness. We want to represent the pricing semigroup as 9VK*) = £[(mre,)(mpu,)<K^k>= x] (7.6) where mpu is a multiplicative adjustmentto the marginal rate of substitution that allows us to evaluate the conditional expectation with respect to the approximating model ratherthan the distorted model. Instead of (7.3), to attain (7.6), we portray the drift and jump distortion in the generator for the pricing semigroup as 19. In the applicationsin HST, HSW, and Section 9, we often take the actualdata generating modelto studyimplications.In thatsense,the approximating model modelto be the approximating suppliesthe same kinds of empiricalrestrictionsthat a rationalexpectationseconometricmodel does. 98 Journal of the European Economic Association March 2003 1(1):68- 123 V(dy\x)= h(y, x)ii(dy\x) = h(y, x)h{y9x)r1(dy\x). Changingexpectationoperatorsin depicting the pricing semigroupwill not change the instantaneousrisk-freeyield. Thus from Theorem5.1 we have: Theorem7.1. Let Vpbe the valuefunctionfor the robustresourceallocation problem.Supposethat (i) Vpis in C2 and (ii) j expf-VfyyOJ^dylx) < ™for all x. Moreover,y is assumedto be in the domainof the extendedgenerator°S. Thenthe equilibriumprices can be representedby: p= <$y -+ 8 y + A« ir(x)= -\ A(x)'Vpx(x) log fl(y, x)=-~ j^|J = g(x)+ g(x) [V(y) - W(x)] + log y(y) - log y(x) = log h(y,x) + log h(y,x). /v. This theorem follows directly from the relation between ^ and °S given in Theorem5.1 andfromthe riskpricesof Section7.3. It suppliesthe thirdrow of Table 1. 7.5 Model UncertaintyPrices: Diffusion and Jump Components We have already interpretedg and h as risk prices. Thus we view g = - (l/6)AfVp as the contributionto the Brownianexposureprices that comes from model uncertainty.Similarly,we think of fi(y, x) = - (l/6)exp[Vp(y) V'ix)] as the model uncertaintycontributionto the jump exposureprices. HST obtainedthe additivedecompositionfor the Brownianmotionexposureasserted in Theorem7.1 as an approximationfor linear-quadratic, Gaussianresource allocationproblems.By studyingcontinuoustime diffusion models we have been able to sharpentheirresultsandrelax the linear-quadratic specificationof constraintsand preferences. 7.6 Subtletiesabout Decentralization In HansenandSargent(2003), we confirmthatthe solutionof a robustplanning problemcan be decentralizedwith householdswho also solve robustdecision problemswhile facingthe state-datepricesthatwe derivedbefore.We confront the household with a recursive representationof state-dateprices, give the household the same robustnessparameter6 as the planner, and allow the householdto choose a new worst-casemodel. The recursiverepresentationof E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 99 the state-date prices is portrayed in terms of the state vector X for the planning problem. As in the portfolio problems of Section 6, among the households' state variables is their endogenously determined financial wealth, w. In equilibrium, the household's wealth can be expressed as a function of the state vector X of the planner. However, in posing the household's problem, it is necessary to include both wealth w and the state vector X that propels the state-date prices as distinct state components of the household's state. More generally, it is necessary to include both economy- wide and individual versions of household capital stocks and physical capital stocks in the household's state vector, where the economy-wide components are used to provide a recursive representationof the date-state prices. Thus the controls and the worst case shocks chosen by both the planner, on the one hand, and the households in the decentralized economy, on the other hand, will depend on different state vectors. However, in a competitive equilibrium, the decisions that emerge from these distinct rules will be perfectly aligned. That is, if we take the decision rules of the household in the decentralized economy and impose the equilibrium conditions requiring that the representative agent be representative, then the decisions and the motion of the state will match. The worst-case models will also match. In addition, although the worst-case models depend on different state variables, they coincide along an equilibrium path. 7.7 Ex-Post Bayesian Equilibrium Interpretation of Robustness In a decentralized economy, Hansen and Sargent (2003) also confirm that it is possible to compute robust decision rules for both the planner and the household by endowing each such decision maker with his own worst-case model, and having each solve his decision problem without a preference for robustness, while treating those worst-case models as if they were true. Ex-post it is possible to interpret the decisions made by a robust decision maker who has a concern about the misspecification of his model as also being made by an equivalent decision maker who has no concern about the misspecification of a different model that can be constructed from the worst case model that is computed by the robust decision maker. Hansen and Sargent's (2003) results thus extend results of HSTW, discussed in Section 6.1.2, to a setting where both a planner and a representative household choose worst-case models, and where their worst-case models turn out to be aligned. 8. Statistical Discrimination A weakness in what we have achieved up to now is that we have provided the practitionerwith no guidance on how to calibrate our model uncertainty premia of Theorem 7.1 - or what formulas (5.5a) tell us is virtually the same thing, 100 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 which is the decision maker's robustness parameter 0. It is at this critical point that our fourth semigroup enters the picture.20 Our fourthsemigroupgoverns bounds on detection statisticsthat we can use to guide our thinking about how to calibrate a concern about robustness. We shall synthesize this semigroup from the objects in two other semigroups that represent alternativemodels that we want to choose between given a finite data record. We apply the bounds associated with distinguishing between the decision maker's approximatingand worst case models. In designing a robust decision rule, we assume thatour decision makerworries about alternativemodels that available time series data cannotreadilydispose of. Therefore,we study a stylized model selection problem. Suppose that a decision maker chooses between two models that we will refer to as zero and one. Both are continuous-time Markov process models. We construct a measure of how much time series data are needed to distinguish these models and then use it to calibrate our robustness parameter 6. Our statistical discrepancy measure is the same one that we used in Section 5 to adjust continuation values in a dynamic programming problem that is designed to acknowledge concern about model misspecification. 8.1 Measurement and Prior Probabilities We assume that there are direct measurementsof the state vector [xt : 0 < t < N) and aim to discriminatebetween two Markov models: model zero and model one. We assign priorprobabilitiesof one-half to each model. If we choose the model with the maximumposteriorprobability,two types of errorsarepossible, choosing model zero when model one is correctand choosing model one when model zero is correct. We weight these errorsby the prior probabilitiesand, following Chernoff (1952), study the errorprobabilitiesas the sample interval becomes large. 8.2 A Semigroup Formulation of Bounds on Error Probabilities We evade the difficult problem of precisely calculating error probabilities for nonlinear Markov processes and instead seek bounds on those error probabilities. To compute those bounds, we adapt Chernoffs (1952) large deviation bounds to discriminate between Markov processes. Large deviation tools apply here because the two types of error both get small as the sample size increases. Let CS°denote the generator for Markov model zero and CS1the generator for Markov model one. Both can be represented as in (3.6). 8.2.1 Discrimination in Discrete Time. Before developing results in continuous time, we discuss discrimination between two Markov models in discrete time. Associated with each Markov process is a family of transition probabilities. For any interval t, these transition probabilities are mutually absolutely continuous when restricted to some event that has positive probability under both proba20. As we shall see in Section9, our approachto discipliningthe choice of 0 dependscritically on our adoptinga robustnessand not a risk-sensitivityinterpretation. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 101 bility measures. If no such event existed, then the probability distributions would be orthogonal, making statistical discrimination easy. Let /?T(y|jc)denote the ratio of the transition density over a time interval r of model one relative to that for model zero. We include the possibility that pT(y\x) integrates to a magnitude less than one using the model zero transition probability distribution. This would occur if the model one transition distribution assigned positive probability to an event that has measure zero under model zero. We also allow the density pT to be zero with positive model zero transition probability. If discrete time data were available, say jc0,jct, jc2t,. . . , xTrwhere N = 7Y, then we could form the log likelihood ratio: T 7=1 Model one is selected when (8.1) C>0, and model zero is selected otherwise. The probability of making a classification error at date zero conditioned on model zero is Pr« > O|jco= x, = jc, model 0). model 0} = £(1k»>O}|jco It is convenient that the probability of making a classification error conditioned on model one can also be computed as an expectation of a transformed random variable conditioned on model zero. Thus, Pr{€? < 0\x0 = jc, model 1} = £[1{^<o}|jco= *> model !] = £[exp(Ol{<<o}k> = x, model 0]. The second equality follows because multiplication of the indicator function by the likelihood ratio exp(€^) converts the conditioning model from one to zero. Combining these two expressions, the average error is: av error = V^(min{exp(O, 1}|jc0= jc, model 0). (8.2) Because we compute expectations only under the model zero probability measure, from now on we leave implicit the conditioning on model zero. Instead of using formula (8.2) to compute the probability of making an error, we will use a convenient upper bound originally suggested by Chernoff (1952) and refined by Hellman and Raviv (1970). To motivate the bound, note that for any 0 < a < 1 the piece wise linear function min{s, 1 } is dominated by the concave function sa and that the two functions agree at the kink point s = 1. The smooth dominating function gives rise to more tractable computations as we alter the amount of available data. Thus, setting log s = r = £^ and using (8.2) gives the bound: 102 104 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 Supposethat2° =£21 and thatthe probabilitydistributionsimpliedby the two models are equivalent(i.e., mutuallyabsolutelycontinuous).Equivalence will alwaysbe satisfiedwhen2° and21 arenonsingularbutwill also be satisfied when the degeneracyimplied by the covariancematricescoincides. It can be shown that lim3C?lD<l suggestingthat a continuous-timelimit will not result in a semigroup.Recall that a semigroupof operatorsmust collapse to the identitywhen the elapsed intervalbecomes arbitrarilysmall. When the covariancematrices2° and 21 differ, the detection-errorboundremainspositive even when the data interval becomes small.This reflectsthe fact thatwhile absolutecontinuityis preserved for each positive t, it is known from Cameronand Martin(1947) that the probabilitydistributionsimpliedby the two limitingcontinuous-timeBrownian motions will not be mutuallyabsolutelycontinuouswhen the covariancematrices differ. Because diffusion matricescan be inferredfrom high frequency data,differencesin these matricesare easy to detect.22 Supposethat2° = 21 = 2. If jll° jui1is not in the rangeof 2, then the discrete-timetransitionprobabilitiesfor the two models over an intervalr are not equivalent, making the two models easy to distinguishusing data. If, however, /x° - jll1is in the range of 2, then the probabilitydistributionsare argument, equivalentfor any transitionintervalt. Using a complete-the-square it can be shown that = exp(-rpj 9C?<K*) - x)dy <t>(y)PaT(y wherePTa is a normaldistributionwith meant(1 - a) jll0+ Ta/i1andcovariance matrixt2, a(1~a) Pa= (m°- M')'S-V - I*') (8.6) and 2"1 is a generalizedinverse when 2 is singular.It is now the case that lim3C;iD= 1. The parameterpa acts as a discountrate, since 22. The continuous-timediffusion specificationcarries with it the implicationthat diffusion E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 103 8.3 Rates for Measuring Discrepancies between Models Locally Classification errors become less frequent as more data become available. One common way to study the large sample behavior of classification error probabilities is to investigate the limiting behavior of the operator (%™)Tas T gets large. This amounts to studying how fast (3{")r contracts for large T and results in a large deviation characterization. Chernoff (1952) proposed such a characterization for i.i.d. data that later researchers extended to Markov processes. Formally, a large-deviation analysis can give rise to an asymptotic rate for discriminating between models. Given the state dependence in the Markov formulation, there are two different possible notions of discrimination rates that are based on Chernoff entropy. One notion that we shall dub 'long run' is state independent; to construct it requires additional assumptions. This long-run rate is computed by studying the semigroup {3t" : t > 0} as t gets large. This semigroup can have a positive eigenfunction, that is, a function cf>that solves (8.5) 3C,a<f>=exp(-&)<f> for some positive 8. When it exists, this eigenfunction dominates the remaining ones as the time horizon t gets large. As a consequence, for large t this semigroup decays approximately at an exponential rate 8. Therefore, S is a long-run measure of the rate at which information for discriminating between two models accrues. By construction and as part of its long-run nature, the rate 8 is independent of the state. In this paper, we use another approximationthat results in a state-dependent or short-run discrimination rate. It is this state-dependent rate that is closely linked to our robust decision rule, in the sense that it is governed by the same objects that emerge from the worst-case analysis for our robust control problem. The semigroup {9£" : t > 0} has the same properties as a pricing semigroup, and furthermore it contracts. We can define a discrimination rate in the same way that we define an instantaneous interest rate from a pricing semigroup. This leads us to use Chernoff entropy as pa(x). It differs from the decay rate 8 defined by (8.5). For a given state x, it measures the statistical ability to discriminate between models when a small interval of data becomes available. When the rate pa(x) is large, the time series data contain more information for discriminating between models. Before characterizing a local discrimination rate pa that is applicable to continuous-time processes, we consider the following example. 8.3.1 Constant Drift. Consider sampling a continuous multivariate Brownian motion with a constant drift. Let jll°, S° and /x1, 21 be the drift vectors and constant diffusion matrices for models zero and one, respectively. Thus under model zero, xjr - x(j_l)r is normally distributed with mean tjul0and covariance matrix rS°. Under an alternative model one, xjT- X(j_1)Tis normally distributed with mean tjjl1 and covariance matrix tX1. 104 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 Supposethat2° =£21 and thatthe probabilitydistributionsimpliedby the two models are equivalent(i.e., mutuallyabsolutelycontinuous).Equivalence will alwaysbe satisfiedwhen2° and21 arenonsingularbutwill also be satisfied when the degeneracyimplied by the covariancematricescoincides. It can be shown that lim 3{?1D< 1 rlO suggestingthat a continuous-timelimit will not result in a semigroup.Recall that a semigroupof operatorsmust collapse to the identitywhen the elapsed intervalbecomes arbitrarilysmall. When the covariancematrices2° and 21 differ, the detection-errorboundremainspositive even when the data interval becomes small.This reflectsthe fact thatwhile absolutecontinuityis preserved for each positive r, it is known from Cameronand Martin(1947) that the probabilitydistributionsimpliedby the two limitingcontinuous-timeBrownian motions will not be mutuallyabsolutelycontinuouswhen the covariancematrices differ. Because diffusion matricescan be inferredfrom high frequency data,differencesin these matricesare easy to detect.22 Supposethat2° = 21 = 2. If jui0 /x1is not in the rangeof 2, then the discrete-timetransitionprobabilitiesfor the two models over an intervalt are not equivalent, making the two models easy to distinguishusing data. If, however, jul°- /x1 is in the range of 2, then the probabilitydistributionsare argument, equivalentfor any transitionintervalt. Using a complete-the-square it can be shown that WMx) = exp(-rPa) - x)dy <f>(y)PaT(y is a normaldistributionwith meant(1 - a)jui0+ rajut1andcovariance wherePTa matrixt2, Pa = a(1~a) (/*° - /*')'2- V - M1) (8.6) and 2"1 is a generalizedinverse when 2 is singular.It is now the case that lim3C?lD= 1. The parameterpa acts as a discountrate, since 22. The continuous-timediffusion specificationcarries with it the implicationthat diffusion matricescan be inferredfrom high frequencydatawithoutknowledgeof the drift.Thatdata are discretein practicetempersthe notion that the diffusionmatrixcan be inferredexactly. Nevertheless, estimatingconditionalmeans is much more difficultthanestimatingconditionalcovariances. Ourcontinuous-timeformulationsimplifiesour analysisby focusingon the morechallenging driftinferenceproblem. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 105 3ClD = exp(-rpa). The best probability bound (the largest pa) is obtained by setting a = Vi, and the resulting discount rate is referred to as Chernoff entropy. The continuoustime limit of this example is known to produce probability distributions that are absolutely continuous over any finite horizon (see Cameron and Martin 1947). For this example, the long-run discrimination rate 8 and the short-run rate p" coincide because pa is state independent. This equivalence emerges because the underlying processes have independent increments. For more general Markov processes, this will not be true and the counterpartto the short-runrate will depend on the Markov state. The existence of a well defined long-run rate requires special assumptions. 8.3.2 Continuous Time. There is a semigroup probability bound analogous to (8.3) that helps to understand how data are informative in discriminating between Markov process models. Suppose that we model two Markov processes as Feller semigroups. The generator of semigroup zero is and the generator of semigroup one is In specifying these two semigroups, we assume identical 2s. As in the example, this assumption is needed to preserve absolute continuity. Moreover, we require that jll1can be represented as: /i.1 = Ag + v? for some continuous function g of the Markov state, where we assume that the rank of 2 is constant on the state space and can be factored as 2 = AA' where A has full rank. This is equivalent to requiring that /ul°- jul1is in the range of 2.23 In contrast to the example, however, both of the /xs and 2 can depend on the Markov state. Jump components are allowed for both processes. These two operators are restricted to imply jump probabilities that are mutually absolutely continuous for at least some nondegenerate event. We let h{ • , x) denote the density function of the jump distribution of N1 with respect to the distribution of K°. We assume that h(y, x)rf(y\x) is finite for all x. Under absolute continuity we write: 23. This can be seen by writingAg = XA'(A'A) lg. 106 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 ^>W = h(y, x)[cf>(y) *(jc)]Ti°(rfy|jc). Associated with these two Markov processes is a positive, contraction semigroup {3{J*: t > 0} for each a E (0, 1) that can be used to bound the probability of classification errors: av error < Vi (3Q1d(jc). This semigroup has a generator <§"with the Feller form: « - -p> + * • y + tmce + * *• 2 (s **?) (8-7) The drift jli" is formed by taking convex combinations of the drifts for the two models + <V = jul°+ aXg\ jwa= (1 a)jUL° the diffusion matrix 2 is the common diffusion matrix for the two models, and the jump operator Xa is given by: .NXx) = [h(y, x)n<t>(y) - ^x)]j,0(dy\x) Finally, the rate pa is nonnegative and state dependent and is the sum of contributions from the diffusion and jump components: pa(x) = pad(x) + pan(x). (8.8) The diffusion contribution (1 - a) a pad(x)= g(x)'g(x) (8.9) is a positive semidefinite quadratic form and the jump contribution p*{x) = ((1 - a) + ah(y9 x) - [h(y, x)]a)7i°(dy\x) (8.10) is positive because the tangent line to the concave function (h)a at h = 1 must lie above the function. Theorem 8.1. (Newman 1973 and Newman and Stuck 1979). The generator of the positive contraction semigroup {%" : t > 0} on C is given by (8. 7). Thus we can interpret the generator cSa that bounds the detection error probabilities as follows. Take model zero to be the approximating model, and E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 107 model one to be some other competing model. We use 0 < a < 1 to build a mixed diffusion-jump process from the pair (g", ha) where ga = ag and A" = (hY. Use the notation E? to depict the associated expectation operator. Then \ I [Npa(xt)dt\\ x0 = x 1. \ h I av error < ViEa exp - (8.11) Of particularinterest to us is formula (8.8) for pa, which can be interpreted as a local statistical discrimination rate between models. In the case of two diffusion processes, this measure is a state-dependent counterpart to formula (8.6) in the example presented in section 8.3.1. The diffusion component of the rate is maximized by setting a = V2. But when jump components are also present, a = V2 will not necessarily give the maximal rate. Theorem 8.1 completes the fourth row of Table 1. 8.4 Detection Statistics and Robustness Formulas (8.9) and (8.10) show how the local Chernoff entropy rate is closely related to the conditional relative entropy measure that we used to formulate robust control problems. In particular, the conditional relative entropy rate e in continuous time satisfies da a=l In the case of diffusion process, the Chernoff rate equals the conditional relative - a).24 entropy rate without the proportionality factor a{\ 8.5 Further Discussion In some ways, the statistical decision problem posed above is too simple. It entails a pairwise comparison of ex ante equally likely models and gives rise to a statistical measure of distance. That the contenders are both Markov models greatly simplifies the bounds on the probabilities of making a mistake when choosing between models. The implicit loss function that justifies model choice based on the maximal posterior probabilities is symmetric (e.g., see Chow 1957). Finally, the detection problem compels the decision maker to select a specific model after a fixed amount of data have been gathered. Bounds like Chernoff s can be obtained when there are more than two models and also when the decision problem is extended to allow waiting for more data before making a decision (e.g., see Hellman and Raviv 1970 and 24. The distributionsassociatedwith these rates differ, however. Bound (8.11) also uses a Markovevolutionindexedby a, whereaswe usedthe a = 1 modelin evaluatingthe robustcontrol objective. 108 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 Moscariniand Smith 2002).25Like our problem,these generalizationscan be posed as Bayesianproblemswith explicit loss functionsandpriorprobabilities. Although the statistical decision problem posed here is by design too simple, we nevertheless find it useful in bounding a reasonable taste for robustness.The hypothesisof rationalexpectationsinstructsagents and the modelbuilderto eliminateas misspecifiedthose modelsthataredetectablefrom infinitehistoriesof data.Chernoffentropygives us one way to extendrational expectationsby askingagentsto exclude specificationsrejectedby finite histories of databutto contemplatealternativemodelsthataredifficultto detectfrom finite histories of data. When Chernoffentropyis small, it is challengingto choose between competingmodels on purelystatisticalgrounds. 8.6 Detection and Plausibility the equilibriumallocationundera preference In Section 6.1.2, we reinterpreted for robustnessas one thatinsteadwouldbe chosenby a Bayesiansocial planner who holds fast to a particularmodel thatdiffersfromthe approximatingmodel. If the approximatingmodel is actuallytrue,thenthis artificialBayesianplanner has a false model of forcing processes,one that enough data should disabuse him of. However, our detection probabilitytools let us keep the Bayesian planner'smodel sufficientlyclose to the approximatingmodel that more data than are availablewould be needed to detect that the approximatingmodel is reallybetter.Thatmeansthata rationalexpectationseconometricianwouldhave a difficult time distinguishingthe forcing process under the approximating modelfromthe Bayesianplanner'smodel.Nevertheless,thatthe Bayesianplanner fordecisions uses sucha nearbymodelcanhaveimportant quantitative implications sucheffectson assetpricesin Section9.26 and/orassetprices.We demonstrate 9. Entropy and the Market Price of Uncertainty In comparingdifferentdiscretetime representativeagenteconomieswith robust decision rules, HSW computationallyuncovered a connection between the marketprice of uncertaintyandthe detectionerrorprobabilitiesfor distinguishing betweena planner'sapproximatingand worstcase models.The connection was so tightthatthe marketpriceof uncertaintycouldbe expressedas nearlythe same linearfunctionof the detectionerrorprobability,regardlessof the details HSW used this fact to calibrate0s, which differed of the model specification.27 25. In particular,Moscariniand Smith(2002) considerBayesiandecisionproblemswith a more generalbut finiteset of modelsandactions.Althoughthey restricttheiranalysisto i.i.d. data,they of the large sampleconsequencesof accumulatinginforobtaina more refinedcharacterization mation.Chernoffentropyis also a key ingredientin theiranalysis. 26. Forheterogeneousagenteconomies,worst-casemodelscan differacrossagentsbecausetheir preferencesdiffer.But an argumentlike the one in the text could still be used to keep each agent's worst case model close to the approximatingmodel as measuredby detectionprobabilities. 27. See Figure8 in HSW. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 109 across different models because the relationship between detection error probabilities and 6 did depend on the detailed model dynamics. As emphasized in Section 2, the tight link that we have formally established between the semigroup for pricing under robustness and the semigroup for detection error probability bounds provides the key to understanding HSW's empirical finding, provided that the detection error probability bounds do a sufficient job of describing the actual detection error probabilities. Subject to that proviso, our formal results thus provide a way of moving directly from a specification of a sample size and a detection errorprobability to a prediction of the market price of uncertainty that transcends details of the model. Partly to explore the quality of the information in our detection error probability bounds, this section takes three distinct example economies and shows within each of them that the probability bound is quite informative and that consequently the links between the detection error probability and the market price of uncertainty are very close across the three different models. All three of our examples use diffusion models, so that the formulas summarized in Section 2 apply. Recall from formula (8.9) that for the case of a diffusion the local Chernoff rate for discriminating between the approximating model and the worst-case model is a(l-a)^, (9.1) which is maximized by setting a = .5. Small values of the rate suggest that the competing models would be difficult to detect using time series statistical methods. For a diffusion, we have seen how the price vector for the Brownian increments can be decomposed as g + g. In the standard model without robustness, the conditional slope of a mean-standard deviation frontier is the absolute value of the factor risk price vector, \g\, but with robustness it is \g + g\, where g is the part attributableto aversion to model uncertainty. One possible statement of the equity-premium puzzle is that standard models imply a much smaller slope than is found in the data because plausible risk aversion coefficients imply a small value for g. This conclusion extends beyond the comparison of stocks and bonds and is also apparent in equity return heterogeneity. See Hansen and Jagannathan (1991), Cochrane and Hansen (1992), and Cochrane (1997) for discussions. In this section we explore the potential contributionfrom model uncertainty. In particular, for three models we compute \g\, the associated bounds on detection error probabilities, and the detection error probabilities themselves. The three models are: (1) a generic one where the worst-case drift distortion g is independent of jc;(2) our robust interpretationof a model of Bansal and Yaron (2000) in which g is again independent of jcbut where its magnitude depends on whether a low frequency component of growth is present; and (3) a continuous 110 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 Table 2. Prices of Model Uncertainty mpu (= |||) 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.30 0.40 Chernoffrate (a = .5) 0.0001 0.0002 0.0004 0.0008 0.0013 0.0018 0.0015 0.0032 0.0040 0.0050 0.0113 0.0200 and Detection-Error Probability bound 0.495 0.480 0.457 0.426 0.389 0.349 0.306 0.264 0.222 0.184 0.053 0.009 Probabilities Probability 0.444 0.389 0.336 0.286 0.240 0.198 0.161 0.129 0.102 0.079 0.017 0.002 When g is independentof jc,N = 200; mpu denotes the marketprice of model uncertainty,measuredby \g\. The Chernoff rate is given by (9.1). time version of HST's equilibriumpermanentincome model, in which g dependson jc.Very similarrelationsbetween detectionerrorprobabilitiesand market prices of risk emerge across these three models, though they are associatedwith differentvalues of 6. 9.1 Links among Sample Size, Detection Probabilities,and mpu Wheng Is Independentof x In this section we assumethatthe approximatingmodel is a diffusionand that the worst-case model is such that g is independentof the Markov state x. Withoutknowinganythingmoreaboutthe model,we can quantitativelyexplore the links among the marketprice of model uncertainty(mpu = |g|) and the detectionerrorprobabilities. For N = 200, Table 2 reportsvalues of mpu = \g\ togetherwith Chernoff bound(8.11), andthe actual entropyfor a = .5, the associatedprobability-error of detection on the left side of (8.11) (which we can calculate probability in this The bounds and the probabilitiesare case). analytically probability under the that the driftand diffusioncoeffisimplifyingassumption computed cients are constant,as in the examplein section 8.3.1. With constantdrift and diffusion coefficients, the log-likelihoodratio is normallydistributed,which allows us easily to computethe actualdetection-errorprobabilities.28 The numbersin Table2 indicatethatmarketpricesof uncertaintysomewhat less than0.2 are associatedwith misspecifiedmodelsthataredifficultto detect. However, market prices of uncertaintyof 0.40 are associated with easily 28. Thinkingof a quarteras the unitof time,we took the sampleintervalto be 200. Alternatively, we mighthave used a sampleintervalof 600 to link to monthlypostwardata.The marketprices of risk andmodeluncertaintyare associatedwith specifictime unitnormalizations.Since, at least locally, driftcoefficientsanddiffusionmatricesscale linearlywith the time unit,the marketprices of risk and model uncertaintyscale with the squareroot of the time unit. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 111 detectable alternative models. The table also reveals that although the probability bounds are weak, they display patterns similar to those of the actual probabilities. Empirical estimates of the slope of the mean-standarddeviation frontier are about 0.25 for quarterly data. Given the absence of volatility in aggregate consumption, risk considerations only explain a small component of the measured risk-returntrade-off using aggregate data (Hansen and Jagannathan1991). In contrast, our calculations suggest that concerns about statistically small amounts of model misspecification could account for a substantial component of the empirical estimates. The following subsections confirm that this quantitative conclusion transcends details of the specification of the approximating model. 9.2 Low Frequency Growth Using a recursive utility specification, Bansal and Yaron (2000) study how low frequency components in consumption and/or dividends become encoded in risk premia. Here we take up Bansal and Yaron's theme that the frequency decomposition of risks matters but reinterpret their risk premia as reflecting model uncertainty. Consider a pure endowment economy where the state xu driving the consumption endowment exp(;cu) is governed by the following process: dxu = (.0020 + .0177*2,)* + .004SdBt (9.2) dx2t = ~.0263x24t + .03l2dBt. The logarithm of the consumption endowment has a constant component of the growth rate of 0.002 and a time varying component of the growth rate of x2t\ x2t has mean zero and is stationary but is highly temporally dependent. Relative to the i.i.d. specification that would be obtained by attaching a coefficient of zero on x2t in the first equation of (9.2), the inclusion of the x2t component alters the long run properties of consumption growth. We calibrated the state evolution equation (9.2) by taking a discrete-time consumption process that was fit by Bansal and Yaron and embedding it in a continuous-time state-space model.29 We accomplished this by sequentially applying the conversions tf, d2c, and ss in the MATLAB control toolbox. The impulse response of log consumption to a Brownian motion shock, which is portrayedin Figure 2, and the spectral density function, which is shown in Figure 3, both show the persistence in consumption growth. The low frequency component is a key feature in the analysis of Bansal and Yaron (2000). The impulse response function converges to its supremum from below. It takes about ten years before the impulse response function is close to its supremum. Corresponding to this behavior of the impulse response function, there is a 29. We base our calculationson an ARMA model for consumptiongrowth,namely,log ct reportedby BansalandYaron(2000) where logc,_! = .002 + [(1 .860L)/(l .974L)](.0048)j/r { vt : t > 0 } is a seriallyuncorrelatedshock with meanzero and unit variance. 112 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 Figure 2. ImpulseResponseFunctionfor the Logarithmof the ConsumptionEndowmentxlt to a BrownianIncrement Figure 3. SpectralDensity Functionfor the ConsumptionGrowthRate dxu E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 113 narrow peak in the spectral density of consumption growth at frequency zero, with the spectral density being much greater than its infimum only for frequencies with periods of more than ten years. In what follows, we will also compute model uncertainty premia for an alternative economy in which the coefficient on x2t in the first equation of (9.2) is set to zero. We calibrate this economy so that the resulting spectral density for consumption growth is flat at the same level as the infimum depicted in Figure 2 and so that the corresponding impulse response function is also flat at the same initial response reported in Figure 1. Because we have reduced the long run variation by eliminating x2t from the first equation, we should expect to see smaller risk-premia in this economy. Suppose that the instantaneous utility function is logarithmic. Then the value function implied by Theorem 5.1 is linear in jc, so we can write V as V(x) = vo + vlxl + v2x2.The distortion in the Brownian motion is given by the following special case of equation (6.3) 1 -+ .0312t;2) g = u (.00481;, and is independent of the state x. The coefficients on the vts are the volatility coefficients attached to dB in equation (9.2).30 Since g is constant, the worst-case model implies the same impulse response functions but different mean growth rates in consumption. Larger values of 1/0 lead to more negative values of the drift g. We have to compute this mapping numerically because the value function itself depends on 6. Figure 4 reports g as a function of 1/0 (the gs are negative). Larger values of \g\ imply larger values of the rate of Chernoff entropy. As in the previous example this rate is constant, and the probability bounds in Table 2 continue to apply to this economy.31 The instantaneous risk free rate for this economy is: r{=8+ .0020 + .1154jc2,+ axgt (o"i)2 - where <jj = .0048 is the coefficient on the Brownian motion in the evolution equation for xu. Our calculations hold the risk free rate fixed as we change 6. This requires that we adjust the subjective discount rate. The predictability of consumption growth emanating from the state variable x2t makes the risk free rate vary over time. 30. The stateindependenceimpliesthatwe can also interpretthese calculationsas comingfrom a decisionproblemin which \g\is constrainedto be less thanor equalto the sameconstantfor each dateandstate.This specificationsatisfiesthe dynamicconsistencypostulateadvocatedby Epstein and Schneider(2001). 3 1. However,the exact probabilitycalculationsreportedin Table2 will not applyto the present economy. 114 Journal of the European Economic Association March 2003 1(1):68- 123 9.3 Risk-Sensitivityand Calibrationof 6 This economy has a risk-sensitiveinterpretationalong the lines assertedby Tallarini(2000) in his analysisof businesscycle models, one thatmakes 1/0 a risk-sensitivityparameterthat imputesrisk aversionto the continuationutility and makes g become the incrementalcontributionto risk aversioncontributed by this risk adjustment.UnderTallarini's risk-sensitivityinterpretation,1/0 is a risk aversionparameter,and as such is presumablyfixed acrossenvironments. Figure4 shows thatholding 6 fixed but changingthe consumptionendowmentprocesschangesthe detectionerrorrates.For a given 0, the impliedworst case model could be more easily detectedwhen thereis positive persistencein the endowmentgrowth rate. On the robustnessinterpretation,such detection error calculationssuggest that 6 should not be taken to be invariantacross environments.However,on the risk sensitivityinterpretation, 6 shouldpresumably be held fixed across environments.Thus while concernsabout risk and robustnesshave similarpredictionswithina given environment,calibrationsof 6 can dependon whetherit is interpretedas a risk-sensitivityparameterthatis fixed across environments,or a robustnessparameterto be adjustedacross Figure 4. Drift Distortion g for the Brownian Motion This distortion is plotted as a function of 1/0. For comparison, the drift distortionsare also given for the economy in which the first-differencein the logarithmof consumptionis i.i.d. In generatingthis figure,we set the discountparameter 8 so that the mean risk-free rate remainedat 3.5 percent when we change 0. In the model with temporallydependent growth the volatility of the risk-freerateis 0.38 percent.The risk-freerateis constantin the model with i.i.d. growthrates. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 115 Figure5. ImpulseResponseFunctionsfor the Two IncomeProcessesto Two Independent BrownianMotionShocks environmentsdepending on how detection error probabilitiesdiffer across environments. 9.4 PermanentIncome Economy Our previous two examples make Chernoff entropy be independentof the Markovstate. In our thirdexample,we computeddetection-errorboundsand detection-error probabilitiesfor the versionof HST's robustpermanentincome modelthatincludeshabits.32The Chernoffentropiesarestatedependentfor this model, but the probabilityboundscan still be computednumerically.We used the parametervalues from HSTs discrete-timemodel to form an approximate continuous-timerobustpermanentincomemodel,againusingconversionsin the MATLABcontroltoolbox. HST allowed for two independentincome components when estimatingtheirmodel. The impulseresponsesfor the continuoustime version of the model are depicted in Figure 5. The responses are to independentBrownianmotion shocks. One of the processes is much more persistentthanthe otherone. Thatpersistentprocessis the one thatchallenges saver. a permanent-income-style 32. HST estimatedversionsof this model with and withouthabits. 116 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 Figure 6. ImpulseResponseFunctionsfor the PersistentIncomeProcessunderthe ApproximatingModel and the Model for which mpu = .16 The more enduringresponse is from the distortedmodel. The implied risk-freerate is constantand identical across the two models. When we change the robustnessparameter0, we alter the subjective discount rate in a way that completely offsets the precautionarymotive for so thatconsumpsavingin HST's economyandits continuous-timecounterpart, tion andinvestmentprofilesandrealinterestratesremainfixed.33It happensthat the worst case g vector is proportionalto the marginalutility of consumption andthereforeis highlypersistent.This outcomereflectsthatthe decisionrulefor the permanentincome model is well designedto protectthe consumeragainst transientfluctuations,butthatit is vulnerableto modelmisspecificationsthatare highly persistentunder the approximatingmodel. Under the approximating model, the marginalutilityprocessis a martingale,but the (constrained)worstcase model makesthis processbecome an explosive scalarautoregression.The choice of d determinesthe magnitudeof the explosiveautoregressivecoefficient for the marginalutilityprocess.The distortionis concentratedprimarilyin the persistentincome component.Figure 6 comparesthe impulse responseof the 33. See HST and HSW for a proof in a discretetime setting of an observationalequivalence propositionthatidentifiesa locus of (8, 6) pairsthatareobservationallyequivalentfor equilibrium quantities. E. W. Anderson, L. P. Hansen, & T. J. Sargent Semigroups and Model Detection Table 3. Prices of Model Uncertainty and Detection-Error Permanent Income Model, N = 200 117 Probabilities for the mpu (= \g\) i - | u Probability bound Simulated probabilities 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.30 0.40 1.76 3.51 5.27 7.02 8.78 10.53 12.29 14.04 15.80 17.56 26.33 35AI 0.495 0.479 0.453 0.416 0.372 0.323 0.271 0.220 0.171 0.128 0.015 0.0004 0.446 0.388 0.334 0.282 0.231 0.185 0.143 0.099 0.072 0.054 0.004 0.000 The probabilitybound calculations were computed numerically and optimized over the choice of a. The simulated probabilitieswere computedby simulatingdiscrete-approximationswith a time intervalof length 0.01 of a quarterand with 1,000 replications. distortedincomeprocessto thatof the incomeprocessunderthe approximating model.Underthe distortedmodelthereis considerablymorelong-runvariation. Decreasing6 increasesthis variation. Table 3 gives the detection-errorprobabilitiescorrespondingto those reportedin Table2. In this case, the Chernoffentropyis statedependent,leading us to computethe right side of (8.11) numericallyin Table 2.34The values of 6 are set so that the marketprices of uncertaintymatchthose in Table 2. Despite the differentstructuresof the two models, for mpus up to about 0.12, the resultsin Table 3 are close to those of Table 2, thoughthe detectionerrorprobabilitiesarea little bit lowerin Table3. However,the probabilitiesdo decay fasterin the tails for the permanentincome model. As noted, the model in Table 2 has no dependenceof g on the state x. As shown by HST, the fluctuationsin \g\ are quite small relativeto its level, which contributesto the similarityof the results for low mpus. It remains the case that statistical detectionis difficultfor marketpricesof uncertaintyup to abouthalf thatof the empiricallyestimatedmagnitude. We concludethata preferencefor robustnessthatis calibratedto plausible valuesof the detectionerrorprobabilitiescan accountfor a substantialfraction, but not all, of estimatedequity premia,and that this fractionis imperviousto detailsof the model specification.Introducingotherfeaturesinto a model can help to accountmore fully for the steep slope of the frontier,but they would 34. See Hansen and Sargent (forthcoming) for computational details. In computing the probability bounds, we chose the following initial conditions: we set the initial marginal utility of consumption to 15.75 and the mean zero components of the income processes to their unconditional means of zero. 118 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 have to work throughg and not the robustnesscomponentg. For example, marketfrictionsor alterationsto preferencesaffect the g component.35 10. Concluding Remarks In this paperwe have appliedtools fromthe mathematicaltheoryof continuoustime Markovprocessesto explore the connectionsbetween model misspecification,risk,robustness,andmodeldetection.We usedthesetools in conjunction with a decision theorythatcapturesthe idea thatthe decision makerviews his model as an approximation.An outcome of the decision-makingprocess is a constrainedworst-casemodel that the decision makeruses to create a robust decision rule. In an equilibriumof a representativeagent economy with such robust decision makers,the worst-case model of a fictitious robust planner coincides with the worst-casemodel of a representativeprivateagent. These worst-casemodelsareendogenousoutcomesof decisionproblemsandgive rise to model uncertaintypremiain securitymarkets.By adjustingconstraintsor penalties,the worst-casemodelcan be designedto be close to the approximating model in the sense that it is difficult statisticallyto discriminateit from the originalapproximatingmodel. These same penaltiesor constraintslimit model uncertaintypremia. Because mean returns are hard to estimate, nontrivial departuresfrom an approximatingmodel are difficult to detect. As a consequence, withinlimits imposedby statisticaldetection,therecan still be sizable model uncertaintypremiain securityprices. There is anothermaybe less radicalinterpretationof our calculations.A rationalexpectationseconometricianis compelledto specify forcingprocesses, often without much guidance except from statistics,for example, taking the form of a least squaresestimate of an historicalvector autoregression.The explorationof worst-casemodelscan be thoughtof as suggestingan alternative way of specifyingforcingprocessdynamicsthatis inherentlymorepessimistic, but that leads to statistically similar processes. Not surprisingly,changing forcingprocessescan changeequilibriumoutcomes.Moreinterestingis the fact that sometimessubtlechangesin perceivedforcingprocessescan have quantitatively importantconsequenceson equilibriumprices. Our altereddecision problemintroducesa robustnessparameterthat we restrictby using detectionprobabilities.In assessingreasonableamountsof risk aversion in stochastic,general equilibriummodels, it is common to explore preferencesfor hypotheticalgamblesin the mannerof Pratt(1964). In assessing 35. We find it fruitfulto explore concernaboutmodel uncertaintybecausethese other model modificationsarethemselvesonly partiallysuccessful.To accountfully forthe marketpriceof risk, Campbelland Cochrane(1999) adopt specificationswith substantialrisk aversionduringrecessions. Constantinidesand Duffie (1996) accommodatefully the high marketprices of risk by attributingempiricallyimplausibleconsumptionvolatilityto individualconsumers(see Cochrane transaction 1997).Finally,HeatonandLucas(1996) show thatreasonableamountsof proportional costs can explainonly abouthalf of the equitypremiumpuzzle. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 119 reasonablepreferencesfor robustness,we proposeusing largesampledetection probabilitiesfor a hypotheticalmodelselectionproblem.We envisiona decision makeras choosing to be robustto departuresthat are difficultto detect statistically. Of course,using detectionprobabilitiesin this way is only one way to disciplinerobustness.Explorationof otherconsequencesof being robust,such as utility losses, would also be interesting. We see threeimportantextensionsto our currentinvestigation.Like builders of rationalexpectationsmodels, we have side-steppedthe issue of how decision makers select an approximatingmodel. Following the literatureon robustcontrol,we envision this approximatingmodel to be analyticallytractable, yet to be regardedby the decision makeras not providinga correctmodel of the evolutionof the statevector.The misspecificationswe have in mind are small in a statisticalsense but can otherwisebe quite diverse.Just as we have not formallymodelled how agents learnedthe approximatingmodel, neither have we formallyjustified why they do not botherto learn about potentially complicatedmisspecificationsof that model. Incorporatingforms of learning would be an importantextensionof our work. The equilibriumcalculationsin our model currentlyexploit the representative agent paradigmin an essential way. Reinterpretingour calculationsas in some circumstances applyingto a multipleagent setting is straightforward in with (see Tallarini2000), but general,even completesecuritymarketstructures,multipleagent versionsof our model look fundamentallydifferentfrom their single-agentcounterparts(see Anderson1998). Thus, the introductionof heterogeneousdecision makerscould lead to new insightsaboutthe impactof concernsaboutrobustnessfor marketoutcomes. Finally,althoughthe examplesof Section9 all werebasedon diffusions,the effects of concernabout robustnessare likely to be particularlyimportantin environmentswith large shocks that occur infrequently,so that we anticipate that our modellingof robustnessin the presenceof jump componentswill be useful. Appendix This appendixcontainsthe proof of Theorem5.1. Proof. To verifythe conjectured solution,firstnotethatthe objectiveis additively the in h. Moreover and objectivefor the quadraticportionin g is: separable g W g'g + 0^ 8'A'yx. (A.I) Minimizingthiscomponentof the objectiveby choiceof g verifiesthe conjectured to the optimizedobjectiveincluding(A.I) is: solution.The diffusioncontribution 120 Journalof the EuropeanEconomicAssociation March2003 1(1):68-123 ^AV) ° exp(-W0) where we are using the additivedecomposition:^ = c§d+ <§„.Very similar reasoningjustifies the diffusioncontributionto entropyformula(5.5c). Considernext the componentof the objectivethat dependson h: 6 2d\dxJX\dx) [1 - h(y, x) + h(y, *)log h(y, x)]V(dy\x) (A.2) + h(y9x)[V(y)-V(x)Mdy\x) To verify thatthis objectiveis minimizedby fi, firstuse the fact that 1 - h + h log h is convex and hence dominatesthe tangentline at fi.36Thus 1 - h + h log h > 1 - fi + ft log ft + log fi(h - fi). This inequalitycontinuesto hold when we multiplyby 6 and integratewith respectto 7](dy\x).Thus 6e(h)(x) - 6 h(y, x)logfi(y, x)71(dy\x) > de(h)(x) - 6 fi(y, *)k>g%, xMdy\x). Substitutingfor \og[fi(y, x)] shows that ft minimizes (A.2), and the resulting objectiveis: 36. In the specialcase in which the numberof statesis finite and the probabilityof jumpingto any of these statesis strictlypositive, a directproofthatfi is the minimizeris available.Abusing > 0 denote the probabilitythat the statejumps to its ith possible notationsomewhat,let tjCV/Ijc) value,yi given thatthe currentstateis x. We can writethe componentof the objectivethatdepends upon h (which is equivalentto equation(A.2) in this special case) as 0 + X V(yix)[-Bh(yh x) + 6h(yhJc)logh{yhx) + h(yhx)V(yi)- h(yhx)V(x)]. i=\ Differentiatingthis expressionwith respectto h(yhx) yields T7(yr|jt)[0 log h(yt,x) + V{y^)- V(x)]. Settingthis derivativeto zero andsolvingfor h(yt,x) yields h(yt,x) = exp[(V(x)- V(y,))/0],which is the formulafor fi given in the text. E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 121 which establishes the jump contributionto (5.5b). Very similar reasoning justifies the jump contributionto (5.5c). References Anderson,Evan W. (1998). "Uncertaintyand the Dynamics of Pareto OptimalAllocations," University of Chicago, dissertation. Anderson,Evan W., Hansen, Lars Peter, and Sargent,Thomas J. (1998). "Risk and Robustness in Equilibrium,"manuscript. Bansal, Ravi and Yaron, Amir (2000). "Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles." NBER Working Paper. Blackwell, David and Girshick, Meyer (1954). Theory of Games and Statistical Decisions. New York: John Wiley and Sons. Bray, Margaret(1982). "Learning,Estimation, and the Stability of Rational Expectations." Journal of Economic Theory, 26(2), pp. 318-339. Breeden, Douglas T. (1979). "An IntertemporalAsset Pricing Model with Stochastic Consumption and Investment Opportunities."Journal of Financial Economics, 7, pp. 265296. Brock, William A. (1979). "An Integrationof Stochastic GrowthTheory and the Theory of Finance, Part I: The Growth Model." General Equilibrium,Growthand Trade, JerryR. Green and Jose A. Scheinkman,editors. New York: Academic Press. Brock, William A. (1982). "Asset Pricing in a ProductionEconomy." In The Economics of Informationand Uncertainty,J. J. McCall, editor. Chicago: University of Chicago Press, for the National Bureau of Economic Research. Cameron,Robert H. and Martin, William T. (1947). "The Behavior of Measure and MeasurabilityunderChangeof Scale in Wiener Space."Bulletin of the AmericanMathematics Society, 53, pp. 130-137. Campbell,JohnY. and Cochrane,JohnH. (1999). "By Force of Habit:A Consumption-Based Explanationof Aggregate Stock MarketBehavior."Journal of Political Economy, 107(2), pp. 205-251. Chen, Zengjing and Epstein, Larry G. (2001). "Ambiguity, Risk and Asset Returns in ContinuousTime." manuscript. Chernoff, Herman (1952). "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations."Annals of MathematicalStatistics, 23, pp. 493-507. Cho, In-Koo, Williams, Noah, and Sargent, Thomas J. (2002). "EscapingNash Inflation." Review of Economic Studies, 69(1), pp. 1-40. Chow, Chi-Keung (1957). "An Optimum CharacterRecognition System Using Decision Functions."Instituteof Radio Engineers (IRE) Transactionson Electronic Computers,6, pp. 247-254. Cochrane, John H. (1997). "Where is the Market Going? Uncertain Facts and Novel Theories."Federal Reserve Bank of Chicago Economic Perspectives, 21(6), pp. 3-37. Cochrane,JohnH. and Hansen,LarsP. (1992). "Asset PricingLessons for Macroeconomics," NBERMacroeconomicsAnnual, O. Blanchardand S. Fischer, eds. Cambridge,MA: MIT Press, pp. 1115-1165. Constantinides,George M. and Duffie, Darrell (1996). "Asset Pricing with Heterogeneous Consumers."Journal of Political Economy, 104, pp. 219-240. Cox, John C, Ingersoll, JonathanE., Jr., and Ross, Stephen A. (1985). "An Intertemporal General EquilibriumModel of Asset Prices." Econometrica,53, pp. 363-384. Csiszar, Imre (1991). "Why Least Squaresand MaximumEntropy?An Axiomatic Approach to Inferencefor LinearInverse Problems."TheAnnals of Statistics, 19(4), pp. 2032-2066. 122 Journalof the EuropeanEconomicAssociation March2003 1(1):68- 123 Duffie, Darrelland Epstein,LarryG. (1992). "StochasticDifferentialUtility."Econometrica, 60(2), pp. 353-394. Duffie, Darrell and Lions, Pierre-Louis (1992). "PDE Solutions of Stochastic Differential Utility." Journal of MathematicalEconomics, 21, pp. 577-606. Duffie, Darrell and Skiadas, Costis (1994). "Continuous-TimeSecurity Pricing: A Utility GradientApproach."Journal of MathematicalEconomics, 23, pp. 107-131. Dynkin, Evgenii B. (1956). "MarkovProcesses and Semigroups of Operators."Theory of Probability and its Applications, 1, pp. 22-33. Epstein, Larry G. and Schneider, Martin (2001). "RecursiveMultiple Priors,"manuscript. Epstein, Larry G. and Zin, S. E. (1989). "Substitution,Risk Aversion and the Temporal Behavior of Consumptionand Asset Returns:A TheoreticalFramework."Econometrica, 57(4), pp. 937-969. Ethier, Stewart N. and Kurtz, Thomas G. (1986). Markov Processes: Characterizationand Convergence. New York: John Wiley and Sons. Evans, George W. and Honkapohja,Seppo (2001). Learningand Expectationsin Macroeconomics. Princeton,New Jersey: PrincetonUniversity Press. Fleming, Wendell H. and Soner, H. Mete (1991). ControlledMarkovProcesses and Viscosity Solutions. New York: Springer-Verlag. Fudenberg, Drew and Levine, David K. (1998). The Theory of Learning in Games. Cambridge, Massachusetts:MIT Press. Gilboa, Itzhak and Schmeidler, David (1989). "MaxminExpected Utility with Non-unique Prior."Journal of MathematicalEconomics, 18, pp. 141-153. Hansen, Lars P. and Jagannathan,Ravi (1991). "Implicationsof Security Market Data for Models of Dynamic Economies." Journal of Political Economy, 99(2), pp. 225-262. Hansen, Lars P. and Sargent, Thomas J. (1993). "Seasonalityand ApproximationErrorsin Rational ExpectationsModels." Journal of Political Economy, 55, pp. 21-55. Hansen, Lars P. and Sargent,Thomas J. (1995). "DiscountedLinear ExponentialQuadratic Gaussian Control."IEEE Transactionson Automatic Control, 40(5), pp. 968-971. Hansen, Lars P. and Sargent,Thomas J. (2003). "DecentralizingEconomies with Preferences for Robustness,"manuscript. Hansen, Lars P. and Sargent,Thomas J. (forthcoming).Robust Controland Economic Model Uncertainty.Princeton,New Jersey: PrincetonUniversity Press. Hansen,Lars P., Sargent,ThomasJ., and Tallarini,Thomas D. Jr. (1999). "RobustPermanent Income and Pricing."Review of Economic Studies, 66, pp. 873-907. Hansen, Lars P., Sargent, Thomas J., Turmuhambetova,GauharA., and Williams, Noah. (2002). "Robustnessand UncertaintyAversion," manuscript. Hansen, Lars P., Sargent,ThomasJ., and Wang, Neng E. (2002). "RobustPermanentIncome and Pricing with Filtering."Macroeconomic Dynamics, February6, pp. 40-84. Hansen, Lars P. and Scheinkman,Jose. A. (1995). "Back to the Future:GeneratingMoment Implicationsfor Continuous-TimeMarkov Processes." Econometrica, 63, pp. 767-804. Hansen, Lars P. and Scheinkman,Jose. (2002). "SemigroupPricing,"manuscript. Heaton,JohnC. and Lucas, DeborahJ. (1996). "Evaluatingthe Effects of IncompleteMarkets on Risk Sharing and Asset Pricing."Journal of Political Economy, 104, pp. 443-487. Hellman, Martin E. and Raviv, Josef (1970). Probability Error, Equivocation, and the Chernoff Bound."IEEE Transactionson InformationTheory, 16, pp. 368-372. James, Matthew R. (1992). "Asymptotic Analysis of Nonlinear Stochastic Risk-Sensitive Control and Differential Games." Mathematics of Control, Signals, and Systems, 5, pp. 401-417. Jorgenson, Dale W. (1967). "Discussion." The American Economic Review, 57(2), pp. 557-559. Knight, Frank.H. (1921). Risk, Uncertaintyand Profit. Boston: Houghton Mifflin. Kreps, David (1998). "AnticipatedUtility and Dynamic Choice." Kalai, E. and Kamien,M. I. (eds.), Frontiers of Research in Economic Theory: The Nancy L. Schwartz Memorial E. W. Anderson,L. P. Hansen,& T. J. Sargent Semigroupsand Model Detection 123 Lectures, Econometric Society Monographs,no. 29, Cambridge:CambridgeUniversity Press, 242-274. Kunita,Hiroshi (1969). "AbsoluteContinuityof MarkovProcesses and Generators."Nagoya MathematicalJournal, 36, pp. 1-26. Kurz, Mordecai (1997). Endogenous Economic Fluctuations: Studies in the Theory of Rational Beliefs. New York: Springer-Verlag. Lei, Chon I. (2000). Why Don't Investors Have Large Positions in Stocks? A Robustness Perspective, Ph.D. dissertation,University of Chicago. Lucas, Robert E., Jr. (1976). "Econometric Policy Evaluation: A Critique." Journal of MonetaryEconomics, 1(2), pp. 19-46. Lucas, RobertE., Jr. (1978). "Asset Prices in an Exchange Economy."Econometrica,46(6), pp. 1429-1445. Lucas, Robert E., Jr. and Prescott, Edward C. (1971). "Investmentunder Uncertainty." Econometrica,39(5), pp. 659-681. Maenhout,Pascal (2001). "RobustPortfolio Rules, Hedging and Asset Pricing,"manuscript, INSEAD. Moscarini,Guiseppeand Smith, Lones (2002). "TheLaw of LargeDemandfor Information," Econometrica,70(6), 2351-2366. Newman, Charles M. (1973). "The Orthogonalityof IndependentIncrementProcesses." in Topics in Probability Theory, D. W. Stroock and S. R. S. Varadhan(eds.), pp. 93-111, CourantInstituteof MathematicalSciences, New York University. Newman, Charles M. and Stuck, B. W. (1979). "Chernoff Bounds for Discriminating Between Two Markov Processes." Stochastics, 2, pp. 139-153. Pratt,John W. (1964). "Risk Aversion in the Small and Large."Econometrica, 32(1-2), pp. 122-136. Prescott, Edward C. and Mehra, Rajesh (1980). "Recursive Competitive Equilibrium:The Case of Homogeneous Households."Econometrica,48, pp. 1365-1379. Revuz, Dariel and Yor, Mark (1994). ContinuousMartingales and Brownian Motion, 2nd edition, New York: Springer-Verlag. Runolfsson, Thordur(1994). "The Equivalencebetween Infinite-horizonOptimalControlof Stochastic Systems with Exponential-of-integralPerformanceIndex and Stochastic Differential Games."IEEE Transactionson Automatic Control, 39, pp. 1551-1563. Schroder,Mark and Skiadas, Costis (1999). "OptimalConsumptionand Portfolio Selection with Stochastic Differential Utility." Journal of Economic Theory, 89, pp. 68-126. Sims, Chris A. (1993). "RationalExpectationsModelling with Seasonally Adjusted Data." Journal of Econometrics, 55 (1-2), pp. 9-19. Tallarini,ThomasD., Jr. (2000). "Risk-SensitiveReal Business Cycles." Journal of Monetary Economics, 45(3), pp. 507-532.